enable faster whisper
- .gitattributes +0 -1
- .gitignore +4 -0
- README.md +144 -7
- app.py +45 -0
- packages.txt +1 -0
- requirements.txt +4 -0
- whisper.py +24 -0
.gitattributes
CHANGED
```diff
@@ -25,7 +25,6 @@
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
```
.gitignore
ADDED
```diff
@@ -0,0 +1,4 @@
+venv
+**/__pycache__
+venv
+.env
```
README.md
CHANGED
```diff
@@ -1,13 +1,150 @@
 ---
-title:
-emoji:
-colorFrom:
-colorTo:
+title: ASR UI
+emoji: 🤫
+colorFrom: indigo
+colorTo: red
 sdk: gradio
-sdk_version: 4.
+sdk_version: 4.20.0
 app_file: app.py
 pinned: false
+tags:
+  - whisper-event
 ---
```

The added README body:

# Instructions for using the speech-recognition user interface

## Summary

This document explains the usage, requirements, and installation of the BSC-LT speech-recognition user interface (UI). We prepared this UI so that the models we are developing can be tried out. The application is packaged with Docker, so a Docker installation is one of the requirements, and the application is designed to be used from a web browser.

## Requirements

**OS:** Ubuntu/Debian (tested on 20.04.6 LTS)

**Minimum requirements:**

* CPU: 4 vCores
* 16 GB RAM
* 25 GB free disk space

**Suggested requirement:**

* GPU: Nvidia T4, 16 GB VRAM

A GPU speeds up inference considerably. Without a GPU, inference runs at roughly 3x RTF (real-time factor), i.e. it needs 30 seconds to transcribe 10 seconds of audio. With a GPU, the RTF is below 1.

## Instructions

The steps to deploy the UI application are:

* Install Docker (optional: add your user to the docker group so sudo is not needed)
* Download the Docker image with docker pull
* Configure a reverse proxy to make the application available

## Docker installation

You can follow the instructions on these pages:

* [How To Install and Use Docker on Ubuntu 22.04 | DigitalOcean](https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-22-04)
* [https://docs.docker.com/engine/install/ubuntu/](https://docs.docker.com/engine/install/ubuntu/)

In short:

```
# Add Docker's official GPG key
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add the repository to Apt sources
echo \
  "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update && sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Post-installation steps

# Create the docker group and add your user to it, so docker can run without sudo
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker

# To start Docker and containerd automatically on boot with systemd:
sudo systemctl enable docker.service
sudo systemctl enable containerd.service
```

## Application deployment

To download and run the Docker image:

```
docker run -d -p 7860:7860 --name asr-inference --platform=linux/amd64 \
    registry.hf.space/projecte-aina-asr-inference:latest python app.py
```

On the first run, `docker run` downloads the image and the models needed to run the application. If everything works without errors, the web application will be available at localhost:7860 (SERVER_IP:7860 when deployed remotely).

To check the application logs:

```
docker logs asr-inference
```

If, a few minutes after starting the application, the logs show:

```
Running on local URL: http://0.0.0.0:7860
To create a public link, set `share=True` in `launch()`.
```

the application has been deployed correctly.

To stop the application:

```
docker stop asr-inference
```

To restart the application:

```
docker start asr-inference
```

## Connecting to the UI

If the instructions above worked, the application is now available on localhost at port 7860. If the application is deployed locally, just open http://0.0.0.0:7860 in a browser. If it is deployed on a virtual machine or a remote server, there are two ways to connect to it:

**SSH port forwarding:**

```
ssh -L 7860:localhost:7860 <vm-server-address>
```

This way you can reach the UI as if it were running on the local machine, i.e. at http://0.0.0.0:7860.

**Through a web server:**

If the application's virtual machine is on the same network and/or its ports are open to the outside, you can connect simply by pointing the browser at `http://<vm-server-address>:7860`. To map the IP to a domain name, set up a reverse proxy with a tool such as Apache or nginx.
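The real-time factor quoted in the README (3x on CPU, under 1 with a GPU) is simply processing time divided by audio duration; a minimal sketch of the calculation:

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent transcribing / duration of the audio.

    RTF < 1 means the system transcribes faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


# The README's CPU example: 30 s of processing for 10 s of audio.
print(real_time_factor(30.0, 10.0))  # -> 3.0
```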
app.py
ADDED
```diff
@@ -0,0 +1,45 @@
+
+import gradio as gr
+from whisper import generate
+from AinaTheme import theme
+
+MODEL_NAME = "Systran/faster-whisper-large-v3"
+
+
+def transcribe(inputs):
+    if inputs is None:
+        raise gr.Error("Cap fitxer d'àudio introduït! Si us plau pengeu un fitxer "
+                       "o enregistreu un àudio abans d'enviar la vostra sol·licitud")
+
+    return generate(audio_path=inputs)
+
+
+description_string = ("Transcripció automàtica de micròfon o de fitxers d'àudio.\n"
+                      " Aquest demostrador s'ha desenvolupat per comprovar els models"
+                      " de reconeixement de parla per a mòbils. Per ara utilitza el checkpoint"
+                      f" [{MODEL_NAME}](https://huggingface.co/{MODEL_NAME}) i la llibreria"
+                      " de 🤗 Transformers per a la transcripció.")
+
+
+def clear():
+    return None
+
+
+with gr.Blocks(theme=theme) as demo:
+    gr.Markdown(description_string)
+    with gr.Row():
+        with gr.Column(scale=1):
+            input = gr.Audio(sources=["upload", "microphone"], type="filepath", label="Audio")
+
+        with gr.Column(scale=1):
+            output = gr.Textbox(label="Output", lines=8)
+
+    with gr.Row(variant="panel"):
+        clear_btn = gr.Button("Clear")
+        submit_btn = gr.Button("Submit", variant="primary")
+
+    submit_btn.click(fn=transcribe, inputs=[input], outputs=[output])
+    clear_btn.click(fn=clear, inputs=[], outputs=[input], queue=False)
+
+
+if __name__ == "__main__":
+    demo.launch()
```
packages.txt
ADDED
```diff
@@ -0,0 +1 @@
+ffmpeg
```
requirements.txt
ADDED
```diff
@@ -0,0 +1,4 @@
+faster_whisper
+torch
+gradio==4.20.0
+aina-gradio-theme==2.3
```
whisper.py
ADDED
```diff
@@ -0,0 +1,24 @@
+from faster_whisper import WhisperModel
+import torch
+
+device = "cuda" if torch.cuda.is_available() else "cpu"
+torch_dtype = "float32"
+
+MODEL_NAME = "Systran/faster-whisper-large-v3"
+model = WhisperModel(MODEL_NAME, compute_type=torch_dtype)
+
+
+def generate(audio_path):
+    # check audio length
+    segments, _ = model.transcribe(
+        audio_path,
+        # language="ca",
+        # chunk_length=30,
+        task="transcribe",
+        word_timestamps=False,
+    )
+
+    text = ""
+    for segment in segments:
+        text += " " + segment.text.strip()
+    return text
```
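The loop at the end of `generate()` joins the stripped segment texts with single spaces (leaving one leading space on the result). A self-contained sketch of that joining logic, using stand-in objects in place of the Segment results that faster-whisper's `model.transcribe()` yields:

```python
from types import SimpleNamespace

# Stand-ins for the Segment objects returned by model.transcribe();
# only the .text attribute is used by generate().
segments = [
    SimpleNamespace(text=" Bon dia,"),
    SimpleNamespace(text="  com esteu? "),
]

# Same accumulation as generate(): strip each segment, prefix a space.
text = ""
for segment in segments:
    text += " " + segment.text.strip()

print(repr(text))  # -> ' Bon dia, com esteu?'
```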