sadrasabouri
committed on
Commit
•
64771ba
1
Parent(s):
1da8a1a
Update README.md
README.md
CHANGED
@@ -48,11 +48,56 @@ model-index:
 
 [Sharif-wav2vec2](https://huggingface.co/SLPL/Sharif-wav2vec2/)
 
-
-
+Before using the model, you may need to install the following dependencies:
+
+```shell
+pip -q install pyctcdecode
+python -m pip -q install pypi-kenlm
+```
+
+Then you can use it with:
+```python
+import tensorflow
+import torchaudio
+import torch
+import librosa
+import numpy as np
+from transformers import AutoProcessor, AutoModelForCTC
+
+processor = AutoProcessor.from_pretrained("SLPL/Sharif-wav2vec2")
+model = AutoModelForCTC.from_pretrained("SLPL/Sharif-wav2vec2")
+
+
+
+
+speech_array, sampling_rate = torchaudio.load("test.wav")
+speech_array = speech_array.squeeze().numpy()
+speech_array = librosa.resample(
+    np.asarray(speech_array),
+    orig_sr=sampling_rate,
+    target_sr=processor.feature_extractor.sampling_rate)
+
+
+features = processor(
+    speech_array,
+    sampling_rate=processor.feature_extractor.sampling_rate,
+    return_tensors="pt",
+    padding=True)
+input_values = features.input_values
+attention_mask = features.attention_mask
+with torch.no_grad():
+    logits = model(input_values, attention_mask=attention_mask).logits
+prediction = processor.batch_decode(logits.numpy()).text
+
+print(prediction[0])
+# تست
+```
 
 # [Paper](https://arxiv.org/abs/2006.11477)
 
+The base model was fine-tuned on 108 hours of Common Voice speech audio sampled at 16 kHz. When using the model,
+make sure that your speech input is also sampled at 16 kHz.
+
 # Authors: Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli
 
 # **Abstract**