SLPL/Sharif-wav2vec2

Automatic Speech Recognition

sadrasabouri committed on Sep 1, 2022

Commit 64771ba (1 parent: 1da8a1a)

Update README.md

Files changed (1)

README.md +47 -2

@@ -48,11 +48,56 @@ model-index:
 [Sharif-wav2vec2](https://huggingface.co/SLPL/Sharif-wav2vec2/)
-The base model fine-tuned on 108 hours of Commonvoice on 16kHz sampled speech audio. When using the model
-make sure that your speech input is also sampled at 16Khz.
 # [Paper](https://arxiv.org/abs/2006.11477)
 # Authors: Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli
 # **Abstract**

 [Sharif-wav2vec2](https://huggingface.co/SLPL/Sharif-wav2vec2/)
+Before using the model, you may need to install the following dependencies:
+```shell
+pip -q install pyctcdecode
+python -m pip -q install pypi-kenlm
+```
+Then you can use it with:
+```python
+import torchaudio
+import torch
+import librosa
+import numpy as np
+from transformers import AutoProcessor, AutoModelForCTC
+
+# Load the processor and the fine-tuned CTC model
+processor = AutoProcessor.from_pretrained("SLPL/Sharif-wav2vec2")
+model = AutoModelForCTC.from_pretrained("SLPL/Sharif-wav2vec2")
+
+# Load the audio and resample it to the rate the feature extractor expects
+speech_array, sampling_rate = torchaudio.load("test.wav")
+speech_array = speech_array.squeeze().numpy()
+speech_array = librosa.resample(
+    np.asarray(speech_array),
+    orig_sr=sampling_rate,  # keyword arguments are required by recent librosa releases
+    target_sr=processor.feature_extractor.sampling_rate)
+
+# Convert the waveform into model inputs
+features = processor(
+    speech_array,
+    sampling_rate=processor.feature_extractor.sampling_rate,
+    return_tensors="pt",
+    padding=True)
+input_values = features.input_values
+attention_mask = features.attention_mask
+
+# Run inference without tracking gradients, then decode the logits to text
+with torch.no_grad():
+    logits = model(input_values, attention_mask=attention_mask).logits
+    prediction = processor.batch_decode(logits.numpy()).text
+print(prediction[0])
+# تست
+```
 # [Paper](https://arxiv.org/abs/2006.11477)
+The base model was fine-tuned on 108 hours of Common Voice speech audio sampled at 16 kHz. When using the model,
+make sure that your speech input is also sampled at 16 kHz.
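
The 16 kHz requirement can be checked before any audio reaches the model. The following is a minimal sketch, assuming the input is an uncompressed PCM WAV file; it is not part of the commit, and the helper names `wav_sample_rate` and `needs_resampling` are hypothetical. It uses only Python's standard `wave` module to read the rate from the file header.

```python
import wave

# Rate the model card says the model expects; stated here as an assumption
# taken from the README text, not read from the model configuration.
TARGET_SAMPLE_RATE = 16_000


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, target_rate=TARGET_SAMPLE_RATE):
    """Return True when the file's sample rate differs from the target rate."""
    return wav_sample_rate(path) != target_rate
```

If `needs_resampling("test.wav")` returns `True`, the audio should be resampled (for example with `librosa.resample`, as in the usage snippet) before it is passed to the processor.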
 # Authors: Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli
 # **Abstract**