Commit 64771ba by sadrasabouri · 1 Parent(s): 1da8a1a

Update README.md

  1. README.md +47 -2
README.md CHANGED
@@ -48,11 +48,56 @@ model-index:
 
  [Sharif-wav2vec2](https://huggingface.co/SLPL/Sharif-wav2vec2/)
 
- The base model fine-tuned on 108 hours of Commonvoice on 16kHz sampled speech audio. When using the model
- make sure that your speech input is also sampled at 16Khz.
 
  # [Paper](https://arxiv.org/abs/2006.11477)
 
  # Authors: Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli
 
  # **Abstract**

 
  [Sharif-wav2vec2](https://huggingface.co/SLPL/Sharif-wav2vec2/)
 
+ Before using the model, you may need to install the following dependencies:
+
+ ```shell
+ pip -q install pyctcdecode
+ python -m pip -q install pypi-kenlm
+ ```
+
+ Then you can use it with:
+ ```python
+ import torchaudio
+ import torch
+ import librosa
+ import numpy as np
+ from transformers import AutoProcessor, AutoModelForCTC
+
+ # Load the processor (feature extractor + decoder) and the CTC model.
+ processor = AutoProcessor.from_pretrained("SLPL/Sharif-wav2vec2")
+ model = AutoModelForCTC.from_pretrained("SLPL/Sharif-wav2vec2")
+
+ # Load the audio and resample it to the rate the model expects.
+ speech_array, sampling_rate = torchaudio.load("test.wav")
+ speech_array = speech_array.squeeze().numpy()
+ speech_array = librosa.resample(
+     np.asarray(speech_array),
+     orig_sr=sampling_rate,
+     target_sr=processor.feature_extractor.sampling_rate)
+
+ # Convert the waveform to model inputs and run inference without gradients.
+ features = processor(
+     speech_array,
+     sampling_rate=processor.feature_extractor.sampling_rate,
+     return_tensors="pt",
+     padding=True)
+ input_values = features.input_values
+ attention_mask = features.attention_mask
+ with torch.no_grad():
+     logits = model(input_values, attention_mask=attention_mask).logits
+ prediction = processor.batch_decode(logits.numpy()).text
+
+ print(prediction[0])
+ # تست
+ ```
 
  # [Paper](https://arxiv.org/abs/2006.11477)
 
+ The base model was fine-tuned on 108 hours of Common Voice speech audio sampled at 16 kHz. When using the model,
+ make sure that your speech input is also sampled at 16 kHz.
+
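As a minimal sketch of the 16 kHz requirement noted above, the snippet below reads the sampling rate from a WAV header with Python's standard `wave` module before the file is passed to the model. The helper names `wav_sample_rate` and `needs_resampling` are hypothetical and not part of the model card, and the approach assumes an uncompressed PCM WAV file.

```python
import wave

EXPECTED_RATE = 16_000  # 16 kHz, the rate the README asks for


def wav_sample_rate(path):
    """Return the sampling rate (in Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, expected_rate=EXPECTED_RATE):
    """Return True if the file's sampling rate differs from the expected rate."""
    return wav_sample_rate(path) != expected_rate
```

If `needs_resampling("test.wav")` returns `True`, the audio should be resampled (for example with `librosa.resample`, as in the usage snippet) before it is given to the processor.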
  # Authors: Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli
 
  # **Abstract**