Commit cd5e758 by huseinzol05 · 1 Parent(s): 98d1e46
Update README.md

README.md CHANGED
@@ -15,4 +15,46 @@ Finetune Whisper Medium on Malaysian dataset,
Script at https://github.com/mesolitica/malaya-speech/tree/malaysian-speech/session/whisper

Wandb at https://wandb.ai/huseinzol05/malaysian-whisper-medium?workspace=user-huseinzol05

## What languages did we finetune?

1. `ms`, Malay: both standard Malay and local Malay.
2. `en`, English: both standard English and Manglish.

## how-to

```python
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
from datasets import Audio
import requests

# The model expects 16 kHz audio.
sr = 16000
audio = Audio(sampling_rate=sr)

processor = AutoProcessor.from_pretrained("mesolitica/malaysian-whisper-medium")
model = AutoModelForSpeechSeq2Seq.from_pretrained("mesolitica/malaysian-whisper-medium")

# Download the test clip and decode it to a 16 kHz waveform.
response = requests.get('https://huggingface.co/datasets/huseinzol05/malaya-speech-stt-test-set/resolve/main/test.mp3')
y = audio.decode_example(audio.encode_example(response.content))['array']
inputs = processor([y], sampling_rate=sr, return_tensors='pt')

# Generate with the Malay language token.
r = model.generate(inputs['input_features'], language='ms', return_timestamps=True)
processor.tokenizer.decode(r[0])
```

```text
'<|startoftranscript|><|ms|><|transcribe|> Zamily On Aging di Vener Australia, Australia yang telah diadakan pada tahun 1982 dan berasaskan unjuran tersebut maka jabatan perangkaan Malaysia menganggarkan menjelang tahun 2005 sejumlah 15% penduduk kita adalah daripada kalangan warga emas. Untuk makluman Tuan Yang Pertua dan juga Alian Bohon, pembangunan sistem pendafiran warga emas ataupun kita sebutkan event adalah usaha kerajaan ke arah merealisasikan objektif yang telah digangkatkan<|endoftext|>'
```

```python
# Same audio and input features, this time with the English language token.
r = model.generate(inputs['input_features'], language='en', return_timestamps=True)
processor.tokenizer.decode(r[0])
```

```text
<|startoftranscript|><|en|><|transcribe|> Assembly on Aging, Divina Australia, Australia, which has been provided in 1982 and the operation of the transportation of Malaysia's implementation to prevent the tourism of the 25th, 15% of our population is from the market. For the information of the President and also the respected, the development of the market system or we have made an event.<|endoftext|>
```

## how to predict longer audio?

Whisper works on 30-second windows, so you need to split longer audio into 30-second chunks and predict each chunk separately.
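
The chunking step can be sketched roughly as below. This is a minimal sketch, not the author's method: `chunk_audio` and `transcribe_long` are hypothetical helper names, and it assumes `y` is a 1-D waveform already resampled to 16 kHz, with `model` and `processor` loaded as in the how-to section. Fixed-size cuts can split a word at a chunk boundary, so the joined text may contain small errors there.

```python
def chunk_audio(y, sr=16000, chunk_seconds=30):
    """Split a 1-D waveform into consecutive chunks of at most `chunk_seconds`.

    Hypothetical helper: plain slicing, so it works on lists and NumPy arrays.
    The last chunk may be shorter than `chunk_seconds`.
    """
    chunk_size = sr * chunk_seconds
    return [y[i:i + chunk_size] for i in range(0, len(y), chunk_size)]


def transcribe_long(y, model, processor, language='ms', sr=16000):
    """Predict each chunk on its own and join the decoded text (hypothetical helper)."""
    texts = []
    for chunk in chunk_audio(y, sr=sr):
        inputs = processor([chunk], sampling_rate=sr, return_tensors='pt')
        tokens = model.generate(inputs['input_features'], language=language)
        # Drop special tokens such as <|startoftranscript|> before joining.
        text = processor.tokenizer.decode(tokens[0], skip_special_tokens=True)
        texts.append(text.strip())
    return ' '.join(texts)
```

With the objects from the how-to section, a call would look like `transcribe_long(y, model, processor, language='ms')`.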