---
license: apache-2.0
language:
- th
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- chat
- audio
---

# Pathumma-Audio Beta

## Model Description
**qwen2-pathumma-7b-audio-beta** is a 7-billion-parameter Thai large language model designed for audio understanding tasks. The model can process multiple types of audio input, including **speech, general audio, and music**, converting them into text output.

## Model Architecture
The model combines two key components:
1. Base Language Model: [OpenThaiLLM-DoodNiLT-V1.0.0-Beta-7B](https://huggingface.co/nectec/OpenThaiLLM-DoodNiLT-V1.0.0-Beta-7B) (Qwen2)
2. Base Speech Encoder: [pathumma-whisper-th-large-v3](https://huggingface.co/nectec/pathumma-whisper-th-large-v3) (Whisper)
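
This card does not publish the fusion details, but audio LLMs in the SALMONN family typically project the speech encoder's output frames into the LLM's embedding space and prepend them to the text prompt embeddings. The following is a minimal NumPy sketch of that general idea only; the dimensions, the projection, and all variable names are illustrative assumptions, not the actual Pathumma-Audio internals:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: Whisper-large emits 1280-d frames and a
# Qwen2-7B-class model uses a 3584-d hidden size.
d_audio, d_llm = 1280, 3584
n_audio_frames, n_prompt_tokens = 50, 12

audio_feats = rng.standard_normal((n_audio_frames, d_audio))   # speech encoder output
prompt_embeds = rng.standard_normal((n_prompt_tokens, d_llm))  # LLM token embeddings

# Hypothetical linear projection from encoder space into LLM embedding space.
W = rng.standard_normal((d_audio, d_llm)) * 0.01
projected_audio = audio_feats @ W                              # shape (50, 3584)

# The LLM then attends over [audio frames ; prompt tokens] as one sequence.
fused_input = np.concatenate([projected_audio, prompt_embeds], axis=0)
print(fused_input.shape)  # (62, 3584)
```

In practice the projection is trained (often with the encoder frozen), which is what lets a text-only LLM condition on audio.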

## Quickstart
To load the model and generate responses with the Hugging Face Transformers library, follow the steps below.

#### 1. Install the required dependencies
Make sure the necessary libraries are installed by running:
```shell
pip install librosa==0.10.2.post1 torch==2.3.1 transformers==4.44.2 peft==0.12.0
```
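Because the versions above are pinned, it can help to confirm what your environment actually has before running the model. A small sanity-check sketch using only the standard library:

```python
from importlib.metadata import version, PackageNotFoundError

# Versions pinned in the install command above.
expected = {
    "librosa": "0.10.2.post1",
    "torch": "2.3.1",
    "transformers": "4.44.2",
    "peft": "0.12.0",
}

for pkg, want in expected.items():
    try:
        have = version(pkg)
        status = "OK" if have == want else f"expected {want}"
        print(f"{pkg}: {have} ({status})")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```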
#### 2. Load the model and generate a response
You can load the model and use it to generate a response with the following code snippet:
```python
import torch
import librosa
from transformers import AutoModel

device = "cuda"

# Load the model with its custom remote code; LoRA weights are applied in
# inference mode rather than initialized for training.
model = AutoModel.from_pretrained(
    "pattara12345/qwen2_pathumma_audio_beta",
    torch_dtype=torch.bfloat16,
    lora_infer_mode=True,
    init_from_scratch=True,
    trust_remote_code=True
)
model = model.to(device)

prompt = "ช่วยถอดความเสียงนี้ให้หน่อย"  # "Please transcribe this audio for me."
audio_path = "audio_path.wav"
audio, sr = librosa.load(audio_path, sr=16000)  # the speech encoder expects 16 kHz audio

model.eval()
with torch.no_grad():
    response = model.generate(
        raw_wave=audio,
        prompts=prompt,
        device=device,
        max_new_tokens=200,
        repetition_penalty=1.0,
    )
print(response[0])
```
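In the snippet above, `librosa.load(..., sr=16000)` takes care of resampling the file to the 16 kHz rate the Whisper-based encoder expects. To illustrate what that step does, here is a minimal pure-NumPy sketch of resampling via linear interpolation (librosa itself uses a higher-quality resampler, so this is for intuition only):

```python
import numpy as np

def resample_linear(wave: np.ndarray, sr_in: int, sr_out: int = 16000) -> np.ndarray:
    """Naive linear-interpolation resampler (illustrative only)."""
    duration = len(wave) / sr_in
    n_out = int(round(duration * sr_out))
    t_in = np.arange(len(wave)) / sr_in
    t_out = np.arange(n_out) / sr_out
    return np.interp(t_out, t_in, wave)

# One second of a 440 Hz tone "recorded" at 44.1 kHz ...
sr_in = 44100
t = np.arange(sr_in) / sr_in
wave = np.sin(2 * np.pi * 440 * t).astype(np.float32)

# ... resampled down to the 16 kHz the speech encoder expects.
wave_16k = resample_linear(wave, sr_in)
print(len(wave_16k))  # 16000
```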
## Limitations
More information needed

## Citation
More information needed

## Acknowledgements
We are grateful to ThaiSC, the NSTDA Supercomputer Center, for providing access to the LANTA supercomputer used for model training and fine-tuning. We also thank the SALMONN team for making their code publicly available, and the Typhoon Audio team at SCB 10X for publishing their Hugging Face project, source code, and technical paper, which served as a valuable guide for us. Many other open-source projects have contributed valuable information, code, data, and model weights; we are grateful to them all.

## Pathumma Audio Team
*Pattara Tipkasorn*, Wayupuk Sommuang, Oatsada Chatthong, *Kwanchiva Thangthai*