potsawee committed
Commit add295e
1 Parent(s): e1fd6c2

Update README.md

Files changed (1): README.md (+28, -9)

README.md CHANGED
@@ -9,13 +9,9 @@ pipeline_tag: text-generation
 
 # Typhoon-Audio Preview
 
-<div align="center">
-  <img src="https://i.postimg.cc/DycZ98w2/typhoon-audio.png" alt="typhoon-audio" style="width: 100%; max-width: 20cm; margin-left: 'auto'; margin-right:'auto'; display:'block'"/>
-</div>
-
 **llama-3-typhoon-v1.5-8b-audio-preview** is a 🇹🇭 Thai *audio-language* model. It supports both text and audio input modalities natively while the output is text. This version (August 2024) is our first audio-language model as a part of our multimodal effort, and it is a research *preview* version. The base language model is our [llama-3-typhoon-v1.5-8b-instruct](https://huggingface.co/scb10x/llama-3-typhoon-v1.5-8b-instruct).
 
-More details can be found in our [release blog]() and [technical report](). *To acknowledge Meta's effort in creating the foundation model and to comply with the license, we explicitly include "llama-3" in the model name.
+More details can be found in our [release blog](https://blog.opentyphoon.ai/typhoon-audio-preview-release-6fbb3f938287) and [technical report](). *To acknowledge Meta's effort in creating the foundation model and to comply with the license, we explicitly include "llama-3" in the model name.
 
 ## Model Description
 
@@ -58,8 +54,31 @@ print(response)
 
 ## Evaluation Results
 
+| Model                 | ASR-en (WER↓) | ASR-th (WER↓) | En2Th (BLEU↑) | X2Th (BLEU↑) | Th2En (BLEU↑) |
+|:----------------------|:--------------|:--------------|:--------------|:-------------|:--------------|
+| SALMONN-13B           | 5.79          | 98.07         | 0.07          | 0.10         | 14.97         |
+| DiVA-8B               | 30.28         | 65.21         | 9.82          | 5.31         | 7.97          |
+| Gemini-1.5-pro-001    | 5.98          | 13.56         | 20.69         | 13.52        | 22.54         |
+| Typhoon-Audio-Preview | 8.72          | 14.17         | 17.52         | 10.67        | 24.14         |
+
+
+| Model                 | Gender-th (Acc) | SpokenQA-th (F1) | SpeechInstruct-th |
+|:----------------------|:----------------|:-----------------|:------------------|
+| SALMONN-13B           | 93.26           | 2.95             | 1.18              |
+| DiVA-8B               | 50.12           | 15.13            | 2.68              |
+| Gemini-1.5-pro-001    | 81.32           | 62.10            | 3.93              |
+| Typhoon-Audio-Preview | 93.74           | 64.60            | 6.11              |
+
+
+## Intended Uses & Limitations
+This model is a pretrained base model. Thus, it may not be able to follow human instructions without using one/few-shot learning or instruction fine-tuning. The model does not have any moderation mechanisms, and may generate harmful or inappropriate responses.
+
+## Follow us & Support
+- https://twitter.com/opentyphoon
+- https://discord.gg/CqyBscMFpg
+
 ## Acknowledgements
-In addition to common libraries and tools, we would like to thank the following projects for releasing model weights and code:
-- Training recipe: [SALMONN](https://github.com/bytedance/SALMONN) from ByteDance
-- Audio encoder: [BEATs](https://github.com/microsoft/unilm/tree/master/beats) from Microsoft
-- Whisper encoder: [Fine-tuned Whisper](https://huggingface.co/biodatlab/whisper-th-large-v3-combined) from Biomedical and Data Lab @ Mahidol University
+We would like to thank the SALMONN team for open-sourcing their code and data, and thanks to the Biomedical and Data Lab at Mahidol University for releasing the fine-tuned Whisper that allowed us to adopt its encoder. Thanks to many other open-source projects for their useful knowledge sharing, data, code, and model weights.
+
+## Typhoon Team
+Potsawee Manakul, Sittipong Sripaisarnmongkol, Natapong Nitarach, Warit Sirichotedumrong, Adisai Na-Thalang, Phatrasek Jirabovonvisut, Parinthapat Pengpun, Pathomporn Chokchainant, Kasima Tharnpipitchai, Kunat Pipatanakul
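
For readers unfamiliar with the metrics in the evaluation tables: the ASR columns report word error rate (WER), the word-level edit distance between the model's transcript and the reference, divided by the number of reference words (lower is better). A minimal illustrative sketch of that computation — not the team's actual evaluation code, which likely includes text normalization:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (standard dynamic-programming table).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution (0 if match)
        prev = curr
    return prev[len(hyp)] / len(ref)

# One dropped word out of six reference words -> WER of 1/6.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Real ASR scoring pipelines typically lowercase, strip punctuation, and (for Thai) apply a word segmenter before computing this distance, so absolute numbers depend on that normalization.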