Update README.md
README.md
CHANGED
@@ -9,13 +9,9 @@ pipeline_tag: text-generation
# Typhoon-Audio Preview

-<div align="center">
-    <img src="https://i.postimg.cc/DycZ98w2/typhoon-audio.png" alt="typhoon-audio" style="width: 100%; max-width: 20cm; margin-left: 'auto'; margin-right:'auto'; display:'block'"/>
-</div>
-
**llama-3-typhoon-v1.5-8b-audio-preview** is a 🇹🇭 Thai *audio-language* model. It supports both text and audio input modalities natively while the output is text. This version (August 2024) is our first audio-language model as a part of our multimodal effort, and it is a research *preview* version. The base language model is our [llama-3-typhoon-v1.5-8b-instruct](https://huggingface.co/scb10x/llama-3-typhoon-v1.5-8b-instruct).

-More details can be found in our [release blog]() and [technical report](). *To acknowledge Meta's effort in creating the foundation model and to comply with the license, we explicitly include "llama-3" in the model name.

## Model Description
@@ -58,8 +54,31 @@ print(response)
## Evaluation Results
## Acknowledgements
-
-
-
-
# Typhoon-Audio Preview
**llama-3-typhoon-v1.5-8b-audio-preview** is a 🇹🇭 Thai *audio-language* model. It supports both text and audio input modalities natively while the output is text. This version (August 2024) is our first audio-language model as a part of our multimodal effort, and it is a research *preview* version. The base language model is our [llama-3-typhoon-v1.5-8b-instruct](https://huggingface.co/scb10x/llama-3-typhoon-v1.5-8b-instruct).

+More details can be found in our [release blog](https://blog.opentyphoon.ai/typhoon-audio-preview-release-6fbb3f938287) and [technical report](). *To acknowledge Meta's effort in creating the foundation model and to comply with the license, we explicitly include "llama-3" in the model name.

## Model Description
## Evaluation Results

+| Model                 | ASR-en (WER↓) | ASR-th (WER↓) | En2Th (BLEU↑) | X2Th (BLEU↑) | Th2En (BLEU↑) |
+|:----------------------|:--------------|:--------------|:--------------|:-------------|:--------------|
+| SALMONN-13B           | 5.79          | 98.07         | 0.07          | 0.10         | 14.97         |
+| DiVA-8B               | 30.28         | 65.21         | 9.82          | 5.31         | 7.97          |
+| Gemini-1.5-pro-001    | 5.98          | 13.56         | 20.69         | 13.52        | 22.54         |
+| Typhoon-Audio-Preview | 8.72          | 14.17         | 17.52         | 10.67        | 24.14         |
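
For readers unfamiliar with the ASR columns above, the following is a minimal sketch of how a word error rate (WER) is commonly computed: the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function names `edit_distance` and `word_error_rate` are hypothetical, the result is scaled to a percentage on the assumption that the table reports percentages, and tokens are split on whitespace. Thai text is not whitespace-delimited, so a real Thai evaluation would need a word segmenter first; the README does not state which tokenization or scoring tool was used.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (hypothetical helper)."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER in percent, assuming whitespace tokenization (an assumption)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted against the reference length, WER can exceed 100, which is why lower is better but the scale has no fixed upper bound.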
+
+
+| Model                 | Gender-th (Acc) | SpokenQA-th (F1) | SpeechInstruct-th |
+|:----------------------|:----------------|:-----------------|:------------------|
+| SALMONN-13B           | 93.26           | 2.95             | 1.18              |
+| DiVA-8B               | 50.12           | 15.13            | 2.68              |
+| Gemini-1.5-pro-001    | 81.32           | 62.10            | 3.93              |
+| Typhoon-Audio-Preview | 93.74           | 64.60            | 6.11              |
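
The README does not say how the SpokenQA-th F1 score is computed. As an illustration only, the sketch below shows the token-overlap F1 that is widely used for extractive question answering: precision and recall are taken over the tokens shared by the predicted and reference answers. The function name `token_f1` is hypothetical, whitespace tokenization is an assumption (Thai would need a word segmenter), and the result is on a 0 to 1 scale, whereas the table appears to use 0 to 100.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-overlap F1 between two answers (assumed metric, not confirmed)."""
    pred = prediction.split()
    ref = reference.split()
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_f1("in bangkok thailand", "bangkok"))
```

A corpus-level score under this assumption would be the mean of `token_f1` over all question-answer pairs, multiplied by 100 to match the scale in the table.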
+
+
+## Intended Uses & Limitations
+This model is a pretrained base model. Thus, it may not be able to follow human instructions without using one/few-shot learning or instruction fine-tuning. The model does not have any moderation mechanisms, and may generate harmful or inappropriate responses.
+
+## Follow us & Support
+- https://twitter.com/opentyphoon
+- https://discord.gg/CqyBscMFpg
+
## Acknowledgements
+We would like to thank the SALMONN team for open-sourcing their code and data, and thanks to the Biomedical and Data Lab at Mahidol University for releasing the fine-tuned Whisper that allowed us to adopt its encoder. Thanks to many other open-source projects for their useful knowledge sharing, data, code, and model weights.
+
+## Typhoon Team
+Potsawee Manakul, Sittipong Sripaisarnmongkol, Natapong Nitarach, Warit Sirichotedumrong, Adisai Na-Thalang, Phatrasek Jirabovonvisut, Parinthapat Pengpun, Pathomporn Chokchainant, Kasima Tharnpipitchai, Kunat Pipatanakul