potsawee committed
Commit add295e
1 Parent(s): e1fd6c2

Update README.md

Files changed (1): README.md (+28, -9)

README.md CHANGED
@@ -9,13 +9,9 @@ pipeline_tag: text-generation
 
 # Typhoon-Audio Preview
 
-<div align="center">
-  <img src="https://i.postimg.cc/DycZ98w2/typhoon-audio.png" alt="typhoon-audio" style="width: 100%; max-width: 20cm; margin-left: 'auto'; margin-right:'auto'; display:'block'"/>
-</div>
-
 **llama-3-typhoon-v1.5-8b-audio-preview** is a 🇹🇭 Thai *audio-language* model. It supports both text and audio input modalities natively while the output is text. This version (August 2024) is our first audio-language model as a part of our multimodal effort, and it is a research *preview* version. The base language model is our [llama-3-typhoon-v1.5-8b-instruct](https://huggingface.co/scb10x/llama-3-typhoon-v1.5-8b-instruct).
 
-More details can be found in our [release blog]() and [technical report](). *To acknowledge Meta's effort in creating the foundation model and to comply with the license, we explicitly include "llama-3" in the model name.
+More details can be found in our [release blog](https://blog.opentyphoon.ai/typhoon-audio-preview-release-6fbb3f938287) and [technical report](). *To acknowledge Meta's effort in creating the foundation model and to comply with the license, we explicitly include "llama-3" in the model name.
 
 ## Model Description
 
@@ -58,8 +54,31 @@ print(response)
 
 ## Evaluation Results
 
+| Model                 | ASR-en (WER↓) | ASR-th (WER↓) | En2Th (BLEU↑) | X2Th (BLEU↑) | Th2En (BLEU↑) |
+|:----------------------|:--------------|:--------------|:--------------|:-------------|:--------------|
+| SALMONN-13B           | 5.79          | 98.07         | 0.07          | 0.10         | 14.97         |
+| DiVA-8B               | 30.28         | 65.21         | 9.82          | 5.31         | 7.97          |
+| Gemini-1.5-pro-001    | 5.98          | 13.56         | 20.69         | 13.52        | 22.54         |
+| Typhoon-Audio-Preview | 8.72          | 14.17         | 17.52         | 10.67        | 24.14         |
+
+
+| Model                 | Gender-th (Acc) | SpokenQA-th (F1) | SpeechInstruct-th |
+|:----------------------|:----------------|:-----------------|:------------------|
+| SALMONN-13B           | 93.26           | 2.95             | 1.18              |
+| DiVA-8B               | 50.12           | 15.13            | 2.68              |
+| Gemini-1.5-pro-001    | 81.32           | 62.10            | 3.93              |
+| Typhoon-Audio-Preview | 93.74           | 64.60            | 6.11              |
+
+
+## Intended Uses & Limitations
+This model is a pretrained base model. Thus, it may not be able to follow human instructions without using one/few-shot learning or instruction fine-tuning. The model does not have any moderation mechanisms, and may generate harmful or inappropriate responses.
+
+## Follow us & Support
+- https://twitter.com/opentyphoon
+- https://discord.gg/CqyBscMFpg
+
 ## Acknowledgements
-In addition to common libraries and tools, we would like to thank the following projects for releasing model weights and code:
-- Training recipe: [SALMONN](https://github.com/bytedance/SALMONN) from ByteDance
-- Audio encoder: [BEATs](https://github.com/microsoft/unilm/tree/master/beats) from Microsoft
-- Whisper encoder: [Fine-tuned Whisper](https://huggingface.co/biodatlab/whisper-th-large-v3-combined) from Biomedical and Data Lab @ Mahidol University
+We would like to thank the SALMONN team for open-sourcing their code and data, and thanks to the Biomedical and Data Lab at Mahidol University for releasing the fine-tuned Whisper that allowed us to adopt its encoder. Thanks to many other open-source projects for their useful knowledge sharing, data, code, and model weights.
+
+## Typhoon Team
+Potsawee Manakul, Sittipong Sripaisarnmongkol, Natapong Nitarach, Warit Sirichotedumrong, Adisai Na-Thalang, Phatrasek Jirabovonvisut, Parinthapat Pengpun, Pathomporn Chokchainant, Kasima Tharnpipitchai, Kunat Pipatanakul
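
For readers unfamiliar with the metrics in the evaluation tables: the ASR columns report word error rate (WER), the word-level edit distance between the model's transcript and the reference, divided by the number of reference words (lower is better). A minimal illustrative sketch of that computation — not the team's actual evaluation code, which likely includes text normalization:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (standard dynamic-programming table).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution (0 if match)
        prev = curr
    return prev[len(hyp)] / len(ref)

# One dropped word out of six reference words -> WER of 1/6.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Real ASR scoring pipelines typically lowercase, strip punctuation, and (for Thai) apply a word segmenter before computing this distance, so absolute numbers depend on that normalization.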