erax-ai
/

EraX-VL-2B-V1.5

@@ -28,32 +28,17 @@ widget:
 </p>
-# EraX-VL-2B-V2.0
 ## Introduction 🎉
-<!-- <p style="text-align: justify;">
-    We are excited to introduce **EraX-VL-2B-V2.0**, a robust multimodal model for **OCR (optical character recognition)** and **VQA (visual question-answering)** that excels in various languages 🌍, with a particular focus on Vietnamese 🇻🇳. The `EraX-VL-2B` model stands out for its precise recognition capabilities across a range of documents 📝, including medical forms 🩺, invoices 🧾, bills of sale 💳, quotes 📄, and medical records 💊. This functionality is expected to be highly beneficial for hospitals 🏥, clinics 💉, insurance companies 🛡️, and other similar applications 📋. Built on the solid foundation of the [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)[1], which we found to be of high quality and fluent in Vietnamese, `EraX-VL-2B` has been fine-tuned to enhance its performance. We plan to continue improving and releasing new versions for free, along with sharing performance benchmarks in the near future.
-One standing-out feature of **EraX-VL-2B-V2.0** is the capability to do multi-turn Q&A with pretty good reasoning! Thanks for the size of 7+ billions parameters of base model.
-**EraX-VL-2B-V2.0** is a young member of our **EraX's LànhGPT** collection of LLM models.
-</p> -->
-**WE ARE MOVING to <a href="https://huggingface.co/erax-ai/EraX-VL-2B-V2.0/" target="_blank">EraX-AI</a> repository from 22 December 2024. Follow up so you do not miss great news coming up.**
-We are excited to introduce **EraX-VL-2B-V2.0**, a robust multimodal model for **OCR (optical character recognition)** and **VQA (visual question-answering)** that excels in various languages 🌍, with a particular focus on Vietnamese 🇻🇳. The `EraX-VL-2B` model stands out for its precise recognition capabilities across a range of documents 📝, including medical forms 🩺, invoices 🧾, bills of sale 💳, quotes 📄, and medical records 💊. This functionality is expected to be highly beneficial for hospitals 🏥, clinics 💉, insurance companies 🛡️, and other similar applications 📋. Built on the solid foundation of the [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)[1], which we found to be of high quality and fluent in Vietnamese, `EraX-VL-2B` has been fine-tuned to enhance its performance. We plan to continue improving and releasing new versions for free, along with sharing performance benchmarks in the near future.
-One standing-out feature of **EraX-VL-2B-V2.0** is the capability to do multi-turn Q&A with pretty good reasoning! Thanks for the size of 2+ billions parameters of base model.
-***NOTA BENE***: EraX-VL-2B-V2.0 is NOT a typical OCR-only tool likes Tesseract but is a Multimodal LLM-based model. To use it effectively, you may have to **twist your prompt carefully** depending on your tasks.
-**EraX-VL-2B-V2.0** is a young member of our **EraX's LànhGPT** collection of LLM models.
-- **Developed by:**
-  - Nguyễn Anh Nguyên ([email protected])
-  - Nguyễn Hồ Nam (BCG)
-  - Phạm Đình Thục ([email protected])
-- **Funded by:** [Bamboo Capital Group](https://bamboocap.com.vn) and EraX
 - **Model type:** Multimodal Transformer with over 2B parameters
 - **Languages (NLP):** Primarily Vietnamese with multilingual capabilities
 - **License:** Apache 2.0
@@ -70,7 +55,7 @@ One standing-out feature of **EraX-VL-2B-V2.0** is the capability to do multi-tu
         <td><b>VI-MTVQA</b></td>
     </tr>
     <tr>
-        <th align="left">EraX-VL-7B-V2.0 🥇 </th>
         <td align="middle">✘</td>
         <td>47.2 </td>
     </tr>
@@ -85,7 +70,7 @@ One standing-out feature of **EraX-VL-2B-V2.0** is the capability to do multi-tu
         <td>39.1 </td>
     </tr>
     <tr>
-        <th align="left"><font color=darkred>EraX-VL-2B-V2.0</font></th>
         <td align="middle"> ✅ </td>
         <td>38.2 </td>
     </tr>
@@ -360,27 +345,6 @@ output_text = processor.batch_decode(
 print(output_text[0])
 ```
-## Acknowledgments 👏
-We thank Khang Đoàn ([5CD-AI](https://huggingface.co/5CD-AI)) for his invaluable support in order to train `EraX-VL-2B-V2.0`. Our appreciation also goes to AAA JS Company for their support and resources, which significantly contributed to this project.
-## Citation 📝
-<!-- - title={EraX-VL-2B-V2.0: A Highly Efficient Multimodal LLM for Vietnamese, especially for medical forms and bills.},
-- author={Nguyễn Anh Nguyên and Nguyễn Hồ Nam (BCG) and Dũng Hoàng and Thục Phạm and Nhật Phạm},
-- helpers={Khang Đoàn and AAA JS Company},
-- contact={[email protected]},
-- organization={EraX} -->
-If you find our project useful, we would appreciate it if you could star our repository and cite our work as follows:
-```
-@article{EraX-VL-2B-V2.0,
-  title={EraX-VL-2B-V2.0: A Highly Efficient Multimodal LLM for Vietnamese, especially for medical forms and bills},
-  author={Nguyễn Anh Nguyên and Nguyễn Hồ Nam (BCG) and Phạm Đình Thục and Phạm Huỳnh Nhật},
-  organization={EraX},
-  year={2024},
-  url={https://huggingface.co/erax-ai/EraX-VL-2B-V2.0},
-  github={https://github.com/EraX-JS-Company/EraX-VL-2B-V2.0/}
-}
-```
 ## References 📑
 [1] Qwen team. Qwen2-VL. 2024.

 </p>
+# EraX-VL-2B-V1.5
 ## Introduction 🎉
+We are excited to introduce **EraX-VL-2B-V1.5**, a robust multimodal model for **OCR (optical character recognition)** and **VQA (visual question-answering)** that excels in various languages 🌍, with a particular focus on Vietnamese 🇻🇳. The `EraX-VL-2B` model stands out for its precise recognition capabilities across a range of documents 📝, including medical forms 🩺, invoices 🧾, bills of sale 💳, quotes 📄, and medical records 💊. This functionality is expected to be highly beneficial for hospitals 🏥, clinics 💉, insurance companies 🛡️, and other similar applications 📋. Built on the solid foundation of the [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)[1], which we found to be of high quality and fluent in Vietnamese, `EraX-VL-2B` has been fine-tuned to enhance its performance. We plan to continue improving and releasing new versions for free, along with sharing performance benchmarks in the near future.
+One standing-out feature of **EraX-VL-2B-V1.5** is the capability to do multi-turn Q&A with reasonable reasoning capability!
+***NOTA BENE***: EraX-VL-2B-V1.5 is NOT a typical OCR-only tool likes Tesseract but is a Multimodal LLM-based model. To use it effectively, you may have to **twist your prompt carefully** depending on your tasks.
+**EraX-VL-2B-V1.5** is a young member of our **EraX's LànhGPT** collection of LLM models.
 - **Model type:** Multimodal Transformer with over 2B parameters
 - **Languages (NLP):** Primarily Vietnamese with multilingual capabilities
 - **License:** Apache 2.0
         <td><b>VI-MTVQA</b></td>
     </tr>
     <tr>
+        <th align="left">EraX-VL-7B-V1.5 🥇 </th>
         <td align="middle">✘</td>
         <td>47.2 </td>
     </tr>
         <td>39.1 </td>
     </tr>
     <tr>
+        <th align="left"><font color=darkred>EraX-VL-2B-V1.5</font></th>
         <td align="middle"> ✅ </td>
         <td>38.2 </td>
     </tr>
 print(output_text[0])
 ```
 ## References 📑
 [1] Qwen team. Qwen2-VL. 2024.