Update README.md
Browse files
README.md
CHANGED
@@ -28,32 +28,17 @@ widget:
|
|
28 |
</p>
|
29 |
|
30 |
|
31 |
-
# EraX-VL-2B-
|
32 |
## Introduction 🎉
|
33 |
|
34 |
-
|
35 |
-
We are excited to introduce **EraX-VL-2B-V2.0**, a robust multimodal model for **OCR (optical character recognition)** and **VQA (visual question-answering)** that excels in various languages 🌍, with a particular focus on Vietnamese 🇻🇳. The `EraX-VL-2B` model stands out for its precise recognition capabilities across a range of documents 📝, including medical forms 🩺, invoices 🧾, bills of sale 💳, quotes 📄, and medical records 💊. This functionality is expected to be highly beneficial for hospitals 🏥, clinics 💉, insurance companies 🛡️, and other similar applications 📋. Built on the solid foundation of the [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)[1], which we found to be of high quality and fluent in Vietnamese, `EraX-VL-2B` has been fine-tuned to enhance its performance. We plan to continue improving and releasing new versions for free, along with sharing performance benchmarks in the near future.
|
36 |
|
37 |
-
One standing-out feature of **EraX-VL-2B-
|
38 |
|
39 |
-
|
40 |
-
</p> -->
|
41 |
|
42 |
-
**
|
43 |
|
44 |
-
We are excited to introduce **EraX-VL-2B-V2.0**, a robust multimodal model for **OCR (optical character recognition)** and **VQA (visual question-answering)** that excels in various languages 🌍, with a particular focus on Vietnamese 🇻🇳. The `EraX-VL-2B` model stands out for its precise recognition capabilities across a range of documents 📝, including medical forms 🩺, invoices 🧾, bills of sale 💳, quotes 📄, and medical records 💊. This functionality is expected to be highly beneficial for hospitals 🏥, clinics 💉, insurance companies 🛡️, and other similar applications 📋. Built on the solid foundation of the [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)[1], which we found to be of high quality and fluent in Vietnamese, `EraX-VL-2B` has been fine-tuned to enhance its performance. We plan to continue improving and releasing new versions for free, along with sharing performance benchmarks in the near future.
|
45 |
-
|
46 |
-
One standing-out feature of **EraX-VL-2B-V2.0** is the capability to do multi-turn Q&A with pretty good reasoning! Thanks for the size of 2+ billions parameters of base model.
|
47 |
-
|
48 |
-
***NOTA BENE***: EraX-VL-2B-V2.0 is NOT a typical OCR-only tool likes Tesseract but is a Multimodal LLM-based model. To use it effectively, you may have to **twist your prompt carefully** depending on your tasks.
|
49 |
-
|
50 |
-
**EraX-VL-2B-V2.0** is a young member of our **EraX's LànhGPT** collection of LLM models.
|
51 |
-
|
52 |
-
- **Developed by:**
|
53 |
-
- Nguyễn Anh Nguyên ([email protected])
|
54 |
-
- Nguyễn Hồ Nam (BCG)
|
55 |
-
- Phạm Đình Thục ([email protected])
|
56 |
-
- **Funded by:** [Bamboo Capital Group](https://bamboocap.com.vn) and EraX
|
57 |
- **Model type:** Multimodal Transformer with over 2B parameters
|
58 |
- **Languages (NLP):** Primarily Vietnamese with multilingual capabilities
|
59 |
- **License:** Apache 2.0
|
@@ -70,7 +55,7 @@ One standing-out feature of **EraX-VL-2B-V2.0** is the capability to do multi-tu
|
|
70 |
<td><b>VI-MTVQA</b></td>
|
71 |
</tr>
|
72 |
<tr>
|
73 |
-
<th align="left">EraX-VL-7B-
|
74 |
<td align="middle">✘</td>
|
75 |
<td>47.2 </td>
|
76 |
</tr>
|
@@ -85,7 +70,7 @@ One standing-out feature of **EraX-VL-2B-V2.0** is the capability to do multi-tu
|
|
85 |
<td>39.1 </td>
|
86 |
</tr>
|
87 |
<tr>
|
88 |
-
<th align="left"><font color=darkred>EraX-VL-2B-
|
89 |
<td align="middle"> ✅ </td>
|
90 |
<td>38.2 </td>
|
91 |
</tr>
|
@@ -360,27 +345,6 @@ output_text = processor.batch_decode(
|
|
360 |
print(output_text[0])
|
361 |
```
|
362 |
|
363 |
-
## Acknowledgments 👏
|
364 |
-
We thank Khang Đoàn ([5CD-AI](https://huggingface.co/5CD-AI)) for his invaluable support in order to train `EraX-VL-2B-V2.0`. Our appreciation also goes to AAA JS Company for their support and resources, which significantly contributed to this project.
|
365 |
-
|
366 |
-
## Citation 📝
|
367 |
-
<!-- - title={EraX-VL-2B-V2.0: A Highly Efficient Multimodal LLM for Vietnamese, especially for medical forms and bills.},
|
368 |
-
- author={Nguyễn Anh Nguyên and Nguyễn Hồ Nam (BCG) and Dũng Hoàng and Thục Phạm and Nhật Phạm},
|
369 |
-
- helpers={Khang Đoàn and AAA JS Company},
|
370 |
-
- contact={[email protected]},
|
371 |
-
- organization={EraX} -->
|
372 |
-
If you find our project useful, we would appreciate it if you could star our repository and cite our work as follows:
|
373 |
-
```
|
374 |
-
@article{EraX-VL-2B-V2.0,
|
375 |
-
title={EraX-VL-2B-V2.0: A Highly Efficient Multimodal LLM for Vietnamese, especially for medical forms and bills},
|
376 |
-
author={Nguyễn Anh Nguyên and Nguyễn Hồ Nam (BCG) and Phạm Đình Thục and Phạm Huỳnh Nhật},
|
377 |
-
organization={EraX},
|
378 |
-
year={2024},
|
379 |
-
url={https://huggingface.co/erax-ai/EraX-VL-2B-V2.0},
|
380 |
-
github={https://github.com/EraX-JS-Company/EraX-VL-2B-V2.0/}
|
381 |
-
}
|
382 |
-
```
|
383 |
-
|
384 |
## References 📑
|
385 |
[1] Qwen team. Qwen2-VL. 2024.
|
386 |
|
|
|
28 |
</p>
|
29 |
|
30 |
|
31 |
+
# EraX-VL-2B-V1.5
|
32 |
## Introduction 🎉
|
33 |
|
34 |
+
We are excited to introduce **EraX-VL-2B-V1.5**, a robust multimodal model for **OCR (optical character recognition)** and **VQA (visual question-answering)** that excels in various languages 🌍, with a particular focus on Vietnamese 🇻🇳. The `EraX-VL-2B` model stands out for its precise recognition capabilities across a range of documents 📝, including medical forms 🩺, invoices 🧾, bills of sale 💳, quotes 📄, and medical records 💊. This functionality is expected to be highly beneficial for hospitals 🏥, clinics 💉, insurance companies 🛡️, and other similar applications 📋. Built on the solid foundation of the [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)[1], which we found to be of high quality and fluent in Vietnamese, `EraX-VL-2B` has been fine-tuned to enhance its performance. We plan to continue improving and releasing new versions for free, along with sharing performance benchmarks in the near future.
|
|
|
35 |
|
36 |
+
One standing-out feature of **EraX-VL-2B-V1.5** is the capability to do multi-turn Q&A with reasonable reasoning capability!
|
37 |
|
38 |
+
***NOTA BENE***: EraX-VL-2B-V1.5 is NOT a typical OCR-only tool likes Tesseract but is a Multimodal LLM-based model. To use it effectively, you may have to **twist your prompt carefully** depending on your tasks.
|
|
|
39 |
|
40 |
+
**EraX-VL-2B-V1.5** is a young member of our **EraX's LànhGPT** collection of LLM models.
|
41 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
42 |
- **Model type:** Multimodal Transformer with over 2B parameters
|
43 |
- **Languages (NLP):** Primarily Vietnamese with multilingual capabilities
|
44 |
- **License:** Apache 2.0
|
|
|
55 |
<td><b>VI-MTVQA</b></td>
|
56 |
</tr>
|
57 |
<tr>
|
58 |
+
<th align="left">EraX-VL-7B-V1.5 🥇 </th>
|
59 |
<td align="middle">✘</td>
|
60 |
<td>47.2 </td>
|
61 |
</tr>
|
|
|
70 |
<td>39.1 </td>
|
71 |
</tr>
|
72 |
<tr>
|
73 |
+
<th align="left"><font color=darkred>EraX-VL-2B-V1.5</font></th>
|
74 |
<td align="middle"> ✅ </td>
|
75 |
<td>38.2 </td>
|
76 |
</tr>
|
|
|
345 |
print(output_text[0])
|
346 |
```
|
347 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
348 |
## References 📑
|
349 |
[1] Qwen team. Qwen2-VL. 2024.
|
350 |
|