erax committed on
Commit 4636d03 · verified · 1 Parent(s): a817d1e

Update README.md

Files changed (1)
  1. README.md +7 -43
README.md CHANGED
@@ -28,32 +28,17 @@ widget:
  </p>
 
 
- # EraX-VL-2B-V2.0
+ # EraX-VL-2B-V1.5
  ## Introduction 🎉
 
- <!-- <p style="text-align: justify;">
- We are excited to introduce **EraX-VL-2B-V2.0**, a robust multimodal model for **OCR (optical character recognition)** and **VQA (visual question-answering)** that excels in various languages 🌍, with a particular focus on Vietnamese 🇻🇳. The `EraX-VL-2B` model stands out for its precise recognition capabilities across a range of documents 📝, including medical forms 🩺, invoices 🧾, bills of sale 💳, quotes 📄, and medical records 💊. This functionality is expected to be highly beneficial for hospitals 🏥, clinics 💉, insurance companies 🛡️, and other similar applications 📋. Built on the solid foundation of the [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)[1], which we found to be of high quality and fluent in Vietnamese, `EraX-VL-2B` has been fine-tuned to enhance its performance. We plan to continue improving and releasing new versions for free, along with sharing performance benchmarks in the near future.
+ We are excited to introduce **EraX-VL-2B-V1.5**, a robust multimodal model for **OCR (optical character recognition)** and **VQA (visual question-answering)** that excels in various languages 🌍, with a particular focus on Vietnamese 🇻🇳. The `EraX-VL-2B` model stands out for its precise recognition capabilities across a range of documents 📝, including medical forms 🩺, invoices 🧾, bills of sale 💳, quotes 📄, and medical records 💊. This functionality is expected to be highly beneficial for hospitals 🏥, clinics 💉, insurance companies 🛡️, and other similar applications 📋. Built on the solid foundation of the [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)[1], which we found to be of high quality and fluent in Vietnamese, `EraX-VL-2B` has been fine-tuned to enhance its performance. We plan to continue improving and releasing new versions for free, along with sharing performance benchmarks in the near future.
 
- One standing-out feature of **EraX-VL-2B-V2.0** is the capability to do multi-turn Q&A with pretty good reasoning! Thanks for the size of 7+ billions parameters of base model.
+ One standing-out feature of **EraX-VL-2B-V1.5** is the capability to do multi-turn Q&A with reasonable reasoning capability!
 
- **EraX-VL-2B-V2.0** is a young member of our **EraX's LànhGPT** collection of LLM models.
- </p> -->
+ ***NOTA BENE***: EraX-VL-2B-V1.5 is NOT a typical OCR-only tool likes Tesseract but is a Multimodal LLM-based model. To use it effectively, you may have to **twist your prompt carefully** depending on your tasks.
 
- **WE ARE MOVING to <a href="https://huggingface.co/erax-ai/EraX-VL-2B-V2.0/" target="_blank">EraX-AI</a> repository from 22 December 2024. Follow up so you do not miss great news coming up.**
+ **EraX-VL-2B-V1.5** is a young member of our **EraX's LànhGPT** collection of LLM models.
 
- We are excited to introduce **EraX-VL-2B-V2.0**, a robust multimodal model for **OCR (optical character recognition)** and **VQA (visual question-answering)** that excels in various languages 🌍, with a particular focus on Vietnamese 🇻🇳. The `EraX-VL-2B` model stands out for its precise recognition capabilities across a range of documents 📝, including medical forms 🩺, invoices 🧾, bills of sale 💳, quotes 📄, and medical records 💊. This functionality is expected to be highly beneficial for hospitals 🏥, clinics 💉, insurance companies 🛡️, and other similar applications 📋. Built on the solid foundation of the [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)[1], which we found to be of high quality and fluent in Vietnamese, `EraX-VL-2B` has been fine-tuned to enhance its performance. We plan to continue improving and releasing new versions for free, along with sharing performance benchmarks in the near future.
-
- One standing-out feature of **EraX-VL-2B-V2.0** is the capability to do multi-turn Q&A with pretty good reasoning! Thanks for the size of 2+ billions parameters of base model.
-
- ***NOTA BENE***: EraX-VL-2B-V2.0 is NOT a typical OCR-only tool likes Tesseract but is a Multimodal LLM-based model. To use it effectively, you may have to **twist your prompt carefully** depending on your tasks.
-
- **EraX-VL-2B-V2.0** is a young member of our **EraX's LànhGPT** collection of LLM models.
-
- - **Developed by:**
- - Nguyễn Anh Nguyên ([email protected])
- - Nguyễn Hồ Nam (BCG)
- - Phạm Đình Thục ([email protected])
- - **Funded by:** [Bamboo Capital Group](https://bamboocap.com.vn) and EraX
  - **Model type:** Multimodal Transformer with over 2B parameters
  - **Languages (NLP):** Primarily Vietnamese with multilingual capabilities
  - **License:** Apache 2.0
@@ -70,7 +55,7 @@ One standing-out feature of **EraX-VL-2B-V2.0** is the capability to do multi-tu
  <td><b>VI-MTVQA</b></td>
  </tr>
  <tr>
- <th align="left">EraX-VL-7B-V2.0 🥇 </th>
+ <th align="left">EraX-VL-7B-V1.5 🥇 </th>
  <td align="middle">✘</td>
  <td>47.2 </td>
  </tr>
@@ -85,7 +70,7 @@ One standing-out feature of **EraX-VL-2B-V2.0** is the capability to do multi-tu
  <td>39.1 </td>
  </tr>
  <tr>
- <th align="left"><font color=darkred>EraX-VL-2B-V2.0</font></th>
+ <th align="left"><font color=darkred>EraX-VL-2B-V1.5</font></th>
  <td align="middle"> ✅ </td>
  <td>38.2 </td>
  </tr>
@@ -360,27 +345,6 @@ output_text = processor.batch_decode(
  print(output_text[0])
  ```
 
- ## Acknowledgments 👏
- We thank Khang Đoàn ([5CD-AI](https://huggingface.co/5CD-AI)) for his invaluable support in order to train `EraX-VL-2B-V2.0`. Our appreciation also goes to AAA JS Company for their support and resources, which significantly contributed to this project.
-
- ## Citation 📝
- <!-- - title={EraX-VL-2B-V2.0: A Highly Efficient Multimodal LLM for Vietnamese, especially for medical forms and bills.},
- - author={Nguyễn Anh Nguyên and Nguyễn Hồ Nam (BCG) and Dũng Hoàng and Thục Phạm and Nhật Phạm},
- - helpers={Khang Đoàn and AAA JS Company},
- - contact={[email protected]},
- - organization={EraX} -->
- If you find our project useful, we would appreciate it if you could star our repository and cite our work as follows:
- ```
- @article{EraX-VL-2B-V2.0,
- title={EraX-VL-2B-V2.0: A Highly Efficient Multimodal LLM for Vietnamese, especially for medical forms and bills},
- author={Nguyễn Anh Nguyên and Nguyễn Hồ Nam (BCG) and Phạm Đình Thục and Phạm Huỳnh Nhật},
- organization={EraX},
- year={2024},
- url={https://huggingface.co/erax-ai/EraX-VL-2B-V2.0},
- github={https://github.com/EraX-JS-Company/EraX-VL-2B-V2.0/}
- }
- ```
-
  ## References 📑
  [1] Qwen team. Qwen2-VL. 2024.
 