Update README.md
README.md
CHANGED
@@ -5,12 +5,12 @@ language:
|
|
5 |
- en
|
6 |
- zh
|
7 |
base_model:
|
8 |
-
- Qwen/Qwen2-VL-
|
9 |
library_name: transformers
|
10 |
tags:
|
11 |
- erax
|
12 |
- multimodal
|
13 |
-
- erax-vl-
|
14 |
- insurance
|
15 |
- ocr
|
16 |
- vietnamese
|
@@ -28,38 +28,36 @@ widget:
|
|
28 |
</p>
|
29 |
|
30 |
|
31 |
-
# EraX-VL-
|
32 |
## Introduction 🎉
|
33 |
|
34 |
<!-- <p style="text-align: justify;">
|
35 |
-
We are excited to introduce **EraX-VL-
|
36 |
|
37 |
-
One standing-out feature of **EraX-VL-
|
38 |
|
39 |
-
**EraX-VL-
|
40 |
</p> -->
|
41 |
|
42 |
-
**WE ARE MOVING to <a href="https://huggingface.co/erax-ai/EraX-VL-
|
43 |
|
44 |
-
We are excited to introduce **EraX-VL-
|
45 |
|
46 |
-
One standing-out feature of **EraX-VL-
|
47 |
|
48 |
-
***NOTA BENE***: EraX-VL-
|
49 |
|
50 |
-
**EraX-VL-
|
51 |
|
52 |
- **Developed by:**
|
53 |
- Nguyễn Anh Nguyên ([email protected])
|
54 |
- Nguyễn Hồ Nam (BCG)
|
55 |
-
- Hoàng Tiến Dũng ([email protected])
|
56 |
-
- Phạm Huỳnh Nhật ([email protected])
|
57 |
- Phạm Đình Thục ([email protected])
|
58 |
- **Funded by:** [Bamboo Capital Group](https://bamboocap.com.vn) and EraX
|
59 |
-
- **Model type:** Multimodal Transformer with over
|
60 |
- **Languages (NLP):** Primarily Vietnamese with multilingual capabilities
|
61 |
- **License:** Apache 2.0
|
62 |
-
- **Fine-tuned from:** [Qwen/Qwen2-VL-
|
63 |
|
64 |
## Benchmarks 📊
|
65 |
|
@@ -273,7 +271,7 @@ python -m pip install qwen-vl-utils
|
|
273 |
pip install flash-attn --no-build-isolation
|
274 |
```
|
275 |
|
276 |
-
Then you can use `EraX-VL-
|
277 |
```python
|
278 |
import os
|
279 |
import base64
|
@@ -287,7 +285,7 @@ import torch
|
|
287 |
from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
|
288 |
from qwen_vl_utils import process_vision_info
|
289 |
|
290 |
-
model_path = "erax/EraX-VL-
|
291 |
|
292 |
model = Qwen2VLForConditionalGeneration.from_pretrained(
|
293 |
model_path,
|
@@ -371,23 +369,23 @@ print(output_text[0])
|
|
371 |
```
|
372 |
|
373 |
## Acknowledgments 👏
|
374 |
-
We thank Khang Đoàn ([5CD-AI](https://huggingface.co/5CD-AI)) for his invaluable support in order to train `EraX-VL-
|
375 |
|
376 |
## Citation 📝
|
377 |
-
<!-- - title={EraX-VL-
|
378 |
- author={Nguyễn Anh Nguyên and Nguyễn Hồ Nam (BCG) and Dũng Hoàng and Thục Phạm and Nhật Phạm},
|
379 |
- helpers={Khang Đoàn and AAA JS Company},
|
380 |
- contact={[email protected]},
|
381 |
- organization={EraX} -->
|
382 |
If you find our project useful, we would appreciate it if you could star our repository and cite our work as follows:
|
383 |
```
|
384 |
-
@article{EraX-VL-
|
385 |
-
title={EraX-VL-
|
386 |
author={Nguyễn Anh Nguyên and Nguyễn Hồ Nam (BCG) and Hoàng Tiến Dũng and Phạm Đình Thục and Phạm Huỳnh Nhật},
|
387 |
organization={EraX},
|
388 |
year={2024},
|
389 |
-
url={https://huggingface.co/erax-ai/EraX-VL-
|
390 |
-
github={https://github.com/EraX-JS-Company/
|
391 |
}
|
392 |
```
|
393 |
|
@@ -407,4 +405,4 @@ If you find our project useful, we would appreciate it if you could star our rep
|
|
407 |
|
408 |
## Contact 🤝
|
409 |
- For correspondence regarding this work or inquiries about an API trial, please contact Nguyễn Anh Nguyên at [[email protected]]([email protected]).
|
410 |
-
- Follow us on <b><a href="https://github.com/EraX-JS-Company/
|
|
|
5 |
- en
|
6 |
- zh
|
7 |
base_model:
|
8 |
+
- Qwen/Qwen2-VL-2B-Instruct
|
9 |
library_name: transformers
|
10 |
tags:
|
11 |
- erax
|
12 |
- multimodal
|
13 |
+
- erax-vl-2B
|
14 |
- insurance
|
15 |
- ocr
|
16 |
- vietnamese
|
|
|
28 |
</p>
|
29 |
|
30 |
|
31 |
+
# EraX-VL-2B-V2.0
|
32 |
## Introduction 🎉
|
33 |
|
34 |
<!-- <p style="text-align: justify;">
|
35 |
+
We are excited to introduce **EraX-VL-2B-V2.0**, a robust multimodal model for **OCR (optical character recognition)** and **VQA (visual question-answering)** that excels in various languages 🌍, with a particular focus on Vietnamese 🇻🇳. The `EraX-VL-2B` model stands out for its precise recognition capabilities across a range of documents 📝, including medical forms 🩺, invoices 🧾, bills of sale 💳, quotes 📄, and medical records 💊. This functionality is expected to be highly beneficial for hospitals 🏥, clinics 💉, insurance companies 🛡️, and other similar applications 📋. Built on the solid foundation of the [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)[1], which we found to be of high quality and fluent in Vietnamese, `EraX-VL-2B` has been fine-tuned to enhance its performance. We plan to continue improving and releasing new versions for free, along with sharing performance benchmarks in the near future.
|
36 |
|
37 |
+
One standout feature of **EraX-VL-2B-V2.0** is its capability for multi-turn Q&A with fairly good reasoning, thanks to the base model's 2+ billion parameters.
|
38 |
|
39 |
+
**EraX-VL-2B-V2.0** is a young member of **EraX's LànhGPT** collection of LLMs.
|
40 |
</p> -->
|
41 |
|
42 |
+
**WE ARE MOVING to the <a href="https://huggingface.co/erax-ai/EraX-VL-2B-V2.0/" target="_blank">EraX-AI</a> repository from 22 December 2024. Follow us there so you do not miss upcoming news.**
|
43 |
|
44 |
+
We are excited to introduce **EraX-VL-2B-V2.0**, a robust multimodal model for **OCR (optical character recognition)** and **VQA (visual question-answering)** that excels in various languages 🌍, with a particular focus on Vietnamese 🇻🇳. The `EraX-VL-2B` model stands out for its precise recognition capabilities across a range of documents 📝, including medical forms 🩺, invoices 🧾, bills of sale 💳, quotes 📄, and medical records 💊. This functionality is expected to be highly beneficial for hospitals 🏥, clinics 💉, insurance companies 🛡️, and other similar applications 📋. Built on the solid foundation of the [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)[1], which we found to be of high quality and fluent in Vietnamese, `EraX-VL-2B` has been fine-tuned to enhance its performance. We plan to continue improving and releasing new versions for free, along with sharing performance benchmarks in the near future.
|
45 |
|
46 |
+
One standout feature of **EraX-VL-2B-V2.0** is its capability for multi-turn Q&A with fairly good reasoning, thanks to the base model's 2+ billion parameters.
|
47 |
|
48 |
+
***NOTA BENE***: EraX-VL-2B-V2.0 is NOT a typical OCR-only tool like Tesseract; it is a multimodal LLM-based model. To use it effectively, you may have to **tweak your prompt carefully** for each task.
|
49 |
|
50 |
+
**EraX-VL-2B-V2.0** is a young member of **EraX's LànhGPT** collection of LLMs.
|
51 |
|
52 |
- **Developed by:**
|
53 |
- Nguyễn Anh Nguyên ([email protected])
|
54 |
- Nguyễn Hồ Nam (BCG)
|
|
|
|
|
55 |
- Phạm Đình Thục ([email protected])
|
56 |
- **Funded by:** [Bamboo Capital Group](https://bamboocap.com.vn) and EraX
|
57 |
+
- **Model type:** Multimodal Transformer with over 2B parameters
|
58 |
- **Languages (NLP):** Primarily Vietnamese with multilingual capabilities
|
59 |
- **License:** Apache 2.0
|
60 |
+
- **Fine-tuned from:** [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)
|
61 |
|
62 |
## Benchmarks 📊
|
63 |
|
|
|
271 |
pip install flash-attn --no-build-isolation
|
272 |
```
|
273 |
|
274 |
+
Then you can use `EraX-VL-2B-V2.0` like this:
|
275 |
```python
|
276 |
import os
|
277 |
import base64
|
|
|
285 |
from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
|
286 |
from qwen_vl_utils import process_vision_info
|
287 |
|
288 |
+
model_path = "erax/EraX-VL-2B-V2.0"
|
289 |
|
290 |
model = Qwen2VLForConditionalGeneration.from_pretrained(
|
291 |
model_path,
|
|
|
369 |
```
|
370 |
|
371 |
## Acknowledgments 👏
|
372 |
+
We thank Khang Đoàn ([5CD-AI](https://huggingface.co/5CD-AI)) for his invaluable support in training `EraX-VL-2B-V2.0`. Our appreciation also goes to AAA JS Company for their support and resources, which contributed significantly to this project.
|
373 |
|
374 |
## Citation 📝
|
375 |
+
<!-- - title={EraX-VL-2B-V2.0: A Highly Efficient Multimodal LLM for Vietnamese, especially for medical forms and bills.},
|
376 |
- author={Nguyễn Anh Nguyên and Nguyễn Hồ Nam (BCG) and Dũng Hoàng and Thục Phạm and Nhật Phạm},
|
377 |
- helpers={Khang Đoàn and AAA JS Company},
|
378 |
- contact={[email protected]},
|
379 |
- organization={EraX} -->
|
380 |
If you find our project useful, we would appreciate it if you could star our repository and cite our work as follows:
|
381 |
```
|
382 |
+
@article{EraX-VL-2B-V2.0,
|
383 |
+
title={EraX-VL-2B-V2.0: A Highly Efficient Multimodal LLM for Vietnamese, especially for medical forms and bills},
|
384 |
author={Nguyễn Anh Nguyên and Nguyễn Hồ Nam (BCG) and Hoàng Tiến Dũng and Phạm Đình Thục and Phạm Huỳnh Nhật},
|
385 |
organization={EraX},
|
386 |
year={2024},
|
387 |
+
url={https://huggingface.co/erax-ai/EraX-VL-2B-V2.0},
|
388 |
+
github={https://github.com/EraX-JS-Company/EraX-VL-2B-V2.0/}
|
389 |
}
|
390 |
```
|
391 |
|
|
|
405 |
|
406 |
## Contact 🤝
|
407 |
- For correspondence regarding this work or inquiries about an API trial, please contact Nguyễn Anh Nguyên at [[email protected]]([email protected]).
|
408 |
+
- Follow us on <b><a href="https://github.com/EraX-JS-Company/EraX-VL-2B-V2.0/" target="_blank">EraX GitHub</a></b>
|