ThucPD commited on
Commit
88c124a
·
verified ·
1 Parent(s): 66c4470

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +22 -24
README.md CHANGED
@@ -5,12 +5,12 @@ language:
5
  - en
6
  - zh
7
  base_model:
8
- - Qwen/Qwen2-VL-7B-Instruct
9
  library_name: transformers
10
  tags:
11
  - erax
12
  - multimodal
13
- - erax-vl-7b
14
  - insurance
15
  - ocr
16
  - vietnamese
@@ -28,38 +28,36 @@ widget:
28
  </p>
29
 
30
 
31
- # EraX-VL-7B-V1
32
  ## Introduction 🎉
33
 
34
  <!-- <p style="text-align: justify;">
35
- We are excited to introduce **EraX-VL-7B-v1**, a robust multimodal model for **OCR (optical character recognition)** and **VQA (visual question-answering)** that excels in various languages 🌍, with a particular focus on Vietnamese 🇻🇳. The `EraX-VL-7B` model stands out for its precise recognition capabilities across a range of documents 📝, including medical forms 🩺, invoices 🧾, bills of sale 💳, quotes 📄, and medical records 💊. This functionality is expected to be highly beneficial for hospitals 🏥, clinics 💉, insurance companies 🛡️, and other similar applications 📋. Built on the solid foundation of the [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)[1], which we found to be of high quality and fluent in Vietnamese, `EraX-VL-7B` has been fine-tuned to enhance its performance. We plan to continue improving and releasing new versions for free, along with sharing performance benchmarks in the near future.
36
 
37
- One standing-out feature of **EraX-VL-7B-v1** is the capability to do multi-turn Q&A with pretty good reasoning! Thanks for the size of 7+ billions parameters of base model.
38
 
39
- **EraX-VL-7B-V1** is a young member of our **EraX's LànhGPT** collection of LLM models.
40
  </p> -->
41
 
42
- **WE ARE MOVING to <a href="https://huggingface.co/erax-ai/EraX-VL-7B-V1/" target="_blank">EraX-AI</a> repository from 22 October 2024. Follow up so you do not miss great news coming up.**
43
 
44
- We are excited to introduce **EraX-VL-7B-v1**, a robust multimodal model for **OCR (optical character recognition)** and **VQA (visual question-answering)** that excels in various languages 🌍, with a particular focus on Vietnamese 🇻🇳. The `EraX-VL-7B` model stands out for its precise recognition capabilities across a range of documents 📝, including medical forms 🩺, invoices 🧾, bills of sale 💳, quotes 📄, and medical records 💊. This functionality is expected to be highly beneficial for hospitals 🏥, clinics 💉, insurance companies 🛡️, and other similar applications 📋. Built on the solid foundation of the [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)[1], which we found to be of high quality and fluent in Vietnamese, `EraX-VL-7B` has been fine-tuned to enhance its performance. We plan to continue improving and releasing new versions for free, along with sharing performance benchmarks in the near future.
45
 
46
- One standing-out feature of **EraX-VL-7B-v1** is the capability to do multi-turn Q&A with pretty good reasoning! Thanks for the size of 7+ billions parameters of base model.
47
 
48
- ***NOTA BENE***: EraX-VL-7B-V1 is NOT a typical OCR-only tool likes Tesseract but is a Multimodal LLM-based model. To use it effectively, you may have to **twist your prompt carefully** depending on your tasks.
49
 
50
- **EraX-VL-7B-V1** is a young member of our **EraX's LànhGPT** collection of LLM models.
51
 
52
  - **Developed by:**
53
  - Nguyễn Anh Nguyên ([email protected])
54
  - Nguyễn Hồ Nam (BCG)
55
- - Hoàng Tiến Dũng ([email protected])
56
- - Phạm Huỳnh Nhật ([email protected])
57
  - Phạm Đình Thục ([email protected])
58
  - **Funded by:** [Bamboo Capital Group](https://bamboocap.com.vn) and EraX
59
- - **Model type:** Multimodal Transformer with over 7B parameters
60
  - **Languages (NLP):** Primarily Vietnamese with multilingual capabilities
61
  - **License:** Apache 2.0
62
- - **Fine-tuned from:** [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
63
 
64
  ## Benchmarks 📊
65
 
@@ -273,7 +271,7 @@ python -m pip install qwen-vl-utils
273
  pip install flash-attn --no-build-isolation
274
  ```
275
 
276
- Then you can use `EraX-VL-7B-V1` like this:
277
  ```python
278
  import os
279
  import base64
@@ -287,7 +285,7 @@ import torch
287
  from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
288
  from qwen_vl_utils import process_vision_info
289
 
290
- model_path = "erax/EraX-VL-7B-V1"
291
 
292
  model = Qwen2VLForConditionalGeneration.from_pretrained(
293
  model_path,
@@ -371,23 +369,23 @@ print(output_text[0])
371
  ```
372
 
373
  ## Acknowledgments 👏
374
- We thank Khang Đoàn ([5CD-AI](https://huggingface.co/5CD-AI)) for his invaluable support in order to train `EraX-VL-7B-V1`. Our appreciation also goes to AAA JS Company for their support and resources, which significantly contributed to this project.
375
 
376
  ## Citation 📝
377
- <!-- - title={EraX-VL-7B-V1: A Highly Efficient Multimodal LLM for Vietnamese, especially for medical forms and bills.},
378
  - author={Nguyễn Anh Nguyên and Nguyễn Hồ Nam (BCG) and Dũng Hoàng and Thục Phạm and Nhật Phạm},
379
  - helpers={Khang Đoàn and AAA JS Company},
380
  - contact={[email protected]},
381
  - organization={EraX} -->
382
  If you find our project useful, we would appreciate it if you could star our repository and cite our work as follows:
383
  ```
384
- @article{EraX-VL-7B-V1,
385
- title={EraX-VL-7B-V1: A Highly Efficient Multimodal LLM for Vietnamese, especially for medical forms and bills},
386
  author={Nguyễn Anh Nguyên and Nguyễn Hồ Nam (BCG) and Hoàng Tiến Dũng and Phạm Đình Thục and Phạm Huỳnh Nhật},
387
  organization={EraX},
388
  year={2024},
389
- url={https://huggingface.co/erax-ai/EraX-VL-7B-V1},
390
- github={https://github.com/EraX-JS-Company/erax-vl-7b-v1/}
391
  }
392
  ```
393
 
@@ -407,4 +405,4 @@ If you find our project useful, we would appreciate it if you could star our rep
407
 
408
  ## Contact 🤝
409
  - For correspondence regarding this work or inquiry for API trial, please contact Nguyễn Anh Nguyên at [[email protected]]([email protected]).
410
- - Follow us on <b><a href="https://github.com/EraX-JS-Company/erax-vl-7b-v1/" target="_blank">EraX Github</a></b>
 
5
  - en
6
  - zh
7
  base_model:
8
+ - Qwen/Qwen2-VL-2B-Instruct
9
  library_name: transformers
10
  tags:
11
  - erax
12
  - multimodal
13
+ - erax-vl-2B
14
  - insurance
15
  - ocr
16
  - vietnamese
 
28
  </p>
29
 
30
 
31
+ # EraX-VL-2B-V2.0
32
  ## Introduction 🎉
33
 
34
  <!-- <p style="text-align: justify;">
35
+ We are excited to introduce **EraX-VL-2B-V2.0**, a robust multimodal model for **OCR (optical character recognition)** and **VQA (visual question-answering)** that excels in various languages 🌍, with a particular focus on Vietnamese 🇻🇳. The `EraX-VL-2B` model stands out for its precise recognition capabilities across a range of documents 📝, including medical forms 🩺, invoices 🧾, bills of sale 💳, quotes 📄, and medical records 💊. This functionality is expected to be highly beneficial for hospitals 🏥, clinics 💉, insurance companies 🛡️, and other similar applications 📋. Built on the solid foundation of the [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)[1], which we found to be of high quality and fluent in Vietnamese, `EraX-VL-2B` has been fine-tuned to enhance its performance. We plan to continue improving and releasing new versions for free, along with sharing performance benchmarks in the near future.
36
 
37
+ One standing-out feature of **EraX-VL-2B-V2.0** is the capability to do multi-turn Q&A with pretty good reasoning! Thanks for the size of 7+ billions parameters of base model.
38
 
39
+ **EraX-VL-2B-V2.0** is a young member of our **EraX's LànhGPT** collection of LLM models.
40
  </p> -->
41
 
42
+ **WE ARE MOVING to <a href="https://huggingface.co/erax-ai/EraX-VL-2B-V2.0/" target="_blank">EraX-AI</a> repository from 22 December 2024. Follow up so you do not miss great news coming up.**
43
 
44
+ We are excited to introduce **EraX-VL-2B-V2.0**, a robust multimodal model for **OCR (optical character recognition)** and **VQA (visual question-answering)** that excels in various languages 🌍, with a particular focus on Vietnamese 🇻🇳. The `EraX-VL-2B` model stands out for its precise recognition capabilities across a range of documents 📝, including medical forms 🩺, invoices 🧾, bills of sale 💳, quotes 📄, and medical records 💊. This functionality is expected to be highly beneficial for hospitals 🏥, clinics 💉, insurance companies 🛡️, and other similar applications 📋. Built on the solid foundation of the [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)[1], which we found to be of high quality and fluent in Vietnamese, `EraX-VL-2B` has been fine-tuned to enhance its performance. We plan to continue improving and releasing new versions for free, along with sharing performance benchmarks in the near future.
45
 
46
+ One standing-out feature of **EraX-VL-2B-V2.0** is the capability to do multi-turn Q&A with pretty good reasoning! Thanks for the size of 2+ billions parameters of base model.
47
 
48
+ ***NOTA BENE***: EraX-VL-2B-V2.0 is NOT a typical OCR-only tool likes Tesseract but is a Multimodal LLM-based model. To use it effectively, you may have to **twist your prompt carefully** depending on your tasks.
49
 
50
+ **EraX-VL-2B-V2.0** is a young member of our **EraX's LànhGPT** collection of LLM models.
51
 
52
  - **Developed by:**
53
  - Nguyễn Anh Nguyên ([email protected])
54
  - Nguyễn Hồ Nam (BCG)
 
 
55
  - Phạm Đình Thục ([email protected])
56
  - **Funded by:** [Bamboo Capital Group](https://bamboocap.com.vn) and EraX
57
+ - **Model type:** Multimodal Transformer with over 2B parameters
58
  - **Languages (NLP):** Primarily Vietnamese with multilingual capabilities
59
  - **License:** Apache 2.0
60
+ - **Fine-tuned from:** [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)
61
 
62
  ## Benchmarks 📊
63
 
 
271
  pip install flash-attn --no-build-isolation
272
  ```
273
 
274
+ Then you can use `EraX-VL-2B-V2.0` like this:
275
  ```python
276
  import os
277
  import base64
 
285
  from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
286
  from qwen_vl_utils import process_vision_info
287
 
288
+ model_path = "erax/EraX-VL-2B-V2.0"
289
 
290
  model = Qwen2VLForConditionalGeneration.from_pretrained(
291
  model_path,
 
369
  ```
370
 
371
  ## Acknowledgments 👏
372
+ We thank Khang Đoàn ([5CD-AI](https://huggingface.co/5CD-AI)) for his invaluable support in order to train `EraX-VL-2B-V2.0`. Our appreciation also goes to AAA JS Company for their support and resources, which significantly contributed to this project.
373
 
374
  ## Citation 📝
375
+ <!-- - title={EraX-VL-2B-V2.0: A Highly Efficient Multimodal LLM for Vietnamese, especially for medical forms and bills.},
376
  - author={Nguyễn Anh Nguyên and Nguyễn Hồ Nam (BCG) and Dũng Hoàng and Thục Phạm and Nhật Phạm},
377
  - helpers={Khang Đoàn and AAA JS Company},
378
  - contact={[email protected]},
379
  - organization={EraX} -->
380
  If you find our project useful, we would appreciate it if you could star our repository and cite our work as follows:
381
  ```
382
+ @article{EraX-VL-2B-V2.0,
383
+ title={EraX-VL-2B-V2.0: A Highly Efficient Multimodal LLM for Vietnamese, especially for medical forms and bills},
384
  author={Nguyễn Anh Nguyên and Nguyễn Hồ Nam (BCG) and Hoàng Tiến Dũng and Phạm Đình Thục and Phạm Huỳnh Nhật},
385
  organization={EraX},
386
  year={2024},
387
+ url={https://huggingface.co/erax-ai/EraX-VL-2B-V2.0},
388
+ github={https://github.com/EraX-JS-Company/EraX-VL-2B-V2.0/}
389
  }
390
  ```
391
 
 
405
 
406
  ## Contact 🤝
407
  - For correspondence regarding this work or inquiry for API trial, please contact Nguyễn Anh Nguyên at [[email protected]]([email protected]).
408
+ - Follow us on <b><a href="https://github.com/EraX-JS-Company/EraX-VL-2B-V2.0/" target="_blank">EraX Github</a></b>