Update README.md
README.md
CHANGED
@@ -31,13 +31,13 @@ widget:
|
|
31 |
# EraX-VL-2B-V1.5
|
32 |
## Introduction
|
33 |
|
34 |
-
We are excited to introduce **EraX-VL-2B-V1.5**, a robust multimodal model for **OCR (optical character recognition)** and **VQA (visual question-answering)** that excels in various languages, with a particular focus on Vietnamese
|
35 |
|
36 |
-
One standing-out feature of **EraX-VL-2B-V1.5** is the capability to do multi-turn Q&A with reasonable reasoning capability
|
37 |
|
38 |
***NOTA BENE***: EraX-VL-2B-V1.5 is NOT a typical OCR-only tool like Tesseract but is a Multimodal LLM-based model. To use it effectively, you may have to **tweak your prompt carefully** depending on your tasks.
|
39 |
|
40 |
-
**EraX-VL-2B-V1.5** is a young member of our **EraX's LànhGPT** collection of LLM models.
|
41 |
|
42 |
- **Model type:** Multimodal Transformer with over 2B parameters
|
43 |
- **Languages (NLP):** Primarily Vietnamese with multilingual capabilities
|
@@ -56,7 +56,7 @@ One standing-out feature of **EraX-VL-2B-V1.5** is the capability to do multi-tu
|
|
56 |
</tr>
|
57 |
<tr>
|
58 |
<th align="middle">EraX-VL-7B-V1.5</th>
|
59 |
-
<td align="middle"
|
60 |
<td align="middle">47.2 </td>
|
61 |
</tr>
|
62 |
<tr>
|
|
|
31 |
# EraX-VL-2B-V1.5
|
32 |
## Introduction
|
33 |
|
34 |
+
We are excited to introduce **EraX-VL-2B-V1.5**, a robust multimodal model for **OCR (optical character recognition)** and **VQA (visual question-answering)** that excels in various languages, with a particular focus on **Vietnamese 🇻🇳**. The `EraX-VL-2B` model stands out for its precise recognition capabilities across a range of documents, including medical forms 🩺, invoices 🧾, bills of sale 💳, quotes, and medical records. This functionality is expected to be highly beneficial for hospitals 🏥, clinics, insurance companies 🛡️, and other similar applications. Built on the solid foundation of the [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)[1], which we found to be of high quality and fluent in Vietnamese, `EraX-VL-2B` has been fine-tuned to enhance its performance. We plan to continue improving and releasing new versions for free, along with sharing performance benchmarks in the near future.
|
35 |
|
36 |
+
One standout feature of **EraX-VL-2B-V1.5** is its ability to handle multi-turn Q&A with reasonable reasoning capability despite its small size of just over 2 billion parameters.
|
37 |
|
38 |
***NOTA BENE***: EraX-VL-2B-V1.5 is NOT a typical OCR-only tool like Tesseract but is a Multimodal LLM-based model. To use it effectively, you may have to **tweak your prompt carefully** depending on your tasks.
|
39 |
|
40 |
+
**EraX-VL-2B-V1.5** is a young and tiny member of our **EraX's LànhGPT** collection of LLM models.
|
41 |
|
42 |
- **Model type:** Multimodal Transformer with over 2B parameters
|
43 |
- **Languages (NLP):** Primarily Vietnamese with multilingual capabilities
|
|
|
56 |
</tr>
|
57 |
<tr>
|
58 |
<th align="middle">EraX-VL-7B-V1.5</th>
|
59 |
+
<td align="middle">(soon)</td>
|
60 |
<td align="middle">47.2 </td>
|
61 |
</tr>
|
62 |
<tr>
|