---
license: apache-2.0
---
# syntheticbot/Qwen-VL-7B-ocr
## Introduction
syntheticbot/Qwen-VL-7B-ocr is a fine-tuned model for Optical Character Recognition (OCR) tasks, derived from the base model [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct). This model is engineered for high accuracy in extracting text from images, including documents and scenes containing text.
This repository provides the instruction-tuned and OCR-optimized 7B Qwen-VL-7B-ocr model. For comprehensive details about the foundational model architecture, please refer to the [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) repository, as well as the [Blog](https://qwenlm.github.io/blog/qwen2.5-vl/) and [GitHub](https://github.com/QwenLM/Qwen2.5-VL) pages for Qwen2.5-VL.
## Evaluation
### OCR Benchmarks

| Benchmark | Qwen2-VL-7B | syntheticbot/Qwen-VL-7B-ocr | Improvement | Notes |
| :--- | :---: | :---: | :---: | :--- |
| DocVQA<sub>test</sub> | 94.5 | **96.5** | +2.0 | Document VQA; OCR accuracy relevant |
| InfoVQA<sub>test</sub> | 76.5 | **84.5** | +8.0 | Information-seeking VQA; OCR accuracy crucial |
| ChartQA<sub>test</sub> | 83.0 | **89.0** | +6.0 | Chart understanding with text; OCR accuracy important |
| TextVQA<sub>val</sub> | 84.3 | **86.3** | +2.0 | Text-based VQA; direct OCR relevance |
| OCRBench | 845 | **885** | +40 | Direct OCR benchmark |
| CC_OCR | 61.6 | **81.8** | +20.2 | Chinese character OCR benchmark |
| MMStar (Text Reading Focus) | 60.7 | **65.9** | +5.2 | MMStar, restricted to text-reading tasks |
| **Average OCR-Related Score** | **77.8** | **84.9** | **+7.1** | Approximate average across OCR-focused benchmarks |
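The per-benchmark improvement column can be checked with a few lines of arithmetic; the scores below are transcribed from the table above (the benchmark labels in the list are shorthand, not official identifiers):

```python
# (benchmark, base Qwen2-VL-7B score, fine-tuned Qwen-VL-7B-ocr score)
scores = [
    ("DocVQA_test", 94.5, 96.5),
    ("InfoVQA_test", 76.5, 84.5),
    ("ChartQA_test", 83.0, 89.0),
    ("TextVQA_val", 84.3, 86.3),
    ("OCRBench", 845, 885),
    ("CC_OCR", 61.6, 81.8),
    ("MMStar_text_reading", 60.7, 65.9),
]

# Rounding absorbs binary floating-point noise (e.g. 81.8 - 61.6).
for name, base, tuned in scores:
    print(f"{name}: {round(tuned - base, 1):+}")
```

Each printed delta matches the Improvement column of the table.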
## Requirements
For optimal performance and access to OCR-specific features, it is recommended to build `transformers` from source:
```
pip install git+https://github.com/huggingface/transformers accelerate
pip install qwen-vl-utils
```
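Once the dependencies are installed, inference follows the base Qwen2.5-VL pattern. The sketch below is illustrative rather than an official snippet: the `Qwen2_5_VLForConditionalGeneration`/`AutoProcessor` classes and the `qwen_vl_utils.process_vision_info` helper come from the base model's documented interface, while the `build_ocr_messages`/`run_ocr` helpers, the image path, and the prompt are placeholders introduced here:

```python
def build_ocr_messages(image_path, prompt="Extract all text from this image."):
    """Chat-format message list accepted by Qwen2.5-VL-style processors."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def run_ocr(image_path, model_id="syntheticbot/Qwen-VL-7B-ocr"):
    # Heavy imports are kept local so build_ocr_messages stays usable
    # without transformers installed.
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
    from qwen_vl_utils import process_vision_info

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    messages = build_ocr_messages(image_path)
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    ).to(model.device)

    generated_ids = model.generate(**inputs, max_new_tokens=512)
    # Strip the prompt tokens, keeping only the newly generated text.
    trimmed = [
        out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)
    ]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]
```

Calling `run_ocr("document.png")` would download the weights and return the extracted text; increase `max_new_tokens` for dense, multi-page documents.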
## Citation
If you utilize syntheticbot/Qwen-VL-7B-ocr, please cite the base Qwen2.5-VL models:
```
@misc{qwen2.5-VL,
  title={Qwen2.5-VL},
  url={https://qwenlm.github.io/blog/qwen2.5-vl/},
  author={Qwen Team},
  month={January},
  year={2025}
}

@article{Qwen-VL,
  title={Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond},
  author={Bai, Jinze and Bai, Shuai and Yang, Shusheng and Wang, Shijie and Tan, Sinan and Wang, Peng and Lin, Junyang and Zhou, Chang and Zhou, Jingren},
  journal={arXiv preprint arXiv:2308.12966},
  year={2023}
}
```