Commit 6a07280 (verified), committed by syntheticbot · 1 parent: dfc7fce

Update README.md

Files changed (1)
  1. README.md +4 -20
README.md CHANGED
@@ -2,9 +2,11 @@
  license: apache-2.0
  ---

+
  # syntheticbot/Qwen-VL-7B-ocr


+
  ## Introduction

  syntheticbot/Qwen-VL-7B-ocr is a fine-tuned model for Optical Character Recognition (OCR) tasks, derived from the base model [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct). This model is engineered for high accuracy in extracting text from images, including documents and scenes containing text.
@@ -30,22 +32,6 @@ syntheticbot/Qwen-VL-7B-ocr is a fine-tuned model for Optical Character Recognit
  This repository provides the instruction-tuned and OCR-optimized 7B Qwen-VL-7B-ocr model. For comprehensive details about the foundational model architecture, please refer to the [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) repository, as well as the [Blog](https://qwenlm.github.io/blog/qwen2.5-vl/) and [GitHub](https://github.com/QwenLM/Qwen2.5-VL) pages for Qwen2.5-VL.


- ## Evaluation
-
- ### OCR Benchmarks
-
- | Benchmark | Qwen2-VL-7B |syntheticbot/Qwen-VL-7B-ocr | Improvement | Notes |
- | :--- | :---: | :---: | :---: | :---: |
- | DocVQA<sub>test</sub> | 94.5 | **96.5** | +2.0 | Document VQA, OCR accuracy relevant |
- | InfoVQA<sub>test</sub> | 76.5 | **84.5** | +8.0 | Information seeking VQA, OCR accuracy crucial |
- | ChartQA<sub>test</sub> | 83.0 | **89.0** | +6.0 | Chart understanding with text, OCR accuracy important |
- | TextVQA<sub>val</sub> | 84.3 | **86.3** | +2.0 | Text-based VQA, direct OCR relevance |
- | OCRBench | 845 | **885** | +40 | Direct OCR benchmark |
- | CC_OCR | 61.6 | **81.8**| +20.2 | Chinese Character OCR benchmark |
- | MMStar (Text Reading Focus) | 60.7| **65.9**| +5.2 | MMStar with focus on text reading tasks |
- | **Average OCR-Related Score** | **77.8** | **84.9** | **+7.1** | Approximate average across OCR-focused benchmarks |
-
-
  ## Requirements
  For optimal performance and access to OCR-specific features, it is recommended to build from source:
  ```
@@ -272,9 +258,7 @@ messages = [
  ```


- ## Citation
-
- If you utilize syntheticbot/Qwen-VL-7B-ocr, please cite the base Qwen2.5-VL models:
+ ## Citation from base model

  ```
  @misc{qwen2.5-VL,
@@ -298,4 +282,4 @@ If you utilize syntheticbot/Qwen-VL-7B-ocr, please cite the base Qwen2.5-VL mode
  journal={arXiv preprint arXiv:2308.12966},
  year={2023}
  }
- ```
+ ```