---
license: apache-2.0
---
# syntheticbot/Qwen-VL-7B-ocr
## Introduction
syntheticbot/Qwen-VL-7B-ocr is a fine-tuned model for Optical Character Recognition (OCR) tasks, derived from the base model [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct). This model is engineered for high accuracy in extracting text from images, including documents and scenes containing text.
This repository provides the instruction-tuned and OCR-optimized 7B Qwen-VL-7B-ocr model. For comprehensive details about the foundational model architecture, please refer to the [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) repository, as well as the [Blog](https://qwenlm.github.io/blog/qwen2.5-vl/) and [GitHub](https://github.com/QwenLM/Qwen2.5-VL) pages for Qwen2.5-VL.
## Evaluation
### OCR Benchmarks

| Benchmark | Qwen2-VL-7B | syntheticbot/Qwen-VL-7B-ocr | Improvement | Notes |
| :--- | :---: | :---: | :---: | :--- |
| DocVQA<sub>test</sub> | 94.5 | **96.5** | +2.0 | Document VQA; OCR accuracy relevant |
| InfoVQA<sub>test</sub> | 76.5 | **84.5** | +8.0 | Information-seeking VQA; OCR accuracy crucial |
| ChartQA<sub>test</sub> | 83.0 | **89.0** | +6.0 | Chart understanding with text; OCR accuracy important |
| TextVQA<sub>val</sub> | 84.3 | **86.3** | +2.0 | Text-based VQA; direct OCR relevance |
| OCRBench | 845 | **885** | +40 | Direct OCR benchmark |
| CC_OCR | 61.6 | **81.8** | +20.2 | Chinese character OCR benchmark |
| MMStar (Text Reading Focus) | 60.7 | **65.9** | +5.2 | MMStar, restricted to text-reading tasks |
| **Average OCR-Related Score** | **77.8** | **84.9** | **+7.1** | Approximate average across OCR-focused benchmarks |
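The per-benchmark improvement column can be checked with a few lines of arithmetic; the scores below are transcribed from the table above (the benchmark labels in the list are shorthand, not official identifiers):

```python
# (benchmark, base Qwen2-VL-7B score, fine-tuned Qwen-VL-7B-ocr score)
scores = [
    ("DocVQA_test", 94.5, 96.5),
    ("InfoVQA_test", 76.5, 84.5),
    ("ChartQA_test", 83.0, 89.0),
    ("TextVQA_val", 84.3, 86.3),
    ("OCRBench", 845, 885),
    ("CC_OCR", 61.6, 81.8),
    ("MMStar_text_reading", 60.7, 65.9),
]

# Rounding absorbs binary floating-point noise (e.g. 81.8 - 61.6).
for name, base, tuned in scores:
    print(f"{name}: {round(tuned - base, 1):+}")
```

Each printed delta matches the Improvement column of the table.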
## Requirements
For optimal performance and access to OCR-specific features, it is recommended to build `transformers` from source:
```
pip install git+https://github.com/huggingface/transformers accelerate
pip install qwen-vl-utils
```
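Once the dependencies are installed, inference follows the base Qwen2.5-VL pattern. The sketch below is illustrative rather than an official snippet: the `Qwen2_5_VLForConditionalGeneration`/`AutoProcessor` classes and the `qwen_vl_utils.process_vision_info` helper come from the base model's documented interface, while the `build_ocr_messages`/`run_ocr` helpers, the image path, and the prompt are placeholders introduced here:

```python
def build_ocr_messages(image_path, prompt="Extract all text from this image."):
    """Chat-format message list accepted by Qwen2.5-VL-style processors."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def run_ocr(image_path, model_id="syntheticbot/Qwen-VL-7B-ocr"):
    # Heavy imports are kept local so build_ocr_messages stays usable
    # without transformers installed.
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
    from qwen_vl_utils import process_vision_info

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    messages = build_ocr_messages(image_path)
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    ).to(model.device)

    generated_ids = model.generate(**inputs, max_new_tokens=512)
    # Strip the prompt tokens, keeping only the newly generated text.
    trimmed = [
        out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)
    ]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]
```

Calling `run_ocr("document.png")` would download the weights and return the extracted text; increase `max_new_tokens` for dense, multi-page documents.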
## Citation
If you utilize syntheticbot/Qwen-VL-7B-ocr, please cite the base Qwen2.5-VL models:
```
@misc{qwen2.5-VL,
  title={Qwen2.5-VL},
  url={https://qwenlm.github.io/blog/qwen2.5-vl/},
  author={Qwen Team},
  month={January},
  year={2025}
}

@article{Qwen-VL,
  title={Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond},
  author={Bai, Jinze and Bai, Shuai and Yang, Shusheng and Wang, Shijie and Tan, Sinan and Wang, Peng and Lin, Junyang and Zhou, Chang and Zhou, Jingren},
  journal={arXiv preprint arXiv:2308.12966},
  year={2023}
}
```