Add link to paper page

#30
by nielsr (HF staff) - opened
Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -1,12 +1,11 @@
-
 ---
-license: apache-2.0
 language:
 - en
+library_name: transformers
+license: apache-2.0
 pipeline_tag: image-text-to-text
 tags:
 - multimodal
-library_name: transformers
 ---
 
 # Qwen2.5-VL-7B-Instruct
@@ -16,6 +15,8 @@ library_name: transformers
 
 ## Introduction
 
+This repository contains the model as described in [Qwen2.5-VL Technical Report](https://arxiv.org/abs/2502.13923).
+
 In the past five months since Qwen2-VL’s release, numerous developers have built new models on the Qwen2-VL vision-language models, providing us with valuable feedback. During this period, we focused on building more useful vision-language models. Today, we are excited to introduce the latest addition to the Qwen family: Qwen2.5-VL.
 
 #### Key Enhancements:
@@ -127,7 +128,7 @@ KeyError: 'qwen2_5_vl'
 We offer a toolkit to help you handle various types of visual input more conveniently, as if you were using an API. This includes base64, URLs, and interleaved images and videos. You can install it using the following command:
 
 ```bash
-# It's highly recommanded to use `[decord]` feature for faster video loading.
+# It's highly recommended to use `[decord]` feature for faster video loading.
 pip install qwen-vl-utils[decord]==0.0.8
 ```
 
@@ -138,7 +139,7 @@ If you are not using Linux, you might not be able to install `decord` from PyPI.
 Here we show a code snippet to show you how to use the chat model with `transformers` and `qwen_vl_utils`:
 
 ```python
-from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer, AutoProcessor
+from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
 from qwen_vl_utils import process_vision_info
 
 # default: Load the model on the available device(s)
@@ -384,7 +385,6 @@ print(output_texts)
 ### 🤖 ModelScope
 We strongly advise users especially those in mainland China to use ModelScope. `snapshot_download` can help you solve issues concerning downloading checkpoints.
 
-
 ### More Usage Tips
 
 For input images, we support local files, base64, and URLs. For videos, we currently only support local files.
@@ -525,4 +525,4 @@ If you find our work helpful, feel free to give us a cite.
 journal={arXiv preprint arXiv:2308.12966},
 year={2023}
 }
-```
+```
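The snippet touched by this diff pairs the updated `transformers` import with `qwen_vl_utils.process_vision_info`, which expects chat messages in the documented Qwen2.5-VL format. Below is a minimal, dependency-free sketch of that message structure; the image URL, prompt text, and helper names (`build_messages`, `collect_visuals`) are illustrative placeholders, not part of the PR or the library.

```python
# Sketch of the chat-message format consumed by the processor's
# apply_chat_template and by process_vision_info in the README snippet.
# The URL and prompt are placeholders; the helpers are hypothetical.

def build_messages(image_url: str, prompt: str) -> list[dict]:
    """Assemble one user turn containing an image plus a text query."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_url},
                {"type": "text", "text": prompt},
            ],
        }
    ]

def collect_visuals(messages: list[dict]) -> list[str]:
    """Gather image references, as a vision pre-processing step would."""
    return [
        item["image"]
        for msg in messages
        for item in msg["content"]
        if item["type"] == "image"
    ]

messages = build_messages("https://example.com/demo.jpg", "Describe this image.")
print(collect_visuals(messages))  # ['https://example.com/demo.jpg']
```

In the real snippet, `messages` would be passed to `processor.apply_chat_template(...)` and to `process_vision_info(messages)` before generation.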