Triangle104/Qwen2-VL-7B-Instruct-Q5_K_S-GGUF

This model was converted to GGUF format from Qwen/Qwen2-VL-7B-Instruct using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.


Model details:

We're excited to unveil Qwen2-VL, the latest iteration of our Qwen-VL model, representing nearly a year of innovation.

What’s New in Qwen2-VL?

Key Enhancements:

SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.

Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.

Agent that can operate your mobile phone, robot, etc.: with its complex reasoning and decision-making abilities, Qwen2-VL can be integrated with devices such as mobile phones and robots for automatic operation based on the visual environment and text instructions.

Multilingual Support: to serve global users, Qwen2-VL now understands text inside images in languages beyond English and Chinese, including most European languages, Japanese, Korean, Arabic, Vietnamese, and more.

Model Architecture Updates:

Naive Dynamic Resolution: Unlike before, Qwen2-VL can handle arbitrary image resolutions, mapping them into a dynamic number of visual tokens and offering a more human-like visual processing experience (a rough token-count sketch follows this list).

Multimodal Rotary Position Embedding (M-ROPE): Decomposes the positional embedding into parts that capture 1D textual, 2D visual, and 3D video positional information, enhancing the model's multimodal processing capabilities (see the toy illustration after this list).
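
For the dynamic-resolution scheme above, a rough back-of-the-envelope for the visual token count: assuming the 14-pixel ViT patches and 2x2 token merging described in the Qwen2-VL report, one LLM token covers roughly a 28x28 pixel region. A minimal sketch (illustrative only, not the model's actual preprocessing code):

# Rough estimate of visual token count under naive dynamic resolution.
# Assumes 14px ViT patches merged 2x2 into one token (per the Qwen2-VL
# report), so each token covers about a 28x28 pixel region.
def approx_visual_tokens(height: int, width: int, region: int = 28) -> int:
    # Round each side up to whole token regions, then multiply.
    return -(-height // region) * -(-width // region)

for h, w in [(224, 224), (448, 448), (768, 1024)]:
    print(f"{w}x{h} image -> ~{approx_visual_tokens(h, w)} visual tokens")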

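And for M-ROPE, a toy illustration of how the three position components might be laid out per token: text tokens advance all three indices together (reducing to ordinary 1D RoPE), while visual tokens get a temporal index per frame and spatial indices per patch. This is a simplified sketch based on the paper's description, not the actual implementation:

# Toy M-ROPE-style position IDs: one (temporal, height, width) triple per
# token. Simplified from the Qwen2-VL paper's description.
def mrope_position_ids(n_text_tokens, grid_t, grid_h, grid_w):
    positions = []
    # Text tokens: all three components advance together (plain 1D RoPE).
    for i in range(n_text_tokens):
        positions.append((i, i, i))
    # Visual tokens: temporal index per frame, spatial indices per patch,
    # offset so they start after the preceding text span.
    base = n_text_tokens
    for t in range(grid_t):
        for h in range(grid_h):
            for w in range(grid_w):
                positions.append((base + t, base + h, base + w))
    return positions

# A 4-token text prompt followed by a single-frame 2x2 patch grid:
for p in mrope_position_ids(4, 1, 2, 2):
    print(p)
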
We release models in three sizes, with 2, 7, and 72 billion parameters. This repo contains the instruction-tuned 7B Qwen2-VL model. For more information, visit our Blog and GitHub.


Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux):

brew install llama.cpp

Invoke the llama.cpp server or the CLI.

CLI:

llama-cli --hf-repo Triangle104/Qwen2-VL-7B-Instruct-Q5_K_S-GGUF --hf-file qwen2-vl-7b-instruct-q5_k_s.gguf -p "The meaning to life and the universe is"

Server:

llama-server --hf-repo Triangle104/Qwen2-VL-7B-Instruct-Q5_K_S-GGUF --hf-file qwen2-vl-7b-instruct-q5_k_s.gguf -c 2048
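
Once the server is running (it listens on port 8080 by default), you can query its OpenAI-compatible chat endpoint over HTTP. A minimal Python sketch (endpoint path and default port per the llama.cpp server docs; adjust if you changed them):

# Minimal sketch: query a running llama-server via its OpenAI-compatible
# /v1/chat/completions endpoint (default port 8080).
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "What does GGUF quantization do?"}
        ],
        "max_tokens": 128,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])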

Note: You can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.

git clone https://github.com/ggerganov/llama.cpp

Step 2: Move into the llama.cpp folder and build it with the LLAMA_CURL=1 flag, along with any hardware-specific flags (for example, LLAMA_CUDA=1 for Nvidia GPUs on Linux).

cd llama.cpp && LLAMA_CURL=1 make

Step 3: Run inference through the main binary.

./llama-cli --hf-repo Triangle104/Qwen2-VL-7B-Instruct-Q5_K_S-GGUF --hf-file qwen2-vl-7b-instruct-q5_k_s.gguf -p "The meaning to life and the universe is"

or

./llama-server --hf-repo Triangle104/Qwen2-VL-7B-Instruct-Q5_K_S-GGUF --hf-file qwen2-vl-7b-instruct-q5_k_s.gguf -c 2048
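
Alternatively, if you prefer staying in Python, the third-party llama-cpp-python bindings can pull this checkpoint straight from the Hub. A minimal sketch, assuming llama-cpp-python and huggingface-hub are installed (pip install llama-cpp-python huggingface-hub); this is an extra option, not one of the steps above:

# Minimal sketch using llama-cpp-python's from_pretrained helper to download
# the GGUF from the Hub and run a short completion.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Triangle104/Qwen2-VL-7B-Instruct-Q5_K_S-GGUF",
    filename="qwen2-vl-7b-instruct-q5_k_s.gguf",
    n_ctx=2048,  # context size, matching the server example above
)

out = llm("The meaning to life and the universe is", max_tokens=64)
print(out["choices"][0]["text"])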