|
--- |
|
base_model: unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- unsloth |
|
- mllama |
|
license: apache-2.0 |
|
language: |
|
- en |
|
--- |
|
|
|
![imae](./image.webp) |
|
|
|
# Finetuned Vision Model: unsloth/llama-3.2-11b-vision-instruct |
|
|
|
## Overview |
|
|
|
This model is a finetuned version of `unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit`, optimized for vision-based instruction tasks. |
|
It was trained 2x faster using [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library, enabling efficient large model adaptation while maintaining precision and accuracy. |
|
|
|
![Unsloth Logo](https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png) |
|
|
|
## Key Features |
|
- **Model Type**: Multimodal LLama-based Vision Instruction Model |
|
- **License**: Apache-2.0 |
|
- **Base Model**: unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit |
|
- **Developed by**: Daemontatox |
|
- **Language**: English |
|
|
|
## Training Details |
|
- **Framework**: Hugging Face Transformers + TRL |
|
- **Optimization**: Unsloth methodology for accelerated finetuning |
|
- **Quantization**: 4-bit model, enabling deployment on resource-constrained devices |
|
- **Dataset**: Vision-specific instruction tasks (details to be added by user if public) |
|
|
|
## Performance Metrics |
|
- **Inference Speed**: Optimized for low-latency environments |
|
- **Accuracy**: Improved on vision-related benchmarks (details TBD based on evaluation) |
|
- **Model Size**: Lightweight due to quantization |
|
|
|
## Applications |
|
- Vision-based interactive AI |
|
- Instruction-following tasks with multimodal input |
|
- Resource-constrained deployment (e.g., edge devices) |
|
|
|
## How to Use |
|
To load and use the model: |
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
model_name = "your_model_repository_name" |
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_4bit=True) |
|
|
|
# Example usage |
|
input_text = "Describe the image in detail:" |
|
inputs = tokenizer(input_text, return_tensors="pt") |
|
outputs = model.generate(**inputs) |
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
``` |
|
|