Daemontatox
/

DocumentLlama

Image-Text-to-Text

text-generation-inference

Inference Endpoints

4-bit precision

Model card Files Files and versions Community

DocumentLlama / README.md

Daemontatox's picture

Update README.md

13056e5 verified 7 days ago

|

history blame contribute delete

2.21 kB

	---
	base_model: unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit
	tags:
	- text-generation-inference
	- transformers
	- unsloth
	- mllama
	license: apache-2.0
	language:
	- en
	---

	![imae](./image.webp)

	# Finetuned Vision Model: unsloth/llama-3.2-11b-vision-instruct

	## Overview

	This model is a finetuned version of `unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit`, optimized for vision-based instruction tasks.
	It was trained 2x faster using [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library, enabling efficient large model adaptation while maintaining precision and accuracy.

	![Unsloth Logo](https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png)

	## Key Features
	- Model Type: Multimodal LLama-based Vision Instruction Model
	- License: Apache-2.0
	- Base Model: unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit
	- Developed by: Daemontatox
	- Language: English

	## Training Details
	- Framework: Hugging Face Transformers + TRL
	- Optimization: Unsloth methodology for accelerated finetuning
	- Quantization: 4-bit model, enabling deployment on resource-constrained devices
	- Dataset: Vision-specific instruction tasks (details to be added by user if public)

	## Performance Metrics
	- Inference Speed: Optimized for low-latency environments
	- Accuracy: Improved on vision-related benchmarks (details TBD based on evaluation)
	- Model Size: Lightweight due to quantization

	## Applications
	- Vision-based interactive AI
	- Instruction-following tasks with multimodal input
	- Resource-constrained deployment (e.g., edge devices)

	## How to Use
	To load and use the model:
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "your_model_repository_name"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_4bit=True)

	# Example usage
	input_text = "Describe the image in detail:"
	inputs = tokenizer(input_text, return_tensors="pt")
	outputs = model.generate(**inputs)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```