Daemontatox
/

DocumentLlama

Image-Text-to-Text

text-generation-inference

Inference Endpoints

4-bit precision

Model card Files Files and versions Community

Daemontatox commited on 20 days ago

Commit

13056e5

·

verified ·

1 Parent(s): cac6a3f

Update README.md

Files changed (1) hide show

README.md +46 -6

README.md CHANGED Viewed

@@ -10,12 +10,52 @@ language:
 - en
 ---
-# Uploaded finetuned  model
-- **Developed by:** Daemontatox
-- **License:** apache-2.0
-- **Finetuned from model :** unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit
-This mllama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

 - en
 ---
+![imae](./image.webp)
+# Finetuned Vision Model: unsloth/llama-3.2-11b-vision-instruct
+## Overview
+This model is a finetuned version of `unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit`, optimized for vision-based instruction tasks.
+It was trained 2x faster using [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library, enabling efficient large model adaptation while maintaining precision and accuracy.
+![Unsloth Logo](https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png)
+## Key Features
+- **Model Type**: Multimodal LLama-based Vision Instruction Model
+- **License**: Apache-2.0
+- **Base Model**: unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit
+- **Developed by**: Daemontatox
+- **Language**: English
+## Training Details
+- **Framework**: Hugging Face Transformers + TRL
+- **Optimization**: Unsloth methodology for accelerated finetuning
+- **Quantization**: 4-bit model, enabling deployment on resource-constrained devices
+- **Dataset**: Vision-specific instruction tasks (details to be added by user if public)
+## Performance Metrics
+- **Inference Speed**: Optimized for low-latency environments
+- **Accuracy**: Improved on vision-related benchmarks (details TBD based on evaluation)
+- **Model Size**: Lightweight due to quantization
+## Applications
+- Vision-based interactive AI
+- Instruction-following tasks with multimodal input
+- Resource-constrained deployment (e.g., edge devices)
+## How to Use
+To load and use the model:
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "your_model_repository_name"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_4bit=True)
+# Example usage
+input_text = "Describe the image in detail:"
+inputs = tokenizer(input_text, return_tensors="pt")
+outputs = model.generate(**inputs)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```