---
language:
- en
- de
- fr
- it
- pt
- hi
license: llama3.1
library_name: transformers
pipeline_tag: text-classification
tags:
- facebook
- meta
- pytorch
- llama
- brand-safety
- classification
model-index:
- name: vision-1-mini
  results:
  - task:
      type: text-classification
      name: Brand Safety Classification
    metrics:
    - type: accuracy
      value: 0.95
      name: Classification Accuracy
datasets:
- BrandSafe-16k
metrics:
- accuracy
base_model: meta-llama/Llama-3.1-8B-Instruct
model_size: "4.58 GiB"
parameters: "8.03B"
quantization: "GGUF V3"
architectures:
- LlamaForCausalLM
model_parameters:
  block_count: 32
  context_length: 131072
  embedding_length: 4096
  feed_forward_length: 14336
  attention_heads: 32
  kv_heads: 8
  rope_freq_base: 500000
  vocab_size: 128256
hardware:
  recommended: "Apple Silicon"
  memory:
    cpu_kv_cache: "992.00 MiB"
    metal_kv_cache: "32.00 MiB"
    metal_compute: "560.00 MiB"
    cpu_compute: "560.01 MiB"
inference:
  load_time: "3.27s"
  device: "Metal (Apple M3 Pro)"
  memory_footprint:
    cpu: "4552.80 MiB"
    metal: "132.50 MiB"
---
# vision-1-mini
Vision-1-mini is an 8B-parameter model based on Llama 3.1 8B Instruct and fine-tuned for brand safety classification on our [BrandSafe-16k](https://huggingface.co/datasets/OverseerAI/BrandSafe-16k) dataset. It is quantized to Q4_K, tuned for Apple Silicon devices, and assigns each input one of the BrandSafe-16k categories listed under Classification Categories.
## Model Details
- **Model Type:** Brand Safety Classifier
- **Base Model:** Meta Llama 3.1 8B Instruct
- **Parameters:** 8.03 billion
- **Architecture:** Llama
- **Quantization:** Q4_K
- **Size:** 4.58 GiB (4.89 BPW)
- **License:** Llama 3.1
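
The size and bits-per-weight (BPW) figures above are related by simple arithmetic. The sketch below is only a sanity check, not part of the model's tooling; the function name `quantized_size_gib` is hypothetical, and the result is approximate because the parameter count is rounded.

```python
def quantized_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size in GiB of a model stored at a given average bits per weight."""
    total_bytes = n_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 2**30                    # bytes to GiB


# Figures taken from the Model Details list above.
size = quantized_size_gib(8.03e9, 4.89)
print(f"{size:.2f} GiB")
```

With the rounded inputs, the estimate lands within about 0.01 GiB of the listed 4.58 GiB.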
## Performance Metrics
- **Load Time:** 3.27 seconds (on Apple M3 Pro)
- **Memory Usage:**
  - CPU Buffer: 4552.80 MiB
  - Metal Buffer: 132.50 MiB
  - KV Cache: 1024.00 MiB (512.00 MiB K, 512.00 MiB V)
  - Compute Buffer: 560.00 MiB
## Hardware Compatibility
### Apple Silicon Optimizations
- Optimized for Metal/MPS
- Unified Memory Architecture support
- SIMD group reduction and matrix multiplication optimizations
- Layer offloading (1 of 33 layers offloaded to the GPU in the reported configuration)
### System Requirements
- Recommended Memory: 12GB+
- GPU: Apple Silicon preferred (M1/M2/M3 series)
- Storage: 5GB free space
## Classification Categories
The model classifies content into the following categories:
1. B1-PROFANITY - Contains profane or vulgar language
2. B2-OFFENSIVE_SLANG - Contains offensive slang or derogatory terms
3. B3-COMPETITOR - Mentions or promotes competing brands
4. B4-BRAND_CRITICISM - Contains criticism or negative feedback about brands
5. B5-MISLEADING - Contains misleading or deceptive information
6. B6-POLITICAL - Contains political content or bias
7. B7-RELIGIOUS - Contains religious content or references
8. B8-CONTROVERSIAL - Contains controversial topics or discussions
9. B9-ADULT - Contains adult or mature content
10. B10-VIOLENCE - Contains violent content or references
11. B11-SUBSTANCE - Contains references to drugs, alcohol, or substances
12. B12-HATE - Contains hate speech or discriminatory content
13. B13-STEREOTYPE - Contains stereotypical representations
14. B14-BIAS - Shows bias against groups or individuals
15. B15-UNPROFESSIONAL - Contains unprofessional content or behavior
16. B16-MANIPULATION - Contains manipulative content or tactics
17. SAFE - Contains no brand safety concerns
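
Downstream code usually needs to map the model's text output back onto one of these labels. The following is a minimal sketch that assumes the model emits a label name as plain text; this card does not specify the exact output format, and `parse_label` and `is_flagged` are hypothetical helpers, not part of the model or of `transformers`.

```python
import re
from typing import Optional

# Label names copied from the category list above.
CATEGORIES = [
    "B1-PROFANITY", "B2-OFFENSIVE_SLANG", "B3-COMPETITOR", "B4-BRAND_CRITICISM",
    "B5-MISLEADING", "B6-POLITICAL", "B7-RELIGIOUS", "B8-CONTROVERSIAL",
    "B9-ADULT", "B10-VIOLENCE", "B11-SUBSTANCE", "B12-HATE",
    "B13-STEREOTYPE", "B14-BIAS", "B15-UNPROFESSIONAL", "B16-MANIPULATION",
    "SAFE",
]


def parse_label(raw_output: str) -> Optional[str]:
    """Return the first known label found in the output, or None if there is none."""
    text = raw_output.upper()
    for label in CATEGORIES:
        # Require non-identifier characters on both sides so that,
        # for example, "UNSAFE" is not read as "SAFE".
        pattern = r"(?<![A-Z0-9_])" + re.escape(label) + r"(?![A-Z0-9_])"
        if re.search(pattern, text):
            return label
    return None


def is_flagged(label: Optional[str]) -> bool:
    """Treat anything other than an explicit SAFE label as needing review."""
    return label != "SAFE"
```

Returning `None` for unrecognized output, and treating it as flagged, is a conservative design choice for this sketch: an unparseable response is sent for review rather than silently passed as safe.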
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "maxsonderby/vision-1-mini",
    device_map="auto",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained("maxsonderby/vision-1-mini")

# Example usage
text = "Your text here"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# temperature and top_p only take effect when sampling is enabled
outputs = model.generate(
    **inputs,
    max_new_tokens=1,
    do_sample=True,
    temperature=0.1,
    top_p=0.9,
)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
result = tokenizer.decode(new_tokens, skip_special_tokens=True)
```
## Model Architecture
- **Attention Mechanism:**
  - Head Count: 32
  - KV Head Count: 8
- Layer Count: 32
- Embedding Length: 4096
- Feed Forward Length: 14336
- Context Length: 2048 (reduced from the base model's 131072)
- RoPE Base Frequency: 500000
- Dimension Count: 128
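
These architecture numbers determine how much memory the KV cache needs per token. The sketch below estimates it under two assumptions that this card does not state: keys and values are stored in 16-bit precision, and the "Dimension Count" of 128 is the per-head dimension (4096 / 32). The function name `kv_cache_mib` is hypothetical.

```python
def kv_cache_mib(
    context_length: int,
    n_layers: int = 32,        # Layer Count
    n_kv_heads: int = 8,       # KV Head Count (grouped-query attention)
    head_dim: int = 128,       # assumed per-head dimension: 4096 / 32
    bytes_per_value: int = 2,  # assumed 16-bit cache entries
) -> float:
    """Estimate total KV cache size in MiB for a given context length."""
    # Each token stores one key and one value vector per KV head per layer.
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return bytes_per_token * context_length / 2**20


for ctx in (2048, 8192, 131072):
    print(ctx, kv_cache_mib(ctx), "MiB")
```

Under these assumptions a 2048-token context needs 256 MiB, while the 1024.00 MiB listed under Performance Metrics corresponds to 8192 tokens, so that figure may have been measured with a different context length or cache configuration.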
## Training & Fine-tuning
This model is fine-tuned for brand safety classification on the BrandSafe-16k dataset. It runs with a reduced context window of 2048 tokens and is configured for low-variance, near-deterministic outputs with the following inference settings:
- Temperature: 0.1
- Top-p: 0.9
- Batch Size: 512
- Thread Count: 8
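
To illustrate why a temperature of 0.1 combined with top-p 0.9 gives near-deterministic output, the sketch below applies both steps to a toy distribution. It is a simplified illustration of the general technique, not the sampler this model's runtime uses; the function name `nucleus` and the logits are made up for the example.

```python
import math


def nucleus(logits, temperature, top_p):
    """Return (token_index, probability) pairs kept after temperature scaling and top-p filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative probability reaches top_p.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for index, p in ranked:
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break
    return kept


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
print(nucleus(logits, temperature=1.0, top_p=0.9))
print(nucleus(logits, temperature=0.1, top_p=0.9))
```

At temperature 1.0 all three candidates survive the top-p cut, while at 0.1 only the top-scoring token remains, which is why these settings behave almost like greedy decoding.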
## Limitations
- The model is optimized for shorter content classification (up to 2048 tokens)
- Performance may vary on non-Apple Silicon hardware
- The model focuses solely on brand safety classification and may not be suitable for other tasks
- Classification accuracy may vary based on content complexity and context
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{vision-1-mini,
author = {Max Sonderby},
title = {Vision-1-Mini: Optimized Brand Safety Classification Model},
year = {2024},
publisher = {Hugging Face},
journal = {Hugging Face Model Hub},
howpublished = {\url{https://huggingface.co/maxsonderby/vision-1-mini}}
}
```