---
language:
- en
- de
- fr
- it
- pt
- hi
license: llama3.1
library_name: transformers
pipeline_tag: text-classification
tags:
- facebook
- meta
- pytorch
- llama
- brand-safety
- classification
model-index:
- name: vision-1-mini
  results:
  - task:
      type: text-classification
      name: Brand Safety Classification
    metrics:
    - type: accuracy
      value: 0.95
      name: Classification Accuracy
datasets:
- BrandSafe-16k
metrics:
- accuracy
base_model: meta-llama/Llama-3.1-8B-Instruct
model_size: "4.58 GiB"
parameters: "8.03B"
quantization: "GGUF V3"
architectures:
- LlamaForCausalLM
model_parameters:
  block_count: 32
  context_length: 131072
  embedding_length: 4096
  feed_forward_length: 14336
  attention_heads: 32
  kv_heads: 8
  rope_freq_base: 500000
  vocab_size: 128256
hardware:
  recommended: "Apple Silicon"
  memory:
    cpu_kv_cache: "992.00 MiB"
    metal_kv_cache: "32.00 MiB"
    metal_compute: "560.00 MiB"
    cpu_compute: "560.01 MiB"
inference:
  load_time: "3.27s"
  device: "Metal (Apple M3 Pro)"
  memory_footprint:
    cpu: "4552.80 MiB"
    metal: "132.50 MiB"
---

# vision-1-mini

Vision-1-mini is an optimized 8B-parameter model based on Llama 3.1, fine-tuned for brand safety classification on our [BrandSafe-16k](https://huggingface.co/datasets/OverseerAI/BrandSafe-16k) dataset. The model is optimized for Apple Silicon devices and provides efficient, accurate brand safety assessments using the BrandSafe-16k classification system.

## Model Details

- **Model Type:** Brand Safety Classifier
- **Base Model:** Meta Llama 3.1 8B Instruct
- **Parameters:** 8.03 billion
- **Architecture:** Llama
- **Quantization:** Q4_K (GGUF V3)
- **Size:** 4.58 GiB (4.89 BPW)
- **License:** Llama 3.1

## Performance Metrics

- **Load Time:** 3.27 seconds (on Apple M3 Pro)
- **Memory Usage:**
  - CPU Buffer: 4552.80 MiB
  - Metal Buffer: 132.50 MiB
  - KV Cache: 1024.00 MiB (512.00 MiB K, 512.00 MiB V)
  - Compute Buffer: 560.00 MiB

## Hardware Compatibility

### Apple Silicon Optimizations

- Optimized for Metal/MPS
- Unified Memory Architecture support
- SIMD group reduction and matrix multiplication optimizations
- Efficient layer offloading (1/33 layers to GPU)

### System Requirements

- Recommended Memory: 12GB+
- GPU: Apple Silicon preferred (M1/M2/M3 series)
- Storage: 5GB free space

## Classification Categories

The model classifies content into one of the following categories (a short sketch of handling these labels programmatically follows the list):

1. B1-PROFANITY - Contains profane or vulgar language
2. B2-OFFENSIVE_SLANG - Contains offensive slang or derogatory terms
3. B3-COMPETITOR - Mentions or promotes competing brands
4. B4-BRAND_CRITICISM - Contains criticism or negative feedback about brands
5. B5-MISLEADING - Contains misleading or deceptive information
6. B6-POLITICAL - Contains political content or bias
7. B7-RELIGIOUS - Contains religious content or references
8. B8-CONTROVERSIAL - Contains controversial topics or discussions
9. B9-ADULT - Contains adult or mature content
10. B10-VIOLENCE - Contains violent content or references
11. B11-SUBSTANCE - Contains references to drugs, alcohol, or other substances
12. B12-HATE - Contains hate speech or discriminatory content
13. B13-STEREOTYPE - Contains stereotypical representations
14. B14-BIAS - Shows bias against groups or individuals
15. B15-UNPROFESSIONAL - Contains unprofessional content or behavior
16. B16-MANIPULATION - Contains manipulative content or tactics
17. SAFE - Contains no brand safety concerns
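For downstream filtering it can help to normalize the model's decoded output to one of these codes. The sketch below is illustrative only: `LABELS` and `normalize_label` are hypothetical names rather than part of a released package, and it assumes the prompt has already been stripped from the decoded completion.

```python
# Hypothetical helper for consuming the model's output. The label codes are
# taken from the category list above; the names here are illustrative only.
LABELS = [
    f"B{i}-{name}"
    for i, name in enumerate(
        ["PROFANITY", "OFFENSIVE_SLANG", "COMPETITOR", "BRAND_CRITICISM",
         "MISLEADING", "POLITICAL", "RELIGIOUS", "CONTROVERSIAL", "ADULT",
         "VIOLENCE", "SUBSTANCE", "HATE", "STEREOTYPE", "BIAS",
         "UNPROFESSIONAL", "MANIPULATION"],
        start=1,
    )
] + ["SAFE"]


def normalize_label(raw_output: str) -> str:
    """Map a decoded completion (e.g. 'b4-brand_criticism ...') to a known code."""
    cleaned = raw_output.strip().upper()
    for code in LABELS:
        if cleaned.startswith(code):
            return code
    raise ValueError(f"Unrecognized brand-safety label: {raw_output!r}")
```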
## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "maxsonderby/vision-1-mini",
    device_map="auto",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained("maxsonderby/vision-1-mini")

# Classify a piece of text
text = "Your text here"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1, temperature=0.1, top_p=0.9)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
```

## Model Architecture

- **Attention Mechanism:**
  - Head Count: 32
  - KV Head Count: 8
- Layer Count: 32
- Embedding Length: 4096
- Feed Forward Length: 14336
- Context Length: 2048 (reduced from the native 131072)
- RoPE Base Frequency: 500000
- Dimension Count: 128

## Training & Fine-tuning

This model is fine-tuned for brand safety classification on the BrandSafe-16k dataset. It uses an optimized context window of 2048 tokens and is configured for precise, deterministic outputs with:

- Temperature: 0.1
- Top-p: 0.9
- Batch Size: 512
- Thread Count: 8

## Limitations

- The model is optimized for shorter content classification (up to 2048 tokens)
- Performance may vary on non-Apple Silicon hardware
- The model focuses solely on brand safety classification and may not be suitable for other tasks
- Classification accuracy may vary based on content complexity and context

## Citation

If you use this model in your research, please cite:

```
@misc{vision-1-mini,
  author       = {Max Sonderby},
  title        = {Vision-1-Mini: Optimized Brand Safety Classification Model},
  year         = {2024},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/maxsonderby/vision-1-mini}}
}
```
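As a complement to the transformers example above, the GGUF build described in this card can also be run locally with a llama.cpp-based runtime. The sketch below uses llama-cpp-python with the settings documented here (2048-token context, batch size 512, 8 threads, a single GPU-offloaded layer, temperature 0.1, top-p 0.9); the model path, file name, and prompt are placeholders and not the published artifact names.

```python
from llama_cpp import Llama  # pip install llama-cpp-python (Metal wheel on Apple Silicon)

# Placeholder path: substitute the GGUF file downloaded from the repository.
llm = Llama(
    model_path="path/to/vision-1-mini-Q4_K.gguf",
    n_ctx=2048,      # optimized context window documented above
    n_batch=512,     # batch size listed under Training & Fine-tuning
    n_threads=8,     # thread count listed under Training & Fine-tuning
    n_gpu_layers=1,  # card reports 1/33 layers offloaded to the GPU
)

prompt = "Classify the following text for brand safety: Your text here"
out = llm(prompt, max_tokens=8, temperature=0.1, top_p=0.9)
print(out["choices"][0]["text"].strip())
```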