metadata
language:
  - en
  - de
  - fr
  - it
  - pt
  - hi
license: llama3.1
library_name: transformers
pipeline_tag: text-classification
tags:
  - facebook
  - meta
  - pytorch
  - llama
  - brand-safety
  - classification
model-index:
  - name: vision-1-mini
    results:
      - task:
          type: text-classification
          name: Brand Safety Classification
        metrics:
          - type: accuracy
            value: 0.95
            name: Classification Accuracy
datasets:
  - BrandSafe-16k
metrics:
  - accuracy
base_model: meta-llama/Llama-3.1-8B-Instruct
model_size: 4.58 GiB
parameters: 8.03B
quantization: GGUF V3
architectures:
  - LlamaForCausalLM
model_parameters:
  block_count: 32
  context_length: 131072
  embedding_length: 4096
  feed_forward_length: 14336
  attention_heads: 32
  kv_heads: 8
  rope_freq_base: 500000
  vocab_size: 128256
hardware:
  recommended: Apple Silicon
  memory:
    cpu_kv_cache: 992.00 MiB
    metal_kv_cache: 32.00 MiB
    metal_compute: 560.00 MiB
    cpu_compute: 560.01 MiB
inference:
  load_time: 3.27s
  device: Metal (Apple M3 Pro)
  memory_footprint:
    cpu: 4552.80 MiB
    metal: 132.50 MiB

vision-1-mini

Vision-1-mini is an 8B-parameter model based on Llama 3.1, fine-tuned for brand safety classification on our BrandSafe-16k dataset. It is optimized for Apple Silicon devices and provides efficient brand safety assessments using the 17-category BrandSafe-16k classification scheme.

Model Details

  • Model Type: Brand Safety Classifier
  • Base Model: Meta Llama 3.1 8B Instruct
  • Parameters: 8.03 billion
  • Architecture: Llama
  • Quantization: Q4_K
  • Size: 4.58 GiB (4.89 bits per weight)
  • License: Llama 3.1
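As a quick consistency check, the bits-per-weight figure follows from the file size and the parameter count. This sketch treats 4.58 GiB and 8.03 billion as exact inputs, although both are rounded in the card, so the result lands near (not exactly on) the reported 4.89. The helper name is illustrative only.

```python
# 1 GiB in bytes
GIB = 1024 ** 3


def bits_per_weight(size_gib: float, n_params: float) -> float:
    """Average number of stored bits per model parameter."""
    return size_gib * GIB * 8 / n_params


bpw = bits_per_weight(4.58, 8.03e9)
print(f"{bpw:.2f} bits per weight")
```

With these rounded inputs the result is about 4.90, which agrees with the reported 4.89 to within the rounding of the size and parameter count.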

Performance Metrics

  • Load Time: 3.27 seconds (on Apple M3 Pro)
  • Memory Usage:
    • CPU Buffer: 4552.80 MiB
    • Metal Buffer: 132.50 MiB
    • KV Cache: 1024.00 MiB (512.00 MiB K, 512.00 MiB V)
    • Compute Buffer: 560.00 MiB
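The KV-cache figures can be derived from the architecture: every layer stores one key and one value vector of n_kv_heads × head_dim = 8 × 128 elements per token. The sketch below assumes an f16 cache (2 bytes per element); the helper name is illustrative, and the context sizes are chosen only to compare against the numbers above.

```python
# 1 MiB in bytes
MIB = 1024 ** 2


def kv_cache_mib(n_ctx: int, n_layers: int = 32, n_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """Size of the K cache (the V cache is identical) across n_layers, in MiB.

    Assumes an f16 cache, i.e. 2 bytes per element.
    """
    return n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem / MIB


for n_ctx in (2048, 8192):
    k = kv_cache_mib(n_ctx)
    print(f"n_ctx={n_ctx}: K={k:.2f} MiB, V={k:.2f} MiB, total={2 * k:.2f} MiB")
```

Under these assumptions, 512 MiB each for K and V corresponds to an 8192-token cache, while a 2048-token cache would take 128 MiB each. At 8192 tokens a single layer needs 32 MiB for K and V together, which is consistent with the 32.00 MiB Metal / 992.00 MiB CPU KV split in the metadata (one transformer layer's cache on the GPU, 31 on the CPU).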

Hardware Compatibility

Apple Silicon Optimizations

  • Optimized for Metal/MPS
  • Unified Memory Architecture support
  • SIMD group reduction and matrix multiplication optimizations
  • Partial layer offloading (1 of 33 layers offloaded to the GPU in the reference configuration)

System Requirements

  • Recommended Memory: 12 GB+
  • GPU: Apple Silicon preferred (M1/M2/M3 series)
  • Storage: 5 GB free space

Classification Categories

The model classifies content into the following categories:

  1. B1-PROFANITY - Contains profane or vulgar language
  2. B2-OFFENSIVE_SLANG - Contains offensive slang or derogatory terms
  3. B3-COMPETITOR - Mentions or promotes competing brands
  4. B4-BRAND_CRITICISM - Contains criticism or negative feedback about brands
  5. B5-MISLEADING - Contains misleading or deceptive information
  6. B6-POLITICAL - Contains political content or bias
  7. B7-RELIGIOUS - Contains religious content or references
  8. B8-CONTROVERSIAL - Contains controversial topics or discussions
  9. B9-ADULT - Contains adult or mature content
  10. B10-VIOLENCE - Contains violent content or references
  11. B11-SUBSTANCE - Contains references to drugs, alcohol, or substances
  12. B12-HATE - Contains hate speech or discriminatory content
  13. B13-STEREOTYPE - Contains stereotypical representations
  14. B14-BIAS - Shows bias against groups or individuals
  15. B15-UNPROFESSIONAL - Contains unprofessional content or behavior
  16. B16-MANIPULATION - Contains manipulative content or tactics
  17. SAFE - Contains no brand safety concerns
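Because the model returns its label as generated text, downstream code has to map that text back onto the categories above. A minimal sketch, assuming the output contains one of the labels spelled exactly as listed (this card does not document the output format); the helper name `parse_label` is hypothetical.

```python
import re
from typing import Optional

# Labels and descriptions as listed in this model card (B1-B16 plus SAFE).
CATEGORIES = {
    "B1-PROFANITY": "Contains profane or vulgar language",
    "B2-OFFENSIVE_SLANG": "Contains offensive slang or derogatory terms",
    "B3-COMPETITOR": "Mentions or promotes competing brands",
    "B4-BRAND_CRITICISM": "Contains criticism or negative feedback about brands",
    "B5-MISLEADING": "Contains misleading or deceptive information",
    "B6-POLITICAL": "Contains political content or bias",
    "B7-RELIGIOUS": "Contains religious content or references",
    "B8-CONTROVERSIAL": "Contains controversial topics or discussions",
    "B9-ADULT": "Contains adult or mature content",
    "B10-VIOLENCE": "Contains violent content or references",
    "B11-SUBSTANCE": "Contains references to drugs, alcohol, or substances",
    "B12-HATE": "Contains hate speech or discriminatory content",
    "B13-STEREOTYPE": "Contains stereotypical representations",
    "B14-BIAS": "Shows bias against groups or individuals",
    "B15-UNPROFESSIONAL": "Contains unprofessional content or behavior",
    "B16-MANIPULATION": "Contains manipulative content or tactics",
    "SAFE": "Contains no brand safety concerns",
}

# Candidate labels shaped like "B<n>-NAME" or the literal SAFE (case-sensitive).
_LABEL_RE = re.compile(r"\b(B\d{1,2}-[A-Z_]+|SAFE)\b")


def parse_label(generated: str) -> Optional[str]:
    """Return the first known category label in the generated text, or None."""
    for match in _LABEL_RE.finditer(generated):
        label = match.group(1)
        if label in CATEGORIES:
            return label
    return None


print(parse_label("B12-HATE"))
```

Matching against the known label set, rather than accepting any `B<n>-...` string, means malformed or partial outputs come back as `None` instead of being silently mapped to an invalid category.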

Usage

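import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer (device_map="auto" requires the accelerate package)
model = AutoModelForCausalLM.from_pretrained(
    "maxsonderby/vision-1-mini",
    device_map="auto",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained("maxsonderby/vision-1-mini")

# Example usage
text = "Your text here"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=16,  # labels such as "B12-HATE" span more than one token
    do_sample=True,     # temperature and top_p only take effect when sampling
    temperature=0.1,
    top_p=0.9,
)
# Decode only the newly generated tokens, skipping the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
result = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(result)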

Model Architecture

  • Attention Mechanism:
    • Head Count: 32
    • KV Head Count: 8 (grouped-query attention)
    • RoPE Base Frequency: 500000
    • RoPE Dimension Count: 128 (equal to the per-head dimension, 4096 / 32)
  • Layer Count: 32
  • Embedding Length: 4096
  • Feed Forward Length: 14336
  • Context Length: 2048 (reduced from the base model's 131072)
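The 8.03B parameter count can be recovered from these dimensions. A sketch, assuming the standard Llama layout: no bias terms, untied input and output embeddings, a SwiGLU MLP with gate, up, and down projections, one RMSNorm weight vector before attention and one before the MLP in each layer, plus a final norm. The function name is illustrative.

```python
def llama_param_count(vocab: int = 128256, d_model: int = 4096, n_layers: int = 32,
                      n_heads: int = 32, n_kv_heads: int = 8, d_ff: int = 14336,
                      tied_embeddings: bool = False) -> int:
    """Count parameters of a bias-free Llama-style decoder."""
    head_dim = d_model // n_heads           # 128
    kv_dim = n_kv_heads * head_dim          # 1024 with grouped-query attention
    attn = 2 * d_model * d_model            # q_proj and o_proj
    attn += 2 * d_model * kv_dim            # k_proj and v_proj
    mlp = 3 * d_model * d_ff                # gate, up and down projections
    norms = 2 * d_model                     # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms
    embed = vocab * d_model                 # input token embeddings
    lm_head = 0 if tied_embeddings else vocab * d_model
    return n_layers * per_layer + embed + lm_head + d_model  # + final norm


print(f"{llama_param_count():,}")
```

With the default arguments it returns 8,030,261,248, which matches the 8.03 billion listed in Model Details. The grouped-query attention term is visible here: K and V projections are a quarter the size of Q and O because 8 KV heads serve 32 query heads.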

Training & Fine-tuning

This model is fine-tuned for brand safety classification on the BrandSafe-16k dataset. It uses a reduced context window of 2048 tokens and is configured for low-variance, near-deterministic outputs with the following settings:

  • Temperature: 0.1
  • Top-p: 0.9
  • Batch Size: 512
  • Thread Count: 8
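The two sampling settings interact as follows: logits are divided by the temperature before the softmax, and top-p (nucleus) filtering then keeps the smallest set of most-likely tokens whose combined probability reaches p. The sketch below is a from-scratch illustration of that standard technique, not the exact code path used by transformers or llama.cpp, and the function names are hypothetical. It shows why a temperature of 0.1 behaves almost deterministically: dividing by 0.1 widens the gaps between logits tenfold, so the top token typically takes nearly all of the probability mass.

```python
import math
import random
from typing import Dict, List, Optional


def temperature_softmax(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: List[float], top_p: float) -> Dict[int, float]:
    """Keep the smallest set of top tokens whose mass reaches top_p, renormalized."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(logits: List[float], temperature: float = 0.1, top_p: float = 0.9,
           rng: Optional[random.Random] = None) -> int:
    """Draw one token index using temperature scaling followed by top-p filtering."""
    rng = rng or random.Random()
    dist = top_p_filter(temperature_softmax(logits, temperature), top_p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


print(top_p_filter(temperature_softmax([2.0, 1.0, 0.5], 0.1), 0.9))
print(top_p_filter(temperature_softmax([2.0, 1.0, 0.5], 1.0), 0.9))
```

With the example logits [2.0, 1.0, 0.5], temperature 0.1 leaves only the top token after top-p filtering, while temperature 1.0 keeps all three candidates.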

Limitations

  • The model is optimized for classifying shorter content (up to 2048 tokens)
  • Performance may vary on non-Apple Silicon hardware
  • The model focuses solely on brand safety classification and may not be suitable for other tasks
  • Classification accuracy may vary based on content complexity and context

Citation

If you use this model in your research, please cite:

@misc{vision-1-mini,
  author = {Max Sonderby},
  title = {Vision-1-Mini: Optimized Brand Safety Classification Model},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/maxsonderby/vision-1-mini}}
}