	---
	language:
	- en
	- de
	- fr
	- it
	- pt
	- hi
	license: llama3.1
	library_name: transformers
	pipeline_tag: text-classification
	tags:
	- facebook
	- meta
	- pytorch
	- llama
	- brand-safety
	- classification
	model-index:
	- name: vision-1-mini
	  results:
	  - task:
	      type: text-classification
	      name: Brand Safety Classification
	    metrics:
	    - type: accuracy
	      value: 0.95
	      name: Classification Accuracy
	datasets:
	- BrandSafe-16k
	metrics:
	- accuracy
	base_model: meta-llama/Llama-3.1-8B-Instruct
	model_size: "4.58 GiB"
	parameters: "8.03B"
	quantization: "GGUF V3"
	architectures:
	- LlamaForCausalLM
	model_parameters:
	  block_count: 32
	  context_length: 131072
	  embedding_length: 4096
	  feed_forward_length: 14336
	  attention_heads: 32
	  kv_heads: 8
	  rope_freq_base: 500000
	  vocab_size: 128256
	hardware:
	  recommended: "Apple Silicon"
	  memory:
	    cpu_kv_cache: "992.00 MiB"
	    metal_kv_cache: "32.00 MiB"
	    metal_compute: "560.00 MiB"
	    cpu_compute: "560.01 MiB"
	inference:
	  load_time: "3.27s"
	  device: "Metal (Apple M3 Pro)"
	  memory_footprint:
	    cpu: "4552.80 MiB"
	    metal: "132.50 MiB"
	---
	# vision-1-mini

	Vision-1-mini is an 8B-parameter model based on Llama 3.1 and fine-tuned for brand safety classification on our [BrandSafe-16k](https://huggingface.co/datasets/OverseerAI/BrandSafe-16k) dataset. It is optimized for Apple Silicon devices and labels text according to the BrandSafe-16k classification system.

	## Model Details

	- Model Type: Brand Safety Classifier
	- Base Model: Meta Llama 3.1 8B Instruct
	- Parameters: 8.03 billion
	- Architecture: Llama
	- Quantization: Q4_K
	- Size: 4.58 GiB (4.89 BPW)
	- License: Llama 3.1
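
The bits-per-weight (BPW) figure follows from the file size and the parameter count. The sketch below redoes that arithmetic with the rounded values from this card, so the result is only approximate; the function name is illustrative and not part of any library.

```python
# Rounded values taken from the Model Details list
SIZE_GIB = 4.58
PARAMETERS = 8.03e9


def bits_per_weight(size_gib: float, parameters: float) -> float:
    """Average number of bits stored per parameter."""
    size_bits = size_gib * 1024**3 * 8  # GiB -> bytes -> bits
    return size_bits / parameters


bpw = bits_per_weight(SIZE_GIB, PARAMETERS)
print(f"{bpw:.2f} bits per weight")
```

With these rounded inputs the result is about 4.9, which agrees with the reported 4.89 BPW up to rounding of the size and parameter count.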

	## Performance Metrics

	- Load Time: 3.27 seconds (on Apple M3 Pro)
	- Memory Usage:
	  - CPU Buffer: 4552.80 MiB
	  - Metal Buffer: 132.50 MiB
	  - KV Cache: 1024.00 MiB (512.00 MiB K, 512.00 MiB V)
	  - Compute Buffer: 560.00 MiB
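
To relate these buffers to an overall memory budget, the figures can simply be added up. This is a rough sketch that assumes the buffers are additive and that the compute buffer is counted once; neither assumption is stated in this card.

```python
# Buffer sizes in MiB, copied from the Performance Metrics list
BUFFERS_MIB = {
    "cpu_buffer": 4552.80,
    "metal_buffer": 132.50,
    "kv_cache": 1024.00,
    "compute_buffer": 560.00,
}


def total_gib(buffers_mib: dict) -> float:
    """Sum all buffers and convert MiB to GiB."""
    return sum(buffers_mib.values()) / 1024


# The CPU and Metal buffers together hold the quantized weights
weights_gib = (BUFFERS_MIB["cpu_buffer"] + BUFFERS_MIB["metal_buffer"]) / 1024

print(f"weights: {weights_gib:.2f} GiB")
print(f"total:   {total_gib(BUFFERS_MIB):.2f} GiB")
```

The two weight buffers sum to roughly the 4.58 GiB model size, and the overall total stays well below the 12GB recommended in the system requirements.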

	## Hardware Compatibility

	### Apple Silicon Optimizations
	- Optimized for Metal/MPS
	- Unified Memory Architecture support
	- SIMD group reduction and matrix multiplication optimizations
	- Efficient layer offloading (1/33 layers to GPU)

	### System Requirements
	- Recommended Memory: 12GB+
	- GPU: Apple Silicon preferred (M1/M2/M3 series)
	- Storage: 5GB free space
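
When running through PyTorch, the preference for Apple Silicon can be expressed as a simple device fallback. The helper below is a hypothetical illustration, not part of this model's code; it takes the availability flags as arguments so it can be read and tested without PyTorch installed.

```python
def pick_device(mps_available: bool, cuda_available: bool) -> str:
    """Return a PyTorch device string, preferring Apple's Metal backend."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


# With PyTorch installed, the flags would come from
# torch.backends.mps.is_available() and torch.cuda.is_available().
print(pick_device(mps_available=False, cuda_available=False))
```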

	## Classification Categories

	The model classifies content into the following categories:
	1. B1-PROFANITY - Contains profane or vulgar language
	2. B2-OFFENSIVE_SLANG - Contains offensive slang or derogatory terms
	3. B3-COMPETITOR - Mentions or promotes competing brands
	4. B4-BRAND_CRITICISM - Contains criticism or negative feedback about brands
	5. B5-MISLEADING - Contains misleading or deceptive information
	6. B6-POLITICAL - Contains political content or bias
	7. B7-RELIGIOUS - Contains religious content or references
	8. B8-CONTROVERSIAL - Contains controversial topics or discussions
	9. B9-ADULT - Contains adult or mature content
	10. B10-VIOLENCE - Contains violent content or references
	11. B11-SUBSTANCE - Contains references to drugs, alcohol, or substances
	12. B12-HATE - Contains hate speech or discriminatory content
	13. B13-STEREOTYPE - Contains stereotypical representations
	14. B14-BIAS - Shows bias against groups or individuals
	15. B15-UNPROFESSIONAL - Contains unprofessional content or behavior
	16. B16-MANIPULATION - Contains manipulative content or tactics
	17. SAFE - Contains no brand safety concerns
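
Downstream code usually needs to map the model's raw text output back to one of these labels. The sketch below assumes the label appears somewhere in the generated text; this card does not specify the exact output format, and `parse_label` is a hypothetical helper.

```python
import re
from typing import Optional

# Label names copied from the category list above
CATEGORIES = [
    "B1-PROFANITY", "B2-OFFENSIVE_SLANG", "B3-COMPETITOR",
    "B4-BRAND_CRITICISM", "B5-MISLEADING", "B6-POLITICAL",
    "B7-RELIGIOUS", "B8-CONTROVERSIAL", "B9-ADULT", "B10-VIOLENCE",
    "B11-SUBSTANCE", "B12-HATE", "B13-STEREOTYPE", "B14-BIAS",
    "B15-UNPROFESSIONAL", "B16-MANIPULATION", "SAFE",
]

# Word boundaries keep "SAFE" from matching inside "UNSAFE" or "SAFETY"
_LABEL_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(c) for c in CATEGORIES) + r")\b"
)


def parse_label(raw_output: str) -> Optional[str]:
    """Return the first known label found in the text, or None."""
    match = _LABEL_PATTERN.search(raw_output.upper())
    return match.group(0) if match else None


print(parse_label("Label: b10-violence"))
print(parse_label("no recognizable label"))
```

Returning `None` for unrecognized output lets the caller decide whether to retry, log the sample, or fall back to a default.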

	## Usage

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# Load the model in half precision and let Accelerate place the weights
	model = AutoModelForCausalLM.from_pretrained(
	    "maxsonderby/vision-1-mini",
	    device_map="auto",
	    torch_dtype=torch.float16,
	    low_cpu_mem_usage=True,
	)
	tokenizer = AutoTokenizer.from_pretrained("maxsonderby/vision-1-mini")

	# Example usage
	text = "Your text here"
	inputs = tokenizer(text, return_tensors="pt").to(model.device)
	outputs = model.generate(
	    **inputs,
	    max_new_tokens=1,
	    do_sample=True,  # required for temperature and top_p to take effect
	    temperature=0.1,
	    top_p=0.9,
	)
	# Decode only the newly generated tokens, not the echoed prompt
	new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
	result = tokenizer.decode(new_tokens, skip_special_tokens=True)
	```

	## Model Architecture

	- Attention Mechanism:
	  - Head Count: 32
	  - KV Head Count: 8
	- Layer Count: 32
	- Embedding Length: 4096
	- Feed Forward Length: 14336
	- Context Length: 2048 (reduced from the base model's 131072)
	- RoPE Base Frequency: 500000
	- Dimension Count: 128
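
These numbers determine how much memory the KV cache needs per token. The sketch below works through that arithmetic, assuming 16-bit cache entries and a head dimension equal to the listed Dimension Count (which matches 4096 / 32); the cache precision is an assumption and is not stated in this card.

```python
# Architecture values from the list above
LAYERS = 32
KV_HEADS = 8
HEAD_DIM = 128  # embedding length 4096 / 32 attention heads


def kv_cache_mib(context_length: int, bytes_per_value: int = 2) -> float:
    """Combined size of the K and V caches in MiB.

    bytes_per_value=2 assumes 16-bit cache entries (an assumption).
    """
    values_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM  # K and V
    return context_length * values_per_token * bytes_per_value / 1024**2


print(kv_cache_mib(2048))
print(kv_cache_mib(8192))
```

Because only the 8 KV heads are cached rather than all 32 attention heads, the cache is a quarter of what full multi-head attention would need. Under the 16-bit assumption, a 1024 MiB cache corresponds to 8192 cached tokens; the card does not say which context length or cache precision the reported measurement used.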

	## Training & Fine-tuning

	This model is fine-tuned for brand safety classification on the BrandSafe-16k dataset. It uses a reduced context window of 2048 tokens and is configured for precise, near-deterministic outputs with the following settings:
	- Temperature: 0.1
	- Top-p: 0.9
	- Batch Size: 512
	- Thread Count: 8
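
To see why a temperature of 0.1 with top-p 0.9 behaves almost deterministically, it helps to apply both steps to a small set of logits. This is a textbook sketch of temperature scaling followed by nucleus (top-p) filtering, not the code used by this model or by any particular runtime; the logit values are made up for illustration.

```python
import math


def nucleus(logits, temperature=0.1, top_p=0.9):
    """Return (token_index, probability) pairs kept by top-p filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    ranked = sorted(((w / total, i) for i, w in enumerate(weights)), reverse=True)

    kept, cumulative = [], 0.0
    for prob, index in ranked:
        kept.append((index, prob))
        cumulative += prob
        if cumulative >= top_p:  # smallest set reaching the top-p mass
            break
    return kept


example_logits = [2.0, 1.0, 0.0]  # illustrative values only
print([i for i, _ in nucleus(example_logits)])
print([i for i, _ in nucleus(example_logits, temperature=1.0)])
```

At temperature 0.1 the top token alone exceeds the 0.9 probability mass, so it is the only candidate left to sample from. At temperature 1.0 the same logits leave two candidates.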

	## Limitations

	- The model is optimized for shorter content classification (up to 2048 tokens)
	- Performance may vary on non-Apple Silicon hardware
	- The model focuses solely on brand safety classification and may not be suitable for other tasks
	- Classification accuracy may vary based on content complexity and context

	## Citation

	If you use this model in your research, please cite:
	```
	@misc{vision-1-mini,
	author = {Max Sonderby},
	title = {Vision-1-Mini: Optimized Brand Safety Classification Model},
	year = {2024},
	publisher = {Hugging Face},
	journal = {Hugging Face Model Hub},
	howpublished = {\url{https://huggingface.co/maxsonderby/vision-1-mini}}
	}
	```