---
title: README
emoji: 🐨
colorFrom: blue
colorTo: red
sdk: static
pinned: false
---

# 🔍 OverseerAI

## Mission

OverseerAI is dedicated to advancing open-source AI safety and content moderation tools. We develop state-of-the-art models and datasets for brand safety classification, making content moderation more accessible and efficient for developers and organizations.

## 🌟 Our Projects

### Datasets

#### [BrandSafe-16k](https://huggingface.co/datasets/OverseerAI/BrandSafe-16k)

A comprehensive dataset for training brand safety classification models, featuring 16 distinct risk categories:

| Category | Description |
|----------|-------------|
| B1-PROFANITY | Explicit language and cursing |
| B2-OFFENSIVE_SLANG | Informal offensive terms |
| B3-COMPETITOR | Competitive brand mentions |
| B4-BRAND_CRITICISM | Negative brand commentary |
| B5-MISLEADING | Deceptive or false information |
| B6-POLITICAL | Political content and discussions |
| B7-RELIGIOUS | Religious themes and references |
| B8-CONTROVERSIAL | Contentious topics |
| B9-ADULT | Adult or mature content |
| B10-VIOLENCE | Violent themes or descriptions |
| B11-SUBSTANCE | Drug and alcohol references |
| B12-HATE | Hate speech and discrimination |
| B13-STEREOTYPE | Stereotypical content |
| B14-BIAS | Biased viewpoints |
| B15-UNPROFESSIONAL | Unprofessional content |
| B16-MANIPULATION | Manipulative content |

### Models

#### [vision-1](https://huggingface.co/OverseerAI/vision-1)

Our flagship model for brand safety classification:

- Architecture: Meta Llama 3.1 (15GB)
- Full precision model optimized for high accuracy
- Trained on BrandSafe-16k dataset
- Ideal for production deployments with high-end GPU resources

#### [vision-1-mini](https://huggingface.co/OverseerAI/vision-1-mini)

A lightweight, optimized version of vision-1:

- Size: 4.58 GiB
- Architecture: Llama 3.1 8B
- Quantization: GGUF V3 (Q4_K)
- Optimized for Apple Silicon
- Fast load time: 3.27s
- Efficient memory usage: 4552.80 MiB CPU / 132.50 MiB Metal
- Perfect for local deployment and smaller compute resources

## 💡 Use Cases

- Content moderation for social media platforms
- Brand safety monitoring for advertising
- User-generated content filtering
- Real-time content classification
- Safe content recommendation systems

## 🤝 Contributing

We welcome contributions from the community! Whether it's:

- Improving model accuracy
- Expanding the dataset
- Optimizing for different hardware
- Adding new classification categories
- Reporting issues or suggesting improvements

## 📫 Contact

- GitHub: [OverseerAI](https://github.com/OverseerAI)
- HuggingFace: [OverseerAI](https://huggingface.co/OverseerAI)

## 📜 License

Our models are released under the Llama 3.1 license, and our datasets are available under open-source licenses to promote accessibility and innovation in AI safety.

---

*OverseerAI - Making AI Safety Accessible and Efficient*
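
## 🧩 Example: Category Lookup

The 16 BrandSafe-16k risk categories listed above can be mirrored in code as a plain lookup table, which is handy when post-processing classifier output. The sketch below is illustrative only: the category codes and descriptions come from the table, but the helper names (`extract_category`, `describe`) and the assumption that a model's text output contains a label such as `B10-VIOLENCE` are hypothetical and not taken from the vision-1 model cards. Check the model cards for the actual output format.

```python
import re
from typing import Optional

# Category codes and descriptions copied from the BrandSafe-16k table.
BRAND_SAFETY_CATEGORIES = {
    "B1-PROFANITY": "Explicit language and cursing",
    "B2-OFFENSIVE_SLANG": "Informal offensive terms",
    "B3-COMPETITOR": "Competitive brand mentions",
    "B4-BRAND_CRITICISM": "Negative brand commentary",
    "B5-MISLEADING": "Deceptive or false information",
    "B6-POLITICAL": "Political content and discussions",
    "B7-RELIGIOUS": "Religious themes and references",
    "B8-CONTROVERSIAL": "Contentious topics",
    "B9-ADULT": "Adult or mature content",
    "B10-VIOLENCE": "Violent themes or descriptions",
    "B11-SUBSTANCE": "Drug and alcohol references",
    "B12-HATE": "Hate speech and discrimination",
    "B13-STEREOTYPE": "Stereotypical content",
    "B14-BIAS": "Biased viewpoints",
    "B15-UNPROFESSIONAL": "Unprofessional content",
    "B16-MANIPULATION": "Manipulative content",
}

# Assumption: the classifier's raw text output contains a label shaped like
# "B<number>-<NAME>". The real output format may differ.
_LABEL_PATTERN = re.compile(r"\bB\d{1,2}-[A-Z_]+\b")


def extract_category(model_output: str) -> Optional[str]:
    """Return the first known category label found in the text, or None."""
    for match in _LABEL_PATTERN.finditer(model_output.upper()):
        label = match.group(0)
        if label in BRAND_SAFETY_CATEGORIES:
            return label
    return None


def describe(label: str) -> str:
    """Return the human-readable description for a category label."""
    return BRAND_SAFETY_CATEGORIES.get(label, "Unknown category")
```

For example, `extract_category("Label: b4-brand_criticism.")` normalizes case and returns `"B4-BRAND_CRITICISM"`, while text without a recognizable label returns `None`, so callers can treat unparseable output explicitly rather than guessing a category.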