QvQ-72B-Preview, an open-weight model for visual reasoning, just released by the Alibaba_Qwen team: Qwen/qvq-676448c820912236342b9888
✨ Combines visual understanding & language reasoning
✨ Scores 70.3 on MMMU
✨ Outperforms Qwen2-VL-72B-Instruct in complex problem-solving
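For anyone who wants to poke at it, a minimal inference sketch, assuming QvQ-72B-Preview exposes the standard Qwen2-VL interface in transformers (class name, repo id casing, and prompt format are assumptions, double-check the model card):

```python
# Minimal sketch, assuming QvQ-72B-Preview follows the Qwen2-VL interface in transformers;
# see the model card for the official usage example.
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from PIL import Image

model_id = "Qwen/QVQ-72B-Preview"  # assumed repo id
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("geometry_problem.png")  # hypothetical input image
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Reason step by step, then give the final answer."},
    ]}
]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0])
```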
Megrez-3B-Omni, an on-device multimodal LLM by Infinigence AI, another startup emerging from the Tsinghua University ecosystem.
Model: Infinigence/Megrez-3B-Omni
Demo: Infinigence/Megrez-3B-Omni
✨ Supports analysis of image, text, and audio modalities
✨ Leads in bilingual speech (English & Chinese) input, multi-turn conversations, and voice-based queries
✨ Outperforms on scene understanding and OCR across major benchmarks
LLaMA-O1-PRM and LLaMA-O1-Reinforcement will be released this weekend. We have implemented a novel reinforcement fine-tuning (RFT) pipeline that teaches models reasoning and reward labeling without human annotation.
Reacted to julien-c's post with ❤️ (14 days ago):
After some heated discussion, we clarify our intent regarding storage limits on the Hub
TL;DR:
- public storage is free and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1 TB if you have a paid account, 100 GB otherwise)
We continuously optimize our infrastructure to scale our storage for the coming years of growth in machine learning, to the benefit of the community.
Last week was crazy in open-source AI, with important model and dataset releases every day.
Here are the most important ones I've pinned:
- Cohere released Global-MMLU, a multilingual version of MMLU, to evaluate AI models' world knowledge in many languages!
- Meta released Llama-3.3-70B-Instruct, a 70B model that's on par with Llama-3.1-405B-Instruct, GPT-4o and Claude. Probably my new go-to for agentic workflows (a loading sketch follows this list).
- FishAudio released fish-speech-1.5, a multilingual text-to-speech model
- Microsoft Research released TRELLIS, an extremely impressive image-to-3D model, which you can try here: JeffreyXiang/TRELLIS
- Yesterday, Hugging Face released FineWeb 2, a new version that extends the previous FineWeb to over 1000 languages, including extended coverage of Russian, Mandarin, German, Japanese, Spanish and French: a huge, high-quality dataset of >3 trillion words! HuggingFaceFW/fineweb-2
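On the Llama-3.3 point above, a minimal loading sketch with the standard transformers text-generation pipeline; the repo is gated, and bfloat16 plus device_map="auto" are assumptions about your hardware setup, not requirements of the model:

```python
# Minimal sketch: standard transformers chat-pipeline usage for Llama-3.3-70B-Instruct.
# A 70B model needs multiple GPUs or quantization; adjust dtype/device_map to your setup.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.3-70B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a tool-using assistant."},
    {"role": "user", "content": "Plan the steps to summarize the open PRs in a repo."},
]
out = pipe(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])  # last message is the assistant reply
```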
Now let's go build and make this week as productive as the last one!
Open Preference Dataset for Text-to-Image Generation by the 🤗 Community
Open Image Preferences is an Apache 2.0 licensed dataset for text-to-image generation. It contains 10K text-to-image preference pairs across common image generation categories, covering different model families and varying prompt complexities.
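A minimal sketch of pulling the pairs with the datasets library; the repo id and column names below are assumptions, check the dataset card for the exact ones:

```python
# Minimal sketch: loading the Open Image Preferences pairs.
# The repo id, split, and column names are assumptions; verify them on the dataset card.
from datasets import load_dataset

ds = load_dataset("data-is-better-together/open-image-preferences-v1", split="train")
print(ds)                # inspect the actual schema (prompt, chosen/rejected images, ...)
print(ds[0]["prompt"])   # assumed column name
```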
We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.
FineWeb2 has 8 TB of compressed text data and outperforms other multilingual datasets in our experiments.
The dataset is released under the permissive ODC-By 1.0 license, and the 💻 code to reproduce it and our evaluations is public.
We will very soon announce a big community project, and are working on a blog post walking you through the entire dataset creation process. Stay tuned!
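If you want to poke at the data before the blog post lands, here's a minimal sketch of streaming one language subset with the datasets library; the config name ("rus_Cyrl") and the "text" column are assumptions based on FineWeb conventions, check the dataset card for the exact names:

```python
# Minimal sketch: streaming a single-language subset of FineWeb-2.
# The config name "rus_Cyrl" (Russian, Cyrillic script) and the "text" column are assumptions;
# check HuggingFaceFW/fineweb-2 for the real config list.
from datasets import load_dataset

fw2 = load_dataset(
    "HuggingFaceFW/fineweb-2",
    name="rus_Cyrl",
    split="train",
    streaming=True,   # 8 TB compressed: stream instead of downloading everything
)
for i, doc in enumerate(fw2):
    print(doc["text"][:200])
    if i == 2:
        break
```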
Audio models:
✨ Fish Speech 1.5, text-to-speech in 13 languages, trained on 1M+ hours of audio, by FishAudio: fishaudio/fish-speech-1.5
✨ ClearVoice, an advanced voice processing framework by Alibaba Tongyi SpeechAI: https://huggingface.co/alibabasglab
HunyuanVideo, the new open video generation model by Tencent!
Model: tencent/HunyuanVideo
Collection: zh-ai-community/video-models-666afd86cfa4e4dd1473b64c
✨ 13B parameters: probably the largest open video model to date
✨ Unified architecture for image & video generation
✨ Powered by advanced features: MLLM text encoder, 3D VAE, and prompt rewrite
✨ Delivers stunning visuals, diverse motion, and unparalleled stability
✨ Fully open with code & weights
Zhipu AI, the Chinese generative AI startup behind CogVideo, just launched their first productized AI agent, AutoGLM: https://agent.aminer.cn
With simple text or voice commands, it:
✨ Simulates phone operations effortlessly
✨ Autonomously handles 50+ step tasks
✨ Seamlessly operates across apps
Powered by Zhipu's "Decoupled Interface" and "Self-Evolving Learning Framework" to achieve major performance gains in Phone Use and Web Browser Use!
Meanwhile, GLM4-Edge is now on the Hugging Face Hub: THUDM/glm-edge-6743283c5809de4a7b9e0b8b
Packed with advanced dialogue + multimodal models:
📱 1.5B / 2B models: built for mobile & in-car systems
💻 4B / 5B models: optimized for PCs
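A minimal sketch for trying one of the smaller chat variants with transformers; the repo id "THUDM/glm-edge-1.5b-chat" and the trust_remote_code flag are assumptions, see the collection for the official snippet:

```python
# Minimal sketch: loading a small GLM-Edge chat model with transformers.
# The repo id and trust_remote_code flag are assumptions; check the GLM-Edge collection.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/glm-edge-1.5b-chat"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Give me three tips for in-car voice UX."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))
```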