SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 6 days ago • 115
Running 1.64k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
Phantom: Subject-consistent video generation via cross-modal alignment Paper • 2502.11079 • Published 10 days ago • 50
Multimodal Language Model Collection What matters besides the data recipe when training a multimodal language model? • 30 items • Updated 13 days ago • 1
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment Paper • 2502.04328 • Published 20 days ago • 26
Image / Video Gen Collection Image Generation Using Diffusion-Based Methods: Tips and Techniques for Stable Diffusion • 35 items • Updated 14 days ago • 8
Magic 1-For-1: Generating One Minute Video Clips within One Minute Paper • 2502.07701 • Published 15 days ago • 32
Open Datasets Collection Thank you for sharing your datasets. I’ve fed them to my model, and they have benefited it. • 17 items • Updated 16 days ago
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published 22 days ago • 57
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 332