Multimodal Language Model Collection What matters besides the data recipe when training a multimodal language model? • 29 items • Updated 4 days ago • 1
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Paper • 2501.04001 • Published 5 days ago • 36
Open Datasets Collection Thank you for sharing your datasets. I've fed them to my model, and it has benefited from them. • 16 items • Updated 6 days ago
Image / Video Gen Collection Image Generation Using Diffusion-Based Methods: Tips and Techniques for Stable Diffusion • 32 items • Updated 8 days ago • 6
Large Motion Video Autoencoding with Cross-modal Video VAE Paper • 2412.17805 • Published 20 days ago • 24
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 124
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published about 1 month ago • 136