admarcosai's Collections
LMMM
OneLLM: One Framework to Align All Modalities with Language
Paper • 2312.03700 • Published • 21
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion
Paper • 2402.03162 • Published • 18
Paper • 2402.09470 • Published • 11
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper • 2402.12226 • Published • 42
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
Paper • 2412.14171 • Published • 24
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Paper • 2412.09596 • Published • 93
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions
Paper • 2412.08737 • Published • 53
Multimodal Latent Language Modeling with Next-Token Diffusion
Paper • 2412.08635 • Published • 44