MaskBit: Embedding-free Image Generation via Bit Tokens Paper • 2409.16211 • Published 5 days ago • 13
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution Paper • 2409.12961 • Published 10 days ago • 22
Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model Paper • 2408.00754 • Published Aug 1 • 21
Efficient Inference of Vision Instruction-Following Models with Elastic Cache Paper • 2407.18121 • Published Jul 25 • 15