VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Paper • 2501.01957 • Published • 34
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
Paper • 2410.02155 • Published • 2
OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis
Paper • 2501.04561 • Published • 15
zhao
poonyZ
AI & ML interests: None yet
Recent Activity
Updated a collection 3 days ago: video LM
Updated a collection 3 days ago: omni
Updated a collection 3 days ago: video LM
Organizations: None yet
Collections: 10
Models: None public yet
Datasets: None public yet