Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published 7 days ago • 42
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation Paper • 2501.03059 • Published Jan 6 • 20
Improving Visual Commonsense in Language Models via Multiple Image Generation Paper • 2406.13621 • Published Jun 19, 2024 • 13
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30, 2024 • 77
Speaker Normalization for Self-supervised Speech Emotion Recognition Paper • 2202.01252 • Published Feb 2, 2022
D-Flow: Differentiating through Flows for Controlled Generation Paper • 2402.14017 • Published Feb 21, 2024 • 8
SpiRit-LM: Interleaved Spoken and Written Language Model Paper • 2402.05755 • Published Feb 8, 2024 • 15
Proactive Detection of Voice Cloning with Localized Watermarking Paper • 2401.17264 • Published Jan 30, 2024 • 18
Masked Audio Generation using a Single Non-Autoregressive Transformer Paper • 2401.04577 • Published Jan 9, 2024 • 43