-
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
Paper • 2211.06687 • Published • 3 -
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
Paper • 2401.17690 • Published • 5 -
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Paper • 2312.09911 • Published • 53 -
Audiobox: Unified Audio Generation with Natural Language Prompts
Paper • 2312.15821 • Published • 12
Collections
Discover the best community collections!
Collections including paper arxiv:2310.11954
-
Music ControlNet: Multiple Time-varying Controls for Music Generation
Paper • 2311.07069 • Published • 43 -
FLAP: Fast Language-Audio Pre-training
Paper • 2311.01615 • Published • 16 -
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
Paper • 2310.11954 • Published • 24 -
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies
Paper • 2308.01546 • Published • 17
-
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
Paper • 2310.11954 • Published • 24 -
Training Chain-of-Thought via Latent-Variable Inference
Paper • 2312.02179 • Published • 8 -
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Paper • 2401.16158 • Published • 17 -
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Paper • 2402.09727 • Published • 35
-
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing
Paper • 2310.12404 • Published • 15 -
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
Paper • 2310.11954 • Published • 24 -
A Survey of AI Music Generation Tools and Models
Paper • 2308.12982 • Published • 1 -
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Paper • 2310.00704 • Published • 19
-
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
Paper • 2310.15123 • Published • 7 -
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search
Paper • 2310.13227 • Published • 12 -
LASER: LLM Agent with State-Space Exploration for Web Navigation
Paper • 2309.08172 • Published • 11 -
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Paper • 2310.04406 • Published • 8
-
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Paper • 2310.16045 • Published • 14 -
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Paper • 2310.14566 • Published • 25 -
SILC: Improving Vision Language Pretraining with Self-Distillation
Paper • 2310.13355 • Published • 6 -
Conditional Diffusion Distillation
Paper • 2310.01407 • Published • 20