AudioLDM 2 Music for Zalo AI Challenge 2023

This checkpoint is the result of finetuning AudioLDM 2 Music (https://huggingface.co/cvssp/audioldm2-music) on the challenge dataset together with MusicCaps (https://www.kaggle.com/datasets/googleai/musiccaps).

Uses

First, install the required packages:

pip install --upgrade diffusers transformers accelerate scipy

Text-to-Audio

from diffusers import AudioLDM2Pipeline
import torch

# Load the finetuned checkpoint in half precision and move it to the GPU
repo_id = "vtrungnhan9/audioldm2-music-zac2023"
pipe = AudioLDM2Pipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "This music is instrumental. The tempo is medium with synthesiser arrangements, digital drums and electronic music. The music is upbeat, pulsating, youthful, buoyant, exciting, punchy, psychedelic and has propulsive beats with a dance groove. This music is Techno Pop/EDM."
neg_prompt = "bad quality"
audio = pipe(prompt, negative_prompt=neg_prompt, num_inference_steps=200, audio_length_in_s=10.0, guidance_scale=10).audios[0]
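
Before saving or playing the result, a quick sanity check can catch a silent or truncated clip. The sketch below is a hypothetical helper (it is not part of diffusers) and assumes `audio` is a one-dimensional sequence of floats sampled at 16 kHz, the rate this card uses when saving and playing the audio.

```python
def describe_waveform(samples, sample_rate=16000):
    """Return sample count, duration in seconds, and peak absolute amplitude.

    Hypothetical helper for illustration; `samples` is assumed to be a
    one-dimensional sequence of floats (a NumPy array also works).
    """
    values = [float(s) for s in samples]
    # default=0.0 keeps the function well-defined for an empty clip
    peak = max((abs(v) for v in values), default=0.0)
    return {
        "num_samples": len(values),
        "duration_s": len(values) / sample_rate,
        "peak": peak,
    }
```

Calling `describe_waveform(audio)` after the generation step returns a small dictionary; a `duration_s` far from the requested `audio_length_in_s`, or a `peak` near zero, suggests the clip is worth inspecting before use.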

The resulting audio output can be saved as a .wav file:

import scipy.io.wavfile

# 16000 Hz is the sampling rate used throughout this card
scipy.io.wavfile.write("techno.wav", rate=16000, data=audio)

Or displayed in a Jupyter Notebook / Google Colab:

from IPython.display import Audio

Audio(audio, rate=16000)
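
When `scipy.io.wavfile.write` receives a floating-point array, it writes a floating-point WAV file, which some players and editors do not open. As a minimal sketch, assuming `audio` is a mono sequence of floats in the range [-1, 1], the hypothetical helper below writes 16-bit PCM instead, using only the Python standard library.

```python
import array
import sys
import wave


def save_pcm16_wav(path, samples, sample_rate=16000):
    """Write mono float samples as a 16-bit PCM WAV file.

    Hypothetical helper for illustration. Values outside [-1, 1] are
    clipped. Returns the number of frames written.
    """
    pcm = array.array("h")  # signed 16-bit integers
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))
        pcm.append(int(round(clipped * 32767)))

    # WAV stores samples little-endian; swap on big-endian machines
    if sys.byteorder == "big":
        pcm.byteswap()

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
    return len(pcm)
```

For the clip generated above, `save_pcm16_wav("techno_pcm16.wav", audio)` would write the same audio as 16-bit PCM. The per-sample loop favors clarity over speed; a vectorized NumPy conversion would be faster for long clips.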

Training Details

Training Data

The challenge dataset and MusicCaps (https://www.kaggle.com/datasets/googleai/musiccaps), as stated at the top of this card.

Training Procedure

Please refer to https://github.com/declare-lab/tango/blob/master/train.py for the training procedure.

Citation

BibTeX:

@article{liu2023audioldm2,
  title={AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining},
  author={Haohe Liu and Qiao Tian and Yi Yuan and Xubo Liu and Xinhao Mei and Qiuqiang Kong and Yuping Wang and Wenwu Wang and Yuxuan Wang and Mark D. Plumbley},
  journal={arXiv preprint arXiv:2308.05734},
  year={2023}
}

Model Card Contact

[email protected]
