pipeline_tag: text-to-audio
tags:
- text-to-audio
---

# Tango 2: Aligning Diffusion-based Text-to-Audio Generative Models through Direct Preference Optimization

🎵 We developed **Tango 2** building upon **Tango** for text-to-audio generation. **Tango 2** was initialized with the **Tango-full-ft** checkpoint and underwent alignment training using DPO on **audio-alpaca**, a dataset of pairwise audio preferences. 🎶

📣 We previously released [**Tango-Full-FT-Audiocaps**](https://huggingface.co/declare-lab/tango-full-ft-audiocaps), which was first pre-trained on [**TangoPromptBank**](https://huggingface.co/datasets/declare-lab/TangoPromptBank), a collection of diverse text-audio pairs, and then fine-tuned on AudioCaps. This checkpoint obtained state-of-the-art results for text-to-audio generation on AudioCaps.

## Code

Our code is released here: [https://github.com/declare-lab/tango](https://github.com/declare-lab/tango)

We uploaded several **TANGO**-generated samples here: [https://tango-web.github.io/](https://tango-web.github.io/)

Please follow the instructions in the repository for installation, usage and experiments.

## Quickstart Guide

Download the **Tango 2** model and generate audio from a text prompt:

```python
import IPython
import soundfile as sf
from tango import Tango

tango = Tango("declare-lab/tango2-full")

prompt = "An audience cheering and clapping"
audio = tango.generate(prompt)
sf.write(f"{prompt}.wav", audio, samplerate=16000)
IPython.display.Audio(data=audio, rate=16000)
```

[An audience cheering and clapping.webm](https://user-images.githubusercontent.com/13917097/233851915-e702524d-cd35-43f7-93e0-86ea579231a7.webm)

The model will be automatically downloaded and saved in the cache. Subsequent runs will load the model directly from the cache.
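
If you want the model stored somewhere other than the default cache, one option is to point the Hugging Face cache at a different directory before loading. This is a minimal sketch, assuming the checkpoint is fetched through `huggingface_hub`, which honors the `HF_HOME` environment variable; the path is hypothetical:

```python
import os

# Assumption: the checkpoint is downloaded via huggingface_hub, which
# honors HF_HOME. Set it before loading the model.
os.environ["HF_HOME"] = "/data/hf-cache"  # hypothetical cache location

from tango import Tango

tango = Tango("declare-lab/tango2-full")  # now cached under /data/hf-cache
```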

You can also pass a larger number of sampling steps to `generate`:

```python
prompt = "Rolling thunder with lightning strikes"
audio = tango.generate(prompt, steps=200)
IPython.display.Audio(data=audio, rate=16000)
```

[Rolling thunder with lightning strikes.webm](https://user-images.githubusercontent.com/13917097/233851929-90501e41-911d-453f-a00b-b215743365b4.webm)
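
To see how the step count trades generation time against output quality, you could sweep a few values and save each clip. This sketch reuses only the `generate` and `soundfile` calls shown above; the step values and file names are arbitrary:

```python
import soundfile as sf
from tango import Tango

tango = Tango("declare-lab/tango2-full")
prompt = "Rolling thunder with lightning strikes"

# More steps generally means slower generation; compare the results by ear.
for steps in (50, 100, 200):
    audio = tango.generate(prompt, steps=steps)
    sf.write(f"thunder_{steps}_steps.wav", audio, samplerate=16000)
```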
Use the `generate_for_batch` function to generate multiple audio samples for a batch of text prompts:

```python
prompts = [
    ...
]
audios = tango.generate_for_batch(prompts, samples=2)
```

This will generate two samples for each of the three text prompts.
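
To keep the batch outputs, you can write each sample to disk. A minimal sketch under stated assumptions: the prompt strings are hypothetical placeholders, and `generate_for_batch` is assumed to return one list of audio arrays per prompt (two per prompt here, since `samples=2`):

```python
import soundfile as sf
from tango import Tango

tango = Tango("declare-lab/tango2-full")

# Hypothetical prompts for illustration only.
prompts = [
    "A dog barking in the distance",
    "Footsteps on a gravel path",
    "Rain falling on a tin roof",
]

# Assumption: generate_for_batch returns a list with one entry per prompt,
# each entry holding the requested number of audio arrays.
audios = tango.generate_for_batch(prompts, samples=2)
for text, clips in zip(prompts, audios):
    for i, audio in enumerate(clips):
        sf.write(f"{text} ({i}).wav", audio, samplerate=16000)
```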
## Limitations

TANGO is trained on the small AudioCaps dataset, so it may not generate good audio samples for concepts that it has not seen in training (e.g. _singing_). For the same reason, TANGO is not always able to finely control its generations with textual prompts. For example, its generations for the prompts _Chopping tomatoes on a wooden table_ and _Chopping potatoes on a metal table_ are very similar; _Chopping vegetables on a table_ also produces similar audio samples. Training text-to-audio generation models on larger datasets is thus required for the model to learn the composition of textual concepts and varied text-audio mappings.

We are training another version of TANGO on larger datasets to enhance its generalization, compositional and controllable generation ability.