sino committed
Commit 9a4a887 · 1 Parent(s): ed14d60
Update README.md
README.md CHANGED
@@ -11,8 +11,15 @@ pipeline_tag: text-generation
 </p>
 <br>
 
-Music tagging is a task to predict the tags of music recordings.
-
+Music tagging is a task to predict the tags of music recordings.
+However, previous music tagging research primarily focuses on closed-set music tagging tasks, which cannot be generalized to new tags.
+In this work, we propose a zero-shot music tagging system modeled by a joint music and language attention (**JMLA**) model to address the open-set music tagging problem.
+The **JMLA** model consists of an audio encoder modeled by a pretrained masked autoencoder and a decoder modeled by Falcon7B.
+We introduce a perceiver resampler to convert arbitrary-length audio into fixed-length embeddings.
+We introduce dense attention connections between encoder and decoder layers to improve the information flow between them.
+We collect a large-scale music and description dataset from the internet.
+We propose to use ChatGPT to convert the raw descriptions into formalized and diverse descriptions to train the **JMLA** models.
+Our proposed **JMLA** system achieves a zero-shot audio tagging accuracy of 64.82% on the GTZAN dataset, outperforming previous zero-shot systems, and achieves results comparable to previous systems on the FMA and MagnaTagATune datasets.
 
 
 ## Requirements
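The added README text says a perceiver resampler converts arbitrary-length audio into fixed-length embeddings. The sketch below illustrates only that general idea and is not the JMLA implementation: a fixed set of latent query vectors cross-attends over however many audio frame embeddings there are, so the number of output vectors always equals the number of latents. Everything here is an assumption made for illustration: the names `resample` and `random_vectors`, the sizes `DIM` and `NUM_LATENTS`, the random vectors standing in for learned latents, and the omission of key/value projections, multiple heads, and stacked layers.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def resample(frames, latents):
    """Single-head cross-attention: each latent query attends over all frames.

    The output has len(latents) vectors regardless of len(frames).
    Simplification (assumption): frames serve as both keys and values,
    with no learned projections.
    """
    dim = len(latents[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for query in latents:
        weights = softmax([dot(query, frame) * scale for frame in frames])
        outputs.append([
            sum(w * frame[d] for w, frame in zip(weights, frames))
            for d in range(dim)
        ])
    return outputs


def random_vectors(count, dim, rng):
    """Hypothetical helper: random vectors standing in for real embeddings."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
DIM, NUM_LATENTS = 8, 4  # hypothetical sizes, not taken from the README
latents = random_vectors(NUM_LATENTS, DIM, rng)  # would be learned in practice
short_clip = random_vectors(10, DIM, rng)    # 10 audio frames
long_clip = random_vectors(250, DIM, rng)    # 250 audio frames

short_out = resample(short_clip, latents)
long_out = resample(long_clip, latents)
print(len(short_out), len(short_out[0]))
print(len(long_out), len(long_out[0]))
```

Both clips produce the same output shape (`NUM_LATENTS` vectors of size `DIM`), which is the property that lets a fixed-size decoder consume audio of any length.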