---
title: Music Descriptor CPU
emoji: 🚀
colorFrom: green
colorTo: green
sdk: gradio
sdk_version: 5.1.0
app_file: app.py
pinned: true
license: cc-by-nc-4.0
short_description: CPU version
---

<!-- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference -->

# Demo Introduction
This demo shows how to use the [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M) model as a backbone for multiple music understanding tasks, all built on its universal audio representation.

The tasks include EMO, GS, MTGInstrument, MTGGenre, MTGTop50, MTGMood, NSynthI, NSynthP, VocalSetS, and VocalSetT.
More models are available on the [m-a-p organization page](https://huggingface.co/m-a-p).
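As a rough sketch of what "backbone" means here, the model's hidden states can be extracted with the `transformers` library and pooled into a clip-level representation. The layer selection and task-specific heads actually used by this demo are not shown; the mean pooling below is only an assumption:

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, AutoModel

# MERT ships custom modeling code, so trust_remote_code is required.
model = AutoModel.from_pretrained("m-a-p/MERT-v1-95M", trust_remote_code=True)
processor = Wav2Vec2FeatureExtractor.from_pretrained(
    "m-a-p/MERT-v1-95M", trust_remote_code=True
)

# `audio` is a 1-D float array at the processor's sampling rate (24 kHz);
# 4 seconds of silence stands in for a real clip here.
audio = torch.zeros(processor.sampling_rate * 4)
inputs = processor(
    audio.numpy(), sampling_rate=processor.sampling_rate, return_tensors="pt"
)

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Stack all transformer layers, then average over time to get one vector
# per layer; downstream heads can weight and combine these as they like.
hidden = torch.stack(outputs.hidden_states)  # [layers, batch, time, dim]
representation = hidden.mean(dim=2)          # [layers, batch, dim]
```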

# Known Issues

## Audio Format Support

In theory, every audio format supported by [torchaudio.load()](https://pytorch.org/audio/stable/torchaudio.html#torchaudio.load) can be used in the demo. These include, but are not limited to, `WAV, AMB, MP3, FLAC`.
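For example, loading a clip and converting it to the mono, 24 kHz input MERT expects (the resampling target is an assumption based on the model's feature extractor) could look like:

```python
import torchaudio
import torchaudio.transforms as T

waveform, sample_rate = torchaudio.load("example.flac")  # WAV, MP3, etc. also work

# Mix down to mono and resample to the assumed model input rate.
waveform = waveform.mean(dim=0, keepdim=True)
if sample_rate != 24_000:
    waveform = T.Resample(orig_freq=sample_rate, new_freq=24_000)(waveform)
```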

## Audio Input Length

Due to the **hardware limitations** of the machine hosting this demo (2 CPU cores and 16 GB RAM), only **the first 4 seconds** of the audio are used!
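In code terms, the truncation amounts to slicing the loaded waveform; the exact cut-off logic in `app.py` may differ from this sketch:

```python
import torchaudio

MAX_SECONDS = 4  # limit imposed by the hosting hardware

waveform, sample_rate = torchaudio.load("example.wav")
waveform = waveform[:, : MAX_SECONDS * sample_rate]  # keep only the first 4 s
```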

This limitation is expected to be resolved in the future, either by adding community-supported GPU resources or by switching to other audio encoding strategies.

For now, if you want to run the demo on longer audio, you can clone this Space and deploy it with a GPU.
The code automatically uses the GPU for inference whenever one is detected via `torch.cuda.is_available()`.
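The device selection mentioned above is the standard PyTorch pattern, roughly:

```python
import torch
from transformers import AutoModel

# Fall back to CPU when no CUDA device is visible.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModel.from_pretrained(
    "m-a-p/MERT-v1-95M", trust_remote_code=True
).to(device)

# Any prepared inputs must be moved to the same device before inference:
# inputs = {k: v.to(device) for k, v in inputs.items()}
```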