Commit dcdd1f4 (verified) by monetjoe, parent 45d8449: Update README.md

1 file changed: README.md (80 lines added, 3 removed)


---
license: mit
datasets:
- ccmusic-database/Guzheng_Tech99
language:
- zh
metrics:
- accuracy
pipeline_tag: audio-classification
tags:
- music
---

# Intro
For the 99 recordings, silence is removed first, based on the annotations: the parts with no playing-technique annotation are cut. Each recording is then segmented uniformly into fixed-length 3-second clips, and any clip shorter than 3 seconds after segmentation is zero-padded. Zero padding is chosen over circular padding because, for frame-level detection, circular padding would fill the gap with repeated audio from the same clip and so introduce extraneous information. Since the dataset consists of 99 recordings, we split it at the recording level, so all clips from one recording stay in the same subset. The recordings are partitioned into training, validation, and test subsets in a 79:10:10 ratio, roughly 8:1:1.
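
The preprocessing steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: audio is assumed to be a plain list of samples, annotations are assumed to be non-overlapping `(start_s, end_s)` technique intervals in seconds, the kept regions are assumed to be concatenated before segmentation, and the shuffle-based split, the seed, and the helper names `remove_silence`, `segment`, and `split_recordings` are all hypothetical.

```python
import random

SEGMENT_SECONDS = 3  # fixed clip length stated in the card


def remove_silence(samples, sr, intervals):
    """Keep only samples inside annotated technique intervals.

    Hypothetical helper: `intervals` is assumed to be a list of non-overlapping
    (start_s, end_s) pairs; unannotated regions are treated as silence and
    dropped, and the kept regions are concatenated (an assumption).
    """
    kept = []
    for start_s, end_s in sorted(intervals):
        kept.extend(samples[int(start_s * sr):int(end_s * sr)])
    return kept


def segment(samples, sr, seconds=SEGMENT_SECONDS):
    """Cut samples into fixed-length clips and zero-pad the last short clip."""
    size = int(seconds * sr)
    clips = []
    for start in range(0, len(samples), size):
        clip = samples[start:start + size]
        # Zero padding adds silence only; circular padding would instead
        # repeat samples from the clip itself.
        clips.append(clip + [0.0] * (size - len(clip)))
    return clips


def split_recordings(recording_ids, n_val=10, n_test=10, seed=0):
    """Recording-level split: all clips of a recording share one subset.

    The shuffle and seed are illustrative; the card does not say how
    recordings were assigned to subsets.
    """
    ids = list(recording_ids)
    random.Random(seed).shuffle(ids)
    test, val = ids[:n_test], ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


if __name__ == "__main__":
    sr = 4  # toy sample rate to keep the numbers small
    audio = [float(i) for i in range(40)]  # 10 s of toy audio
    voiced = remove_silence(audio, sr, [(0, 2), (5, 7.5)])  # 18 samples kept
    clips = segment(voiced, sr)  # 2 clips of 12 samples; the 2nd ends in 6 zeros
    train, val, test = split_recordings(range(99))
    print(len(clips), len(train), len(val), len(test))  # 2 79 10 10
```

Splitting on recording IDs before segmenting keeps clips from the same recording out of different subsets, which is what the recording-level split is meant to guarantee.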

## Demo
<https://www.modelscope.cn/studios/ccmusic-database/Guzheng_Tech99>

## Usage
```python
from modelscope import snapshot_download

# Download the model repository from ModelScope; returns the local directory path
model_dir = snapshot_download("ccmusic-database/Guzheng_Tech99")
```

## Maintenance
```bash
git clone [email protected]:ccmusic-database/Guzheng_Tech99
cd Guzheng_Tech99
```

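## Results
Columns give the input representation fed to each backbone: Mel spectrogram, constant-Q transform (CQT), or chroma features. Rows above the blank row use Transformer backbones and rows below it use CNN backbones; bold marks the best score for each input feature within each group. Average is the unweighted mean over all seven backbones.

| Backbone          | Mel       | CQT       | Chroma    |
| ----------------- | --------- | --------- | --------- |
| ViT-B-16          | 0.705     | 0.518     | 0.508     |
| Swin-T            | **0.849** | **0.783** | **0.766** |
|                   |           |           |           |
| VGG19             | **0.862** | 0.799     | 0.665     |
| EfficientNet-V2-L | 0.783     | 0.812     | 0.697     |
| ConvNeXt-B        | 0.849     | **0.849** | **0.805** |
| ResNet101         | 0.638     | 0.830     | 0.707     |
| SqueezeNet1.1     | 0.831     | 0.814     | 0.780     |
| Average           | 0.788     | 0.772     | 0.704     |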
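
As an arithmetic check on the table, the short Python sketch below recomputes the Average row and finds the overall best backbone per input feature. Only the scores come from the table; the dictionary layout and the helper names `column_average` and `best_backbone` are illustrative.

```python
from statistics import mean

# Scores copied from the Results table: backbone -> {input feature: score}.
RESULTS = {
    "ViT-B-16": {"Mel": 0.705, "CQT": 0.518, "Chroma": 0.508},
    "Swin-T": {"Mel": 0.849, "CQT": 0.783, "Chroma": 0.766},
    "VGG19": {"Mel": 0.862, "CQT": 0.799, "Chroma": 0.665},
    "EfficientNet-V2-L": {"Mel": 0.783, "CQT": 0.812, "Chroma": 0.697},
    "ConvNeXt-B": {"Mel": 0.849, "CQT": 0.849, "Chroma": 0.805},
    "ResNet101": {"Mel": 0.638, "CQT": 0.830, "Chroma": 0.707},
    "SqueezeNet1.1": {"Mel": 0.831, "CQT": 0.814, "Chroma": 0.780},
}
FEATURES = ("Mel", "CQT", "Chroma")


def column_average(feature):
    """Unweighted mean over all seven backbones, rounded like the table."""
    return round(mean(scores[feature] for scores in RESULTS.values()), 3)


def best_backbone(feature):
    """Backbone with the highest score for one input feature."""
    return max(RESULTS, key=lambda name: RESULTS[name][feature])


for feature in FEATURES:
    print(feature, column_average(feature), best_backbone(feature))
# Mel 0.788 VGG19
# CQT 0.772 ConvNeXt-B
# Chroma 0.704 ConvNeXt-B
```

The recomputed means match the Average row, so that row is an unweighted mean over all seven backbones; across both groups, VGG19 is best on Mel and ConvNeXt-B is best on CQT and Chroma.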

<!-- Fine-tuning results for a SqueezeNet network on CQT:
<table>
<tr>
<th>Loss curve</th>
<td><img src="./squeezenet1_1_cqt_2024-07-30_04-38-35/loss.jpg"></td>
</tr>
<tr>
<th>Training and validation accuracy</th>
<td><img src="./squeezenet1_1_cqt_2024-07-30_04-38-35/acc.jpg"></td>
</tr>
<tr>
<th>Confusion matrix</th>
<td><img src="./squeezenet1_1_cqt_2024-07-30_04-38-35/mat.jpg"></td>
</tr>
</table> -->

## Dataset
<https://huggingface.co/datasets/ccmusic-database/Guzheng_Tech99>

## Mirror
<https://www.modelscope.cn/models/ccmusic-database/Guzheng_Tech99>

## Evaluation
<https://github.com/monetjoe/ccmusic_eval/tree/tech99>

## Cite
```bibtex
@dataset{zhaorui_liu_2021_5676893,
  author    = {Monan Zhou and Shenyang Xu and Zhaorui Liu and Zhaowen Wang and Feng Yu and Wei Li and Baoqiang Han},
  title     = {CCMusic: an Open and Diverse Database for Chinese Music Information Retrieval Research},
  month     = {mar},
  year      = {2024},
  publisher = {HuggingFace},
  version   = {1.2},
  url       = {https://huggingface.co/ccmusic-database}
}
```