admin committed · Commit 297a25e · Parent(s): dcdd1f4
upd md

README.md CHANGED
For the 99 recordings, silence is first removed based on the annotations, discarding the parts that carry no technique label. All recordings are then uniformly segmented into fixed-length 3-second clips, and clips shorter than 3 seconds are zero-padded. Zero padding, rather than circular padding, is used for the frame-level detection task to avoid introducing extraneous information. Since the dataset consists of 99 recordings, the split is performed at the recording level: 79, 10, and 10 recordings go to the training, validation, and testing subsets respectively (roughly an 8:1:1 ratio).
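As a rough illustration of this preprocessing (not the repository's actual code), the sketch below cuts a silence-removed waveform into 3-second clips with zero padding and performs the recording-level 79/10/10 split; the 44100 Hz sample rate, the function names, and the shuffle seed are assumptions.

```python
import numpy as np

SR = 44100            # assumed sample rate, not stated in the README
CLIP_SECONDS = 3      # fixed clip length from the README
CLIP_LEN = SR * CLIP_SECONDS


def segment_recording(wave: np.ndarray) -> list:
    """Cut one silence-removed recording into 3-second clips,
    zero-padding the last clip instead of wrapping it circularly."""
    clips = []
    for start in range(0, len(wave), CLIP_LEN):
        clip = wave[start:start + CLIP_LEN]
        if len(clip) < CLIP_LEN:
            # zero padding: the tail stays silent
            clip = np.pad(clip, (0, CLIP_LEN - len(clip)))
        clips.append(clip)
    return clips


def split_recordings(recording_ids, seed=0):
    """Recording-level split of the 99 recordings into 79/10/10."""
    ids = np.array(list(recording_ids))
    rng = np.random.default_rng(seed)
    rng.shuffle(ids)
    return ids[:79].tolist(), ids[79:89].tolist(), ids[89:].tolist()
```

The padded tail is pure silence, in line with the stated rationale: circular padding would copy real playing back into the padded region and introduce extraneous information at the frame level.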
## Demo
<https://huggingface.co/spaces/ccmusic-database/Guzheng_Tech99>
## Usage
```python
# ... (lines omitted in the diff view)
```
## Results
| Backbone          |    Mel    |    CQT    |  Chroma   |
| :---------------: | :-------: | :-------: | :-------: |
| ViT-B-16          |   0.705   |   0.518   |   0.508   |
| Swin-T            | **0.849** | **0.783** | **0.766** |
|                   |           |           |           |
| VGG19             | **0.862** |   0.799   |   0.665   |
| EfficientNet-V2-L |   0.783   |   0.812   |   0.697   |
| ConvNeXt-B        |   0.849   | **0.849** | **0.805** |
| ResNet101         |   0.638   |   0.830   |   0.707   |
| SqueezeNet1.1     |   0.831   |   0.814   |   0.780   |
| Average           |   0.788   |   0.772   |   0.704   |
## Dataset
<https://huggingface.co/datasets/ccmusic-database/Guzheng_Tech99>