---
license: apache-2.0
---

<div align="center">
<img src="./teaser.gif">
</div>

## Requirements
- Python >= 3.8 (we recommend [Anaconda](https://www.anaconda.com/download/#linux) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html))
- [PyTorch >= 2.0.0](https://pytorch.org/) (required if using an A100)
- transformers >= 4.42.3
- pycocoevalcap >= 1.2

A suitable [conda](https://conda.io/) environment named `UniSoccer` can be created and activated with:

```
conda env create -f environment.yaml
conda activate UniSoccer
```
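The version constraints above can be sanity-checked before training. A minimal sketch; the `parse_version` helper is illustrative and not part of this repo:

```python
import sys

def parse_version(v):
    """Turn a dotted version string like '4.42.3' into a comparable tuple."""
    return tuple(int(p) for p in v.split(".") if p.isdigit())

# The Requirements list above asks for Python >= 3.8.
assert sys.version_info[:2] >= (3, 8), "Python >= 3.8 is required"

# Example: compare an installed package version against the listed minimum.
assert parse_version("4.42.3") >= parse_version("4.42.3")
```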

## Train

<div align="center">
<img src="./architecture.png">
</div>

#### Pretrain MatchVision Encoder
As described in the paper, there are two ways to pretrain the MatchVision backbone: supervised classification and contrastive commentary retrieval. You can train with either method as shown below.

First, prepare the textual data in the format given in `train_data/json`, and preprocess the soccer videos into 30-second clips (15s before and after each timestamp) for pretraining.

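The 30-second windowing (15s before and after each timestamp) can be sketched as a small helper. `clip_window` is a hypothetical name for illustration; the repo's actual preprocessing may differ:

```python
def clip_window(timestamp_s, half_window=15.0):
    """Return (start, end) in seconds for the 30s clip around an event
    timestamp: 15s before and 15s after, clamped at the start of the video."""
    start = max(0.0, timestamp_s - half_window)
    return start, timestamp_s + half_window

print(clip_window(120.0))  # (105.0, 135.0)
print(clip_window(5.0))    # (0.0, 20.0), clamped at video start
```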
**Supervised Classification**
```
python task/pretrain_MatchVoice_Classifier.py config/pretrain_classification.py
```
**Contrastive Commentary Retrieval**
```
python task/pretrain_contrastive.py config/pretrain_contrastive.py
```
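Contrastive commentary retrieval pairs each video clip with its commentary text; a common formulation of such an objective is a symmetric InfoNCE loss over the batch similarity matrix. A pure-Python sketch under that assumption (the actual loss in the repo may differ in details such as the temperature):

```python
import math

def info_nce(sim, temperature=0.07):
    """Symmetric InfoNCE over an n x n video-commentary similarity matrix,
    where matched pairs sit on the diagonal. Lower is better."""
    def directional(rows):
        total = 0.0
        for i, row in enumerate(rows):
            logits = [s / temperature for s in row]
            m = max(logits)
            # log-sum-exp with max subtracted for numerical stability
            log_z = m + math.log(sum(math.exp(l - m) for l in logits))
            total += log_z - logits[i]  # -log softmax at the matched index
        return total / len(rows)
    cols = [list(c) for c in zip(*sim)]
    # average the video-to-text and text-to-video directions
    return 0.5 * (directional(sim) + directional(cols))
```

A well-trained encoder makes the diagonal dominate the similarity matrix, driving this loss toward zero.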

You can also fine-tune MatchVision with:
```
python task/finetune_contrastive.py config/finetune_contrastive.py
```
Note that you should replace the folder paths in the task and config files with your own.

#### Train Downstream Tasks

You can train the commentary task in several ways:

1. Using *.mp4* files
```
python task/downstream_commentary_new_benchmark.py
```
With this method, you can train the MatchVoice commentary model with the visual encoder or language decoder unfrozen, so you should crop the videos into 30s clips named as the JSON files indicate.

2. Using *.npy* files
```
python task/downstream_commentary.py
```
With this method, the visual encoder stays frozen, so you can extract features for all video clips in advance and replace ".mp4" with ".npy" in the file names.

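Swapping ".mp4" for ".npy" in the file names can be done robustly with `pathlib` rather than plain string replacement (which would also match ".mp4" appearing mid-path). `feature_path` is an illustrative helper, not a function from this repo:

```python
from pathlib import Path

def feature_path(video_path):
    """Map a 30s clip's video file to its precomputed feature file,
    e.g. 'clips/goal_0012.mp4' -> 'clips/goal_0012.npy'."""
    return str(Path(video_path).with_suffix(".npy"))

print(feature_path("clips/goal_0012.mp4"))  # clips/goal_0012.npy
```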
**Note:** the folder `words_world` records the token IDs of all words under the LLaMA-3 (8B) tokenizer for the different datasets:

- *`match_time.pkl`*: MatchTime dataset ([link here](https://huggingface.co/datasets/Homie0609/MatchTime))
- *`soccerreplay-1988.pkl`*: SoccerReplay-1988 dataset (not released yet)
- *`merge.pkl`*: the union of MatchTime & SoccerReplay-1988

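These tables are standard pickle files, so a minimal loader is enough to inspect them. The internal structure of the `.pkl` files is not documented here, so treat the word-to-token-IDs mapping mentioned in the docstring as an assumption:

```python
import pickle

def load_word_token_ids(pkl_path):
    """Load a words_world table. Assumed (not confirmed by this README) to
    map vocabulary words to their LLaMA-3 (8B) token IDs."""
    with open(pkl_path, "rb") as f:
        return pickle.load(f)

# Usage, with a path from the list above:
# table = load_word_token_ids("words_world/match_time.pkl")
```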
## Inference

<div align="center">
<img src="./inference.png">
</div>

For inference, use the following command; make sure you have cropped the video clips correctly, in the same format as before.
```
python inference/inference.py
```
Then you can compute the metrics for the output `sample.csv` with:
```
python inference/score_single.py --csv_path inference/sample.csv
```

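`score_single.py` presumably compares each predicted commentary in `sample.csv` against its reference using standard captioning metrics (via pycocoevalcap). As a toy stand-in for intuition only, here is a token-overlap F1 between one prediction and one reference; it is not the metric the repo reports:

```python
def token_f1(pred, ref):
    """Toy token-overlap F1 between a predicted and a reference commentary
    line. Standard captioning metrics (BLEU, METEOR, CIDEr) are what
    score_single.py would actually use."""
    p, r = pred.lower().split(), ref.lower().split()
    remaining = list(r)
    common = 0
    for tok in p:          # count overlapping tokens with multiplicity
        if tok in remaining:
            remaining.remove(tok)
            common += 1
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(r)
    return 2 * precision * recall / (precision + recall)

print(token_f1("what a goal", "what a wonderful goal"))  # ≈ 0.857
```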
## Citation
If you use this code and data for your research or project, please cite:

```
@misc{rao2024unisoccer,
      title   = {Towards Universal Soccer Video Understanding},
      author  = {Rao, Jiayuan and Wu, Haoning and Jiang, Hao and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},
      journal = {arXiv preprint arXiv:2412.01820},
      year    = {2024},
}
```