1st place solution
Hello,
I would like to thank the hosts (huggingface and Data Driven) for organizing this interesting competition.
Dataset:
The dataset was clean and balanced, with the same number of labels for each class.
Solution:
I experimented with a few pretrained language models, such as roberta-large, deberta-large, electra-large and e5-large, but ended up using only roberta-large because it performed better than the others. (Experiments were run on 5 folds.)
I also tried LLMs such as Falcon and Llama but did not manage to get better results than roberta. (However, the CV score was quite decent, around 0.38.)
My solution:
- Apply domain transfer to roberta-large using the train and test sets (masked language modeling task). The checkpoint can be found in the GitHub repository I am sharing at the end of the post.
- Fine-tune on 15 folds and generate predictions on the test set.
- Fine-tune roberta-large a second time, also using the pseudo labels from the test set.
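The last two steps can be sketched roughly as follows. This is only an illustration, not the exact code from the repository: it assumes each fold model outputs class probabilities for every test sample, that the folds are combined by simple averaging, and that hard pseudo labels are taken from the averaged probabilities. The function names and the `min_confidence` threshold are hypothetical.

```python
def average_fold_probs(fold_probs):
    """Average class probabilities over folds.

    fold_probs[f][i][k] is the probability given by fold f
    to class k for test sample i.
    """
    n_folds = len(fold_probs)
    n_samples = len(fold_probs[0])
    n_classes = len(fold_probs[0][0])
    return [
        [
            sum(fold_probs[f][i][k] for f in range(n_folds)) / n_folds
            for k in range(n_classes)
        ]
        for i in range(n_samples)
    ]


def make_pseudo_labels(avg_probs, min_confidence=0.0):
    """Return (sample_index, predicted_class) pairs.

    A sample is kept only if its top averaged probability reaches
    min_confidence (assumption: 0.0 keeps every test sample).
    """
    pseudo = []
    for i, probs in enumerate(avg_probs):
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= min_confidence:
            pseudo.append((i, best))
    return pseudo


# Toy example: 2 folds, 2 test samples, 3 classes.
fold_probs = [
    [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]],
    [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3]],
]
avg_probs = average_fold_probs(fold_probs)
pseudo_labels = make_pseudo_labels(avg_probs, min_confidence=0.5)
```

The pseudo-labeled test samples would then be appended to the training data for the second fine-tuning round.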
For the fine-tuning task, I also used the Teacher-free loss function from this paper: https://arxiv.org/abs/1909.11723.
It is similar to knowledge distillation, but instead of training a strong model to act as the teacher, we create a virtual one from the labels by lowering the value assigned to the true class (similar to label smoothing).
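To make the idea concrete, here is a minimal pure-Python sketch of such a loss for a single sample. It is my reading of the regularization variant described in the paper, not necessarily the exact implementation used in this solution. The values of `alpha`, `temperature` and `correct_prob` are placeholder assumptions, and the function names are hypothetical.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [v / temperature for v in values]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def virtual_teacher(label, num_classes, correct_prob=0.9):
    """Hand-made teacher distribution: correct_prob on the true class,
    the remaining mass spread evenly over the other classes."""
    other = (1.0 - correct_prob) / (num_classes - 1)
    return [correct_prob if k == label else other for k in range(num_classes)]


def teacher_free_loss(logits, label, alpha=0.1, temperature=20.0, correct_prob=0.9):
    """(1 - alpha) * cross-entropy + alpha * KL(teacher || student).

    Assumption: the virtual teacher is softened with the temperature by
    passing its probabilities through a softmax, and the student logits
    are softened with the same temperature.
    """
    num_classes = len(logits)
    # Standard cross-entropy against the hard label.
    ce = -math.log(softmax(logits)[label])
    # Softened virtual teacher and softened student distributions.
    teacher = softmax(virtual_teacher(label, num_classes, correct_prob), temperature)
    student = softmax(logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(teacher, student))
    return (1.0 - alpha) * ce + alpha * kl


# Toy example: 3 classes, true class is 0.
loss = teacher_free_loss([2.0, 0.5, -1.0], label=0)
```

With `alpha=0` this reduces to plain cross-entropy, so `alpha` controls how strongly the virtual teacher regularizes the model.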
The second fine-tuning improved the public LB score but not the private one (exact same result).
You can find the code in this GitHub repository.
I did not share the 15 fold checkpoints, as they are quite heavy to upload, but the pretrained checkpoint from the domain transfer is available on Hugging Face: https://huggingface.co/Shiro/roberta-large-movie-genre
Hope this helps.
Shiro