Go through the data_description.txt file to understand the data and the machine learning task. You can summarize it in your research logs to keep track of what all you have to do. Then fill in the provided train.py script to train a language model to get a good performance. Finally, you should submit the predictions of your best model for the test set as a submission.csv as described in the evaluation_details.txt file. Never try to read any csv files directly. Do not forget to execute the changes you made to check for performance.