Which one is better, finetuning with a sequence length of 1024 or 2048 in LLaMA1 and LLaMA2? And what are the reasons behind the choice?

#7
by danielpark - opened

Hello,

I have a question about the sequence lengths for the following two models:

  • upstage/Llama-2-70b-instruct-1024
  • upstage/llama-30b-instruct-2048

As far as I know, the default sequence length for LLaMA 1 is 1024; however, to better capture context and improve training, you fine-tuned that model with a sequence length of 2048.
For LLaMA 2, despite its default sequence length of 2048, I noticed that you fine-tuned it with a sequence length of 1024. I'm curious whether there is a specific reason for this choice.

I'm interested in understanding the perspective from which you analyze the results of both models.

If there's anything I'm misunderstanding, I would appreciate your guidance.
Thank you.

upstage org

Hello.

As you may have experienced, I believe instruction tuning of LLMs is largely a matter of empirical experimentation.

In the case of llama-30b, we achieved a higher score with a larger Orca-style dataset and max_seq_len:2048.

However, for llama-2-70b, in our setting, a smaller dataset and max_seq_len:1024 scored better.
In fact, it recorded the highest score on our internal leaderboard when we used only about 50k samples from a dataset other than the Orca dataset.

Llama-2-70b tended to overfit faster at max_seq_len:2048, so it performed worse than llama-2-70b-hf. However, we do not plan to run additional experiments to resolve this, because the cost would outweigh the expected benefit.
In conclusion, in our setting, llama-2-70b performed better at max_seq_len:1024, so we chose 1024.
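
For context, a cap like max_seq_len is typically applied at tokenization time. The sketch below only illustrates where that knob sits in a Hugging Face preprocessing step; the model ID, data file, and column name are placeholders (and the Llama 2 repo is gated, so access requires accepting its license), not our actual training pipeline.

```python
# Minimal sketch of capping samples at max_seq_len during SFT preprocessing.
# Placeholders throughout -- this is not the actual Upstage training setup.
from datasets import load_dataset
from transformers import AutoTokenizer

MAX_SEQ_LEN = 1024  # the 1024-vs-2048 choice discussed in this thread

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-hf")

def tokenize(example):
    # Truncate each instruction/response sample to MAX_SEQ_LEN tokens.
    return tokenizer(example["text"], truncation=True, max_length=MAX_SEQ_LEN)

dataset = load_dataset("json", data_files="instruct_data.jsonl", split="train")
dataset = dataset.map(tokenize, remove_columns=dataset.column_names)
```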

I hope my answer was sufficient for you.

(For reference, according to each model's config.json, max_position_embeddings is 2048 for LLaMA 1 and 4096 for LLaMA 2.)
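
You can check those values yourself by reading the configs of the two models from this thread; a small sketch (the expected output is just the numbers quoted above):

```python
from transformers import AutoConfig

for model_id in ("upstage/llama-30b-instruct-2048", "upstage/Llama-2-70b-instruct-1024"):
    cfg = AutoConfig.from_pretrained(model_id)
    # Expected: 2048 for the LLaMA 1 based model, 4096 for the LLaMA 2 based model.
    print(model_id, cfg.max_position_embeddings)
```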

Thank you so much for your thoughtful response.

Limerobot changed discussion status to closed
