Which one is better, finetuning with a sequence length of 1024 or 2048 in LLaMA1 and LLaMA2? And what are the reasons behind the choice?

#7
by danielpark - opened

Hello,

I have a question about the sequence lengths for the following two models:

  • upstage/Llama-2-70b-instruct-1024
  • upstage/llama-30b-instruct-2048

As far as I know, the default sequence length for LLaMA 1 is 1024; however, to better capture context and improve training, you fine-tuned that model with a sequence length of 2048.
For LLaMA 2, despite its default sequence length of 2048, I noticed that you fine-tuned it with a sequence length of 1024. I'm curious whether there is a specific reason for this choice.

I'm interested in understanding the perspective from which you analyze the results of both models.

If there's anything I'm misunderstanding, I would appreciate your guidance.
Thank you.

upstage org

Hello.

As you may have experienced, I believe instruction tuning of LLMs is largely a matter of empirical experimentation.

In the case of llama-30b, we achieved a higher score with a larger Orca-style dataset and max_seq_len:2048.

However, for llama-2-70b, in our setting, a smaller dataset and max_seq_len:1024 scored better.
In fact, it recorded the highest score on our internal leaderboard when we used only about 50k samples from a dataset other than the Orca dataset.

Llama-2-70b tended to overfit faster at max_seq_len:2048, so it performed worse than llama-2-70b-hf. However, we do not plan to run additional experiments to resolve this, because the cost would outweigh the expected benefit.
In conclusion, in our setting, llama-2-70b performed better at max_seq_len:1024, so we chose 1024.
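
For context, a cap like max_seq_len is typically applied at tokenization time. The sketch below only illustrates where that knob sits in a Hugging Face preprocessing step; the model ID, data file, and column name are placeholders (and the Llama 2 repo is gated, so access requires accepting its license), not our actual training pipeline.

```python
# Minimal sketch of capping samples at max_seq_len during SFT preprocessing.
# Placeholders throughout -- this is not the actual Upstage training setup.
from datasets import load_dataset
from transformers import AutoTokenizer

MAX_SEQ_LEN = 1024  # the 1024-vs-2048 choice discussed in this thread

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-hf")

def tokenize(example):
    # Truncate each instruction/response sample to MAX_SEQ_LEN tokens.
    return tokenizer(example["text"], truncation=True, max_length=MAX_SEQ_LEN)

dataset = load_dataset("json", data_files="instruct_data.jsonl", split="train")
dataset = dataset.map(tokenize, remove_columns=dataset.column_names)
```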

I hope my answer was sufficient for you.

(For reference, according to each model's config.json, max_position_embeddings is 2048 for LLaMA 1 and 4096 for LLaMA 2.)
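
You can check those values yourself by reading the configs of the two models from this thread; a small sketch (the expected output is just the numbers quoted above):

```python
from transformers import AutoConfig

for model_id in ("upstage/llama-30b-instruct-2048", "upstage/Llama-2-70b-instruct-1024"):
    cfg = AutoConfig.from_pretrained(model_id)
    # Expected: 2048 for the LLaMA 1 based model, 4096 for the LLaMA 2 based model.
    print(model_id, cfg.max_position_embeddings)
```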

Thank you so much for your thoughtful response.

Limerobot changed discussion status to closed
