About finetuning
Could you make your fine-tuning code publicly available?
Hi
@Xiangyu1
Since this model is compatible with the HF ecosystem, you could check out https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py as a starting point to fine-tune the model.
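For reference, here is a minimal sketch of what that script boils down to, assuming the TRL `SFTTrainer` / `SFTConfig` API; the dataset, output directory, and hyperparameters below are placeholders, not recommendations:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset; swap in your own instruction/chat dataset.
dataset = load_dataset("trl-lib/Capybara", split="train")

training_args = SFTConfig(
    output_dir="falcon-mamba-7b-sft",   # hypothetical output directory
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=1,
)

trainer = SFTTrainer(
    model="tiiuae/falcon-mamba-7b",     # model id on the HF Hub
    train_dataset=dataset,
    args=training_args,
)
trainer.train()
```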
I wrote a blog about fine-tuning falcon-mamba-7b with a 32k context on one H100 GPU, I hope it helps! https://wdlctc.github.io/mstmamba.html
Can I train using just the LongLoRA code, or have you made any modifications to this code?
If you want to train from scratch, you may need to initialize the model weights without using a pre-trained model.
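For example, a minimal sketch of building the architecture with randomly initialized weights from its config instead of loading the pre-trained checkpoint (standard transformers calls, not code from this thread):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Load only the architecture configuration, then build the model
# with randomly initialized weights (no pre-trained checkpoint).
config = AutoConfig.from_pretrained("tiiuae/falcon-mamba-7b")
model = AutoModelForCausalLM.from_config(config)
```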
We made modifications to the Hugging Face code so it supports a 2x longer context length with the mini-sequence technology.
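The gist of the mini-sequence idea is to partition the sequence so that large intermediate tensors (for example, the LM-head logits over the full vocabulary) are only materialized one chunk at a time. The sketch below is purely illustrative and is not the actual mini-sequence implementation (the real thing also has to handle the backward pass to realize the memory savings); `chunked_lm_loss` and its arguments are hypothetical names:

```python
import torch
import torch.nn.functional as F

def chunked_lm_loss(hidden_states, lm_head, labels, num_chunks=4):
    """Illustrative only: project to the vocabulary and compute cross-entropy
    one mini-sequence at a time, so the full [seq_len, vocab_size] logits
    tensor is never built in a single call."""
    # Shift so each position predicts the next token (causal LM objective).
    hidden = hidden_states[:, :-1, :]
    targets = labels[:, 1:]
    total_loss = torch.zeros((), device=hidden.device, dtype=torch.float32)
    total_tokens = 0
    for h, t in zip(hidden.chunk(num_chunks, dim=1),
                    targets.chunk(num_chunks, dim=1)):
        logits = lm_head(h)                          # [batch, mini_seq, vocab]
        total_loss = total_loss + F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            t.reshape(-1),
            reduction="sum",
            ignore_index=-100,
        )
        total_tokens += (t != -100).sum().item()
    return total_loss / max(total_tokens, 1)
```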
Note that the training had some issues, which should be fixed by https://github.com/huggingface/transformers/pull/33195: the kernels did not consider the layer norms on the B, DT and C states.
The fix is now merged into the transformers main branch; make sure to re-install transformers from the main branch until the next release.
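For reference, installing from the main branch can be done with the standard pip command:

```
pip install git+https://github.com/huggingface/transformers.git
```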