About finetuning
Could you make your fine-tuning code publicly available?
Hi
@Xiangyu1
Since this model is compatible with the HF ecosystem, you could check out https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py as a starting point to fine-tune the model.
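For reference, here is a minimal sketch of what that script boils down to, assuming the TRL `SFTTrainer` / `SFTConfig` API; the dataset, output directory, and hyperparameters below are placeholders, not recommendations:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset; swap in your own instruction/chat dataset.
dataset = load_dataset("trl-lib/Capybara", split="train")

training_args = SFTConfig(
    output_dir="falcon-mamba-7b-sft",   # hypothetical output directory
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=1,
)

trainer = SFTTrainer(
    model="tiiuae/falcon-mamba-7b",     # model id on the HF Hub
    train_dataset=dataset,
    args=training_args,
)
trainer.train()
```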
I wrote a blog about fine-tuning falcon-mamba-7b with a 32k context on one H100 GPU, I hope it helps! https://wdlctc.github.io/mstmamba.html
Can I train using just the LongLoRA code, or have you made any modifications to this code?
If you want to train from scratch, you may need to initialize the model weights without using a pre-trained model.
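For example, a minimal sketch of building the architecture with randomly initialized weights from its config instead of loading the pre-trained checkpoint (standard transformers calls, not code from this thread):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Load only the architecture configuration, then build the model
# with randomly initialized weights (no pre-trained checkpoint).
config = AutoConfig.from_pretrained("tiiuae/falcon-mamba-7b")
model = AutoModelForCausalLM.from_config(config)
```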
We made modifications to the Hugging Face code so it supports a 2x longer context length with the mini-sequence technology.
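The gist of the mini-sequence idea is to partition the sequence so that large intermediate tensors (for example, the LM-head logits over the full vocabulary) are only materialized one chunk at a time. The sketch below is purely illustrative and is not the actual mini-sequence implementation (the real thing also has to handle the backward pass to realize the memory savings); `chunked_lm_loss` and its arguments are hypothetical names:

```python
import torch
import torch.nn.functional as F

def chunked_lm_loss(hidden_states, lm_head, labels, num_chunks=4):
    """Illustrative only: project to the vocabulary and compute cross-entropy
    one mini-sequence at a time, so the full [seq_len, vocab_size] logits
    tensor is never built in a single call."""
    # Shift so each position predicts the next token (causal LM objective).
    hidden = hidden_states[:, :-1, :]
    targets = labels[:, 1:]
    total_loss = torch.zeros((), device=hidden.device, dtype=torch.float32)
    total_tokens = 0
    for h, t in zip(hidden.chunk(num_chunks, dim=1),
                    targets.chunk(num_chunks, dim=1)):
        logits = lm_head(h)                          # [batch, mini_seq, vocab]
        total_loss = total_loss + F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            t.reshape(-1),
            reduction="sum",
            ignore_index=-100,
        )
        total_tokens += (t != -100).sum().item()
    return total_loss / max(total_tokens, 1)
```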
Note that the training had some issues, which should be fixed by https://github.com/huggingface/transformers/pull/33195: the kernels did not consider the layer norms on the B, DT and C states.
The fix is now merged into the transformers main branch; make sure to re-install transformers from the main branch until the next release.
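For reference, installing from the main branch can be done with the standard pip command:

```
pip install git+https://github.com/huggingface/transformers.git
```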