BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models
Abstract
This work presents BAdam, an optimizer that leverages the block coordinate optimization framework with Adam as the inner solver. BAdam offers a memory-efficient approach to the full parameter finetuning of large language models and reduces the running time of the backward process thanks to the chain rule property. Experimentally, we apply BAdam to instruction-tune the Llama 2-7B model on the Alpaca-GPT4 dataset using a single RTX3090-24GB GPU. The results indicate that BAdam exhibits superior convergence behavior in comparison to LoRA and LOMO. Furthermore, our downstream performance evaluation of the instruction-tuned models using MT-bench shows that BAdam modestly surpasses LoRA and more substantially outperforms LOMO. Finally, we compare BAdam with Adam on a medium-sized task, i.e., finetuning RoBERTa-large on the SuperGLUE benchmark. The results demonstrate that BAdam is capable of narrowing the performance gap with Adam. Our code is available at https://github.com/Ledzy/BAdam.
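To make the idea of block coordinate optimization with an Adam inner solver concrete, here is a minimal toy sketch in plain Python. It is not the BAdam implementation from the repository above. The function name block_adam, the cyclic block order, the number of inner steps, the hyperparameter values, and the choice to reset the Adam moments on every block switch are all assumptions made for illustration. The point it shows is that Adam's moment buffers exist only for the block currently being updated.

```python
import math

def block_adam(grad_fn, x, blocks, n_epochs=50, inner_steps=10,
               lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Toy cyclic block coordinate descent with Adam-style inner steps.

    grad_fn(x) returns the full gradient as a list (hypothetical interface).
    Only the entries of the active block are used; a practical implementation
    would avoid computing gradients it does not need.
    """
    x = list(x)
    for _ in range(n_epochs):
        for block in blocks:
            # Assumption: moments are allocated per block visit and then
            # discarded, so optimizer state is held for one block at a time.
            m = [0.0] * len(block)
            v = [0.0] * len(block)
            for t in range(1, inner_steps + 1):
                g = grad_fn(x)
                for j, i in enumerate(block):
                    m[j] = beta1 * m[j] + (1 - beta1) * g[i]
                    v[j] = beta2 * v[j] + (1 - beta2) * g[i] ** 2
                    m_hat = m[j] / (1 - beta1 ** t)   # bias correction
                    v_hat = v[j] / (1 - beta2 ** t)
                    x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective: f(x) = 0.5 * sum((x_i - target_i)^2), gradient x - target.
target = [1.0, -2.0, 3.0, 0.5]

def toy_grad(x):
    return [xi - ti for xi, ti in zip(x, target)]

# Two blocks of two coordinates each, visited in a fixed cyclic order.
solution = block_adam(toy_grad, [0.0, 0.0, 0.0, 0.0], blocks=[[0, 1], [2, 3]])
```

In this sketch the optimizer state scales with the size of one block rather than with the full parameter vector, which is the intuition behind the memory savings described in the abstract. How the actual method partitions a model into blocks and schedules them is not specified here.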