michaelbeale-il committed
Commit 614b58f · verified · 1 Parent(s): 9573d0c

Update README.md

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -7,8 +7,7 @@ license: apache-2.0
 The super-adapter-network fine-tuned on LLaMA-7B with some commonsense reasoning datasets using LoNAS.
 
 ## Paper Abstract
-Recently, several approaches successfully demonstrated that weight-sharing Neural Architecture Search (NAS) can effectively explore a search space of elastic low-rank adapters (LoRA), allowing the parameter-efficient fine-tuning (PEFT) and compression of large language models. In this paper, we introduce a novel approach called Shears, demonstrating how the integration of cost-effective sparsity and a proposed Neural Low-rank adapter Search (NLS) algorithm can further improve the efficiency of PEFT approaches. Results demonstrate the benefits of Shears compared to other methods, reaching high sparsity levels while improving or with little drop in accuracy, utilizing a single GPU for a pair of hours.
-
+Large Language Models (LLMs) continue to grow, reaching hundreds of billions of parameters and making it challenging for Deep Learning practitioners with resource-constrained systems to use them, e.g., fine-tuning these models for a downstream task of their interest. Adapters, such as low-rank adapters (LoRA), have been proposed to reduce the number of trainable parameters in a model, reducing memory requirements and enabling smaller systems to fine-tune these models. Orthogonal to this work, Neural Architecture Search (NAS) has been used to discover compressed and more efficient architectures without sacrificing performance compared to similar base models. This paper introduces a novel approach, LoNAS, to use NAS on language models by exploring a search space of elastic low-rank adapters while reducing memory and compute requirements of full-scale NAS, resulting in high-performing compressed models obtained from weight-sharing super-networks. Compared to models fine-tuned with LoRA, these models contain fewer total parameters, reducing the inference time with only minor decreases in accuracy and, in some cases, even improving accuracy. We discuss the limitations of LoNAS and share observations for the research community regarding its generalization capabilities, which have motivated our follow-up work.
 ## Model Details
 
 ### Information
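
The abstract added in this commit describes exploring a search space of "elastic" low-rank adapters through a weight-sharing super-network. As a rough illustration of that idea only (not the LoNAS implementation; `ElasticLoRALinear`, `sample_rank`, and all hyperparameters are assumptions), a minimal PyTorch sketch of a LoRA layer whose active rank can be sub-sampled might look like this:

```python
# Illustrative sketch of an "elastic" LoRA layer: the adapter is trained at a
# maximum rank, and smaller ranks can be activated by slicing the factors,
# which is the basic mechanism a weight-sharing super-network search relies on.
# NOTE: hypothetical example, not the LoNAS code; all names are assumptions.
import torch
import torch.nn as nn


class ElasticLoRALinear(nn.Module):
    def __init__(self, in_features, out_features, max_rank=32, alpha=16.0):
        super().__init__()
        # Frozen pretrained projection, as in standard LoRA fine-tuning.
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False
        # Low-rank factors allocated at the largest rank in the search space.
        self.lora_a = nn.Parameter(torch.randn(max_rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, max_rank))
        self.scaling = alpha / max_rank
        self.max_rank = max_rank
        self.active_rank = max_rank

    def sample_rank(self, rank: int) -> None:
        # A sub-network of the super-network uses only the first `rank` components.
        self.active_rank = min(rank, self.max_rank)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        r = self.active_rank
        delta = (x @ self.lora_a[:r].T) @ self.lora_b[:, :r].T
        return self.base(x) + self.scaling * delta


# Usage: activate different elastic configurations while searching.
layer = ElasticLoRALinear(4096, 4096, max_rank=32)
for rank in (32, 16, 8):
    layer.sample_rank(rank)
    _ = layer(torch.randn(2, 4096))
```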