- ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
  Paper • 2403.03853 • Published • 62
- SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
  Paper • 2301.00774 • Published • 3
- The LLM Surgeon
  Paper • 2312.17244 • Published • 9
- SliceGPT: Compress Large Language Models by Deleting Rows and Columns
  Paper • 2401.15024 • Published • 68
Collections including paper arxiv:2312.17244

- The LLM Surgeon
  Paper • 2312.17244 • Published • 9
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 64
- Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models
  Paper • 2401.06102 • Published • 19
- Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing
  Paper • 2407.08770 • Published • 19

- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
  Paper • 2312.15166 • Published • 56
- PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
  Paper • 2312.12456 • Published • 41
- Cached Transformers: Improving Transformers with Differentiable Memory Cache
  Paper • 2312.12742 • Published • 12
- Mini-GPTs: Efficient Large Language Models through Contextual Pruning
  Paper • 2312.12682 • Published • 8

- Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
  Paper • 2312.09390 • Published • 32
- OneLLM: One Framework to Align All Modalities with Language
  Paper • 2312.03700 • Published • 20
- Generative Multimodal Models are In-Context Learners
  Paper • 2312.13286 • Published • 34
- The LLM Surgeon
  Paper • 2312.17244 • Published • 9

- Linear Self-Attention Approximation via Trainable Feedforward Kernel
  Paper • 2211.04076 • Published • 1
- Greenformer: Factorization Toolkit for Efficient Deep Neural Networks
  Paper • 2109.06762 • Published • 1
- COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
  Paper • 2305.17235 • Published • 2
- Exploring Low Rank Training of Deep Neural Networks
  Paper • 2209.13569 • Published • 1

- Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
  Paper • 2310.17157 • Published • 11
- Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
  Paper • 2305.15805 • Published • 1
- Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt
  Paper • 2305.11186 • Published • 1
- Composable Sparse Fine-Tuning for Cross-Lingual Transfer
  Paper • 2110.07560 • Published • 1