arxiv:2312.04916

EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism

Published on Dec 8, 2023

Submitted by akhaliq on Dec 11, 2023


Authors: Yanxi Chen, Xuchen Pan, Jingren Zhou

Abstract

We present EE-LLM, a framework for large-scale training and inference of early-exit large language models (LLMs). While recent works have shown preliminary evidence for the efficacy of early exiting in accelerating LLM inference, EE-LLM makes a foundational step towards scaling up early-exit LLMs by supporting their training and inference with massive 3D parallelism. Built upon Megatron-LM, EE-LLM implements a variety of algorithmic innovations and performance optimizations tailored to early exiting, including a lightweight method that facilitates backpropagation for the early-exit training objective with pipeline parallelism, techniques of leveraging idle resources in the original pipeline schedule for computation related to early-exit layers, and two approaches of early-exit inference that are compatible with KV caching for autoregressive generation. Our analytical and empirical study shows that EE-LLM achieves great training efficiency with negligible computational overhead compared to standard LLM training, as well as outstanding inference speedup without compromising output quality. To facilitate further research and adoption, we release EE-LLM at https://github.com/pan-x-c/EE-LLM.
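To make the idea of early exiting concrete, here is a minimal sketch in plain Python. It is not taken from the EE-LLM code base. It assumes a confidence-threshold exit rule (a common choice, not confirmed by the abstract) and a training objective formed as a weighted sum of per-exit losses (also an assumption). The names `early_exit_predict`, `weighted_exit_loss`, `exit_logits`, and `threshold` are hypothetical, and KV caching and parallelism are not modeled.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(exit_logits, threshold=0.9):
    """Pick a token at the first exit that is confident enough.

    `exit_logits` holds one logit vector per exit, ordered from the
    shallowest exit to the final output layer (hypothetical layout).
    Returns (token_id, exit_index). The final exit is always accepted.
    """
    if not exit_logits:
        raise ValueError("need at least one exit")
    last = len(exit_logits) - 1
    for i, logits in enumerate(exit_logits):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        # Assumed exit rule: stop once the top probability reaches the threshold.
        if probs[best] >= threshold or i == last:
            return best, i


def weighted_exit_loss(exit_losses, weights):
    """Assumed training objective: weighted sum of the per-exit losses."""
    if len(exit_losses) != len(weights):
        raise ValueError("one weight per exit is required")
    return sum(w * l for w, l in zip(weights, exit_losses))


if __name__ == "__main__":
    logits = [[0.1, 0.2, 0.3], [0.0, 5.0, 0.0], [9.0, 0.0, 0.0]]
    print(early_exit_predict(logits, threshold=0.9))
    print(weighted_exit_loss([2.0, 1.0], [0.5, 1.0]))
```

In this toy setup, a lower `threshold` lets generation stop at shallower exits (less compute per token), while a higher one pushes more tokens through to the final layer.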


Community

puffy310

Dec 11, 2023

GitHub repo is empty lol

yanxi-chen

Paper author Dec 12, 2023

Hi @puffy310, the code has been uploaded to GitHub (the upload was delayed due to a technical issue).

puffy310

Dec 12, 2023

Glad to hear! Excited to look at it.

puffy310

Dec 12, 2023

Are there any plans to release pretrained models?

yanxi-chen

Paper author Dec 13, 2023

@puffy310 It is likely, though not guaranteed, that we will release our pre-trained models later on when they are ready. We will keep you posted on any update :)

yanxi-chen

Paper author Feb 2, 2024

Hi @puffy310, we have released the pre-trained models. You can find them at https://github.com/pan-x-c/EE-LLM?tab=readme-ov-file#checkpoints

puffy310

Feb 7, 2024

Never thought I'd see the day. Thank you for releasing it, and thank you for notifying me! Have a great day.
