SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification
Paper: arXiv 2305.09781
Note: the README at https://github.com/NVIDIA/FasterTransformer states: "FasterTransformer development has transitioned to TensorRT-LLM. All developers are encouraged to leverage TensorRT-LLM to get the latest improvements on LLM Inference. The NVIDIA/FasterTransformer repo will stay up, but will not have further development."