Mastering Long Contexts in LLMs with KVPress
We haven't measured FLOPs, but we have a few plots that measure total generation time in this notebook: https://github.com/NVIDIA/kvpress/blob/main/notebooks/speed_and_memory.ipynb
For most presses, the compression computation is very light compared to the forward pass over the long context itself.
Happy you like it, @julien-c! Feel free to share it on social media to raise awareness around the package 🤗