How to use sequential prefill with transformers?

#7
by Juodumas

As described in the blog post https://huggingface.co/blog/falconmamba, sequential prefill enables long-context inference. Is there any example code showing how to use it?
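
I couldn't find a dedicated flag for this in transformers, so my best guess is the sketch below: feed the prompt one token at a time, carrying the fixed-size recurrent state forward, so activation memory stays flat regardless of prompt length. It assumes the Mamba-style `cache_params`/`cache_position` keyword arguments from recent transformers releases (FalconMamba support landed in v4.44); the exact cache kwargs have changed between versions, and the prompt, dtype, and greedy decoding loop are just illustrative choices. Is this roughly what the blog means?

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-mamba-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "..."  # your long multi-document prompt here
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

with torch.no_grad():
    # Sequential prefill: feed the prompt one token at a time, carrying only
    # the fixed-size recurrent state (cache_params) forward. Peak activation
    # memory stays constant instead of growing with prompt length.
    cache_params = None
    for pos in range(input_ids.shape[1]):
        out = model(
            input_ids[:, pos : pos + 1],
            cache_params=cache_params,
            cache_position=torch.tensor([pos], device=model.device),
            use_cache=True,
        )
        cache_params = out.cache_params

    # Greedy decoding, continuing the same token-by-token loop from the
    # prefilled state. (Illustrative; swap in sampling if you prefer.)
    next_id = out.logits[:, -1:].argmax(dim=-1)
    generated = [next_id]
    for _ in range(200):
        pos += 1
        out = model(
            next_id,
            cache_params=cache_params,
            cache_position=torch.tensor([pos], device=model.device),
            use_cache=True,
        )
        cache_params = out.cache_params
        next_id = out.logits[:, -1:].argmax(dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
        generated.append(next_id)

print(tokenizer.decode(torch.cat(generated, dim=-1)[0], skip_special_tokens=True))
```

Even if this is right, I'd expect it to be much slower than the default parallel prefill, since the prompt is processed one token at a time; the blog seems to present it as a memory-for-speed trade-off.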

Ooooh, the question was asked 17 days ago.
I tried running falcon-mamba-7b-instruct on a 24 GB RunPod instance with a large context (40 documents) and no special parameters, and I got a GPU out-of-memory error.
Another ticket has been opened: https://huggingface.co/tiiuae/falcon-mamba-7b-instruct/discussions/5
I hope someone can help us. :)
