Caching doesn't work on multi-GPU
#23
by srinivasbilla - opened
I get gibberish output if caching is enabled when running inference across multiple GPUs.
@eastwind, so you do not get gibberish every time?
Would you kindly post some non-gibberish examples?
What did you do to go from gibberish to English?
@eastwind I now found your contribution here to answer the last question. Thanks!
https://huggingface.co/tiiuae/falcon-40b-instruct/discussions/20
Yeah, not using the cache hurts performance a lot.
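For reference, the workaround discussed here can be sketched with `transformers` by passing `use_cache=False` to `generate()`. This is a minimal illustration, not a confirmed fix: the prompt and generation parameters are placeholders, and `device_map="auto"` is assumed as the way the model is sharded across GPUs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/falcon-40b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",       # shard the model across the available GPUs
    trust_remote_code=True,  # Falcon shipped custom modeling code at the time
)

inputs = tokenizer("What is the capital of France?", return_tensors="pt")
inputs = inputs.to(model.device)

# use_cache=False disables the KV cache entirely. Generation becomes much
# slower (every step recomputes attention over the full sequence), but it
# was reported to avoid the gibberish output in multi-GPU setups.
output = model.generate(**inputs, max_new_tokens=50, use_cache=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The speed cost comes from recomputing keys and values for all previous tokens at every decoding step, which is exactly what the KV cache normally avoids.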
We recommend using Text Generation Inference for fast inference with Falcon. See this blog for more information.