Controlling generation length

#9 · opened by gsarti (BigScience Workshop org)

I'm trying to use Bloom for generation with the Inference API, but unless the input is very short I only get 1-2 extra tokens. I checked the detailed parameters in the docs, but there doesn't seem to be a way to control output length for text-generation models (as opposed to seq2seq). Moreover, even when I pass parameters in the API request, I get the message "Parameters are not accepted for this specific model".

Is there a way to enable longer generations with Bloom?
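For reference, here is a minimal sketch of the kind of request I'm attempting, assuming the standard `max_new_tokens` parameter from the detailed parameters docs (the token and values below are placeholders):

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/bigscience/bloom"
# Placeholder token: replace with your own Hugging Face access token.
headers = {"Authorization": "Bearer hf_xxx"}

payload = {
    "inputs": "The BigScience workshop is",
    "parameters": {
        "max_new_tokens": 100,  # requested number of generated tokens
        "do_sample": True,
        "temperature": 0.7,
    },
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())
```

This is the request shape that the docs describe for text-generation models, but for this model it returns the "Parameters are not accepted" error above.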

I’m frustrated with this as well. Text generation is useless.

BigScience Workshop org

Hi, sorry for the very long delay. Currently we hard-limit the number of tokens one can request, to prevent users from asking for too many tokens and crashing the entire system. Though I'm quite surprised it only generates 1-2 extra tokens ... it should be a lot more.
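If you need longer outputs than the hosted API allows, one workaround (a sketch, not an official recommendation; it assumes the smaller bloom-560m checkpoint fits your hardware) is to generate locally with transformers, where the generation length is entirely under your control:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Smaller BLOOM checkpoint assumed here so the example runs on modest hardware.
checkpoint = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("The BigScience workshop is", return_tensors="pt")
# No server-side cap applies here: max_new_tokens is whatever you set.
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```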

I'm closing this discussion since it's a few months old and quite a few things have changed since then. Please feel free to re-open if you still see this issue.

TimeRobber changed discussion status to closed
