Batch inference working? #17
by joshlevy89
When I do model.generate on a single sample it works fine. However, getting it to work on a batch of samples of different lengths has been a challenge because this model has no pad token, and my attempts to modify the embedding layer to add one (e.g. model.resize_token_embeddings(len(tokenizer))) have failed with 'LlamaGPTQForCausalLM' object has no attribute 'resize_token_embeddings'.
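For reference, here is roughly what I'm trying (a minimal sketch; the model name is a placeholder and the prompts are made up):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name = "some-llama-gptq-model"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoGPTQForCausalLM.from_quantized(model_name, device="cuda:0")

# Single sample works fine:
inputs = tokenizer("Hello, world", return_tensors="pt").to("cuda:0")
out = model.generate(**inputs, max_new_tokens=32)

# Batching fails: LLaMA has no pad token, and trying to add one
# and resize the embeddings raises
#   AttributeError: 'LlamaGPTQForCausalLM' object has no attribute 'resize_token_embeddings'
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))  # <-- this is the call that fails

batch = tokenizer(
    ["a short prompt", "a much longer prompt that needs padding"],
    return_tensors="pt",
    padding=True,
).to("cuda:0")
out = model.generate(**batch, max_new_tokens=32)
```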
Has anyone gotten batch inference to work with this model, and if so how?