Batch inference working? #17
by joshlevy89
When I do model.generate on a single sample it works fine. However, getting it to work on a batch of samples of different lengths has been a challenge because this model has no pad token, and my attempts to modify the embedding layer to add one (e.g. model.resize_token_embeddings(len(tokenizer))) have failed with 'LlamaGPTQForCausalLM' object has no attribute 'resize_token_embeddings'.
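For reference, here is roughly what I'm trying (a minimal sketch; the model name is a placeholder and the prompts are made up):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name = "some-llama-gptq-model"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoGPTQForCausalLM.from_quantized(model_name, device="cuda:0")

# Single sample works fine:
inputs = tokenizer("Hello, world", return_tensors="pt").to("cuda:0")
out = model.generate(**inputs, max_new_tokens=32)

# Batching fails: LLaMA has no pad token, and trying to add one
# and resize the embeddings raises
#   AttributeError: 'LlamaGPTQForCausalLM' object has no attribute 'resize_token_embeddings'
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))  # <-- this is the call that fails

batch = tokenizer(
    ["a short prompt", "a much longer prompt that needs padding"],
    return_tensors="pt",
    padding=True,
).to("cuda:0")
out = model.generate(**batch, max_new_tokens=32)
```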
Has anyone gotten batch inference to work with this model, and if so how?