Please expand the example usage with more hyperparameters and an explanation of them

#12
by MonsterMMORPG - opened

Please expand the example usage with more hyperparameters and an explanation of them.

Currently your example does not provide any hyperparameters :/

Also, can we see the progress somehow, like 10% completed, 15% completed, etc.?

hi @MonsterMMORPG, thanks for your interest in the model! Some hyperparameters are used in the Colab example on the model card, but it doesn't allow for easy parameter changing. I'd recommend looking at and adapting code from the summarize.py script in the HF Spaces demo for this model. If you run inference in shorter "token batches", like 4096 tokens at a time, you will see a progress bar (otherwise not, since the whole input goes through as one massive batch).
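Roughly, the token-batch idea looks something like the sketch below; the 4096-token window, model name, and generation settings are placeholders, not the exact summarize.py logic:

```python
# Sketch: summarize a long document in ~4096-token windows so tqdm can show progress.
# Window size, model name, and generation settings are assumptions; adapt to your setup.
import torch
from tqdm.auto import tqdm
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "pszemraj/long-t5-tglobal-base-16384-book-summary"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")

def summarize_long(text, window=4096, max_new_tokens=512):
    ids = tokenizer(text, truncation=False, return_tensors="pt").input_ids[0]
    chunks = [ids[i : i + window] for i in range(0, len(ids), window)]
    summaries = []
    for chunk in tqdm(chunks, desc="summarizing"):  # progress bar over the token batches
        out = model.generate(
            chunk.unsqueeze(0).to(model.device),
            max_new_tokens=max_new_tokens,
            num_beams=4,
        )
        summaries.append(tokenizer.decode(out[0], skip_special_tokens=True))
    return "\n".join(summaries)
```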

Unfortunately, despite the methods implemented for the "long" summarization models to make handling long sequences possible, it can still be quite memory intensive. I wrote up some comments on inference hyperparameters here for pszemraj/led-base-book-summary; I'd say it's the same for this model, except that encoder_no_repeat_ngram_size is perhaps slightly less critical than for LED base.

Additionally, I've only trained on Ampere-series GPUs (not run inference on them). Your mileage may vary, but since you have a 30XX GPU, try enabling the tf32 data type and see if that helps.
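For reference, enabling tf32 in PyTorch is just the standard backend flags, e.g.:

```python
import torch

# Allow TF32 matmuls and cuDNN convolutions on Ampere (RTX 30XX / A100) GPUs for faster inference
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```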

@pszemraj ty very much for the answers. I have tried many models here by searching for 16384, but none of them produces good results when compared to classical models such as facebook/bart-large-cnn. I use the following methodology. Actually, I would prefer to split the input into equal 1024-token chunks, but I don't know how to do that.

(screenshot of the chunked summarization code used; image not available)
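A rough sketch of that kind of chunked bart-large-cnn pipeline (the chunk size, input file, and generation settings are assumptions, not the code from the screenshot):

```python
# Rough sketch: split a long text into ~1024-token chunks and summarize each with bart-large-cnn.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")

text = open("long_speech.txt", encoding="utf-8").read()  # placeholder input file

# tokenize without special tokens, slice into fixed-size windows, re-add specials per chunk
ids = tokenizer(text, add_special_tokens=False).input_ids
window = 1022  # leave room for BART's <s> and </s> (max input is 1024 tokens)
chunks = [ids[i : i + window] for i in range(0, len(ids), window)]

summaries = []
for chunk in chunks:
    input_ids = torch.tensor(
        [tokenizer.build_inputs_with_special_tokens(chunk)], device=model.device
    )
    out = model.generate(input_ids, num_beams=4, max_length=142, min_length=56)
    summaries.append(tokenizer.decode(out[0], skip_special_tokens=True))

print("\n\n".join(summaries))
```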

However, when I use a 16k model, the results are not very good compared to traditional models like the one above.

So which hyperparameters should I use?

For example, for long-t5-tglobal-base-16384-book-summary, are the ones below good enough?

what does repetition_penalty do?
what is the effect of early_stopping?
no_repeat_ngram_size?
encoder_no_repeat_ngram_size?
what is the effect of num_beams?

any other parameters that can affect performance?

(screenshot of the hyperparameter values tried; image not available)
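For reference, a sketch of how those parameters are typically passed to generate(), with a short note on what each one does (the values are illustrative, not recommendations):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "pszemraj/long-t5-tglobal-base-16384-book-summary"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

inputs = tokenizer(
    "... long document text here ...", return_tensors="pt", truncation=True, max_length=16384
)

summary_ids = model.generate(
    inputs.input_ids,
    num_beams=4,                     # beam-search width: more beams explore more candidates, but cost more time/memory
    early_stopping=True,             # stop the beam search once num_beams finished candidates exist
    repetition_penalty=3.5,          # values > 1.0 penalize tokens that already appeared, discouraging repetition
    no_repeat_ngram_size=3,          # never repeat the same 3-gram within the generated summary
    encoder_no_repeat_ngram_size=4,  # forbid copying any 4-gram from the source text verbatim into the summary
    length_penalty=0.8,              # < 1.0 nudges beam search toward shorter outputs, > 1.0 toward longer ones
    min_length=32,                   # lower bound on generated length (in tokens)
    max_length=512,                  # upper bound on generated length (in tokens)
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```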

@pszemraj Here is a comparison between facebook/bart-large-cnn and pszemraj/long-t5-tglobal-base-16384-book-summary on long speech text data.

Also, tf32 gave a huge speed boost, ty for the tips.

facebook/bart-large-cnn : https://justpaste.it/6jhhd

pszemraj/long-t5-tglobal-base-16384-book-summary : https://justpaste.it/8wa8v

long speech text data : https://justpaste.it/3a1pn

thanks for the comparisons! I'm going to change the status to "closed" as I think we've reached somewhat of a convergence point. feel free to reopen the issue if needed, or we can just comment on this thread / discuss on Discord.

for anyone coming across this later, I added a link to this HF blog post, which is a solid starting point for understanding the beam search parameters.

pszemraj changed discussion status to closed
