Is sliding window used or not?

by J22 - opened

The sliding window is set to 2047 in config.json, which looks like a strange value. Maybe a typo?

Microsoft org

It is 2047 because of the flash-attn windowing convention. Since the window is inclusive and goes from 0 to 2047, it covers 2048 tokens.
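A quick way to check the arithmetic is the small sketch below. It assumes flash-attn's convention that a left window size of `w` lets the query at position `i` attend to keys `i - w` through `i` inclusive, so the query's own position is part of the window. The helper names `attended_keys` and `sliding_window_mask` are hypothetical illustrations, not the model's actual attention code.

```python
# Hypothetical helpers illustrating an inclusive, causal sliding window
# (assumed flash-attn convention: query i sees keys in [i - left, i]).

def attended_keys(i: int, left: int) -> range:
    """Key positions visible to the query at position i."""
    start = max(0, i - left)    # clip at the start of the sequence
    return range(start, i + 1)  # +1 because the query's own position is included


def sliding_window_mask(seq_len: int, left: int) -> list[list[int]]:
    """mask[i][j] == 1 if query i may attend to key j, else 0."""
    return [
        [1 if i - left <= j <= i else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


SLIDING_WINDOW = 2047  # value from config.json

keys = attended_keys(i=5000, left=SLIDING_WINDOW)
print(keys.start, keys.stop - 1, len(keys))  # 2953 5000 2048

# Tiny example: with left=2, every full row has left + 1 = 3 visible keys.
for row in sliding_window_mask(5, left=2):
    print(row)
```

With `left=2047`, a query far enough into the sequence sees 2047 earlier tokens plus itself, i.e. 2048 tokens, which matches the explanation above.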

gugarosa changed discussion status to closed
