Why Rotary Positional Embeddings Over Alibi?

#48
by mallorbc - opened

I looked at the modeling code, and it seems that it supports ALiBi positional encoding as well as rotary. See the ALiBi paper: https://arxiv.org/pdf/2108.12409.pdf
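For context, the core idea in the linked paper is simple: instead of adding positional embeddings to the inputs, ALiBi adds a static, head-specific linear penalty to the attention scores based on query-key distance. A minimal illustrative sketch of that bias (not this repo's exact modeling code; the helper name and power-of-two head assumption are mine):

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Per-head slopes: geometric sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8),
    # as described in the ALiBi paper (assumes num_heads is a power of two).
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])

    # Relative offset j - i between key position j and query position i.
    positions = torch.arange(seq_len)
    distance = positions[None, :] - positions[:, None]  # (seq_len, seq_len)
    distance = distance.clamp(max=0)                    # only past positions matter (causal)

    # Bias is -slope * (i - j); shape (num_heads, seq_len, seq_len).
    return slopes[:, None, None] * distance

# Usage: with attention scores of shape (num_heads, seq_len, seq_len),
# the bias is simply added before the softmax:
# attn = torch.softmax(scores + alibi_bias(num_heads, seq_len), dim=-1)
```

Because the bias depends only on relative distance and has no learned table, nothing about it is tied to the training sequence length, which is what enables longer-context inference.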

ALiBi allows one to train a model on shorter sequences and run inference on longer ones. MPT-7B is another great model that uses ALiBi. With ALiBi, you can easily raise the max context window from 2048 to 4096 with a few simple steps (see the sketch after the link below).
https://huggingface.co/mosaicml/mpt-7b
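For MPT-7B the change looks roughly like this, following its model card; the `max_seq_len` field comes from MPT's custom configuration, so treat this as a sketch rather than something guaranteed to apply to this repo's code:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Load MPT-7B's config and bump the max sequence length beyond the
# 2048 tokens it was trained on; ALiBi handles the longer positions.
config = AutoConfig.from_pretrained("mosaicml/mpt-7b", trust_remote_code=True)
config.max_seq_len = 4096

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    config=config,
    trust_remote_code=True,
)
```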

Perhaps this will be discussed in the forthcoming paper, but if the results in the ALiBi paper hold, it is the superior positional encoding and should be used.
