What's the difference between zephyr-7b-beta and zephyr-7b-alpha?

#36
by haha-point - opened

Zephyr-7b-beta seems to perform better than zephyr-7b-alpha, yet both appear to have been trained from the same base model on the same datasets. So what is the difference?

At least one difference can be found in the model cards: beta was trained for 3 epochs, whereas alpha was trained for only one.
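If you want to check for other differences yourself, here is a minimal sketch that diffs the hyperparameter bullets of two model cards. The two excerpt strings are hypothetical stand-ins that contain only the epoch counts mentioned above, and the key name `num_epochs` and the `- key: value` bullet format are assumptions; paste in the real hyperparameter sections from each card to compare everything.

```python
import re

# Hypothetical excerpts: only the epoch counts come from this thread.
# The key name "num_epochs" and the bullet format are assumptions.
ALPHA_EXCERPT = "- num_epochs: 1\n"
BETA_EXCERPT = "- num_epochs: 3\n"


def parse_hyperparameters(card_text):
    """Collect '- key: value' bullet lines from card text into a dict of strings."""
    pattern = re.compile(r"^\s*-\s*([A-Za-z_]+):\s*(.+?)\s*$")
    params = {}
    for line in card_text.splitlines():
        match = pattern.match(line)
        if match:
            params[match.group(1)] = match.group(2)
    return params


def diff_hyperparameters(first, second):
    """Return {key: (first_value, second_value)} for every key whose values differ."""
    keys = sorted(set(first) | set(second))
    return {
        key: (first.get(key), second.get(key))
        for key in keys
        if first.get(key) != second.get(key)
    }


alpha_params = parse_hyperparameters(ALPHA_EXCERPT)
beta_params = parse_hyperparameters(BETA_EXCERPT)
differences = diff_hyperparameters(alpha_params, beta_params)

for key, (alpha_value, beta_value) in differences.items():
    print(f"{key}: alpha={alpha_value}, beta={beta_value}")
```

Values are compared as plain strings, so `3` and `3.0` would be reported as different; convert to numbers first if that matters for your comparison.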
