What's the difference between zephyr-7b-beta and zephyr-7b-alpha?
#36, opened by haha-point
Zephyr-7b-beta seems to perform better than zephyr-7b-alpha, yet both appear to have been trained from the same base model on the same datasets. So what is the difference?
At least one difference can be found in the model cards: beta was trained for 3 epochs, whereas alpha was trained for only one.
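If you want to check for other differences yourself, a rough way is to pull the "- name: value" hyperparameter bullets out of each model card and diff them. The snippet below is only a sketch: the function names are made up for illustration, the two card strings are stand-ins rather than the real cards (only the epoch counts come from this thread), and it assumes the hyperparameters are listed as simple bullets.

```python
import re


def parse_hyperparameters(card_text: str) -> dict[str, str]:
    """Collect '- name: value' bullet lines from a model card's text."""
    pattern = re.compile(r"^\s*-\s*([A-Za-z_]+):\s*(.+?)\s*$", re.MULTILINE)
    return {name: value for name, value in pattern.findall(card_text)}


def diff_hyperparameters(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return settings whose values differ, or that appear in only one card."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Illustrative stand-ins, NOT the real model cards.
# Only the epoch counts are taken from the thread.
alpha_card = """
### Training hyperparameters
- num_epochs: 1
"""
beta_card = """
### Training hyperparameters
- num_epochs: 3
"""

differences = diff_hyperparameters(
    parse_hyperparameters(alpha_card),
    parse_hyperparameters(beta_card),
)
for name, (alpha_value, beta_value) in differences.items():
    print(f"{name}: alpha={alpha_value} beta={beta_value}")
```

To run this against the real cards, paste each card's text into the two strings, or load it with something like `ModelCard.load(...)` from `huggingface_hub` and pass its `.text`. Anything that is not written as a plain bullet in the card will not show up in the diff.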