Question about the text encoder setting
Hi,
I think there may be a problem in the way the text encoder is set up, and I'm not sure why this occurs...
In particular, the number of hidden layers in the text encoder is set to 23 (https://huggingface.co/stabilityai/stable-diffusion-2-1/blob/main/text_encoder/config.json#L19). However, in the official OpenCLIP ViT-H-14 config, the number of hidden layers is 24 (https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/model_configs/ViT-H-14.json#L15). This can also be confirmed from the number of layers in the LAION CLIP ViT-H-14 repo: https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/blob/main/config.json#L54
Does anyone know why the Hugging Face repo sets the number of hidden layers to 23? Is this a bug, or a small trick to improve sampling performance?
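For reference, here is a minimal sketch of how the layer count shows up in a `transformers` model. It builds a toy-sized `CLIPTextModel` locally instead of downloading the SD 2.1 weights (the small `hidden_size` and other dims are arbitrary; only `num_hidden_layers=23`, which mirrors the value in the SD 2.1 `text_encoder/config.json`, matters here):

```python
from transformers import CLIPTextConfig, CLIPTextModel

# Toy dims for speed; only the layer count is relevant.
# num_hidden_layers=23 mirrors stable-diffusion-2-1/text_encoder/config.json.
cfg = CLIPTextConfig(
    num_hidden_layers=23,
    hidden_size=64,
    num_attention_heads=4,
    intermediate_size=128,
)
model = CLIPTextModel(cfg)

# The encoder really is built with 23 transformer layers.
print(len(model.text_model.encoder.layers))  # 23
```

Loading the real config with `CLIPTextConfig.from_pretrained("stabilityai/stable-diffusion-2-1", subfolder="text_encoder")` should show the same `num_hidden_layers=23`.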
Thanks
Could this possibly be because the final projection layer is removed from (or simply not used in) SD? SD takes the 77x1024 per-token text embeddings as input, not the final pooled CLIP projection of dim 1024.
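To illustrate that point: in `transformers`, SD-style pipelines consume `last_hidden_state` (shape `batch x 77 x hidden`), not the pooled/projected sentence vector. A tiny stand-in model (hypothetical dims; the real SD 2.1 encoder uses `hidden_size=1024`) shows the shape being discussed:

```python
import torch
from transformers import CLIPTextConfig, CLIPTextModel

# Tiny stand-in encoder; the real one has hidden_size=1024 and 23 layers.
cfg = CLIPTextConfig(
    vocab_size=1000,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=128,
    max_position_embeddings=77,
)
model = CLIPTextModel(cfg).eval()

ids = torch.randint(0, 1000, (1, 77))  # dummy token ids, full 77-token context
with torch.no_grad():
    out = model(ids)

# SD conditions the UNet on the per-token hidden states, one vector per token,
# not on a single pooled embedding.
print(out.last_hidden_state.shape)  # torch.Size([1, 77, 64])
```

So any weights that only feed the pooled CLIP output (like the final text projection) would indeed be dead weight for SD, though that alone doesn't explain a whole missing transformer layer.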