The README recommends --dtype half, which is equivalent to float16, but the config specifies bfloat16. As a result, vLLM warns that it is casting torch.bfloat16 to torch.float16. Would it be better to pass the original dtype, i.e. --dtype bfloat16?
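
For context on why the cast might matter, here is a rough sketch in plain Python (no torch or vLLM involved) comparing the two formats. The helper names `max_finite`, `to_bfloat16`, and `fits_float16` are my own, and the bfloat16 emulation simply truncates a float32 to its top 16 bits, so it only approximates how a real cast rounds. The point is that float16 has a much smaller range than bfloat16, so values that are fine in bfloat16 could overflow after the cast.

```python
import struct


def max_finite(exp_bits: int, mantissa_bits: int) -> float:
    """Largest finite value of an IEEE-754-style binary format."""
    bias = 2 ** (exp_bits - 1) - 1
    return (2 - 2.0 ** -mantissa_bits) * 2.0 ** bias


# float16: 5 exponent bits, 10 mantissa bits
FLOAT16_MAX = max_finite(5, 10)
# bfloat16: 8 exponent bits, 7 mantissa bits (same range as float32)
BFLOAT16_MAX = max_finite(8, 7)


def to_bfloat16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of a float32.

    Assumption: truncation instead of round-to-nearest, for simplicity.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def fits_float16(x: float) -> bool:
    """True if x can be packed as an IEEE half-precision float."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False


if __name__ == "__main__":
    print(FLOAT16_MAX)            # 65504.0
    print(BFLOAT16_MAX > 3e38)    # True
    print(to_bfloat16(70000.0))   # representable in bfloat16, with less precision
    print(fits_float16(70000.0))  # False: out of range for float16
```

Whether this actually affects output quality for this model is something I have not measured; it only shows that the two dtypes are not interchangeable in general.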