Model not giving an answer

#12
by nicolaschaillan - opened

Hello,

Thanks so much for the amazing work.

I'm running the model on Azure on an NV12s_v3 (12 cores, 112 GB RAM).

Here is my code:

import torch
from transformers import pipeline

print('got here')
generate_text = pipeline(model="databricks/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")

print('got here2')
generate_text("Explain to me the difference between nuclear fission and fusion.")
print('got here3')

I get "got here" and "got here 2" but I never get "got here3", I waited 1H...

Is this normal? Is an NV12s_v3 not enough to run the model for a single query?

Thanks.

Databricks org

An M60 is too small for generation (8 GB RAM). You have loaded the model mostly onto the CPU, and it will take forever. You would at least need to try 8-bit loading here, but even that needs a GPU with at least 16 GB of RAM for the model to fit on the GPU.
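
For reference, a minimal sketch of 8-bit loading through the pipeline API (assumes bitsandbytes and accelerate are installed; passing load_in_8bit via model_kwargs is one way to forward it to from_pretrained):

import torch
from transformers import pipeline

# 8-bit quantized load; needs bitsandbytes + accelerate and a GPU with roughly 16 GB of memory
generate_text = pipeline(
    model="databricks/dolly-v2-12b",
    trust_remote_code=True,
    device_map="auto",
    model_kwargs={"load_in_8bit": True},
)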

Databricks org

Please see the repo for generation snippets: https://github.com/databrickslabs/dolly
It isn't a matter of different code; you need different hardware here.
Alternatively, much smaller models were just released. You can try the 2.7B model on an M60, and that should work.
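
For example, a rough sketch (assuming the smaller checkpoint is published as databricks/dolly-v2-3b; float16 is used here since the M60 has no bfloat16 support and only 8 GB of memory):

import torch
from transformers import pipeline

# Smaller model in float16 (roughly 5-6 GB of weights), which should fit on an 8 GB M60
generate_text = pipeline(
    model="databricks/dolly-v2-3b",
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
)
generate_text("Explain to me the difference between nuclear fission and fusion.")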

Thank you. Any recommendations for Azure VMs that could run the full model?

Databricks org

The full 12B model works on A100s. It also works on A10 GPUs if you load in 8-bit, and it sounds like it works on a T4 as well in 8-bit. It should also work on a 32 GB V100 if you load in float16, not bfloat16.
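
For the V100 case, a minimal sketch (only the dtype changes from the original snippet; float16 because V100s lack native bfloat16 support):

import torch
from transformers import pipeline

# Full 12B model in float16 (roughly 24 GB of weights), which should fit on a 32 GB V100
generate_text = pipeline(
    model="databricks/dolly-v2-12b",
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
)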

It would be great to document this and include the Python code required for each use case!

Databricks org

Yeah, we'll update everything for v2 more fully soon, including training and generation. Right now it's assumed you're working on an A100.

srowen changed discussion status to closed
