[ERROR] Chronos Inference on GPU with Torch

#6 opened by IUseAMouse

Hello,

Chronos works just fine on CPU, but I run into an error when trying to run it on GPU. When I pass a torch tensor on CUDA, I get the following error:

```
File "/srv/home/yvincent/mlhive/projects/timeseries/src/forecast.py", line 80, in forecast
forecast = self.pipeline.predict(data, self.prediction_length)
File "/srv/home/yvincent/miniconda3/envs/onnx/lib/python3.13/site-packages/chronos/chronos.py", line 507, in predict
token_ids, attention_mask, scale = self.tokenizer.context_input_transform(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
context_tensor
^^^^^^^^^^^^^^
)
^
File "/srv/home/yvincent/miniconda3/envs/onnx/lib/python3.13/site-packages/chronos/chronos.py", line 224, in context_input_transform
token_ids, attention_mask, scale = self._input_transform(context=context)
~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/srv/home/yvincent/miniconda3/envs/onnx/lib/python3.13/site-packages/chronos/chronos.py", line 189, in _input_transform
torch.bucketize(
~~~~~~~~~~~~~~~^
input=scaled_context,
^^^^^^^^^^^^^^^^^^^^^
...<3 lines>...
right=True,
^^^^^^^^^^^
)
```

How to reproduce the error:
```python
import torch
from chronos import ChronosPipeline

device = 'cuda'
pipeline = ChronosPipeline.from_pretrained(
    'amazon/chronos-t5-mini',
    device_map=device,
    torch_dtype=torch.bfloat16,
)
batch_size = 4
seq_len = 500
pred_len = 64
data = torch.rand(batch_size, seq_len).to(device)

forecast = pipeline.predict(data, pred_len)
```

This code works perfectly well if `device='cpu'`. I am trying to run this on a V100 32GB. Note that I run into this error with every chronos-t5 model (tiny, mini, small, and large).

The line that throws the error in the library is line 189 of src/chronos/chronos.py:
```python
token_ids = (
    torch.bucketize(
        input=scaled_context,
        boundaries=self.boundaries,
        # buckets are open to the right, see:
        # https://pytorch.org/docs/2.1/generated/torch.bucketize.html#torch-bucketize
        right=True,
    )
    + self.config.n_special_tokens
)
```

Is it possible that `scaled_context` and `self.boundaries` are not on the same device? Please let me know if there's a fix, or if I can help.
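For reference, the device mismatch reproduces in isolation with `torch.bucketize` (the boundary values below are made up for illustration; only the device placement matters):

```python
import torch

# Boundaries on the CPU, input on the GPU -- mirrors the suspected state
# inside the tokenizer. The values are illustrative, not the real Chronos buckets.
boundaries = torch.linspace(-15.0, 15.0, steps=1024)  # CPU tensor
scaled_context = torch.rand(4, 500).to('cuda')        # CUDA tensor

# Raises a RuntimeError about the tensors being on different devices.
token_ids = torch.bucketize(input=scaled_context, boundaries=boundaries, right=True)
```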

Amazon Web Services org

@IUseAMouse if you change

```python
data = torch.rand(batch_size, seq_len).to(device)
```

into

```python
data = torch.rand(batch_size, seq_len)
```

then it should work fine. There's no need to put the data on the right device yourself. The reason is: the tokenizer always sits on the CPU, and that's where bucketization happens. The pipeline will take care of moving the quantized data to the right device.
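Putting that together, here is the full repro from above with only that one line changed (same model, shapes, and dtype as the original snippet):

```python
import torch
from chronos import ChronosPipeline

# Model weights live on the GPU; the input stays on the CPU, where the
# tokenizer runs. The pipeline moves the tokenized context to the GPU itself.
pipeline = ChronosPipeline.from_pretrained(
    'amazon/chronos-t5-mini',
    device_map='cuda',
    torch_dtype=torch.bfloat16,
)

batch_size, seq_len, pred_len = 4, 500, 64
data = torch.rand(batch_size, seq_len)  # note: no .to('cuda')

forecast = pipeline.predict(data, pred_len)
```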
