[ERROR] Chronos Inference on GPU with Torch
Hello,
Chronos works just fine on CPU, but I run into an error when trying to run it on GPU. When I pass a torch tensor that lives on CUDA, I get the following error:
```
File "/srv/home/yvincent/mlhive/projects/timeseries/src/forecast.py", line 80, in forecast
forecast = self.pipeline.predict(data, self.prediction_length)
File "/srv/home/yvincent/miniconda3/envs/onnx/lib/python3.13/site-packages/chronos/chronos.py", line 507, in predict
token_ids, attention_mask, scale = self.tokenizer.context_input_transform(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
context_tensor
^^^^^^^^^^^^^^
)
^
File "/srv/home/yvincent/miniconda3/envs/onnx/lib/python3.13/site-packages/chronos/chronos.py", line 224, in context_input_transform
token_ids, attention_mask, scale = self._input_transform(context=context)
~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/srv/home/yvincent/miniconda3/envs/onnx/lib/python3.13/site-packages/chronos/chronos.py", line 189, in _input_transform
torch.bucketize(
~~~~~~~~~~~~~~~^
input=scaled_context,
^^^^^^^^^^^^^^^^^^^^^
...<3 lines>...
right=True,
^^^^^^^^^^^
)
```
How to reproduce the error:
```python
import torch
from chronos import ChronosPipeline

device = 'cuda'
pipeline = ChronosPipeline.from_pretrained(
    'amazon/chronos-t5-mini',
    device_map=device,
    torch_dtype=torch.bfloat16,
)

batch_size = 4
seq_len = 500
pred_len = 64

data = torch.rand(batch_size, seq_len).to(device)
forecast = pipeline.predict(data, pred_len)
```
This code works perfectly well if `device = 'cpu'`. I am trying to run this on a V100 32GB. Note that I run into this error with every chronos-t5 model (tiny, mini, small, and large).
The line that throws the error in the library is line 189 of `src/chronos/chronos.py`:
```python
token_ids = (
    torch.bucketize(
        input=scaled_context,
        boundaries=self.boundaries,
        # buckets are open to the right, see:
        # https://pytorch.org/docs/2.1/generated/torch.bucketize.html#torch-bucketize
        right=True,
    )
    + self.config.n_special_tokens
)
```
Is it possible that `scaled_context` and `self.boundaries` end up on different devices? Please let me know if there's a fix, or if I can help.
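For what it's worth, the suspected mismatch is easy to trigger in isolation. Here is a minimal sketch; the boundary values below are placeholders, not the actual Chronos quantization buckets:
```python
import torch

# Placeholder boundaries on CPU, standing in for self.boundaries.
boundaries = torch.linspace(-15.0, 15.0, steps=100)

# Context on CUDA, as in the repro above.
scaled_context = torch.rand(4, 500, device='cuda')

# Raises a RuntimeError complaining that the tensors are on
# different devices, consistent with the traceback above.
torch.bucketize(input=scaled_context, boundaries=boundaries, right=True)
```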
@IUseAMouse if you change `data = torch.rand(batch_size, seq_len).to(device)` into `data = torch.rand(batch_size, seq_len)`, then it should work fine. There is no need to put the data on the right device yourself. The reason is that the tokenizer always sits on the CPU, and that is where bucketization happens; the pipeline takes care of moving the quantized data to the right device.
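For completeness, here is what the working call looks like end to end. This is a sketch of the corrected repro: only the model is moved to the GPU, while the context stays on CPU.
```python
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained(
    'amazon/chronos-t5-mini',
    device_map='cuda',        # model weights go to the GPU
    torch_dtype=torch.bfloat16,
)

# The context stays on CPU: the tokenizer bucketizes it there, and the
# pipeline moves the resulting token ids to the model's device itself.
data = torch.rand(4, 500)              # (batch_size, seq_len)
forecast = pipeline.predict(data, 64)  # prediction_length = 64
```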