JetPack 6 + FP8 SD3 returns zeroes
I am abroad with my trusty suitcase AGX Orin, attempting to run some tests requested by a client. I am running a fairly simple adaptation of the hello-world example against the FP8 safetensors model. Everything appears to run without issue, except that the resulting image is all zeroes.
My code:
import torch
from diffusers import StableDiffusion3Pipeline
import cv2
import numpy as np

# Single-file FP8 checkpoint (text encoders included), loaded into fp16.
model_path = "/usr/src/sd3/sd3_medium_incl_clips_t5xxlfp8.safetensors"

print('loading model')
pipe = StableDiffusion3Pipeline.from_single_file(
    model_path,
    torch_dtype=torch.float16,
)
print('model loaded')
pipe = pipe.to('cuda')

image = pipe(
    "A cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=10,
    guidance_scale=7.0,
).images[0]
image.save('image.jpg')

# Display the result with OpenCV (PIL gives RGB, OpenCV expects BGR).
np_image = np.array(image)
cv_image = cv2.cvtColor(np_image, cv2.COLOR_RGB2BGR)
cv2.imshow("Generated Image", cv_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
pass  # debug hook
My environment:
Python packages:
absl-py==2.1.0
accelerate==0.32.1
AHRS==0.3.1
appdirs==1.4.4
apturl==0.5.2
astunparse==1.6.3
attrs==21.2.0
bcrypt==3.2.0
beniget==0.4.1
blinker==1.4
blobconverter==1.4.3
Brlapi==0.8.3
Brotli==1.0.9
ccsm==0.9.14.1
certifi==2020.6.20
chardet==4.0.0
charset-normalizer==3.3.2
click==8.0.3
colorama==0.4.4
coloredlogs==15.0.1
compizconfig-python==0.9.14.1
cpuset==1.6
cryptography==3.4.8
cupshelpers==1.0
cupy==13.2.0
cycler==0.11.0
dbus-python==1.2.18
decorator==4.4.2
defer==1.0.6
depthai==2.27.0.0
depthai-pipeline-graph==0.0.5
depthai-sdk==1.15.0
diffusers==0.29.2
distlib==0.3.4
distro==1.9.0
distro-info==1.1+ubuntu0.2
duplicity==0.8.21
evdev==1.4.0
fasteners==0.14.1
fastrlock==0.8.2
filelock==3.6.0
flatbuffers==24.3.25
fonttools==4.29.1
fs==2.4.12
fsspec==2024.6.0
future==0.18.2
gast==0.6.0
google-pasta==0.2.0
graphsurgeon==0.4.6
grpcio==1.64.1
h5py==3.11.0
httplib2==0.20.2
huggingface-hub==0.23.4
humanfriendly==10.0
idna==3.3
imageio==2.34.2
importlib-metadata==4.6.4
jeepney==0.7.1
jetson-stats @ file:///usr/src/jetson_stats
Jetson.GPIO==2.1.7
Jinja2==3.1.4
keras==3.4.1
keyring==23.5.0
kiwisolver==1.3.2
language-selector==0.1
launchpadlib==1.10.16
lazr.restfulclient==0.14.4
lazr.uri==1.0.6
lazy_loader==0.4
libclang==18.1.1
lockfile==0.12.2
louis==3.20.0
lxml==4.8.0
lz4==3.1.3+dfsg
macaroonbakery==1.3.1
Mako==1.1.3
Markdown==3.6
markdown-it-py==3.0.0
MarkupSafe==2.1.5
marshmallow==3.17.0
matplotlib==3.5.1
mdurl==0.1.2
ml-dtypes==0.3.2
monotonic==1.6
more-itertools==8.10.0
mpmath==0.0.0
namex==0.0.8
networkx==3.3
numpy==1.23.5
oauthlib==3.2.0
olefile==0.46
onboard==1.4.1
onnx==1.16.0
onnx-graphsurgeon==0.3.12
onnxruntime-gpu @ file:///usr/src/onnxruntime_gpu-1.18.0-cp310-cp310-linux_aarch64.whl
opencv-contrib-python==4.10.0.84
opt-einsum==3.3.0
optree==0.11.0
packaging==21.3
pandas==1.3.5
paramiko==2.9.3
pexpect==4.8.0
pillow==10.3.0
platformdirs==2.5.1
ply==3.11
protobuf==4.25.3
psutil==6.0.0
ptyprocess==0.7.0
pycairo==1.20.1
pycups==2.0.1
Pygments==2.18.0
PyGObject==3.42.1
PyJWT==2.3.0
pymacaroons==0.13.0
PyNaCl==1.5.0
PyOpenGL==3.1.5
pyparsing==2.4.7
PyQt5==5.15.6
PyQt5-sip==12.9.1
pyRFC3339==1.1
python-apt==2.4.0+ubuntu3
python-dateutil==2.8.1
python-dbusmock==0.27.5
python-debian==0.1.43+ubuntu1.1
pythran==0.10.0
pytube==15.0.0
PyTurboJPEG==1.6.4
pytz==2022.1
pyxdg==0.27
PyYAML==5.4.1
Qt.py==1.4.1
quickdl==0.0.2
regex==2024.5.15
requests==2.32.3
rich==13.7.1
safetensors==0.4.3
scikit-image==0.24.0
scipy==1.14.0
SecretStorage==3.3.1
sentencepiece==0.2.0
sentry-sdk==1.21.0
six==1.16.0
smbus2==0.4.3
sympy==1.9
systemd-python==234
tensorboard==2.16.2
tensorboard-data-server==0.7.2
tensorflow @ file:///usr/src/wheels/tensorflow-2.16.1%2Bnv24.06-cp310-cp310-linux_aarch64.whl
tensorflow-io-gcs-filesystem==0.37.0
tensorrt==8.6.2
tensorrt-dispatch==8.6.2
tensorrt-lean==8.6.2
termcolor==2.4.0
tifffile==2024.6.18
tokenizers==0.19.1
torch @ file:///usr/src/wheels/torch-2.3.0-cp310-cp310-linux_aarch64.whl
torchaudio @ file:///usr/src/wheels/torchaudio-2.3.0%2B952ea74-cp310-cp310-linux_aarch64.whl
torchvision @ file:///usr/src/wheels/torchvision-0.18.0a0%2B6043bc2-cp310-cp310-linux_aarch64.whl
tqdm==4.66.4
transformers==4.42.3
types-pyside2==5.15.2.1.7
typing_extensions==4.12.2
ubuntu-drivers-common==0.0.0
ubuntu-pro-client==8001
uff==0.6.9
ufoLib2==0.13.1
ufw==0.36.1
unicodedata2==14.0.0
urllib3==2.2.2
urwid==2.1.2
virtualenv==20.13.0+ds
wadllib==1.3.6
Werkzeug==3.0.3
wrapt==1.16.0
xdg==5
xkit==0.0.0
xmltodict==0.13.0
zipp==1.0.0
torch, tensorflow, and onnx are running from the official wheels supplied by NVIDIA for JetPack 6.
I have experimented with inference step values from 8 to 30 with no change. I suspect that there is something wonky with how diffusers loads the FP8 quant into fp16. Some guidance would be helpful.
Output: (attached image, entirely black)
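For what it's worth, the next thing I plan to check is whether the zeroes appear before or after the VAE decode, by asking the pipeline for the raw latents and testing them for NaNs. This is only a sketch, and it assumes the SD3 pipeline honours output_type="latent" the same way the other diffusers pipelines do:

import torch

# Sketch: return raw latents instead of decoded pixels, so we can see whether
# the NaNs/zeroes already exist before the VAE decode runs.
out = pipe(
    "A cat holding a sign that says hello world",
    num_inference_steps=10,
    guidance_scale=7.0,
    output_type="latent",
)
latents = out.images  # with output_type="latent" this is the latent tensor
print("any NaN: ", torch.isnan(latents).any().item())
print("all zero:", bool((latents == 0).all().item()))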
Update: I have now tried both the FP8 and the FP16 safetensors, with the same result mentioned above.
I got the same result! Full black in the image.
I am now attempting a full and clean pull from the hub to see if the problem is related to opening the model as a single file, or some other compatibility element that hasn't become clear.
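For reference, the hub-based run is essentially the stock SD3 demo; the repo id below is from memory, so treat it as an assumption:

import torch
from diffusers import StableDiffusion3Pipeline

# Multi-file diffusers layout pulled from the hub instead of the single
# safetensors file; everything else is kept the same as the demo.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

image = pipe(
    "A cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("image.jpg")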
OK, thank you for sharing. If you solve it, please let me know.
Same result: an image mat of only zeroes (this time using the exact demo code).
Output:
Fetching 26 files: 100%|██████████| 26/26 [56:36<00:00, 130.62s/it]
Loading pipeline components...:  22%|██▏       | 2/9 [00:02<00:07,  1.04s/it]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...:  33%|███▎      | 3/9 [00:02<00:04,  1.48it/s]
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:00<00:00,  2.02it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.97it/s]
Loading pipeline components...: 100%|██████████| 9/9 [00:04<00:00,  2.01it/s]
100%|██████████| 1/1 [00:01<00:00,  1.37s/it]
100%|██████████| 50/50 [00:47<00:00,  1.04it/s]
@edgechangfu Can you downgrade to diffusers==0.29.0 and try again?
Output:
/usr/src/sd3/venv/lib/python3.10/site-packages/diffusers/models/transformers/transformer_2d.py:34: FutureWarning: `Transformer2DModelOutput` is deprecated and will be removed in version 1.0.0. Importing `Transformer2DModelOutput` from `diffusers.models.transformer_2d` is deprecated and this will be removed in a future version. Please use `from diffusers.models.modeling_outputs import Transformer2DModelOutput`, instead.
  deprecate("Transformer2DModelOutput", "1.0.0", deprecation_message)
Loading pipeline components...:  56%|█████▋    | 5/9 [00:02<00:01,  2.10it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...:  67%|██████▋   | 6/9 [00:02<00:01,  2.48it/s]
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:00<00:00,  1.71it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.72it/s]
Loading pipeline components...: 100%|██████████| 9/9 [00:03<00:00,  2.26it/s]
  0%|          | 0/10 [00:00<?, ?it/s]Passing `scale` via `joint_attention_kwargs` when not using the PEFT backend is ineffective.
 10%|█         | 1/10 [00:01<00:12,  1.36s/it]Passing `scale` via `joint_attention_kwargs` when not using the PEFT backend is ineffective.
 20%|██        | 2/10 [00:02<00:09,  1.19s/it]Passing `scale` via `joint_attention_kwargs` when not using the PEFT backend is ineffective.
 30%|███       | 3/10 [00:03<00:07,  1.13s/it]Passing `scale` via `joint_attention_kwargs` when not using the PEFT backend is ineffective.
 40%|████      | 4/10 [00:04<00:06,  1.10s/it]Passing `scale` via `joint_attention_kwargs` when not using the PEFT backend is ineffective.
 50%|█████     | 5/10 [00:05<00:05,  1.08s/it]Passing `scale` via `joint_attention_kwargs` when not using the PEFT backend is ineffective.
 60%|██████    | 6/10 [00:06<00:04,  1.06s/it]Passing `scale` via `joint_attention_kwargs` when not using the PEFT backend is ineffective.
 70%|███████   | 7/10 [00:07<00:03,  1.05s/it]Passing `scale` via `joint_attention_kwargs` when not using the PEFT backend is ineffective.
 80%|████████  | 8/10 [00:08<00:02,  1.04s/it]Passing `scale` via `joint_attention_kwargs` when not using the PEFT backend is ineffective.
 90%|█████████ | 9/10 [00:09<00:01,  1.04s/it]Passing `scale` via `joint_attention_kwargs` when not using the PEFT backend is ineffective.
100%|██████████| 10/10 [00:10<00:00,  1.07s/it]
@edgechangfu, please do get back to me on this when you can. If that solution works, we need to move this over to https://github.com/huggingface/diffusers
After some testing, the results are actually weirder than I thought... I can basically "fiddle" with the settings for a horribly long time, and then suddenly it works, works well, and consistently.
Then I will unload the model, go about my business, load it back for some testing... and it no longer works; I have to repeat the whole process. Really quite strange.
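One thing that might at least make the "fiddling" reproducible is pinning the seed, so identical settings start from identical noise between loads; just a sketch, nothing SD3-specific about it:

import torch

# Fixed generator so repeated runs with the same settings start from the same
# initial noise, which makes "works" vs "doesn't work" directly comparable.
generator = torch.Generator(device="cuda").manual_seed(1234)
image = pipe(
    "A cat holding a sign that says hello world",
    num_inference_steps=10,
    guidance_scale=7.0,
    generator=generator,
).images[0]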
Hi,
@Manbehindthemadness
@edgechangfu
I am also facing the same issue.
Were you able to find a fix/workaround for this?
I am sorry, I have not; I'm still getting an output of NaN. I have had to put my project on hold for now.
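If anyone else picks this up, one workaround that may be worth trying is decoding the latents manually with the VAE upcast to float32, on the assumption that the NaNs come from the fp16 decode rather than the denoising loop. The scaling/shift handling below is my reading of the SD3 decode path, not something I have verified:

import torch

# Unverified sketch: denoise in fp16 as before, but do the VAE decode in fp32.
latents = pipe(
    "A cat holding a sign that says hello world",
    num_inference_steps=10,
    guidance_scale=7.0,
    output_type="latent",
).images

vae = pipe.vae.to(torch.float32)
latents = latents.to(torch.float32) / vae.config.scaling_factor + vae.config.shift_factor
with torch.no_grad():
    decoded = vae.decode(latents, return_dict=False)[0]
image = pipe.image_processor.postprocess(decoded, output_type="pil")[0]
image.save("image_fp32_decode.jpg")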