Ross Wightman

rwightman

AI & ML interests

Computer vision, transfer learning, semi/self supervised learning, robotics.

Organizations

Hugging Face, PyTorch Image Models, Spaces-explorers, Flax Community, LAION eV, kotol, Pixel Parsing

rwightman's activity

reacted to csabakecskemeti's post with 🤗🚀 about 18 hours ago
Testing training on AMD/ROCm for the first time!

I've got my hands on an AMD Instinct MI100. Used, it's about the same price as a V100, but on paper it has more TOPS (14 TOPS for the V100 vs 23 TOPS for the MI100), and the HBM has a faster clock, so the memory bandwidth is 1.2 TB/s.
For quantized inference it's a beast (the MI50 was also surprisingly fast).

For LoRA training in this quick test I could not get the bnb (bitsandbytes) config to work, so I'm running the fine-tune on the full-size model.

Will share all the install, setup, and settings I've learned in a blog post, together with the cooling shroud 3D design.
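
For reference, a minimal sketch of the kind of bitsandbytes 4-bit + LoRA setup mentioned above (not the poster's actual config; the model id and target modules are placeholders, and bitsandbytes support on ROCm may be exactly the sticking point):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization config (the "bnb config" referenced above)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-model",          # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters on top of the quantized base model
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # depends on the model architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```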
replied to csabakecskemeti's post about 19 hours ago

Yeah, it's 112 for the PCIe V100 and 125 for the SXM, I think. One thing I was never clear on with the MI100 and other MIxx chip specs: whether their float16 'matrix' numbers are matrix-multiply float16 with float32 accumulate (which is what you'd want). The datacenter NVIDIA chips' 'tensor core' FLOPS are usually float32 accumulate (unless it's a gamer card, in which case that's halved).
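
A quick way to probe how much precision the half-precision matmul path actually keeps on a given GPU is to compare it against a float64 reference; this is a rough sketch, assuming a CUDA or ROCm PyTorch build (on ROCm, 'cuda' maps to the AMD GPU):

```python
import torch

device = "cuda"
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

ref = a.double() @ b.double()            # float64 reference
fp16 = (a.half() @ b.half()).double()    # float16 inputs, hardware-dependent accumulate

print("fp16 matmul max abs error vs fp64:", (fp16 - ref).abs().max().item())
```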

The MI100 does have native bfloat16, which is a big win over the V100.
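
A minimal sketch of leaning on that native bfloat16 via autocast for a training step (placeholder model and data; assumes a ROCm/CUDA PyTorch build with bf16 support):

```python
import torch

model = torch.nn.Linear(1024, 1024, device="cuda")
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(32, 1024, device="cuda")

# Forward pass runs eligible ops in bfloat16; loss is cast back to float32
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = model(x).float().pow(2).mean()

loss.backward()
opt.step()
opt.zero_grad()
```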

I do feel, though, that you're getting good TOPS/$ here because AMD hasn't been that successful in competing with NVIDIA on the full system offering (chips + driver/software). I've really, really wanted this to change, but AMD keeps frustrating... how do you find working with it so far in terms of issues / crashes / head banging? :) Hopefully things have been improving.

replied to csabakecskemeti's post about 20 hours ago

FWIW, the MI100 was released after the A100, three years after the V100... that says something :) Also, it's the matrix / tensor core mixed- or reduced-precision FLOPS that are of interest, not the float32 FLOPS, which are the 14 & 23 numbers.