GRPO would be dope!
Btw, did we ever find out if diffusion LLMs learn from their output? Like understanding the context of an answer and applying it in reverse? Example: if it learns A = B and B = C, does it also infer that C = A?
I thought this was something diffusion LLMs improve at.
it's a similar architecture to image generation, so... kinda? diffusion llms aren't very popular though, so there isn't a ton of research on them. transformers are a much more reliable model type for now.
edit: these aren't really super serious experiments; they're more for testing whether a logical response is even possible this way.
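to make the "kinda like image generation" point concrete, here's a toy sketch of how a diffusion-style LLM decodes, assuming a simple iterative-unmasking scheme: start fully masked, predict every position in parallel, and reveal more tokens each step. `predict_tokens` is a hypothetical stand-in for a trained denoiser, not any real model's API.

```python
import random

MASK = "<mask>"

def predict_tokens(seq):
    # Hypothetical denoiser: fills masked positions from a fixed answer,
    # standing in for a model's per-position predictions.
    answer = ["the", "cat", "sat", "on", "the", "mat"]
    return [answer[i] if tok == MASK else tok for i, tok in enumerate(seq)]

def diffusion_decode(length, steps=3, seed=0):
    """Start fully masked, then reveal tokens over a few denoising steps.
    Unlike autoregressive decoding, every position is predicted in
    parallel each step, so later tokens can inform earlier ones."""
    rng = random.Random(seed)
    seq = [MASK] * length
    for step in range(steps):
        proposal = predict_tokens(seq)
        # Keep a growing fraction of positions each step (a real model
        # would pick by confidence; random here just for illustration).
        keep = set(rng.sample(range(length), k=(step + 1) * length // steps))
        seq = [proposal[i] if i in keep else seq[i] for i in range(length)]
    return seq

print(diffusion_decode(6))
```

the key contrast with a transformer decoding left-to-right: here the whole sequence is refined at once, which is why people wonder whether it handles "reversed" context differently.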
this is also kinda one of the reasons q&a bots are really bad: people just found that format doesn't scale well at all
edit 2: (i said one of the reasons because another huge one is quality-data scarcity plus lack of flexibility. with incremental models like gpts you can have any number of roles and so on, whereas input-output models only ever have the one fixed format)
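the flexibility point above can be shown with two hypothetical data shapes: a chat-style (incremental) model consumes a list of role-tagged turns that can grow, while a plain q&a (input-output) model only ever sees one input slot and one output slot. field names here are just illustrative, not any particular library's schema.

```python
# Chat / incremental format: any number of roles and turns.
chat_example = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "If A = B and B = C, what is A?"},
    {"role": "assistant", "content": "A = C."},
    {"role": "user", "content": "And the other direction?"},  # follow-ups are free
]

# Q&A / input-output format: exactly one slot in, one slot out.
qa_example = {
    "input": "If A = B and B = C, what is A?",
    "output": "A = C.",
}

# The chat format can keep growing with more turns and roles; the Q&A
# format can't represent multi-turn context without changing its schema.
print(len(chat_example), len(qa_example))
```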
big final models do (mostly flux in oss; i think sd3.5 has a bit, but not nearly as strong?)
most random pony or sdxl loras aren't though; none of the trainers support it, and it's all hidden in research codebases that are impossible to use