GRPO would be dope!
Btw, did we ever find out if diffusion LLMs learn from their output? Like understanding the context of an answer and applying it in reverse? Example: if it learns A = B and B = C, does it also infer that C = A?
I thought this was something diffusion LLMs improve at.
it's a similar architecture to image generation, so... kinda? diffusion llms aren't very popular though, so there isn't a ton of research on them. transformers are a much more reliable model type for now.
edit: these aren't really super serious experiments; they're more for testing whether a logical response is even possible this way.
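to make the "kinda like image generation" point concrete, here's a toy sketch of how a diffusion-style LLM decodes, assuming a simple iterative-unmasking scheme: start fully masked, predict every position in parallel, and reveal more tokens each step. `predict_tokens` is a hypothetical stand-in for a trained denoiser, not any real model's API.

```python
import random

MASK = "<mask>"

def predict_tokens(seq):
    # Hypothetical denoiser: fills masked positions from a fixed answer,
    # standing in for a model's per-position predictions.
    answer = ["the", "cat", "sat", "on", "the", "mat"]
    return [answer[i] if tok == MASK else tok for i, tok in enumerate(seq)]

def diffusion_decode(length, steps=3, seed=0):
    """Start fully masked, then reveal tokens over a few denoising steps.
    Unlike autoregressive decoding, every position is predicted in
    parallel each step, so later tokens can inform earlier ones."""
    rng = random.Random(seed)
    seq = [MASK] * length
    for step in range(steps):
        proposal = predict_tokens(seq)
        # Keep a growing fraction of positions each step (a real model
        # would pick by confidence; random here just for illustration).
        keep = set(rng.sample(range(length), k=(step + 1) * length // steps))
        seq = [proposal[i] if i in keep else seq[i] for i in range(length)]
    return seq

print(diffusion_decode(6))
```

the key contrast with a transformer decoding left-to-right: here the whole sequence is refined at once, which is why people wonder whether it handles "reversed" context differently.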
this is also kinda one of the reasons q&a bots are really bad: people just found that format doesn't scale well at all
edit 2: (i said one of the reasons because another huge one is quality-data scarcity plus lack of flexibility. with incremental models like gpts you can have any number of roles and so on, whereas input-output models only ever have the one fixed format)
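the flexibility point above can be shown with two hypothetical data shapes: a chat-style (incremental) model consumes a list of role-tagged turns that can grow, while a plain q&a (input-output) model only ever sees one input slot and one output slot. field names here are just illustrative, not any particular library's schema.

```python
# Chat / incremental format: any number of roles and turns.
chat_example = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "If A = B and B = C, what is A?"},
    {"role": "assistant", "content": "A = C."},
    {"role": "user", "content": "And the other direction?"},  # follow-ups are free
]

# Q&A / input-output format: exactly one slot in, one slot out.
qa_example = {
    "input": "If A = B and B = C, what is A?",
    "output": "A = C.",
}

# The chat format can keep growing with more turns and roles; the Q&A
# format can't represent multi-turn context without changing its schema.
print(len(chat_example), len(qa_example))
```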
big final models do (mostly flux in oss; i think sd3.5 has a bit, but not nearly as strong?)
most random pony or sdxl loras aren't though; none of the trainers support it, and it's all hidden in research codebases that are impossible to use