Steinzer Narayan
steinzer-narayan's activity
nvidia/HelpSteer2
I think it's less likely to be model size alone, and more likely to be a function of how alignment and/or instruction tuning interacts with model size. That is, smaller models have weaker representations and implement less rigorous state machines, which virtually guarantees transitions in the latent space that can't appear in larger models. These transitions show up as 'poor performance' on standardised tests but, in the context of storytelling, manifest as 'creativity'. Larger models, by comparison, learn a broader variety of more discretised features, but cannot utilise them the way smaller models do, because they have enough capacity to satisfy alignment and instruction-following to a degree that globally eliminates these 'unwanted' state transitions.