
Steinzer Narayan

steinzer-narayan

AI & ML interests

None yet

Recent Activity

liked a dataset 2 days ago
google/Synthetic-Persona-Chat
liked a dataset 2 days ago
nvidia/HelpSteer2
updated a model 5 months ago
steinzer-narayan/caudwell-9b-v0

Organizations

None yet

steinzer-narayan's activity

replied to mlabonne's post 7 months ago

I think it's less likely to be model size alone, and more likely to be a function of how alignment and/or instruction tuning interact with model size. That is, smaller models have weaker representations and implement less rigorous state machines, which virtually guarantees transitions in the latent space that can't appear in the larger models. These transitions show up as 'poor performance' on standardised tests, but in the context of storytelling they manifest as 'creativity'. Larger models, by comparison, learn a broader variety of more discretised features, but cannot use them the way the smaller models do, because they have sufficient capacity to satisfy alignment and instruction-following to a degree that globally eliminates these 'unwanted' state transitions.
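To make the intuition concrete, here is a toy sketch (my own illustration, not anything from the models discussed): treat generation as a Markov chain over a small state space, and treat alignment as pruning low-probability transitions and renormalising. The pruned chain reaches far fewer distinct paths, which is the analogue of the 'unwanted' state transitions being globally eliminated.

```python
# Toy analogy only: states stand in for latent states, and the pruning
# threshold stands in for alignment/instruction tuning suppressing
# low-probability transitions. All numbers here are made up.

BASE = {
    "A": {"A": 0.1, "B": 0.4, "C": 0.4, "D": 0.1},
    "B": {"A": 0.3, "B": 0.1, "C": 0.3, "D": 0.3},
    "C": {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25},
    "D": {"A": 0.4, "B": 0.4, "C": 0.1, "D": 0.1},
}

def prune(chain, threshold):
    """Drop transitions below `threshold` and renormalise each row."""
    pruned = {}
    for state, row in chain.items():
        kept = {t: p for t, p in row.items() if p >= threshold}
        total = sum(kept.values())
        pruned[state] = {t: p / total for t, p in kept.items()}
    return pruned

def distinct_paths(chain, start, length):
    """Count distinct paths of `length` steps reachable from `start`."""
    paths = {(start,)}
    for _ in range(length):
        paths = {p + (t,) for p in paths for t in chain[p[-1]]}
    return len(paths)

aligned = prune(BASE, 0.2)
print(distinct_paths(BASE, "A", 4))     # unpruned chain: every path is reachable
print(distinct_paths(aligned, "A", 4))  # pruned chain: far fewer reachable paths
```

On this picture, the surviving high-probability transitions are exactly the ones that score well on standardised tests, while the pruned low-probability ones are where the 'creative' trajectories lived.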