Steinzer Narayan

steinzer-narayan

AI & ML interests

None yet

Organizations

None yet

steinzer-narayan's activity

liked 2 datasets 2 days ago

google/Synthetic-Persona-Chat

Updated Mar 1, 2024 • 10.9k • 2.08k • 91

nvidia/HelpSteer2

Updated Dec 18, 2024 • 21.4k • 16.2k • 400

updated 2 models 5 months ago

steinzer-narayan/caudwell-9b-v0

Updated Aug 30, 2024 • 3

steinzer-narayan/caudwell-9b-v0-Q6_K-GGUF

Updated Aug 30, 2024 • 4

replied to mlabonne's post 7 months ago

I think it's less likely to be model size alone, and more likely to be an interaction between model size and alignment and/or instruction tuning. That is, smaller models have weaker representations and implement less rigorous state machines, which virtually guarantees transitions in the latent space that can't appear in larger models. These transitions show up as 'poor performance' on standardised tests but, in the context of storytelling, manifest as 'creativity'. Larger models, by comparison, learn a broader variety of more discretised features, but cannot utilise them the way smaller models do, because they have enough capacity to satisfy alignment and instruction-following to a degree that globally eliminates these 'unwanted' state transitions.

liked a model 10 months ago

Qwen/Qwen1.5-32B-Chat-GGUF

Text Generation • Updated Apr 9, 2024 • 2.3k • 52

updated 2 models 10 months ago

steinzer-narayan/fimbulhermes-15B-v0.1_exl2_6.5bpw

Text Generation • Updated Apr 4, 2024 • 5

steinzer-narayan/fimbulhermes-15B-v0.1

Text Generation • Updated Apr 4, 2024 • 4

liked a model 11 months ago

Sao10K/Fimbulvetr-11B-v2

Text Generation • Updated Apr 4, 2024 • 495 • 171