nebiyu29 (youhannes)

liked a Space 25 days ago

Running on Zero

22

🚀

FlipSketch

New activity in McAuley-Lab/Amazon-Reviews-2023 5 months ago

time stamp problem

1

#6 opened 5 months ago by

nebiyu29

reacted to Sentdex's post with 👍 5 months ago

Post

8411

Okay, first pass over KAN: Kolmogorov–Arnold Networks, it looks very interesting!

Interpretability of KAN model:
Interpretability may be considered mostly a safety issue these days, but it can also serve as a form of interaction between the user and a model, as the paper argues, and I think they make a valid point. With an MLP, we only interact with the outputs, but KAN is an entirely different paradigm, and I find it compelling.

Scalability:
KAN shows better parameter efficiency than MLP. This likely also translates to needing less data. With the frontier LLMs, we're already at the point where all the data available from the internet is used, plus more is generated synthetically, so we kind of need something better.

Continual learning:
KAN can handle new input information w/o catastrophic forgetting, which helps to keep a model up to date without relying on some database or retraining.

Sequential data:
This is probably what most people are curious about right now. KANs have not yet been shown to work with sequential data, and it's unclear what the best approach might be to make them work well, both in training and in terms of interpretability. That said, there's a rich, long history of handling sequential data in a variety of ways, so I don't think getting the ball rolling here would be too challenging.

Mostly, I just love a new paradigm and I want to see more!

KAN: Kolmogorov-Arnold Networks (2404.19756)

5 replies

liked a model 8 months ago

cardiffnlp/twitter-roberta-base-sentiment

Text Classification • Updated Jan 20, 2023 • 2.13M • 278

updated a model 9 months ago

nebiyu29/new_hate_classifier

Updated Apr 11

updated a Space 9 months ago

Runtime error

🌍

Hate Classifier

updated 2 models 9 months ago

nebiyu29/hate_classifier

Text Classification • Updated Apr 10 • 12

nebiyu29/project-us

Text Classification • Updated Apr 10 • 15

reacted to Jaward's post with 🤗 9 months ago

Post

2827

After giving GPU Programming a hands-on try, I have come to appreciate the level of complexity in AI compute:

- Existing/leading frameworks (CUDA, OpenCL, DSLs, even Triton) are still at the mercy of low-level compute that requires deeper understanding and experience.
- Ambiguous optimization methods that will literally drive you mad 🤯
- Triton is cool but not cool enough (high level abstractions that fall back to low level compute issues as you build more specialized kernels)
- As for CUDA, optimization requires considering all major components of the GPU (DRAM, SRAM, ALUs) 🤕
- Models today require stallion written GPU kernels to reduce storage and compute cost.
- GPTQ was a big save 👍🏼

@karpathy is right: expertise in this area is scarce, and the reason is quite obvious: uncertainty. We are still struggling to get peak performance from multi-connected GPUs while maintaining precision and reducing cost.

May the Scaling Laws favor us lol.