
Shyam Sunder Kumar

theainerd

AI & ML interests

Natural Language Processing

theainerd's activity

upvoted a collection about 11 hours ago

Qwen2.5-1M

Collection

The long-context version of Qwen2.5, supporting 1M-token context lengths • 2 items • Updated about 13 hours ago • 56

reacted to merve's post with 🚀 about 16 hours ago

Post

1504

smolagents can see 🔥
we just shipped vision support to smolagents 🤗 agentic computers FTW

you can now:
💻 let the agent get images dynamically (e.g. agentic web browser)
📑 pass images at the init of the agent (e.g. chatting with documents, filling forms automatically etc)
with only a few lines of code changed! 🤯
you can use transformers models locally (like Qwen2VL) OR plug in your favorite multimodal inference provider (gpt-4o, Anthropic & co) 🤠

read our blog http://hf.co/blog/smolagents-can-see

upvoted an article about 16 hours ago

Article

We now support VLMs in smolagents!

3 days ago

• 33

updated a collection about 17 hours ago

Reasoning

Collection

6 items • Updated about 17 hours ago

upvoted a paper about 17 hours ago

Reasoning Language Models: A Blueprint

Paper • 2501.11223 • Published 7 days ago • 24

updated a collection about 18 hours ago

Reasoning

Collection

6 items • Updated about 17 hours ago

upvoted 2 papers about 18 hours ago

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Paper • 2501.12599 • Published 5 days ago • 57

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published 5 days ago • 202

liked a model 4 days ago

openbmb/MiniCPM-o-2_6

Any-to-Any • Updated about 13 hours ago • 74.9k • 834

reacted to chansung's post with 👍 5 days ago

Post

1904

Simple summarization of Evolving Deeper LLM Thinking (Google DeepMind)

The process starts by posing a question.
1) The LLM generates initial responses.
2) These generated responses are evaluated according to specific criteria (program-based checker).
3) The LLM critiques the evaluated results.
4) The LLM refines the responses based on the evaluation, critique, and original responses.

The refined response is then fed back into step 2). If it meets the criteria, the process ends. Otherwise, the algorithm generates more responses based on the refined ones (with some being discarded, some remaining, and some responses potentially being merged).
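
The loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `StubLLM`, `check`, `evolve`, and the toy meeting-slot problem are all hypothetical names invented here, the "LLM" is a deterministic stand-in so the example runs offline, and the merging of responses is left out. The stub also counts its calls to show how quickly they add up.

```python
from typing import List, Optional

N_SLOTS = 4  # toy problem (assumption): give each meeting its own slot in 0..3


def check(candidate: List[int]) -> List[int]:
    """Program-based checker: return the indices of meetings that violate a rule."""
    bad, seen = [], set()
    for i, slot in enumerate(candidate):
        if not 0 <= slot < N_SLOTS or slot in seen:
            bad.append(i)
        seen.add(slot)
    return bad


class StubLLM:
    """Hypothetical stand-in for a real LLM client; every method is one 'API call'."""

    def __init__(self) -> None:
        self.calls = 0

    def generate(self, n_meetings: int, n_candidates: int) -> List[List[int]]:
        self.calls += 1
        # Deliberately poor first drafts: every meeting lands in the same slot.
        return [[k] * n_meetings for k in range(n_candidates)]

    def critique(self, candidate: List[int], bad: List[int]) -> str:
        self.calls += 1
        return f"meetings {bad} clash in {candidate}"

    def refine(self, candidate: List[int], bad: List[int], critique: str) -> List[int]:
        self.calls += 1
        fixed = list(candidate)
        # Move the first offending meeting to the lowest unused slot
        # (one always exists while there are fewer meetings than slots).
        fixed[bad[0]] = next(s for s in range(N_SLOTS) if s not in fixed)
        return fixed


def evolve(llm: StubLLM, n_meetings: int = 3, n_candidates: int = 2,
           max_rounds: int = 5) -> Optional[List[int]]:
    population = llm.generate(n_meetings, n_candidates)           # step 1
    for _ in range(max_rounds):
        scored = [(check(c), c) for c in population]              # step 2
        for bad, cand in scored:
            if not bad:
                return cand                                       # criteria met
        # Keep the half with the fewest violations, discard the rest.
        scored.sort(key=lambda pair: len(pair[0]))
        survivors = scored[: max(1, len(scored) // 2)]
        population = []
        for bad, cand in survivors:
            critique = llm.critique(cand, bad)                    # step 3
            population.append(llm.refine(cand, bad, critique))    # step 4
    return None  # no candidate passed within max_rounds


llm = StubLLM()
print(evolve(llm), llm.calls)  # prints: [0, 1, 2] 5
```

Even on this three-meeting toy problem, one passing answer takes five model calls, and each extra round adds two calls per surviving candidate.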

Through this process, it demonstrated excellent performance in complex scheduling problems (travel planning, meeting scheduling, etc.). It's a viable method for finding highly effective solutions in specific scenarios.

However, there are two major drawbacks:
🤔 It requires a large number of API calls. (While the cost might not be very high, it leads to significant latency.)
🤔 The evaluator is program-based. (This limits its use as a general method. It could potentially be replaced with an LLM-as-a-judge evaluator, but that would add API costs for evaluation.)

https://arxiv.org/abs/2501.09891

liked a Space 5 days ago

Running on CPU Upgrade

604

🏆

Open ASR Leaderboard

updated a collection 5 days ago

Agents

Collection

4 items • Updated 5 days ago

replied to chansung's post 6 days ago

Informative. Thanks

reacted to chansung's post with 👍 6 days ago

Post

1948

Simple Summarization on DeepSeek-R1 from DeepSeek AI

The RL stage is very important.
↳ However, it is difficult to create a truly helpful AI for people solely through RL.
↳ So, we applied a learning pipeline consisting of four stages: providing a good starting point, reasoning RL, SFT, and safety RL, and achieved performance comparable to o1.
↳ Simply fine-tuning other open models on data generated by R1 (distillation) resulted in performance comparable to o1-mini.

Of course, this is just a brief overview and may not be of much help. All models are accessible on Hugging Face, and the paper can be read through the GitHub repository.

Model: https://huggingface.co/deepseek-ai
Paper: https://github.com/deepseek-ai/DeepSeek-R1