🌁#89: AI in Action: How AI Engineers, Self-Optimizing Models, and Humanoid Robots Are Reshaping 2025

Community Article · Published February 25, 2025

We look into the accelerating trends in AI – soaring demand for engineers, self-optimizing models, and humanoid robots stepping into reality.

--

This Week in Turing Post:

  • Wednesday, AI 101, Model: Inside SmolLM2
  • Friday, Agentic Workflow: Reflections & Actions

🔳 Turing Post is on 🤗 Hugging Face as a resident -> click to follow!


The main topic – AI in Action

Last week, I co-emceed the Agents Engineering Track at the AI Engineer Summit in New York (link to the full day of presentations below). What struck me most was how much has changed since ChatGPT launched us into the era of generative AI.

Machine learning has been deeply practical for years, with ML teams embedded in every major company. But since early 2023, AI teams have been forming everywhere – and they’re starving for talent. For the first time, companies like Jane Street, BlackRock, and Morgan Stanley openly discussed their AI work. They weren’t revealing too much, of course; they came with a different message: we are working on super cool stuff – come work with us. AI has moved beyond hype and theory – it’s now a reality, and so is the soaring demand for engineers and builders.

The bar is high. As Xiaofeng Wang from LinkedIn put it, the ideal candidate is a strong software engineer skilled in infrastructure integration and experienced in interface design, with a background in AI and data science, who can quickly learn new technologies, implement solutions efficiently, and adapt to evolving trends. Find one, he says, and they’re worth more than a unicorn.

The crazy thing? It’s actually not completely impossible to become one. Generative AI has never been more accessible, with open-source models, educational resources, and hands-on tools available to anyone willing to dive in.

It’s a fascinating time to be an AI builder. And for now, a highly lucrative one as well.

While humans are sharpening their skills, AI itself is evolving – becoming more capable and practical. Just look at last week’s developments. AI is advancing faster than anticipated, growing increasingly useful – both for itself and for our benefit.

Take Sakana AI’s AI CUDA Engineer – an AI that optimizes AI itself. It’s an autonomous agent that converts PyTorch code into ultra-optimized CUDA kernels, claiming 10–100x speedups on GPU computations. Using evolutionary optimization, AI makes itself smarter, faster, cheaper, and more efficient.
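
Sakana hasn’t published its exact pipeline here, but the evolutionary idea is easy to sketch. Below is a minimal, hypothetical illustration in Python – `llm_mutate_kernel` and `benchmark_ms` are stand-ins I’ve invented, not Sakana’s API: candidates are mutated, timed, and only the fastest survive into the next generation.

```python
import random

# Toy sketch of evolutionary kernel search (illustrative only, not Sakana's code).

def llm_mutate_kernel(src: str) -> str:
    # Stand-in for an LLM call that rewrites the kernel (tiling, fusion, etc.).
    # Tagging the source keeps this sketch runnable end to end.
    return src + f"\n// mutation {random.randint(0, 9999)}"

def benchmark_ms(src: str) -> float:
    # Stand-in for compiling the kernel, checking its output against the
    # PyTorch reference, and timing it on a GPU. Lower is better.
    return random.uniform(1.0, 100.0)

def evolve(seed: str, generations: int = 10, population: int = 8, survivors: int = 2) -> str:
    pool = [seed]
    for _ in range(generations):
        # Refill the population by mutating randomly chosen survivors.
        candidates = pool + [llm_mutate_kernel(random.choice(pool))
                             for _ in range(population - len(pool))]
        # Selection: keep only the fastest candidates for the next round.
        candidates.sort(key=benchmark_ms)
        pool = candidates[:survivors]
    return pool[0]

best_kernel = evolve("__global__ void matmul(/* naive translation of a PyTorch op */) {}")
```

Sakana’s system reportedly layers correctness verification and an archive of discovered kernels on top of a loop like this, but the select-mutate-benchmark cycle is the core of how an agent can “make itself faster.”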

And if AI optimizing AI isn’t enough, and you still insist that AI should be able to fold your laundry (which I fully support!) – well, that might actually be happening. Two robotics companies just shared demos of their highly capable robots.

Figure introduced Helix, a generalist Vision-Language-Action (VLA) model that unifies perception, language understanding, and dexterous control. Running on Figure’s humanoid robots, Helix gives them real-world intelligence – letting them pick up objects they’ve never seen before, collaborate with other robots, and respond to natural language commands without additional training. The demo video is both meditative and a glimpse of the near future:

[Video: Figure’s Helix demo]

And then 1X Technologies demonstrated their NEO Gamma. It walks with a natural gait, picks up objects, sits in chairs, and understands conversational prompts thanks to an in-house language model. It even has soft covers for safety and emotive ear rings – because if robots are moving in, they might as well have some personality.

[Video: 1X’s NEO Gamma demo]

AI teams starving for AI talent in every possible industry. AI optimizing AI. Robots thinking on the fly. Humanoids stepping into our homes.

2025 has just started and it’s already full of AI in action.


News from The Usual Suspects ©

Microsoft’s Quantum Gambit Shakes Up the Market

  • Majorana 1 quantum chip has Wall Street buzzing, pushing stocks of quantum computing firms IonQ, Rigetti, and D-Wave higher. With Microsoft claiming its chip is less error-prone and closer to real-world applications, the debate over quantum’s timeline just got a lot more interesting. Nvidia’s Jensen Huang recently downplayed quantum’s near-term impact – but Microsoft, Alphabet, and IBM seem to think otherwise. Who’s right? The market’s watching.

OpenAI’s o1-preview and DeepSeek-R1 Play Chess… and Cheat

  • A new study shows that AI reasoning models don’t just play by the rules – they rewrite them. Researchers found that models like OpenAI’s o1-preview and DeepSeek-R1 often resorted to hacking the chess game environment rather than playing fairly. More traditional LLMs, like GPT-4o and Claude 3.5 Sonnet, needed a little nudge to break the rules, but they still got there.

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

  • Researchers from Mila – Quebec AI Institute, Université de Montréal, and the University of California, Berkeley are deeply concerned about the risks posed by superintelligent agents built on models like those above. To address this, they propose Scientist AI, a non-agentic AI designed for understanding rather than goal pursuit. Unlike agentic AI, which risks deception, self-preservation, and power-seeking, Scientist AI builds causal models and answers questions with calibrated uncertainty. It offers safety guardrails against risky AI systems, aids scientific discovery, and lets AI safety research progress without existential threats. This Bayesian, interpretable system mitigates overconfidence and converges towards safer performance with increased compute.
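
To make “answers with calibrated uncertainty” concrete, here is a toy Bayesian example – my illustration, not the paper’s system. With thin evidence the posterior stays near 0.5 (an honest “not sure”); it only sharpens as evidence accumulates:

```python
# Toy Beta-Bernoulli posterior: report a probability instead of a confident yes/no.

def posterior_prob_true(successes: int, failures: int,
                        prior_a: float = 1.0, prior_b: float = 1.0) -> float:
    # Posterior mean of Beta(prior_a + successes, prior_b + failures).
    return (prior_a + successes) / (prior_a + prior_b + successes + failures)

print(posterior_prob_true(2, 1))    # ~0.60: weak evidence, the answer stays hedged
print(posterior_prob_true(80, 20))  # ~0.79: strong evidence, the answer sharpens
```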

Not Scientist AI, but AI Co-Scientist is coming from Google this week

  • Google Research unveils AI Co-Scientist, a multi-agent system built on Gemini 2.0 to accelerate scientific discovery. Designed to generate hypotheses, refine research proposals, and assist in biomedical breakthroughs, it has already contributed to drug repurposing for leukemia and antimicrobial resistance studies. With an expert-in-the-loop approach and Trusted Tester access, Google’s AI is aiming to be a true collaborator – not just a tool.

Thinking Machines Lab: A New AI Powerhouse Emerges

  • Ex-OpenAI and Meta researchers launch Thinking Machines Lab, focused on customizable AI, multimodal systems, and transparency. With Mira Murati and John Schulman leading, expect big things. But it’s not super clear yet what they are going to do.

Models to pay attention to:

  • Claude 3.7 Sonnet and Claude Code – This is Anthropic’s first hybrid reasoning model, which lets users toggle between rapid responses and extended thinking (see the API sketch after this list). It excels in coding, achieving SOTA results on SWE-bench Verified (70.3%) and TAU-bench. Pricing is unchanged at $3 per million input tokens and $15 per million output tokens →read their blog
  • Microsoft’s Muse – a generative AI model trained on gameplay data to generate alternative game sequences for creative ideation in interactive design →read the paper
  • SmolVLM2 – a family of small yet powerful video-language models optimized for efficiency across devices, enabling real-time video analysis and semantic search →read the paper
  • Alibaba published the Qwen2.5-VL Technical Report →read the paper
  • InfiR – a Small Language Model optimized for reasoning, significantly outperforming similarly scaled models while ensuring efficient edge-device deployment →read the paper
  • Multimodal Mamba – a linear-complexity multimodal model that reduces GPU memory use and inference costs while maintaining strong multimodal reasoning →read the paper
  • Magma – a multimodal foundation model integrating vision, language, and action planning for digital and robotic applications →read the paper
  • RDLMC – a Riemannian diffusion-based language model that improves generative modeling efficiency on high-dimensional categorical distributions →read the paper
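
A quick note on the Claude 3.7 Sonnet item above: the rapid/extended-thinking toggle is a single request parameter. Here is a minimal sketch with Anthropic’s Python SDK – the model id and token budget are assumptions, so check Anthropic’s docs for current values:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Extended thinking enabled: the model reasons step by step before answering.
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # assumed model id
    max_tokens=4096,                     # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[{"role": "user", "content": "Walk me through this failing test..."}],
)
print(response.content)  # thinking blocks followed by the final answer

# Omit the `thinking` parameter entirely for a rapid, non-reasoning response.
```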

The freshest research papers, categorized for your convenience

There were quite a few TOP research papers this week; we mark them with 🌟 in each section.

Multimodal, Perception, and Vision-Language Models

  • 🌟 SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding – Advances vision-language learning with multilingual training and improved zero-shot capabilities →read the paper
  • 🌟 Intuitive Physics Understanding Emerges from Self-Supervised Pretraining on Natural Videos – Trains a model on video frame prediction to develop intuitive physics reasoning →read the paper

LLM Optimization, Memory, and Efficiency

  • SurveyX: Academic Survey Automation via Large Language Models – Develops an automated system for generating high-quality academic surveys, improving citation precision and evaluation frameworks →read the paper
  • From RAG to Memory: Non-Parametric Continual Learning for Large Language Models – Introduces HippoRAG 2, a retrieval-augmented generation method that enhances long-term memory and retrieval →read the paper
  • How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? – Examines the trade-offs in integrating new knowledge into LLMs using Low-Rank Adaptation (LoRA) →read the paper
  • Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models – Develops LORAM, a memory-efficient fine-tuning approach that enables large model training on low-resource hardware →read the paper
  • 🌟 Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention – Optimizes sparse attention for long-context models, significantly improving efficiency →read the paper
  • Eager Updates for Overlapped Communication and Computation in DiLoCo – Reduces communication bottlenecks in distributed LLM training by overlapping updates with computation →read the paper

Reinforcement Learning (RL), Self-Improvement, and Decision-Making

  • 🌟 S2R: Teaching LLMs to Self-verify and Self-correct via RL – Develops a framework to improve LLM reasoning by teaching self-verification and self-correction →read the paper
  • Logic-RL: Unleashing LLM Reasoning with Rule-Based RL – Uses RL to enhance logical reasoning capabilities →read the paper
  • Discovering Highly Efficient Low-Weight Quantum Error-Correcting Codes with RL – Optimizes quantum error-correcting codes using RL, reducing physical qubit overhead →read the paper
  • Armap: Scaling Autonomous Agents via Automatic Reward Modeling and Planning – Introduces a decision-making framework that learns rewards automatically, improving agent-based reasoning →read the paper
  • 🌟 OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning – Develops a tool-based system for multi-step decision-making and structured tool use →read the paper
  • Thinking Preference Optimization – Enhances LLM reasoning by refining preference-based optimization of reasoning steps →read the paper

LLM Trustworthiness, Safety, and Alignment

  • 🌟 ReLearn: Unlearning via Learning for Large Language Models – Introduces a knowledge-unlearning method that removes sensitive knowledge without degrading fluency →read the paper
  • 🌟 On the Trustworthiness of Generative Foundation Models – Guideline, Assessment, and Perspective – Develops a framework for evaluating trustworthiness in generative AI models →read the paper
  • Rethinking Diverse Human Preference Learning through Principal Component Analysis – Improves human preference modeling using principal component analysis (PCA) for better LLM alignment →read the paper

Code Generation, Software Engineering, and Web Crawling

  • 🌟 S*: Test Time Scaling for Code Generation – Introduces a test-time scaling framework that improves LLM-based code generation through iterative debugging →read the paper
  • Craw4LLM: Efficient Web Crawling for LLM Pretraining – Optimizes web crawling for LLM training by prioritizing the most impactful pages →read the paper
  • 🌟 Autellix: An Efficient Serving Engine for LLM Agents as General Programs – Enhances LLM serving efficiency for agentic applications by optimizing request scheduling →read the paper

Mathematical Reasoning, Logical Thinking, and Test-Time Optimization in LLMs

  • LLMs and Mathematical Reasoning Failures – Evaluates LLMs on newly designed math problems, exposing weaknesses in multi-step problem-solving →read the paper
  • Small Models Struggle to Learn from Strong Reasoners – Identifies the limitations of small LLMs in benefiting from chain-of-thought distillation from larger models →read the paper
  • 🌟 Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering – Examines how inference scaling helps LLMs selectively answer questions with confidence →read the paper
  • Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options – Enhances LLM problem-solving by systematically exploring multiple solution paths →read the paper

That’s all for today. Thank you for reading!


Please share this article with your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.
