Alvaro Bartolome PRO

alvarobartt

https://alvarobartt.me

AI & ML interests

machine learning @huggingface

posted an update about 2 hours ago

🔥 Agents can do anything! @microsoft Research just announced the release of Magma 8B!

Magma is a new 8B-parameter Visual Language Model (VLM) for multi-modal agents, designed to handle complex interactions across virtual and real environments, and it's MIT licensed!

Magma comes with exciting new features such as:
- Introduces the Set-of-Mark and Trace-of-Mark techniques for fine-tuning
- Leverages a large amount of unlabeled video data to learn spatial-temporal grounding and planning
- Generalizes well and can be fine-tuned for other agentic tasks
- Achieves SOTA results on multi-modal benchmarks spanning UI navigation, robotic manipulation, image / video understanding, and spatial understanding and reasoning
- Generates goal-driven visual plans and actions for agentic use cases

Model: microsoft/Magma-8B (https://huggingface.co/microsoft/Magma-8B)
Technical Report: Magma: A Foundation Model for Multimodal AI Agents (2502.13130), https://huggingface.co/papers/2502.13130