Prithiv Sakthi

prithivMLmods

AI & ML interests

computer vision, multimodality, adapters @strangerzonehf @strangerguardhf

Recent Activity

updated a Space about 8 hours ago
prithivMLmods/Agent-Dino
updated a collection about 8 hours ago
Siglip2 Custom
updated a collection about 8 hours ago
Siglip2 Custom

Organizations

Stanford AI, DataScienceEngineering, AI FILMS, Samsung Electronics, MISATO-dataset, GEM benchmark, OpenGVLab, MusicAI, BigScience Biomedical Datasets, OpenVINO Toolkit, LLMs, ONNXConfig for all, Gradio-Themes-Party, scikit-learn, Open-Source AI Meetup, lora concepts library, Platzi Community, Kornia AI, Tune a video concepts library, Université Dauphine-PSL, Keras Dreambooth Event, Stable Diffusion Dreambooth Concepts Library, The Waifu Research Department, Musika, Blog-explorers, OpenSky, AI Tamil Nadu, OpenLLM France, huggingPartyParis, Team Tonic, That Time I got Reincarnated as a Hugging Face Organization, LocalLLaMA, Major TOM, MLX Community, C4AI Community, M4-ai, Chinese LLMs on Hugging Face, ONNX Community, Dataset Tools, Nerdy Face, Stranger Zone, open/ acc, Data Is Better Together Contributor, None yet, Doge Face, Stranger Guard

prithivMLmods's activity

reacted to AdinaY's post with 🚀 about 11 hours ago
Try QwQ-Max-Preview, Qwen's reasoning model here 👉 https://chat.qwen.ai
Can't wait for the model weights to drop on the Hugging Face Hub 🔥
reacted to sequelbox's post with 🚀 about 11 hours ago
SNEAK PREVIEW: Tachibana 2! A new high-difficulty code-reasoning dataset to use and challenge deepseek-ai/DeepSeek-R1 - harder prompts, complex requirements, deeper technical skill.

Link here: sequelbox/Tachibana2-DeepSeek-R1-PREVIEW

All responses generated by DeepSeek's R1 model, all prompts synthetically generated by Llama 3.1 405b Instruct.

Excited to bring out the full dataset for everyone's use as soon as I can! More to come soon.
replied to their post 2 days ago

@lunarflu
I read in a Microsoft article that, unlike electron (e−)-based qubits, it doesn't exist naturally; it emerges only under specific equilibrium conditions in superconductors, and with an applied magnetic field the particle can be brought into existence.

posted an update 3 days ago
The deployment of a new state of matter in Majorana 1, the world's first quantum processor powered by topological qubits, is really interesting. If you missed this news this week, here are some links for you:

🅱️ Topological qubit arrays: https://arxiv.org/pdf/2502.12252

⚛️ Quantum Blog: https://azure.microsoft.com/en-us/blog/quantum/2025/02/19/microsoft-unveils-majorana-1-the-worlds-first-quantum-processor-powered-by-topological-qubits/

📖 Read the story: https://news.microsoft.com/source/features/innovation/microsofts-majorana-1-chip-carves-new-path-for-quantum-computing/

📝 Majorana 1 Intro: https://youtu.be/Q4xCR20Dh1E?si=Z51DbEYnZFp_88Xp

🌀 The Path to a Million Qubits: https://youtu.be/wSHmygPQukQ?si=TS80EhI62oWiMSHK
reacted to nicolay-r's post with 🚀 3 days ago
📢 If you're looking to translate a massive dataset of JSON-lines / CSV data with a varied set of source fields, the following update is relevant. While experimenting with adapting a language-specific Sentiment Analysis model, I got a chance to reforge and release bulk-translate 0.25.2.
⭐️ https://github.com/nicolay-r/bulk-translate/releases/tag/0.25.2

The update has the following major features:
- Schema support: all the columns to be translated can now be declared in the same prompt-style format; using JSON, this automatically maps them onto output fields.
- Related updates to shell execution mode: a schema parameter is now available alongside the plain prompt usage from before.

The benefit is that your output is invariant: you can extend and stack various translators with separate shell launches.
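As a rough illustration of the schema idea (hypothetical code, not bulk-translate's actual API): declare once which source fields to translate, and map each onto an output field so untouched columns pass through unchanged.

```python
# Hypothetical illustration of schema-based column translation for JSON-lines
# rows; the names here are made up, not bulk-translate's actual API.

def translate_row(row, schema, translate):
    """Apply `translate` to each source field named in `schema`,
    writing results onto the mapped output fields."""
    out = dict(row)  # untouched columns pass through unchanged
    for src_field, dst_field in schema.items():
        out[dst_field] = translate(row[src_field])
    return out

# Toy "translator": uppercasing stands in for a real engine like google-translate.
schema = {"title": "title_en", "body": "body_en"}
row = {"title": "hola", "body": "buenos dias", "id": 7}
result = translate_row(row, schema, str.upper)
```

Because the schema only names fields, stacking a second translator is just another pass with a different `translate` callable.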

The screenshot below shows the google-translate engine applied in manual batching mode.
🚀 Performance: 2.5 it/sec (in the case of a single-field translation)

🌟 about bulk-translate: https://github.com/nicolay-r/bulk-translate
🌌 nlp-thirdgate: https://github.com/nicolay-r/nlp-thirdgate?tab=readme-ov-file
reacted to lysandre's post with 🔥❤️❤️ 4 days ago
SmolVLM-2 and SigLIP-2 are now part of transformers in dedicated releases!

They're added on top of the v4.49.0 release, and can be installed from the following tags: v4.49.0-SmolVLM-2 and v4.49.0-SigLIP-2.

This marks a new beginning for the release process of transformers. For the past five years, we've been doing monthly releases featuring many models (v4.49.0, the latest release, features 9 new architectures).

Starting with SmolVLM-2 & SigLIP-2, we'll now additionally release tags supporting new models on a stable branch. These models are therefore directly available for use by installing from the tag itself. These tags will continue to be updated with fixes applied to these models.

Going forward, continue expecting software releases following semantic versioning: v4.50.0 will have ~10 new architectures compared to v4.49.0, as well as a myriad of new features, improvements and bug fixes. Accompanying these software releases, we'll release tags offering brand new models as fast as possible, to make them accessible to all immediately.
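For example, installing from one of these tags might look like the following (standard pip-from-git syntax; the tag names come from the post above):

```shell
# Install transformers pinned to the SmolVLM-2 model tag
pip install git+https://github.com/huggingface/transformers@v4.49.0-SmolVLM-2

# ...or pinned to the SigLIP-2 tag
pip install git+https://github.com/huggingface/transformers@v4.49.0-SigLIP-2
```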
reacted to JingzeShi's post with 🚀 4 days ago
reacted to DmitryRyumin's post with 🔥 4 days ago
🚀🎭🌟 New Research Alert - WACV 2025 (Avatars Collection)! 🌟🎭🚀
📄 Title: EmoVOCA: Speech-Driven Emotional 3D Talking Heads 🔝

📝 Description: EmoVOCA is a data-driven method for generating emotional 3D talking heads by combining speech-driven lip movements with expressive facial dynamics. The method was developed to overcome the limitations of existing corpora and achieves state-of-the-art animation quality.

👥 Authors: @FedeNoce, Claudio Ferrari, and Stefano Berretti

📅 Conference: WACV, 28 Feb – 4 Mar, 2025 | Arizona, USA 🇺🇸

📄 Paper: https://arxiv.org/abs/2403.12886

🌐 GitHub Page: https://fedenoce.github.io/emovoca/
📝 Repository: https://github.com/miccunifi/EmoVOCA

🚀 CVPR-2023-24-Papers: https://github.com/DmitryRyumin/CVPR-2023-24-Papers

🚀 WACV-2024-Papers: https://github.com/DmitryRyumin/WACV-2024-Papers

🚀 ICCV-2023-Papers: https://github.com/DmitryRyumin/ICCV-2023-Papers

📚 More Papers: more cutting-edge research presented at other conferences in DmitryRyumin/NewEraAI-Papers, curated by @DmitryRyumin

🚀 Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36

🔍 Keywords: #EmoVOCA #3DAnimation #TalkingHeads #SpeechDriven #FacialExpressions #MachineLearning #ComputerVision #ComputerGraphics #DeepLearning #AI #WACV2024
reacted to tegridydev's post with 🤗 4 days ago
Open Source AI Agents | Github/Repo List | [2025]

https://huggingface.co/blog/tegridydev/open-source-ai-agents-directory

Check out the article & follow, bookmark, or save the tab, as I will be updating it <3
(I'm using it as my own notepad, and decided I might keep it up to date if I post it here, instead of making the 15th version of it and not saving it with a name I can remember on my desktop lol)
reacted to jsulz's post with ❤️ 4 days ago
Time flies!

Six months after joining Hugging Face, the Xet team is kicking off the first migrations from LFS to our storage for a number of repositories on the Hub.

More on the nitty gritty details behind the migration soon, but here are the big takeaways:

🤖 We've successfully completed the first migrations from LFS -> Xet to test the infrastructure and prepare for a wider release

✅ No action needed on your part - you can work with a Xet-backed repo like any other repo on the Hub (for now - major improvements on their way!)

👀 Keep an eye out for the Xet logo to see if a repo you know is on our infra! See the screenshots below to spot the difference 👇

⏩ ⏩ ⏩ Blazing uploads and downloads coming soon. We're gearing up for a full integration with the Hub's Python library that will make building on the Hub faster than ever - special thanks to @celinah and @Wauplin for their assistance.

🎉 Want early access? If you're curious and want to test out the bleeding edge that will power the development experience on the Hub, we'd love to partner with you. Let me know!

This is the culmination of a lot of effort from the entire team. Big round of applause to @sirahd @brianronan @jgodlewski @hoytak @seanses @assafvayner @znation @saba9 @rajatarya @port8080 @yuchenglow
reacted to davanstrien's post with 🧠 5 days ago
Hacked together a way to log trl GRPO training completions to a 🤗 dataset repo. This allows you to:

- Track rewards from multiple reward functions
- Treat the completion and rewards from training as a "proper" dataset and do EDA
- Share results for open science

The implementation is super hacky, but I'm curious if people would find this useful.

To push completions to the Hub, you just need two extra parameters:

log_completions=True
log_completions_hub_repo='your-username/repo-name'

Example dataset: davanstrien/test-logs
Colab: https://colab.research.google.com/drive/1wzBFPVthRYYTp-mEYlznLg_e_0Za1M3g
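The core of the idea can be sketched without trl: score every completion with each reward function and flatten the results into dataset-ready records. This is a hypothetical illustration (the function and field names are mine, not trl's actual API):

```python
# Hypothetical sketch: collect completions and per-reward-function scores as
# flat records that could later be pushed to a Hub dataset for EDA.
# Names here are illustrative, not trl's actual API.

def build_completion_records(prompts, completions, reward_fns):
    """Return one flat record per completion, with one column per reward."""
    records = []
    for prompt, completion in zip(prompts, completions):
        record = {"prompt": prompt, "completion": completion}
        for fn in reward_fns:
            # one reward column per function, named after the function
            record[f"reward_{fn.__name__}"] = fn(completion)
        records.append(record)
    return records

# Two toy reward functions standing in for real GRPO rewards
def length_reward(text):
    return float(len(text.split()))

def mentions_def(text):
    return 1.0 if "def " in text else 0.0

records = build_completion_records(
    ["Explain GRPO."],
    ["GRPO optimizes grouped completions."],
    [length_reward, mentions_def],
)
```

Flat records like these load directly into a 🤗 dataset, which is what makes the EDA and sharing steps above straightforward.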

reacted to merve's post with 🧠🧠 5 days ago
Google just released PaliGemma 2 Mix: new versatile instruction vision language models 🔥

> Three new models: 3B, 10B, 28B with res 224, 448 💙
> Can do vision language tasks with open-ended prompts, understand documents, and segment or detect anything 🤯

Read more https://huggingface.co/blog/paligemma2mix
Try the demo google/paligemma2-10b-mix
All models are here google/paligemma-2-mix-67ac6a251aaf3ee73679dcc4
reacted to burtenshaw's post with 🚀 6 days ago
AGENTS + FINETUNING! This week Hugging Face Learn has a whole pathway on fine-tuning for agentic applications. You can follow these two courses to level up your agent game beyond prompts:

1️⃣ New Supervised Fine-tuning unit in the NLP Course: https://huggingface.co/learn/nlp-course/en/chapter11/1
2️⃣ New Fine-tuning for agents bonus module in the Agents Course: https://huggingface.co/learn/agents-course/bonus-unit1/introduction

Fine-tuning will squeeze everything out of your model for how you're using it, more than any prompt.
reacted to AdinaY's post with ❤️ 6 days ago
🚀 StepFun 阶跃星辰 is making BIG open moves!

Last year, their GOT-OCR 2.0 took the community by storm 🔥 but many didn't know they were also building some amazing models. Now, they've just dropped something huge on the hub!

📺 Step-Video-T2V: a 30B bilingual open video model that generates 204 frames (8-10s) at 540P resolution with high information density & consistency.
stepfun-ai/stepvideo-t2v

🔊 Step-Audio-TTS-3B: a TTS model trained with the LLM-Chat paradigm on a large synthetic dataset, capable of generating RAP & humming.
stepfun-ai/step-audio-67b33accf45735bb21131b0b
posted an update 7 days ago
Dino: The Minimalist Multipurpose Chat System 🌠
Agent-Dino: prithivMLmods/Agent-Dino
GitHub: https://github.com/PRITHIVSAKTHIUR/Agent-Dino

By default, it performs the following tasks:
{Text-to-Text Generation}, {Image-Text-Text Generation}
@image: Generates an image using Stable Diffusion XL.
@3d: Generates a 3D mesh.
@web: Web search agents.
@rAgent: Initiates a reasoning chain using Llama mode for coding explanations.
@tts1-♀, @tts2-♂: Voice generation (female and male voices).
@yolo: Object detection.
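Routing like this usually comes down to a prefix dispatch on the first token of the message. Here is a minimal hypothetical sketch (not Agent-Dino's actual code, with an abbreviated command set):

```python
# Hypothetical sketch of @-command routing for a chat system like Agent-Dino.
# The command set mirrors the list above; handlers are left out for brevity.

COMMANDS = {"@image", "@3d", "@web", "@rAgent", "@yolo"}

def route(message: str):
    """Split a message into (command, payload); default to text-to-text chat."""
    head, _, rest = message.partition(" ")
    if head in COMMANDS:
        return head, rest.strip()
    return "@chat", message  # fallback: plain text-to-text generation

print(route("@image a dinosaur in a spacesuit"))  # ('@image', 'a dinosaur in a spacesuit')
```

Messages without a recognized prefix fall through to the default chat path, matching the "by default" behavior described above.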
reacted to ZennyKenny's post with 🤗 7 days ago
Really excited to start contributing to the SWE Arena project: https://swe-arena.com/

Led by IBM PhD fellow @terryyz, our goal is to advance research in code generation and app development by frontier LLMs.

reacted to sayakpaul's post with 🔥 7 days ago
Inference-time scaling meets Flux.1-Dev (and others) 🔥

Presenting a simple re-implementation of "Inference-time scaling diffusion models beyond denoising steps" by Ma et al.

I did the simplest random search strategy, but results can potentially be improved with better-guided search methods.

Supports Gemini 2 Flash & Qwen2.5 as verifiers for "LLMGrading" 🤗

The steps are simple:

For each round:

1> Start by sampling 2 starting noises with different seeds.
2> Score the generations w.r.t. a metric.
3> Keep the best generation from the current round.

If you have more compute budget, go to the next search round: scale the noise pool (2 ** search_round) and repeat steps 1-3.
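The search loop above can be sketched with toy stand-in functions (the real repo turns noise seeds into diffusion samples and scores them with a verifier such as an LLM grader; everything below is illustrative):

```python
import random

# Toy stand-ins: `generate` and `score` replace the real diffusion sampling
# and verifier scoring; only the search loop structure matters here.
def generate(seed):
    return f"image-from-seed-{seed}"

def score(seed):
    # deterministic toy metric per seed, in [0, 1)
    return random.Random(seed).random()

def random_search(num_rounds=3):
    rng = random.Random(0)
    best_seed, best_score = None, float("-inf")
    for search_round in range(1, num_rounds + 1):
        pool_size = 2 ** search_round            # noise pool doubles each round
        seeds = [rng.randrange(10**6) for _ in range(pool_size)]
        round_best = max(seeds, key=score)       # best generation this round
        if score(round_best) > best_score:
            best_seed, best_score = round_best, score(round_best)
    return generate(best_seed), best_score

image, metric = random_search()
```

Because the pool doubles each round, extra compute budget translates directly into a wider search, which is the random search strategy the paper describes.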

This constitutes the random search method as done in the paper by Google DeepMind.

Code, more results, and a bunch of other stuff are in the repository. Check it out here: https://github.com/sayakpaul/tt-scale-flux/ 🤗