
Victor Mustar PRO

victor

AI & ML interests

Building the UX of this website

Organizations

Hugging Face, Google, Competitions, Safetensors, 21 RNN, Spaces-explorers, Text Generation Inference, CVPR Demo Track, Spaces Examples, Hugging Chat, Webhooks Explorers (BETA), lora concepts library, Scanned Tokens, Huggingface Projects, hf admins, Hugging Face OSS Metrics, Stable Diffusion Dreambooth Concepts Library, Core ML Projects, temp-org, Blog-explorers, Mustarz, Open LLM Leaderboard, Enterprise Explorers, The Collectionists, ZeroGPU Explorers, Hugging Face Tools, TstOrg141, Stable Video benchmark, Social Post Explorers, Dev Mode Explorers, LLHF, SLLHF, Self-serve FTW, Inference Explorers

victor's activity

reacted to retronic's post with 😎 1 day ago
Colox is out, but there may be bugs!

Colox is out and ready on HF, though it might have bugs since it hasn't been tested yet. You can try it for yourself now! :)
reacted to hexgrad's post with 🔥 1 day ago
Wanted: Peak Data. I'm collecting audio data to train another TTS model:
+ AVM data: ChatGPT Advanced Voice Mode audio & text from source
+ Professional audio: Permissive (CC0, Apache, MIT, CC-BY)

This audio should *impress* most native speakers, not just barely pass their audio Turing tests. Professional-caliber means S or A-tier, not your average bloke off the street. Traditional TTS may not make the cut. Absolutely no low-fi microphone recordings like Common Voice.

The bar is much higher than last time, so there are no timelines yet and I expect it may take longer to collect such mythical data. Raising the bar means evicting quite a bit of old data, and voice/language availability may decrease. The theme is *quality* over quantity. I would rather have 1 hour of A/S-tier than 100 hours of mid data.

I have nothing to offer but the north star of a future Apache 2.0 TTS model, so prefer data that you *already have* and costs you *nothing extra* to send. Additionally, *all* the new data may be used to construct public, Apache 2.0 voicepacks, and if that arrangement doesn't work for you, no need to send any audio.

Last time I asked for horses; now I'm asking for unicorns. As of writing this post, I've currently got a few English & Chinese unicorns, but there is plenty of room in the stable. Find me over on Discord at rzvzn: https://discord.gg/QuGxSWBfQy
reacted to m-ric's post with 👀 1 day ago
Adyen's new Data Agents Benchmark shows that DeepSeek-R1 struggles on data science tasks! ❌

โžก๏ธ How well do reasoning models perform on agentic tasks? Until now, all indicators seemed to show that they worked really well. On our recent reproduction of Deep Search, OpenAI's o1 was by far the best model to power an agentic system.

So when our partner Adyen built a huge benchmark of 450 data science tasks, and built data agents with smolagents to test different models, I expected reasoning models like o1 or DeepSeek-R1 to destroy the tasks at hand.

👎 But they really missed the mark. DeepSeek-R1 only got 1 or 2 out of 10 questions correct. Similarly, o1 was only at ~13% correct answers.

๐Ÿง These results really surprised us. We thoroughly checked them, we even thought our APIs for DeepSeek were broken and colleagues Leandro Anton helped me start custom instances of R1 on our own H100s to make sure it worked well.
But there seemed to be no mistake. Reasoning LLMs actually did not seem that smart. Often, these models made basic mistakes, like forgetting the content of a folder that they had just explored, misspelling file names, or hallucinating data. Even though they do great at exploring webpages through several steps, the same level of multi-step planning seemed much harder to achieve when reasoning over files and data.

It seems like there's still lots of work to do in the Agents x Data space. Congrats to Adyen for this great benchmark, looking forward to seeing people propose better agents! 🚀

Read more in the blog post 👉 https://huggingface.co/blog/dabstep
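
For a concrete sense of the setup, here is a minimal smolagents sketch of a data agent in the spirit of the benchmark above; the model id, file name, and question are illustrative assumptions, not the actual DABstep harness.

```python
# Hedged sketch of a smolagents data agent; identifiers below are assumptions.
from smolagents import CodeAgent, HfApiModel

model = HfApiModel(model_id="deepseek-ai/DeepSeek-R1")  # any chat model endpoint
agent = CodeAgent(
    tools=[],                                  # the agent just writes and runs Python
    model=model,
    additional_authorized_imports=["pandas"],  # let it load and inspect CSV files
)

answer = agent.run(
    "The folder ./data contains payments.csv. "
    "What was the total fee paid in 2023? Answer with a single number."
)
print(answer)
```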
reacted to Xenova's post with 🔥 1 day ago
We did it. Kokoro TTS (v1.0) can now run 100% locally in your browser w/ WebGPU acceleration. Real-time text-to-speech without a server. ⚡️

Generate 10 seconds of speech in ~1 second for $0.

What will you build? 🔥
webml-community/kokoro-webgpu

The most difficult part was getting the model running in the first place, but the next steps are simple:
โœ‚๏ธ Implement sentence splitting, allowing for streamed responses
๐ŸŒ Multilingual support (only phonemization left)

Who wants to help?
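
The demo above runs entirely in the browser; for a rough idea of driving the same model from Python instead, here is a hedged sketch based on my reading of the Kokoro-82M model card (the package name, the KPipeline API, and the af_heart voice are assumptions):

```python
# Hedged sketch: server-side Kokoro v1.0 inference via the `kokoro` package.
# pip install kokoro soundfile  (espeak-ng may also be required for fallback G2P)
import soundfile as sf
from kokoro import KPipeline

pipeline = KPipeline(lang_code="a")  # "a" = American English (assumed)
text = "Kokoro can also be run outside the browser."

# The pipeline yields (graphemes, phonemes, audio) chunks per sentence.
for i, (graphemes, phonemes, audio) in enumerate(pipeline(text, voice="af_heart")):
    sf.write(f"kokoro_{i}.wav", audio, 24000)  # 24 kHz output (assumed)
```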
replied to sebblers's post 1 day ago

It might be because the quota is calculated before the space runs. For example, if only 2 minutes remain but the space is set to run for 3 minutes (by the author), the message still appears.

reacted to nicolay-r's post with 🧠 2 days ago
🚨 Key takeaway on quickly mastering Sentiment Analysis nowadays. Through the questionnaire 📝 of the past RuOpinionNE-2024 competition, we gathered insights into participants' model preference choices. Our main conclusion:

✨ The top-performing submissions exploit few-shot learning with LLMs.

Takeaway note compared with the prior RuSentNE-2023 competition:
🧠 Reasoning in steps requires more tweaking. Most recent solutions empowered with Chain-of-Thought tend to think too much. Earlier, we saw improvements for Flan-T5 (2.8B) in fine-tuned mode but not among the zero-shot approaches.
nicolay-r/flan-t5-tsa-thor-xl

Related materials:
https://github.com/dialogue-evaluation/RuOpinionNE-2024
RuSentNE-2023: Evaluating Entity-Oriented Sentiment Analysis on Russian News Texts (2305.17679)
Large Language Models in Targeted Sentiment Analysis (2404.12342)
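
To make the few-shot takeaway concrete, below is a minimal sketch of few-shot prompting for targeted (entity-oriented) sentiment; the model choice and the examples are my own assumptions, not the competition setup.

```python
# Illustrative few-shot prompt for entity-oriented sentiment classification.
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

few_shot = (
    "Classify the sentiment expressed toward the given entity as "
    "positive, negative, or neutral.\n\n"
    "Text: The new mayor finally fixed the roads. Entity: mayor -> positive\n"
    "Text: The airline lost my luggage again. Entity: airline -> negative\n"
    "Text: The committee will meet on Friday. Entity: committee -> neutral\n"
    "Text: Critics praised the director's bold visual style. Entity: director ->"
)

out = generator(few_shot, max_new_tokens=5, do_sample=False)
# The continuation after the last "->" is the predicted label (expected: positive).
print(out[0]["generated_text"].split("->")[-1].strip())
```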
reacted to sebblers's post with 😔 2 days ago
Subscribed to PRO a month ago because I wanted to get 25 minutes of ZeroGPU quota.

I get error messages saying that I have exceeded quota on ALL spaces on this site.

I haven't even used any quota. It says I have 25 minutes left to use. I can't try anything out!

Been like this for a whole month now. What is this!? What did I sign up for exactly?
replied to sebblers's post 2 days ago

Mhh, that's not normal: your quota should reset daily. @cbensimon can probably help here. Sorry for the inconvenience.

reacted to grimjim's post with 😎 2 days ago
I've made yet another merge of reasoning models with incremental gains on the current Open LLM leaderboard.
open-llm-leaderboard/open_llm_leaderboard

Merging in DeepSeek R1 distillation to Llama 3.1 8B (at 10% task arithmetic weight, using the Llama 3.1 8B base model as the base rather than the instruct model) with a prior best merge resulted in a slightly lower IFEval, but a higher result in every other benchmark save for MMLU-PRO, which went down only marginally. MATH Lvl5 and GPQA went up palpably.
grimjim/DeepSauerHuatuoSkywork-R1-o1-Llama-3.1-8B

This is my best Llama 3.1 8B merge result to date. The actual R1 distillation itself scored quite badly, so this would seem to be another case of unexpected formatting (reflected in IFEval) hurting the evaluation results, obscuring the strength of a model.

It is also possible to use the text generation feature of this model to generate roleplay completions. Based on informal testing, this model's bias toward problem-solving will subtly impact narration.
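
For readers unfamiliar with task arithmetic, here is a bare-bones sketch of the core operation (base + weight * (donor - base)). This is only an illustration, not the recipe behind the merge above, which also folds in a prior merge and was presumably produced with a dedicated merge toolkit; the donor model id is an assumption.

```python
# Hedged sketch of task arithmetic merging at 10% weight (illustrative only).
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", torch_dtype=torch.bfloat16
)
donor = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B", torch_dtype=torch.bfloat16
)

weight = 0.10  # the 10% task-arithmetic weight mentioned above
base_sd, donor_sd = base.state_dict(), donor.state_dict()

# merged = base + weight * (donor - base), assuming identical parameter shapes
merged = {name: p + weight * (donor_sd[name] - p) for name, p in base_sd.items()}

base.load_state_dict(merged)
base.save_pretrained("llama-3.1-8b-r1-task-arithmetic-10pct")
```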
replied to their post 3 days ago

Specifically, the detailed status of individual spaces is now more difficult to understand visually than before. Whether it's private or not, whether you've liked it or not, whether it's RUNNING or not... etc.

Ok, I'll try to improve the contrast of it; that should help?

reacted to hexgrad's post with 👍 3 days ago
I wrote an article about G2P: https://hf.co/blog/hexgrad/g2p

G2P is an underrated piece of small TTS models, like offensive linemen who do a bunch of work and get no credit.

Instead of relying on explicit G2P, larger speech models implicitly learn this task by eating many thousands of hours of audio data. They often use a 500M+ parameter LLM at the front to predict latent audio tokens over a learned codebook, then decode these tokens into audio.

Kokoro instead relies on G2P preprocessing, is 82M parameters, and thus needs less audio to learn. Because of this, we can cherrypick high fidelity audio for training data, and deliver solid speech for those voices. In turn, this excellent audio quality & lack of background noise helps explain why Kokoro is very competitive in single-voice TTS Arenas.
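
To show what G2P preprocessing actually produces, here is a small sketch using the general-purpose phonemizer library with the espeak backend; it is only an illustration of graphemes-to-phonemes, not necessarily the exact G2P stack Kokoro uses.

```python
# Hedged G2P sketch: graphemes in, phonemes out (requires espeak-ng installed).
from phonemizer import phonemize

text = "Hello world, this is grapheme to phoneme conversion."
phonemes = phonemize(text, language="en-us", backend="espeak", strip=True)
print(phonemes)  # IPA-like phoneme string the TTS model consumes instead of raw text
```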
reacted to oleggolev's post with 🚀 5 days ago
🚀 Dobby-mini is out!

Last week, @SentientAGI released two demo models for the upcoming Dobby model family which we are building with your feedback: SentientAGI/dobby-mini-679af3ed45dfdd8c25e8112c

🔥 The two models (available as transformers and GGUF) are here:
- SentientAGI/Dobby-Mini-Unhinged-Llama-3.1-8B 😈
- SentientAGI/Dobby-Mini-Leashed-Llama-3.1-8B 😇

Fine-tuned from Llama-3.1-8B-Instruct while retaining benchmark performance, these personality-enhanced models are prime for building anything from AI companions and social agents to opinionated chatbots and content generators.

- 🦅 Pro-freedom
- 💸 Pro-crypto
- 💪 Opinionated and stand their ground

💻 Local Setup with Ollama:
- Written instructions: https://huggingface.co/blog/chrisaubin/hosting-dobby-mini
- Companion video: https://www.youtube.com/watch?v=b1rbtCgK2YA

🎆 Use via API on Fireworks for free!
- Unhinged: https://tinyurl.com/4h2c7tmv
- Leashed: https://tinyurl.com/2xjwsdxb

✌️ Try Dobby-mini via a Gradio demo:
- https://demo-dobby.sentient.xyz/
- No Internet search, ask it some personal questions!

Dobby-70B en route 😎
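
A minimal transformers sketch for trying one of the checkpoints above locally; the generation settings here are arbitrary choices, not recommendations from the release.

```python
# Hedged sketch: chat with Dobby-Mini via the standard transformers pipeline.
import torch
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="SentientAGI/Dobby-Mini-Unhinged-Llama-3.1-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "What do you think about open-source AI?"}]
out = chat(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```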
posted an update 5 days ago
Hey everyone, we've given the https://hf.co/spaces page a fresh update!

Smart Search: now just type what you want to do, like "make a viral meme" or "generate music", and our search gets it.

New Categories: Check out the cool new filter bar with icons to help you pick a category fast.

Redesigned Space Cards: Reworked a bit to really show off the app descriptions, so you know what each Space does at a glance.

Random Prompt: Need ideas? Hit the dice button for a burst of inspiration.

We'd love to hear what you think, so drop us some feedback plz!
reacted to merve's post with 👍 7 days ago
This week in open AI was 🔥 Let's recap! 🤗 merve/january-31-releases-679a10669bd4030090c5de4d
LLMs 💬
> Huge: AllenAI released new Tülu models that outperform DeepSeek R1 using Reinforcement Learning with Verifiable Reward (RLVR) based on Llama 3.1 405B 🔥
> Mistral AI is back to open-source with their "small" 24B models (base & SFT), with Apache 2.0 license 😱
> Alibaba Qwen released their 1M context length models Qwen2.5-Instruct-1M, great for agentic use with Apache 2.0 license 🔥
> Arcee AI released Virtuoso-medium, a 32.8B LLM distilled from DeepSeek V3 with a dataset of 5B+ tokens
> Velvet-14B is a new family of 14B Italian LLMs trained on 10T tokens in six languages
> OpenThinker-7B is a fine-tuned version of Qwen2.5-7B-Instruct on the OpenThoughts dataset

VLMs & vision 👀
> Alibaba Qwen is back with Qwen2.5VL, amazing new capabilities ranging from agentic computer use to zero-shot localization 🔥
> NVIDIA released new series of Eagle2 models with 1B and 9B sizes
> DeepSeek released Janus-Pro, new any-to-any model (image-text generation from image-text input) with MIT license
> BEN2 is a new background removal model with MIT license!

Audio 🗣️
> YuE is a new open-source music generation foundation model, lyrics-to-song generation

Codebase 👩🏻‍💻
> We are open-sourcing our SmolVLM training and eval codebase! https://github.com/huggingface/smollm/tree/main/vision
> Open-R1 is an open-source reproduction of R1 by the @huggingface science team https://huggingface.co/blog/open-r1
reacted to chansung's post with 👍 7 days ago
A brief summary of the o3-mini

The OpenAI o3-mini model is a significant improvement over the o1-mini, reaching o1 performance levels. While generally good, its performance isn't universally better than previous models (o1, o1-prev.) or GPT-4o across all benchmarks. This means workflows should be re-evaluated with each model upgrade.

The o3-mini has "low," "medium," and "high" versions, with "low" being the base model used for benchmarking. It's speculated that the higher versions simply involve more processing. A fair comparison with other models like Gemini 2.0 Thinking or DeepSeek-R1 would likely need to use the "low" version and a similar "think more" mechanism.

The system card is recommended reading due to its comprehensive benchmark data.

https://openai.com/index/openai-o3-mini/
reacted to onekq's post with 👀 7 days ago
o3-mini is slightly better than R1, but lags behind Claude. Sorry folks, no new SOTA 😕

But OAI definitely sets the fashion for APIs: temperature and top_p are history now, and reasoning_effort will be copied by other vendors.

onekq-ai/WebApp1K-models-leaderboard
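
A quick sketch of the reasoning_effort knob mentioned above, via the OpenAI Python SDK (an OPENAI_API_KEY is assumed to be set):

```python
# Hedged sketch: o3-mini exposes reasoning_effort instead of temperature/top_p.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # "low" | "medium" | "high"
    messages=[{"role": "user", "content": "Write a one-line Python palindrome check."}],
)
print(resp.choices[0].message.content)
```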
reacted to singhsidhukuldeep's post with 👀 7 days ago
Excited to share groundbreaking research in Knowledge Graph-based Retrieval-Augmented Generation (KG-RAG)!

Researchers from the University of Science and Technology of China have developed FRAG - a novel flexible modular framework that revolutionizes how Large Language Models (LLMs) reason with knowledge graphs.

What makes FRAG special? It intelligently adapts retrieval strategies based on query complexity without requiring expensive KG fine-tuning. The framework uses a reasoning-aware module to classify queries as simple or complex, then applies tailored retrieval pipelines.

Under the hood:
- For simple queries: Uses breadth-first search and ranking to efficiently find relevant paths
- For complex queries: Employs shortest path algorithms to minimize computational overhead
- Features a preprocessing-retrieval-postprocessing pipeline with flexible components
- Leverages traditional algorithms like PersonalizedPageRank for subgraph extraction
- Implements edge and path ranking models for precise information filtering
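
A toy networkx sketch of the graph primitives listed above (Personalized PageRank for subgraph extraction, then BFS or shortest paths depending on query complexity); this is illustrative only, not the authors' pipeline, and uses a stand-in graph instead of a real knowledge graph.

```python
# Hedged sketch of FRAG-style retrieval primitives on a toy graph.
import networkx as nx

G = nx.karate_club_graph()  # stand-in for a knowledge graph
seeds = {0: 1.0}            # entities mentioned in the query

# Subgraph extraction via Personalized PageRank around the seed entities
ppr = nx.pagerank(G, personalization=seeds)
top_nodes = sorted(ppr, key=ppr.get, reverse=True)[:10]
subgraph = G.subgraph(top_nodes)

# Simple query: breadth-first exploration from the seed entity
bfs_order = list(nx.bfs_tree(subgraph, source=0))

# Complex query: shortest path between two entities to keep reasoning hops minimal
path = nx.shortest_path(G, source=0, target=top_nodes[-1])
print(bfs_order, path, sep="\n")
```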

The results are impressive - FRAG achieves state-of-the-art performance while maintaining high efficiency and low resource consumption. On benchmark datasets like WebQSP and CWQ, it outperforms existing approaches by significant margins.

Most importantly, FRAG maintains flexibility and modularity while improving retrieval quality - no expensive LLM fine-tuning required! This makes it highly practical for real-world applications.

This work represents a major step forward in making LLMs more reliable and capable of complex reasoning tasks. Looking forward to seeing how this technology evolves!
reacted to prithivMLmods's post with 🤗 7 days ago
o3-Mini and Deepseek R1
Worked through some famous and weird examples.

🔥 Blog: https://huggingface.co/blog/prithivMLmods/o3-mini-vs-deepseek-r1

Prompt : Using HTML, CSS, and JavaScript in a single HTML file to create a simulation of the solar system. Pay extreme attention to the UI to make it as intuitive as possible. Ensure that every planet appears as a sphere and is labeled with its corresponding name.

Example 1: o3-mini; Example 2: DeepSeek R1

Q2: https://huggingface.co/blog/prithivMLmods/o3-mini-vs-deepseek-r1#q2--web-solar-system-explorer