victor (Victor Mustar)

reacted to retronic's post with 😎😎 1 day ago

Post

1354

Colox is out, may be bugs!

Colox is out and ready on HF, it might have bugs though as it is not tested yet. You can try for yourself now! :)

reacted to hexgrad's post with 🔥 1 day ago

Post

2369

Wanted: Peak Data. I'm collecting audio data to train another TTS model:
+ AVM data: ChatGPT Advanced Voice Mode audio & text from source
+ Professional audio: Permissive (CC0, Apache, MIT, CC-BY)

This audio should *impress* most native speakers, not just barely pass their audio Turing tests. Professional-caliber means S or A-tier, not your average bloke off the street. Traditional TTS may not make the cut. Absolutely no low-fi microphone recordings like Common Voice.

The bar is much higher than last time, so there are no timelines yet and I expect it may take longer to collect such mythical data. Raising the bar means evicting quite a bit of old data, and voice/language availability may decrease. The theme is *quality* over quantity. I would rather have 1 hour of A/S-tier than 100 hours of mid data.

I have nothing to offer but the north star of a future Apache 2.0 TTS model, so prefer data that you *already have* and costs you *nothing extra* to send. Additionally, *all* the new data may be used to construct public, Apache 2.0 voicepacks, and if that arrangement doesn't work for you, no need to send any audio.

Last time I asked for horses; now I'm asking for unicorns. As of writing this post, I've currently got a few English & Chinese unicorns, but there is plenty of room in the stable. Find me over on Discord at rzvzn: https://discord.gg/QuGxSWBfQy

reacted to m-ric's post with 👀 1 day ago

Post

1621

𝗔𝗱𝘆𝗲𝗻'𝘀 𝗻𝗲𝘄 𝗗𝗮𝘁𝗮 𝗔𝗴𝗲𝗻𝘁𝘀 𝗕𝗲𝗻𝗰𝗵𝗺𝗮𝗿𝗸 𝘀𝗵𝗼𝘄𝘀 𝘁𝗵𝗮𝘁 𝗗𝗲𝗲𝗽𝗦𝗲𝗲𝗸-𝗥𝟭 𝘀𝘁𝗿𝘂𝗴𝗴𝗹𝗲𝘀 𝗼𝗻 𝗱𝗮𝘁𝗮 𝘀𝗰𝗶𝗲𝗻𝗰𝗲 𝘁𝗮𝘀𝗸𝘀! ❌

➡️ How well do reasoning models perform on agentic tasks? Until now, all indicators seemed to show that they worked really well. On our recent reproduction of Deep Search, OpenAI's o1 was by far the best model to power an agentic system.

So when our partner Adyen built a huge benchmark of 450 data science tasks, and built data agents with smolagents to test different models, I expected reasoning models like o1 or DeepSeek-R1 to destroy the tasks at hand.

👎 But they really missed the mark. DeepSeek-R1 only got 1 or 2 out of 10 questions correct. Similarly, o1 was only at ~13% correct answers.

🧐 These results really surprised us. We thoroughly checked them, we even thought our APIs for DeepSeek were broken and colleagues Leandro Anton helped me start custom instances of R1 on our own H100s to make sure it worked well.
But there seemed to be no mistake. Reasoning LLMs actually did not seem that smart. Often, these models made basic mistakes, like forgetting the content of a folder that they had just explored, misspelling file names, or hallucinating data. Even though they do great at exploring webpages through several steps, the same level of multi-step planning seemed much harder to achieve when reasoning over files and data.

It seems like there's still lots of work to do in the Agents x Data space. Congrats to Adyen for this great benchmark, looking forward to see people proposing better agents! 🚀

Read more in the blog post 👉 https://huggingface.co/blog/dabstep

reacted to Xenova's post with 🔥 1 day ago

Post

2956

We did it. Kokoro TTS (v1.0) can now run 100% locally in your browser w/ WebGPU acceleration. Real-time text-to-speech without a server. ⚡️

Generate 10 seconds of speech in ~1 second for $0.

What will you build? 🔥
webml-community/kokoro-webgpu

The most difficult part was getting the model running in the first place, but the next steps are simple:
✂️ Implement sentence splitting, allowing for streamed responses
🌍 Multilingual support (only phonemization left)

Who wants to help?

6 replies

·

replied to sebblers's post 1 day ago

It might be because the quota is calculated before the space runs. For example, if only 2 minutes remain but the space is set to run for 3 minutes (by the author), the message still appears.

reacted to nicolay-r's post with 🧠 2 days ago

Post

1994

🚨 Key takeaway of a quick mastering Sentiment Analysis nowadays. Trough the questionare 📝 of the past RuOpinoinNE-2024 competition we got insights and participants model preference chocies. Our main conclusion:

✨ The submissions of the top performed models exploit Few-shot learning for LLM.

Takeaway note comparing with the prior RuSentNE-2023 competition:
🧠 Reasoning in steps requires more actions for tweaking. Most recent solutions empowered with Chain-of-Thouhgt are tend to think too much. Earlier we might see improvements for the Flan-T5 (2.8B) in fine-tuned mode but not among the zero-shot approaches.
nicolay-r/flan-t5-tsa-thor-xl

Related materials:
https://github.com/dialogue-evaluation/RuOpinionNE-2024
RuSentNE-2023: Evaluating Entity-Oriented Sentiment Analysis on Russian News Texts (2305.17679)
Large Language Models in Targeted Sentiment Analysis (2404.12342)

reacted to sebblers's post with 😔 2 days ago

Post

1961

Subscribed to pro a month ago because I wanted to get 25 minutes of zero gpu quota.

I get error messages saying that I have exceeded quota on ALL spaces on this site.

I haven't even used any quota. It says I have 25 minutes left to use. I can't try anything out!

Been like this for a whole month now. What is this!? What did I sign up for exactly?

5 replies

·

replied to sebblers's post 2 days ago

Mhh that's not normal your quota should reset daily. @cbensimon can probably help here. Sorry for the inconveniance.

reacted to grimjim's post with 😎 2 days ago

Post

2147

I've made yet another merge of reasoning models with incremental gains on the current Open LLM leaderboard.
open-llm-leaderboard/open_llm_leaderboard

Merging in DeepSeek R1 distillation to Llama 3.1 8B (at 10% task arithmetic weight, using the Llama 3.1 8B base model as the case rather than the instruct model) with a prior best merge resulted in a slightly lower IFEval, but a higher result in every other benchmark save for MMLU-PRO, which went down only marginally. MATH Lvl5 and GPQA went up palpably.
grimjim/DeepSauerHuatuoSkywork-R1-o1-Llama-3.1-8B

This result is currently my best Llama 3.1 8B merge result to date. The actual R1 distillation itself scored quite badly, so this would seem to be another case of unexpected formatting (reflected in IFEval) hurting the evaluation results, obscuring the strength of a model.

It is also possible to use the text generation feature of this model to generate roleplay completions. Based on informal testing, this model's bias toward problem-solving will subtly impact narration.

replied to their post 3 days ago

Specifically, the detailed status of individual spaces is now more difficult to understand visually than before. Whether it's private or not, whether you've liked it or not, whether it's RUNNING or not... etc.

Ok, I'll try to improve the contrast of it, should help?

replied to their post 3 days ago

hope you like it!

reacted to hexgrad's post with 👍 3 days ago

Post

5387

I wrote an article about G2P: https://hf.co/blog/hexgrad/g2p

G2P is an underrated piece of small TTS models, like offensive linemen who do a bunch of work and get no credit.

Instead of relying on explicit G2P, larger speech models implicitly learn this task by eating many thousands of hours of audio data. They often use a 500M+ parameter LLM at the front to predict latent audio tokens over a learned codebook, then decode these tokens into audio.

Kokoro instead relies on G2P preprocessing, is 82M parameters, and thus needs less audio to learn. Because of this, we can cherrypick high fidelity audio for training data, and deliver solid speech for those voices. In turn, this excellent audio quality & lack of background noise helps explain why Kokoro is very competitive in single-voice TTS Arenas.

1 reply

·

reacted to oleggolev's post with 🚀 5 days ago

Post

4341

🚀 Dobby-mini is out!

Last week, @SentientAGI released two demo models for the upcoming Dobby model family which we are building with your feedback: SentientAGI/dobby-mini-679af3ed45dfdd8c25e8112c

🔥 The two models (available as transformers and GGUF) are here:
- SentientAGI/Dobby-Mini-Unhinged-Llama-3.1-8B 😈
- SentientAGI/Dobby-Mini-Leashed-Llama-3.1-8B 😇

Fine-tuned from Llama-3.1-8B-Instruct while retaining benchmark performance, these personality-enhanced models are prime for building anything from AI companions and social agents to opinionated chatbots and content generators.

- 🦅 Pro-freedom
- 💸 Pro-crypto
- 💪 Opinionated and stand their ground

💻 Local Setup with Ollama:
- Written instructions: https://huggingface.co/blog/chrisaubin/hosting-dobby-mini
- Companion video: https://www.youtube.com/watch?v=b1rbtCgK2YA

🎆 Use via API on Fireworks for free!
- Unhinged: https://tinyurl.com/4h2c7tmv
- Leashed: https://tinyurl.com/2xjwsdxb

✌️ Try Dobby-mini via a Gradio demo:
- https://demo-dobby.sentient.xyz/
- No Internet search, ask it some personal questions!

Dobby-70B en route 😎

posted an update 5 days ago

Post

3652

Hey everyone, we've given https://hf.co/spaces page a fresh update!

Smart Search: Now just type what you want to do—like "make a viral meme" or "generate music"—and our search gets it.

New Categories: Check out the cool new filter bar with icons to help you pick a category fast.

Redesigned Space Cards: Reworked a bit to really show off the app descriptions, so you know what each Space does at a glance.

Random Prompt: Need ideas? Hit the dice button for a burst of inspiration.

We’d love to hear what you think—drop us some feedback plz!

5 replies

·

reacted to merve's post with 👍 7 days ago

Post

3746

This week in open AI was 🔥 Let's recap! 🤗 merve/january-31-releases-679a10669bd4030090c5de4d
LLMs 💬
> Huge: AllenAI released new Tülu models that outperform DeepSeek R1 using Reinforcement Learning with Verifiable Reward (RLVR) based on Llama 3.1 405B 🔥
> Mistral AI is back to open-source with their "small" 24B models (base & SFT), with Apache 2.0 license 😱
> Alibaba Qwen released their 1M context length models Qwen2.5-Instruct-1M, great for agentic use with Apache 2.0 license 🔥
> Arcee AI released Virtuoso-medium, 32.8B LLMs distilled from DeepSeek V3 with dataset of 5B+ tokens
> Velvet-14B is a new family of 14B Italian LLMs trained on 10T tokens in six languages
> OpenThinker-7B is fine-tuned version of Qwen2.5-7B-Instruct on OpenThoughts dataset

VLMs & vision 👀
> Alibaba Qwen is back with Qwen2.5VL, amazing new capabilities ranging from agentic computer use to zero-shot localization 🔥
> NVIDIA released new series of Eagle2 models with 1B and 9B sizes
> DeepSeek released Janus-Pro, new any-to-any model (image-text generation from image-text input) with MIT license
> BEN2 is a new background removal model with MIT license!

Audio 🗣️
> YuE is a new open-source music generation foundation model, lyrics-to-song generation

Codebase 👩🏻‍💻
> We are open-sourcing our SmolVLM training and eval codebase! https://github.com/huggingface/smollm/tree/main/vision
> Open-R1 is open-source reproduction of R1 by @huggingface science team https://huggingface.co/blog/open-r1

1 reply

·

reacted to chansung's post with 👍 7 days ago

Post

4109

A brief summary of the o3-mini

The OpenAI o3-mini model is a significant improvement over the o1-mini, reaching o1 performance levels. While generally good, its performance isn't universally better than previous models (o1, o1-prev.) or GPT-4o across all benchmarks. This means workflows should be re-evaluated with each model upgrade.

The o3-mini has "low," "medium," and "high" versions, with "low" being the base model used for benchmarking. It's speculated that the higher versions simply involve more processing. A fair comparison with other models like Gemini 2.0 Thinking or DeepSeek-R1 would likely need to use the "low" version and a similar "think more" mechanism.

The system card is recommended reading due to its comprehensive benchmark data.

https://openai.com/index/openai-o3-mini/

reacted to onekq's post with 👀 7 days ago

Post

1644

o3-mini is slightly better than R1, but lags behind Claude. Sorry folks, no new SOTA 😕

But OAI definitely owns the fashion of API. temperature and top_p are history now, reasoning_effort will be copied by other vendors.

onekq-ai/WebApp1K-models-leaderboard

4 replies

·

reacted to singhsidhukuldeep's post with 👀 7 days ago

Post

2367

Excited to share groundbreaking research in Knowledge Graph-based Retrieval-Augmented Generation (KG-RAG)!

Researchers from the University of Science and Technology of China have developed FRAG - a novel flexible modular framework that revolutionizes how Large Language Models (LLMs) reason with knowledge graphs.

What makes FRAG special? It intelligently adapts retrieval strategies based on query complexity without requiring expensive KG fine-tuning. The framework uses a reasoning-aware module to classify queries as simple or complex, then applies tailored retrieval pipelines.

Under the hood:
- For simple queries: Uses breadth-first search and ranking to efficiently find relevant paths
- For complex queries: Employs shortest path algorithms to minimize computational overhead
- Features a preprocessing-retrieval-postprocessing pipeline with flexible components
- Leverages traditional algorithms like PersonalizedPageRank for subgraph extraction
- Implements edge and path ranking models for precise information filtering

The results are impressive - FRAG achieves state-of-the-art performance while maintaining high efficiency and low resource consumption. On benchmark datasets like WebQSP and CWQ, it outperforms existing approaches by significant margins.

Most importantly, FRAG maintains flexibility and modularity while improving retrieval quality - no expensive LLM fine-tuning required! This makes it highly practical for real-world applications.

This work represents a major step forward in making LLMs more reliable and capable of complex reasoning tasks. Looking forward to seeing how this technology evolves!

2 replies

·

reacted to prithivMLmods's post with 🤗 7 days ago

Post

4700

o3-Mini and Deepseek R1
Worked out with some famous and weird examples.

🔥Blog: https://huggingface.co/blog/prithivMLmods/o3-mini-vs-deepseek-r1

Prompt : Using HTML, CSS, and JavaScript in a single HTML file to create a simulation of the solar system. Pay extreme attention to the UI to make it as intuitive as possible. Ensure that every planet appears as a sphere and is labeled with its corresponding name.

example 1: o3 Mini , example 2: Deepseek R1

Q2 : https://huggingface.co/blog/prithivMLmods/o3-mini-vs-deepseek-r1#q2--web-solar-system-explorer

1 reply

·

Victor Mustar PRO

AI & ML interests

Recent Activity

Organizations

victor's activity