Definitely NotJ

CrappyREincarnation

CrappyREincarnation's activity

reacted to chansung's post with 👍 3 days ago
A brief summary of o3-mini

The OpenAI o3-mini model is a significant improvement over o1-mini, reaching o1 performance levels. While generally good, its performance isn't universally better than previous models (o1, o1-preview) or GPT-4o across all benchmarks. This means workflows should be re-evaluated with each model upgrade.

The o3-mini has "low," "medium," and "high" versions, with "low" being the base model used for benchmarking. It's speculated that the higher versions simply involve more processing. A fair comparison with other models like Gemini 2.0 Thinking or DeepSeek-R1 would likely need to use the "low" version and a similar "think more" mechanism.
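
For a like-for-like benchmark run, the effort level can be pinned explicitly when calling the model. A minimal sketch with the OpenAI Python SDK, assuming the low/medium/high variants map onto the reasoning_effort request parameter (the prompt is just a placeholder):

```python
# Hedged sketch: pin o3-mini to its "low" effort level for benchmarking.
# Assumes the low/medium/high variants correspond to the reasoning_effort
# parameter; adjust if the API exposes them differently.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="low",  # "low" | "medium" | "high"
    messages=[{"role": "user", "content": "Solve step by step: 17 * 24 = ?"}],
)
print(response.choices[0].message.content)
```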

The system card is recommended reading due to its comprehensive benchmark data.

https://openai.com/index/openai-o3-mini/
reacted to kristaller486's post with 🚀 3 days ago
Nebo-T1-Russian

(Probably) the first "longCoT" dataset for the Russian language, created via DeepSeek-R1.

- Prompts taken from the Sky-T1 dataset and translated via Llama3.3-70B.
- Answers and reasoning generated by Deepseek-R1 (685B).
- 16.4K samples in total, ≈12.4K Russian-only (in the rest, either the answer or the reasoning is in English); see the loading sketch below.
- Languages in the answers and reasoning are labeled using fasttext.

kristaller486/Nebo-T1-Russian
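
A minimal sketch of loading the dataset and keeping only the Russian-only rows; the column names used for the language labels are assumptions for illustration, so check the dataset card for the actual schema:

```python
# Hedged sketch: load Nebo-T1-Russian and filter to Russian-only samples.
# "answer_lang" / "reasoning_lang" are hypothetical column names standing in
# for the fasttext language labels described in the post.
from datasets import load_dataset

ds = load_dataset("kristaller486/Nebo-T1-Russian", split="train")

ru_only = ds.filter(
    lambda ex: ex["answer_lang"] == "ru" and ex["reasoning_lang"] == "ru"
)
print(len(ds), len(ru_only))  # ≈16.4K total, ≈12.4K Russian-only per the post
```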
reacted to nicolay-r's post with 🔥 3 days ago
📢 The LLaMA-3.1-8B-based distilled version of DeepSeek-R1 is now available, alongside the Qwen-based one

📙 Notebook for using it to reason over a series of data 🧠:
https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_deep_seek_7b_distill_llama3.ipynb

Loading via the pipeline API of the transformers library (a minimal sketch follows at the end of this post):
https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/transformers_llama.py
🟡 GPU usage: 12.3 GB (FP16/FP32 mode), which fits on a T4 (about 1.5 GB less than the Qwen-distilled version)
🐌 Performance on a T4 instance: ~0.19 tokens/sec (FP32 mode) and ~0.22-0.30 tokens/sec (FP16 mode). Should it be that slow? 🤔
Model name: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
⭐ Framework: https://github.com/nicolay-r/bulk-chain
🌌 Notebooks and models hub: https://github.com/nicolay-r/nlp-thirdgate
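
As referenced above, a minimal sketch of the core pipeline call; the linked transformers_llama.py wrapper does more, and the prompt and generation settings here are illustrative only:

```python
# Hedged sketch: load the distilled model with the transformers pipeline API.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    torch_dtype=torch.float16,  # FP16 to stay within a T4's memory budget
    device_map="auto",
)

out = pipe(
    "Reason step by step: which is larger, 9.11 or 9.9?",
    max_new_tokens=256,
    do_sample=False,
)
print(out[0]["generated_text"])
```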
reacted to fffiloni's post with 🚀 3 days ago
Explain like I'm 5: the latest take from @thomwolf on X about Dario's essay on DeepSeek:

→ Open-source AI is like a big cookbook that everyone can read and improve. Instead of a few chefs keeping their recipes secret, anyone can cook, test, and invent new things.

If only one company controls AI, everything stops if they have a problem—like when the internet goes down. With open-source, many people can help, making sure it keeps running smoothly.

AI isn’t just a race between two countries; it’s a team effort around the world. By sharing, we move faster and create safer technology for everyone.

🤗
reacted to luigi12345's post with 🔥 4 days ago
🚀 OpenAI o3-mini Just Dropped – Here’s What You Need to Know!

OpenAI just launched o3-mini, a faster, smarter upgrade over o1-mini. It’s better at math, coding, and logic, making it more reliable for structured tasks. Now available in ChatGPT & API, with function calling, structured outputs, and system messages.
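
As a quick illustration of the function-calling support, here is a hedged sketch with the OpenAI Python SDK; the get_weather tool and its schema are invented for the example, and the model may of course answer without calling it:

```python
# Hedged sketch: one function-calling round trip with o3-mini.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, not a real API
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="o3-mini",
    messages=[{"role": "user", "content": "What's the weather in Madrid?"}],
    tools=tools,
)

# If the model chose to call the tool, inspect the requested arguments
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```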

🔥 Why does this matter?
✅ Stronger in logic, coding, and structured reasoning
✅ Function calling now works reliably for API responses
✅ More stable & efficient for production tasks
✅ Faster responses with better accuracy

⚠️ Who should use it?
✔️ Great for coding, API calls, and structured Q&A
❌ Not meant for long conversations or complex reasoning (GPT-4 is better)

💡 Free users: Try it under “Reason” mode in ChatGPT
💡 Plus/Team users: Daily message limit tripled to 150/day!
reacted to merve's post with 👍 4 days ago
This week in open AI was 🔥 Let's recap! 🤗 merve/january-31-releases-679a10669bd4030090c5de4d
LLMs 💬
> Huge: AllenAI released new Tülu models that outperform DeepSeek R1 using Reinforcement Learning with Verifiable Rewards (RLVR), based on Llama 3.1 405B 🔥
> Mistral AI is back to open-source with their "small" 24B models (base & SFT), with Apache 2.0 license 😱
> Alibaba Qwen released their 1M context length models Qwen2.5-Instruct-1M, great for agentic use with Apache 2.0 license 🔥
> Arcee AI released Virtuoso-Medium, a 32.8B LLM distilled from DeepSeek-V3 with a dataset of 5B+ tokens
> Velvet-14B is a new family of 14B Italian LLMs trained on 10T tokens in six languages
> OpenThinker-7B is a fine-tuned version of Qwen2.5-7B-Instruct on the OpenThoughts dataset

VLMs & vision 👀
> Alibaba Qwen is back with Qwen2.5VL, amazing new capabilities ranging from agentic computer use to zero-shot localization 🔥
> NVIDIA released new series of Eagle2 models with 1B and 9B sizes
> DeepSeek released Janus-Pro, new any-to-any model (image-text generation from image-text input) with MIT license
> BEN2 is a new background removal model with MIT license!

Audio 🗣️
> YuE is a new open-source music-generation foundation model for lyrics-to-song generation

Codebase 👩🏻‍💻
> We are open-sourcing our SmolVLM training and eval codebase! https://github.com/huggingface/smollm/tree/main/vision
> Open-R1 is an open-source reproduction of R1 by the @huggingface science team https://huggingface.co/blog/open-r1
reacted to jasoncorkill's post with 🚀 4 days ago
We benchmarked @xai-org's Aurora model; as far as we know, this is the first public evaluation of the model at scale.

We collected 401k human annotations over the past ~2 days for this, and we have uploaded all of the annotation data here on Hugging Face with a fully permissive license:
Rapidata/xAI_Aurora_t2i_human_preferences
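
A quick, hedged way to inspect the released annotations; the printed schema depends on the actual dataset layout, which is not described in the post:

```python
# Hedged sketch: peek at the released preference annotations.
from datasets import load_dataset

prefs = load_dataset("Rapidata/xAI_Aurora_t2i_human_preferences", split="train")
print(prefs)     # row count and column names
print(prefs[0])  # one raw annotation record
```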
reacted to fdaudens's post with 🔥 4 days ago
🎯 Kokoro TTS just hit v1.0! 🚀

Small but mighty: 82M parameters, runs locally, speaks multiple languages. The best part? It's Apache 2.0 licensed!
This could unlock so many possibilities ✨

Check it out: hexgrad/Kokoro-82M
reacted to hexgrad's post with 👀 4 days ago
Technical question: Is Abliteration still an effective method for uncensoring LLMs? Generally, what are the most effective methods to uncensor LLMs?

An effective uncensoring method would ideally be low-cost, data-efficient, and above all, successfully uncensor an LLM with minimal benchmark regressions.

"Tiananmen Square", "Winnie-the-Pooh", etc and more broadly "China influence/censorship" are some common criticisms leveled at DeepSeek.

I am vaguely aware of "Abliteration", a technique coined by @failspy (apologies if that attribution is incorrect) and originally described in a mid-2024 paper titled "Refusal in Language Models Is Mediated by a Single Direction" https://arxiv.org/abs/2406.11717

Abliteration is proposed as a relatively cheap and effective way to bypass censorship in models. However, it is not without criticism: https://www.reddit.com/r/LocalLLaMA/comments/1f07b4b/abliteration_fails_to_uncensor_models_while_it/

Curious to hear people's takes on Abliteration or other uncensoring methods, especially as it relates to DeepSeek.
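
For concreteness, here is a toy sketch of the core computation that the cited paper describes (a difference-of-means refusal direction, then projecting it out). Random tensors stand in for real model activations, and the layer choice, prompt sets, and weight convention are all assumptions:

```python
# Toy sketch of the "single refusal direction" idea behind abliteration.
# Real implementations hook a chosen layer's residual stream on harmful vs.
# harmless prompt sets, then orthogonalize the model's weights against the
# resulting direction; here random tensors stand in for those activations.
import torch

hidden = 64                           # stand-in for the model's hidden size
harmful = torch.randn(100, hidden)    # activations on refusal-triggering prompts
harmless = torch.randn(100, hidden)   # activations on benign prompts

# 1. Refusal direction = normalized difference of mean activations
direction = harmful.mean(dim=0) - harmless.mean(dim=0)
direction = direction / direction.norm()

# 2. Ablation at runtime = remove the component along that direction
def ablate(x: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    return x - (x @ d).unsqueeze(-1) * d

# 3. Or bake the projection into a weight matrix W that writes to the residual
#    stream (convention o = W @ x): W' = (I - d d^T) W
W = torch.randn(hidden, hidden)
W_abliterated = W - torch.outer(direction, direction) @ W

print(ablate(harmful, direction) @ direction)  # ≈ 0: refusal component removed
```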