Loubna Ben Allal

loubnabnl

AI & ML interests

SmolLMs, ML for code, data

Organizations

Hugging Face, BigScience Workshop, BigScience Catalogue Data, BigScience Data, HuggingFaceBR4, Team 8, CodeParrot, BigCode, Hugging Face H4, CompVis Community, BigCode Data, LocalCodeLLMs, Need4Speed, Code Llama, Hugging Face TB Research, Hugging Face Smol Cluster, Nt3awnou, huggingPartyParis, Qwen, ZeroGPU Explorers, HF AFAIK, gg-hf, Nanotron Research, Women on Hugging Face, Hugging Face SMOL, HuggingFaceFW, bigcode nvidia, Social Post Explorers, Dev Mode Explorers, Cosmopedia Stories Collab, StarCoder2 Data, Data Agents, Argilla Warehouse, smol-explorers, swissai-hf-data, Hugging Face Science

loubnabnl's activity

reacted to ginipick's post with 🔥 3 days ago
🌟 Digital Odyssey: AI Image & Video Generation Platform 🎨
Welcome to our all-in-one AI platform for image and video generation! 🚀
✨ Key Features

🎨 High-quality image generation from text
🎥 Video creation from still images
Multi-language support with automatic translation
🛠️ Advanced customization options

💫 Unique Advantages

⚡ Fast and accurate results using FLUX.1-dev and Hyper-SD models
🔒 Robust content safety filtering system
🎯 Intuitive user interface
🛠️ Extended toolkit including image upscaling and logo generation

🎮 How to Use

Enter your image or video description
Adjust settings as needed
Click generate
Save and share your results automatically

🔧 Tech Stack

FluxPipeline
Gradio
PyTorch
OpenCV

link: ginigen/Dokdo

Turn your imagination into reality with AI! ✨
#AI #ImageGeneration #VideoGeneration #MachineLearning #CreativeTech
reacted to anton-l's post with 🚀🔥 6 days ago
Introducing 📐 FineMath: the best public math pre-training dataset with 50B+ tokens!
HuggingFaceTB/finemath

Math remains challenging for LLMs and by training on FineMath we see considerable gains over other math datasets, especially on GSM8K and MATH.

We build the dataset by:
🛠️ carefully extracting math data from Common Crawl;
🔎 iteratively filtering and recalling high quality math pages using a classifier trained on synthetic annotations to identify math reasoning and deduction.
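
The filtering step can be pictured with a minimal Python sketch. It is not the FineMath pipeline: `toy_math_score` is a hypothetical keyword counter standing in for the trained classifier, the threshold is arbitrary, and the iterative recall step from the post is not modeled.

```python
from typing import Callable, Iterable, List


def filter_math_pages(
    pages: Iterable[str],
    score_fn: Callable[[str], int],
    threshold: int,
) -> List[str]:
    """Keep only the pages whose score reaches the threshold."""
    return [page for page in pages if score_fn(page) >= threshold]


def toy_math_score(text: str) -> int:
    """Hypothetical stand-in for a trained classifier: counts math cues, capped at 5."""
    cues = ("=", "theorem", "proof", "solve", "equation")
    lowered = text.lower()
    return min(5, sum(cue in lowered for cue in cues))


pages = [
    "Solve the equation 2x + 3 = 7 step by step.",
    "Top ten travel destinations for summer.",
    "Proof: by induction, the theorem holds for n = 1.",
]

# Only the two math-heavy pages pass the (arbitrary) threshold of 2.
kept = filter_math_pages(pages, toy_math_score, threshold=2)
```

In a real pipeline the scoring function would be a model call and the threshold would be chosen from ablations rather than fixed by hand.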

We conducted a series of ablations comparing the performance of Llama-3.2-3B-Base after continued pre-training on FineMath and observe notable gains compared to the baseline model and other public math datasets.

We hope this helps advance the performance of LLMs on math and reasoning! 🚀
We're also releasing all the ablation models as well as the evaluation code.

HuggingFaceTB/finemath-6763fb8f71b6439b653482c2
reacted to julien-c's post with 🔥❤️🤗 14 days ago
After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub

TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)
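
As a rough illustration of the private-storage rule above, the sketch below computes how much private storage would fall outside the free tier. The function name `billable_private_gb` is hypothetical and the 1 TB = 1000 GB conversion is an assumption; actual billing is defined by the linked docs, not by this snippet.

```python
def billable_private_gb(used_gb: float, paid_account: bool) -> float:
    """Return the private storage (in GB) above the free tier.

    Free tier sizes follow the post: 1 TB for paid accounts, 100 GB otherwise.
    Assumption: 1 TB is treated as 1000 GB.
    """
    free_tier_gb = 1000.0 if paid_account else 100.0
    return max(0.0, used_gb - free_tier_gb)


# 250 GB of private data exceeds the free tier only for a non-paid account.
over_free = billable_private_gb(250, paid_account=False)
over_paid = billable_private_gb(250, paid_account=True)
```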

docs: https://huggingface.co/docs/hub/storage-limits

We optimize our infrastructure continuously to scale our storage for the coming years of growth in Machine learning, to the benefit of the community 🔥

cc: @reach-vb @pierric @victor and the HF team
reacted to clem's post with 🔥❤️ 23 days ago
Hugging Face is becoming the best place to share the most viral AI apps with spaces.

Kolors Virtual Try-on just crossed 6,000,000 unique visitors & is now the #5 most popular space. Congrats to the Kwai Kolors team!

Kwai-Kolors/Kolors-Virtual-Try-On
reacted to merve's post with 🔥 23 days ago
Last week we were blessed with open-source models! A recap
merve/nov-29-releases-674ccc255a57baf97b1e2d31

🖼️ Multimodal
> At Hugging Face we released SmolVLM, a performant and efficient smol vision language model 💗
> Show Lab released ShowUI-2B: new vision-language-action model to build GUI/web automation agents 🤖
> Rhymes AI has released the base model of Aria: Aria-Base-64K and Aria-Base-8K with their respective context length
> ViDoRe team released ColSmolVLM: A new ColPali-like retrieval model based on SmolVLM
> Dataset: Llava-CoT-o1-Instruct: new dataset labelled using Llava-CoT multimodal reasoning model 📖
> Dataset: LLaVA-CoT-100k dataset used to train Llava-CoT released by creators of Llava-CoT 📕

💬 LLMs
> Qwen team released QwQ-32B-Preview, state-of-the-art open-source reasoning model, broke the internet 🔥
> AliBaba has released Marco-o1, a new open-source reasoning model 💥
> NVIDIA released Hymba 1.5B Base and Instruct, the new state-of-the-art SLMs with hybrid architecture (Mamba + transformer)

⏯️ Image/Video Generation
> Qwen2VL-Flux: new image generation model based on Qwen2VL image encoder, T5 and Flux for generation
> Lightricks released LTX-Video, a new DiT-based video generation model that can generate 24 FPS videos at 768x512 res ⏯️
> Dataset: Image Preferences is a new image generation preference dataset made with DIBT community effort of Argilla 🏷️

Audio
> OuteAI released OuteTTS-0.2-500M new multilingual text-to-speech model based on Qwen-2.5-0.5B trained on 5B audio prompt tokens
reacted to julien-c's post with 👀🔥 24 days ago
wow 😮

INTELLECT-1 is the first collaboratively trained 10 billion parameter language model trained from scratch on 1 trillion tokens of English text and code.

PrimeIntellect/INTELLECT-1-Instruct
reacted to merve's post with 🔥 29 days ago
Small yet mighty! 💫

We are releasing SmolVLM: a new 2B small vision language model made for on-device use, fine-tunable on consumer GPU, immensely memory efficient 🤠

We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base HuggingFaceTB/smolvlm-6740bd584b2dcbf51ecb1f39

Learn more from our blog here: huggingface.co/blog/smolvlm
This release comes with a demo, fine-tuning code, MLX integration and TRL integration for DPO
Try the demo: HuggingFaceTB/SmolVLM
Fine-tuning Recipe: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
Also TRL integration for DPO 💗
reacted to thomwolf's post with 🔥 30 days ago
reacted to openfree's post with 👀🔥 30 days ago
🤗 HuggingFace Trending TOP 300 Board - Featuring AI Rating System
📊 Service Introduction
A comprehensive dashboard that provides at-a-glance access to the real-time TOP 300 trending Spaces, Models, and Datasets on HuggingFace.
Our specially developed AI rating system evaluates the practical value and growth potential of each item.
⭐ Key Features
1. AI Rising Rate

Growth potential evaluation based on creation date and ranking
5-tier star rating system (★★★★★)
Evaluation Criteria:

Recency: Higher relative weights for recently created items
Ranking Impact: Higher relative weights for top rankings
Comprehensive assessment using statistical/analytical models applied to AI



2. AI Popularity Score

Comprehensive evaluation combining objective popularity and Rising Rate
18-tier grading system from AAA+ to B-
Evaluation Elements:

Base Score: Benchmark based on likes, downloads, comments, etc.
Additional Score: Rising Rate applied as a weighted factor
Comprehensive assessment using statistical/analytical models applied to AI



3. Visualization Features

Real-time screenshot capture with caching
Intuitive card-based UI
Responsive grid layout
Pastel gradient design
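
The post does not publish its formulas, so the following Python sketch is only one way the Rising Rate and Popularity Score described above could fit together. The equal weights, the 30-day recency scale, the function names, and the particular 18 grade labels between AAA+ and B- are all assumptions made for illustration.

```python
from datetime import date

# Assumed labels for the 18 tiers from AAA+ down to B-.
GRADES = [
    "AAA+", "AAA", "AAA-", "AA+", "AA", "AA-", "A+", "A", "A-",
    "BBB+", "BBB", "BBB-", "BB+", "BB", "BB-", "B+", "B", "B-",
]


def rising_rate(created: date, rank: int, today: date, total: int = 300) -> float:
    """Score in [0, 1]: newer items and better ranks score higher (hypothetical weights)."""
    age_days = max(0, (today - created).days)
    recency = 1.0 / (1.0 + age_days / 30.0)  # decays as the item gets older
    ranking = (total - rank + 1) / total     # rank 1 maps to 1.0
    return 0.5 * recency + 0.5 * ranking


def stars(rate: float) -> str:
    """Map a rate in [0, 1] onto a 1 to 5 star string."""
    count = min(5, max(1, 1 + int(rate * 5)))
    return "★" * count


def popularity_grade(base_score: float, rate: float) -> str:
    """Weight a normalized base score (likes, downloads, ...) by the rising rate."""
    combined = min(1.0, base_score * (0.5 + 0.5 * rate))
    index = min(len(GRADES) - 1, int((1.0 - combined) * len(GRADES)))
    return GRADES[index]


# An item created today and ranked first gets the maximum rate.
today = date(2024, 12, 1)
top_rate = rising_rate(created=today, rank=1, today=today)
```

Applying the rising rate as a multiplier keeps the base popularity dominant while still rewarding fast-growing items, which matches the "weighted factor" wording in the post.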

🎯 Applications

AI/ML Project Trend Analysis
Early Discovery of Promising Models/Datasets
Community Activity Monitoring
Research/Development Direction Reference

💡 Key Advantages

Real-time TOP 300 ranking
AI-based objective evaluation system
Fast loading with caching system
Intuitive and modern UI/UX
Integrated dashboard for 3 categories

🔄 Update Cycle

Real-time data reflection
Manual refresh option
Minimized server load through screenshot caching

Future Plans

Addition of detailed analysis report feature
Custom filtering options
Time-series trend analysis
Category-specific detailed statistics

How to Access
openfree/trending-board

#HuggingFace #AI #MachineLearning #TrendingBoard #DataScience
posted an update about 1 month ago
Making SmolLM2 reproducible: open-sourcing our training & evaluation toolkit 🛠️ https://github.com/huggingface/smollm/

- Pre-training code with nanotron
- Evaluation suite with lighteval
- Synthetic data generation using distilabel (powers our new SFT dataset HuggingFaceTB/smoltalk)
- Post-training scripts with TRL & the alignment handbook
- On-device tools with llama.cpp for summarization, rewriting & agents

Apache 2.0 licensed. V2 pre-training data mix coming soon!

Which other tools should we add next?
reacted to prithivMLmods's post with 🔥 about 1 month ago
Weekend Dribble 📦

Adapters for Product Ad Backdrops, Smooth Polaroids, Minimalist Sketch cards, Super Blends!!

Demo on: prithivMLmods/FLUX-LoRA-DLC

Stranger Zones :
👉🏼{ Super Blend } : strangerzonehf/Flux-Super-Blend-LoRA

👉🏼{ Product Concept Ad } : prithivMLmods/Flux-Product-Ad-Backdrop
👉🏼{ Frosted Mock-ups } : prithivMLmods/Flux.1-Dev-Frosted-Container-LoRA
👉🏼{ Polaroid Plus } : prithivMLmods/Flux-Polaroid-Plus
👉🏼{Sketch Cards} : prithivMLmods/Flux.1-Dev-Sketch-Card-LoRA

👉Stranger Zone: https://huggingface.co/strangerzonehf

👉Flux LoRA Collections: prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be

.
.
.
@prithivMLmods 🤗
reacted to merve's post with ❤️🚀 about 1 month ago
your hugging face profile now has your recent activities 🤗
reacted to merve's post with 🔥 about 1 month ago
What a week! A recap for everything you missed ❄️
merve/nov-22-releases-673fbbcfc1c97c4f411def07
Multimodal ✨
> Mistral AI released Pixtral 124B, a gigantic open vision language model
> Llava-CoT (formerly known as Llava-o1) was released, a multimodal reproduction of o1 model by PKU
> OpenGVLab released MMPR: a new multimodal reasoning dataset
> Jina has released Jina-CLIP-v2 0.98B multilingual multimodal embeddings
> Apple released new SotA vision encoders AIMv2

LLMs 🦙
> AllenAI dropped a huge release of models, datasets and scripts for Tülu, a family of models based on Llama 3.1 aligned with SFT, DPO and a new technique they have developed called RLVR
> Jina has released embeddings-v3: new multilingual embeddings with longer context
> Hugging Face released SmolTalk: synthetic dataset used to align SmolLM2 using supervised fine-tuning
> Microsoft released orca-agentinstruct-1M-v1: a gigantic instruction dataset of 1M synthetic instruction pairs

Image Generation 🖼️
> Black Forest Labs released Flux 1. tools: four new models for different image modifications and two LoRAs to do image conditioning and better steer generations

Lastly, Hugging Face released a new library Observers: a lightweight SDK for monitoring interactions with AI APIs and easily storing and browsing them 📚
$ pip install observers