Guilherme Penedo
guipenedo
884 followers · 22 following
gui_penedo
guipenedo
AI & ML interests
None yet
Recent Activity
commented on an article 1 day ago
Open-R1: Update #1
upvoted an article 1 day ago
Open-R1: Update #1
updated a dataset 3 days ago
HuggingFaceFW/fineweb-edu-score-2
Articles
Open-R1: Update #1
1 day ago • 175
FineWeb2-C: Help Build Better Language Models in Your Language
Dec 23, 2024 • 18
Organizations
guipenedo's activity
New activity in huggingface-legal/takedown-notices · 3 days ago
Update 2025/2025-01-22-Torstar.md
#4 opened 3 days ago by guipenedo
New activity in HuggingFaceFW/fineweb-edu · 11 days ago
New update returns a 500 server error using the datasets-server API
6
#18 opened about 1 month ago by jonna32
New activity in HuggingFaceFW/fineweb-2 · 14 days ago
Synthetic Data Generator
1
#5 opened 23 days ago by kishorekashyap
New activity in HuggingFaceFW/fineweb-2 · 26 days ago
Cannot load with datasets
3
#4 opened 26 days ago by mbanon
New activity in HuggingFaceFW/fineweb-edu · 28 days ago
A lot of load errors after new update
14
#19 opened 28 days ago by yzhangcs
Add "date" column to "default" subset
#20 opened 28 days ago by lhoestq
New activity in HuggingFaceFW/fineweb · about 2 months ago
Simple exact deduplication removes 2/3 of data.
4
#49 opened 6 months ago by egor-pakhomov
Torrent?
3
#4 opened 10 months ago by emilss
Any plan to train models on larger subset of dataset?
1
#8 opened 10 months ago by mrfakename
Are copyrighted works included in this dataset?
4
#9 opened 10 months ago by umm-maybe
Reprocessing for a new language
14
#12 opened 9 months ago by pere
Training configs for data ablation study
2
#14 opened 9 months ago by jimmyhbx
tiny-fineweb
3
#19 opened 9 months ago by 3thn
Unsafe files
1
#25 opened 9 months ago by alielfilali01
"Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20" using fineweb by Karpathy
#28 opened 8 months ago by clem
Regarding to the newly updated indexes(writen as deduplication issues)
5
#29 opened 8 months ago by kimcando
Dedup
1
#32 opened 8 months ago by shawnkx
Language subset
3
#33 opened 8 months ago by talmor
How to compute the aggerate score?
1
#35 opened 8 months ago by mornmirror
why do you apply "All filters except the (very destructive) terminal_punct"
3
#36 opened 8 months ago by bpwl0121