JournalistsonHF's Collections
Datasets
Updated Oct 1
A curated list of datasets to train your models
HuggingFaceFW/fineweb-edu • Viewer • Updated 30 days ago • 3B • 568k • 530
google/frames-benchmark • Viewer • Updated 25 days ago • 824 • 1.61k • 160
▶ FineVideo Explorer • Running on CPU Upgrade • 90
🧬 Synthetic Data Generator • Running • 202 • Build datasets using natural language
HuggingFaceFV/finevideo • Viewer • Updated 5 days ago • 39.5k • 19.6k • 266
CIVICS-dataset/CIVICS • Viewer • Updated May 13 • 700 • 56 • 5
HuggingFaceFW/fineweb • Viewer • Updated Jul 16 • 46B • 377k • 1.74k
HuggingFaceTB/cosmopedia • Viewer • Updated Aug 12 • 31.1M • 10.8k • 561
academic-datasets/AMMeBa • Preview • Updated May 21 • 44
HuggingFaceM4/OBELICS • Viewer • Updated Aug 22, 2023 • 276M • 22.4k • 140
bigcode/the-stack-v2 • Viewer • Updated Apr 23 • 5.45B • 8.58k • 281
pixparse/pdfa-eng-wds • Viewer • Updated Mar 29 • 7.1k • 8.44k • 139
pixparse/idl-wds • Viewer • Updated Mar 29 • 3.41M • 6.91k • 175
argilla/OpenHermesPreferences • Viewer • Updated Mar 1 • 989k • 3.58k • 199
argilla/Capybara-Preferences • Viewer • Updated May 9 • 15.4k • 303 • 38
PleIAs/YouTube-Commons • Updated Jun 26 • 991 • 316
PleIAs/French-PD-Newspapers • Viewer • Updated Mar 19 • 2.25M • 827 • 61
mozilla-foundation/common_voice_17_0 • Viewer • Updated Jun 16 • 13M • 25.4k • 171
satellogic/EarthView • Viewer • Updated 26 days ago • 7.41M • 8.82k • 105