Lists of URLs from various training datasets
Nick Hagar
nhagar
AI & ML interests: digital media, collective attention, computational social science
Recent Activity
Published dataset nhagar/c4_en_noblock_urls (2 days ago)
Updated dataset nhagar/c4_realnewslike_urls (2 days ago)
Published dataset nhagar/c4_realnewslike_urls (3 days ago)
Models: none public yet
Datasets (101)
nhagar/c4_en_noblock_urls
nhagar/c4_realnewslike_urls: 13.8M rows
nhagar/CC-MAIN-2021-17_urls: 55.9M rows, 30 downloads
nhagar/CC-MAIN-2017-34_urls: 59.3M rows, 42 downloads
nhagar/CC-MAIN-2015-40_urls: 21.2M rows, 28 downloads
nhagar/CC-MAIN-2022-21_urls: 58.7M rows, 31 downloads
nhagar/CC-MAIN-2014-10_urls: 30.2M rows, 30 downloads
nhagar/CC-MAIN-2016-36_urls: 69.9M rows, 35 downloads
nhagar/CC-MAIN-2024-10_urls: 63.8M rows, 34 downloads
nhagar/CC-MAIN-2018-05_urls: 76.6M rows, 31 downloads