
metadata

title: README
emoji: 📚
colorFrom: pink
colorTo: gray
sdk: static
pinned: false

BigScience 🌸 is an open scientific collaboration of nearly 600 researchers from 50 countries and 250 institutions. Its members work on projects across natural language processing (NLP) to broaden the accessibility of language datasets while tackling challenging scientific questions around training language models.

BigLAM started as a datasets hackathon focused on making data from Libraries, Archives, and Museums (LAMs) with potential machine-learning applications accessible via the Hugging Face Hub. We continue to add datasets to the Hugging Face Hub to make them more discoverable, open them up to new audiences, and help ensure that machine-learning datasets more closely reflect the richness of human culture.

Dataset Overview

An overview of the datasets currently made available via BigLAM, organised by task type.

image-classification

text-classification

image-to-text

Brill Iconclass AI Test Set

text-generation

object-detection

fill-mask

token-classification

Unsilencing Colonial Archives via Automated Entity Recognition
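
The grouping by task type used in this overview can be reproduced programmatically. The following is a minimal sketch, not the method used to build this page. It assumes that the organisation's Hub namespace is `biglam` and that each dataset carries tags of the form `task_categories:<task>`. The helper names `group_by_task`, `fetch_biglam_tags`, and `print_overview` are hypothetical and introduced only for illustration.

```python
from collections import defaultdict

# Assumption: task types appear as Hub tags with this prefix.
TASK_PREFIX = "task_categories:"


def group_by_task(tags_by_dataset):
    """Map each task type to the sorted list of dataset ids tagged with it."""
    grouped = defaultdict(list)
    for dataset_id, tags in tags_by_dataset.items():
        for tag in tags:
            if tag.startswith(TASK_PREFIX):
                grouped[tag[len(TASK_PREFIX):]].append(dataset_id)
    # Sort tasks and ids so the overview is stable between runs.
    return {task: sorted(ids) for task, ids in sorted(grouped.items())}


def fetch_biglam_tags(author="biglam"):
    """Fetch {dataset id: tags} for one Hub author (requires network access)."""
    # Imported lazily so group_by_task works without huggingface_hub installed.
    from huggingface_hub import list_datasets

    return {info.id: list(info.tags or []) for info in list_datasets(author=author)}


def print_overview(grouped):
    """Print one task type per line, followed by its indented dataset ids."""
    for task, dataset_ids in grouped.items():
        print(task)
        for dataset_id in dataset_ids:
            print(f"  {dataset_id}")
```

Calling `print_overview(group_by_task(fetch_biglam_tags()))` would print a listing in the same shape as the overview above. Datasets without a task tag are omitted, and a dataset tagged with several tasks appears under each of them.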