Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models Mar 20 • 69
Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model Aug 22, 2023 • 28
Huggy Lingo: Using Machine Learning to Improve Language Metadata on the Hugging Face Hub Aug 2, 2023 • 1
On Limitations of LLM as Annotator for Low Resource Languages Paper • 2411.17637 • Published 29 days ago • 2 • 2