Announcing Finance Commons and the Bad Data Toolbox: Pioneering Open Data and Advanced Document Processing Jul 19 • 17
Toxicity of the Commons: Curating Open-Source Pre-Training Data Paper • 2410.22587 • Published 11 days ago • 8
Article OCR Processing and Text in Image Analysis with Florence-2-base and Qwen2-VL-2B By PandorAI1995 • 22 days ago • 12
Article The case for specialized pre-training: ultra-fast foundation models for dedicated tasks By Pclanglais • Aug 4 • 26
OpenCulture Collection A multilingual dataset of public domain books and newspapers. • 27 items • Updated 3 days ago • 113