nicolay-r
posted an update about 20 hours ago
📢 If you wish to empower an LLM with NER for English texts, I can recommend spaCy. Sharing a wrapper of spaCy NER models for bulk-ner, dedicated to handling CSV / JSONL content:
Script: https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/ner_spacy_383.sh
Code: https://raw.githubusercontent.com/nicolay-r/nlp-thirdgate/refs/heads/master/ner/spacy_383.py

What you need to know about spaCy NER models:
☑️ Models are Python packages; they can be installed directly into the environment or via the Python CLI.
☑️ The library has a pipeline for optimized request handling in batches.
☑️ Architecture: DNN embedding-based models (not transformers)

🤖 List of models (or see screenshot below):
https://huggingface.co/spacy
📋 Supported NER types:
https://github.com/explosion/spaCy/discussions/9147

⚠️ NOTE: chunking seems to be not applicable due to the specifics of the models and the use of the internal pipeline mechanism

🚀 Performance for sentences (en):
Model: spacy/en_core_web_sm 🔥 530 sentences per second 🔥 (similar to larger solutions)
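
For anyone who wants to check throughput on their own data, here is a rough sketch of how a sentences-per-second figure could be measured. This is not necessarily how the number above was obtained; the function name `sentences_per_second` and the batch size are assumptions for illustration, and `nlp` is expected to be a loaded spaCy pipeline:

```python
import time
from typing import Iterable


def sentences_per_second(nlp, sentences: Iterable[str], batch_size: int = 64) -> float:
    """Time one batched pass over the sentences and return the throughput."""
    sentences = list(sentences)
    start = time.perf_counter()
    # Consume the generator so every sentence is actually processed.
    for _ in nlp.pipe(sentences, batch_size=batch_size):
        pass
    elapsed = time.perf_counter() - start
    return len(sentences) / elapsed if elapsed > 0 else float("inf")
```

The result depends heavily on hardware, sentence length and batch size, so treat it as a local estimate only.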

🌌 Other wrappers for bulk-ner in nlp-thirdgate: https://github.com/nicolay-r/nlp-thirdgate#ner