📢 If you wish to empower an LLM with NER for English texts, I can recommend spaCy. Sharing a wrapper of spaCy NER models for bulk-ner, dedicated to handling CSV / JSONL content:
Script: https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/ner_spacy_383.sh
Code: https://raw.githubusercontent.com/nicolay-r/nlp-thirdgate/refs/heads/master/ner/spacy_383.py
What you need to know about spaCy NER models:
✔️ Models are distributed as Python packages; they can be installed directly into the environment or via the Python CLI.
✔️ The library has a pipeline for optimized request handling in batches.
✔️ Architecture: DNN embedding-based models (not transformers)
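A minimal sketch of the batched pipeline mentioned above, assuming spaCy and the en_core_web_sm package are installed (for example with `pip install spacy` followed by `python -m spacy download en_core_web_sm`). The helper names `load_model` and `extract_entities` are hypothetical and are not taken from the bulk-ner wrapper:

```python
from typing import Iterable, List, Tuple


def load_model(name: str = "en_core_web_sm"):
    """Load an installed spaCy model package by name.

    spaCy is imported lazily so that the module can be imported
    even on machines where spaCy is not installed.
    """
    import spacy

    return spacy.load(name)


def extract_entities(
    nlp, texts: Iterable[str], batch_size: int = 64
) -> List[List[Tuple[str, str]]]:
    """Run texts through nlp.pipe in batches.

    Returns one list of (entity text, entity label) pairs per input text.
    The batch_size default is an arbitrary illustration value.
    """
    results = []
    for doc in nlp.pipe(texts, batch_size=batch_size):
        results.append([(ent.text, ent.label_) for ent in doc.ents])
    return results
```

Usage would look like `extract_entities(load_model(), ["Apple hired Tim Cook in 1998."])`; the exact entities returned depend on the model version.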
🤗 List of models (or see screenshot below):
https://huggingface.co/spacy
Supported NER types:
https://github.com/explosion/spaCy/discussions/9147
⚠️ NOTE: chunking does not seem applicable, given the specifics of the models and the use of the internal pipeline mechanism
Performance for sentences (en):
Model: spacy/en_core_web_sm 🔥 530 sentences per second 🔥 (similar to larger solutions)
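A sentences-per-second figure like the one above could be measured roughly as follows. This is only a sketch of one possible timing approach, not necessarily how the number in this post was obtained; `sentences_per_second` is a hypothetical helper:

```python
import time
from typing import Callable, Sequence


def sentences_per_second(
    process: Callable[[Sequence[str]], object], sentences: Sequence[str]
) -> float:
    """Time a single pass of `process` over `sentences` and return throughput."""
    start = time.perf_counter()
    process(sentences)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return len(sentences) / elapsed if elapsed > 0 else float("inf")
```

With a loaded spaCy model `nlp`, the call could be `sentences_per_second(lambda s: list(nlp.pipe(s)), sentences)`; wrapping in `list(...)` matters because `nlp.pipe` is lazy. Results vary with hardware, batch size, and sentence length.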
Other wrappers for bulk-ner in nlp-thirdgate: https://github.com/nicolay-r/nlp-thirdgate#ner