NuMind has just released 3 new state-of-the-art GLiNER models for Named Entity Recognition/Information Extraction. These GLiNER models allow you to specify any label you want, and they'll find the spans in the text corresponding to that label. They've been shown to work quite well on unusual domains, e.g. the celestial entities in my picture.
There are 3 models released:
- numind/NuNER_Zero: The primary model, SOTA & can detect really long entities.
- numind/NuNER_Zero-span: Slightly better performance than NuNER Zero, but can't detect entities longer than 12 tokens.
- numind/NuNER_Zero-4k: Slightly worse than NuNER Zero, but has a context length of 4k tokens.
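To give an idea of how these models are used: the sketch below assumes the `gliner` Python package. Since NuNER Zero tags tokens, adjacent predicted spans with the same label may need to be merged back into one entity; the `merge_entities` helper and its name are my own illustration of that idea, not part of the library.

```python
# Minimal sketch of zero-shot NER with NuNER Zero.
# `merge_entities` is an illustrative helper: it joins adjacent
# same-label spans that are separated only by whitespace.

def merge_entities(text, entities):
    """Merge adjacent same-label spans separated only by whitespace."""
    if not entities:
        return []
    merged = []
    current = dict(entities[0])
    for ent in entities[1:]:
        gap = text[current["end"]:ent["start"]]
        if ent["label"] == current["label"] and gap.strip() == "":
            # Extend the current entity to cover this span too.
            current["end"] = ent["end"]
            current["text"] = text[current["start"]:current["end"]]
        else:
            merged.append(current)
            current = dict(ent)
    merged.append(current)
    return merged

# Real usage (requires network access and `pip install gliner`):
#   from gliner import GLiNER
#   model = GLiNER.from_pretrained("numind/NuNER_Zero")
#   text = "The Andromeda Galaxy is 2.5 million light-years from Earth."
#   labels = ["celestial body"]  # NuNER Zero expects lower-cased labels
#   entities = merge_entities(text, model.predict_entities(text, labels))
```

The same `predict_entities` call works for the -span and -4k variants by swapping the model name; only the span-length and context-length trade-offs above change.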
Some more details about these models in general:
- They are *really* small: orders of magnitude smaller than LLMs, which don't reach this level of performance.
- Because they're small, they're fast: <1s per sentence on free GPUs.
- They have an MIT license: free commercial usage.
I remember very well that about two years ago, 0-shot named entity recognition (i.e. where you can choose any labels on the fly) was completely infeasible. Fast forward a year, and Universal-NER/UniNER-7B-all surprised me by showing that 0-shot NER is possible! However, I had a bunch of concerns that prevented me from ever adopting it myself. For example, the model was 7B parameters, only worked with 1 custom label at a time, and it had a cc-by-nc-4.0 license.
Since then, a little-known research paper introduced GLiNER, a modified & finetuned variant of the microsoft/deberta-v3-base line of models. Notably, GLiNER outperforms UniNER-7B despite being almost 2 orders of magnitude smaller! It also allows multiple labels at once, supports nested NER, and the models are licensed under Apache 2.0.
Very recently, the models were uploaded to Hugging Face, and I was inspired to create a demo for the English model. The demo runs on CPU, yet still labels text quickly and with great quality. I'm very impressed by these models.