Intro
GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type using a bidirectional transformer encoder (BERT-like). It provides a practical alternative to traditional NER models, which are limited to predefined entity types, and to Large Language Models (LLMs), which are flexible but costly and too large for resource-constrained scenarios.
This version uses a bi-encoder architecture: the textual encoder is team-lucid/deberta-v3-xlarge-korean and the entity label encoder is the sentence transformer BAAI/bge-m3.
Such architecture brings several advantages over uni-encoder GLiNER:
- An unlimited number of entity types can be recognized at a single time;
- Faster inference when entity embeddings are precomputed;
- Better generalization to unseen entities;
However, it also has drawbacks, such as the lack of inter-label interactions, which makes it harder for the model to disambiguate semantically similar but contextually different entity types.
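To make the matching step concrete, here is a minimal sketch of how a bi-encoder can score precomputed label embeddings against span embeddings. This illustrates the idea only; it is not the model's actual implementation, and all names in it are mine.

import numpy as np

def match_spans_to_labels(span_embs: np.ndarray, label_embs: np.ndarray, threshold: float = 0.5):
    """Toy illustration: score every (span, label) pair by cosine similarity.

    span_embs: (num_spans, dim) vectors from the text encoder.
    label_embs: (num_labels, dim) vectors from the label encoder,
                computed once and reused across all texts.
    """
    span_embs = span_embs / np.linalg.norm(span_embs, axis=1, keepdims=True)
    label_embs = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    scores = span_embs @ label_embs.T  # (num_spans, num_labels)
    best = scores.argmax(axis=1)       # best label index per span
    return [(i, int(j), float(scores[i, j])) for i, j in enumerate(best) if scores[i, j] >= threshold]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(match_spans_to_labels(rng.random((4, 16)), rng.random((3, 16)), threshold=0.0))

Because the label embeddings never depend on the input text, they can be cached, which is what enables the second advantage above.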
- Paper: https://arxiv.org/abs/2311.08526
- Repository:
- Service: https://github.com/henrikalbihn/gliner-as-a-service
Installation & Usage
Install or update the gliner package:
pip install "gliner>=0.2.16"
pip install python-mecab-ko
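To verify that the required version is installed, you can query the package metadata (a quick check using only the standard library):

from importlib.metadata import version

print(version("gliner"))  # should print 0.2.16 or newer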
Once you've installed the GLiNER library, you can import the GLiNER class, load this model with GLiNER.from_pretrained, and predict entities with predict_entities.
from gliner import GLiNER
model = GLiNER.from_pretrained("lots-o/gliner-bi-ko-xlarge-v1")
text = """ํฌ๋ฆฌ์คํ ํผ ๋๋(Christopher Nolan) ์ ์๊ตญ์ ์ํ ๊ฐ๋
, ๊ฐ๋ณธ๊ฐ, ์ํ ํ๋ก๋์์ด๋ค. ๊ทธ์ ๋ํ์์ผ๋ก๋ 2008๋
๊ฐ๋ดํ ใ๋คํฌ ๋์ดํธใ ์๋ฆฌ์ฆ๊ฐ ์์ผ๋ฉฐ, ํนํ ใ๋คํฌ ๋์ดํธใ(2008)์ ๊ฐ๋
์ผ๋ก ๊ฐ์ฅ ์ ๋ช
ํ๋ค. ์ด ์ํ๋ ๋ฐฐํธ๋งจ ์บ๋ฆญํฐ๋ฅผ ์ค์ฌ์ผ๋ก ํ ์ํผํ์ด๋ก ์ํ๋ก, ํ์ค ๋ ์ ์ ์กฐ์ปค ์ญํ ์ด ํฐ ์ธ๊ธฐ๋ฅผ ๋์๋ค. ๋ํ, 2010๋
์ ๊ฐ๋ดํ ใ์ธ์
์
ใ(2010)์ ๋ณต์กํ ์๊ฐ๊ณผ ๊ฟ์ ๊ฐ๋
์ ๋ค๋ฃฌ SF ์ํ๋ก, ์ํ ์ ์ ๋ฐฉ์๊ณผ ์คํ ๋ฆฌ ์ ๊ฐ์์ ํ์ ์ ์ธ ์ ๊ทผ์ ์ ๋ณด์๋ค. ํฌ๋ฆฌ์คํ ํผ ๋๋์ ์๊ฐ ์ฌํ๊ณผ ๋ค์ฐจ์์ ์ด์ผ๊ธฐ๋ฅผ ํ๊ตฌํ๋ ์ํ๋ค์ ํตํด ํ๋ ์ํ๊ณ์์ ์ค์ํ ๊ฐ๋
์ผ๋ก ์๋ฆฌ๋งค๊นํ๋ค.
"""
labels = [
    "영화/소설 작품명",
    "사람 이름",
    "캐릭터 이름",
    "직업명",
    "날짜_연(년)",
    "날짜_일",
    "날짜_달(월)",
    "국가",
]
entities = model.predict_entities(text, labels, threshold=0.2)
for entity in entities:
    print(entity["text"], "=>", entity["label"])
크리스토퍼 놀란 => 사람 이름
Christopher Nolan => 사람 이름
영국 => 국가
영화 감독 => 직업명
각본가 => 직업명
영화 프로듀서 => 직업명
2008년 => 날짜_연(년)
다크 나이트 => 영화/소설 작품명
다크 나이트 => 영화/소설 작품명
2008 => 날짜_연(년)
감독 => 직업명
배트맨 => 캐릭터 이름
히스 레저 => 사람 이름
조커 => 캐릭터 이름
2010년 => 날짜_연(년)
인셉션 => 영화/소설 작품명
2010 => 날짜_연(년)
크리스토퍼 놀란 => 사람 이름
감독 => 직업명
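Continuing the example above: each returned entity is a dict that, in addition to "text" and "label", carries "start"/"end" character offsets and a confidence "score", which is useful for filtering or highlighting predictions downstream. For example:

for entity in entities:
    # "start"/"end" are character offsets into the input text; "score" is the model's confidence
    print(f'{entity["start"]}-{entity["end"]}: {entity["text"]} ({entity["label"]}, score={entity["score"]:.2f})')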
If you have a large number of entity types and want to pre-embed them, refer to the following code snippet:
labels = ["your entities"]
texts = ["your texts"]
entity_embeddings = model.encode_labels(labels, batch_size = 8)
outputs = model.batch_predict_with_embeds(texts, entity_embeddings, labels)
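Since the label embeddings are computed once with encode_labels, only the input texts need to pass through the text encoder at inference time; with large label sets this is the main source of the speedup mentioned above.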
Dataset
- NIKL Modu Corpus (국립국어원 모두의 말뭉치)
- Korean Nested Named Entity Corpus (한국어 중첩 개체명 말뭉치)

The corpus tag set was mapped to natural-language Korean labels as follows:
entity_type_mapping = {
    "PS": {
        "PS_NAME": "인물_사람",
        "PS_CHARACTER": "인물_가상 캐릭터",
        "PS_PET": "인물_반려동물",
    },
    "FD": {
        "FD_SCIENCE": "학문 분야_과학",
        "FD_SOCIAL_SCIENCE": "학문 분야_사회과학",
        "FD_MEDICINE": "학문 분야_의학",
        "FD_ART": "학문 분야_예술",
        "FD_HUMANITIES": "학문 분야_인문학",
        "FD_OTHERS": "학문 분야_기타",
    },
    "TR": {
        "TR_SCIENCE": "이론_과학",
        "TR_SOCIAL_SCIENCE": "이론_사회과학",
        "TR_MEDICINE": "이론_의학",
        "TR_ART": "이론_예술",
        "TR_HUMANITIES": "이론_철학/언어/역사",
        "TR_OTHERS": "이론_기타",
    },
    "AF": {
        "AF_BUILDING": "인공물_건축물/토목건설물",
        "AF_CULTURAL_ASSET": "인공물_문화재",
        "AF_ROAD": "인공물_도로/철로",
        "AF_TRANSPORT": "인공물_교통수단/운송수단",
        "AF_MUSICAL_INSTRUMENT": "인공물_악기",
        "AF_WEAPON": "인공물_무기",
        "AFA_DOCUMENT": "인공물_도서/서적 작품명",
        "AFA_PERFORMANCE": "인공물_춤/공연/연극 작품명",
        "AFA_VIDEO": "인공물_영화/TV 프로그램",
        "AFA_ART_CRAFT": "인공물_미술/조형 작품명",
        "AFA_MUSIC": "인공물_음악 작품명",
        "AFW_SERVICE_PRODUCTS": "인공물_서비스 상품",
        "AFW_OTHER_PRODUCTS": "인공물_기타 상품",
    },
    "OG": {
        "OGG_ECONOMY": "기관_경제",
        "OGG_EDUCATION": "기관_교육",
        "OGG_MILITARY": "기관_군사",
        "OGG_MEDIA": "기관_미디어",
        "OGG_SPORTS": "기관_스포츠",
        "OGG_ART": "기관_예술",
        "OGG_MEDICINE": "기관_의료",
        "OGG_RELIGION": "기관_종교",
        "OGG_SCIENCE": "기관_과학",
        "OGG_LIBRARY": "기관_도서관",
        "OGG_LAW": "기관_법률",
        "OGG_POLITICS": "기관_정부/공공",
        "OGG_FOOD": "기관_음식 업체",
        "OGG_HOTEL": "기관_숙박 업체",
        "OGG_OTHERS": "기관_기타",
    },
    "LC": {
        "LCP_COUNTRY": "장소_국가",
        "LCP_PROVINCE": "장소_도/주 지역",
        "LCP_COUNTY": "장소_세부 행정구역",
        "LCP_CITY": "장소_도시",
        "LCP_CAPITALCITY": "장소_수도",
        "LCG_RIVER": "장소_강/호수",
        "LCG_OCEAN": "장소_바다",
        "LCG_BAY": "장소_반도/만",
        "LCG_MOUNTAIN": "장소_산/산맥",
        "LCG_ISLAND": "장소_섬",
        "LCG_CONTINENT": "장소_대륙",
        "LC_SPACE": "장소_천체",
        "LC_OTHERS": "장소_기타",
    },
    "CV": {
        "CV_CULTURE": "문명_문명/문화",
        "CV_TRIBE": "문명_민족/종족",
        "CV_LANGUAGE": "문명_언어",
        "CV_POLICY": "문명_제도/정책",
        "CV_LAW": "문명_법/법률",
        "CV_CURRENCY": "문명_통화",
        "CV_TAX": "문명_조세",
        "CV_FUNDS": "문명_연금/기금",
        "CV_ART": "문명_예술",
        "CV_SPORTS": "문명_스포츠",
        "CV_SPORTS_POSITION": "문명_스포츠 포지션",
        "CV_SPORTS_INST": "문명_스포츠 용품/도구",
        "CV_PRIZE": "문명_상/훈장",
        "CV_RELATION": "문명_가족/친족 관계",
        "CV_OCCUPATION": "문명_직업",
        "CV_POSITION": "문명_직위/직책",
        "CV_FOOD": "문명_음식",
        "CV_DRINK": "문명_음료/술",
        "CV_FOOD_STYLE": "문명_음식 유형",
        "CV_CLOTHING": "문명_의복/섬유",
        "CV_BUILDING_TYPE": "문명_건축 양식",
    },
    "DT": {
        "DT_DURATION": "날짜_기간",
        "DT_DAY": "날짜_일",
        "DT_WEEK": "날짜_주(주차)",
        "DT_MONTH": "날짜_달(월)",
        "DT_YEAR": "날짜_연(년)",
        "DT_SEASON": "날짜_계절",
        "DT_GEOAGE": "날짜_지질시대",
        "DT_DYNASTY": "날짜_왕조시대",
        "DT_OTHERS": "날짜_기타",
    },
    "TI": {
        "TI_DURATION": "시간_기간",
        "TI_HOUR": "시간_시간(시)",
        "TI_MINUTE": "시간_분",
        "TI_SECOND": "시간_초",
        "TI_OTHERS": "시간_기타",
    },
    "QT": {
        "QT_AGE": "수량_나이",
        "QT_SIZE": "수량_넓이/면적",
        "QT_LENGTH": "수량_길이/거리",
        "QT_COUNT": "수량_수량/빈도",
        "QT_MAN_COUNT": "수량_인원수",
        "QT_WEIGHT": "수량_무게",
        "QT_PERCENTAGE": "수량_백분율",
        "QT_SPEED": "수량_속도",
        "QT_TEMPERATURE": "수량_온도",
        "QT_VOLUME": "수량_부피",
        "QT_ORDER": "수량_순서",
        "QT_PRICE": "수량_금액",
        "QT_PHONE": "수량_전화번호",
        "QT_SPORTS": "수량_스포츠 수량",
        "QT_CHANNEL": "수량_채널 번호",
        "QT_ALBUM": "수량_앨범 수량",
        "QT_ADDRESS": "수량_주소 관련 숫자",
        "QT_OTHERS": "수량_기타 수량",
    },
    "EV": {
        "EV_ACTIVITY": "사건_사회운동/선언",
        "EV_WAR_REVOLUTION": "사건_전쟁/혁명",
        "EV_SPORTS": "사건_스포츠 행사",
        "EV_FESTIVAL": "사건_축제/영화제",
        "EV_OTHERS": "사건_기타",
    },
    "AM": {
        "AM_INSECT": "동물_곤충",
        "AM_BIRD": "동물_조류",
        "AM_FISH": "동물_어류",
        "AM_MAMMALIA": "동물_포유류",
        "AM_AMPHIBIA": "동물_양서류",
        "AM_REPTILIA": "동물_파충류",
        "AM_TYPE": "동물_분류명",
        "AM_PART": "동물_부위명",
        "AM_OTHERS": "동물_기타",
    },
    "PT": {
        "PT_FRUIT": "식물_과일/열매",
        "PT_FLOWER": "식물_꽃",
        "PT_TREE": "식물_나무",
        "PT_GRASS": "식물_풀",
        "PT_TYPE": "식물_분류명",
        "PT_PART": "식물_부위명",
        "PT_OTHERS": "식물_기타",
    },
    "MT": {
        "MT_ELEMENT": "물질_원소",
        "MT_METAL": "물질_금속",
        "MT_ROCK": "물질_암석",
        "MT_CHEMICAL": "물질_화학",
    },
    "TM": {
        "TM_COLOR": "용어_색깔",
        "TM_DIRECTION": "용어_방향",
        "TM_CLIMATE": "용어_기후 지역",
        "TM_SHAPE": "용어_모양/형태",
        "TM_CELL_TISSUE_ORGAN": "용어_세포/조직/기관",
        "TMM_DISEASE": "용어_증상/질병",
        "TMM_DRUG": "용어_약품",
        "TMI_HW": "용어_IT 하드웨어",
        "TMI_SW": "용어_IT 소프트웨어",
        "TMI_SITE": "용어_URL 주소",
        "TMI_EMAIL": "용어_이메일 주소",
        "TMI_MODEL": "용어_제품 모델명",
        "TMI_SERVICE": "용어_IT 서비스",
        "TMI_PROJECT": "용어_프로젝트",
        "TMIG_GENRE": "용어_게임 장르",
        "TM_SPORTS": "용어_스포츠",
    },
}
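The natural-language values of this mapping are what GLiNER consumes as entity types. A one-liner (my own sketch, not taken from the training code) flattens the two-level dict into that label list:

# Flatten the corpus tag -> Korean label mapping into the flat label list GLiNER uses.
all_labels = [label for group in entity_type_mapping.values() for label in group.values()]
print(len(all_labels))  # 150 fine-grained entity types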
Evaluation
Evaluated on the KoNNE (Korean Nested Named Entity corpus) dev set:
The evaluation results in the table below, except for the values I measured myself, are taken from taeminlee/gliner_ko.
| Model | Precision (P) | Recall (R) | F1 |
|---|---|---|---|
| gliner-bi-ko-small-v1 (t=0.5) | 81.53% | 74.16% | 77.67% |
| gliner-bi-ko-xlarge-v1 (t=0.5) | 84.73% | 77.71% | 81.07% |
| Gliner-ko (t=0.5) | 72.51% | 79.82% | 75.99% |
| Gliner Large-v2 (t=0.5) | 34.33% | 19.50% | 24.87% |
| Gliner Multi (t=0.5) | 40.94% | 34.18% | 37.26% |
| Pororo | 70.25% | 57.94% | 63.50% |
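Precision, recall, and F1 here are standard span-level exact-match metrics. A minimal sketch of how such scores are computed (my own illustration, assuming gold and predicted entities are (start, end, label) tuples):

def span_f1(gold: set, pred: set):
    """Exact-match span scoring: a prediction counts only if start, end, and label all match."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {(0, 7, "사람 이름"), (10, 14, "국가")}
pred = {(0, 7, "사람 이름"), (20, 24, "국가")}
print(span_f1(gold, pred))  # (0.5, 0.5, 0.5)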
Citation
@misc{gliner_bi_ko_xlarge_v1,
  title={gliner-bi-ko-xlarge-v1},
  author={Gihwan Kim},
  year={2025},
  url={https://huggingface.co/lots-o/gliner-bi-ko-xlarge-v1},
  publisher={Hugging Face}
}
@misc{zaratiana2023gliner,
  title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
  author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
  year={2023},
  eprint={2311.08526},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Base model: team-lucid/deberta-v3-xlarge-korean