bert-base-japanese-unidic-luw-upos
Model Description
This is a BERT model pre-trained on Japanese Wikipedia texts for POS-tagging and dependency-parsing, derived from bert-base-japanese-v2. Every long-unit-word is tagged with its UPOS (Universal Part-Of-Speech) label.
How to Use
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification
# Load the tokenizer and the token-classification model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("KoichiYasuoka/bert-base-japanese-unidic-luw-upos")
model = AutoModelForTokenClassification.from_pretrained("KoichiYasuoka/bert-base-japanese-unidic-luw-upos")
s = "国境の長いトンネルを抜けると雪国であった。"
t = tokenizer.tokenize(s)
# Predict one label per token; [1:-1] drops the [CLS] and [SEP] positions
with torch.no_grad():
    logits = model(tokenizer.encode(s, return_tensors="pt"))["logits"]
p = [model.config.id2label[q] for q in torch.argmax(logits, dim=2)[0].tolist()[1:-1]]
print(list(zip(t, p)))
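The snippet above yields one label per tokenizer token, while the model tags long-unit-words that may span several tokens. The following is a minimal sketch of how such pairs could be merged into words. It assumes, without confirmation from this model card, that a label starting with "B-" opens a multi-token word and "I-" continues it; check model.config.id2label before relying on it. The function name merge_long_units and the sample input are hypothetical and hand-written, not actual model output.

```python
def merge_long_units(pairs):
    """Merge (token, label) pairs into (word, upos) pairs.

    Assumption: "B-X" opens a multi-token word, "I-X" continues it,
    and any other label marks a single-token word.
    WordPiece "##" prefixes are stripped from the surface form.
    """
    words = []
    for token, label in pairs:
        piece = token[2:] if token.startswith("##") else token
        if label.startswith("I-") and words:
            # Continuation: append the piece to the previous word
            word, upos = words[-1]
            words[-1] = (word + piece, upos)
        else:
            # New word: drop the "B-" prefix if present
            upos = label[2:] if label.startswith("B-") else label
            words.append((piece, upos))
    return words


# Hand-written illustrative input, NOT actual model output
sample = [("雪国", "NOUN"), ("で", "B-AUX"), ("あっ", "I-AUX"), ("た", "I-AUX"), ("。", "PUNCT")]
merged = merge_long_units(sample)
print(merged)
```

With real predictions, the call would be merge_long_units(list(zip(t, p))).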
Alternatively, use esupar:
import esupar
nlp=esupar.load("KoichiYasuoka/bert-base-japanese-unidic-luw-upos")
print(nlp("国境の長いトンネルを抜けると雪国であった。"))
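For downstream processing, the printed parse can be read back into Python structures. The sketch below assumes the printed output is CoNLL-U text with the standard ten tab-separated columns; the helper name parse_conllu and the sample string are hypothetical and hand-written for illustration, not actual esupar output.

```python
# Standard CoNLL-U column names
CONLLU_FIELDS = ("id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc")


def parse_conllu(text):
    """Return one dict per token line, skipping comments and blank lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            continue  # ignore malformed lines
        rows.append(dict(zip(CONLLU_FIELDS, cols)))
    return rows


# Hand-written illustrative input, NOT actual esupar output
sample = "\n".join([
    "# text = 雪国であった。",
    "1\t雪国\t_\tNOUN\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "2\tであった\t_\tAUX\t_\t_\t1\tcop\t_\tSpaceAfter=No",
    "3\t。\t_\tPUNCT\t_\t_\t1\tpunct\t_\tSpaceAfter=No",
])
rows = parse_conllu(sample)
for row in rows:
    print(row["form"], row["upos"], row["head"], row["deprel"])
```

With a real parse, the input would be str(nlp(...)), assuming the returned object renders as CoNLL-U.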
fugashi and unidic-lite must be installed for the tokenizer to work.
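Before loading the model, it can help to check that both packages are importable. This is a small sketch using only the standard library; it assumes the import names are fugashi and unidic_lite (the latter being the module name of the unidic-lite distribution), and missing_packages is a hypothetical helper.

```python
import importlib.util

# Map of pip distribution name -> importable module name (assumed)
REQUIRED = {"fugashi": "fugashi", "unidic-lite": "unidic_lite"}


def missing_packages(required=REQUIRED):
    """Return the pip names of required packages that cannot be imported."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages; try: pip install " + " ".join(missing))
else:
    print("fugashi and unidic-lite are available.")
```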
Reference
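Koichi Yasuoka (安岡孝一): Transformersと国語研長単位による日本語係り受け解析モデルの製作 [Building Japanese dependency-parsing models with Transformers and NINJAL long-unit-words], IPSJ SIG Technical Reports (情報処理学会研究報告), Vol.2022-CH-128, No.7 (February 2022), pp.1-8.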
See Also
esupar: Tokenizer POS-tagger and Dependency-parser with BERT/RoBERTa/DeBERTa models
Model tree for KoichiYasuoka/bert-base-japanese-unidic-luw-upos
Base model
tohoku-nlp/bert-base-japanese-v2