---
language:
- hi
tags:
- ner
---

# NER in Hindi
## muril_base_cased_hindi_ner

The base model is [google/muril-base-cased](https://huggingface.co/google/muril-base-cased), a BERT model pre-trained on 17 Indian languages and their transliterated counterparts.
The Hindi NER data comes from [HiNER](https://github.com/cfiltnlp/HiNER).
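
For convenience, here is a minimal sketch of loading the HiNER data with the `datasets` library; the Hub ID `cfilt/HiNER-original` and the field layout are assumptions, so treat the HiNER repository as the authoritative source.

```python
from datasets import load_dataset

# Assumed Hub ID; see the HiNER GitHub repository for official download instructions.
hiner = load_dataset("cfilt/HiNER-original")

# Inspect one training example (field names may differ across HiNER variants).
print(hiner["train"][0])
```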

## Usage
### Example
```python
from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch

model = AutoModelForTokenClassification.from_pretrained("MichaelHuang/muril_base_cased_hindi_ner")
tokenizer = AutoTokenizer.from_pretrained("google/muril-base-cased")

# Define the labels dictionary
labels_dict = {
    0: "B-FESTIVAL",
    1: "B-GAME",
    2: "B-LANGUAGE",
    3: "B-LITERATURE",
    4: "B-LOCATION",
    5: "B-MISC",
    6: "B-NUMEX",
    7: "B-ORGANIZATION",
    8: "B-PERSON",
    9: "B-RELIGION",
    10: "B-TIMEX",
    11: "I-FESTIVAL",
    12: "I-GAME",
    13: "I-LANGUAGE",
    14: "I-LITERATURE",
    15: "I-LOCATION",
    16: "I-MISC",
    17: "I-NUMEX",
    18: "I-ORGANIZATION",
    19: "I-PERSON",
    20: "I-RELIGION",
    21: "I-TIMEX",
    22: "O"
}

def ner_predict(sentence, model, tokenizer, labels_dict):
    # Tokenize the input sentence
    inputs = tokenizer(sentence, return_tensors="pt", padding=True, truncation=True, max_length=128)

    # Perform inference
    with torch.no_grad():
        outputs = model(**inputs)

    # Get the predicted labels
    predicted_labels = torch.argmax(outputs.logits, dim=2)

    # Convert tokens and labels to lists
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    labels = predicted_labels.squeeze().tolist()

    # Map numeric labels to string labels
    predicted_labels = [labels_dict[label] for label in labels]

    # Combine tokens and labels
    result = list(zip(tokens, predicted_labels))

    return result

test_sentence = "अकबर ईद पर टेनिस खेलता है"
predictions = ner_predict(test_sentence, model, tokenizer, labels_dict)

for token, label in predictions:
    print(f"{token}: {label}")
```
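
The example above prints one prediction per WordPiece, including the `[CLS]` and `[SEP]` special tokens. If word-level output is preferred, the pieces can be merged back together; the sketch below is a simple post-processing helper (not part of the original example) that assumes the standard BERT-style `##` continuation prefix and keeps the first piece's label for each word. It reuses `predictions` from the snippet above.

```python
def merge_subwords(token_label_pairs):
    """Collapse WordPiece tokens back into whole words, dropping special tokens."""
    words = []
    for token, label in token_label_pairs:
        if token in ("[CLS]", "[SEP]", "[PAD]"):
            continue  # skip special tokens added by the tokenizer
        if token.startswith("##") and words:
            prev_word, prev_label = words[-1]
            words[-1] = (prev_word + token[2:], prev_label)  # attach continuation piece
        else:
            words.append((token, label))
    return words

for word, label in merge_subwords(predictions):
    print(f"{word}: {label}")
```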

### Eval results

| eval_loss | eval_precision | eval_recall | eval_f1 | eval_accuracy | epoch |
|:---------:|:--------------:|:-----------:|:-------:|:-------------:|:-----:|
| 0.11      | 0.87           | 0.89        | 0.88    | 0.97          | 3.0   |
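
These are the metrics reported at the end of fine-tuning. As a rough sketch (not the original evaluation script), comparable entity-level precision/recall/F1 numbers can be computed with the `seqeval` metric from the `evaluate` library, given gold and predicted tag sequences over the label set above:

```python
import evaluate

seqeval = evaluate.load("seqeval")

# Toy gold/predicted sequences using the tag set from labels_dict above.
references = [["B-PERSON", "O", "B-GAME", "O", "O"]]
predictions = [["B-PERSON", "O", "B-LOCATION", "O", "O"]]

results = seqeval.compute(predictions=predictions, references=references)
print(results["overall_precision"], results["overall_recall"], results["overall_f1"])
```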