Update README.md
# IndicNER

IndicNER is a model trained to identify named entities in sentences written in Indian languages. Our model is fine-tuned on millions of sentences across the 11 Indian languages listed below. The model is then benchmarked on a human-annotated test set as well as on several other publicly available Indian NER datasets.

The 11 languages covered by IndicNER are: Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, Telugu.
## Training Corpus

Our model was trained on [Naamapadam](https://huggingface.co/datasets/ai4bharat/naamapadam), a dataset we mined from the existing [Samanantar Corpus](https://huggingface.co/datasets/ai4bharat/samanantar). We used a bert-base-multilingual-uncased model as the starting point and then fine-tuned it on this NER dataset.
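As a rough illustration (not our exact training script), the sketch below shows this starting point using the `transformers` and `datasets` libraries. The `"hi"` config name and the `ner_tags` column are assumptions based on the dataset card; the full recipe (label alignment, `Trainer` setup) is covered in the Colab notebook linked under Usage below.

```python
# Minimal sketch of the fine-tuning starting point described above.
# The "hi" config name and the "ner_tags" column are assumptions; check the
# dataset card at https://huggingface.co/datasets/ai4bharat/naamapadam.
from datasets import load_dataset
from transformers import AutoModelForTokenClassification, AutoTokenizer

dataset = load_dataset("ai4bharat/naamapadam", "hi")  # Hindi portion (assumed config name)
label_names = dataset["train"].features["ner_tags"].feature.names

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-uncased",
    num_labels=len(label_names),  # e.g. B/I/O tags for PER, LOC, ORG
)
# From here, tokenize with is_split_into_words=True, align labels to word pieces,
# and fine-tune with transformers.Trainer as shown in the Colab notebook.
```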
## Downloads

Download the model directly from this Hugging Face repository.
## Usage

You can use [this Colab notebook](https://colab.research.google.com/drive/1sYa-PDdZQ_c9SzUgnhyb3Fl7j96QBCS8?usp=sharing) for examples of using IndicNER, or for fine-tuning a pre-trained model on the Naamapadam dataset to build your own NER models.
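For a quick start, here is a minimal inference sketch using the `transformers` token-classification pipeline; the `ai4bharat/IndicNER` model id is assumed to refer to this repository, and the Hindi sentence is only an example.

```python
# Minimal inference sketch. The model id below is assumed to be the id of this
# Hugging Face repository; adjust it if the repo name differs.
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_id = "ai4bharat/IndicNER"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# aggregation_strategy="simple" merges word-piece predictions into whole entities
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

print(ner("मुंबई भारत का एक बड़ा शहर है"))  # example Hindi sentence: "Mumbai is a big city in India"
```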
<!-- citing information -->
## Citing