About the Model
An environmental named entity recognition (NER) model, trained on a dataset from the USEPA to recognize seven environmental due diligence entity types in a given text corpus (remediation reports, records of decision, five-year records, etc.). The model is built on top of distilbert-base-uncased.
- Dataset: https://data.mendeley.com/datasets/tx6vmd4g9p/4
- Dataset Research Paper: https://doi.org/10.1016/j.dib.2022.108579
Usage
The easiest way is to use the Inference API hosted on Hugging Face. Alternatively, use the pipeline object offered by the transformers library, or load the tokenizer and model directly.
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("token-classification", model="d4data/EnviDueDiligence_NER")
# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("d4data/EnviDueDiligence_NER")
model = AutoModelForTokenClassification.from_pretrained("d4data/EnviDueDiligence_NER")
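The pipeline returns one prediction per word piece by default, which is awkward to read. The sketch below is one possible way to get whole entity spans, not the authors' own method: it passes `aggregation_strategy="simple"` to the pipeline and groups the merged spans by label. The helper names `load_ner_pipeline` and `entities_by_label` and the `min_score` threshold of 0.5 are illustrative assumptions and are not part of the model or the transformers library.

```python
from collections import defaultdict


def load_ner_pipeline(model_id="d4data/EnviDueDiligence_NER"):
    """Build a token-classification pipeline that merges word pieces into spans."""
    # Imported lazily so entities_by_label can be used without transformers installed.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge adjacent word pieces with the same label
    )


def entities_by_label(predictions, min_score=0.5):
    """Group aggregated pipeline output by label, dropping low-confidence spans.

    `predictions` is the list of dicts returned by an aggregated pipeline; each
    dict carries "entity_group", "score", "word", "start", and "end".
    `min_score` is a hypothetical cutoff chosen for illustration.
    """
    grouped = defaultdict(list)
    for pred in predictions:
        if pred["score"] >= min_score:
            grouped[pred["entity_group"]].append(pred["word"])
    return dict(grouped)
```

With the model downloaded, `entities_by_label(load_ner_pipeline()(text))` returns a dictionary mapping each label to the spans found in `text`. The label names themselves can be read from `model.config.id2label` after loading the model directly.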
Author
This model is part of the research topic "Environmental Due Diligence" conducted by Deepak John Reji and Afreen Aman. If you use this work (code, model, or dataset), please cite:
Aman, A. and Reji, D.J., 2022. EnvBert: An NLP model for Environmental Due Diligence data classification. Software Impacts, 14, p.100427.