license: apache-2.0
language:
- vi
metrics:
- exact_match
- f1
base_model:
- google-bert/bert-base-multilingual-cased
pipeline_tag: question-answering
library_name: transformers
new_version: google-bert/bert-base-multilingual-cased
tags:
- legal
BERT-Law: Information Extraction Model for Legal Texts
Model Description
BERT-Law is a fine-tuned version of multilingual BERT (Bidirectional Encoder Representations from Transformers) for extracting information from legal documents. It was trained on a custom dataset, UTE_LAW, which contains approximately 30,000 pairs of legal questions and related documents. The main goal is to extract relevant information from legal text while reducing the costs associated with third-party APIs.
Key Features
- Base Model: Built on top of google-bert/bert-base-multilingual-cased, a pre-trained multilingual BERT model.
- Fine-tuning: Fine-tuned on the UTE_LAW dataset to extract relevant information from legal texts.
- Model Type: BERT-based model for question-answering tasks.
- Task: Optimized for information extraction from legal documents.
Model Specifications
- Maximum Sequence Length: 512 tokens
- Language: Vietnamese (primarily legal texts)
- License: Apache-2.0
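Legal documents are often longer than the 512-token limit listed above, so a long context usually has to be split into overlapping windows before it is passed to the model. The following is a minimal sketch of that idea, not the authors' preprocessing: the function name `window_context`, the stride of 128, and the use of pre-split tokens are all assumptions made for illustration. It reserves three positions for the `[CLS]` and two `[SEP]` tokens that BERT adds to a question/context pair.

```python
from typing import List

MAX_SEQ_LEN = 512  # maximum sequence length stated in the model specifications


def window_context(
    question_tokens: List[str],
    context_tokens: List[str],
    max_len: int = MAX_SEQ_LEN,
    stride: int = 128,  # assumed overlap between windows; not from the model card
) -> List[List[str]]:
    """Split context tokens into overlapping windows that fit next to the question."""
    # BERT pair layout: [CLS] question [SEP] context [SEP] -> 3 special tokens.
    budget = max_len - len(question_tokens) - 3
    if budget <= 0:
        raise ValueError("question leaves no room for context tokens")
    if not 0 <= stride < budget:
        raise ValueError("stride must be non-negative and smaller than the context budget")

    step = budget - stride  # how far each new window advances
    windows: List[List[str]] = []
    start = 0
    while True:
        windows.append(context_tokens[start:start + budget])
        if start + budget >= len(context_tokens):
            break  # this window already reaches the end of the context
        start += step
    return windows
```

In practice the Transformers tokenizer can do this on real WordPiece tokens when called with `truncation="only_second"`, `max_length`, `stride`, and `return_overflowing_tokens=True`; the sketch only illustrates the windowing arithmetic.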
@inproceedings{zaib-2021-bert-coqac,
  title = "BERT-CoQAC: BERT-based Conversational Question Answering in Context",
  author = "Zaib, Munazza and Tran, Dai Hoang and Sagar, Subhash and Mahmood, Adnan and Zhang, Wei E. and Sheng, Quan Z.",
  booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
  month = "4",
  year = "2021",
  publisher = "Association for Computational Linguistics",
  url = "https://arxiv.org/abs/2104.11394",
  doi = "10.48550/arXiv.2104.11394"
}
@article{devlin-2018-bert,
  title = "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding",
  author = "Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina",
  journal = "arXiv:1810.04805",
  year = "2018",
  url = "https://arxiv.org/abs/1810.04805",
  doi = "10.48550/arXiv.1810.04805"
}
Usage
This model is suitable for applications in legal domains, such as:
- Legal document analysis: Extracting relevant information from legal texts.
- Question answering: Providing answers to legal questions based on the content of legal documents.
Because it can be deployed locally, the model reduces reliance on third-party APIs, which can incur higher costs for legal document processing.
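As a rough illustration of local deployment, the sketch below wraps the Transformers `question-answering` pipeline. The repository ID `your-namespace/BERT-Law` is a placeholder, since this card does not state where the weights are published, and the helper names `load_qa_pipeline` and `extract_answer` and the `min_score` threshold are assumptions rather than part of the model.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical placeholder: replace with the actual Hub repository ID or a local path.
MODEL_ID = "your-namespace/BERT-Law"


def load_qa_pipeline(model_id: str = MODEL_ID):
    """Build an extractive QA pipeline that runs locally (weights download on first use)."""
    # Imported lazily so extract_answer can be used and tested without transformers.
    from transformers import pipeline

    return pipeline("question-answering", model=model_id, tokenizer=model_id)


def extract_answer(
    qa: Callable[..., Dict[str, Any]],
    question: str,
    context: str,
    min_score: float = 0.0,  # assumed confidence threshold; tune on your own data
) -> Optional[str]:
    """Return the extracted answer span, or None if its score is below min_score."""
    # With default settings the pipeline returns a dict with the keys
    # "score", "start", "end", and "answer".
    result = qa(question=question, context=context)
    if result["score"] < min_score:
        return None
    return result["answer"]
```

A typical call would be `extract_answer(load_qa_pipeline(), question, context)`, where `context` is the text of a legal document and `question` is a Vietnamese legal question. For documents longer than 512 tokens, the pipeline's `max_seq_len` and `doc_stride` arguments control how the context is split into windows.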