quanghuy123 committed
Commit 2b64812 · verified · 1 Parent(s): 5b3a698

Update README.md

Files changed (1):
  1. README.md +16 -25
README.md CHANGED
@@ -20,37 +20,28 @@ tags:
 
 **BERT-Law** is a fine-tuned version of **BERT (Bidirectional Encoder Representations from Transformers)**, focusing on information extraction from legal documents. The model is specifically trained on a custom dataset called **UTE_LAW**, which consists of approximately 30,000 pairs of legal questions and related documents. The main goal of this model is to extract relevant information from legal text while reducing the costs associated with using third-party APIs.
 
 ### Key Features
 - **Base Model**: The model is built on top of `google-bert/bert-base-multilingual-cased`, which is a pre-trained multilingual BERT model.
 - **Fine-tuning**: It has been fine-tuned with the **UTE_LAW** dataset, focusing on extracting relevant information from legal texts.
 - **Model Type**: BERT-based model for **question-answering** tasks.
 - **Task**: The model is optimized for information extraction tasks, specifically designed to handle legal documents.
 
 ### Model Specifications
- - **Maximum Sequence Length**: 512 tokens
- - **Language**: Primarily focused on **Vietnamese** legal texts.
- - **License**: Apache-2.0 License
-
- @inproceedings{zaib-2021-bert-coqac,
-     title = "BERT-CoQAC: BERT-based Conversational Question Answering in Context",
-     author = "Zaib, Munazza and Tran, Dai Hoang and Sagar, Subhash and Mahmood, Adnan and Zhang, Wei E. and Sheng, Quan Z.",
-     booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
-     month = "4",
-     year = "2021",
-     publisher = "Association for Computational Linguistics",
-     url = "https://arxiv.org/abs/2104.11394",
-     doi = "10.48550/arXiv.2104.11394"
- }
-
- @article{devlin-2018-bert,
-     title = "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding",
-     author = "Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina",
-     journal = "arXiv:1810.04805",
-     year = "2018",
-     url = "https://arxiv.org/abs/1810.04805",
-     doi = "10.48550/arXiv.1810.04805"
- }
 
  ## Usage
 
@@ -58,4 +49,4 @@ This model is suitable for applications in legal domains, such as:
 - **Legal document analysis**: Extracting relevant information from legal texts.
 - **Question answering**: Providing answers to legal questions based on the content of legal documents.
 
- The model aims to reduce reliance on third-party APIs, which can incur higher costs, by providing a locally deployable solution for legal document processing.
 
 
 **BERT-Law** is a fine-tuned version of **BERT (Bidirectional Encoder Representations from Transformers)**, focusing on information extraction from legal documents. The model is specifically trained on a custom dataset called **UTE_LAW**, which consists of approximately 30,000 pairs of legal questions and related documents. The main goal of this model is to extract relevant information from legal text while reducing the costs associated with using third-party APIs.
 
+ Additionally, the model supports **Retrieval-Augmented Generation (RAG)** workflows: instead of sending an entire document to the model, only the passages most relevant to a question are retrieved and passed as context, so inputs stay within a small context window and third-party API costs are kept down. This is especially useful when processing large legal documents in a single request would be inefficient or expensive.
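As an illustration of how such a retrieval step can work (a sketch only; the model card does not specify a particular retriever, and the chunking parameters and scikit-learn TF-IDF ranker below are assumptions), a long legal document can be split into overlapping chunks, the chunks ranked against the question, and only the best-matching chunk passed to the model as context, keeping the input within the 512-token window:

```python
# Sketch of a simple retrieval step for long legal documents (illustrative only).
# The best-matching chunk is later passed as `context` to the QA model instead of
# the full document, keeping the input within BERT's 512-token limit.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def split_into_chunks(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split a document into overlapping, word-based chunks."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def retrieve_best_chunk(question: str, document: str) -> str:
    """Rank chunks against the question with TF-IDF and return the most similar one."""
    chunks = split_into_chunks(document)
    tfidf = TfidfVectorizer().fit_transform(chunks + [question])
    scores = cosine_similarity(tfidf[-1], tfidf[:-1])[0]
    return chunks[int(scores.argmax())]
```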
+
 ### Key Features
 - **Base Model**: The model is built on top of `google-bert/bert-base-multilingual-cased`, which is a pre-trained multilingual BERT model.
 - **Fine-tuning**: It has been fine-tuned with the **UTE_LAW** dataset, focusing on extracting relevant information from legal texts.
 - **Model Type**: BERT-based model for **question-answering** tasks.
 - **Task**: The model is optimized for information extraction tasks, specifically designed to handle legal documents.
+ - **RAG Support**: Enhanced ability to process smaller context windows, improving cost-efficiency when using third-party APIs.
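Given the base model and question-answering head listed above, a minimal usage sketch with the Hugging Face Transformers `question-answering` pipeline could look like the following. The repository id `quanghuy123/BERT-Law` is a placeholder assumed for illustration; substitute the actual checkpoint name published with this model.

```python
# Minimal extractive QA sketch (the checkpoint id below is a placeholder, not
# confirmed by this model card).
from transformers import pipeline

qa = pipeline("question-answering", model="quanghuy123/BERT-Law")

result = qa(
    # "What is the maximum duration of a fixed-term labour contract?"
    question="Hợp đồng lao động xác định thời hạn có thời hạn tối đa bao lâu?",
    # Context drawn from a legal passage; the model extracts the answer span from it.
    context=(
        "Hợp đồng lao động xác định thời hạn là hợp đồng mà trong đó hai bên xác định "
        "thời hạn, thời điểm chấm dứt hiệu lực của hợp đồng trong thời gian không quá "
        "36 tháng kể từ thời điểm có hiệu lực của hợp đồng."
    ),
)
print(result["answer"], result["score"])
```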
 
 ### Model Specifications
+ | Specification               | Description                                     |
+ |-----------------------------|-------------------------------------------------|
+ | **Maximum Sequence Length** | 512 tokens                                      |
+ | **Language**                | Primarily focused on **Vietnamese** legal texts |
+ | **License**                 | Apache-2.0                                      |
+ | **Task**                    | Question answering, information extraction      |
+ | **RAG Support**             | Yes                                             |
+
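Because the maximum sequence length is 512 tokens, contexts longer than that must either be reduced via retrieval (as sketched earlier) or windowed. The following sketch shows standard tokenizer-level windowing using the base tokenizer listed in the specifications; in practice the fine-tuned checkpoint's own tokenizer would be used.

```python
# Sketch: windowing an over-long legal context to respect the 512-token limit.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-multilingual-cased")

question = "Mức phạt đối với hành vi này là bao nhiêu?"  # "What is the penalty for this act?"
long_context = "..."  # stand-in for a legal document longer than 512 tokens

encoded = tokenizer(
    question,
    long_context,
    max_length=512,
    truncation="only_second",        # never truncate the question itself
    stride=128,                      # overlap between consecutive windows
    return_overflowing_tokens=True,  # one encoding per 512-token window
    padding="max_length",
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # (number_of_windows, 512)
```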
+ ### References
+ - Zaib, Munazza; Tran, Dai Hoang; Sagar, Subhash; Mahmood, Adnan; Zhang, Wei E.; Sheng, Quan Z. (2021). "BERT-CoQAC: BERT-based Conversational Question Answering in Context." In *Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing*. [arXiv:2104.11394](https://arxiv.org/abs/2104.11394)
+ - Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." [arXiv:1810.04805](https://arxiv.org/abs/1810.04805)
 
  ## Usage
 
 This model is suitable for applications in legal domains, such as:
 - **Legal document analysis**: Extracting relevant information from legal texts.
 - **Question answering**: Providing answers to legal questions based on the content of legal documents.
 
+ The model aims to reduce reliance on third-party APIs, which can incur higher costs, by providing a locally deployable solution for legal document processing. With **RAG**, it further streamlines extraction by retrieving only the most relevant passages into the model's context window, improving efficiency and reducing cost when working with large or complex legal documents.