antoinelouis committed on
Commit
332838e
1 Parent(s): f002601

Update README.md

  1. README.md +29 -0
README.md CHANGED
@@ -1,3 +1,32 @@
  ---
+ language: fr
+ pipeline_tag: fill-mask
  license: cc-by-sa-4.0
+ tags:
+ - legal
+ widget:
+ - text: "Chaque commune de la Région peut adopter un <mask> communal de développement, applicable à l'ensemble de son territoire."
  ---
+
+ # Legal-DistilCamemBERT
+
+ * Legal-DistilCamemBERT is a [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base)-based model further pre-trained on [23,000+ statutory articles](https://huggingface.co/datasets/maastrichtlawtech/bsard) from Belgian legislation.
+ * Training set-up: 50k training steps (200 epochs) with batches of 32 sequences of length 512 and an initial learning rate of 5e-5.
+ * Training ran on a single Tesla V100 GPU with 32 GB of memory using the [`run_mlm.py` script](https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_mlm.py) provided by Hugging Face.
+
+ ---
+
+ ### Load Pretrained Model
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+
+ tokenizer = AutoTokenizer.from_pretrained("maastrichtlawtech/legal-distilcamembert")
+ model = AutoModel.from_pretrained("maastrichtlawtech/legal-distilcamembert")
+ ```
+
+ ### About Us
+
+ The [Maastricht Law & Tech Lab](https://www.maastrichtuniversity.nl/about-um/faculties/law/research/law-and-tech-lab) develops algorithms, models, and systems that allow computers to process natural language texts from the legal domain.
+
+ Author: [Antoine Louis](https://antoinelouis.co) on behalf of the [Maastricht Law & Tech Lab](https://www.maastrichtuniversity.nl/about-um/faculties/law/research/law-and-tech-lab).
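
The front matter added in this commit declares `pipeline_tag: fill-mask` and a widget sentence containing `<mask>`, while the README snippet only loads the bare encoder with `AutoModel`. The following is a minimal sketch of how the widget sentence could be run locally, assuming the published checkpoint includes the masked-language-modeling head and that `transformers` is installed. The helper names `load_fill_mask`, `top_candidates`, and `main` are hypothetical and not part of the README.

```python
from typing import Dict, List, Tuple

# Model ID taken from the README's loading snippet.
MODEL_ID = "maastrichtlawtech/legal-distilcamembert"

# Widget sentence from the README front matter; "<mask>" marks the token to predict.
WIDGET_TEXT = (
    "Chaque commune de la Région peut adopter un <mask> communal de "
    "développement, applicable à l'ensemble de son territoire."
)


def load_fill_mask():
    """Build a fill-mask pipeline (downloads the weights on first use)."""
    # Imported lazily so the helper below can be used without transformers.
    from transformers import pipeline

    return pipeline("fill-mask", model=MODEL_ID)


def top_candidates(predictions: List[Dict], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token, score) pairs.

    `predictions` is expected to be the list of dicts that a fill-mask
    pipeline returns for a single-mask input (keys "token_str" and "score").
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].strip(), round(p["score"], 4)) for p in ranked[:k]]


def main() -> None:
    """Print the top candidates for the widget sentence (requires network access)."""
    fill_mask = load_fill_mask()
    for token, score in top_candidates(fill_mask(WIDGET_TEXT, top_k=5)):
        print(f"{token}\t{score}")
```

Calling `main()` downloads the model and prints one candidate token per line with its score. The ranking helper is kept separate from the pipeline call so it can be checked without fetching any weights.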