---
language: fr
license: mit
tags:
- legal
datasets: maastrichtlawtech/bsard
pipeline_tag: fill-mask
widget:
- text: >-
    Chaque commune de la Région peut adopter un <mask> communal de
    développement, applicable à l'ensemble de son territoire.
---

# Legal-CamemBERT-Base

* Legal-CamemBERT-Base is a [CamemBERT-Base](https://huggingface.co/camembert-base) model further pre-trained on [23,000+ legislative articles](https://huggingface.co/datasets/maastrichtlawtech/bsard) from Belgian legislation.
* We used the following training setup: 50k training steps (200 epochs), batches of 32 sequences of 512 tokens each, and an initial learning rate of 5e-5.
* Training was performed on a single 32 GB Tesla V100 GPU using the masked-language-modeling [script](https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_mlm.py) provided by Hugging Face.
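The setup above implies a few derived quantities that are not reported directly. The sketch below works them out, assuming a single device, no gradient accumulation, and fully packed 512-token sequences (the default grouping behaviour of `run_mlm.py` when `--line_by_line` is not set). Under those assumptions, 50k steps over 200 epochs corresponds to roughly 8,000 training sequences per epoch.

```python
# Hyperparameters as stated in the training setup above.
TRAIN_STEPS = 50_000
EPOCHS = 200
BATCH_SIZE = 32   # sequences per optimization step (assumes no gradient accumulation)
SEQ_LEN = 512     # tokens per sequence (assumes sequences are packed to full length)

# Derived quantities (estimates under the assumptions above, not reported figures).
sequences_seen = TRAIN_STEPS * BATCH_SIZE           # total sequences processed
steps_per_epoch = TRAIN_STEPS // EPOCHS             # optimization steps per pass over the data
sequences_per_epoch = steps_per_epoch * BATCH_SIZE  # approximate size of the packed training set
tokens_per_step = BATCH_SIZE * SEQ_LEN              # tokens in one batch
total_tokens = TRAIN_STEPS * tokens_per_step        # tokens processed over all of training

print(f"sequences seen:      {sequences_seen:,}")
print(f"steps per epoch:     {steps_per_epoch:,}")
print(f"sequences per epoch: {sequences_per_epoch:,}")
print(f"tokens per step:     {tokens_per_step:,}")
print(f"total tokens:        {total_tokens:,}")
```

Because each packed sequence holds 512 tokens, about 8,000 sequences per epoch would mean the corpus is on the order of 4 million tokens, which the model revisits 200 times.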

---

### Load Pretrained Model

```python
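from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and the pre-trained encoder from the Hugging Face Hub.
# Note: AutoModel returns the bare encoder (hidden states only); use
# AutoModelForMaskedLM instead if you need the masked-language-modeling head.
tokenizer = AutoTokenizer.from_pretrained("maastrichtlawtech/legal-camembert-base")
model = AutoModel.from_pretrained("maastrichtlawtech/legal-camembert-base")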
```
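Since the model is tagged for `fill-mask`, a quick way to try it is the `transformers` fill-mask pipeline, which loads the model together with its masked-language-modeling head. The sketch below reuses the widget sentence from this card's metadata; it assumes network access to the Hub, and the predictions you get depend on the released weights, so no expected output is shown.

```python
from transformers import pipeline

# Build a fill-mask pipeline; this loads the model with its MLM head.
fill_mask = pipeline("fill-mask", model="maastrichtlawtech/legal-camembert-base")

# Use the tokenizer's own mask token ("<mask>" for CamemBERT) rather than hard-coding it.
mask = fill_mask.tokenizer.mask_token
text = (
    f"Chaque commune de la Région peut adopter un {mask} communal de "
    "développement, applicable à l'ensemble de son territoire."
)

# By default the pipeline returns the 5 most likely completions, sorted by score.
predictions = fill_mask(text)
for p in predictions:
    print(f"{p['token_str']!r}: {p['score']:.3f}")
```

Each prediction is a dictionary with the keys `score`, `token`, `token_str`, and `sequence` (the input with the mask filled in); pass `top_k=...` to the call to change the number of candidates.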

### About Us

The [Maastricht Law & Tech Lab](https://www.maastrichtuniversity.nl/about-um/faculties/law/research/law-and-tech-lab) develops algorithms, models, and systems that allow computers to process natural language texts from the legal domain.

Author: [Antoine Louis](https://antoinelouis.co) on behalf of the [Maastricht Law & Tech Lab](https://www.maastrichtuniversity.nl/about-um/faculties/law/research/law-and-tech-lab).