---
language: fr
license: mit
tags:
- legal
datasets: maastrichtlawtech/bsard
pipeline_tag: fill-mask
widget:
- text: >-
    Chaque commune de la Région peut adopter un <mask> communal de
    développement, applicable à l'ensemble de son territoire.
---

# Legal-CamemBERT-Base

* Legal-CamemBERT-Base is a [CamemBERT-Base](https://huggingface.co/camembert-base) model further pre-trained on [23,000+ legislative articles](https://huggingface.co/datasets/maastrichtlawtech/bsard) from Belgian legislation.

* We trained for 50k steps (200 epochs) with batches of 32 sequences of 512 tokens each and an initial learning rate of 5e-5.

* Training was performed on a single 32 GB Tesla V100 GPU using the masked-language-modelling [script](https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_mlm.py) (`run_mlm.py`) provided by Hugging Face.
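
The step and epoch counts above are linked by simple arithmetic. The sketch below makes that link explicit. It assumes that every optimisation step consumes exactly one full batch of 32 sequences, with no gradient accumulation and no partial batches. The card does not state this. The helper name `implied_epoch_size` is hypothetical.

```python
def implied_epoch_size(train_steps, epochs, batch_size, seq_len):
    """Derive per-epoch quantities from the total number of steps and epochs.

    Assumption (not stated in the card): one full batch per optimisation
    step, i.e. no gradient accumulation and no partial final batches.
    """
    total_sequences = train_steps * batch_size  # sequences seen over all of training
    steps_per_epoch = train_steps / epochs
    sequences_per_epoch = total_sequences / epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "sequences_per_epoch": sequences_per_epoch,
        # Each sequence is seq_len tokens long.
        "tokens_per_epoch": sequences_per_epoch * seq_len,
    }


# Values taken from the training set-up listed above.
stats = implied_epoch_size(train_steps=50_000, epochs=200, batch_size=32, seq_len=512)
for name, value in stats.items():
    print(f"{name:>20}: {value:,.0f}")
```

Under these assumptions the script prints 250 steps, 8,000 sequences and 4,096,000 tokens per epoch. By default, `run_mlm.py` concatenates the texts and cuts them into fixed-length blocks. If that default was used, these figures suggest the training corpus amounts to roughly 4.1M tokens. This is an estimate, not a reported figure.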

---

### Load Pretrained Model

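```python
from transformers import AutoTokenizer, AutoModel

# Download the tokenizer and encoder weights from the Hugging Face Hub.
# AutoModel returns the bare encoder (hidden states only); use
# AutoModelForMaskedLM instead to keep the masked-language-modelling head.
tokenizer = AutoTokenizer.from_pretrained("maastrichtlawtech/legal-camembert-base")
model = AutoModel.from_pretrained("maastrichtlawtech/legal-camembert-base")
```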
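
The card's `pipeline_tag` is `fill-mask`, so the model can also be queried through the Transformers `pipeline` API. The sketch below wraps such a pipeline in a small helper. The helper names `top_predictions` and `load_fill_mask` are hypothetical. The helper receives the pipeline as an argument, so it can be tried without downloading the model. It assumes the standard fill-mask output format: a list of dicts with `token_str` and `score` keys.

```python
# CamemBERT tokenizers use "<mask>" as the mask token (see the widget example).
MASK = "<mask>"


def top_predictions(fill_mask, text, top_k=5):
    """Return (token, score) pairs for the single <mask> in `text`.

    `fill_mask` is assumed to behave like a Hugging Face fill-mask pipeline:
    called on a string, it returns a list of dicts with "token_str" and
    "score" keys.
    """
    if text.count(MASK) != 1:
        # With several masks the pipeline returns one list per mask,
        # so this helper only handles the single-mask case.
        raise ValueError(f"expected exactly one {MASK} token")
    predictions = fill_mask(text, top_k=top_k)
    return [(p["token_str"].strip(), p["score"]) for p in predictions]


def load_fill_mask(model_name="maastrichtlawtech/legal-camembert-base"):
    """Build the real pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily on purpose

    return pipeline("fill-mask", model=model_name)


# Usage (requires `transformers` and network access):
#   fill_mask = load_fill_mask()
#   top_predictions(fill_mask, "Chaque commune de la Région peut adopter un "
#                   "<mask> communal de développement, applicable à "
#                   "l'ensemble de son territoire.")
```

Passing the pipeline in, rather than building it inside the helper, keeps the model loaded once when many sentences are queried.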

### About Us

The [Maastricht Law & Tech Lab](https://www.maastrichtuniversity.nl/about-um/faculties/law/research/law-and-tech-lab) develops algorithms, models, and systems that allow computers to process natural language texts from the legal domain.

Author: [Antoine Louis](https://antoinelouis.co) on behalf of the [Maastricht Law & Tech Lab](https://www.maastrichtuniversity.nl/about-um/faculties/law/research/law-and-tech-lab).