---
language: fr
license: cc-by-sa-4.0
tags:
- legal
datasets: maastrichtlawtech/bsard
pipeline_tag: fill-mask
widget:
- text: Chaque commune de la Région peut adopter un <mask> communal de développement, applicable à l'ensemble de son territoire.
---
# Legal-DistilCamemBERT
* Legal-DistilCamemBERT is a [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base)-based model further pre-trained on [23,000+ statutory articles](https://huggingface.co/datasets/maastrichtlawtech/bsard) from Belgian legislation.
* Training set-up: 50k training steps (200 epochs), batches of 32 sequences of length 512, and an initial learning rate of 5e-5.
* Training ran on a single Tesla V100 GPU (32 GB) using the [`run_mlm.py` script](https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_mlm.py) provided by Hugging Face.
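
As a rough sanity check, the figures in the training set-up can be related to one another with a few lines of arithmetic. This is only a sketch of what the stated numbers imply, assuming one optimizer step per batch (no gradient accumulation), which the card does not state; the variable names are illustrative.

```python
# Figures quoted in the training set-up above.
TRAINING_STEPS = 50_000
EPOCHS = 200
BATCH_SIZE = 32        # sequences per step
SEQUENCE_LENGTH = 512  # tokens per sequence

# Assumption (not stated in the card): one optimizer step per batch,
# i.e. no gradient accumulation.
steps_per_epoch = TRAINING_STEPS // EPOCHS
sequences_per_epoch = steps_per_epoch * BATCH_SIZE

# Upper bound: sequences shorter than 512 tokens would contribute fewer tokens.
max_tokens_per_step = BATCH_SIZE * SEQUENCE_LENGTH

print(f"steps per epoch:      {steps_per_epoch}")
print(f"sequences per epoch:  {sequences_per_epoch}")
print(f"max tokens per step:  {max_tokens_per_step}")
```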
---
### Load Pretrained Model
```python
from transformers import AutoTokenizer, AutoModel

# AutoModel loads the encoder only, without the masked-language-modeling head.
tokenizer = AutoTokenizer.from_pretrained("maastrichtlawtech/legal-distilcamembert")
model = AutoModel.from_pretrained("maastrichtlawtech/legal-distilcamembert")
```
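Since the card declares `fill-mask` as its pipeline tag, the model can presumably also be queried through the `transformers` fill-mask pipeline. The following is a minimal sketch rather than an official usage recipe: `load_fill_mask` and `top_tokens` are hypothetical helper names introduced here, and the example sentence is the widget text from the card's metadata.

```python
from typing import Callable, List, Tuple

MODEL_ID = "maastrichtlawtech/legal-distilcamembert"

# Widget sentence from the model card metadata (one <mask> token).
EXAMPLE = (
    "Chaque commune de la Région peut adopter un <mask> communal de "
    "développement, applicable à l'ensemble de son territoire."
)


def load_fill_mask() -> Callable:
    """Build a fill-mask pipeline (hypothetical helper; downloads the weights)."""
    from transformers import pipeline  # imported lazily: loading is expensive

    return pipeline("fill-mask", model=MODEL_ID)


def top_tokens(fill_mask: Callable, text: str, k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-scoring fillers for the single <mask> in `text`."""
    predictions = fill_mask(text, top_k=k)
    return [(p["token_str"].strip(), round(p["score"], 4)) for p in predictions]


# Usage (requires network access on first call):
#   fill_mask = load_fill_mask()
#   for token, score in top_tokens(fill_mask, EXAMPLE):
#       print(f"{token}\t{score}")
```

Keeping the pipeline construction separate from the ranking helper means the model is loaded once and reused across sentences.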
### About Us
The [Maastricht Law & Tech Lab](https://www.maastrichtuniversity.nl/about-um/faculties/law/research/law-and-tech-lab) develops algorithms, models, and systems that allow computers to process natural language texts from the legal domain.
Author: [Antoine Louis](https://antoinelouis.co) on behalf of the [Maastricht Law & Tech Lab](https://www.maastrichtuniversity.nl/about-um/faculties/law/research/law-and-tech-lab).