metadata
language: fr
license: cc-by-sa-4.0
tags:
  - legal
datasets: maastrichtlawtech/bsard
pipeline_tag: fill-mask
widget:
  - text: >-
      Chaque commune de la Région peut adopter un <mask> communal de
      développement, applicable à l'ensemble de son territoire.

Legal-DistilCamemBERT

  • Legal-DistilCamemBERT is a DistilCamemBERT-based model further pre-trained on 23,000+ statutory articles from Belgian legislation.
  • We used the following training set-up: 50k training steps (200 epochs), batches of 32 sequences of length 512, and an initial learning rate of 5e-5.
  • Training was performed on a single Tesla V100 GPU with 32 GB of memory, using the code provided by Hugging Face.
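The figures in the training set-up can be related with a little arithmetic. The sketch below is only a back-of-the-envelope check: it assumes that each training step consumes exactly one batch (no gradient accumulation), which the card does not state, and the variable names are illustrative.

```python
# Figures quoted in the training set-up above.
training_steps = 50_000
batch_size = 32
epochs = 200
max_seq_length = 512

# Assumption (not stated in the card): one step consumes exactly one batch,
# i.e. no gradient accumulation.
sequences_seen = training_steps * batch_size      # sequences processed in total
sequences_per_epoch = sequences_seen // epochs    # sequences processed per epoch
max_tokens_per_step = batch_size * max_seq_length # upper bound, if every sequence is full

print(sequences_seen)       # 1600000
print(sequences_per_epoch)  # 8000
print(max_tokens_per_step)  # 16384
```

Under that assumption, one epoch corresponds to roughly 8,000 sequences of up to 512 tokens each.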

Load Pretrained Model

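from transformers import AutoTokenizer, AutoModel

# Download (or load from the local cache) the tokenizer and the encoder weights.
# Note: AutoModel returns the bare encoder, without the masked-language-modeling head.
tokenizer = AutoTokenizer.from_pretrained("maastrichtlawtech/legal-distilcamembert")
model = AutoModel.from_pretrained("maastrichtlawtech/legal-distilcamembert")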
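Since the card's metadata tags the model for fill-mask, it may also be convenient to query it through the Transformers `pipeline` API. The following is a minimal sketch, not part of the original card: the helper name `top_predictions` and the constants are illustrative, and the example sentence is the one from the widget metadata.

```python
MODEL_ID = "maastrichtlawtech/legal-distilcamembert"

# Example sentence taken from the widget entry in the card's metadata.
MASKED_SENTENCE = (
    "Chaque commune de la Région peut adopter un <mask> communal de "
    "développement, applicable à l'ensemble de son territoire."
)


def top_predictions(text, top_k=5):
    """Return (token, score) pairs for the single <mask> token in `text`.

    Hypothetical helper for illustration; it downloads the model on first use.
    """
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    # With one mask in the input, the pipeline returns a list of dicts
    # that include the keys "token_str" and "score".
    return [(p["token_str"], p["score"]) for p in fill_mask(text, top_k=top_k)]
```

Calling `top_predictions(MASKED_SENTENCE)` should return the highest-scoring candidate tokens for the masked position; the actual predictions depend on the downloaded weights, so none are shown here.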

About Us

The Maastricht Law & Tech Lab develops algorithms, models, and systems that allow computers to process natural language texts from the legal domain.

Author: Antoine Louis on behalf of the Maastricht Law & Tech Lab.