	---
	language: fr
	license: mit
	tags:
	- legal
	datasets: maastrichtlawtech/bsard
	pipeline_tag: fill-mask
	widget:
	- text: >-
	    Chaque commune de la Région peut adopter un <mask> communal de
	    développement, applicable à l'ensemble de son territoire.
	---

	# Legal-CamemBERT-Base

	* Legal-CamemBERT-Base is a [CamemBERT-Base](https://huggingface.co/camembert-base) model further pre-trained on [23,000+ legislative articles](https://huggingface.co/datasets/maastrichtlawtech/bsard) from Belgian legislation.
	* Training ran for 50k steps (200 epochs) with a batch size of 32 sequences, a sequence length of 512 tokens, and an initial learning rate of 5e-5.
	* Training was performed on a single Tesla V100 GPU with 32 GB of memory, using the [`run_mlm.py` script](https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_mlm.py) provided by Hugging Face.
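
The training figures above imply a per-epoch workload that is easy to derive. The following is a minimal sketch of that arithmetic, not part of the original training code. It assumes that the batch size of 32 is the effective batch size per optimizer step (single GPU, no gradient accumulation); the variable names are illustrative.

```python
# Figures stated in the model card.
TRAINING_STEPS = 50_000
EPOCHS = 200
BATCH_SIZE = 32        # assumption: effective number of sequences per optimizer step
SEQUENCE_LENGTH = 512  # tokens per sequence

# Derived quantities.
steps_per_epoch = TRAINING_STEPS // EPOCHS
sequences_per_epoch = steps_per_epoch * BATCH_SIZE
# Upper bound: assumes every sequence is filled to the full 512 tokens.
max_tokens_per_epoch = sequences_per_epoch * SEQUENCE_LENGTH

print(f"steps per epoch: {steps_per_epoch}")
print(f"sequences per epoch: {sequences_per_epoch}")
print(f"tokens per epoch (upper bound): {max_tokens_per_epoch:,}")
```

Under these assumptions, one epoch corresponds to 250 optimizer steps, or 8,000 sequences of at most 512 tokens each.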

	---

	### Load Pretrained Model

	```python
	from transformers import AutoTokenizer, AutoModel

	# AutoModel loads the encoder only, without the masked-language-modeling head.
	tokenizer = AutoTokenizer.from_pretrained("maastrichtlawtech/legal-camembert-base")
	model = AutoModel.from_pretrained("maastrichtlawtech/legal-camembert-base")
	```
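
Because the card metadata declares `pipeline_tag: fill-mask`, the model can presumably also be queried for masked-token predictions. The sketch below uses the `fill-mask` pipeline from `transformers` with the widget sentence from the metadata. It assumes the published checkpoint retains its masked-language-modeling head; `predict_mask` is a hypothetical helper name introduced here for illustration.

```python
MODEL_ID = "maastrichtlawtech/legal-camembert-base"

# Example sentence taken from the widget entry in the model card metadata.
WIDGET_TEXT = (
    "Chaque commune de la Région peut adopter un <mask> communal de "
    "développement, applicable à l'ensemble de son territoire."
)


def predict_mask(text, top_k=5):
    """Hypothetical helper: return (token, score) pairs for the single <mask> in `text`."""
    # Imported here so that defining the helper does not require transformers.
    from transformers import pipeline

    # Assumption: the checkpoint includes the masked-language-modeling head.
    unmasker = pipeline("fill-mask", model=MODEL_ID)
    return [(p["token_str"], p["score"]) for p in unmasker(text, top_k=top_k)]
```

Calling `predict_mask(WIDGET_TEXT)` downloads the weights on first use and returns up to five candidate tokens with their scores, ordered from most to least likely.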

	### About Us

	The [Maastricht Law & Tech Lab](https://www.maastrichtuniversity.nl/about-um/faculties/law/research/law-and-tech-lab) develops algorithms, models, and systems that allow computers to process natural language texts from the legal domain.

	Author: [Antoine Louis](https://antoinelouis.co) on behalf of the [Maastricht Law & Tech Lab](https://www.maastrichtuniversity.nl/about-um/faculties/law/research/law-and-tech-lab).