---
language: it
license: apache-2.0
widget:
- text: "Il <mask> ha chiesto revocarsi l'obbligo di pagamento"
---

# ITALIAN-LEGAL-BERT-SC
ITALIAN-LEGAL-BERT-SC (ITA-LEGAL-BERT-SC) is the variant of [ITALIAN-LEGAL-BERT](https://huggingface.co/dlicari/Italian-Legal-BERT) pre-trained from scratch on Italian legal documents, based on the CamemBERT architecture.

## Training procedure
The model was trained from scratch on a larger training dataset: 6.6 GB of civil and criminal cases.
We used the [CamemBERT](https://huggingface.co/docs/transformers/main/en/model_doc/camembert) architecture with a language modeling head on top, the AdamW optimizer, an initial learning rate of 2e-5 (with linear learning rate decay), a sequence length of 512, a batch size of 18, and 1 million training steps.
Training ran on 8 NVIDIA A100 40GB GPUs using distributed data parallel (each step performs 8 batches). The SentencePiece tokenizer was trained from scratch on a subset of the training set (5 million sentences), with a vocabulary size of 32,000.
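
The figures above can be turned into a quick back-of-the-envelope calculation. The sketch below is only an illustration, not the training code: it assumes that "batch size 18" is the per-GPU batch, so that one step covers 8 × 18 sequences, and that the linear decay has no warmup and reaches zero at the last step. Neither assumption is confirmed by this card, and the function names are hypothetical.

```python
# Hyperparameters as stated in the training procedure.
PER_DEVICE_BATCH = 18      # assumed to be the per-GPU batch size
NUM_DEVICES = 8            # 8 x NVIDIA A100 40GB
MAX_SEQ_LEN = 512
TOTAL_STEPS = 1_000_000
INITIAL_LR = 2e-5


def effective_batch_size(per_device=PER_DEVICE_BATCH, devices=NUM_DEVICES):
    """Sequences processed per optimizer step under data parallelism."""
    return per_device * devices


def max_tokens_per_step(seq_len=MAX_SEQ_LEN):
    """Upper bound on tokens per step, if every sequence is full length."""
    return effective_batch_size() * seq_len


def linear_lr(step, initial_lr=INITIAL_LR, total_steps=TOTAL_STEPS):
    """Learning rate at `step` under linear decay.

    Assumption: no warmup, and the rate decays to zero at `total_steps`.
    """
    remaining = max(0.0, 1.0 - step / total_steps)
    return initial_lr * remaining


if __name__ == "__main__":
    print("sequences per step:", effective_batch_size())
    print("max tokens per step:", max_tokens_per_step())
    for step in (0, 250_000, 500_000, 1_000_000):
        print(f"lr at step {step}: {linear_lr(step):.2e}")
```

Under these assumptions each step sees 144 sequences, and the learning rate is halved by the midpoint of training.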