# Model Card: LEGIT-BART Series

## Model Overview

The LEGIT-BART models are a family of pre-trained transformer-based models for Italian legal text processing.
They build upon BART-IT (`morenolq/bart-it`) and are further pre-trained on Italian legal corpora.
**Key features:**

- Extended context length with Local-Sparse-Global (LSG) attention (up to 16,384 tokens)
- Trained on legal documents such as statutes, case law, and contracts
- Not fine-tuned for specific tasks (requires further adaptation)
## Available Models

| Model | Description | Link |
|---|---|---|
| LEGIT-BART | Continued pre-training of `morenolq/bart-it` on Italian legal texts | `morenolq/LEGIT-BART` |
| LEGIT-BART-LSG-4096 | Continued pre-training of `morenolq/bart-it`, supporting 4,096 tokens | `morenolq/LEGIT-BART-LSG-4096` |
| LEGIT-BART-LSG-16384 | Continued pre-training of `morenolq/bart-it`, supporting 16,384 tokens | `morenolq/LEGIT-BART-LSG-16384` |
| LEGIT-SCRATCH-BART | Trained from scratch on Italian legal texts | `morenolq/LEGIT-SCRATCH-BART` |
| LEGIT-SCRATCH-BART-LSG-4096 | Trained from scratch with LSG attention, supporting 4,096 tokens | `morenolq/LEGIT-SCRATCH-BART-LSG-4096` |
| LEGIT-SCRATCH-BART-LSG-16384 | Trained from scratch with LSG attention, supporting 16,384 tokens | `morenolq/LEGIT-SCRATCH-BART-LSG-16384` |
| BART-IT-LSG-4096 | `morenolq/bart-it` with LSG attention, supporting 4,096 tokens (no legal adaptation) | `morenolq/BART-IT-LSG-4096` |
| BART-IT-LSG-16384 | `morenolq/bart-it` with LSG attention, supporting 16,384 tokens (no legal adaptation) | `morenolq/BART-IT-LSG-16384` |
## Model Details

### Architecture

- **Base model:** `morenolq/bart-it`
- Transformer encoder-decoder
- LSG attention for long documents (LSG variants)
- Dedicated tokenizers for the models trained from scratch (these underperformed continual pre-training in our experiments)
### Training Data

- **Dataset:** `joelniklaus/Multi_Legal_Pile`
- Types of legal texts used:
  - Legislation (laws, codes, amendments)
  - Case law (judicial decisions)
  - Contracts (public legal agreements)
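
For context, here is a minimal sketch of how an Italian subset of the corpus can be inspected with the `datasets` library. The `it_caselaw` configuration name and the `text` field are assumptions based on the dataset's `{language}_{text_type}` naming; verify them against the dataset card before running. Streaming avoids downloading the full corpus.

```python
# Minimal sketch: stream one Italian subset of Multi_Legal_Pile.
# Assumptions: the "it_caselaw" config name and the "text" field follow the
# dataset card's {language}_{text_type} naming; verify before running.
from datasets import load_dataset

dataset = load_dataset(
    "joelniklaus/Multi_Legal_Pile",
    "it_caselaw",      # assumed config: Italian judicial decisions
    split="train",
    streaming=True,    # the corpus is large; stream instead of downloading
)

for example in dataset.take(1):
    print(example["text"][:500])
```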
## How to Use

```python
from transformers import BartForConditionalGeneration, AutoTokenizer

# Load tokenizer and model
model_name = "morenolq/LEGIT-BART"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Example input: mask infilling, in line with the denoising pre-training objective
input_text = "<mask> 1234: Il contratto si intende concluso quando..."
# The base (non-LSG) model has a standard context window; use the LSG variants for longer inputs
inputs = tokenizer(input_text, return_tensors="pt", max_length=1024, truncation=True)

# Generate the output text
output_ids = model.generate(inputs.input_ids, max_length=150, num_beams=4, early_stopping=True)
output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print("Output:", output)
```
## Limitations & Ethical Considerations

- **Not fine-tuned for specific tasks:** The models are pre-trained on legal texts and may require further adaptation for specific legal NLP tasks (e.g., summarization, question answering); see the sketch after this list.
- **Bias and fairness:** Legal texts may contain biases present in the legal system. Care should be taken to ensure fairness and ethical use of the models.
- **Legal advice:** The models are not a substitute for professional legal advice. Always consult a qualified legal professional for legal matters.
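
As one illustration of such adaptation, here is a hypothetical fine-tuning sketch for abstractive summarization with the Hugging Face `Seq2SeqTrainer`. The tiny in-memory dataset, column names, and hyperparameters are placeholders, not the authors' training setup:

```python
# Hypothetical fine-tuning sketch (abstractive summarization). The in-memory
# dataset, column names, and hyperparameters are illustrative placeholders.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    BartForConditionalGeneration,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "morenolq/LEGIT-BART"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Replace with a real (document, summary) legal dataset.
data = Dataset.from_dict({
    "document": ["Art. 1321 c.c.: Il contratto è l'accordo di due o più parti..."],
    "summary": ["Definizione di contratto."],
})

def preprocess(batch):
    # Tokenize source documents and target summaries
    model_inputs = tokenizer(batch["document"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = data.map(preprocess, batched=True, remove_columns=data.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="legit-bart-summarization",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=5e-5,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```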
## Reference

The paper presenting the LEGIT-BART models is currently under review; the citation will be added here once it is published.