SetFit with sentence-transformers/all-mpnet-base-v2

This is a SetFit model for text classification. It uses sentence-transformers/all-mpnet-base-v2 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
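
The first step relies on turning a small labeled set into sentence pairs. The following is a simplified sketch of that idea, not SetFit's actual sampler: every pair of examples with the same label becomes a positive pair (target 1.0) and every pair with different labels becomes a negative pair (target 0.0). The function name `generate_pairs` and the toy sentences are hypothetical, and the real library additionally balances positives and negatives according to `sampling_strategy`.

```python
from itertools import combinations


def generate_pairs(texts, labels):
    """Build (text_a, text_b, target) triples for contrastive fine-tuning.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    This is an illustrative simplification, not SetFit's own sampler.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Hypothetical toy data: two examples per class
toy_texts = ["fair transition for workers", "jobs during the transition",
             "fossil-free transport", "green digital transition"]
toy_labels = [1, 1, 0, 0]

toy_pairs = generate_pairs(toy_texts, toy_labels)
positives = [p for p in toy_pairs if p[2] == 1.0]
negatives = [p for p in toy_pairs if p[2] == 0.0]
print(len(toy_pairs), len(positives), len(negatives))
```

Because the number of pairs grows quadratically with the number of examples, even a few labeled sentences per class yield a comparatively large contrastive training set.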

Model Details

Model Description

Model Sources

Model Labels

Label 1 examples:
  • 'Does that mean that a fair transition must be ensured through taxation, including with a capital tax for the most wealthy?'
  • 'In fact, there are alternatives, there is a need for motivation to create reasonable parallel opportunities for job creation during a gradual transition.'
  • 'We show that it is possible to combine ecological sustainability with welfare, justice and development.'
Label 0 examples:
  • 'As a representative of the Center Party, I am convinced that a transition to a fossil-independent transport sector and the fleet of vehicles is both necessary and possible.'
  • 'Natural solutions supporting the green digital transition aim to mitigate and adapt to climate change.'
  • 'Such a project is at the heart of the ecological transition: the Government, as well as parliamentarians and all the actors involved in this concession, have shown their commitment to this model, which is very innovative, and their ambition to accompany projects at the crossroads of these various issues.'

Evaluation

Metrics

Label Accuracy
all 0.9375
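
For reference, accuracy is the fraction of evaluation examples whose predicted label matches the true label. The sketch below uses hypothetical labels, since the card does not state the size or contents of the evaluation set; 16 examples with one error is just one combination that produces the same value.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of positions where prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical evaluation set: 16 examples, exactly one misclassified
y_true = [0, 1] * 8
y_pred = list(y_true)
y_pred[0] = 1  # flip one prediction to simulate a single error

score = accuracy(y_true, y_pred)
print(score)
```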

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("Francesco-A/setfit-all-mpnet-base-v2-non-augmented_dataset-133-shot-just_transition-v1.4.1")
# Run inference
preds = model("The protection of protected areas and nature reserves is essential to conserve biodiversity and preserve wild habitats.")
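
To classify several sentences at once, a small wrapper along these lines may be convenient. It is a sketch that assumes the loaded model exposes a `predict` method accepting a list of strings, as `SetFitModel` does, and that predictions convert cleanly to the integer labels 0 and 1 shown in this card. The helper name `classify_batch` and the stub class are hypothetical; the stub only stands in for the real model so the example runs without a download.

```python
def classify_batch(model, texts):
    """Return (text, label) tuples for a list of input sentences.

    `model` is any object with a predict(list_of_str) method,
    for example a SetFitModel loaded as shown above.
    """
    texts = list(texts)
    preds = model.predict(texts)
    return [(text, int(pred)) for text, pred in zip(texts, preds)]


class StubModel:
    """Hypothetical stand-in for a loaded model; not part of SetFit."""

    def predict(self, texts):
        # Toy rule for demonstration only: label 1 if the text mentions "fair"
        return [1 if "fair" in t.lower() else 0 for t in texts]


results = classify_batch(StubModel(), [
    "A fair transition must protect workers.",
    "Nature reserves conserve biodiversity.",
])
for text, label in results:
    print(label, text)
```

With the real model, the call would be `classify_batch(model, sentences)` using the `model` object loaded above.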

Training Details

Training Set Metrics

Training set  Min  Median   Max
Word count    5    31.4436  120

Label  Training Sample Count
0      133
1      133
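
Statistics of this kind can be recomputed from the raw training sentences with a few lines of code. The sketch below assumes whitespace tokenization and uses hypothetical sentences, since the training data is not included in this card. It reports both median and mean: a median over integer word counts is always an integer or a half-integer, so the fractional value 31.4436 listed under "Median" may in fact be a mean.

```python
from statistics import mean, median


def word_count_stats(texts):
    """Compute min, median, mean and max of whitespace-separated word counts."""
    counts = [len(text.split()) for text in texts]
    return {
        "min": min(counts),
        "median": median(counts),
        "mean": mean(counts),
        "max": max(counts),
    }


# Hypothetical sentences standing in for the training set
sample_texts = [
    "Just transition",
    "A fair transition for all workers",
    "Green jobs now",
]
stats = word_count_stats(sample_texts)
print(stats)
```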

Training Hyperparameters

  • batch_size: (16, 16)
  • num_epochs: (3, 3)
  • max_steps: -1
  • sampling_strategy: oversampling
  • body_learning_rate: (2e-05, 1e-05)
  • head_learning_rate: 0.01
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • l2_weight: 0.01
  • seed: 1234
  • eval_max_steps: -1
  • load_best_model_at_end: True
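
The `loss: CosineSimilarityLoss` entry refers to the Sentence Transformers loss that compares the cosine similarity of each sentence pair with its target (1.0 for same-label pairs, 0.0 otherwise) using mean squared error. The following is a dependency-free sketch of that computation on plain Python lists; the function names and the toy vectors are illustrative, and the library itself operates on batched tensors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(pairs):
    """Mean squared error between pair similarity and target.

    pairs: list of (embedding_a, embedding_b, target) triples.
    """
    errors = [(cosine_similarity(u, v) - target) ** 2 for u, v, target in pairs]
    return sum(errors) / len(errors)


# Toy 2-d "embeddings" (hypothetical values)
well_separated = [
    ([1.0, 0.0], [1.0, 0.0], 1.0),  # same label, identical direction
    ([1.0, 0.0], [0.0, 1.0], 0.0),  # different labels, orthogonal
]
collapsed = [
    ([1.0, 0.0], [1.0, 0.0], 0.0),  # different labels but identical direction
]
print(cosine_similarity_loss(well_separated))
print(cosine_similarity_loss(collapsed))
```

The loss is zero when same-label pairs point in the same direction and different-label pairs are orthogonal, and it grows as pairs with different labels become more similar.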

Training Results

Epoch Step Training Loss Validation Loss
0.0009 1 0.2933 -
0.0449 50 0.2605 -
0.0898 100 0.2551 -
0.1346 150 0.2467 -
0.1795 200 0.233 -
0.2244 250 0.1117 -
0.2693 300 0.0049 -
0.3142 350 0.0007 -
0.3591 400 0.0004 -
0.4039 450 0.0003 -
0.4488 500 0.0002 -
0.4937 550 0.0002 -
0.5386 600 0.0002 -
0.5835 650 0.0002 -
0.6284 700 0.0001 -
0.6732 750 0.0001 -
0.7181 800 0.0001 -
0.7630 850 0.0001 -
0.8079 900 0.0001 -
0.8528 950 0.0001 -
0.8977 1000 0.0001 -
0.9425 1050 0.0001 -
0.9874 1100 0.0001 -
1.0 1114 - 0.0938
1.0323 1150 0.0001 -
1.0772 1200 0.0001 -
1.1221 1250 0.0001 -
1.1670 1300 0.0001 -
1.2118 1350 0.0001 -
1.2567 1400 0.0001 -
1.3016 1450 0.0001 -
1.3465 1500 0.0001 -
1.3914 1550 0.0001 -
1.4363 1600 0.0 -
1.4811 1650 0.0 -
1.5260 1700 0.0 -
1.5709 1750 0.0 -
1.6158 1800 0.0 -
1.6607 1850 0.0 -
1.7056 1900 0.0 -
1.7504 1950 0.0 -
1.7953 2000 0.0 -
1.8402 2050 0.0 -
1.8851 2100 0.0 -
1.9300 2150 0.0 -
1.9749 2200 0.0 -
2.0 2228 - 0.0951
2.0197 2250 0.0003 -
2.0646 2300 0.0012 -
2.1095 2350 0.0005 -
2.1544 2400 0.001 -
2.1993 2450 0.0001 -
2.2442 2500 0.0001 -
2.2890 2550 0.0001 -
2.3339 2600 0.0001 -
2.3788 2650 0.0001 -
2.4237 2700 0.0001 -
2.4686 2750 0.0001 -
2.5135 2800 0.0 -
2.5583 2850 0.0001 -
2.6032 2900 0.0 -
2.6481 2950 0.0 -
2.6930 3000 0.0 -
2.7379 3050 0.0 -
2.7828 3100 0.0 -
2.8276 3150 0.0 -
2.8725 3200 0.0 -
2.9174 3250 0.0 -
2.9623 3300 0.0 -
3.0 3342 - 0.0964
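
The validation loss in the table rises slightly from epoch to epoch, which matters because `load_best_model_at_end` is set to True. As a sketch, and assuming the checkpoint is chosen by lowest validation loss (the card does not state the selection metric), the best checkpoint can be picked from the three validation rows like this:

```python
# (epoch, step, validation_loss) rows copied from the Training Results table
validation_rows = [
    (1.0, 1114, 0.0938),
    (2.0, 2228, 0.0951),
    (3.0, 3342, 0.0964),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


best_epoch, best_step, best_loss = best_checkpoint(validation_rows)
print(f"best checkpoint: epoch {best_epoch}, step {best_step}, loss {best_loss}")
```

Under that assumption, the checkpoint from the end of the first epoch would be the one retained.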

Framework Versions

  • Python: 3.10.14
  • SetFit: 1.1.0
  • Sentence Transformers: 3.3.1
  • Transformers: 4.44.0
  • PyTorch: 2.4.0
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}