SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and similar tasks.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
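
As a rough illustration of the similarity function listed above, the following sketch computes cosine similarity with NumPy. The vectors are random 384-dimensional placeholders (an assumption for illustration), not outputs of this model.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for real 384-dimensional embeddings (assumption).
rng = np.random.default_rng(0)
u = rng.normal(size=384)
v = rng.normal(size=384)

print(cosine_similarity(u, u))   # identical vectors score 1.0, up to rounding
print(cosine_similarity(u, -u))  # opposite vectors score -1.0, up to rounding
print(cosine_similarity(u, v))   # unrelated random vectors land near 0
```

Because the architecture ends in a Normalize() module, embeddings from this model should already have unit length, in which case the cosine similarity reduces to a plain dot product.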

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
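
The Pooling and Normalize modules above can be sketched in NumPy as follows. This is a minimal illustration of mean pooling over non-padding tokens followed by L2 normalization, not the library's implementation; the function name and the random token embeddings are hypothetical.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings: np.ndarray,
                            attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over real (non-padding) tokens, then L2-normalize.

    token_embeddings: (batch, seq_len, dim)
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    pooled = summed / counts
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)

# Hypothetical transformer outputs: 2 sequences, 5 token positions, 384 dimensions.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 384))
mask = np.array([[1, 1, 1, 0, 0],
                 [1, 1, 1, 1, 1]])

sentence_vecs = mean_pool_and_normalize(tokens, mask)
print(sentence_vecs.shape)  # (2, 384)
```

Masking matters here: padded positions must not contribute to the average, otherwise the embedding of a short text would depend on how much padding its batch happened to need.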

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("dabraldeepti25/embedding-model-midterm-submission")
# Run inference
sentences = [
    'Comprehensive Flight Analysis: NYC to PAR\n            \n            Route Overview:\n            \n    Route Characteristics:\n    - Distance and typical duration\n    - Common connection points\n    - Seasonal weather impact\n    - Time zone considerations\n    \n    Market Analysis:\n    - Popular travel periods\n    - Price fluctuation patterns\n    - Competing airlines\n    - Alternative routes\n    \n    Operational Considerations:\n    - Aircraft types commonly used\n    - Typical delays and causes\n    - Seasonal performance metrics\n    - Airport congestion analysis\n    \n            \n            Pricing Information:\n            Base Fare: 30.00\n            Total Price: 124.87 EUR\n            \n            Detailed Flight Information:\n            \n                    Flight Segment Analysis:\n                    Carrier: 6X 172\n                    Equipment: 744\n                    \n                    Departure Details:\n                    Airport: JFK\n                    Terminal: N/A\n                    Time: 2025-03-26T20:00:00\n                    \n                    Arrival Details:\n                    Airport: LHR\n                    Terminal: N/A\n                    Time: 2025-03-27T08:05:00\n                    \n                    Operational Information:\n                    - Aircraft specifications\n                    - Typical on-time performance\n                    - Seasonal reliability metrics\n                    \n                    Flight Segment Analysis:\n                    Carrier: 6X 312\n                    Equipment: 319\n                    \n                    Departure Details:\n                    Airport: LHR\n                    Terminal: 4\n                    Time: 2025-03-27T12:15:00\n                    \n                    Arrival Details:\n                    Airport: CDG\n                    Terminal: 1\n                    Time: 2025-03-27T14:25:00\n                    \n                 
   Operational Information:\n                    - Aircraft specifications\n                    - Typical on-time performance\n                    - Seasonal reliability metrics\n                    \n            \n            Route Market Analysis:\n            - Historical price trends\n            - Peak travel periods\n            - Alternative routing options\n            - Alliance and codeshare details\n            \n            Airport Information:\n            \n    Airport: NYC\n    \n    Terminal Information:\n    - Layout and facilities\n    - Transfer processes\n    - Security procedures\n    - Lounges and services\n    \n    Ground Transportation:\n    - Public transit options\n    - Taxi and ride-share\n    - Car rental facilities\n    - Parking services\n    \n    Amenities:\n    - Dining options\n    - Shopping facilities\n    - Business services\n    - Medical facilities\n    \n            \n    Airport: PAR\n    \n    Terminal Information:\n    - Layout and facilities\n    - Transfer processes\n    - Security procedures\n    - Lounges and services\n    \n    Ground Transportation:\n    - Public transit options\n    - Taxi and ride-share\n    - Car rental facilities\n    - Parking services\n    \n    Amenities:\n    - Dining options\n    - Shopping facilities\n    - Business services\n    - Medical facilities\n    \n            \n            Travel Planning Guidelines:\n            - Optimal booking windows\n            - Fare class benefits\n            - Baggage policies\n            - Transit visa requirements\n            - Connection considerations\n            \n            Additional Services:\n            - Available ancillary services\n            - Lounge access details\n            - Special assistance services\n            - Meal and seat selection options',
    'Hotel Comprehensive Profile: PREMIER INN LONDON EALING\n            \n            Location Analysis:\n            City: LON\n            Precise Location: 51.51339, \n                            -0.30961\n            Country: GB\n            \n            Property Details:\n            Chain: PI\n            Category: Standard Hotel\n            Last Updated: 2023-06-15T10:25:08\n            \n            Neighborhood Overview:\n            \n    Neighborhood Characteristics:\n    - Local atmosphere and vibe\n    - Safety and security assessment\n    - Proximity to business districts\n    - Entertainment and dining options\n    - Cultural attractions nearby\n    - Shopping facilities\n    - Green spaces and recreation\n    \n    Transportation Hub Analysis:\n    - Major transit stations\n    - Bus and tram routes\n    - Taxi availability\n    - Bike-sharing stations\n    \n    Local Life:\n    - Popular local venues\n    - Markets and shopping areas\n    - Cultural institutions\n    - Sports facilities\n    \n            \n            Detailed Amenities:\n            \n    Room Features:\n    - Climate control systems\n    - Entertainment options\n    - Work space configuration\n    - Connectivity solutions\n    \n    Property Facilities:\n    - Dining venues\n    - Meeting spaces\n    - Wellness facilities\n    - Recreation options\n    \n    Business Services:\n    - Conference facilities\n    - Technical support\n    - Business center\n    - Translation services\n    \n    Guest Services:\n    - Concierge assistance\n    - Room service hours\n    - Laundry facilities\n    - Airport transfers\n    \n            \n            Transportation Access:\n            - Distance from major airports\n            - Public transit options\n            - Parking facilities\n            - Local transportation services\n            \n            Guest Services:\n            - Check-in/out policies\n            - Room service availability\n            - Business 
facilities\n            - Wellness options\n            \n            Area Attractions:\n            \n        Historic Sites:\n        - Tower of London (Historic castle and fortress)\n        - Westminster Abbey (Gothic church, royal coronations)\n        - Buckingham Palace (Official residence of British monarch)\n        - St. Paul\'s Cathedral (Anglican cathedral, iconic dome)\n        \n        Cultural Venues:\n        - British Museum (World artifacts and art)\n        - Tate Modern (Modern and contemporary art)\n        - National Gallery (European paintings)\n        \n        Entertainment Districts:\n        - Covent Garden (Shopping, street performers, dining)\n        - Soho (Entertainment, theaters, restaurants)\n        \n            \n            Additional Information:\n            - Seasonal considerations\n            - Business travel amenities\n            - Family-friendly features\n            - Accessibility information\n            \n            \n            City Overview:\n            thumb|260px|Historical Routemaster double-decker bus outside St Paul\'s cathedral\nNoisy, vibrant and truly multicultural, \'\'\' London\'\'\' is a megalopolis of people, ideas and frenetic energy. The capital and largest city of the United Kingdom sits on the River Thames in South-East England. \'\'\'Greater London\'\'\' has a population of a little over 9 million. Considered one of the world\'s leading "global cities", London remains an international capital of culture, music, education, fashion, politics, finance and trade. For the visitor, there is a seemingly endless choice of historical sites, shopping, museums, food, art galleries, nightlife, and activities.  # Truncate to keep focus on hotel\n            \n            Local Transportation:\n            thumb|260px|Historical Routemaster double-decker bus outside St Paul\'s cathedral',
    'Hotel Comprehensive Profile: HOTEL GRACERY GINZA\n            \n            Location Analysis:\n            City: TYO\n            Precise Location: 35.66905, \n                            139.76364\n            Country: JP\n            \n            Property Details:\n            Chain: FG\n            Category: Standard Hotel\n            Last Updated: 2023-06-15T09:58:08\n            \n            Neighborhood Overview:\n            \n    Neighborhood Characteristics:\n    - Local atmosphere and vibe\n    - Safety and security assessment\n    - Proximity to business districts\n    - Entertainment and dining options\n    - Cultural attractions nearby\n    - Shopping facilities\n    - Green spaces and recreation\n    \n    Transportation Hub Analysis:\n    - Major transit stations\n    - Bus and tram routes\n    - Taxi availability\n    - Bike-sharing stations\n    \n    Local Life:\n    - Popular local venues\n    - Markets and shopping areas\n    - Cultural institutions\n    - Sports facilities\n    \n            \n            Detailed Amenities:\n            \n    Room Features:\n    - Climate control systems\n    - Entertainment options\n    - Work space configuration\n    - Connectivity solutions\n    \n    Property Facilities:\n    - Dining venues\n    - Meeting spaces\n    - Wellness facilities\n    - Recreation options\n    \n    Business Services:\n    - Conference facilities\n    - Technical support\n    - Business center\n    - Translation services\n    \n    Guest Services:\n    - Concierge assistance\n    - Room service hours\n    - Laundry facilities\n    - Airport transfers\n    \n            \n            Transportation Access:\n            - Distance from major airports\n            - Public transit options\n            - Parking facilities\n            - Local transportation services\n            \n            Guest Services:\n            - Check-in/out policies\n            - Room service availability\n            - Business facilities\n    
        - Wellness options\n            \n            Area Attractions:\n            Information about local attractions and points of interest.\n            \n            Additional Information:\n            - Seasonal considerations\n            - Business travel amenities\n            - Family-friendly features\n            - Accessibility information\n            \n            \n            City Overview:\n            thumb|240px|The bulk of the Tokyo Metropolitan Government Building, [[Tokyo/Shinjuku|Shinjuku]]\n:\'\'Tokyo can be broadly divided into the "23 special wards", "Tama region" and "Islands". This article is about the 23 special wards of Tokyo, which corresponds to what many think of as the "city of Tokyo". For information on Tokyo as a prefecture, Tama region and Islands, see Tokyo Metropolis.\'\'\n\'\'\'Tokyo\'\'\' ( \'\'Tōkyō\'\') is the enormous and wealthy capital of Japan, and its main city, overflowing with culture, commerce, and most of all, people. As the most populated urban area in the world, Tokyo is a fascinating and dynamic metropolis that mixes foreign influences, consumer culture and global business along with remnants of the capital of old Japan. From modern electronics and gleaming skyscrapers to cherry blossoms and the Imperial Palace, this city represents the entire sweep of Japanese history and culture. Tokyo truly has something for every traveller.  # Truncate to keep focus on hotel\n            \n            Local Transportation:\n            thumb|240px|The bulk of the Tokyo Metropolitan Government Building, [[Tokyo/Shinjuku|Shinjuku]]\n:\'\'Tokyo can be broadly divided into the "23 special wards", "Tama region" and "Islands". This article is about the 23 special wards of Tokyo, which corresponds to what many think of as the "city of Tokyo". 
For information on Tokyo as a prefecture, Tama region and Islands, see Tokyo Metropolis.\'\'\n\'\'\'Tokyo\'\'\' ( \'\'Tōkyō\'\') is the enormous and wealthy capital of Japan, and its main city, overflowing with\n            \n            Nearby Attractions:',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
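
Since the model was trained with MatryoshkaLoss at dimensions 384, 256, 128, and 64 (see Training Details), embeddings can in principle be shortened to one of those sizes. The sketch below shows the idea with NumPy on placeholder vectors: keep the leading dimensions, renormalize, and compare. The helper name and the random data are assumptions for illustration.

```python
import numpy as np

def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row and rescale to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

# Placeholder for the (3, 384) array returned by model.encode (assumption).
rng = np.random.default_rng(0)
full = rng.normal(size=(3, 384))

for dim in (384, 256, 128, 64):
    small = truncate_and_normalize(full, dim)
    sims = small @ small.T  # cosine similarity, because rows are unit length
    print(dim, small.shape, sims.shape)
```

Recent Sentence Transformers releases also accept a truncate_dim argument when constructing SentenceTransformer, which applies the truncation inside encode. Given the evaluation scores reported below, check retrieval quality at each size before relying on shorter embeddings.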

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine 0.0341
spearman_cosine 0.0194
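
For readers unfamiliar with these two metrics, here is a small sketch of how Pearson and Spearman correlations between predicted cosine scores and gold labels can be computed. The scores and labels are hypothetical values chosen for illustration; they are not drawn from this model's evaluation set.

```python
import numpy as np

def pearson(x, y) -> float:
    """Pearson correlation: linear agreement between two score lists."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def ranks(x) -> np.ndarray:
    """1-based ranks; tied values share the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    r = np.empty(len(x), dtype=float)
    r[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tie = x == value
        r[tie] = r[tie].mean()
    return r

def spearman(x, y) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical cosine scores and gold labels (not from the model's data).
predicted = [0.10, 0.40, 0.35, 0.80]
gold = [0.0, 0.1, 0.2, 0.3]

print(pearson(predicted, gold))
print(spearman(predicted, gold))  # 0.8, up to floating-point rounding
```

Both metrics range from -1 to 1, and values near 0 indicate little relationship between the predicted similarities and the labels.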

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,677 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (string): min 85 tokens, mean 229.9 tokens, max 256 tokens
    • sentence_1 (string): min 85 tokens, mean 229.06 tokens, max 256 tokens
    • label (float): min 0.0, mean 0.15, max 0.3
  • Samples:
    sentence_0 sentence_1 label
    '''New York''' (known as "The Big Apple", "NYC," and often called "New York City") is a global center for media, entertainment, art, fashion, research, finance, and trade. The bustling, cosmopolitan heart of the 4th largest metropolis in the world and by far the most populous city in the United States, New York has long been a key entry point and a defining city for the nation.
    From the Statue of Liberty in the harbor to the Empire State Building towering over the Manhattan skyline, from the tunnels of the subway to the riches of Wall Street, from the bright signs of Times Square to the naturalistic beauty of Central Park, and from Yankee Stadium in the Bronx to Coney Island in Brooklyn, New York's landmarks are quintessential American landmarks. The city's neighborhoods and streets are so iconic they have become ingrained into the American consciousness. Here the power, wealth and culture of the United States is on full display in one of the largest and most iconic skylines in the wor...
    City Overview:
    thumb
    300px
    Destination Guide: New York City

    Overview and Cultural Context:
    '''New York''' (known as "The Big Apple", "NYC," and often called "New York City") is a global center for media, entertainment, art, fashion, research, finance, and trade. The bustling, cosmopolitan heart of the 4th largest metropolis in the world and by far the most populous city in the United States, New York has long been a key entry point and a defining city for the nation.
    From the Statue of Liberty in the harbor to the Empire State Building towering over the Manhattan skyline, from the tunnels of the subway to the riches of Wall Street, from the bright signs of Times Square to the naturalistic beauty of Central Park, and from Yankee Stadium in the Bronx to Coney Island in Brooklyn, New York's landmarks are quintessential American landmarks. The city's neighborhoods and streets are so iconic they have become ingrained into the American consciousness. Here the power, wealth and culture of the U...
    Hotel Comprehensive Profile: POGGIO REGILLO HOTEL-FRASCATI

    Location Analysis:
    City: ROM
    Precise Location: 41.82707,
    12.68755
    Country: IT

    Property Details:
    Chain: HA
    Category: Standard Hotel
    Last Updated: 2023-06-15T10:10:11

    Neighborhood Overview:

    Neighborhood Characteristics:
    - Local atmosphere and vibe
    - Safety and security assessment
    - Proximity to business districts
    - Entertainment and dining options
    - Cultural attractions nearby
    - Shopping facilities
    - Green spaces and recreation

    Transportation Hub Analysis:
    - Major transit stations
    - Bus and tram routes
    - Taxi availability
    - Bike-sharing stations

    Local Life:
    - Popular local venues
    - Markets and shopping areas
    - Cultural institutions
    - Sports facilities

    ...
    0.008242591836162516
    Transportation and Getting Around:
    '''Rome''' (Italian and Latin: ''Roma''), the 'Eternal City', is the capital and largest city of Italy and of the Lazio region. It's the famed city of the Roman Empire, the Seven Hills, ''La Dolce Vita'', the Vatican City and ''Three Coins in the Fountain''. Rome, as a millennia-long centre of power, culture and religion, was the centre of one of the greatest civilisations ever, and has exerted a huge influence over the world in its circa 2500 years of existence.
    thumb
    300x300px The Colosseum
    The historic centre of the city is a UNESCO World Heritage Site. With wonderful palaces, thousand-year-old churches and basilicas, grand romantic ruins, opulent monuments, ornate statues and graceful fountains, Rome has an immensely rich historical heritage and cosmopolitan atmosphere, making it one of Europe's and the world's most visited, famous, influential and beautiful capitals. Today, Rome has a growing nightlife scene and is also seen as a shopping...
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            384,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
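
To make the loss configuration above more concrete, the following NumPy sketch combines an in-batch-negatives ranking loss with Matryoshka-style truncation. It is a simplified illustration rather than the library's code: the function names, the random inputs, and the scale of 20 are assumptions. The sketch uses only the two sentence columns, since this kind of ranking loss treats each pair as a positive and the other items in the batch as negatives.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Cross-entropy over scaled cosine scores; row i's target is column i."""
    scores = scale * (l2_normalize(anchors) @ l2_normalize(positives).T)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def matryoshka_loss(anchors, positives,
                    dims=(384, 256, 128, 64), weights=(1, 1, 1, 1)) -> float:
    """Weighted sum of the base loss on the first `d` dimensions, for each d."""
    return sum(w * mnr_loss(anchors[:, :d], positives[:, :d])
               for d, w in zip(dims, weights))

# Hypothetical batch of 16 pairs: each positive is a noisy copy of its anchor.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(16, 384))
positives = anchors + 0.1 * rng.normal(size=(16, 384))

print(matryoshka_loss(anchors, positives))                   # small: pairs line up
print(matryoshka_loss(anchors, np.roll(positives, 1, axis=0)))  # large: pairs misaligned
```

Applying the same loss at every prefix length pushes the most useful information into the leading dimensions, which is what allows truncated embeddings to remain usable.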
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
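
As an illustration of how learning_rate: 5e-05, lr_scheduler_type: linear, and warmup_steps: 0 interact, here is a sketch of a linear decay schedule. The function is hypothetical and the total of 315 steps is taken from the training logs; the trainer's own scheduler may differ in detail.

```python
def linear_lr(step: int, base_lr: float = 5e-05,
              total_steps: int = 315, warmup_steps: int = 0) -> float:
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Learning rate at the start and at the end of each epoch (105 steps per epoch).
for step in (0, 105, 210, 315):
    print(step, linear_lr(step))
```

With no warmup, the learning rate starts at its peak on the first step and falls to zero by the final step.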

Training Logs

Epoch Step spearman_cosine
1.0 105 -0.0042
2.0 210 0.0080
3.0 315 0.0194
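
The step counts in this table follow from the dataset size and batch size reported above. A short check, assuming a single device, gradient_accumulation_steps of 1, and dataloader_drop_last set to False as listed in the hyperparameters:

```python
import math

train_samples = 1677  # from the Training Dataset section
batch_size = 16       # per_device_train_batch_size
epochs = 3            # num_train_epochs

# The last, partial batch still counts as a step because drop_last is False.
steps_per_epoch = math.ceil(train_samples / batch_size)
print(steps_per_epoch)                                       # 105
print([steps_per_epoch * e for e in range(1, epochs + 1)])   # [105, 210, 315]
```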

Framework Versions

  • Python: 3.13.1
  • Sentence Transformers: 3.4.1
  • Transformers: 4.49.0
  • PyTorch: 2.6.0+cpu
  • Accelerate: 0.26.0
  • Datasets: 3.3.1
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size (Safetensors): 22.7M params, tensor type F32