Update safetensors keys; update README usage

by tomaarsen HF staff - opened


Pull Request overview

  • Update the adapter_model.safetensors keys such that loading can be done more conveniently on an AutoModel.
  • Update the base model
  • Update the README with the new (simplified) inference via Sentence Transformers and Transformers.


I customized some code in my local peft installation to update the keys of the loaded adapter, which let me save it with the new keys (e.g. base_model.model.embeddings... instead of base_model.model.model.embeddings...). Apart from the key names, the adapter is identical.

Now you can apply PeftModel.from_pretrained directly to an AutoModel, rather than only to a torch.nn.Module with a model attribute pointing to the AutoModel.
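For illustration, the key renaming can be sketched as a plain prefix rewrite over the adapter's state dict. This is not the code used for this PR (that was a patch inside a local peft installation). The function name `rename_adapter_keys` and the example key names are hypothetical, and the values are placeholders rather than real tensors.

```python
def rename_adapter_keys(
    state_dict,
    old_prefix="base_model.model.model.",
    new_prefix="base_model.model.",
):
    """Return a copy of state_dict where old_prefix is replaced by new_prefix.

    Keys that do not start with old_prefix are copied unchanged.
    """
    renamed = {}
    for key, value in state_dict.items():
        if key.startswith(old_prefix):
            key = new_prefix + key[len(old_prefix):]
        renamed[key] = value
    return renamed


# Toy example: hypothetical key names, integers standing in for tensors.
example = {
    "base_model.model.model.embeddings.word_embeddings.weight": 0,
    "base_model.model.model.encoder.layer.0.attention.attn.q.lora_A.weight": 1,
}
renamed_example = rename_adapter_keys(example)
for key in sorted(renamed_example):
    print(key)
```

With a real adapter, the dict could presumably be read and written with `load_file` and `save_file` from `safetensors.torch`, applying the same function in between.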

Beyond that, the model now works with Sentence Transformers if you find that convenient:

from sentence_transformers import SentenceTransformer
from peft import PeftModel

model = SentenceTransformer("all-mpnet-base-v2")
model[0].auto_model = PeftModel.from_pretrained(
    model[0].auto_model, "vahidthegreat/StanceAware-SBERT", revision="refs/pr/2"
)

sentences = [
    "I love pineapple on pizza",
    "I hate pineapple on pizza",
    "I like pineapple on pizza",
]
embeddings = model.encode(sentences)

similarity = model.similarity(embeddings, embeddings)
# tensor([[1.0000, 0.5732, 0.9713],
#         [0.5732, 1.0000, 0.5804],
#         [0.9713, 0.5804, 1.0000]])
# I.e., the first and third sentences are very similar; the second sentence is less similar to the other two

Note! This script uses revision to point to this PR. In other words, you can run this script right now, before merging, to verify that the performance is identical. Here is the full Transformers-based script from the README, also with the revision argument:

from peft import PeftModel
from transformers import AutoModel, AutoTokenizer
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseNetworkMPNet(nn.Module):
    def __init__(self, model_name, tokenizer, normalize=True):
        super(SiameseNetworkMPNet, self).__init__()

        self.model = AutoModel.from_pretrained(model_name)
        self.normalize = normalize
        self.tokenizer = tokenizer

    def apply_lora_weights(self, lora_model):
        self.model = PeftModel.from_pretrained(self.model, lora_model, revision="refs/pr/2")
        self.model = self.model.merge_and_unload()
        return self

    def forward(self, **inputs):
        model_output = self.model(**inputs)
        attention_mask = inputs['attention_mask']
        last_hidden_states = model_output.last_hidden_state  # First element of model_output contains all token embeddings
        embeddings = torch.sum(last_hidden_states * attention_mask.unsqueeze(-1), 1) / torch.clamp(attention_mask.sum(1, keepdim=True), min=1e-9) # mean_pooling
        if self.normalize:
            embeddings = F.layer_norm(embeddings, embeddings.shape[1:])
            embeddings = F.normalize(embeddings, p=2, dim=1)

        return embeddings

base_model_name = "sentence-transformers/all-mpnet-base-v2" 
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load the base model
base_model = SiameseNetworkMPNet(model_name=base_model_name, tokenizer=tokenizer)

# Load and apply LoRA weights
lora_model = SiameseNetworkMPNet(model_name=base_model_name, tokenizer=tokenizer)
lora_model = lora_model.apply_lora_weights("vahidthegreat/StanceAware-SBERT")

from sklearn.metrics.pairwise import cosine_similarity

def two_sentence_similarity(model, tokenizer, text1, text2):
    # Tokenize both texts
    tokens1 = tokenizer(text1, return_tensors="pt", max_length=128, truncation=True, padding="max_length")
    tokens2 = tokenizer(text2, return_tensors="pt", max_length=128, truncation=True, padding="max_length")
    # Generate embeddings
    embeddings1 = model(**tokens1).detach().cpu().numpy()
    embeddings2 = model(**tokens2).detach().cpu().numpy()
    # Compute cosine similarity
    similarity = cosine_similarity(embeddings1, embeddings2)
    print(f"Cosine Similarity: {similarity[0][0]}")
    return similarity[0][0]

# Example sentences
text1 = "I love pineapple on pizza"
text2 = "I hate pineapple on pizza"

print(f"For Base Model sentences: '{text1}' and '{text2}'")
two_sentence_similarity(base_model, tokenizer, text1, text2)
print(f"\nFor FineTuned Model sentences: '{text1}' and '{text2}'")
two_sentence_similarity(lora_model, tokenizer, text1, text2)


# Example sentences
text1 = "I love pineapple on pizza"
text2 = "I like pineapple on pizza"

print(f"For Base Model sentences: '{text1}' and '{text2}'")
two_sentence_similarity(base_model, tokenizer, text1, text2)
print(f"\n\nFor FineTuned Model sentences: '{text1}' and '{text2}'")
two_sentence_similarity(lora_model, tokenizer, text1, text2)
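To verify that the re-keyed adapter behaves the same as the original one, one option is to embed the same sentences with both versions and compare the results element-wise. The sketch below is a minimal, dependency-free version of such a check. The helper names `max_abs_difference` and `embeddings_match` and the tolerance of 1e-5 are assumptions for illustration, and the numbers are made up rather than real model outputs.

```python
def max_abs_difference(a, b):
    """Largest element-wise absolute difference between two 2D lists of floats."""
    if len(a) != len(b) or any(len(row_a) != len(row_b) for row_a, row_b in zip(a, b)):
        raise ValueError("Embedding matrices must have the same shape")
    return max(
        (abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)),
        default=0.0,
    )


def embeddings_match(a, b, atol=1e-5):
    """True if every element of a is within atol of the matching element of b."""
    return max_abs_difference(a, b) <= atol


# Made-up values standing in for embeddings from the old and new adapter files.
old_embeddings = [[0.10, 0.20, 0.30], [0.40, 0.50, 0.60]]
new_embeddings = [[0.10, 0.20, 0.30], [0.40, 0.50, 0.60]]
print(embeddings_match(old_embeddings, new_embeddings))
```

With real outputs, the nested lists could come from calling `.tolist()` on the embeddings of each model.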

This should solve #1 and simplify the usage a bit. Let me know if you have any concerns or questions!

cc @vahidthegreat

  • Tom Aarsen
tomaarsen changed pull request status to open
vahidthegreat changed pull request status to merged


Hmm, I think this banner appears because of the PEFT snippet that the Hub tries to generate here: https://huggingface.co/vahidthegreat/StanceAware-SBERT?library=peft

I'm not sure what the valid options are. A simple fix is to remove library_name: peft, after which the banner should be gone.

  • Tom Aarsen

I removed the library_name: peft for now.
Thanks a lot for all the edits. Amazing help!!!
I'm new to this environment, so I'm still figuring things out, and your input is really insightful to me.
