---
language: en
tags:
- bert
- regression
- biencoder
- similarity
pipeline_tag: text-similarity
---

# BiEncoder Regression Model

This model uses a BiEncoder architecture and outputs a similarity score for each pair of input texts.

## Model Details

- Base Model: bert-base-uncased
- Task: Regression
- Architecture: BiEncoder with cosine similarity
- Loss Function: cosine_embedding

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModel

# `modeling` must be importable (e.g. a local modeling.py that defines the class)
from modeling import BiEncoderModelRegression

# Load model components
tokenizer = AutoTokenizer.from_pretrained("minoosh/bert-reg-biencoder-cosine_embedding")
base_model = AutoModel.from_pretrained("bert-base-uncased")
model = BiEncoderModelRegression(base_model, loss_fn="cosine_embedding")

# Load the fine-tuned weights and switch to inference mode
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# Prepare inputs
texts1 = ["first text"]
texts2 = ["second text"]
inputs = tokenizer(
    texts1,
    texts2,
    padding=True,
    truncation=True,
    return_tensors="pt"
)

# Get similarity scores (no gradients needed at inference time)
with torch.no_grad():
    outputs = model(**inputs)
similarity_scores = outputs["logits"]
```

## Metrics

The model was trained with the cosine_embedding loss and evaluated using:

- Mean Squared Error (MSE)
- Mean Absolute Error (MAE)
- Pearson Correlation
- Spearman Correlation
- Cosine Similarity