README: LoRA Fine-Tuned BERT for Body Parts NER
Model Overview
This repository contains a LoRA fine-tuned BERT model for Named Entity Recognition (NER) that identifies body parts in text. The fine-tuning is shipped as a LoRA adapter applied on top of bert-base-uncased, which keeps the model lightweight to store and deploy.
Features
- Base Model: BERT (bert-base-uncased)
- LoRA Fine-Tuning: Applied for token classification to recognize body parts.
- Custom Inference Method: Predicts entities with labels, start-end positions, and confidence scores.
- Lightweight Adapter: Uses LoRA adapters to reduce memory footprint.
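To make the memory claim concrete, here is a rough parameter count for one weight matrix. It is a sketch under stated assumptions: LoRA is applied to a single 768 x 768 projection (768 is the hidden size of bert-base-uncased) and the rank is 8. The rank is a hypothetical value chosen for illustration; the rank actually used by this adapter is recorded in adapter_config.json.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Number of weights in a dense d_out x d_in matrix."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Number of weights in the two LoRA factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


hidden = 768  # hidden size of bert-base-uncased
rank = 8      # ASSUMPTION: illustrative rank, not read from this repository

full = full_params(hidden, hidden)
lora = lora_params(hidden, hidden, rank)
print(f"full matrix: {full} weights")
print(f"LoRA update: {lora} weights ({lora / full:.2%} of the full matrix)")
```

Under these assumptions the adapter stores 12,288 weights for a matrix that has 589,824, which is why only the adapter needs to be distributed alongside the public base model.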
Model Usage
This model can be used for Named Entity Recognition (NER) to identify body parts in text sentences.
Installation
Clone the repository:
git clone https://huggingface.co/gsri-18/lora_finetuned_bert_on_body_parts_ner_dataset_synthetic
cd lora_finetuned_bert_on_body_parts_ner_dataset_synthetic
Install the required dependencies:
pip install transformers peft torch prettytable
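After installing, a quick check like the one below can confirm that the four packages are visible to the current Python environment. This is a small sketch that uses only the standard library; installed_versions is a hypothetical helper name, not something provided by this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# The packages named in the pip command above.
REQUIRED = ["transformers", "peft", "torch", "prettytable"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(REQUIRED)
for name, ver in report.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```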
Example Usage
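The script below loads the base model and the LoRA adapter, runs inference on three test sentences, and prints the detected entities as a table: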
import torch
from transformers import BertTokenizerFast, BertForTokenClassification
from peft import PeftModel
from prettytable import PrettyTable # For tabular output
# Device configuration
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Hugging Face model repository name
base_model_name = "bert-base-uncased"
lora_model_name = "gsri-18/lora_finetuned_bert_on_body_parts_ner_dataset_synthetic"
# Load the tokenizer
tokenizer = BertTokenizerFast.from_pretrained(base_model_name)
# Load the base model
base_model = BertForTokenClassification.from_pretrained(base_model_name, num_labels=3).to(device)
# Load the LoRA fine-tuned model
model = PeftModel.from_pretrained(base_model, lora_model_name).to(device)
# Set the model to evaluation mode
model.eval()
# Function to predict entities with confidence scores
def predict_entities(sentence, label_mapping):
    # Tokenize the sentence and keep each token's character offsets
    encoded = tokenizer(sentence, return_tensors="pt", truncation=True, padding=True,
                        return_offsets_mapping=True)
    # The model does not accept offsets, so remove them before the forward pass
    offsets = encoded.pop("offset_mapping")[0].tolist()
    inputs = encoded.to(device)
    with torch.no_grad():
        # Get model outputs
        outputs = model(**inputs)
    # Process logits
    logits = outputs.logits
    probabilities = torch.nn.functional.softmax(logits, dim=-1)
    predictions = torch.argmax(logits, dim=-1)[0].tolist()
    confidence_scores = torch.max(probabilities, dim=-1).values[0].tolist()
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    # Extract entities
    entities = []
    for token, prediction, confidence, (start, end) in zip(tokens, predictions, confidence_scores, offsets):
        if start == end:  # Skip special tokens such as [CLS] and [SEP]
            continue
        if token.startswith("##"):  # Skip subword tokens
            continue
        label = label_mapping[prediction]
        if label != "O":  # Skip tokens labeled as "O"
            # Offsets come from the tokenizer, so repeated or capitalized words
            # are located correctly (sentence.find() would return the first match or -1)
            entities.append({
                "token": token,
                "label": label,
                "start": start,
                "end": end - 1,  # Inclusive end position
                "confidence": round(float(confidence), 8)  # Round to 8 decimal places
            })
    return entities
# Process and print predictions for multiple sentences
def process_and_print_predictions(sentences, label_mapping):
    for i, sentence in enumerate(sentences, 1):
        print(f"\nSentence {i}: {sentence}")
        entities = predict_entities(sentence, label_mapping)
        if not entities:
            print("No body parts were detected in the sentence.")
        else:
            # Create a table for better formatting
            table = PrettyTable()
            table.field_names = ["Token", "Label", "Start", "End", "Confidence"]
            for entity in entities:
                table.add_row([entity['token'], entity['label'], entity['start'], entity['end'], f"{entity['confidence']:.8f}"])
            print(table)
# Test sentences
test_sentences = [
    "The arm connects to the shoulder and elbow.",
    "He felt pain in his knee and spinal cord after running.",
    "The Named Entity Recognition model is working well."
]
label_mapping = {0: "O", 1: "B-BODY", 2: "I-BODY"}
# Display results for all sentences
process_and_print_predictions(test_sentences, label_mapping)
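The script reports each tagged token as its own row, so a multi-word body part such as "spinal cord" comes back as a B-BODY token followed by an I-BODY token. An application will often want one span per entity instead. The following is a minimal sketch of such a post-processing step. merge_bio_entities is a hypothetical helper, not part of this repository; it assumes entity dictionaries shaped like those returned by predict_entities (inclusive end positions, sorted by position), and it takes the lowest token confidence as the span confidence, which is one possible choice among several. The sample entities reuse the values listed in the Output section.

```python
def merge_bio_entities(entities, sentence):
    """Merge each B- token with the I- tokens that directly follow it."""
    merged = []
    for ent in entities:
        prefix, _, kind = ent["label"].partition("-")
        previous = merged[-1] if merged else None
        continues_previous = (
            prefix == "I"
            and previous is not None
            and previous["label"] == kind
            # Only whitespace may separate the two tokens in the sentence.
            and sentence[previous["end"] + 1:ent["start"]].strip() == ""
        )
        if continues_previous:
            previous["end"] = ent["end"]
            previous["confidence"] = min(previous["confidence"], ent["confidence"])
        else:
            # A B- token, or a stray I- token with nothing to attach to, starts a new span.
            merged.append({
                "label": kind,
                "start": ent["start"],
                "end": ent["end"],
                "confidence": ent["confidence"],
            })
    for span in merged:
        span["text"] = sentence[span["start"]:span["end"] + 1]  # end is inclusive
    return merged


sentence = "He felt pain in his knee and spinal cord after running."
entities = [
    {"token": "knee", "label": "B-BODY", "start": 20, "end": 23, "confidence": 0.99953437},
    {"token": "spinal", "label": "B-BODY", "start": 29, "end": 34, "confidence": 0.99941099},
    {"token": "cord", "label": "I-BODY", "start": 36, "end": 39, "confidence": 0.99498111},
]
spans = merge_bio_entities(entities, sentence)
for span in spans:
    print(span["text"], span["label"], span["start"], span["end"], span["confidence"])
```

For this sentence the helper yields two spans, "knee" (20 to 23) and "spinal cord" (29 to 39).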
Output
For the input sentences:
"The arm connects to the shoulder and elbow."
+----------+--------+-------+-----+------------+
|  Token   | Label  | Start | End | Confidence |
+----------+--------+-------+-----+------------+
|   arm    | B-BODY |   4   |  6  | 0.99613851 |
| shoulder | B-BODY |   24  |  31 | 0.99928027 |
|  elbow   | B-BODY |   37  |  41 | 0.99879104 |
+----------+--------+-------+-----+------------+
"He felt pain in his knee and spinal cord after running."
+--------+--------+-------+-----+------------+
| Token  | Label  | Start | End | Confidence |
+--------+--------+-------+-----+------------+
|  knee  | B-BODY |   20  |  23 | 0.99953437 |
| spinal | B-BODY |   29  |  34 | 0.99941099 |
|  cord  | I-BODY |   36  |  39 | 0.99498111 |
+--------+--------+-------+-----+------------+
"The Named Entity Recognition model is working well."
No body parts were detected in the sentence.
Files in the Repository
- adapter_config.json: Configuration for the LoRA adapter.
- adapter_model.safetensors: The fine-tuned LoRA adapter weights.
- tokenizer.json, vocab.txt, tokenizer_config.json: Tokenizer files.
- model-bert-synthetic-bpr-lora-large.pth: The base model weights.
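To see how the adapter was configured, adapter_config.json can be read with the standard library. The sketch below assumes the file follows the usual PEFT LoRA layout, with keys such as r, lora_alpha, and target_modules; summarize_adapter_config is a hypothetical helper, and any key that is absent from the file is reported as None rather than guessed.

```python
import json
from pathlib import Path

# Keys that PEFT normally writes for a LoRA adapter (assumed to be present here).
LORA_KEYS = (
    "peft_type",
    "task_type",
    "base_model_name_or_path",
    "r",
    "lora_alpha",
    "lora_dropout",
    "target_modules",
)


def summarize_adapter_config(path):
    """Return the main LoRA settings from an adapter_config.json file."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    # Missing keys map to None instead of raising KeyError.
    return {key: config.get(key) for key in LORA_KEYS}


# Assumes the current directory is the cloned repository.
config_path = Path("adapter_config.json")
if config_path.exists():
    for key, value in summarize_adapter_config(config_path).items():
        print(f"{key}: {value}")
else:
    print("adapter_config.json not found; run this from the cloned repository.")
```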
License
No license is specified for this repository (no-license).
For issues or questions, please create an issue in the repository or reach out to the contributors.