Model Card: Resume Classification Using BERT
Model Overview
This model is a fine-tuned version of bert-base-uncased designed for multiclass classification. It categorizes resumes into one of 24 predefined job categories, making it suitable for automated resume screening and classification tasks.
Dataset
The dataset used for fine-tuning consists of 2400+ resumes in string and PDF formats. These resumes are categorized into 24 job categories. The dataset is available at https://www.kaggle.com/competitions/jarvis-calling-hiring-contest/data
- Classes:
['ACCOUNTANT', 'ADVOCATE', 'AGRICULTURE', 'APPAREL', 'ARTS', 'AUTOMOBILE', 'AVIATION', 'BANKING', 'BPO', 'BUSINESS-DEVELOPMENT', 'CHEF', 'CONSTRUCTION', 'CONSULTANT', 'DESIGNER', 'DIGITAL-MEDIA', 'ENGINEERING', 'FINANCE', 'FITNESS', 'HEALTHCARE', 'HR', 'INFORMATION-TECHNOLOGY', 'PUBLIC-RELATIONS', 'SALES', 'TEACHER']
The dataset underwent significant preprocessing to remove noise and improve text quality for tokenization.
Preprocessing steps include:
- Removal of HTML tags, URLs, punctuation, unicode characters, escape sequences, stop words, and irrelevant white spaces.
- All of the preprocessing functions are available in preprocessing.py.
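The cleaning steps listed above can be sketched as follows. This is a minimal illustration using only the standard library, not the contents of preprocessing.py: the function name clean_resume_text, the regular expressions, and the tiny stop-word list are all assumptions made for the example.

```python
import re
import string
import unicodedata

# Hypothetical, deliberately small stop-word list; the original pipeline
# may rely on a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "at", "to", "for", "with", "is", "are"}

TAG_RE = re.compile(r"<[^>]+>")                 # HTML tags
URL_RE = re.compile(r"(?:https?://|www\.)\S+")  # URLs
ESCAPE_RE = re.compile(r"\\[ntr]")              # literal "\n", "\t", "\r" left in scraped text
PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def clean_resume_text(text: str) -> str:
    """Apply the listed cleaning steps to a single resume string."""
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    # Fold accented characters to ASCII and drop anything that cannot be represented.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = ESCAPE_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)          # each punctuation mark becomes a space
    tokens = [tok for tok in text.lower().split() if tok not in STOP_WORDS]
    return " ".join(tokens)                     # split/join also collapses extra whitespace
```

Lowercasing is included here because the base model is uncased; the order of the steps matters, since URLs have to be removed before punctuation is stripped.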
Model Configuration
Base Model: bert-base-uncased
Fine-tuning Task: Multiclass classification (24 classes)
Preprocessing Summary: The preprocessing steps applied to the training data are encapsulated in preprocess_function to simplify and standardize usage.
Model Output: The raw output consists of logits for each class. To obtain probabilities, apply the sigmoid activation function using torch.nn.Sigmoid().
Postprocessing: A postprocessing utility, postprocess_function, converts the raw logits into the corresponding class names in text form for easier interpretation.
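As an illustration of the postprocessing step, the sketch below maps one vector of logits to a category name. It is not the card's postprocess_function: the helper name logits_to_label is hypothetical, and the code assumes that output index i corresponds to position i in the class list given in the Dataset section.

```python
from typing import Sequence

# Category names as listed in the Dataset section.
# Assumption: the model's output index i matches position i in this list.
CLASSES = [
    "ACCOUNTANT", "ADVOCATE", "AGRICULTURE", "APPAREL", "ARTS", "AUTOMOBILE",
    "AVIATION", "BANKING", "BPO", "BUSINESS-DEVELOPMENT", "CHEF", "CONSTRUCTION",
    "CONSULTANT", "DESIGNER", "DIGITAL-MEDIA", "ENGINEERING", "FINANCE", "FITNESS",
    "HEALTHCARE", "HR", "INFORMATION-TECHNOLOGY", "PUBLIC-RELATIONS", "SALES", "TEACHER",
]


def logits_to_label(logits: Sequence[float]) -> str:
    """Return the category name whose logit is largest."""
    if len(logits) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} logits, got {len(logits)}")
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return CLASSES[best_index]
```

With a torch tensor of shape (24,), calling logits.tolist() first yields a plain list that this helper accepts.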
Training Details
The fine-tuning process involved:
- Input tokenization using the bert-base-uncased tokenizer.
- Feeding preprocessed text into the BERT model for contextual understanding.
- Output logits normalized using the sigmoid activation function to produce probabilities for each class.
- The entire training code is available on Kaggle: https://www.kaggle.com/code/naandhu/bert-base-uncased-fine-tuned-for-classification
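The same tokenization and forward pass apply at inference time. The sketch below is one possible way to run them with the Hugging Face transformers library. It assumes the checkpoint loads through AutoModelForSequenceClassification under the repository id Naandhu/bert-resume-classifier, and that the input text has already been preprocessed; the function name classify_resume and the 512-token truncation setting are choices made for this example, not details taken from the training code.

```python
def classify_resume(text: str, model_id: str = "Naandhu/bert-resume-classifier") -> int:
    """Return the index of the highest-scoring class for one preprocessed resume."""
    # Imported inside the function so the sketch can be loaded without
    # torch/transformers installed; both are required to actually call it.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    # Assumption: the checkpoint is loadable with this auto class.
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # BERT accepts at most 512 tokens, so longer resumes are truncated.
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, number of classes)
    return int(logits.argmax(dim=-1).item())
```

Loading the tokenizer and model on every call is wasteful; for batch use, load both once and reuse them.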
Model Output
The model provides raw output logits for each job category. These logits can be converted into probabilities using:
import torch.nn as nn
sigmoid = nn.Sigmoid()
probs = sigmoid(logits)
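The highest probability corresponds to the predicted job category. Sigmoid scores each class independently, so the values need not sum to 1; because sigmoid is monotonic, the top-scoring class is also the class with the largest logit.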
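To inspect more than the single best category, the sigmoid scores can be ranked. The following is a small framework-free sketch; the helper names sigmoid and top_k and the three-label example are illustrative assumptions, not part of the released utilities.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large negative inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def top_k(logits: Sequence[float], labels: Sequence[str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k labels with the highest sigmoid scores, best first."""
    scored = [(label, sigmoid(z)) for label, z in zip(labels, logits)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy example with three hypothetical logits.
ranking = top_k([2.0, -1.0, 0.0], ["HR", "SALES", "TEACHER"], k=2)
```

Here ranking lists HR first and TEACHER second, and the three scores together exceed 1, which shows that sigmoid outputs are per-class scores and not a distribution over classes.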
Use Cases
- Automated resume classification for HR platforms.
- Sorting resumes into industry-specific categories for targeted hiring processes.
- Candidate profiling and analysis for recruitment agencies.
Limitations
- Model performance depends on the quality and diversity of the dataset. Biases in the dataset may affect predictions.
- Preprocessing removes non-textual elements, which might strip out context-critical features.
- PDFs with poor formatting or heavy graphical content may not preprocess effectively.
Citation
If you use this model in your work, please cite:
"Resume Classification Model using BERT for Multiclass Job Categorization."
Model tree for Naandhu/bert-resume-classifier
Base model: google-bert/bert-base-uncased