AuthEcho_Project

This project contains well-trained deep learning models to predict the Speaker and their Gender.

The repository offers a Speaker and Gender Prediction System built using TensorFlow, Librosa, and Gradio. The application predicts the top 3 speakers and their probabilities from an audio file, determines the speaker's gender, and classifies unknown speakers using a confidence threshold.

Features

  • Predicts the top 3 speakers from an audio file.
  • Determines the gender of the speaker.
  • Identifies unknown speakers with a confidence threshold.
  • Provides a Gradio interface for easy testing.

Getting Started

Prerequisites

To run this application, you need:

  • Python: Version 3.8 or higher
  • Required Python libraries:
    • tensorflow
    • numpy
    • librosa
    • gradio
    • scikit-learn

Install the required libraries with:

pip install tensorflow numpy librosa gradio scikit-learn

Installation

  1. Clone the Repository:
git clone https://github.com/your-username/speaker-gender-prediction.git
cd speaker-gender-prediction
  1. Add Pre-Trained Models and Label Encoders:

Place the following files in the repository's root directory:

  • lstm_speaker_model.h5: Pre-trained speaker recognition model.
  • lstm_gender_model.h5: Pre-trained gender prediction model.
  • lstm_speaker_label.pkl: Label encoder for speaker classes.
  • lstm_gender_label.pkl: Label encoder for gender classes.

Usage

Run the application using:

python app.py

Gradio Interface

The Gradio interface allows you to:

  • Upload an audio file or record audio directly.
  • Predict the top 3 speakers and their probabilities.
  • Determine the gender of the speaker.
  • Detect and classify unknown speakers using confidence thresholds.

Project Structure

.
β”œβ”€β”€ app.py                # Main application file
β”œβ”€β”€ models/lstm_speaker_model.h5   # Pre-trained speaker model (to be added)
β”œβ”€β”€ models/lstm_gender_model.h5    # Pre-trained gender model (to be added)
β”œβ”€β”€ models/lstm_speaker_label.pkl  # Speaker label encoder (to be added)
β”œβ”€β”€ models/lstm_gender_label.pkl   # Gender label encoder (to be added)
β”œβ”€β”€ requirements.txt        # Python dependencies
└── README.md               # Project documentation

Example Output

Top 3 Predicted Speakers:

The top 3 predicted speakers are:
Speaker 1: 85.23%
Speaker 2: 10.12%
Speaker 3: 4.65%

The predicted gender is: Male

Unknown Speaker:

The top 3 predicted speakers are:
Unknown: 45.23%

The predicted gender is: Unknown

How It Works

  1. Feature Extraction:

    • Extracts MFCCs, chroma features, and spectral contrast from the input audio file using librosa.
  2. Speaker and Gender Models:

    • Speaker Model: A pre-trained LSTM model classifies the speaker based on extracted features.
    • Gender Model: A separate LSTM model determines the gender.
  3. Unknown Detection:

    • If the highest confidence score for a speaker is below a defined threshold, the speaker is classified as "Unknown."

Roadmap

  • Add support for real-time audio predictions.
  • Improve unknown speaker detection using open-set recognition techniques.
  • Expand the dataset for more robust gender classification.

Contributing

Contributions are welcome! To contribute:

  1. Fork the repository.
  2. Create a feature branch (git checkout -b feature-branch-name).
  3. Commit your changes (git commit -m "Add new feature").
  4. Push to the branch (git push origin feature-branch-name).
  5. Open a Pull Request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

  • TensorFlow: For building the deep learning models.
  • Librosa: For audio processing and feature extraction.
  • Gradio: For creating the user interface.
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference API
Unable to determine this model's library. Check the docs .