VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration

VoiceRestore is a cutting-edge speech restoration model designed to significantly enhance the quality of degraded voice recordings. Leveraging flow-matching transformers, this model excels at addressing a wide range of audio imperfections commonly found in speech, including background noise, reverberation, distortion, and signal loss.

This model card is based on the original VoiceRestore repository, which also hosts a demo of audio restorations.

Usage - using Transformers 🤗

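# Notebook (Colab/Jupyter) cell: fetch the model repository and install its dependencies
!git lfs install
!git clone https://huggingface.co/jadechoghari/VoiceRestore
%cd VoiceRestore
!pip install -r requirements.txt

from transformers import AutoModel

# Path to the cloned model folder (on Colab it is as follows)
checkpoint_path = "/content/VoiceRestore"
# trust_remote_code=True is required because the model ships custom code
model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True)

# Restore a short clip: model(input_path, output_path)
model("test_input.wav", "test_output.wav")

# Pass short=False if the audio is longer than 10 seconds
model("long.mp3", "long_output.mp3", short=False)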
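The 10-second rule above can be automated when restoring many files. The following is a minimal sketch, not part of the VoiceRestore code: `restore_wav_folder`, `wav_duration_seconds`, `SHORT_LIMIT_S`, and the `_restored` output suffix are hypothetical names introduced here for illustration. It assumes only what the usage snippet shows, namely that the loaded model is callable as `model(input_path, output_path)` and accepts `short=False` for longer audio. Durations are read with the standard-library `wave` module, so the sketch covers WAV files only.

```python
import wave
from pathlib import Path

# Threshold taken from the usage note: audio longer than 10 s needs short=False.
SHORT_LIMIT_S = 10.0


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds (standard library only)."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / float(wf.getframerate())


def restore_wav_folder(model, in_dir, out_dir):
    """Run `model` on every .wav file in in_dir and write results to out_dir.

    `model` is assumed to behave like the object in the usage snippet:
    model(input_path, output_path) for short clips and
    model(input_path, output_path, short=False) for longer ones.
    Returns a list of (file name, was_treated_as_short) pairs.
    """
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for src in sorted(in_dir.glob("*.wav")):
        dst = out_dir / f"{src.stem}_restored.wav"  # hypothetical naming scheme
        is_short = wav_duration_seconds(src) <= SHORT_LIMIT_S
        if is_short:
            model(str(src), str(dst))
        else:
            model(str(src), str(dst), short=False)
        results.append((src.name, is_short))
    return results
```

With the model loaded as shown above, a call such as `restore_wav_folder(model, "noisy_clips", "restored_clips")` would process a folder in one pass; both folder names are placeholders.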

Example

Degraded input: [audio sample]

Restored (steps=32, cfg=1.0): [audio sample]

Restored (steps=16, strength=0.5): [audio sample]


Key Features

  • Universal Restoration: The model is designed to handle a wide range of levels and types of voice recording degradation.
  • Easy to Use: Simple interface for processing degraded audio files.
  • Pretrained Model: Includes a 301-million-parameter transformer model with pretrained weights. The model is still training, and further checkpoint updates will follow.

Model Details

  • Architecture: Flow-matching transformer
  • Parameters: about 301 million
  • Input: Degraded speech audio (various formats supported)
  • Output: Restored speech
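
To give a feel for what "flow-matching" and the `steps` and `cfg` settings in the Example section usually refer to, here is a generic toy sketch of flow-matching sampling. It is not the VoiceRestore implementation. It assumes the common setup in which a network predicts a velocity field that is integrated from t=0 to t=1 with a fixed number of Euler steps, and in which guidance is applied as `v_cond + cfg_strength * (v_cond - v_uncond)`, which is one of several conventions in use. The names `euler_sample` and `toy_velocity` are hypothetical, and the velocity field is a hand-written stand-in for a trained transformer.

```python
def euler_sample(velocity, x0, steps=32, cfg_strength=0.0):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps.

    `velocity(x, t, conditioned)` returns a list of floats. Guidance mixes
    the conditioned and unconditioned predictions (assumed convention):
        v = v_cond + cfg_strength * (v_cond - v_uncond)
    """
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v_cond = velocity(x, t, conditioned=True)
        if cfg_strength != 0.0:
            v_null = velocity(x, t, conditioned=False)
            v = [vc + cfg_strength * (vc - vn) for vc, vn in zip(v_cond, v_null)]
        else:
            v = v_cond
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x, t, conditioned=True):
    """Stand-in for a trained network: points from x toward a fixed target.

    The conditioned field aims at a made-up "clean" target, the
    unconditioned field at the origin. Both are purely illustrative.
    """
    target = [1.0, -2.0] if conditioned else [0.0, 0.0]
    return [(g - xi) / (1.0 - t) for g, xi in zip(target, x)]


if __name__ == "__main__":
    start = [0.3, 0.7]  # stands in for the noisy starting point
    print(euler_sample(toy_velocity, start, steps=16))
    print(euler_sample(toy_velocity, start, steps=32, cfg_strength=0.5))
```

In a real model, more steps generally trade speed for a more accurate integration, and a larger guidance strength pushes the result further toward the conditioned prediction.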

Limitations and Future Work

  • Current model is optimized for speech; may not perform optimally on music or other audio types.
  • Ongoing research to improve performance on extreme degradations.
  • Future updates may include real-time processing capabilities.

Citation

If you use VoiceRestore in your research, please cite our paper:

@article{kirdey2024voicerestore,
  title={VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration},
  author={Kirdey, Stanislav},
  journal={arXiv},
  year={2024}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments
