---
language:
- en
tags:
- myshell
- speech-to-speech
---
<!-- might put a [width=2000 * height=xxx] img here, this size best fits git page
<img src="resources\cover.png"> -->
<img src="resources/dreamvoice.png">
# DreamVoice: Text-guided Voice Conversion
--------------------
## Introduction
DreamVoice is an innovative approach to voice conversion (VC) that leverages text-guided generation to create personalized and versatile voice experiences.
Unlike traditional VC methods, which require a target recording during inference, DreamVoice introduces a more intuitive solution by allowing users to specify desired voice timbres through text prompts.
For more details, please see our Interspeech paper: [DreamVoice](https://arxiv.org/abs/2406.16314)
To listen to demos and download the dataset, please visit the DreamVoice homepage: [Homepage](https://haidog-yaqub.github.io/dreamvoice_demo/)
## Model Usage
To load the models, first install the required packages:
```bash
pip install -r requirements.txt
```
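The OpenVoice plugin example below additionally requires OpenVoice itself and its converter checkpoints. A minimal setup sketch, assuming you follow the install and checkpoint-download steps from the [OpenVoice repository](https://github.com/myshell-ai/OpenVoice) (the layout matches the `checkpoints_v2/converter` path used in the code below):
```bash
git clone https://github.com/myshell-ai/OpenVoice.git
cd OpenVoice
pip install -e .
# download the checkpoints_v2 archive linked in the OpenVoice README and unzip it
# so that checkpoints_v2/converter/config.json and checkpoint.pth are available
```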
Then you can use the model with the following code:
- NEW! DreamVoice Plugin for OpenVoice (DreamVG + [OpenVoice](https://github.com/myshell-ai/OpenVoice))
```python
import torch
from dreamvoice import DreamVoice_Plugin
from dreamvoice.openvoice_utils import se_extractor
from openvoice.api import ToneColorConverter
# init dreamvoice
device = 'cuda'
dreamvoice = DreamVoice_Plugin(device=device)
# init openvoice
ckpt_converter = 'checkpoints_v2/converter'
openvoice = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)
openvoice.load_ckpt(f'{ckpt_converter}/checkpoint.pth')
# generate a speaker embedding from the text prompt
prompt = 'cute female girl voice'
target_se = dreamvoice.gen_spk(prompt)
target_se = target_se.unsqueeze(-1)  # add the trailing dimension OpenVoice expects
# extract the source speaker embedding from the content audio
source_path = 'examples/test2.wav'
source_se = se_extractor(source_path, openvoice).to(device)
# voice conversion
encode_message = "@MyShell"
openvoice.convert(
    audio_src_path=source_path,
    src_se=source_se,
    tgt_se=target_se,
    output_path='output.wav',
    message=encode_message)
```
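Since `gen_spk` returns a torch tensor, a voice you like can be cached and reused later without re-running DreamVG. A minimal sketch, reusing `openvoice`, `se_extractor`, `device`, and `target_se` from the example above (the file name is illustrative):
```python
import torch

# cache the text-designed speaker embedding
torch.save(target_se, 'spk_cute_female.pt')

# later: reload the embedding and convert another recording to the same voice
target_se = torch.load('spk_cute_female.pt')
source_path = 'examples/test1.wav'
openvoice.convert(
    audio_src_path=source_path,
    src_se=se_extractor(source_path, openvoice).to(device),
    tgt_se=target_se,
    output_path='output2.wav',
    message="@MyShell")
```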
- DreamVoice Plugin for DiffVC (Diffusion-based VC Model)
```python
from dreamvoice import DreamVoice
# Initialize DreamVoice in plugin mode with CUDA device
dreamvoice = DreamVoice(mode='plugin', device='cuda')
# Description of the target voice
prompt = 'young female voice, sounds young and cute'
# Provide the path to the content audio and generate the converted audio
gen_audio, sr = dreamvoice.genvc('examples/test1.wav', prompt)
# Save the converted audio
dreamvoice.save_audio('gen1.wav', gen_audio, sr)
# Save the speaker embedding if you like the generated voice
dreamvoice.save_spk_embed('voice_stash1.pt')
# Load the saved speaker embedding
dreamvoice.load_spk_embed('voice_stash1.pt')
# Use the saved speaker embedding for another audio sample
gen_audio2, sr = dreamvoice.simplevc('examples/test2.wav', use_spk_cache=True)
dreamvoice.save_audio('gen2.wav', gen_audio2, sr)
```
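Because `genvc` takes the prompt as a plain string, it is easy to audition several candidate descriptions against the same content clip. A short sketch using only the calls shown above (prompt wording and output names are illustrative):
```python
from dreamvoice import DreamVoice

dreamvoice = DreamVoice(mode='plugin', device='cuda')

# try a few candidate voice descriptions on the same content audio
prompts = [
    'deep male voice, calm and mature',
    'bright female voice, energetic and clear',
    'soft male voice, warm and gentle',
]
for i, prompt in enumerate(prompts):
    gen_audio, sr = dreamvoice.genvc('examples/test1.wav', prompt)
    dreamvoice.save_audio(f'audition_{i}.wav', gen_audio, sr)
```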
- End-to-end DreamVoice VC Model
```python
from dreamvoice import DreamVoice
# Initialize DreamVoice in end-to-end mode with CUDA device
dreamvoice = DreamVoice(mode='end2end', device='cuda')
# Describe the target voice, then generate the converted audio from the content audio
prompt = 'young female voice, sounds young and cute'
gen_end2end, sr = dreamvoice.genvc('examples/test1.wav', prompt)
# Save the converted audio
dreamvoice.save_audio('gen_end2end.wav', gen_end2end, sr)
# Note: End-to-end mode does not support saving speaker embeddings
# To use a voice generated in end-to-end mode, switch back to plugin mode
# and extract the speaker embedding from the generated audio
# Switch back to plugin mode
dreamvoice = DreamVoice(mode='plugin', device='cuda')
# Use the previously generated audio as the speaker reference
gen_end2end2, sr = dreamvoice.simplevc('examples/test2.wav', speaker_audio='gen_end2end.wav')
# Save the new converted audio
dreamvoice.save_audio('gen_end2end2.wav', gen_end2end2, sr)
```
- DiffVC (Diffusion-based VC Model)
```python
from dreamvoice import DreamVoice
# Plugin mode can be used for traditional one-shot voice conversion
dreamvoice = DreamVoice(mode='plugin', device='cuda')
# Generate audio using traditional one-shot voice conversion
gen_tradition, sr = dreamvoice.simplevc('examples/test1.wav', speaker_audio='examples/speaker.wav')
# Save the converted audio
dreamvoice.save_audio('gen_tradition.wav', gen_tradition, sr)
```
## Reference
If you find the code useful for your research, please consider citing:
```bibtex
@article{hai2024dreamvoice,
  title={DreamVoice: Text-Guided Voice Conversion},
  author={Hai, Jiarui and Thakkar, Karan and Wang, Helin and Qin, Zengyi and Elhilali, Mounya},
  journal={arXiv preprint arXiv:2406.16314},
  year={2024}
}
``` |