---
license: mit
title: ULTIMATE RVC
sdk: gradio
emoji: 🏢
colorFrom: indigo
colorTo: indigo
pinned: true
short_description: An ap
---
# Ultimate RVC

An extension of [AICoverGen](https://github.com/SociallyIneptWeeb/AICoverGen), which provides several new features and improvements, enabling users to generate song covers using RVC with ease. Ideal for people who want to incorporate singing functionality into their AI assistant/chatbot/vtuber, or for people who want to hear their favourite characters sing their favourite song.

<!-- Showcase: TBA -->

![](images/webui_generate.png?raw=true)

Ultimate RVC is under constant development and testing, but you can try it out right now locally or on Google Colab!

## New Features

* Easy and automated setup using launcher scripts for both Windows and Debian-based Linux systems
* Caching system which saves intermediate audio files as needed, thereby reducing inference time as much as possible. For example, if song A has already been converted using model B and now you want to convert song A using model C, then vocal extraction can be skipped and inference time reduced drastically
* Ability to listen to intermediate audio files in the UI. This is useful for getting an idea of what is happening in each step of the song cover generation pipeline
* A "multi-step" song cover generation tab: here you can try out each step of the song cover generation pipeline in isolation. For example, if you already have extracted vocals available and only want to convert these using your voice model, then you can do that here. This tab is also useful for experimenting with settings for each step of the song cover generation pipeline
* An overhaul of the song input component for the song cover generation pipeline. Cached input songs can now be selected from a dropdown, so that you don't have to supply the YouTube link of a song each time you want to convert it
* A new "manage models" tab, which collects and revamps all existing functionality for managing voice models, as well as adds some new features, such as the ability to delete existing models
* A "manage audio" tab, which allows you to interact with all audio generated by the app. Currently, this tab supports deleting audio files
* Lots of visual and performance improvements resulting from updating from Gradio 3 to Gradio 4 and from Python 3.9 to Python 3.11
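
The caching idea in the list above can be illustrated with a minimal sketch. This is not the app's actual implementation: the function names (`cache_path`, `run_cached`, `make_cover`), the step names, and the file naming scheme are all hypothetical. The sketch only shows why keying each intermediate file on the inputs of its own step lets a second conversion of the same song skip vocal extraction.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_path(cache_dir: Path, step: str, params: dict) -> Path:
    """Derive a deterministic file name from a step name and its inputs."""
    key = json.dumps({"step": step, **params}, sort_keys=True)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return cache_dir / f"{step}_{digest}.wav"


def run_cached(cache_dir: Path, step: str, params: dict, compute):
    """Return (path, was_cached); compute(path) runs only on a cache miss."""
    path = cache_path(cache_dir, step, params)
    if path.exists():
        return path, True
    compute(path)
    return path, False


def fake_step(path: Path) -> None:
    """Stand-in for an expensive audio step (hypothetical placeholder)."""
    path.write_bytes(b"placeholder audio")


def make_cover(cache_dir: Path, song: str, model: str):
    """Run two pipeline steps and report which ones were served from cache."""
    # Vocal extraction depends only on the song, not on the voice model.
    _, vocals_hit = run_cached(cache_dir, "vocals", {"song": song}, fake_step)
    # Voice conversion depends on both the song and the voice model.
    _, convert_hit = run_cached(
        cache_dir, "converted", {"song": song, "model": model}, fake_step
    )
    return vocals_hit, convert_hit


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        print(make_cover(cache, "song_a", "model_b"))  # (False, False)
        print(make_cover(cache, "song_a", "model_c"))  # (True, False)
```

In the second call the extracted vocals are found in the cache, so only the conversion step runs.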

<!-- ## Changelog

TBA -->

#### PRO TIP: Use a GPU for faster processing

While it is possible to run the Ultimate RVC web app on a CPU, it is highly recommended to use a GPU for faster processing. On an NVIDIA 3080 GPU, the AI cover generation process takes approximately 1.5 minutes, while on a CPU, it takes approximately 15 minutes. No testing has been done on AMD GPUs, so no guarantees are made for their performance.

## Colab notebook

If you do not have a sufficiently powerful NVIDIA GPU, you can try Ultimate RVC on Google Colab.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JackismyShephard/ultimate-rvc/blob/main/notebooks/ultimate_rvc_colab.ipynb)

To run Ultimate RVC locally, follow the setup guide below.

## Setup

### Install Git

Follow the instructions [here](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) to install Git on your computer.

### Clone Ultimate RVC repository
Open a terminal and run the following commands to clone this entire repository and open it locally.
```
git clone https://github.com/JackismyShephard/ultimate-rvc
cd ultimate-rvc
```

### Install dependencies

#### Windows
Run the following command to install the necessary dependencies on Windows:
```
./urvc.bat install 
```
Note that this will install Miniconda in your user directory. 
The whole process may take upwards of 10 minutes, so grab a cup of coffee and wait.

#### Linux (Debian-based)

Run the following command to install the necessary dependencies on Debian-based Linux distributions (e.g. Ubuntu):
```
./urvc.sh install 
```
The command has been tested only on Ubuntu 22.04 and 24.04, so support for other distributions is not guaranteed.
Also note that the command will install the CUDA 12.1 toolkit system-wide. If you run into problems, you may need to install the toolkit manually.

## Usage

### Start the app

#### Windows

```
./urvc.bat run
```
#### Linux (Debian-based)

```
./urvc.sh run
```


Once the output message `Running on local URL:  http://127.0.0.1:7860` appears, click the link to open the web app in a browser tab.

### Manage models


#### Download models

![](images/webui_dl_model.png?raw=true)

Navigate to the `Download model` subtab under the `Manage models` tab, and paste the download link to an RVC model and give it a unique name.
You can search the [AI Hub Discord](https://discord.gg/aihub), where pre-trained voice models are available for download.
The downloaded zip file should contain the .pth model file and an optional .index file.
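
If you want to check an archive before pasting its link, the layout described above can be verified with a short script. This is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the rule applied here (exactly one `.pth` file, at most one `.index` file) is an assumption about what the app accepts, not something taken from its source code.

```python
import io
import zipfile
from pathlib import PurePosixPath


def inspect_model_archive(zip_source):
    """Return the names of the .pth and .index files found in a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        # Skip directory entries, which end with a slash.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    pth = [n for n in names if PurePosixPath(n).suffix.lower() == ".pth"]
    index = [n for n in names if PurePosixPath(n).suffix.lower() == ".index"]
    return pth, index


def looks_like_rvc_model(zip_source) -> bool:
    """Assumed rule: exactly one .pth file and at most one .index file."""
    pth, index = inspect_model_archive(zip_source)
    return len(pth) == 1 and len(index) <= 1


if __name__ == "__main__":
    # Build a tiny in-memory archive to demonstrate the check.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("voice.pth", b"")
        archive.writestr("voice.index", b"")
    buffer.seek(0)
    print(looks_like_rvc_model(buffer))
```

`zip_source` can be a file path or an open binary file object, since `zipfile.ZipFile` accepts both.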

Once the two input fields are filled in, simply click `Download`! Once the output message says `[NAME] Model successfully downloaded!`, you should be able to use it in the `Generate song covers` tab!

#### Upload models

![](images/webui_upload_model.png?raw=true)

This option is for people who have trained RVC v2 models locally and would like to use them for AI cover generation.
Navigate to the `Upload model` subtab under the `Manage models` tab, and follow the instructions.
Once the output message says `Model with name [NAME] successfully uploaded!`, you should be able to use it in the `Generate song covers` tab!

#### Delete RVC models

TBA

### Generate song covers

#### One-click generation


![](images/webui_generate.png?raw=true)

- From the Voice model dropdown menu, select the voice model to use.
- In the song input field, copy and paste the link to any song on YouTube, the full path to a local audio file, or select a cached input song.
- Pitch should be set to either -12, 0, or 12 depending on the original vocals and the RVC AI model. This ensures the voice is not *out of tune*.
- Other advanced options for vocal conversion, audio mixing, etc. can be viewed by clicking the appropriate accordion arrow to expand.

Once all options are filled in, click `Generate` and the AI-generated cover should appear within a few minutes, depending on your GPU.
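
The recommended pitch values can be understood with a little arithmetic. Assuming the pitch setting is measured in semitones (the README does not state the unit), -12 and 12 correspond to exactly one octave down or up. In equal temperament, a shift of `n` semitones multiplies every frequency by `2 ** (n / 12)`, so a whole-octave shift keeps each note on the same pitch class. The function name below is hypothetical and the snippet is only an illustration of that formula, not code from the app.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of the given number of equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    for shift in (-12, 0, 12):
        # -12 halves the frequency, 0 leaves it unchanged, 12 doubles it.
        print(shift, semitones_to_ratio(shift))
```

For example, a 220 Hz note shifted by 12 semitones lands on 440 Hz, which is the same note one octave higher.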

#### Multi-step generation
TBA

<!-- ## CLI
TBA -->

## Update to latest version

Run the following command to pull the latest changes from the repository and reinstall dependencies.
Note that the process may take upwards of 5 minutes.

#### Windows

```
./urvc.bat update
```

#### Linux (Debian-based)

```
./urvc.sh update
```

## Development mode

When developing new features or debugging, it is recommended to run the app in development mode. This enables hot reloading, which means that the app will automatically reload when changes are made to the code.

#### Windows

```
./urvc.bat dev
```

#### Linux (Debian-based)

```
./urvc.sh dev
```


## Terms of Use

The use of the converted voice for the following purposes is prohibited.

* Criticizing or attacking individuals.

* Advocating for or opposing specific political positions, religions, or ideologies.

* Publicly displaying strongly stimulating expressions without proper zoning.

* Selling of voice models and generated voice clips.

* Impersonation of the original owner of the voice with malicious intentions to harm/hurt others.

* Fraudulent purposes that lead to identity theft or fraudulent phone calls.

## Disclaimer

I am not liable for any direct, indirect, consequential, incidental, or special damages arising out of or in any way connected with the use/misuse or inability to use this software.