Spaces:

Miuzarte
/

SUI-svc-3.0

SUI-svc-3.0 / app.py

Miuzarte

Upload app.py

b550bd4 almost 2 years ago


import io
import logging

import gradio as gr
import librosa
import numpy as np
import soundfile
import torch
from inference.infer_tool import Svc

# Silence numba's verbose logging (numba is pulled in by librosa).
logging.getLogger('numba').setLevel(logging.WARNING)

model_name = "logs/48k/G_1M111000_sing.pth"
config_name = "configs/config.json"

svc_model = Svc(model_name, config_name)


def vc_fn(input_audio, vc_transform):
    if input_audio is None:
        return None
    sampling_rate, audio = input_audio
    # print(audio.shape, sampling_rate)
    duration = audio.shape[0] / sampling_rate
    # Scale integer PCM to float32 in [-1, 1]; np.iinfo only accepts integer dtypes.
    if np.issubdtype(audio.dtype, np.integer):
        audio = (audio / np.iinfo(audio.dtype).max).astype(np.float32)
    else:
        audio = audio.astype(np.float32)
    # Downmix multi-channel input to mono.
    if len(audio.shape) > 1:
        audio = librosa.to_mono(audio.transpose(1, 0))
    # Resample to 16 kHz before writing the intermediate WAV.
    if sampling_rate != 16000:
        audio = librosa.resample(audio, orig_sr=sampling_rate, target_sr=16000)
    print(audio.shape)
    # Hand the audio to the model as an in-memory WAV file.
    out_wav_path = io.BytesIO()
    soundfile.write(out_wav_path, audio, 16000, format="wav")
    out_wav_path.seek(0)
    out_audio, out_sr = svc_model.infer("suiji", vc_transform, out_wav_path)
    _audio = out_audio.cpu().numpy()
    return (48000, _audio)

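app = gr.Blocks()
with app:
    with gr.Tabs():
        with gr.TabItem("SUI-svc-3.0"):
            gr.Markdown(value="""
# Online demo of the AI 岁己 singing voice converter

### Project: [sovits 3.0 48kHz](https://github.com/innnky/so-vits-svc/tree/main) | Current model training status: 1,000,000-step base model + 111,000 steps

# Training of the sovits 4.0 version is finished; there is no online demo for it yet. The models are at [Miuzarte/SUImodels](https://huggingface.co/Miuzarte/SUImodels)

||
|-|
||

## Some notes ❗❕❗❕:

#### The input audio must be clean, dry vocals; do not upload a full song directly

#### Harmonies and reverb are not acceptable either, so check the result after separating the vocals with UVR

#### It does little for plain spoken delivery. If you really have no dry vocals available, you can sing yourself and raise the pitch by a dozen or so semitones, then adjust by trial and error

#### Inference sometimes adds an electronic buzz to breath sounds, which needs a little touch-up in post; this is possibly because the model was trained for too long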
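
As a rough illustration of what a pitch-shift value means, the sketch below converts a semitone count into a frequency ratio using the standard equal-temperament relation (ratio = 2^(n/12)). The function name `semitone_ratio` is made up for this example and is not part of so-vits-svc; how the model applies the shift internally is not shown here.

```python
# Hypothetical helper, not part of so-vits-svc: frequency ratio for a
# pitch shift of n semitones in twelve-tone equal temperament.
def semitone_ratio(semitones):
    return 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    # 12 semitones is one octave up, -12 is one octave down.
    for n in (-12, 0, 7, 12):
        print(n, round(semitone_ratio(n), 4))
```

A shift of 12 doubles every frequency, which is why the pitch field describes 12 as one octave.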

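||
|-|
||

Todo:

1. Export to ONNX (✔)

2. Local one-click package (not needed)

3. TTS, vits (working)
""")
            vc_input3 = gr.Audio(label="Input audio (keep it to about 30 s; longer clips may run out of memory)")
            vc_transform = gr.Number(label="Pitch shift (integer number of semitones, positive or negative; 12 raises by one octave)", value=0)
            vc_submit = gr.Button("Convert", variant="primary")
            vc_output2 = gr.Audio(label="Output audio (download via the three dots at the far right)")
            vc_submit.click(vc_fn, [vc_input3, vc_transform], [vc_output2])
        with gr.TabItem("Repository notes ➕ tutorial for fast local inference with MoeSS"):
            gr.Markdown(value="""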
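## Data used to train the models in the [repository](https://huggingface.co/Miuzarte/SUImodels):

|Sovits3_v1|Base/G_1000000.pth|Singing/G_1M111000.pth|Singing/G_100000.pth|
|-:|:-:|:-:|:-:|
|Training set|December stream recordings (excluding radio-style streams), 22 song uploads since debut, 10 song clips, Christmas voice work (27.5 hours)|Base/G_1000000.pth as the base model, then all 2022 song uploads, song clips, and the Christmas voice work (3.9 hours)|All 2022 song uploads, song clips, and the Christmas voice work (3.9 hours)|

#### The [repository](https://huggingface.co/Miuzarte/SUImodels) includes both G.pth and D.pth; feel free to use them as base models for further training

#### To train on your own data, see the [[project GitHub repository]](https://github.com/innnky/so-vits-svc) (the 32k branch is the smoother path, the 48k branch is barely maintained, and the 4.0 workflow is much the same as 3.0)

# Local inference with [MoeSS](https://github.com/NaruseMioShirakana/MoeSS):

#### Because this program changes substantially with every update, the download links below all point to [[MoeSS 4.2.2]](https://github.com/NaruseMioShirakana/MoeSS/releases/tag/4.2.2)

### 0. Download [[MoeSS itself]](https://github.com/NaruseMioShirakana/MoeSS/releases/download/4.2.2/MoeSS-CPU.7z) and [[hubert]](https://huggingface.co/NaruseMioShirakana/MoeSS-SUBModel/resolve/main/hubert.7z), then extract them into the following file structure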

```
MoeSS
├── cleaners
├── emotion
├── hifigan
├── hubert
│   └── hubert.onnx
├── Mods
├── OutPuts
├── temp
├── avcodec-58.dll
├── avformat-58.dll
├── avutil-56.dll
├── MoeSS.exe
├── onnxruntime.dll
├── onnxruntime_providers_shared.dll
├── ParamsRegex.json
├── ShirakanaUI.dmres
├── swresample-3.dll
└── swscale-5.dll
```

### 1. Download the [[converted ONNX model]](https://huggingface.co/Miuzarte/SUImodels/blob/main/sovits3_48k/v1/Singing/suijiSUI_v1_1M111000_SoVits.onnx) and put it in MoeSS\\Mods\\suijiSUI_v1_1M111000

### 2. In MoeSS\\Mods, create a file named 岁己SUI_v1_1M111k.json containing the text below. The file must be saved with UTF-8 encoding, so double-check the encoding when saving

```json
{
    "Folder" : "suijiSUI_v1_1M111000",
    "Name" : "岁己SUI_v1_1M111k",
    "Type" : "SoVits",
    "Rate" : 48000,
    "Hop" : 320,
    "Hubert": "hubert",
    "SoVits3": true,
    "Characters" : ["岁己SUI"]
}
```
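
If your text editor makes the encoding hard to control, the same file can be written from Python. This is a minimal sketch, assuming that plain UTF-8 without a byte-order mark is what MoeSS expects; the function name `write_config` is made up for this example, and `mods_dir` should point at your own Mods folder.

```python
import json
from pathlib import Path

# The configuration from step 2, as a Python dict.
CONFIG = {
    "Folder": "suijiSUI_v1_1M111000",
    "Name": "岁己SUI_v1_1M111k",
    "Type": "SoVits",
    "Rate": 48000,
    "Hop": 320,
    "Hubert": "hubert",
    "SoVits3": True,
    "Characters": ["岁己SUI"],
}


def write_config(mods_dir, file_name="岁己SUI_v1_1M111k.json"):
    # Hypothetical helper: mods_dir is the MoeSS Mods folder.
    path = Path(mods_dir) / file_name
    # encoding="utf-8" plus ensure_ascii=False keeps the Chinese text
    # as real UTF-8 characters instead of ASCII escape sequences.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(CONFIG, f, ensure_ascii=False, indent=4)
    return path
```

Calling `write_config` with the path of the Mods folder creates the file and returns its location.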

#### After the steps above, the file structure should look like this

```
MoeSS
├── cleaners
├── emotion
├── hifigan
├── hubert
│   └── hubert.onnx
├── Mods
│   ├── 岁己SUI_v1_1M111k.json
│   └── suijiSUI_v1_1M111000
│       └── suijiSUI_v1_1M111000_SoVits.onnx
├── OutPuts
├── temp
├── avcodec-58.dll
├── avformat-58.dll
├── avutil-56.dll
├── MoeSS.exe
├── onnxruntime.dll
├── onnxruntime_providers_shared.dll
├── ParamsRegex.json
├── ShirakanaUI.dmres
├── swresample-3.dll
└── swscale-5.dll
```
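
To confirm the layout before launching, a small check like the one below may help. It is only a sketch: the `EXPECTED` list covers the model-related files from the tree above (the DLLs are left out for brevity), and `missing_files` is a made-up helper name.

```python
from pathlib import Path

# Relative paths taken from the tree above (model-related files only).
EXPECTED = [
    "MoeSS.exe",
    "hubert/hubert.onnx",
    "Mods/岁己SUI_v1_1M111k.json",
    "Mods/suijiSUI_v1_1M111000/suijiSUI_v1_1M111000_SoVits.onnx",
]


def missing_files(moess_root):
    # Return the expected entries that are not present under moess_root.
    root = Path(moess_root)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]
```

An empty list means every file listed in `EXPECTED` was found.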

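### (AMD GPU users can skip this) To use GPU inference, download [[MoeSS-GPU.7z]](https://github.com/NaruseMioShirakana/MoeSS/releases/download/3.2.0/MoeSS-GPU.7z) and extract "MoeSS - CUDA.exe" and "onnxruntime_providers_cuda.dll" into the MoeSS directory (overwriting everything also works). Note: this requires CUDA ≥ 11.6 and < 12, and cuDNN < 83.0. The latest driver for 30-series cards currently ships CUDA 12, so a downgrade would be needed; the CPU version is recommended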
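
The CUDA range stated above (at least 11.6, below 12) can be checked mechanically once you know your installed version. The sketch below only compares version strings; `cuda_version_ok` is a made-up name, and obtaining the version string (for example from the output of `nvcc --version`) is left to you.

```python
def cuda_version_ok(version):
    # Compare (major, minor) tuples so that "11.10" sorts after "11.6".
    parts = tuple(int(p) for p in version.split(".")[:2])
    if len(parts) == 1:
        parts = (parts[0], 0)
    return (11, 6) <= parts < (12, 0)


if __name__ == "__main__":
    for v in ("11.3", "11.6", "11.8", "12.0"):
        print(v, cuda_version_ok(v))
```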

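### 3. Run MoeSS.exe / MoeSS - CUDA.exe

1. Select the model “SoVits:岁己SUI_v1_1M111k” in the top-left corner and wait for it to load; when it finishes, the right side shows “当前模型: 岁己SUI_v1_1M111k” (“Current model: 岁己SUI_v1_1M111k”)

2. Drag audio files into the program window, or click “开始转换” (Start Conversion) and then choose files, or type the audio file paths into the input box at the bottom left and then click “开始转换”. MoeSS finally gained drag-and-drop support somewhere between 3.0.0 and 4.0.1. Batch input is supported, for example:

```
A:\\SUI\\so-vits-svc\\raw\\wavs\\2043.wav
A:\\SUI\\so-vits-svc\\raw\\wavs\\2044.flac
"B:\\quotes\\are\\optional.mp3"
"D:\\probably\\path has spaces\\better to quote.aac"
"Z:\\author says\\only these\\five formats work.ogg"
```

3. Before conversion starts, you can adjust the pitch shift for the input audio in the parameter dialog that pops up. After confirming, wait for the progress bar at the bottom to finish, then click “保存音频文件” (Save Audio File) in the top-right corner. Batch inference writes directly to MoeSS\\OutPuts\\, so no saving is needed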
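
For larger batches, the path list for the input box in step 2 can be generated rather than typed. The sketch below is an assumption-laden illustration: it keeps only the five extensions shown in the example list, quotes any path containing a space, and `batch_lines` is a made-up helper name. `PureWindowsPath` is used so the output has Windows-style separators on any operating system.

```python
from pathlib import PureWindowsPath

# The five formats named in the example list above.
SUPPORTED = {".wav", ".flac", ".mp3", ".aac", ".ogg"}


def batch_lines(paths):
    lines = []
    for p in paths:
        win = PureWindowsPath(p)
        if win.suffix.lower() not in SUPPORTED:
            continue  # skip formats outside the list
        text = str(win)
        # Quote paths that contain spaces; quoting the rest is optional.
        lines.append('"' + text + '"' if " " in text else text)
    return lines


if __name__ == "__main__":
    for line in batch_lines(["A:/SUI/raw/2043.wav", "D:/my songs/take 1.aac", "C:/notes.txt"]):
        print(line)
```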

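|Deprecated below|Deprecated below|Deprecated below|
|:-|:-:|-:|
|Deprecated below|Deprecated below|Deprecated below|

### Local inference can use the GPU (NVIDIA). A 3060 Ti 8G can process one 20 (recommended) to 30 s clip; longer audio can be split and processed in batches. Even CPU inference locally is considerably faster than on Hugging Face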
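
To make the splitting concrete, the sketch below computes sample ranges for consecutive chunks of at most 25 s. It is a simplification under stated assumptions: `chunk_bounds` is a made-up helper, the 25 s default is just a value inside the 20 to 30 s range given above, and cutting at fixed positions can land in the middle of a note, so cutting at silences by hand will usually sound better.

```python
def chunk_bounds(n_samples, sample_rate, max_seconds=25):
    # Return (start, end) sample indices for consecutive chunks,
    # each no longer than max_seconds; end is exclusive.
    step = int(sample_rate * max_seconds)
    return [
        (start, min(start + step, n_samples))
        for start in range(0, n_samples, step)
    ]


if __name__ == "__main__":
    # A 60 s clip at 48 kHz becomes two 25 s chunks and one 10 s chunk.
    print(chunk_bounds(48000 * 60, 48000))
```

Each pair can be used to slice a loaded audio array, for example `audio[start:end]`, before writing the pieces out as separate files.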

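# Step-by-step tutorial for deploying locally and processing with inference_main.py:

#### Written in enough detail that even a complete beginner should manage (if you don't mind the hassle)

### 0. Create a directory to hold the files, for example D:\\SUI\\

### 1. Install the required software

1. [miniconda-Python3.8](https://docs.conda.io/en/latest/miniconda.html#windows-installers) (other Python versions are untested) [direct download](https://repo.anaconda.com/miniconda/Miniconda3-py38_22.11.1-1-Windows-x86_64.exe); either Just Me or All Users works, and you can click Next through the rest

2. [git](https://git-scm.com/download/win) (the portable version is recommended) [direct download (portable v2.39.0.2)](https://github.com/git-for-windows/git/releases/download/v2.39.0.windows.2/PortableGit-2.39.0.2-64-bit.7z.exe); set the path to D:\\SUI\\git\\

3. [Visual Studio Build Tools](https://visualstudio.microsoft.com/zh-hans/) (used to compile pyworld; can be uninstalled once setup is done) [direct download](https://c2rsetup.officeapps.live.com/c2r/downloadVS.aspx?sku=community&channel=Release&version=VS2022). On the left, tick “Desktop development with C++”; on the right only these four are needed: "MSVC v143 - VS 2022 C++......", "C++ ATL for latest v143 build tools......", "Windows 11 SDK......", "C++ CMake tools for Windows......"

### 2. Run Anaconda Powershell Prompt from the Start menu and set up the environment (apart from the working directory, just copy, paste, and press Enter)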

```
# Change the working directory
cd D:\\SUI\\
# Clone the repository
.\\git\\bin\\git lfs clone https://huggingface.co/spaces/Miuzarte/SUI-svc-3.0
# Change the working directory to the repository
cd D:\\SUI\\SUI-svc-3.0\\
# Create and activate the environment
# If conda reports SSL-related errors, turn off your proxy/VPN
conda create -n sovits python=3.8 -y
conda activate sovits

# Switch to the Tsinghua (TUNA) mirrors in mainland China
conda config --set show_channel_urls yes
conda config --remove-key channels
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/menpo/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
```
The device used for inference depends on whether the installed torch supports CUDA, so please read the following carefully
```
# GPU (NVIDIA, CUDA version 11.3 or later)
# Apparently 10-series and older cards do not support CUDA 11?
# If pip reports SSL-related errors, turn off your proxy/VPN
pip install -r requirements_gpu.txt
pip install https://download.pytorch.org/whl/cu113/torch-1.12.1%2Bcu113-cp38-cp38-win_amd64.whl
pip install https://download.pytorch.org/whl/cu113/torchvision-0.13.1%2Bcu113-cp38-cp38-win_amd64.whl
pip install https://download.pytorch.org/whl/cu113/torchaudio-0.12.1%2Bcu113-cp38-cp38-win_amd64.whl
```
```
# CPU (x86, at least 8 GB of RAM recommended)
# If pip reports SSL-related errors, turn off your proxy/VPN
pip install -r requirements_cpu.txt
```
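
To see which device the installed torch would use, a quick check along these lines may help. It is a sketch, not part of the repository: `pick_device` is a made-up name, and it relies only on `torch.cuda.is_available()`, falling back gracefully when torch is not installed.

```python
def pick_device():
    # Report which device the installed torch build can use.
    try:
        import torch
    except ImportError:
        return "torch not installed"
    # is_available() is False for CPU-only builds and when no usable
    # NVIDIA driver or GPU is present.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(pick_device())
```

If this prints `cpu` after installing the GPU requirements, the CUDA build of torch was probably not picked up.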
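The environment is now set up. Close this terminal window (it makes the next step easier to write)

### 3. Singing voice conversion

1. Run Anaconda Powershell Prompt, change the working directory, and activate the environment

```
cd D:\\SUI\\SUI-svc-3.0\\
conda activate sovits
```

2. If you want to use a web GUI like this demo, run the following and skip the remaining steps

```
python app.py
# When startup finishes, the log prints the port the app is using (7860 by default); open 127.0.0.1:7860 in a browser
# The program may have picked a different port if that one was already in use
```

3. Put the audio to convert (WAV format) into the SUI-svc-3.0\\raw\\ folder. With 8 GB of VRAM, keep each clip to 20 (recommended) to 30 s (excluding silent parts); longer clips exhaust VRAM, making processing far slower or causing it to fail outright

4. Edit line 23 of SUI-svc-3.0\\inference_main.py (see the commented format on line 24) and the pitch shift on line 26; when saving, make sure the encoding is UTF-8

5. Run inference_main.py in the terminal to start inference

```
python inference_main.py
# Audio is written to the SUI-svc-3.0\\results\\ folder
```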
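
Step 2 above notes that the app may land on a port other than 7860 if that one is taken. The sketch below checks whether something is already listening on a local port before you launch; `port_in_use` is a made-up helper, and the log printed by `python app.py` remains the authoritative source for the port actually chosen.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    # connect_ex returns 0 when something accepts the connection.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("7860 in use:", port_in_use(7860))
```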
	""")
	app.launch()