
	---
	language:
	- ja
	base_model:
	- parler-tts/parler-tts-large-v1
	- retrieva-jp/t5-base-long
	datasets:
	- ylacombe/libritts_r_filtered
	- ylacombe/libritts-r-filtered-descriptions-10k-v5-without-accents
	pipeline_tag: text-to-audio
	library_name: transformers
	tags:
	- text-to-speech
	- annotation
	- japanese
	---



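	# Japanese Parler-TTS Large (Beta)

	This repository publishes a model retrained from [parler-tts/parler-tts-large-v1](https://huggingface.co/parler-tts/parler-tts-large-v1) to support Japanese text-to-speech. The model is intended to deliver high-quality speech generation while remaining lightweight.

	Note: This model is not compatible with the tokenizer used by the original [Parler-TTS](https://huggingface.co/collections/parler-tts/parler-tts-fully-open-source-high-quality-tts-66164ad285ba03e8ffde214c) models. It uses its own dedicated tokenizer.

	This repository is currently a beta release. Features and model optimization are still in progress ahead of the official release.

	Official release URL: coming soon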

	---


	## Japanese Parler-TTS Index

	- [Japanese Parler-TTS Mini (878M)](https://huggingface.co/2121-8/japanese-parler-tts-mini-bate)
	- [Japanese Parler-TTS Large (2.33B)](https://huggingface.co/2121-8/japanese-parler-tts-large-bate)


	---

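	## 📖 Quick Index
	* [👨‍💻 Installation](#👨‍💻-installation)
	* [🎲 Usage with a Random Voice](#🎲-usage-with-a-random-voice)
	* [🎯 Specifying a Particular Speaker](#🎯-specifying-a-particular-speaker)
	* [Acknowledgements](#acknowledgements)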

	---

	## 🛠️ Usage

	### 👨‍💻 Installation

	Install the required packages with the following commands.

	```sh
	pip install git+https://github.com/huggingface/parler-tts.git
	pip install git+https://github.com/getuka/RubyInserter.git
	```
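
Before running the examples, it can help to confirm that the modules imported by the usage example in this README are actually importable. The following is a minimal sketch using only the standard library. The helper name `find_missing_modules` is hypothetical, and the module list simply mirrors the imports in the usage example; it is not an official dependency list.

```python
import importlib.util

# Module names taken from the imports in the usage example of this README.
REQUIRED_MODULES = ["torch", "transformers", "parler_tts", "soundfile"]


def find_missing_modules(module_names):
    """Return the names in module_names that cannot be found for import."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If `soundfile` is reported as missing, it can be installed separately with `pip install soundfile`.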

	---

	### 🎲 Usage with a Random Voice

	```python
	import torch
	import soundfile as sf
	from parler_tts import ParlerTTSForConditionalGeneration
	from transformers import AutoTokenizer

	# Use the first GPU when available, otherwise fall back to the CPU.
	device = "cuda:0" if torch.cuda.is_available() else "cpu"

	# Load the model and its dedicated tokenizer from this repository.
	model_id = "2121-8/japanese-parler-tts-large-bate"
	model = ParlerTTSForConditionalGeneration.from_pretrained(model_id).to(device)
	tokenizer = AutoTokenizer.from_pretrained(model_id)

	# Japanese text to be spoken.
	prompt = "こんにちは、今日はどのようにお過ごしですか？"
	# English description of the desired voice characteristics.
	description = "A female speaker with a slightly high-pitched voice delivers her words at a moderate speed with a quite monotone tone in a confined environment, resulting in a quite clear audio recording."

	# Tokenize the description and the prompt separately.
	input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
	prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

	# Generate the waveform and save it as a WAV file.
	generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
	audio_arr = generation.cpu().numpy().squeeze()
	sf.write("parler_tts_japanese_out.wav", audio_arr, model.config.sampling_rate)
	```
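
In the example above, the `description` string is what steers the voice characteristics. A small helper can make it easier to vary those attributes consistently. The sketch below is illustrative only: `build_description` and its parameters are hypothetical names, the sentence template simply follows the example description, and which attribute phrasings the model actually responds to is not documented here.

```python
def build_description(
    gender="female",
    pitch="slightly high-pitched",
    speed="moderate",
    tone="quite monotone",
    environment="confined",
    clarity="quite clear",
):
    """Assemble an English voice description in the style of the README example."""
    # Pick a pronoun that matches the speaker's gender (assumed binary here).
    pronoun = "her" if gender == "female" else "his"
    return (
        f"A {gender} speaker with a {pitch} voice delivers {pronoun} words "
        f"at a {speed} speed with a {tone} tone in a {environment} environment, "
        f"resulting in a {clarity} audio recording."
    )


# With the defaults, this reproduces the description used in the example.
description = build_description()
```

The resulting string can be passed to the tokenizer in place of the hard-coded `description`, for instance `build_description(gender="male", pitch="low-pitched")`.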

	---

	### 🎯 Specifying a Particular Speaker

	Coming soon.

	---

	## Acknowledgements

	We are grateful to the following people, who provided resources for the development of this model.

	- [saldra](https://x.com/sald_ra)
	- [Witness](https://x.com/i_witnessed_it)

	Without their contributions, this project would not have been possible.

	---