---
license: cc-by-4.0
tags:
- Speech tokenizer
---
# Getting Started with XCodec2 on Hugging Face
XCodec2 is a speech tokenizer that offers the following key features:
1. **Single Vector Quantization**
2. **50 Tokens per Second**
3. **Multilingual Speech Semantic Support and High-Quality Speech Reconstruction**
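The 50 tokens per second figure, combined with the 16 kHz input requirement mentioned in the usage code, implies that each token covers 16000 / 50 = 320 samples (20 ms of audio). The sketch below works through that arithmetic. `estimate_token_count` is a hypothetical helper, not part of `xcodec2`, and the round-up behavior is an assumption, so the real code length may differ by a token depending on how the model pads its input.

```python
import math

SAMPLE_RATE_HZ = 16_000   # input rate required by the usage example
TOKENS_PER_SECOND = 50    # from the feature list

# Samples covered by one token: 16000 / 50 = 320, i.e. 20 ms of audio.
SAMPLES_PER_TOKEN = SAMPLE_RATE_HZ // TOKENS_PER_SECOND


def estimate_token_count(num_samples):
    """Estimate the number of tokens for a clip of `num_samples` samples.

    Assumption: a trailing partial hop still yields one token (round up).
    """
    return math.ceil(num_samples / SAMPLES_PER_TOKEN)


print(SAMPLES_PER_TOKEN)                 # 320
print(estimate_token_count(16_000 * 3))  # 150 tokens for a 3-second clip
```

Because there is a single quantizer, this estimate is also the approximate sequence length a downstream model would see for the clip.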
Install `xcodec2` before use, for example in a fresh conda environment:
```bash
conda create -n xcodec2 python=3.9
conda activate xcodec2
pip install xcodec2==0.1.1
```
Then encode a 16 kHz speech file into codes and reconstruct it:
```python
import torch
import soundfile as sf
from transformers import AutoConfig
from xcodec2.modeling_xcodec2 import XCodec2Model

model_path = "HKUST-Audio/xcodec2"
model = XCodec2Model.from_pretrained(model_path)
model.eval().cuda()

# Load the input audio and add a leading batch dimension.
wav, sr = sf.read("test.wav")
wav_tensor = torch.from_numpy(wav).float().unsqueeze(0)

with torch.no_grad():
    # Only 16 kHz speech is supported.
    vq_code = model.encode_code(input_waveform=wav_tensor)
    print("Code:", vq_code)
    recon_wav = model.decode_code(vq_code).cpu()

sf.write("reconstructed.wav", recon_wav[0, 0, :].numpy(), sr)
print("Done! Check reconstructed.wav")
```
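Since the example accepts only 16 kHz speech, it can help to check a file before encoding it. The sketch below is one possible pre-flight check using Python's standard-library `wave` module, which reads uncompressed PCM WAV files only. `check_wav` is a hypothetical helper, not part of `xcodec2`. The mono requirement is an assumption, based on `unsqueeze(0)` producing a `(1, T)` tensor only for single-channel audio.

```python
import wave

EXPECTED_SAMPLE_RATE = 16_000  # the example above supports 16 kHz speech only


def check_wav(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file.

    Raises ValueError if the file is not 16 kHz, or not mono
    (the mono check is an assumption, see the note above).
    """
    with wave.open(path, "rb") as f:
        sample_rate = f.getframerate()
        channels = f.getnchannels()
        num_frames = f.getnframes()

    if sample_rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(
            f"{path}: expected {EXPECTED_SAMPLE_RATE} Hz, got {sample_rate} Hz"
        )
    if channels != 1:
        raise ValueError(f"{path}: expected mono audio, got {channels} channels")

    return sample_rate, channels, num_frames / sample_rate
```

Calling `check_wav("test.wav")` before `sf.read` turns a wrong sample rate into an explicit error. Files that fail the check need to be resampled or downmixed with an audio tool of your choice first.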