Whisper-Tiny model in Unity Sentis (Version 1.4.0-pre.3*)
(*Sentis files from 1.3.0 and earlier will not be compatible and would need to be recreated.)
This is the Whisper Tiny model tested to work in Unity 2023. It is a speech-to-text model. You feed in a 16kHz wav file and it outputs the best guess for what was said in the audio.
How to Use
- Open a new scene in Unity 2023
- Import package `com.unity.sentis` version `1.4.0-pre.3` from the package manager.
- Put the `RunWhisper.cs` on the Main Camera
- Put the *.sentis files and the `vocab.json` in the Assets/StreamingAssets folder
- Add a 16kHz mono audio file up to 30 seconds long to your project and drag it onto the audioClip field.
- IMPORTANT: The audio must be 16kHz. In the audio inspector, select "Force Mono" and "Decompress on Load".
- You can add a step to convert 44kHz or 22kHz audio to 16kHz with this model
When you press play, the transcription of the audio is displayed in the console window.
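If you would rather prepare the audio before importing it into Unity, the 16kHz, mono and 30-second requirements above can be sketched offline. The following is a minimal Python illustration, not part of this package: the function names are hypothetical, and the resampler uses plain linear interpolation with no low-pass filter, so a dedicated audio tool will usually give cleaner results when downsampling.

```python
TARGET_RATE = 16_000   # sample rate required by this README
MAX_SECONDS = 30       # maximum clip length stated in this README


def to_mono(frames):
    """Average the channels of each frame into one sample (hypothetical helper)."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, source_rate, target_rate=TARGET_RATE):
    """Resample by linear interpolation. Assumption: no anti-aliasing filter is applied."""
    if source_rate == target_rate or not samples:
        return list(samples)
    out_len = len(samples) * target_rate // source_rate
    step = source_rate / target_rate  # source samples advanced per output sample
    out = []
    for i in range(out_len):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


def trim_to_limit(samples, rate=TARGET_RATE, max_seconds=MAX_SECONDS):
    """Drop everything after the first max_seconds of audio."""
    return samples[: rate * max_seconds]


if __name__ == "__main__":
    # One second of silent stereo audio at 44.1kHz, as a stand-in for real data.
    stereo = [(0.0, 0.0)] * 44_100
    clip = trim_to_limit(resample_linear(to_mono(stereo), 44_100))
    print(len(clip), "samples at", TARGET_RATE, "Hz")
```

The result still has to be written to a wav file and imported into Unity as described in the steps above.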
Languages
The output starts with 4 tokens which you can set. One token specifies the input language. Another specifies whether the output is a straight transcription in the specified language or a translation into English. See here for more details. These special tokens are defined in the added_tokens.json file.
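As a rough illustration of how such a 4-token prefix could be assembled, here is a small Python sketch. It rests on several assumptions: that added_tokens.json maps token strings to integer IDs, that the two tokens not described above are the start-of-transcript and no-timestamps markers, and that the IDs in the excerpt match the multilingual Whisper tokenizer. Check all of these against the added_tokens.json shipped with the model. The helper name `build_prefix` is hypothetical.

```python
import json

# Excerpt in the assumed shape of added_tokens.json (token string -> id).
# The IDs are assumptions; verify them against the real file.
ADDED_TOKENS_EXCERPT = """{
  "<|startoftranscript|>": 50258,
  "<|en|>": 50259,
  "<|de|>": 50261,
  "<|fr|>": 50265,
  "<|translate|>": 50358,
  "<|transcribe|>": 50359,
  "<|notimestamps|>": 50363
}"""


def build_prefix(tokens, language, translate_to_english=False):
    """Return the 4 starting token IDs: start marker, language, task, no-timestamps."""
    task = "<|translate|>" if translate_to_english else "<|transcribe|>"
    return [
        tokens["<|startoftranscript|>"],
        tokens[f"<|{language}|>"],  # input language, e.g. "en", "de", "fr"
        tokens[task],               # transcribe as-is, or translate to English
        tokens["<|notimestamps|>"],
    ]


if __name__ == "__main__":
    tokens = json.loads(ADDED_TOKENS_EXCERPT)
    print(build_prefix(tokens, "fr", translate_to_english=True))
```

In the Unity project, the equivalent values would be set wherever `RunWhisper.cs` initialises its output tokens.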