	---
	license: apache-2.0
	datasets:
	- techiaith/banc-trawsgrifiadau-bangor
	- techiaith/commonvoice_18_0_cy
	language:
	- cy
	base_model:
	- openai/whisper-base
	pipeline_tag: automatic-speech-recognition
	tags:
	- whispercpp
	---


	# whisper-base-ft-btb-cv-cy-cpp

	This model is a version of the [openai/whisper-base](https://huggingface.co/openai/whisper-base) model, fine-tuned on
	transcriptions of spontaneous Welsh speech from the Banc Trawsgrifiadau Bangor (btb) dataset, with read
	speech from Welsh Common Voice version 18 (cv) as additional training data, and then
	[converted for use in whisper.cpp](https://github.com/ggerganov/whisper.cpp/tree/master/models#fine-tuned-models).

	whisper.cpp is a C/C++ port of Whisper that provides high-performance inference on hardware such as desktops, laptops
	and mobile devices, which makes offline transcription possible.

	The model is smaller than the corresponding model intended for hosting on cloud GPU-based infrastructure,
	[techiaith/whisper-large-v3-ft-btb-cv-cy](https://huggingface.co/techiaith/whisper-large-v3-ft-btb-cv-cy), and is therefore
	not as accurate.

	It achieves the following word error rate (WER) and character error rate (CER) when transcribing spontaneous Welsh speech:

	- WER: 62.76
	- CER: 27.70
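
For readers unfamiliar with these metrics, the sketch below shows how WER and CER are conventionally computed: the edit distance between reference and hypothesis, divided by the reference length, at word level and character level respectively. It assumes the figures above are percentages. The helper names, the example sentences and the absence of any text normalisation are illustrative assumptions, not the evaluation procedure used for this model.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate as a percentage of the reference word count."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate as a percentage of the reference character count."""
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical example pair, not taken from the evaluation data.
reference = "mae hi'n braf heddiw"
hypothesis = "mae hin braf heddi"

print(wer(reference, hypothesis))  # 2 of 4 words differ -> 50.0
print(cer(reference, hypothesis))  # 2 of 20 characters differ -> 10.0
```

The example also illustrates why CER is usually much lower than WER: a single wrong character makes a whole word count as an error.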


	## Usage

	whisper.cpp makes it easy to use models on many platforms and in many applications. See the 'examples' folder
	in the whisper.cpp GitHub repo for more information and example code.

	To get started quickly with basic whisper.cpp usage, follow the '[Quick Start](https://github.com/ggerganov/whisper.cpp?tab=readme-ov-file#quick-start)'
	guide, but download this model with the following command:


	`$ wget https://huggingface.co/techiaith/whisper-base-ft-btb-cv-cy-cpp/resolve/main/ggml-model.bin`
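
For scripted use, the same download and a transcription run can be sketched in Python. This is a minimal sketch under stated assumptions: whisper.cpp has already been built as described in its Quick Start, the command-line binary is at `./build/bin/whisper-cli` (older releases named it `main`), and `samples/welsh.wav` is a hypothetical 16-bit WAV file of Welsh speech. The `-m`, `-f` and `-l` options select the model, the input file and the spoken language.

```python
import subprocess
import urllib.request
from pathlib import Path

MODEL_URL = (
    "https://huggingface.co/techiaith/whisper-base-ft-btb-cv-cy-cpp"
    "/resolve/main/ggml-model.bin"
)


def download_model(dest):
    """Download the ggml model file unless it is already present."""
    dest = Path(dest)
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(MODEL_URL, str(dest))
    return dest


def build_command(cli, model, audio, language="cy"):
    """Assemble the whisper.cpp command line for one audio file."""
    return [str(cli), "-m", str(model), "-f", str(audio), "-l", language]


if __name__ == "__main__":
    cli = Path("./build/bin/whisper-cli")  # assumed location of the built binary
    audio = Path("samples/welsh.wav")      # hypothetical input file
    if cli.exists() and audio.exists():
        model = download_model("models/ggml-model.bin")
        subprocess.run(build_command(cli, model, audio), check=True)
    else:
        print("Build whisper.cpp and provide an audio file first.")
```

Passing `-l cy` tells whisper.cpp that the audio is Welsh instead of relying on its default language setting.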