spellingdragon
/

custom-whisper-handler


Duplicate from philschmid/openai-whisper-endpoint


	---
	license: mit
	tags:
	- audio
	- automatic-speech-recognition
	- endpoints-template
	library_name: generic
	inference: false
	---

	# OpenAI [Whisper](https://github.com/openai/whisper) Inference Endpoint example

	> Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.

	For more information about the model, its license, and its limitations, see the original repository at [openai/whisper](https://github.com/openai/whisper).

	---

	This repository implements a custom `handler` for the `automatic-speech-recognition` task on 🤗 Inference Endpoints, using OpenAI's Whisper model. The handler code is in [handler.py](https://huggingface.co/philschmid/openai-whisper-endpoint/blob/main/handler.py).

	A [notebook](https://huggingface.co/philschmid/openai-whisper-endpoint/blob/main/create_handler.ipynb) showing how to create `handler.py` is also included.
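
For orientation, here is a minimal sketch of what such a handler can look like. It is not a copy of the repository's `handler.py`. It assumes the Inference Endpoints convention of an `EndpointHandler` class with `__init__(path)` and `__call__(data)`, and it assumes the request body arrives under the `"inputs"` key. The `transcribe_fn` parameter and the `"base"` model size are hypothetical additions, there only so the class can be tried without downloading a model.

```python
from typing import Any, Callable, Dict, Optional

SAMPLE_RATE = 16000  # Whisper models operate on 16 kHz audio


class EndpointHandler:
    """Sketch of a custom handler: load the model once, transcribe per request."""

    def __init__(
        self,
        path: str = "",
        transcribe_fn: Optional[Callable[[bytes], str]] = None,
    ):
        # `transcribe_fn` is a hypothetical hook for local testing; it is
        # not part of the Inference Endpoints handler convention.
        if transcribe_fn is None:
            transcribe_fn = self._load_whisper()
        self._transcribe = transcribe_fn

    @staticmethod
    def _load_whisper(model_size: str = "base") -> Callable[[bytes], str]:
        # Imported lazily so the class can be exercised without the
        # heavy dependencies installed.
        import whisper  # pip install openai-whisper
        from transformers.pipelines.audio_utils import ffmpeg_read

        model = whisper.load_model(model_size)  # model size is an assumption

        def transcribe(audio_bytes: bytes) -> str:
            # Decode raw bytes into a 16 kHz waveform (requires ffmpeg).
            waveform = ffmpeg_read(audio_bytes, SAMPLE_RATE)
            return model.transcribe(waveform)["text"]

        return transcribe

    def __call__(self, data: Dict[str, Any]) -> Dict[str, str]:
        # Assumed payload shape: {"inputs": <audio bytes>}.
        audio_bytes = data.get("inputs", data)
        return {"text": self._transcribe(audio_bytes)}
```

Passing a stand-in function, for example `EndpointHandler(transcribe_fn=lambda b: "hello")`, exercises the request and response shape locally without loading Whisper.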

	### Request

	The endpoint expects a binary audio file. Below is a cURL example and a Python example using the `requests` library.

	cURL

	```bash
	# download a sample audio file
	wget https://cdn-media.huggingface.co/speech_samples/sample1.flac

	# run request
	curl --request POST \
	--url https://{ENDPOINT}/ \
	--header 'Content-Type: audio/x-flac' \
	--header 'Authorization: Bearer {HF_TOKEN}' \
	--data-binary '@sample1.flac'
	```

	Python

	```python
	import mimetypes

	import requests as r

	ENDPOINT_URL = ""
	HF_TOKEN = ""


	def predict(path_to_audio: str) -> dict:
	    # read the audio file as raw bytes
	    with open(path_to_audio, "rb") as f:
	        audio_bytes = f.read()
	    # guess the MIME type from the file extension
	    content_type = mimetypes.guess_type(path_to_audio)[0]

	    headers = {
	        "Authorization": f"Bearer {HF_TOKEN}",
	        "Content-Type": content_type,
	    }
	    # send the audio bytes as the request body
	    response = r.post(ENDPOINT_URL, headers=headers, data=audio_bytes)
	    return response.json()


	prediction = predict(path_to_audio="sample1.flac")

	prediction

	```
	Expected output

	```json
	{"text": " going along slushy country roads and speaking to damp audiences in draughty school rooms day after day for a fortnight. He'll have to put in an appearance at some place of worship on Sunday morning, and he can come to us immediately afterwards."}
	```
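
Two details of the client side are worth a small sketch. The Python example relies on `mimetypes.guess_type`, which returns `None` for extensions the local MIME database does not know, and the transcript in the response starts with a space. The helpers below are hypothetical and not part of the repository. In the fallback table only the FLAC entry is taken from the cURL example; the other entries are illustrative, and whether the endpoint accepts the generic `application/octet-stream` type has not been verified here.

```python
import mimetypes
from pathlib import Path
from typing import Any, Dict

# Illustrative fallback table; only the FLAC entry mirrors the cURL example.
FALLBACK_TYPES = {
    ".flac": "audio/x-flac",
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
}


def audio_content_type(path: str) -> str:
    """Return a Content-Type for the file, even if mimetypes cannot guess one."""
    guessed, _ = mimetypes.guess_type(path)
    if guessed is not None:
        return guessed
    # Fall back to the table above, then to a generic binary type.
    suffix = Path(path).suffix.lower()
    return FALLBACK_TYPES.get(suffix, "application/octet-stream")


def extract_text(payload: Dict[str, Any]) -> str:
    """Pull the transcript out of the JSON response and trim whitespace."""
    return payload["text"].strip()
```

In the `predict` function, `audio_content_type(path_to_audio)` could stand in for the direct `mimetypes.guess_type` call, and `extract_text(prediction)` would return the transcript without the leading space.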