Generate speech from text using a reference audio clip
Estimate gender, height, and torso area from an image