Image Super-resolution via Diffusion Inversion
Identify speakers in an audio file
Describe audio with text