aar2dee2 committed
Commit 7aad239 · Parent: e832ae5

fix audio output

Files changed (2)
  1. README.md +2 -71
  2. app.py +3 -3
README.md CHANGED

````diff
@@ -15,74 +15,5 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-
 
 1. Create an account on Coqui and get an api key from [account settings](https://app.coqui.ai/account).
 2. [Clone a voice](https://docs.coqui.ai/reference/voices_clone_from_file_create) using a recording of length between 3 and 5 seconds. I used a `.wav` file.
-3. [Create a sample of the cloned voice]()
-
-## Creating a voice with [Resemble.ai](https://www.resemble.ai/)
-
-### [API Reference for creating a voice](https://docs.app.resemble.ai/docs/resource_voice/create)
-
-1. Get your Resemble API key after creating an account with [Resemble.ai](https://www.resemble.ai/).
-2. Make a request to the endpoint with the `Authorization` header set to `Token <your_resemble_api_key>` and a JSON body with `name` and the `dataset_url`:
-
-```
-{
-  "name": "chatty_vader",
-  "dataset_url": "https://huggingface.co/spaces/aar2dee2/chatty_vader/resolve/main/data.zip"
-}
-```
-
-This will return a response like below.
-
-```
-{
-  "success": true,
-  "item": {
-    "uuid": "10f91c43",
-    "name": "chatty_vader",
-    "status": "initializing",
-    "dataset_url": "https://huggingface.co/spaces/aar2dee2/chatty_vader/resolve/main/data.zip",
-    "created_at": "2023-04-07T06:38:29.307Z",
-    "updated_at": "2023-04-07T06:38:29.323Z"
-  }
-}
-```
-
-If you specify a `callback_url`, you get a notification from Resemble when the voice has been created.
-
-I temporarily modified the `app.py` so I could provide my huggingface space url as the `callback_url` in the `create a voice` request.
-
-```
-def receive_data_from_resemble(data):
-    print("data from resemble", data)
-    return data
-
-
-iface = gr.Interface(fn=receive_data_from_resemble,
-                     inputs="json", outputs="json")
-iface.launch()
-```
-
-Test creating a clip
-[api endpoint](https://app.resemble.ai/api/v2/projects/e2da3585/clips)
-json body;
-
-```json
-{
-  "title": "testing_vader",
-  "body": "There is no need to panic. It will all be over soon",
-  "voice_uuid": "f00d917f",
-  "is_public": true,
-  "callback_uri": "https://aar2dee2-chatty-vader.hf.space/run/predict"
-}
-```
-
-response received:
-
-```
-{
-  "success": false,
-  "message": "This voice is still building and cannot be used at this time."
-}
-```
-
-Same for voice id "10f91c43"
+3. [Create a sample of the cloned voice](https://docs.coqui.ai/reference/samples_create)
+4. Store the sample and save the `voice_id` in your env variables as `COQUI_VOICE_ID`.
````
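
For the new steps 3 and 4, here is a minimal sketch of requesting a sample from the cloned voice with `requests`. The endpoint URL, the `Bearer` auth scheme, the `COQUI_API_KEY` env-var name, and the `voice_id`/`text` payload fields are assumptions rather than anything in this commit; verify them against the `samples_create` reference linked above. Only `COQUI_VOICE_ID` comes from the README.

```python
import os

import requests

# Sketch only: the endpoint and payload fields below are assumptions, not taken
# from this repo; check the Coqui samples_create reference before relying on them.
COQUI_API_KEY = os.environ["COQUI_API_KEY"]    # assumed name for the API-key env var
COQUI_VOICE_ID = os.environ["COQUI_VOICE_ID"]  # stored in step 4 above

resp = requests.post(
    "https://app.coqui.ai/api/v2/samples",     # assumed samples endpoint
    headers={"Authorization": f"Bearer {COQUI_API_KEY}"},
    json={
        "voice_id": COQUI_VOICE_ID,
        "text": "There is no need to panic. It will all be over soon.",
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # the response should point at the rendered sample audio
```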
 
app.py CHANGED

```diff
@@ -142,7 +142,7 @@ description = "Darth Vader resurrected with all the knowledge of humanity"
 
 mic_translate = gr.Interface(
     fn=main,
-    inputs=gr.Audio(source="microphone", format="wav"),
+    inputs=gr.Audio(source="microphone"),
     outputs=gr.Audio(label="Generated Speech", type="numpy"),
     title=title,
     description=description,
@@ -150,8 +150,8 @@ mic_translate = gr.Interface(
 
 file_translate = gr.Interface(
     fn=main,
-    inputs=gr.Audio(source="upload", type="filepath", format="wav"),
-    outputs=gr.Audio(label="Generated Speech", type="filepath", format="wav"),
+    inputs=gr.Audio(source="upload", type="filepath"),
+    outputs=gr.Audio(label="Generated Speech", type="numpy"),
     title=title,
     description=description,
 )
```
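
The upload interface's output now matches the microphone interface (`type="numpy"` on the output `gr.Audio`), and the `format="wav"` arguments are gone. With `type="numpy"`, Gradio expects the wired-in function to return a `(sample_rate, numpy_array)` tuple. Below is a hypothetical stand-in for the Space's `main` that only illustrates that contract; the placeholder body and the 24 kHz rate are assumptions, not code from app.py.

```python
import numpy as np

def main(audio):
    # Microphone interface: `audio` arrives as a (sample_rate, np.ndarray) tuple
    # (the Gradio 3.x default for gr.Audio inputs).
    # Upload interface: with type="filepath", `audio` arrives as a path string.
    sample_rate = 24_000                            # assumed rate of the generated speech
    speech = np.zeros(sample_rate, dtype=np.int16)  # one second of silence as a placeholder
    # An output gr.Audio with type="numpy" expects exactly this tuple shape.
    return sample_rate, speech
```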