Update README.md
README.md
@@ -93,44 +93,6 @@ We anticipate that Moonshine models’ transcription capabilities may be used fo
 
 There are also potential dual-use concerns that come with releasing Moonshine. While we hope the technology will be used primarily for beneficial purposes, making ASR technology more accessible could enable more actors to build capable surveillance technologies or scale up existing surveillance efforts, as the speed and accuracy allow for affordable automatic transcription and translation of large volumes of audio communication. Moreover, these models may have some capabilities to recognize specific individuals out of the box, which in turn presents safety concerns related both to dual use and disparate performance. In practice, we expect that the cost of transcription is not the limiting factor of scaling up surveillance projects.
 
-## Setup
-
-* Install `uv` for Python environment management
-
-  - Follow instructions [here](https://github.com/astral-sh/uv)
-
-* Create and activate virtual environment
-
-```shell
-uv venv env_moonshine
-source env_moonshine/bin/activate
-```
-
-* Install the `useful-moonshine` package from this github repo
-
-```shell
-uv pip install transformers torchaudio
-```
-
-* Test transcribing an audio file
-
-```python
-from transformers import AutoModelForSpeechSeq2Seq, AutoConfig, PreTrainedTokenizerFast
-
-import torchaudio
-import sys
-
-audio, sr = torchaudio.load(sys.argv[1])
-if sr != 16000:
-    audio = torchaudio.functional.resample(audio, sr, 16000)
-
-model = AutoModelForSpeechSeq2Seq.from_pretrained('usefulsensors/moonshine-tiny', trust_remote_code=True)
-tokenizer = PreTrainedTokenizerFast.from_pretrained('usefulsensors/moonshine-tiny')
-
-tokens = model(audio)
-print(tokenizer.decode(tokens[0], skip_special_tokens=True))
-```
-
 ## Citation
 If you benefit from our work, please cite us:
 ```