---
title: Video Redaction
emoji: 🐨
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 5.14.0
app_file: app.py
pinned: false
---

# Promptable Video Redaction with Moondream

This tool uses Moondream 2B, a lightweight vision-language model, to detect and redact objects in videos. Moondream can recognize a wide variety of objects, people, text, and more with high accuracy while being far smaller than comparable vision-language models.

[Try it now.](https://huggingface.co/spaces/moondream/promptable-video-redaction)

## About Moondream

Moondream is a tiny yet powerful vision-language model that can analyze images and answer questions about them. It's designed to be lightweight and efficient while maintaining high accuracy. Some key features:

- Only 2B parameters
- Fast inference with minimal resource requirements
- Supports CPU and GPU execution
- Open source and free to use
- Can detect almost anything you can describe in natural language

Links:
- [GitHub Repository](https://github.com/vikhyat/moondream)
- [Hugging Face](https://huggingface.co/vikhyatk/moondream2)
- [Build with Moondream](http://docs.moondream.ai/)
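
For a sense of how detection works under the hood, here is a minimal, standalone sketch of single-image detection with the `vikhyatk/moondream2` checkpoint via Transformers. It assumes a recent model revision that exposes a `detect()` helper returning boxes normalized to the 0-1 range; check the model card for the exact API of the revision you pin. The image path is just a placeholder.

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code pulls in Moondream's custom model code, including detect().
model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2", trust_remote_code=True
).to(device)

image = Image.open("frame_0001.png")  # placeholder path

# Describe the target in natural language.
result = model.detect(image, "person wearing a hat")

# Recent revisions return boxes normalized to the 0-1 range.
for obj in result["objects"]:
    print(obj["x_min"], obj["y_min"], obj["x_max"], obj["y_max"])
```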

## Features

- Real-time object detection in videos using Moondream
- Multiple visualization styles:
  - Censor: Black boxes over detected objects
  - Bounding Box: Traditional bounding boxes with labels
  - Hitmarker: Call of Duty style crosshair markers
- Optional grid-based detection for improved accuracy
- Flexible object type detection using natural language
- Frame-by-frame processing with IoU-based merging of overlapping detections (see the sketch after this list)
- Batch processing of multiple videos
- Web-compatible output format
- User-friendly web interface
- Command-line interface for automation
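
As noted in the feature list, overlapping boxes (for example, from adjacent grid cells or near-duplicate detections) are merged by intersection-over-union. A minimal sketch of that idea is below; the threshold and merge rule in `main.py` may differ.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0


def merge_boxes(boxes, threshold=0.5):
    """Greedily merge boxes whose IoU exceeds the threshold into their union."""
    merged = []
    for box in boxes:
        for i, kept in enumerate(merged):
            if iou(box, kept) > threshold:
                merged[i] = (
                    min(box[0], kept[0]), min(box[1], kept[1]),
                    max(box[2], kept[2]), max(box[3], kept[3]),
                )
                break
        else:
            merged.append(box)
    return merged
```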

## Requirements

- Python 3.8+
- OpenCV (cv2)
- PyTorch
- Transformers
- Pillow (PIL)
- tqdm
- ffmpeg
- numpy
- gradio (for web interface)

## Installation

1. Clone the repository, change into the recipe directory, and create a new virtual environment:
```bash
git clone https://github.com/vikhyat/moondream.git
cd moondream/recipes/promptable-video-redaction
python -m venv .venv
source .venv/bin/activate
```
2. Install the required packages:
```bash
pip install -r requirements.txt
```
3. Install ffmpeg and libvips:
   - On Ubuntu/Debian: `sudo apt-get install ffmpeg libvips`
   - On macOS: `brew install ffmpeg`
   - On Windows: Download ffmpeg from [ffmpeg.org](https://ffmpeg.org/download.html)
> Installing libvips on Windows requires some additional steps; see [here](https://docs.moondream.ai/quick-start)

## Usage

### Web Interface

1. Start the web interface:
```bash
python app.py
```

2. Open the provided URL in your browser

3. Use the interface to:
   - Upload your video
   - Specify what to censor (e.g., face, logo, text)
   - Adjust processing speed and quality
   - Configure grid size for detection
   - Process and download the censored video

### Command Line Interface

1. Create an `inputs` directory in the same folder as the script:
```bash
mkdir inputs
```

2. Place your video files in the `inputs` directory. Supported formats:
   - .mp4
   - .avi
   - .mov
   - .mkv
   - .webm

3. Run the script:
```bash
python main.py
```
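
For reference, discovering the supported files in `inputs/` amounts to a simple directory scan; a rough sketch is below (the actual `main.py` may structure this differently).

```python
from pathlib import Path

SUPPORTED = {".mp4", ".avi", ".mov", ".mkv", ".webm"}

def find_input_videos(input_dir="inputs"):
    """Collect every supported video file placed in the inputs directory."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )

for video in find_input_videos():
    print(f"Found {video}")
```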

### Optional Arguments
- `--test`: Process only first 3 seconds of each video (useful for testing detection settings)
```bash
python main.py --test
```

- `--preset`: Choose FFmpeg encoding preset (affects output quality vs. speed)
```bash
python main.py --preset ultrafast  # Fastest, lower quality
python main.py --preset veryslow   # Slowest, highest quality
```

- `--detect`: Specify what object type to detect (using natural language)
```bash
python main.py --detect person     # Detect people
python main.py --detect "red car"  # Detect red cars
python main.py --detect "person wearing a hat"  # Detect people with hats
```

- `--box-style`: Choose visualization style
```bash
python main.py --box-style censor     # Black boxes (default)
python main.py --box-style bounding-box       # Traditional bounding boxes with labels
python main.py --box-style hitmarker  # COD-style hitmarkers
```

- `--rows` and `--cols`: Enable grid-based detection by splitting frames
```bash
python main.py --rows 2 --cols 2   # Split each frame into 2x2 grid
python main.py --rows 3 --cols 3   # Split each frame into 3x3 grid
```
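
Grid-based detection splits each frame into tiles, runs detection on every tile, and maps tile-local boxes back into frame coordinates before merging. A rough sketch of the tiling step, assuming the model returns boxes normalized to 0-1:

```python
def split_into_grid(frame, rows, cols):
    """Yield (tile, x_offset, y_offset) for a rows x cols grid over a frame (H, W, C array)."""
    h, w = frame.shape[:2]
    tile_h, tile_w = h // rows, w // cols
    for r in range(rows):
        for c in range(cols):
            y0, x0 = r * tile_h, c * tile_w
            yield frame[y0:y0 + tile_h, x0:x0 + tile_w], x0, y0


def tile_box_to_frame(box, x0, y0, tile_w, tile_h):
    """Convert a normalized (0-1) tile-local box to absolute frame pixel coordinates."""
    x_min, y_min, x_max, y_max = box
    return (
        int(x0 + x_min * tile_w), int(y0 + y_min * tile_h),
        int(x0 + x_max * tile_w), int(y0 + y_max * tile_h),
    )
```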

You can combine arguments:
```bash
python main.py --detect "person wearing sunglasses" --box-style bounding-box --test --preset "fast" --rows 2 --cols 2
```

### Visualization Styles

The tool supports three different visualization styles for detected objects:

1. **Censor** (default)
   - Places solid black rectangles over detected objects
   - Best for privacy and content moderation
   - Completely obscures the detected region

2. **Bounding Box**
   - Traditional object detection style
   - Red bounding box around detected objects
   - Label showing object type above the box
   - Good for analysis and debugging

3. **Hitmarker**
   - Call of Duty inspired visualization
   - White crosshair marker at center of detected objects
   - Small label above the marker
   - Stylistic choice for gaming-inspired visualization

Choose the style that best fits your use case using the `--box-style` argument.
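
All three styles reduce to simple OpenCV drawing calls on each frame. The sketch below is illustrative (colors, sizes, and exact placement in the tool may differ) and assumes boxes are already in pixel coordinates:

```python
import cv2

def draw_detection(frame, box, label, style="censor"):
    """Draw one detection on a BGR frame using the selected visualization style."""
    x1, y1, x2, y2 = box
    if style == "censor":
        # Solid black rectangle completely covers the detected region.
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 0), thickness=-1)
    elif style == "bounding-box":
        # Red outline with the object label above the box.
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), thickness=2)
        cv2.putText(frame, label, (x1, max(y1 - 8, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
    elif style == "hitmarker":
        # White crosshair centered on the detection, with a small label.
        cx, cy, size = (x1 + x2) // 2, (y1 + y2) // 2, 12
        cv2.line(frame, (cx - size, cy), (cx + size, cy), (255, 255, 255), 2)
        cv2.line(frame, (cx, cy - size), (cx, cy + size), (255, 255, 255), 2)
        cv2.putText(frame, label, (cx - size, cy - size - 6),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1)
    return frame
```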

## Output

Processed videos will be saved in the `outputs` directory with the format:
`[style]_[object_type]_[original_filename].mp4`

For example:
- `censor_face_video.mp4`
- `bounding-box_person_video.mp4`
- `hitmarker_car_video.mp4`

The output videos will include:
- Original video content
- Selected visualization style for detected objects
- Web-compatible H.264 encoding (see the sketch below)
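
The web-compatible encode is the sort of step handled by ffmpeg once the annotated frames are written back to a video. A minimal sketch of invoking it from Python follows; the flags are standard ffmpeg options, not necessarily the exact ones the tool uses, and the file names are placeholders.

```python
import subprocess

def encode_for_web(src, dst):
    """Re-encode a video to H.264 with settings that play in most browsers."""
    subprocess.run(
        [
            "ffmpeg", "-y", "-i", src,
            "-c:v", "libx264",          # H.264 video
            "-pix_fmt", "yuv420p",      # widest browser compatibility
            "-movflags", "+faststart",  # move metadata up front for streaming
            dst,
        ],
        check=True,
    )

# Hypothetical file names, for illustration only.
encode_for_web("outputs/censor_face_video_raw.mp4", "outputs/censor_face_video.mp4")
```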

## Notes

- Processing time depends on video length, grid size, and GPU availability
- GPU is strongly recommended for faster processing
- Requires sufficient disk space for temporary files
- Detection quality varies based on video quality and Moondream's ability to recognize the specified object
- Grid-based detection significantly increases processing time, so enable it only when needed
- Web interface shows progress updates and errors
- Choose visualization style based on your use case
- Moondream can detect almost anything you can describe in natural language