Merge branch 'ashawkey:main' into main
Files changed:
- assets/update_logs.md +4 -0
- docker/Dockerfile +53 -0
- docker/README.md +80 -0
- gradio_app.py +227 -0
- main.py +19 -11
- nerf/network.py +4 -5
- nerf/network_grid.py +4 -5
- nerf/network_tcnn.py +6 -16
- nerf/provider.py +12 -5
- nerf/renderer.py +4 -4
- nerf/sd.py +6 -6
- nerf/utils.py +42 -27
- raymarching/src/raymarching.cu +1 -1
- readme.md +25 -8
assets/update_logs.md
CHANGED
@@ -1,3 +1,7 @@
+### 2022.10.9
+* The shading (partially) starts to work; at least it no longer makes the scene empty. For some prompts, it shows better results (a less severe Janus problem). The textureless rendering mode is still disabled.
+* Enable shading by default (--albedo_iters 1000).
+
 ### 2022.10.5
 * Basic reproduction finished.
 * Non --cuda_ray, --tcnn are not working, need to fix.
docker/Dockerfile
ADDED
@@ -0,0 +1,53 @@
+FROM nvidia/cuda:11.6.2-cudnn8-devel-ubuntu20.04
+
+# Remove any third-party apt sources to avoid issues with expiring keys.
+RUN rm -f /etc/apt/sources.list.d/*.list
+
+RUN apt-get update
+
+RUN DEBIAN_FRONTEND=noninteractive TZ=Europe/Madrid apt-get install -y tzdata
+
+# Install some basic utilities
+RUN apt-get install -y \
+    curl \
+    ca-certificates \
+    sudo \
+    git \
+    bzip2 \
+    libx11-6 \
+    python3 \
+    python3-pip \
+    libglfw3-dev \
+    libgles2-mesa-dev \
+    libglib2.0-0 \
+ && rm -rf /var/lib/apt/lists/*
+
+
+# Create a working directory
+RUN mkdir /app
+WORKDIR /app
+
+RUN cd /app
+RUN git clone https://github.com/ashawkey/stable-dreamfusion.git
+
+
+RUN pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
+
+WORKDIR /app/stable-dreamfusion
+
+RUN pip3 install -r requirements.txt
+RUN pip3 install git+https://github.com/NVlabs/nvdiffrast/
+
+# Needs the nvidia runtime; if you get a "No CUDA runtime is found" error, see https://stackoverflow.com/questions/59691207/docker-build-with-nvidia-runtime (first answer).
+RUN pip3 install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
+
+RUN pip3 install git+https://github.com/openai/CLIP.git
+RUN bash scripts/install_ext.sh
+
+
+# Set the default command to python3
+#CMD ["python3"]
docker/README.md
ADDED
@@ -0,0 +1,80 @@
+# Docker installation
+
+## Build image
+To build the docker image on your own machine (this may take 15-30 minutes):
+```
+docker build -t stable-dreamfusion:latest .
+```
+
+If you get the error **No CUDA runtime is found** while building the wheels for tiny-cuda-nn, you need to set up the nvidia runtime for docker:
+```
+sudo apt-get install nvidia-container-runtime
+```
+Then edit `/etc/docker/daemon.json` and add the default-runtime:
+```
+{
+    "runtimes": {
+        "nvidia": {
+            "path": "nvidia-container-runtime",
+            "runtimeArgs": []
+        }
+    },
+    "default-runtime": "nvidia"
+}
+```
+And restart docker:
+```
+sudo systemctl restart docker
+```
+Now you can build tiny-cuda-nn inside docker.
+
+## Download image
+To download the image (~6GB) instead:
+```
+docker pull supercabb/stable-dreamfusion:3080_0.0.1
+docker tag supercabb/stable-dreamfusion:3080_0.0.1 stable-dreamfusion
+```
+
+## Use image
+
+You can launch an interactive shell inside the container:
+```
+docker run --gpus all -it --rm -v $(cd ~ && pwd):/mnt stable-dreamfusion /bin/bash
+```
+From this shell, all the code in the repo should work.
+
+To run any single command `<command...>` inside the docker container:
+```
+docker run --gpus all -it --rm -v $(cd ~ && pwd):/mnt stable-dreamfusion /bin/bash -c "<command...>"
+```
+To train:
+```
+export TOKEN="#HUGGING FACE ACCESS TOKEN#"
+docker run --gpus all -it --rm -v $(cd ~ && pwd):/mnt stable-dreamfusion /bin/bash -c "echo ${TOKEN} > TOKEN \
+&& python3 main.py --text \"a hamburger\" --workspace trial -O"
+```
+Run a test without the GUI:
+```
+export PATH_TO_WORKSPACE="#PATH_TO_WORKSPACE#"
+docker run --gpus all -it --rm -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix:ro -v $(cd ~ && pwd):/mnt \
+-v $(cd ${PATH_TO_WORKSPACE} && pwd):/app/stable-dreamfusion/trial stable-dreamfusion /bin/bash -c "python3 \
+main.py --workspace trial -O --test"
+```
+Run a test with the GUI:
+```
+export PATH_TO_WORKSPACE="#PATH_TO_WORKSPACE#"
+xhost +
+docker run --gpus all -it --rm -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix:ro -v $(cd ~ && pwd):/mnt \
+-v $(cd ${PATH_TO_WORKSPACE} && pwd):/app/stable-dreamfusion/trial stable-dreamfusion /bin/bash -c "python3 \
+main.py --workspace trial -O --test --gui"
+xhost -
+```
+
gradio_app.py
ADDED
@@ -0,0 +1,227 @@
+import torch
+import argparse
+
+from nerf.provider import NeRFDataset
+from nerf.utils import *
+
+import gradio as gr
+import gc
+
+print(f'[INFO] loading options..')
+
+# fake config object, this should not be used in CMD, only allow change from gradio UI.
+parser = argparse.ArgumentParser()
+parser.add_argument('--text', default=None, help="text prompt")
+# parser.add_argument('-O', action='store_true', help="equals --fp16 --cuda_ray --dir_text")
+# parser.add_argument('-O2', action='store_true', help="equals --fp16 --dir_text")
+parser.add_argument('--test', action='store_true', help="test mode")
+parser.add_argument('--save_mesh', action='store_true', help="export an obj mesh with texture")
+parser.add_argument('--eval_interval', type=int, default=10, help="evaluate on the valid set every interval epochs")
+parser.add_argument('--workspace', type=str, default='trial_gradio')
+parser.add_argument('--guidance', type=str, default='stable-diffusion', help='choose from [stable-diffusion, clip]')
+parser.add_argument('--seed', type=int, default=0)
+
+### training options
+parser.add_argument('--iters', type=int, default=10000, help="training iters")
+parser.add_argument('--lr', type=float, default=1e-3, help="initial learning rate")
+parser.add_argument('--ckpt', type=str, default='latest')
+parser.add_argument('--cuda_ray', action='store_true', help="use CUDA raymarching instead of pytorch")
+parser.add_argument('--max_steps', type=int, default=1024, help="max num steps sampled per ray (only valid when using --cuda_ray)")
+parser.add_argument('--num_steps', type=int, default=64, help="num steps sampled per ray (only valid when not using --cuda_ray)")
+parser.add_argument('--upsample_steps', type=int, default=64, help="num steps up-sampled per ray (only valid when not using --cuda_ray)")
+parser.add_argument('--update_extra_interval', type=int, default=16, help="iter interval to update extra status (only valid when using --cuda_ray)")
+parser.add_argument('--max_ray_batch', type=int, default=4096, help="batch size of rays at inference to avoid OOM (only valid when not using --cuda_ray)")
+parser.add_argument('--albedo_iters', type=int, default=1000, help="training iters that only use albedo shading")
+# model options
+parser.add_argument('--bg_radius', type=float, default=1.4, help="if positive, use a background model at sphere(bg_radius)")
+parser.add_argument('--density_thresh', type=float, default=10, help="threshold for density grid to be occupied")
+# network backbone
+parser.add_argument('--fp16', action='store_true', help="use amp mixed precision training")
+parser.add_argument('--backbone', type=str, default='grid', help="nerf backbone, choose from [grid, tcnn, vanilla]")
+# rendering resolution in training, decrease this if CUDA OOM.
+parser.add_argument('--w', type=int, default=64, help="render width for NeRF in training")
+parser.add_argument('--h', type=int, default=64, help="render height for NeRF in training")
+parser.add_argument('--jitter_pose', action='store_true', help="add jitters to the randomly sampled camera poses")
+
+### dataset options
+parser.add_argument('--bound', type=float, default=1, help="assume the scene is bounded in box(-bound, bound)")
+parser.add_argument('--dt_gamma', type=float, default=0, help="dt_gamma (>=0) for adaptive ray marching. set to 0 to disable, >0 to accelerate rendering (but usually with worse quality)")
+parser.add_argument('--min_near', type=float, default=0.1, help="minimum near distance for camera")
+parser.add_argument('--radius_range', type=float, nargs='*', default=[1.0, 1.5], help="training camera radius range")
+parser.add_argument('--fovy_range', type=float, nargs='*', default=[40, 70], help="training camera fovy range")
+parser.add_argument('--dir_text', action='store_true', help="direction-encode the text prompt, by appending front/side/back/overhead view")
+parser.add_argument('--angle_overhead', type=float, default=30, help="[0, angle_overhead] is the overhead region")
+parser.add_argument('--angle_front', type=float, default=60, help="[0, angle_front] is the front region, [180, 180+angle_front] the back region, otherwise the side region.")
+
+parser.add_argument('--lambda_entropy', type=float, default=1e-4, help="loss scale for alpha entropy")
+parser.add_argument('--lambda_opacity', type=float, default=0, help="loss scale for alpha value")
+parser.add_argument('--lambda_orient', type=float, default=1e-2, help="loss scale for orientation")
+
+### GUI options
+parser.add_argument('--gui', action='store_true', help="start a GUI")
+parser.add_argument('--W', type=int, default=800, help="GUI width")
+parser.add_argument('--H', type=int, default=800, help="GUI height")
+parser.add_argument('--radius', type=float, default=3, help="default GUI camera radius from center")
+parser.add_argument('--fovy', type=float, default=60, help="default GUI camera fovy")
+parser.add_argument('--light_theta', type=float, default=60, help="default GUI light direction in [0, 180], corresponding to elevation [90, -90]")
+parser.add_argument('--light_phi', type=float, default=0, help="default GUI light direction in [0, 360), azimuth")
+parser.add_argument('--max_spp', type=int, default=1, help="GUI rendering max sample per pixel")
+
+opt = parser.parse_args()
+
+# default to use -O !!!
+opt.fp16 = True
+opt.dir_text = True
+opt.cuda_ray = True
+# opt.lambda_entropy = 1e-4
+# opt.lambda_opacity = 0
+
+if opt.backbone == 'vanilla':
+    from nerf.network import NeRFNetwork
+elif opt.backbone == 'tcnn':
+    from nerf.network_tcnn import NeRFNetwork
+elif opt.backbone == 'grid':
+    from nerf.network_grid import NeRFNetwork
+else:
+    raise NotImplementedError(f'--backbone {opt.backbone} is not implemented!')
+
+print(opt)
+
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+
+print(f'[INFO] loading models..')
+
+if opt.guidance == 'stable-diffusion':
+    from nerf.sd import StableDiffusion
+    guidance = StableDiffusion(device)
+elif opt.guidance == 'clip':
+    from nerf.clip import CLIP
+    guidance = CLIP(device)
+else:
+    raise NotImplementedError(f'--guidance {opt.guidance} is not implemented.')
+
+train_loader = NeRFDataset(opt, device=device, type='train', H=opt.h, W=opt.w, size=100).dataloader()
+valid_loader = NeRFDataset(opt, device=device, type='val', H=opt.H, W=opt.W, size=5).dataloader()
+test_loader = NeRFDataset(opt, device=device, type='test', H=opt.H, W=opt.W, size=100).dataloader()
+
+print(f'[INFO] everything loaded!')
+
+trainer = None
+model = None
+
+# define UI
+
+with gr.Blocks(css=".gradio-container {max-width: 512px; margin: auto;}") as demo:
+
+    # title
+    gr.Markdown('[Stable-DreamFusion](https://github.com/ashawkey/stable-dreamfusion) Text-to-3D Example')
+
+    # inputs
+    prompt = gr.Textbox(label="Prompt", max_lines=1, value="a DSLR photo of a koi fish")
+    iters = gr.Slider(label="Iters", minimum=1000, maximum=20000, value=5000, step=100)
+    seed = gr.Slider(label="Seed", minimum=0, maximum=2147483647, step=1, randomize=True)
+    button = gr.Button('Generate')
+
+    # outputs
+    image = gr.Image(label="image", visible=True)
+    video = gr.Video(label="video", visible=False)
+    logs = gr.Textbox(label="logging")
+
+    # gradio main func
+    def submit(text, iters, seed):
+
+        global trainer, model
+
+        # seed
+        opt.seed = seed
+        opt.text = text
+        opt.iters = iters
+
+        seed_everything(seed)
+
+        # clean up
+        if trainer is not None:
+            del model
+            del trainer
+            gc.collect()
+            torch.cuda.empty_cache()
+            print('[INFO] clean up!')
+
+        # simply reload everything...
+        model = NeRFNetwork(opt)
+        optimizer = lambda model: torch.optim.Adam(model.get_params(opt.lr), betas=(0.9, 0.99), eps=1e-15)
+        scheduler = lambda optimizer: optim.lr_scheduler.LambdaLR(optimizer, lambda iter: 0.1 ** min(iter / opt.iters, 1))
+
+        trainer = Trainer('df', opt, model, guidance, device=device, workspace=opt.workspace, optimizer=optimizer, ema_decay=0.95, fp16=opt.fp16, lr_scheduler=scheduler, use_checkpoint=opt.ckpt, eval_interval=opt.eval_interval, scheduler_update_every_step=True)
+
+        # train (every ep only contain 8 steps, so we can get some vis every ~10s)
+        STEPS = 8
+        max_epochs = np.ceil(opt.iters / STEPS).astype(np.int32)
+
+        # we have to get the explicit training loop out here to yield progressive results...
+        loader = iter(valid_loader)
+
+        start_t = time.time()
+
+        for epoch in range(max_epochs):
+
+            trainer.train_gui(train_loader, step=STEPS)
+
+            # manual test and get intermediate results
+            try:
+                data = next(loader)
+            except StopIteration:
+                loader = iter(valid_loader)
+                data = next(loader)
+
+            trainer.model.eval()
+
+            if trainer.ema is not None:
+                trainer.ema.store()
+                trainer.ema.copy_to()
+
+            with torch.no_grad():
+                with torch.cuda.amp.autocast(enabled=trainer.fp16):
+                    preds, preds_depth = trainer.test_step(data, perturb=False)
+
+            if trainer.ema is not None:
+                trainer.ema.restore()
+
+            pred = preds[0].detach().cpu().numpy()
+            # pred_depth = preds_depth[0].detach().cpu().numpy()
+
+            pred = (pred * 255).astype(np.uint8)
+
+            yield {
+                image: gr.update(value=pred, visible=True),
+                video: gr.update(visible=False),
+                logs: f"training iters: {epoch * STEPS} / {iters}, lr: {trainer.optimizer.param_groups[0]['lr']:.6f}",
+            }
+
+
+        # test
+        trainer.test(test_loader)
+
+        results = glob.glob(os.path.join(opt.workspace, 'results', '*rgb*.mp4'))
+        assert results is not None, "cannot retrieve results!"
+        results.sort(key=lambda x: os.path.getmtime(x)) # sort by mtime
+
+        end_t = time.time()
+
+        yield {
+            image: gr.update(visible=False),
+            video: gr.update(value=results[-1], visible=True),
+            logs: f"Generation Finished in {(end_t - start_t)/ 60:.4f} minutes!",
+        }
+
+
+    button.click(
+        submit,
+        [prompt, iters, seed],
+        [image, video, logs]
+    )
+
+# concurrency_count: only allow ONE running progress, else GPU will OOM.
+demo.queue(concurrency_count=1)
+
+demo.launch()
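The core trick in `gradio_app.py` above is that `submit` is a generator: each `yield` pushes an intermediate render and a log line to the UI while training keeps going, and `demo.queue()` is what enables streaming callbacks. Below is a minimal, self-contained sketch of the same pattern, independent of the NeRF code; it assumes Gradio 3.x, and the component names and the fake work loop are illustrative only, not from the repo:

```python
import time
import gradio as gr

with gr.Blocks() as demo:
    prompt = gr.Textbox(label="Prompt")
    button = gr.Button("Run")
    logs = gr.Textbox(label="logging")

    # Generator callback: every `yield` updates the bound output components,
    # which is how gradio_app.py streams intermediate renders during training.
    def run(text):
        for step in range(3):
            time.sleep(1)  # stands in for trainer.train_gui(...)
            yield {logs: f"step {step}: working on '{text}'"}
        yield {logs: "done!"}

    button.click(run, [prompt], [logs])

# queue() is required for generator callbacks; concurrency_count=1 mirrors the
# single-job restriction above that avoids GPU OOM.
demo.queue(concurrency_count=1)
demo.launch()
```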
main.py
CHANGED
@@ -23,16 +23,16 @@ if __name__ == '__main__':
     parser.add_argument('--seed', type=int, default=0)
 
     ### training options
-    parser.add_argument('--iters', type=int, default=
+    parser.add_argument('--iters', type=int, default=10000, help="training iters")
     parser.add_argument('--lr', type=float, default=1e-3, help="initial learning rate")
     parser.add_argument('--ckpt', type=str, default='latest')
     parser.add_argument('--cuda_ray', action='store_true', help="use CUDA raymarching instead of pytorch")
     parser.add_argument('--max_steps', type=int, default=1024, help="max num steps sampled per ray (only valid when using --cuda_ray)")
-    parser.add_argument('--num_steps', type=int, default=
-    parser.add_argument('--upsample_steps', type=int, default=
+    parser.add_argument('--num_steps', type=int, default=64, help="num steps sampled per ray (only valid when not using --cuda_ray)")
+    parser.add_argument('--upsample_steps', type=int, default=64, help="num steps up-sampled per ray (only valid when not using --cuda_ray)")
     parser.add_argument('--update_extra_interval', type=int, default=16, help="iter interval to update extra status (only valid when using --cuda_ray)")
     parser.add_argument('--max_ray_batch', type=int, default=4096, help="batch size of rays at inference to avoid OOM (only valid when not using --cuda_ray)")
-    parser.add_argument('--albedo_iters', type=int, default=
+    parser.add_argument('--albedo_iters', type=int, default=1000, help="training iters that only use albedo shading")
     # model options
     parser.add_argument('--bg_radius', type=float, default=1.4, help="if positive, use a background model at sphere(bg_radius)")
     parser.add_argument('--density_thresh', type=float, default=10, help="threshold for density grid to be occupied")
@@ -40,8 +40,9 @@ if __name__ == '__main__':
     parser.add_argument('--fp16', action='store_true', help="use amp mixed precision training")
     parser.add_argument('--backbone', type=str, default='grid', help="nerf backbone, choose from [grid, tcnn, vanilla]")
     # rendering resolution in training, decrease this if CUDA OOM.
-    parser.add_argument('--w', type=int, default=
-    parser.add_argument('--h', type=int, default=
+    parser.add_argument('--w', type=int, default=64, help="render width for NeRF in training")
+    parser.add_argument('--h', type=int, default=64, help="render height for NeRF in training")
+    parser.add_argument('--jitter_pose', action='store_true', help="add jitters to the randomly sampled camera poses")
 
     ### dataset options
     parser.add_argument('--bound', type=float, default=1, help="assume the scene is bounded in box(-bound, bound)")
@@ -51,9 +52,10 @@ if __name__ == '__main__':
     parser.add_argument('--fovy_range', type=float, nargs='*', default=[40, 70], help="training camera fovy range")
     parser.add_argument('--dir_text', action='store_true', help="direction-encode the text prompt, by appending front/side/back/overhead view")
     parser.add_argument('--angle_overhead', type=float, default=30, help="[0, angle_overhead] is the overhead region")
-    parser.add_argument('--angle_front', type=float, default=
+    parser.add_argument('--angle_front', type=float, default=60, help="[0, angle_front] is the front region, [180, 180+angle_front] the back region, otherwise the side region.")
 
     parser.add_argument('--lambda_entropy', type=float, default=1e-4, help="loss scale for alpha entropy")
+    parser.add_argument('--lambda_opacity', type=float, default=0, help="loss scale for alpha value")
     parser.add_argument('--lambda_orient', type=float, default=1e-2, help="loss scale for orientation")
 
     ### GUI options
@@ -71,10 +73,16 @@ if __name__ == '__main__':
     if opt.O:
         opt.fp16 = True
         opt.dir_text = True
+        # use occupancy grid to prune ray sampling, faster rendering.
        opt.cuda_ray = True
+        # opt.lambda_entropy = 1e-4
+        # opt.lambda_opacity = 0
+
     elif opt.O2:
         opt.fp16 = True
         opt.dir_text = True
+        opt.lambda_entropy = 1e-4 # necessary to keep non-empty
+        opt.lambda_opacity = 3e-3 # no occupancy grid, so use a stronger opacity loss.
 
     if opt.backbone == 'vanilla':
         from nerf.network import NeRFNetwork
@@ -98,7 +106,7 @@ if __name__ == '__main__':
     if opt.test:
         guidance = None # no need to load guidance model at test
 
-        trainer = Trainer('
+        trainer = Trainer('df', opt, model, guidance, device=device, workspace=opt.workspace, fp16=opt.fp16, use_checkpoint=opt.ckpt)
 
         if opt.gui:
             gui = NeRFGUI(opt, trainer)
@@ -127,10 +135,10 @@ if __name__ == '__main__':
 
         train_loader = NeRFDataset(opt, device=device, type='train', H=opt.h, W=opt.w, size=100).dataloader()
 
-
-        scheduler = lambda optimizer: optim.lr_scheduler.
+        scheduler = lambda optimizer: optim.lr_scheduler.LambdaLR(optimizer, lambda iter: 0.1 ** min(iter / opt.iters, 1))
+        # scheduler = lambda optimizer: optim.lr_scheduler.OneCycleLR(optimizer, max_lr=opt.lr, total_steps=opt.iters, pct_start=0.1)
 
-        trainer = Trainer('
+        trainer = Trainer('df', opt, model, guidance, device=device, workspace=opt.workspace, optimizer=optimizer, ema_decay=None, fp16=opt.fp16, lr_scheduler=scheduler, use_checkpoint=opt.ckpt, eval_interval=opt.eval_interval, scheduler_update_every_step=True)
 
         if opt.gui:
            trainer.train_loader = train_loader # attach dataloader to trainer
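For reference, the `LambdaLR` schedule added above (`0.1 ** min(iter / opt.iters, 1)`) decays the learning rate exponentially from `lr` down to `0.1 * lr` over the whole run. A small sketch with assumed values (`lr=1e-3`, `iters=10000`, matching the new defaults):

```python
# Sketch of the exponential decay implemented by the LambdaLR lambda above.
lr, iters = 1e-3, 10000
decay = lambda it: 0.1 ** min(it / iters, 1)

for it in (0, 2500, 5000, 10000):
    print(f"iter {it:>5d}: lr = {lr * decay(it):.2e}")
# iter     0: lr = 1.00e-03
# iter  2500: lr = 5.62e-04
# iter  5000: lr = 3.16e-04
# iter 10000: lr = 1.00e-04
```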
nerf/network.py
CHANGED
@@ -52,7 +52,7 @@ class NeRFNetwork(NeRFRenderer):
         if self.bg_radius > 0:
             self.num_layers_bg = num_layers_bg
             self.hidden_dim_bg = hidden_dim_bg
-            self.encoder_bg, self.in_dim_bg = get_encoder('frequency', input_dim=
+            self.encoder_bg, self.in_dim_bg = get_encoder('frequency', input_dim=3)
             self.bg_net = MLP(self.in_dim_bg, 3, hidden_dim_bg, num_layers_bg, bias=True)
 
         else:
@@ -80,7 +80,7 @@ class NeRFNetwork(NeRFRenderer):
         return sigma, albedo
 
     # ref: https://github.com/zhaofuq/Instant-NSR/blob/main/nerf/network_sdf.py#L192
-    def finite_difference_normal(self, x, epsilon=
+    def finite_difference_normal(self, x, epsilon=1e-2):
         # x: [N, 3]
         dx_pos, _ = self.common_forward((x + torch.tensor([[epsilon, 0.00, 0.00]], device=x.device)).clamp(-self.bound, self.bound))
         dx_neg, _ = self.common_forward((x + torch.tensor([[-epsilon, 0.00, 0.00]], device=x.device)).clamp(-self.bound, self.bound))
@@ -148,10 +148,9 @@ class NeRFNetwork(NeRFRenderer):
         }
 
 
-    def background(self,
-        # x: [N, 2], in [-1, 1]
+    def background(self, d):
 
-        h = self.encoder_bg(
+        h = self.encoder_bg(d) # [N, C]
 
         h = self.bg_net(h)
 
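The change above switches the background MLP input from a 2D sphere parameterization to the raw 3D ray direction, passed through a frequency (positional) encoder via `get_encoder('frequency', input_dim=3)`. A minimal sketch of such an encoder is shown below, assuming the usual NeRF-style sin/cos bands; the repo's actual `get_encoder` may differ in the number of bands and in output ordering:

```python
import torch
import torch.nn.functional as F

def frequency_encode(d: torch.Tensor, num_freqs: int = 6) -> torch.Tensor:
    """NeRF-style frequency encoding: [N, 3] directions -> [N, 3 * 2 * num_freqs] features."""
    feats = []
    for i in range(num_freqs):
        freq = 2.0 ** i
        feats.append(torch.sin(freq * d))  # sin band at frequency 2^i
        feats.append(torch.cos(freq * d))  # cos band at frequency 2^i
    return torch.cat(feats, dim=-1)

d = F.normalize(torch.randn(4, 3), dim=-1)  # unit ray directions
print(frequency_encode(d).shape)            # torch.Size([4, 36])
```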
nerf/network_grid.py
CHANGED
@@ -57,7 +57,7 @@ class NeRFNetwork(NeRFRenderer):
 
             # use a very simple network to avoid it learning the prompt...
             # self.encoder_bg, self.in_dim_bg = get_encoder('tiledgrid', input_dim=2, num_levels=4, desired_resolution=2048)
-            self.encoder_bg, self.in_dim_bg = get_encoder('frequency', input_dim=
+            self.encoder_bg, self.in_dim_bg = get_encoder('frequency', input_dim=3)
 
             self.bg_net = MLP(self.in_dim_bg, 3, hidden_dim_bg, num_layers_bg, bias=True)
 
@@ -87,7 +87,7 @@ class NeRFNetwork(NeRFRenderer):
         return sigma, albedo
 
     # ref: https://github.com/zhaofuq/Instant-NSR/blob/main/nerf/network_sdf.py#L192
-    def finite_difference_normal(self, x, epsilon=
+    def finite_difference_normal(self, x, epsilon=1e-2):
         # x: [N, 3]
         dx_pos, _ = self.common_forward((x + torch.tensor([[epsilon, 0.00, 0.00]], device=x.device)).clamp(-self.bound, self.bound))
         dx_neg, _ = self.common_forward((x + torch.tensor([[-epsilon, 0.00, 0.00]], device=x.device)).clamp(-self.bound, self.bound))
@@ -155,10 +155,9 @@ class NeRFNetwork(NeRFRenderer):
         }
 
 
-    def background(self,
-        # x: [N, 2], in [-1, 1]
+    def background(self, d):
 
-        h = self.encoder_bg(
+        h = self.encoder_bg(d) # [N, C]
 
         h = self.bg_net(h)
 
nerf/network_tcnn.py
CHANGED
@@ -4,6 +4,7 @@ import torch.nn.functional as F
 
 from activation import trunc_exp
 from .renderer import NeRFRenderer
+from encoding import get_encoder
 
 import numpy as np
 import tinycudann as tcnn
@@ -65,19 +66,9 @@ class NeRFNetwork(NeRFRenderer):
             self.num_layers_bg = num_layers_bg
             self.hidden_dim_bg = hidden_dim_bg
 
-            self.encoder_bg =
-
-
-                    "otype": "HashGrid",
-                    "n_levels": 4,
-                    "n_features_per_level": 2,
-                    "log2_hashmap_size": 16,
-                    "base_resolution": 16,
-                    "per_level_scale": 1.5,
-                },
-            )
-
-            self.bg_net = MLP(8, 3, hidden_dim_bg, num_layers_bg, bias=True)
+            self.encoder_bg, self.in_dim_bg = get_encoder('frequency', input_dim=3)
+
+            self.bg_net = MLP(self.in_dim_bg, 3, hidden_dim_bg, num_layers_bg, bias=True)
 
         else:
             self.bg_net = None
@@ -156,11 +147,10 @@ class NeRFNetwork(NeRFRenderer):
         }
 
 
-    def background(self,
+    def background(self, d):
         # x: [N, 2], in [-1, 1]
 
-        h = (
-        h = self.encoder_bg(h) # [N, C]
+        h = self.encoder_bg(d) # [N, C]
 
         h = self.bg_net(h)
 
nerf/provider.py
CHANGED
@@ -55,7 +55,7 @@ def get_view_direction(thetas, phis, overhead, front):
     return res
 
 
-def rand_poses(size, device, radius_range=[1, 1.5], theta_range=[0,
+def rand_poses(size, device, radius_range=[1, 1.5], theta_range=[0, 100], phi_range=[0, 360], return_dirs=False, angle_overhead=30, angle_front=60, jitter=False):
     ''' generate random poses from an orbit camera
     Args:
         size: batch size of generated poses.
@@ -82,16 +82,23 @@ def rand_poses(size, device, radius_range=[1, 1.5], theta_range=[0, 150], phi_ra
         radius * torch.sin(thetas) * torch.cos(phis),
     ], dim=-1) # [B, 3]
 
+    targets = 0
+
     # jitters
-
-
+    if jitter:
+        centers = centers + (torch.rand_like(centers) * 0.2 - 0.1)
+        targets = targets + torch.randn_like(centers) * 0.2
 
     # lookat
     forward_vector = safe_normalize(targets - centers)
     up_vector = torch.FloatTensor([0, -1, 0]).to(device).unsqueeze(0).repeat(size, 1)
     right_vector = safe_normalize(torch.cross(forward_vector, up_vector, dim=-1))
+
+    if jitter:
+        up_noise = torch.randn_like(up_vector) * 0.02
+    else:
+        up_noise = 0
 
-    up_noise = torch.randn_like(up_vector) * 0.02
     up_vector = safe_normalize(torch.cross(right_vector, forward_vector, dim=-1) + up_noise)
 
     poses = torch.eye(4, dtype=torch.float, device=device).unsqueeze(0).repeat(size, 1, 1)
@@ -170,7 +177,7 @@ class NeRFDataset:
 
         if self.training:
             # random pose on the fly
-            poses, dirs = rand_poses(B, self.device, radius_range=self.radius_range, return_dirs=self.opt.dir_text, angle_overhead=self.opt.angle_overhead, angle_front=self.opt.angle_front)
+            poses, dirs = rand_poses(B, self.device, radius_range=self.radius_range, return_dirs=self.opt.dir_text, angle_overhead=self.opt.angle_overhead, angle_front=self.opt.angle_front, jitter=self.opt.jitter_pose)
 
             # random focal
             fov = random.random() * (self.fovy_range[1] - self.fovy_range[0]) + self.fovy_range[0]
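The `jitter` flag added above perturbs the camera center, the look-at target, and the up vector before the pose is assembled. Below is a NumPy sketch of the same look-at construction under the conventions visible in the hunk (forward = normalize(target - center), world up = [0, -1, 0]); how the rotation is packed into the final 4x4 pose is assumed here, since that line lies outside the hunk:

```python
import numpy as np

def safe_normalize(v, eps=1e-8):
    return v / max(np.linalg.norm(v), eps)

def lookat_pose(center, target=np.zeros(3), jitter=False, rng=np.random.default_rng(0)):
    if jitter:
        center = center + (rng.random(3) * 0.2 - 0.1)  # +-0.1 uniform jitter on the camera center
        target = target + rng.normal(size=3) * 0.2     # gaussian jitter on the look-at target
    forward = safe_normalize(target - center)
    up = np.array([0.0, -1.0, 0.0])                    # same world-up convention as rand_poses
    right = safe_normalize(np.cross(forward, up))
    up_noise = rng.normal(size=3) * 0.02 if jitter else 0.0
    up = safe_normalize(np.cross(right, forward) + up_noise)
    pose = np.eye(4)
    pose[:3, :3] = np.stack([right, up, forward], axis=-1)  # assumed column layout
    pose[:3, 3] = center
    return pose

print(lookat_pose(np.array([0.0, 0.0, 1.2]), jitter=True).round(3))
```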
nerf/renderer.py
CHANGED
@@ -420,8 +420,8 @@ class NeRFRenderer(nn.Module):
         # mix background color
         if self.bg_radius > 0:
             # use the bg model to calculate bg_color
-            sph = raymarching.sph_from_ray(rays_o, rays_d, self.bg_radius) # [N, 2] in [-1, 1]
-            bg_color = self.background(
+            # sph = raymarching.sph_from_ray(rays_o, rays_d, self.bg_radius) # [N, 2] in [-1, 1]
+            bg_color = self.background(rays_d.reshape(-1, 3)) # [N, 3]
         elif bg_color is None:
             bg_color = 1
 
@@ -526,8 +526,8 @@ class NeRFRenderer(nn.Module):
         if self.bg_radius > 0:
 
             # use the bg model to calculate bg_color
-            sph = raymarching.sph_from_ray(rays_o, rays_d, self.bg_radius) # [N, 2] in [-1, 1]
-            bg_color = self.background(
+            # sph = raymarching.sph_from_ray(rays_o, rays_d, self.bg_radius) # [N, 2] in [-1, 1]
+            bg_color = self.background(rays_d) # [N, 3]
 
         elif bg_color is None:
             bg_color = 1
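With this change the background color is predicted directly from each ray direction rather than from a sphere-intersection point, and it is mixed into the rendered pixel with the usual volume-rendering residual weight (the mixing itself happens outside the hunk; the formula below is the standard NeRF compositing it refers to):

```latex
C(\mathbf{r}) = \sum_i w_i\,\mathbf{c}_i + \Bigl(1 - \sum_i w_i\Bigr)\,\mathbf{c}_{\mathrm{bg}}(\mathbf{d}),
\qquad w_i = T_i\,\alpha_i .
```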
nerf/sd.py
CHANGED
@@ -17,10 +17,10 @@ class StableDiffusion(nn.Module):
         try:
             with open('./TOKEN', 'r') as f:
                 self.token = f.read().replace('\n', '') # remove the last \n!
-                print(f'[INFO]
+                print(f'[INFO] loaded hugging face access token from ./TOKEN!')
         except FileNotFoundError as e:
-
-            print(f'[INFO]
+            self.token = True
+            print(f'[INFO] try to load hugging face access token from the default place, make sure you have run `huggingface-cli login`.')
 
         self.device = device
         self.num_train_timesteps = 1000
@@ -94,9 +94,9 @@ class StableDiffusion(nn.Module):
         noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
         noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
 
-        # w(t),
-
-        w = self.alphas[t] ** 0.5 * (1 - self.alphas[t])
+        # w(t), sigma_t^2
+        w = (1 - self.alphas[t])
+        # w = self.alphas[t] ** 0.5 * (1 - self.alphas[t])
         grad = w * (noise_pred - noise)
 
         # clip grad for stable training?
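For context on the weighting change: writing ᾱ_t for the cumulative noise schedule stored in `self.alphas` (as in the readme snippet later in this diff), the Score Distillation Sampling gradient has the form below, and this commit switches w(t) from √ᾱ_t (1 − ᾱ_t) to 1 − ᾱ_t = σ_t², keeping the old choice as a comment:

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\,y,\,t) - \epsilon\bigr)\,\frac{\partial x}{\partial \theta} \,\right],
\qquad w(t) = 1 - \bar{\alpha}_t = \sigma_t^2 .
```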
nerf/utils.py
CHANGED
@@ -195,9 +195,6 @@ class Trainer(object):
         self.scheduler_update_every_step = scheduler_update_every_step
         self.device = device if device is not None else torch.device(f'cuda:{local_rank}' if torch.cuda.is_available() else 'cpu')
         self.console = Console()
-
-        # text prompt
-        ref_text = self.opt.text
 
         model.to(self.device)
         if self.world_size > 1:
@@ -208,20 +205,13 @@ class Trainer(object):
         # guide model
         self.guidance = guidance
 
+        # text prompt
         if self.guidance is not None:
-
-
+
             for p in self.guidance.parameters():
                 p.requires_grad = False
 
-
-                self.text_z = self.guidance.get_text_embeds([ref_text])
-            else:
-                self.text_z = []
-                for d in ['front', 'side', 'back', 'side', 'overhead', 'bottom']:
-                    text = f"{ref_text}, {d} view"
-                    text_z = self.guidance.get_text_embeds([text])
-                    self.text_z.append(text_z)
+            self.prepare_text_embeddings()
 
         else:
             self.text_z = None
@@ -257,7 +247,7 @@ class Trainer(object):
             "results": [], # metrics[0], or valid_loss
             "checkpoints": [], # record path of saved ckpt, to automatically remove old ckpt
             "best_result": None,
-
+            }
 
         # auto fix
         if len(metrics) == 0 or self.use_loss_as_metric:
@@ -297,6 +287,23 @@ class Trainer(object):
             self.log(f"[INFO] Loading {self.use_checkpoint} ...")
             self.load_checkpoint(self.use_checkpoint)
 
+    # calculate the text embs.
+    def prepare_text_embeddings(self):
+
+        if self.opt.text is None:
+            self.log(f"[WARN] text prompt is not provided.")
+            self.text_z = None
+            return
+
+        if not self.opt.dir_text:
+            self.text_z = self.guidance.get_text_embeds([self.opt.text])
+        else:
+            self.text_z = []
+            for d in ['front', 'side', 'back', 'side', 'overhead', 'bottom']:
+                text = f"{self.opt.text}, {d} view"
+                text_z = self.guidance.get_text_embeds([text])
+                self.text_z.append(text_z)
+
     def __del__(self):
         if self.log_ptr:
             self.log_ptr.close()
@@ -330,11 +337,11 @@ class Trainer(object):
         if rand > 0.8:
             shading = 'albedo'
             ambient_ratio = 1.0
-        elif rand > 0.4:
-
-
+        # elif rand > 0.4:
+        #     shading = 'textureless'
+        #     ambient_ratio = 0.1
         else:
-            shading = '
+            shading = 'lambertian'
             ambient_ratio = 0.1
 
         # _t = time.time()
@@ -343,6 +350,9 @@ class Trainer(object):
         pred_rgb = outputs['image'].reshape(B, H, W, 3).permute(0, 3, 1, 2).contiguous() # [1, 3, H, W]
         # torch.cuda.synchronize(); print(f'[TIME] nerf render {time.time() - _t:.4f}s')
 
+        # print(shading)
+        # torch_vis_2d(pred_rgb[0])
+
         # text embeddings
         if self.opt.dir_text:
             dirs = data['dir'] # [B,]
@@ -352,22 +362,24 @@ class Trainer(object):
 
         # encode pred_rgb to latents
         # _t = time.time()
-
+        loss = self.guidance.train_step(text_z, pred_rgb)
         # torch.cuda.synchronize(); print(f'[TIME] total guiding {time.time() - _t:.4f}s')
 
         # occupancy loss
         pred_ws = outputs['weights_sum'].reshape(B, 1, H, W)
-        # mask_ws = outputs['mask'].reshape(B, 1, H, W) # near < far
 
-
+        if self.opt.lambda_opacity > 0:
+            loss_opacity = (pred_ws ** 2).mean()
+            loss = loss + self.opt.lambda_opacity * loss_opacity
 
-
-
-
-
-
+        if self.opt.lambda_entropy > 0:
+            alphas = (pred_ws).clamp(1e-5, 1 - 1e-5)
+            # alphas = alphas ** 2 # skewed entropy, favors 0 over 1
+            loss_entropy = (- alphas * torch.log2(alphas) - (1 - alphas) * torch.log2(1 - alphas)).mean()
+
+            loss = loss + self.opt.lambda_entropy * loss_entropy
 
-        if 'loss_orient' in outputs:
+        if self.opt.lambda_orient > 0 and 'loss_orient' in outputs:
             loss_orient = outputs['loss_orient']
             loss = loss + self.opt.lambda_orient * loss_orient
 
@@ -442,6 +454,9 @@ class Trainer(object):
         ### ------------------------------
 
     def train(self, train_loader, valid_loader, max_epochs):
+
+        assert self.text_z is not None, 'Training must provide a text prompt!'
+
        if self.use_tensorboardX and self.local_rank == 0:
             self.writer = tensorboardX.SummaryWriter(os.path.join(self.workspace, "run", self.name))
 
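In formula form, the two density regularizers added to `train_step` above, with α the per-pixel accumulated weight `weights_sum` clamped to (0, 1):

```latex
\mathcal{L}_{\mathrm{opacity}} = \operatorname{mean}\bigl(\alpha^2\bigr),
\qquad
\mathcal{L}_{\mathrm{entropy}} = \operatorname{mean}\bigl(-\alpha \log_2 \alpha - (1-\alpha)\log_2(1-\alpha)\bigr).
```

They enter the total loss as λ_opacity * L_opacity + λ_entropy * L_entropy on top of the guidance loss, each term only when its λ is positive.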
raymarching/src/raymarching.cu
CHANGED
@@ -905,7 +905,7 @@ __global__ void kernel_composite_rays(
 }
 
 
-void composite_rays(const uint32_t n_alive, const uint32_t n_step, const float T_thresh, at::Tensor rays_alive, at::Tensor rays_t,
+void composite_rays(const uint32_t n_alive, const uint32_t n_step, const float T_thresh, at::Tensor rays_alive, at::Tensor rays_t, at::Tensor sigmas, at::Tensor rgbs, at::Tensor deltas, at::Tensor weights, at::Tensor depth, at::Tensor image) {
     static constexpr uint32_t N_THREAD = 128;
     AT_DISPATCH_FLOATING_TYPES_AND_HALF(
     image.scalar_type(), "composite_rays", ([&] {
readme.md
CHANGED
@@ -17,13 +17,13 @@ This project is a **work-in-progress**, and contains lots of differences from th
 
 
 ## Notable differences from the paper
-* Since the Imagen model is not publicly available, we use [Stable Diffusion](https://github.com/CompVis/stable-diffusion) to replace it (implementation from [diffusers](https://github.com/huggingface/diffusers)). Different from Imagen, Stable-Diffusion is a latent diffusion model, which diffuses in a latent space instead of the original image space. Therefore, we need the loss to propagate back from the VAE's encoder part too, which introduces extra time cost in training. Currently,
+* Since the Imagen model is not publicly available, we use [Stable Diffusion](https://github.com/CompVis/stable-diffusion) to replace it (implementation from [diffusers](https://github.com/huggingface/diffusers)). Different from Imagen, Stable-Diffusion is a latent diffusion model, which diffuses in a latent space instead of the original image space. Therefore, we need the loss to propagate back from the VAE's encoder part too, which introduces extra time cost in training. Currently, 10000 training steps take about 3 hours to train on a V100.
 * We use the [multi-resolution grid encoder](https://github.com/NVlabs/instant-ngp/) to implement the NeRF backbone (implementation from [torch-ngp](https://github.com/ashawkey/torch-ngp)), which enables much faster rendering (~10FPS at 800x800).
 * We use the Adam optimizer with a larger initial learning rate.
 
 
 ## TODOs
-* 
+* Alleviate the multi-face [Janus problem](https://twitter.com/poolio/status/1578045212236034048).
 * Better mesh (improve the surface quality).
 
 # Install
@@ -33,7 +33,9 @@ git clone https://github.com/ashawkey/stable-dreamfusion.git
 cd stable-dreamfusion
 ```
 
-**Important**: To download the Stable Diffusion model checkpoint, you should
+**Important**: To download the Stable Diffusion model checkpoint, you should provide your [access token](https://huggingface.co/settings/tokens). You could choose either of the following ways:
+* Run `huggingface-cli login` and enter your token.
+* Create a file called `TOKEN` under this directory (i.e., `stable-dreamfusion/TOKEN`) and copy your token into it.
 
 ### Install with pip
 ```bash
@@ -71,14 +73,30 @@ First time running will take some time to compile the CUDA extensions.
 
 ```bash
 ### stable-dreamfusion setting
-## train with text prompt
+## train with text prompt (with the default settings)
 # `-O` equals `--cuda_ray --fp16 --dir_text`
+# `--cuda_ray` enables instant-ngp-like occupancy grid based acceleration.
+# `--fp16` enables half-precision training.
+# `--dir_text` enables view-dependent prompting.
 python main.py --text "a hamburger" --workspace trial -O
 
+# if the above command fails to generate things (learns an empty scene), maybe try:
+# 1. disable random lambertian shading, simply use albedo as color:
+python main.py --text "a hamburger" --workspace trial -O --albedo_iters 10000 # i.e., set --albedo_iters >= --iters, which is default to 10000
+# 2. use a smaller density regularization weight:
+python main.py --text "a hamburger" --workspace trial -O --lambda_entropy 1e-5
+
+# you can also train in a GUI to visualize the training progress:
+python main.py --text "a hamburger" --workspace trial -O --gui
+
+# A Gradio GUI is also possible (with less options):
+python gradio_app.py # open in web browser
+
 ## after the training is finished:
-# test (exporting 360 video
+# test (exporting 360 video)
 python main.py --workspace trial -O --test
-
+# also save a mesh (with obj, mtl, and png texture)
+python main.py --workspace trial -O --test --save_mesh
 # test with a GUI (free view control!)
 python main.py --workspace trial -O --test --gui
 
@@ -101,7 +119,7 @@ pred_rgb_512 = F.interpolate(pred_rgb, (512, 512), mode='bilinear', align_corner
 latents = self.encode_imgs(pred_rgb_512)
 ... # timestep sampling, noise adding and UNet noise predicting
 # 3. the SDS loss, since UNet part is ignored and cannot simply audodiff, we manually set the grad for latents.
-w = (1 - self.
+w = self.alphas[t] ** 0.5 * (1 - self.alphas[t])
 grad = w * (noise_pred - noise)
 latents.backward(gradient=grad, retain_graph=True)
 ```
@@ -117,7 +135,6 @@ latents.backward(gradient=grad, retain_graph=True)
     Training is faster if only sample 128 points uniformly per ray (5h --> 2.5h).
     More testing is needed...
 * Shading & normal evaluation: `./nerf/network*.py > NeRFNetwork > forward`. Current implementation harms training and is disabled.
-    * use `--albedo_iters 1000` to enable random shading mode after 1000 steps from albedo, lambertian, and textureless.
 * light direction: current implementation use a plane light source, instead of a point light source...
 * View-dependent prompting: `./nerf/provider.py > get_view_direction`.
     * ues `--angle_overhead, --angle_front` to set the border. How to better divide front/back/side regions?
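The readme's note on view-dependent prompting (`--dir_text`, `--angle_overhead`, `--angle_front`) maps each sampled camera to one of the suffixes used in `prepare_text_embeddings` ("front/side/back/overhead view"). A rough sketch of one plausible bucketing, following only the CLI help strings; the exact thresholds live in `./nerf/provider.py > get_view_direction` and may differ:

```python
def view_suffix(theta_deg: float, phi_deg: float,
                angle_overhead: float = 30.0, angle_front: float = 60.0) -> str:
    """Illustrative mapping from camera angles to a prompt suffix.

    theta_deg: polar angle (0 = straight overhead), phi_deg: azimuth in [0, 360).
    Regions follow the help strings: [0, angle_overhead] is overhead,
    [0, angle_front] is front, [180, 180 + angle_front] is back, otherwise side.
    """
    if theta_deg <= angle_overhead:
        return "overhead"
    phi = phi_deg % 360.0
    if phi <= angle_front:
        return "front"
    if 180.0 <= phi <= 180.0 + angle_front:
        return "back"
    return "side"

for theta, phi in [(80, 30), (80, 200), (80, 100), (10, 0)]:
    print(theta, phi, "->", view_suffix(theta, phi))  # front, back, side, overhead
```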