metadata

license: apache-2.0
pipeline_tag: depth-estimation

Prompt-Depth-Anything-Vitl

Introduction

Prompt Depth Anything is a method for high-resolution, accurate metric depth estimation. Its highlights:

It uses prompting to unleash the power of depth foundation models, inspired by the success of prompting in vision-language models (VLMs) and large language models (LLMs).
It takes the widely available iPhone LiDAR as the prompt, guiding the model to produce accurate metric depth at up to 4K resolution.
It introduces a scalable data pipeline to train the method.
It benefits downstream applications, including 3D reconstruction and generalized robotic grasping.

Installation

git clone https://github.com/DepthAnything/PromptDA.git
cd PromptDA
pip install -r requirements.txt
pip install -e .

Usage

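import torch

from promptda.promptda import PromptDA
from promptda.utils.io_wrapper import load_image, load_depth, save_depth

# Use the GPU when available; fall back to the CPU otherwise (slower).
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# Example inputs shipped with the repository; paths are relative to the PromptDA root.
image_path = "assets/example_images/image.jpg"
prompt_depth_path = "assets/example_images/arkit_depth.png"
image = load_image(image_path).to(DEVICE)
prompt_depth = load_depth(prompt_depth_path).to(DEVICE) # 192x256, ARKit LiDAR depth in meters

# Load the pretrained ViT-L checkpoint and switch to inference mode.
model = PromptDA.from_pretrained("depth-anything/prompt-depth-anything-vitl").to(DEVICE).eval()
depth = model.predict(image, prompt_depth) # HxW, depth in meters

# Save the predicted depth.
save_depth(depth, prompt_depth=prompt_depth, image=image)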
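
Because both the prompt and the prediction are in meters, a quick sanity check is to compare the high-resolution output against the low-resolution LiDAR prompt. The sketch below is not part of the PromptDA package: `prompt_agreement` is a hypothetical helper, it works on plain NumPy arrays, and it assumes that prompt values less than or equal to zero mark invalid LiDAR readings.

```python
import numpy as np


def prompt_agreement(pred_depth, prompt_depth):
    """Mean absolute difference (meters) between a prediction and its prompt.

    pred_depth:   (H, W) array, predicted metric depth in meters.
    prompt_depth: (h, w) array, low-resolution prompt depth in meters.
                  Values <= 0 are treated as invalid (an assumption).
    Hypothetical helper for illustration; not a PromptDA API.
    """
    pred = np.asarray(pred_depth, dtype=np.float64)
    prompt = np.asarray(prompt_depth, dtype=np.float64)
    H, W = pred.shape
    h, w = prompt.shape

    # Sample the prediction at the pixel nearest to each prompt cell center.
    rows = np.clip(((np.arange(h) + 0.5) * H / h).astype(int), 0, H - 1)
    cols = np.clip(((np.arange(w) + 0.5) * W / w).astype(int), 0, W - 1)
    sampled = pred[np.ix_(rows, cols)]

    # Ignore prompt cells without a valid reading.
    valid = prompt > 0
    if not valid.any():
        return float("nan")
    return float(np.abs(sampled[valid] - prompt[valid]).mean())


if __name__ == "__main__":
    # Synthetic data: a flat wall 2.0 m away, with a prompt that reads 2.5 m.
    pred = np.full((768, 1024), 2.0)
    prompt = np.full((192, 256), 2.5)
    print(prompt_agreement(pred, prompt))
```

To apply it to the tensors from the example above, something like `depth.squeeze().cpu().numpy()` and `prompt_depth.squeeze().cpu().numpy()` should yield 2D arrays, assuming the only extra dimensions are singleton batch and channel axes. A large disagreement may point to a unit mismatch (for example millimeters versus meters) rather than a model failure.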

Citation

If you find this project useful, please consider citing:

@inproceedings{lin2024promptda,
  title={Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation},
  author={Lin, Haotong and Peng, Sida and Chen, Jingxiao and Peng, Songyou and Sun, Jiaming and Liu, Minghuan and Bao, Hujun and Feng, Jiashi and Zhou, Xiaowei and Kang, Bingyi},
  journal={arXiv},
  year={2024}
}