I love Depth Anything V2! It's Depth Anything, but scaled up with both a larger teacher model and a gigantic dataset!
Here's a short TL;DR of the paper, which has a lot of findings, experiments and more. I have also created a collection that has the models, the dataset, the demo and the CoreML-converted model: merve/depth-anything-v2-release-6671902e798cd404513ffbf5
The authors compared Marigold, a diffusion-based model, against Depth Anything and found out what's up with using synthetic images vs. real images for monocular depth estimation (MDE):
- Real data has a lot of label noise and inaccurate depth maps (caused by depth sensors missing transparent objects, etc.), and many details are overlooked
- Synthetic data has more precise and detailed depth labels that are truly ground truth, but there's a distribution shift between real and synthetic images, and synthetic data has restricted scene coverage
The authors train different image encoders only on synthetic images and find that unless the encoder is very large, the model can't generalize well (but large models generalize inherently anyway). Even so, they still fail when they encounter real images that have a wide distribution in labels (e.g. diverse instances of objects) 🥲
The Depth Anything V2 framework is to:
- Train a teacher model based on DINOv2-G on 595K synthetic images
- Label 62M real images using the teacher model
- Train a student model using the real images labelled by the teacher
Result: 10x faster and more accurate than Marigold!
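The teacher-student recipe above can be sketched as a tiny pseudo-labelling loop. This is only a toy illustration of the data flow, not the authors' training code: scalar features stand in for images, a closed-form linear fit stands in for the depth networks, and every name here (`train_linear`, `predict`, `pseudo_label_pipeline`) is hypothetical.

```python
def train_linear(xs, ys):
    """Fit y = a*x + b by ordinary least squares (toy stand-in for training a depth model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def predict(model, x):
    """Apply a fitted (a, b) model to one input."""
    a, b = model
    return a * x + b


def pseudo_label_pipeline(synthetic_x, synthetic_y, real_x):
    # Step 1: train the teacher on synthetic data, whose labels are precise.
    teacher = train_linear(synthetic_x, synthetic_y)
    # Step 2: label the unlabelled real data with the teacher's predictions.
    pseudo_y = [predict(teacher, x) for x in real_x]
    # Step 3: train the student only on real inputs paired with pseudo-labels.
    student = train_linear(real_x, pseudo_y)
    return teacher, student, pseudo_y


# Toy data: the synthetic labels follow y = 2x + 1 exactly.
synthetic_x = [0.0, 1.0, 2.0, 3.0]
synthetic_y = [1.0, 3.0, 5.0, 7.0]
real_x = [10.0, 20.0, 30.0]  # "real" inputs with no ground-truth labels

teacher, student, pseudo_y = pseudo_label_pipeline(synthetic_x, synthetic_y, real_x)
print("teacher:", teacher)
print("pseudo-labels:", pseudo_y)
print("student:", student)
```

In this toy setting the student simply recovers the teacher's function. The point is only that the student never sees a synthetic sample or a sensor-derived label: its supervision comes entirely from the teacher's predictions on real inputs.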
The authors also construct a new benchmark called DA-2K that is less noisy, highly detailed and more diverse!
How it works
You provide a URL -> a multiple-choice quiz is instantly generated.
🔹 You can play the quiz yourself.
🔹 You can let the LLM play in two different ways:
   - Closed book: the LLM responds knowing only the general topic, using its parametric knowledge and reasoning abilities.
   - Web RAG: for each question, a Google search is done and the top 3 snippets are included in the prompt for the LLM.
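The difference between the two modes comes down to what goes into the prompt. Here is a minimal sketch, assuming the search snippets have already been fetched elsewhere. The function name `build_prompt`, the prompt wording and the `max_snippets` parameter are all hypothetical, not taken from the app.

```python
def build_prompt(topic, question, choices, snippets=None, max_snippets=3):
    """Build a multiple-choice prompt.

    With snippets=None this is the closed-book mode (topic only).
    With a list of snippets it is the web RAG mode, keeping the top `max_snippets`.
    """
    lines = [f"Topic: {topic}", ""]
    if snippets:
        # Web RAG: include only the top-ranked search snippets as context.
        lines.append("Search results:")
        for i, snippet in enumerate(snippets[:max_snippets], start=1):
            lines.append(f"[{i}] {snippet}")
        lines.append("")
    lines.append(f"Question: {question}")
    for letter, choice in zip("ABCD", choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


choices = ["Paris", "Rome", "Madrid", "Berlin"]
closed_book = build_prompt("Geography", "What is the capital of France?", choices)
web_rag = build_prompt(
    "Geography",
    "What is the capital of France?",
    choices,
    snippets=["snippet one", "snippet two", "snippet three", "snippet four"],
)
print(closed_book)
print("-----")
print(web_rag)
```

Keeping retrieval outside the function means the same question can be scored in both modes with an otherwise identical prompt.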