{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "61092509",
"metadata": {
"tags": []
},
"source": [
"# GSI Technology Video Search Demo - Evaluation Notebook\n",
"\n",
"This notebook performs an in-depth analysis of the performance of a text-to-video search application.\n",
"It evaluates several trained models and search methods on two test sets, examining the trade-offs between retrieval accuracy and similarity-search speed.\n",
"\n",
"## Model Architecture\n",
"\n",
"To extract feature vectors from raw video data, we use the **CLIP4Clip** model architecture, a CLIP-based video retrieval method described in [this paper](https://arxiv.org/pdf/2104.08860.pdf) by Luo et al. (2021), with accompanying [code](https://github.com/ArrowLuo/CLIP4Clip).\n",
"\n",
"*Figure: the CLIP4Clip framework.*"
]
},
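{
"cell_type": "markdown",
"id": "retrieval-metrics-lead-in",
"metadata": {},
"source": [
"The next cell is a minimal sketch of how text-to-video scores and the retrieval metrics reported in the results table (R1, R5, R10, median rank and mean rank) can be computed. It is an illustration under stated assumptions, not the evaluation code behind the reported numbers. Random arrays stand in for CLIP caption and frame embeddings. Frame embeddings are combined by parameter-free mean pooling in the spirit of CLIP4Clip's `meanP` similarity. Caption *i* is assumed to describe video *i*. The helper names `l2_normalize`, `mean_pool_video` and `retrieval_metrics` are hypothetical.\n",
"\n",
"To try it on real data, replace `text_emb` with caption embeddings of shape `(N, D)` and `frame_emb` with frame embeddings of shape `(N, F, D)`. Ranks are 1-based, so a perfect retriever reaches R1 = 100 with median and mean ranks of 1. The last line of the cell checks this."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "retrieval-metrics-sketch",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical stand-ins: random vectors replace real CLIP caption and frame embeddings.\n",
"rng = np.random.default_rng(0)\n",
"num_pairs, num_frames, dim = 100, 12, 512\n",
"text_emb = rng.normal(size=(num_pairs, dim))  # (N, D): one embedding per caption\n",
"frame_emb = rng.normal(size=(num_pairs, num_frames, dim))  # (N, F, D): F frame embeddings per video\n",
"\n",
"\n",
"def l2_normalize(x, axis=-1):\n",
"    # Scale vectors to unit length so that dot products equal cosine similarities.\n",
"    return x / np.linalg.norm(x, axis=axis, keepdims=True)\n",
"\n",
"\n",
"def mean_pool_video(frames):\n",
"    # Parameter-free temporal pooling: average the normalized frame embeddings, then re-normalize.\n",
"    return l2_normalize(l2_normalize(frames).mean(axis=1))\n",
"\n",
"\n",
"def retrieval_metrics(sim):\n",
"    # sim[i, j] scores caption i against video j; the ground-truth video for caption i is video i.\n",
"    order = np.argsort(-sim, axis=1)  # videos sorted from best to worst for each caption\n",
"    # 0-based position of the ground-truth video in each ranked list.\n",
"    rank0 = np.argmax(order == np.arange(sim.shape[0])[:, None], axis=1)\n",
"    median_rank = float(np.median(rank0)) + 1\n",
"    return {\n",
"        'R1': float(100.0 * np.mean(rank0 < 1)),\n",
"        'R5': float(100.0 * np.mean(rank0 < 5)),\n",
"        'R10': float(100.0 * np.mean(rank0 < 10)),\n",
"        'MR': median_rank,  # the table reports the median rank under both MR and MedianR\n",
"        'MedianR': median_rank,\n",
"        'MeanR': float(np.mean(rank0)) + 1,\n",
"    }\n",
"\n",
"\n",
"sim = l2_normalize(text_emb) @ mean_pool_video(frame_emb).T  # (N, N) cosine similarity matrix\n",
"print(retrieval_metrics(sim))\n",
"\n",
"# Sanity check: when every caption scores highest against its own video, R1 is 100 and both ranks are 1.\n",
"print(retrieval_metrics(np.eye(num_pairs)))"
]
},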
{
"attachments": {},
"cell_type": "markdown",
"id": "f269a897",
"metadata": {},
"source": [
"R1, R5 and R10 are recall at rank K: the percentage of queries whose matching video is ranked in the top 1, 5 or 10 results (higher is better). MR and MedianR both give the median rank of the matching video, and MeanR gives its mean rank (lower is better).\n",
"\n",
"| Model / search method | R1 | R5 | R10 | MR | MedianR | MeanR |\n",
"|---|---|---|---|---|---|---|\n",
"| zero-shot model | 37.16 | 62.10 | 71.16 | 3.0 | 3.0 | 42.2128 |\n",
"| msr-vtt trained | 38.38 | 62.89 | 72.01 | 3.0 | 3.0 | 39.3023 |\n",
"| webvid trained | 50.74 | 77.30 | 85.05 | 1.0 | 1.0 | 14.9535 |\n",
"| binary | 29.68 | 55.95 | 67.32 | 4.0 | 4.0 | 49.6309 |\n",
"| binary + rerank100 | 50.56 | 76.39 | 83.51 | 1.0 | 1.0 | 43.2964 |\n",
"| binary + rerank500 | 50.74 | 77.30 | 85.05 | 1.0 | 1.0 | 17.1879 |\n",
"| binary trained | 30.88 | 57.98 | 69.71 | 4.0 | 4.0 | 37.7139 |\n",
"| binary trained + rerank100 | 50.75 | 76.88 | 84.31 | 1.0 | 1.0 | 31.9374 |\n", "