Spaces (runtime error): jacopoteneggi committed · Commit 468f744 · 1 Parent(s): 0da5552

Include How does it work tab

Browse files:
- app_lib/about.py +195 -0
- app_lib/demo.py +114 -0
- app_lib/main.py +8 -108
- assets/about/local_dist.jpg +0 -0
- assets/about/setup.jpg +0 -0
- header.md +2 -2
app_lib/about.py
ADDED
@@ -0,0 +1,195 @@
```python
import streamlit as st


def about():
    _, centercol, _ = st.columns([1, 3, 1])
    with centercol:
        st.markdown(
            """
            ## Testing Semantic Importance via Betting

            We briefly present here the main ideas and contributions.
            """
        )

        st.markdown("""### 1. Setup""")
        st.image(
            "./assets/about/setup.jpg",
            caption="Figure 1: Pictorial representation of the setup.",
            use_column_width=True,
        )

        st.markdown(
            """
            We consider classification problems with:

            * **Input image** $X \in \mathcal{X}$.
            * **Feature encoder** $f:~\mathcal{X} \\to \mathbb{R}^d$ that maps input
              images to dense embeddings $H = f(X) \in \mathbb{R}^d$.
            * **Classifier** $g:~\mathbb{R}^d \\to [0,1]^k$ that separates embeddings
              into one of $k$ classes. We do not assume that $g$ has a particular form;
              it can be any fixed, potentially nonlinear function.
            * **Concept bank** $c = [c_1, \dots, c_m] \in \mathbb{R}^{d \\times m}$ such
              that $c_j \in \mathbb{R}^d$ is the representation of the $j^{\\text{th}}$ concept.
              We assume that $c$ is user-defined and that $m$ is small ($m \\approx 20$).
            * **Semantics** $Z = [Z_1, \dots, Z_m] = c^{\\top} H$, where $Z_j \in [-1, 1]$
              represents the amount of concept $j$ present in the dense embedding of
              input image $X$.

            For example:

            * $f$ is the image encoder of a vision-language model (e.g., CLIP$^1$, OpenCLIP$^2$).
            * $g$ is the zero-shot classifier obtained by encoding *"A photo of a <CLASS_NAME>"*
              with the text encoder of the same vision-language model.
            * $c$ is obtained similarly by encoding the user-defined concepts.
            """
        )
```
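As an illustrative aside (not part of the app's code), the semantics $Z = c^{\top} H$ described above can be sketched with random vectors standing in for the CLIP embeddings; the normalization is an assumption that makes each $Z_j$ a cosine similarity in $[-1, 1]$:

```python
import numpy as np

rng = np.random.default_rng(0)

d, m = 512, 20  # embedding dimension, number of concepts (m ≈ 20)

# Hypothetical stand-ins for a CLIP image embedding and a concept bank:
h = rng.normal(size=d)
c = rng.normal(size=(d, m))

# Normalizing makes each Z_j = c_j^T h a cosine similarity in [-1, 1]:
h /= np.linalg.norm(h)
c /= np.linalg.norm(c, axis=0, keepdims=True)

z = c.T @ h  # semantics Z = c^T H
assert z.shape == (m,) and np.all(np.abs(z) <= 1.0)
```

In the app itself, `h` and the columns of `c` would come from the image and text encoders of the chosen vision-language model.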
```python
        st.markdown(
            """
            ### 2. Defining Semantic Importance

            Our goal is to test the statistical importance of the concepts in $c$ for the
            predictions of the given classifier on a particular image $x$ (capital letters
            denote random variables, and lowercase letters their realizations).

            We do not train a surrogate, interpretable model; instead, we consider the
            original, potentially nonlinear classifier $g$. This is because we want to study
            the semantic importance of the model that would be deployed in real-world
            settings, not that of a surrogate which might decrease performance.

            We define importance from the perspective of conditional independence testing
            because it allows for rigorous statistical testing with false positive rate
            control (i.e., Type I error control). That is, the probability of falsely deeming
            a concept important is below a user-defined level $\\alpha \in (0,1)$.

            For an image $x$, a concept $j$, and a subset $S \subseteq [m] \setminus \{j\}$
            (i.e., any subset that does not contain $j$), we define the null hypothesis

            $$
            H_0:~\hat{Y}_{S \cup \{j\}} \overset{d}{=} \hat{Y}_S,
            $$

            where $\overset{d}{=}$ denotes equality in distribution and, $\\forall C \subseteq [m]$,
            $\hat{Y}_C = g(\widetilde{H}_C)$ with $\widetilde{H}_C \sim P_{H \mid Z_C = z_C}$,
            the conditional distribution of the dense embeddings given the observed concepts
            in $z_C$, i.e., the semantics of $x$. Rejecting $H_0$ then means that concept $j$
            affects the distribution of the response of the model, and hence that it is
            important.
            """
        )
```
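For intuition about this null hypothesis, here is a toy illustration (the linear classifier and the concept direction below are invented for illustration, not taken from the paper): if $g$ ignores the direction a concept lives in, injecting that concept into the embeddings leaves the distribution of predictions unchanged, so $H_0$ holds and the concept is unimportant.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 16

H = rng.normal(size=(n, d))  # toy dense embeddings

w = np.zeros(d)
w[0] = 1.0  # toy classifier reads only coordinate 0

def g(h):
    return 1.0 / (1.0 + np.exp(-h @ w))  # sigmoid "classifier" into [0, 1]

# Inject a "concept" aligned with coordinate 1, which g ignores:
H_with_concept = H.copy()
H_with_concept[:, 1] += 2.0

# The two prediction distributions coincide, so H0 holds for this concept:
assert np.allclose(g(H), g(H_with_concept))
```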
```python
        st.markdown(
            """
            ### 3. Sampling Conditional Embeddings
            """
        )
        st.image(
            "./assets/about/local_dist.jpg",
            caption=(
                "Figure 2: Example test (i.e., with concept) and null (i.e., without"
                " concept) distributions for a class-specific concept and a non-class-"
                "specific one on three images from the Imagenette dataset as a"
                " function of the size of S."
            ),
            use_column_width=True,
        )
        st.markdown(
            """
            In order to test the $H_0$ defined above, we need to sample from the conditional
            distribution of the dense embeddings given certain concepts. Since
            $Z = c^{\\top} H$, this can be seen as solving a linear inverse problem
            stochastically. In this work, given that $m$ is small, we use nonparametric
            kernel density estimation (KDE) methods to approximate the target distribution.

            Intuitively, given a dataset $\{(h^{(i)}, z^{(i)})\}_{i=1}^n$ of dense embeddings
            with their semantics, we:

            1. Use a weighted KDE to sample $\widetilde{Z} \sim P_{Z \mid Z_C = z_C}$, and then
            2. Retrieve the embedding $H^{(i')}$ whose concept representation $Z^{(i')}$ is
               the nearest neighbor of $\widetilde{Z}$ in the dataset.

            Details on the weighted KDE and the sampling procedure are included in the paper.
            Figure 2 shows some example test (i.e., $\hat{Y}_{S \cup \{j\}}$) and null
            (i.e., $\hat{Y}_{S}$) distributions for a class-specific concept and a
            non-class-specific one on three images from the Imagenette$^3$ dataset. We can see
            that the test distributions of class-specific concepts are skewed to the right,
            i.e., including the observed class-specific concept increases the output of the
            predictor. Furthermore, the shift decreases as more concepts are included in $S$:
            if $S$ is larger and contains more information, then the marginal contribution of
            adding one concept is smaller. On the other hand, including a non-class-specific
            concept does not change the distribution of the response of the model, no matter
            the size of $S$.
            """
        )
```
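The two steps above can be sketched as follows. This is a minimal sketch, not the paper's implementation: the Gaussian kernel, the bandwidth, and the toy data are all assumptions, and the actual weighted KDE is detailed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset of n dense embeddings with their semantics (m concepts):
n, d, m = 500, 8, 3
H = rng.normal(size=(n, d))
H /= np.linalg.norm(H, axis=1, keepdims=True)
c = rng.normal(size=(d, m))
c /= np.linalg.norm(c, axis=0, keepdims=True)
Z = H @ c  # semantics of each sample, shape (n, m)

def sample_conditional_embedding(C, z_C, bandwidth=0.1):
    """Sample from (an approximation of) P_{H | Z_C = z_C}."""
    # Step 1: weighted KDE draw of Z~ ~ P_{Z | Z_C = z_C}: weight each sample
    # by a Gaussian kernel on its distance to the conditioning values z_C,
    # pick a mixture component, and add kernel noise.
    w = np.exp(-np.sum((Z[:, C] - z_C) ** 2, axis=1) / (2 * bandwidth**2))
    i = rng.choice(n, p=w / w.sum())
    z_tilde = Z[i] + bandwidth * rng.normal(size=m)
    # Step 2: retrieve the embedding whose semantics are the nearest
    # neighbor of the draw in the dataset.
    nn = np.argmin(np.sum((Z - z_tilde) ** 2, axis=1))
    return H[nn]

h_tilde = sample_conditional_embedding(C=[0], z_C=Z[0, [0]])
assert h_tilde.shape == (d,)
```

Pushing such draws through the classifier $g$ yields samples of $\hat{Y}_C$, the quantities compared in the test.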
```python
        st.markdown(
            """
            ### 4. Testing by Betting

            Instead of classical hypothesis testing techniques based on $p$-values, we propose
            to test for the importance of concepts by *betting*.$^4$ This choice is motivated
            by two important properties of sequential tests:

            1. They are **adaptive** to the hardness of the problem. That is, the easier it is
               to reject a null hypothesis, the earlier the test will stop. This induces a
               natural ranking of importance across concepts: if concept $j$ rejects faster
               than $j'$, then $j$ is more important than $j'$.

            2. They are **efficient** because they only use as much data as needed to reject,
               instead of all the available data, as traditional, offline tests do.

            Sequential tests instantiate a game between a *bettor* and *nature*. At every turn
            of the game, the bettor places a wager against the null hypothesis, and nature
            reveals the truth. If the bettor wins, they accumulate wealth; otherwise, they
            lose some. More formally, the *wealth process* $\{K_t\}_{t \in \mathbb{N}_0}$ is
            defined as

            $$
            K_0 = 1, \\quad K_{t+1} = K_t \cdot (1 + v_t\kappa_t),
            $$

            where $v_t \in [-1,1]$ is a betting fraction and $\kappa_t \in [-1,1]$ is the
            payoff of the bet. Under certain conditions, the wealth process describes a
            *fair game*, and for $\\alpha \in (0,1)$ it holds that

            $$
            \mathbb{P}_{H_0}[\exists t:~K_t \geq 1/\\alpha] \leq \\alpha.
            $$

            That is, the wealth process can be used to reject the null hypothesis $H_0$ with
            Type I error control at level $\\alpha$.

            Briefly, we use ideas from sequential kernelized independence testing (SKIT)$^5$
            and define the payoff as

            $$
            \kappa_t \coloneqq \\tanh\left(\\rho_t(\hat{Y}_{S \cup \{j\}}) - \\rho_t(\hat{Y}_S)\\right),
            $$

            where

            $$
            \\rho_t = \widehat{\\text{MMD}}(\hat{Y}_{S \cup \{j\}}, \hat{Y}_S)
            $$

            is the plug-in estimator of the maximum mean discrepancy (MMD)$^6$ between the
            test and null distributions at time $t$. Furthermore, we use the online Newton
            step (ONS)$^7$ method to choose the betting fraction $v_t$ and ensure exponential
            growth of the wealth.
            """
        )
```
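The wealth update and the ONS choice of $v_t$ can be sketched as below. This is a hedged sketch, not the paper's code: the simulated payoffs (a shifted $\tanh$ of Gaussian noise) stand in for the MMD-based $\kappa_t$, and the constants follow a standard ONS formulation for betting.

```python
import numpy as np

rng = np.random.default_rng(0)

def betting_test(payoffs, alpha=0.05):
    """Sequential test: grow the wealth K_t and reject H0 once K_t >= 1/alpha."""
    K, v, a = 1.0, 0.0, 1.0
    for t, kappa in enumerate(payoffs, start=1):
        K *= 1.0 + v * kappa  # wealth update K_{t+1} = K_t (1 + v_t kappa_t)
        if K >= 1.0 / alpha:
            return True, t  # reject H0 at time t
        # Online Newton step (ONS) update of the betting fraction v_t:
        g = -kappa / (1.0 + v * kappa)  # gradient of -log(1 + v * kappa)
        a += g**2
        v = float(np.clip(v - (2.0 / (2.0 - np.log(3.0))) * g / a, -0.5, 0.5))
    return False, len(payoffs)

# Under the alternative, payoffs have positive mean, the wealth grows
# exponentially, and the test rejects early:
kappas = np.tanh(rng.normal(loc=0.3, scale=0.2, size=2000))
rejected, tau = betting_test(kappas)
assert rejected and tau < 2000
```

The stopping time `tau` is what induces the ranking across concepts: concepts whose tests reject earlier are deemed more important.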
```python
        st.markdown(
            """
            ---

            **References**

            [1] CLIP is available at https://github.com/openai/CLIP.

            [2] OpenCLIP is available at https://github.com/mlfoundations/open_clip.

            [3] The Imagenette dataset is available at https://github.com/fastai/imagenette.

            [4] Glenn Shafer. Testing by betting: A strategy for statistical and scientific
            communication. Journal of the Royal Statistical Society Series A: Statistics in
            Society, 184(2):407-431, 2021.

            [5] Aleksandr Podkopaev et al. Sequential kernelized independence testing. In
            International Conference on Machine Learning, pages 27957-27993. PMLR, 2023.

            [6] Arthur Gretton et al. A kernel two-sample test. The Journal of Machine
            Learning Research, 13(1):723-773, 2012.

            [7] Ashok Cutkosky and Francesco Orabona. Black-box reductions for parameter-free
            online learning in Banach spaces. In Conference On Learning Theory, pages
            1493-1529. PMLR, 2018.
            """
        )
```
app_lib/demo.py
ADDED
@@ -0,0 +1,114 @@
```python
import streamlit as st
import torch

from app_lib.test import get_testing_config, load_precomputed_results, test
from app_lib.user_input import (
    get_advanced_settings,
    get_class_name,
    get_concepts,
    get_image,
    get_model_name,
)
from app_lib.viz import viz_results


def _disable():
    st.session_state.disabled = True


def _toggle_sidebar(button):
    if button:
        st.session_state.sidebar_state = "expanded"
        st.experimental_rerun()


def _preload_results(image_name):
    if image_name != st.session_state.image_name:
        st.session_state.image_name = image_name
        st.session_state.tested = False

    if st.session_state.image_name is not None and not st.session_state.tested:
        st.session_state.results = load_precomputed_results(image_name)


def demo(device=torch.device("cuda" if torch.cuda.is_available() else "cpu")):
    columns = st.columns([0.40, 0.60])

    with columns[0]:
        st.header("Choose Image and Concepts")

        image_col, concepts_col = st.columns(2)

        with image_col:
            image_name, image = get_image()
            st.image(image, use_column_width=True)

            change_image_button = st.button(
                "Change Image",
                use_container_width=False,
                disabled=st.session_state.disabled,
            )
            _toggle_sidebar(change_image_button)

        with concepts_col:
            model_name = get_model_name()
            class_name, class_ready, class_error = get_class_name(image_name)
            concepts, concepts_ready, concepts_error = get_concepts(image_name)

        ready = class_ready and concepts_ready

        error_message = ""
        if class_error is not None:
            error_message += f"- {class_error}\n"
        if concepts_error is not None:
            error_message += f"- {concepts_error}\n"
        if error_message:
            st.error(error_message)

        with st.container():
            (
                significance_level,
                tau_max,
                r,
                cardinality,
                dataset_name,
            ) = get_advanced_settings(concepts, concepts_ready)

            test_button = st.button(
                "Test Concepts",
                use_container_width=True,
                on_click=_disable,
                disabled=st.session_state.disabled or not ready,
            )

    if test_button:
        st.session_state.results = None

        with columns[1]:
            viz_results()

        testing_config = get_testing_config(
            significance_level=significance_level, tau_max=tau_max, r=r
        )

        with columns[0]:
            results = test(
                testing_config,
                image,
                class_name,
                concepts,
                cardinality,
                dataset_name,
                model_name,
                device=device,
            )

        st.session_state.tested = True
        st.session_state.results = results
        st.session_state.disabled = False
        st.experimental_rerun()
    else:
        _preload_results(image_name)

        with columns[1]:
            viz_results()
```
app_lib/main.py
CHANGED
@@ -1,114 +1,14 @@
The previous contents (identical to the new `app_lib/demo.py`, except that the entry point was named `main` instead of `demo`) were removed and replaced with:

```python
import streamlit as st

from app_lib.about import about
from app_lib.demo import demo


def main():
    demo_tab, about_tab = st.tabs(["Demo", "How Does it Work?"])

    with demo_tab:
        demo()

    with about_tab:
        about()
```
assets/about/local_dist.jpg
ADDED
assets/about/setup.jpg
ADDED
header.md
CHANGED
@@ -1,5 +1,5 @@
```diff
 # 🤔 I Bet You Did Not Mean That
 
-Test the effect of different concepts on the predictions of a classifier. Concepts are ranked by their *importance*: how much they change the prediction
-
+Test the effect of different concepts on the predictions of a classifier. Concepts are ranked by their *importance*: how much they change the prediction [[paper]](https://arxiv.org/pdf/2405.19146) [[code]](https://github.com/Sulam-Group/IBYDMT).
+by [Jacopo Teneggi](https://jacopoteneggi.github.io) and [Jeremias Sulam](https://sites.google.com/view/jsulam) (Johns Hopkins University).
```