Space: evaluate-comparison/wilcoxon

lvwerra (HF staff) committed on Aug 11, 2022

Commit d64af6b • 1 parent: 0c5cd2d

Update Space (evaluate main: 3cd38e2b)

Files changed (4)

README.md +63 -5
app.py +6 -0
requirements.txt +3 -0
wilcoxon.py +78 -0

README.md CHANGED

@@ -1,12 +1,70 @@
 ---
 title: Wilcoxon
-emoji: 💻
-colorFrom: yellow
-colorTo: gray
 sdk: gradio
-sdk_version: 3.1.4
 app_file: app.py
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
 title: Wilcoxon
+emoji: 🤗
+colorFrom: blue
+colorTo: green
 sdk: gradio
+sdk_version: 3.0.2
 app_file: app.py
 pinned: false
+tags:
+- evaluate
+- comparison
+description: >-
+  Wilcoxon's test is a signed-rank test for comparing paired samples.
 ---
+# Comparison Card for Wilcoxon
+## Comparison description
+Wilcoxon's test is a non-parametric signed-rank test that tests whether the distribution of the differences is symmetric about zero. It can be used to compare the predictions of two models.
+## How to use
+The Wilcoxon comparison is used to analyze paired ordinal data.
+## Inputs
+Its arguments are:
+`predictions1`: a list of predictions from the first model.
+`predictions2`: a list of predictions from the second model.
+## Output values
+The Wilcoxon comparison outputs two things:
+`stat`: The Wilcoxon statistic.
+`p`: The p value.
+## Examples
+Example comparison:
+```python
+import evaluate
+wilcoxon = evaluate.load("wilcoxon", module_type="comparison")
+results = wilcoxon.compute(predictions1=[-7, 123.45, 43, 4.91, 5], predictions2=[1337.12, -9.74, 1, 2, 3.21])
+print(results)
+# {'stat': 5.0, 'p': 0.625}
+```
+## Limitations and bias
+The Wilcoxon test is non-parametric, so it makes relatively few assumptions: the pairs must be independent of one another, and under the null hypothesis the paired differences are symmetrically distributed about zero. It should be used to analyze paired ordinal data only.
+## Citations
+```bibtex
+@incollection{wilcoxon1992individual,
+  title={Individual comparisons by ranking methods},
+  author={Wilcoxon, Frank},
+  booktitle={Breakthroughs in statistics},
+  pages={196--202},
+  year={1992},
+  publisher={Springer}
+}
+```
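
Note on the README example: the reported values can be cross-checked without the `evaluate` wrapper by calling `scipy.stats.wilcoxon` directly on the paired differences, which is what `wilcoxon.py` in this commit does. A minimal sketch, assuming SciPy is installed and that it uses its default two-sided test; the variable names `differences` and `result` are illustrative and not part of the commit:

```python
from scipy.stats import wilcoxon

predictions1 = [-7, 123.45, 43, 4.91, 5]
predictions2 = [1337.12, -9.74, 1, 2, 3.21]

# One paired difference per example (model 1 minus model 2).
differences = [p1 - p2 for p1, p2 in zip(predictions1, predictions2)]

# Signed-rank test on the differences with SciPy's default settings.
res = wilcoxon(differences)
result = {"stat": float(res.statistic), "p": float(res.pvalue)}
print(result)
```

With five untied, nonzero differences SciPy should take the exact small-sample path, so the printed values are expected to match the `stat` and `p` shown in the README example.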

app.py ADDED

@@ -0,0 +1,6 @@

+import evaluate
+from evaluate.utils import launch_gradio_widget
+module = evaluate.load("wilcoxon", module_type="comparison")
+launch_gradio_widget(module)

requirements.txt ADDED

@@ -0,0 +1,3 @@

+git+https://github.com/huggingface/evaluate@a45df1eb9996eec64ec3282ebe554061cb366388
+datasets~=2.0
+scipy

wilcoxon.py ADDED

@@ -0,0 +1,78 @@

+# Copyright 2022 The HuggingFace Evaluate Authors
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Wilcoxon test for model comparison."""
+import datasets
+from scipy.stats import wilcoxon
+import evaluate
+_DESCRIPTION = """
+Wilcoxon's test is a non-parametric signed-rank test that tests whether the distribution of the differences is symmetric about zero. It can be used to compare the predictions of two models.
+"""
+_KWARGS_DESCRIPTION = """
+Args:
+    predictions1 (`list` of `float`): Predictions for model 1.
+    predictions2 (`list` of `float`): Predictions for model 2.
+Returns:
+    stat (`float`): Wilcoxon test score.
+    p (`float`): The p value. Minimum possible value is 0. Maximum possible value is 1.0. A lower p value means a more significant difference.
+Examples:
+    >>> wilcoxon = evaluate.load("wilcoxon")
+    >>> results = wilcoxon.compute(predictions1=[-7, 123.45, 43, 4.91, 5], predictions2=[1337.12, -9.74, 1, 2, 3.21])
+    >>> print(results)
+    {'stat': 5.0, 'p': 0.625}
+"""
+_CITATION = """
+@incollection{wilcoxon1992individual,
+  title={Individual comparisons by ranking methods},
+  author={Wilcoxon, Frank},
+  booktitle={Breakthroughs in statistics},
+  pages={196--202},
+  year={1992},
+  publisher={Springer}
+}
+"""
+@evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
+class Wilcoxon(evaluate.Comparison):
+    def _info(self):
+        return evaluate.ComparisonInfo(
+            module_type="comparison",
+            description=_DESCRIPTION,
+            citation=_CITATION,
+            inputs_description=_KWARGS_DESCRIPTION,
+            features=datasets.Features(
+                {
+                    "predictions1": datasets.Value("float"),
+                    "predictions2": datasets.Value("float"),
+                }
+            ),
+        )
+    def _compute(self, predictions1, predictions2):
+        # calculate difference
+        d = [p1 - p2 for (p1, p2) in zip(predictions1, predictions2)]
+        # compute statistic
+        res = wilcoxon(d)
+        return {"stat": res.statistic, "p": res.pvalue}
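
Note on `_compute`: the `stat` and `p` values it returns come straight from SciPy. To make clear what they represent, the signed-rank statistic and its exact two-sided p value can be sketched in plain Python. This is an illustration only, not the code path used by the module; it assumes no zero differences and no tied absolute differences (SciPy has its own handling for those cases), and the helper names `signed_rank_sums` and `exact_two_sided_p` are hypothetical:

```python
from itertools import product


def signed_rank_sums(differences):
    """Return (W_plus, W_minus) for nonzero, untied differences."""
    # Rank the absolute differences from smallest (rank 1) to largest.
    order = sorted(range(len(differences)), key=lambda i: abs(differences[i]))
    w_plus = w_minus = 0
    for rank, i in enumerate(order, start=1):
        if differences[i] > 0:
            w_plus += rank
        else:
            w_minus += rank
    return w_plus, w_minus


def exact_two_sided_p(statistic, n):
    """Exact two-sided p value by enumerating all 2**n sign assignments."""
    at_most = 0
    for signs in product((0, 1), repeat=n):
        # Sum of the ranks that carry a positive sign in this assignment.
        w = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if w <= statistic:
            at_most += 1
    return min(1.0, 2 * at_most / 2 ** n)


predictions1 = [-7, 123.45, 43, 4.91, 5]
predictions2 = [1337.12, -9.74, 1, 2, 3.21]
d = [p1 - p2 for p1, p2 in zip(predictions1, predictions2)]

w_plus, w_minus = signed_rank_sums(d)
stat = min(w_plus, w_minus)  # two-sided statistic: the smaller rank sum
p = exact_two_sided_p(stat, len(d))
print(stat, p)  # 5 0.625
```

Here the positive differences hold ranks 1 to 4 (sum 10) and the single negative difference holds rank 5, so the statistic is 5; 10 of the 32 possible sign assignments give a rank sum of at most 5, hence p = 2 × 10/32 = 0.625.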