TITLE = '<h1 align="center" id="space-title">WebWalkerQA Leaderboard</h1>'
INTRO_TEXT = f"""
## About
This leaderboard showcases the performance of models on the **WebWalkerQA benchmark**. WebWalkerQA is a collection of question-answering datasets designed to test models' ability to answer questions about web pages.
"""
HOW_TO = f"""
## Data
The WebWalkerQA dataset is available on 🤗 [Hugging Face](https://huggingface.co/datasets/callanwu/WebWalkerQA). It comprises **680 question-answer pairs**, each linked to a corresponding web page. The benchmark is divided into two key components:
- **Agent**
- **RAG-system**
## How to Submit Your Method
### Submission Steps:
To list your method's performance on this leaderboard, email **[email protected]** or **[email protected]** with the following:
1. A JSONL file in the format:
```jsonl
{{"question": "question_text", "prediction": "predicted_answer_text"}}
```
2. The following details about your submission:
- **User Name**
- **Type** (RAG-system or Agent)
- **Method Name**
Your method will be evaluated and added to the leaderboard. For reference, see the [evaluation code](https://github.com/Alibaba-NLP/WebWalker/src/evaluate.py).
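
As a rough illustration of the submission format, the sketch below writes one JSON object per line with the `question` and `prediction` keys, then reads the file back as a sanity check. The helper names `write_predictions` and `read_predictions` and the file name `predictions.jsonl` are hypothetical and are not part of the evaluation code.

```python
import json


def write_predictions(pairs, path):
    # pairs: iterable of (question, prediction) string tuples (assumed input shape)
    with open(path, "w", encoding="utf-8") as f:
        for question, prediction in pairs:
            record = dict(question=question, prediction=prediction)
            # ensure_ascii=False keeps non-ASCII text readable in the file
            print(json.dumps(record, ensure_ascii=False), file=f)


def read_predictions(path):
    # Parse every non-empty line as one JSON object
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Placeholder data, for illustration only
    write_predictions([("question_text", "predicted_answer_text")], "predictions.jsonl")
    print(len(read_predictions("predictions.jsonl")), "records written")
```

Reading the file back before emailing it is a cheap way to catch malformed lines.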
"""
CREDIT = f"""
## Credit
This website is built using the following resources:
- **Evaluation Code**: LangChain's cot_qa evaluator
- **Leaderboard Code**: HuggingFaceH4's open_llm_leaderboard
"""
CITATION = f"""
## Citation
If you find this work helpful, please cite it as:
```bibtex
@misc{{wu2025webwalker,
title={{WebWalker: Benchmarking LLMs in Web Traversal}},
author={{Jialong Wu and Wenbiao Yin and Yong Jiang and Zhenglin Wang and Zekun Xi and Runnan Fang and Deyu Zhou and Pengjun Xie and Fei Huang}},
year={{2025}},
eprint={{2501.07572}},
archivePrefix={{arXiv}},
primaryClass={{cs.CL}},
url={{https://arxiv.org/abs/2501.07572}},
}}
```
"""