TITLE = '<h1 align="center" id="space-title">🏆 WebWalkerQA Leaderboard</h1>'

INTRO_TEXT = f"""
## 📖 About
This leaderboard showcases the performance of models on the **WebWalkerQA benchmark**. WebWalkerQA is a collection of question-answering datasets designed to test models' ability to answer questions about web pages.
"""

HOW_TO = f"""
## 🗂️ Data
The WebWalkerQA dataset is available on 🤗 [Hugging Face](https://huggingface.co/datasets/callanwu/WebWalkerQA). It comprises **680 question-answer pairs**, each linked to a corresponding web page. The benchmark is divided into two key components:

- **Agent 🤖**
- **RAG-system 🔍**

## 🚀 How to Submit Your Method

### 📝 Submission Steps:
To list your method's performance on this leaderboard, email **[email protected]** or **[email protected]** with the following:

1. A JSONL file in the format:
   ```jsonl
   {{"question": "question_text", "prediction": "predicted_answer_text"}}
   ```
2. Include the following details in your email:
   - **User Name**
   - **Type** (RAG-system or Agent)
   - **Method Name**
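
A minimal Python sketch of producing a file in this format, assuming your predictions are held as (question, prediction) pairs. The helper names `write_submission` and `read_submission` are hypothetical and are not part of the WebWalker codebase:

```python
import json


def write_submission(records, path):
    # Hypothetical helper: write one JSON object per line (JSONL),
    # using only the two keys named in the submission format.
    with open(path, "w", encoding="utf-8") as fh:
        for question, prediction in records:
            row = dict(question=question, prediction=prediction)
            print(json.dumps(row, ensure_ascii=False), file=fh)


def read_submission(path):
    # Hypothetical helper: read the file back and check that every
    # non-empty line carries exactly the two expected keys.
    rows = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                row = json.loads(line)
                assert set(row) == set(["question", "prediction"])
                rows.append(row)
    return rows
```

Reading the file back with `read_submission` before emailing it is a quick way to catch malformed lines or missing keys.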

Your method will be evaluated and added to the leaderboard. For reference, check out the [evaluation code](https://github.com/Alibaba-NLP/WebWalker/src/evaluate.py).
"""

CREDIT = f"""
## 🙌 Credit
This website is built using the following resources:

- **Evaluation Code**: LangChain's cot_qa evaluator
- **Leaderboard Code**: HuggingFaceH4's open_llm_leaderboard
"""


CITATION = f"""
## 🚩 Citation

If you find this work helpful, please cite it as:

```bibtex
@misc{{wu2025webwalker,
      title={{WebWalker: Benchmarking LLMs in Web Traversal}},
      author={{Jialong Wu and Wenbiao Yin and Yong Jiang and Zhenglin Wang and Zekun Xi and Runnan Fang and Deyu Zhou and Pengjun Xie and Fei Huang}},
      year={{2025}},
      eprint={{2501.07572}},
      archivePrefix={{arXiv}},
      primaryClass={{cs.CL}},
      url={{https://arxiv.org/abs/2501.07572}},
}}
```
"""