Papers
arxiv:2411.02337

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Published on Nov 4
· Submitted by ShawLiu on Nov 5
#3 Paper of the day
Authors:
,
,
,
,

Abstract

Large language models (LLMs) have shown remarkable potential as autonomous agents, particularly in web-based tasks. However, existing LLM web agents heavily rely on expensive proprietary LLM APIs, while open LLMs lack the necessary decision-making capabilities. This paper introduces WebRL, a self-evolving online curriculum reinforcement learning framework designed to train high-performance web agents using open LLMs. WebRL addresses three key challenges in building LLM web agents, including the scarcity of training tasks, sparse feedback signals, and policy distribution drift in online learning. Specifically, WebRL incorporates 1) a self-evolving curriculum that generates new tasks from unsuccessful attempts, 2) a robust outcome-supervised reward model (ORM), and 3) adaptive reinforcement learning strategies to ensure consistent improvements. We apply WebRL to transform open Llama-3.1 and GLM-4 models into proficient web agents. On WebArena-Lite, WebRL improves the success rate of Llama-3.1-8B from 4.8% to 42.4%, and from 6.1% to 43% for GLM-4-9B. These open models significantly surpass the performance of GPT-4-Turbo (17.6%) and GPT-4o (13.9%) and outperform previous state-of-the-art web agents trained on open LLMs (AutoWebGLM, 18.2%). Our findings demonstrate WebRL's effectiveness in bridging the gap between open and proprietary LLM-based web agents, paving the way for more accessible and powerful autonomous web interaction systems.

Community

Paper author Paper submitter

We are thrilled to present WebRL and its application to the first Phone Use and Web Browser Use foundation model--AutoGLM (https://arxiv.org/abs/2411.00820). Glad to discuss here!

Sign up or log in to comment

Models citing this paper 3

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2411.02337 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2411.02337 in a Space README.md to link it from this page.

Collections including this paper 12