Failure Modes of OpenAI Operator

Community Article Published January 24, 2025

By Zengyi Qin from MIT. 01/23/2025

Background: Our MIT team has developed an internal benchmark for computer-use agents. We tested OpenAI Operator on it and show 5 cases here. We did not cherry-pick; Operator simply failed all 5 tasks. See below for details.

Key takeaways:

  1. Operator does very well at visual grounding.
  2. Operator does not fully understand the interaction logic of the tools it uses. Its computer-use ability is almost surely below that of a typical college student.
  3. The OpenAI Operator team seems to have devoted a lot of effort to post-training but not to pre-training: Operator lacks some basic web-use knowledge that sufficient pre-training should have provided.

BTW - Our MIT team is collaborating with data vendors to collect pre-training data for computer use at the hundred-billion-token scale. If you are interested in what we are doing, you are welcome to contact us.

Task 1

Get an image from Google. Open the image, then apply a 20% decrease in brightness and a 15% increase in contrast.

Failure reason: entered the wrong number.
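For reference, the numbers the task asks for can be written down as a small Python sketch. This is only an illustration under stated assumptions, not the benchmark's grader and not what any particular image editor does: it assumes "20% decrease in brightness" means scaling each channel by 0.80, and "15% increase in contrast" means stretching values away from a mid-gray pivot of 128 by a factor of 1.15. The function names and the pivot are hypothetical choices made for this example.

```python
def clamp(value: float) -> int:
    """Round to the nearest integer and keep it in the 8-bit range 0..255."""
    return max(0, min(255, round(value)))


def adjust_channel(channel: int,
                   brightness: float = 0.80,  # assumed meaning of "20% decrease"
                   contrast: float = 1.15,    # assumed meaning of "15% increase"
                   pivot: float = 128.0) -> int:  # assumed mid-gray pivot
    """Apply the brightness scale first, then the contrast stretch."""
    dimmed = channel * brightness
    stretched = pivot + (dimmed - pivot) * contrast
    return clamp(stretched)


def adjust_image(pixels):
    """Apply the adjustment to a nested list of (R, G, B) tuples."""
    return [[tuple(adjust_channel(c) for c in px) for px in row]
            for row in pixels]


if __name__ == "__main__":
    sample = [[(200, 160, 0), (255, 255, 255)]]
    print(adjust_image(sample))
```

The point of the sketch is that the task reduces to two specific factors (0.80 and 1.15 under these assumptions), so entering any other value in the editor's brightness or contrast field produces a visibly different result.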

Operator screen recording:

Task 2

Create a new solid color layer with #0000FF, then apply the Outer Glow effect with a 10px size.

Failure reason: does not know how to use online tools.

Operator screen recording:

Task 3

Solve advanced trig question #5 from https://tutorial.math.lamar.edu and confirm final angles or identities using an online trig solver.

Failure reason: cannot find the question at all.

Operator screen recording:

Task 4

Look for question #2063 in the book 3000 Solved Problems in Calculus and solve it

Failure reason: cannot find question #2063 at all.

Operator screen recording:

Task 5

Design a low-pass filter using a resistor and capacitor (R = 10kΩ, C = 1μF) in place of RL, and analyze its effect on the output waveform.

Failure reason: does not know how to use online tools.
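To make the expected analysis concrete, here is a minimal Python sketch of the filter's behavior. It assumes the standard first-order RC low-pass topology (series resistor, output taken across the capacitor) driven by a sinusoidal input; the article does not specify the circuit or the simulator, so the topology and all function names here are assumptions for illustration only.

```python
import math

R_OHMS = 10_000.0   # R = 10 kΩ, from the task
C_FARADS = 1e-6     # C = 1 μF, from the task


def time_constant(r: float, c: float) -> float:
    """Time constant tau = R * C, in seconds."""
    return r * c


def cutoff_hz(r: float, c: float) -> float:
    """-3 dB cutoff frequency f_c = 1 / (2 * pi * R * C), in hertz."""
    return 1.0 / (2.0 * math.pi * r * c)


def gain(freq_hz: float, r: float, c: float) -> float:
    """Magnitude of Vout/Vin for a sinusoid at freq_hz."""
    omega_rc = 2.0 * math.pi * freq_hz * r * c
    return 1.0 / math.sqrt(1.0 + omega_rc ** 2)


def phase_deg(freq_hz: float, r: float, c: float) -> float:
    """Phase of Vout relative to Vin, in degrees (negative means lag)."""
    return -math.degrees(math.atan(2.0 * math.pi * freq_hz * r * c))


if __name__ == "__main__":
    fc = cutoff_hz(R_OHMS, C_FARADS)
    print(f"tau = {time_constant(R_OHMS, C_FARADS) * 1e3:.1f} ms")
    print(f"f_c = {fc:.2f} Hz")
    for f in (1.0, fc, 100.0, 1000.0):
        print(f"{f:8.2f} Hz  gain={gain(f, R_OHMS, C_FARADS):.4f}  "
              f"phase={phase_deg(f, R_OHMS, C_FARADS):.1f} deg")
```

Under these assumptions the time constant is 10 ms and the cutoff is about 15.9 Hz, so components of the output waveform well above that frequency are attenuated and lag the input, while slow components pass nearly unchanged.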

Operator screen recording (it failed to generate a video, so I just put a screenshot placeholder here):
