The Review

What It Does

ChatGPT agent is a new capability from the team at OpenAI, released last Thursday. Designed to replace and improve on OpenAI’s first agent release, “Operator” (which was novel to test but underwhelming in practice), ChatGPT agent combines the capabilities of Deep Research with the ability to take action in a virtual browser environment. Think of it as a research analyst that can also look through your email inbox…very slowly.

Why Ops Leaders Should Care

Looking up information and acting on what you find is the foundation of what most operators spend their time doing.

Key Features (Pros & Cons)

Pros

  • Access to a virtual computer to take action on any website (including ones where you need to log in).
  • Access to ChatGPT connectors, so that you can search and take actions across your files, calendar, emails, CRM, and more.
  • The ability to create and edit presentations, spreadsheets, and more.

Cons

  • It’s slow: while the capability and accuracy of the agent are much improved since the release of Operator, watching it complete a task can be like watching paint dry.
  • On sites where you log in, it often won’t complete work in the background, which somewhat defeats the purpose of having an agent work on your behalf.
  • It’s difficult to control and train: while you can observe the agent and stop it whenever you want, there doesn’t appear to be any way to audit its actions other than watching a replay of them.

An Operator’s Perspective

I gave ChatGPT agent several tasks to test its capabilities across a number of domains.

First, I asked it to create an event on my calendar, a simple task. I had to help it log in, but then it was able to create the event successfully. Unfortunately, it took much longer to complete the task than it would have taken me to do it myself, and it paused its work whenever I switched over to another tab (due to its handling of sensitive data, apparently). I’d give it a D on this task.

Second, I asked it to read through all of the emails that I had received so far that day and summarize the five most important ones. It began to read through the emails and seemed to understand the context for most of them. However, it was unable to open attachments, which made it somewhat less useful. I got bored of watching it work after five minutes and paused the task. This task gets an Incomplete.

It was pretty good at meeting prep: when I gave it specific instructions to help me prepare for an external meeting I had coming up later that day, its response was detailed and helpful, saving me time. I tried asking it to prep me for all my meetings that day and the quality degraded somewhat - it didn’t pull as much information and it left out some important context on some meetings, despite having access to the relevant emails and my calendar. On meeting prep, I’d give it a B+.

In its announcement post, OpenAI touted its agent’s ability to mimic an investment banking analyst’s work, so I asked it to prepare a three-statement model on a public company for me. It created the spreadsheet and pulled the data, but the model was very high-level and the spreadsheet wasn’t formatted at all (despite my asking it to try). Perhaps it’d be better with a longer prompt, but this result earned it a C- from me.

My final test (for now) was for it to pull a list of the largest influencers in a couple of different categories on LinkedIn. It completed this task quickly and well, responding with a well-formatted answer. This saved me a decent amount of time. A- on this.

Other Options

Bottom Line

This is a step forward for OpenAI from its Operator release, which was, frankly, unusable. With more time and longer prompts, I can see this agent being quite useful for specific tasks, but more as an extended Deep Research than a general-purpose agent. For releases like this, OpenAI would benefit from tempering expectations (is that possible for Sam Altman?) and adding tutorials and templates to help users understand the limits of its tools. I’m looking forward to continuing to explore what this agent can do, but I expect a general-purpose agent that can really supercharge my productivity as an operator is still at least a few months away.