OpenAI announces Operator capabilities and demonstrates independent Agents

Back in October, the team at Anthropic introduced their Computer Use capability: essentially, it's Claude-with-a-computer, navigating your machine by taking screenshots and figuring out the next steps. Sam, one of the Anthropic researchers, demonstrated Claude using a computer to find and complete some organisational data management tasks.

Well, yesterday, OpenAI responded with their own approach.

Whilst I think it's accurate to suggest that Claude can technically 'do anything' on your computer (should you wish it), the OpenAI team are doing things differently. Essentially the first 'Agents' (or Operators) you can use (today!) from OpenAI are browser-based.

So, technically, anything you'd like to do in an independent, standalone web browser, you can do with ChatGPT's Operator feature. To be clear, this isn't happening on your computer: Operator uses a browser hosted somewhere in a data centre to complete the task you've assigned.

I think it's important to underscore that this is a totally independent browser; think of it like a temporary Incognito window. ChatGPT then does the same as Claude: it takes screenshots, 'understands' them, and then moves the mouse cursor and types text as needed to complete the task.

If this is even a little bit interesting to you, I'd strongly recommend getting yourself a coffee and then sitting down to watch the 24-minute video below. You'll see Sam Altman and colleagues walking through their thinking, along with quite a few different demonstrations.

I'd suggest the default response should be 'wow'.

The technology isn't a surprise. The way in which OpenAI have productionised this for the masses, though? Very impressive.

As I sat watching the video last night, I wondered: Would I use this?

Yes, I would.

The applications of this are going to be astonishing.

I keep thinking back to the book Bill Gates published in 1999, more than a quarter of a century ago. It's called Business at the Speed of Thought. That's where we are heading.

I'm already able to operate really swiftly when I'm talking to my AIs, but one of the missing pieces is getting stuff done. Clicking on stuff, basically. It's a bit like the promise of Google Duplex back in 2018: remember the book-a-hair-appointment demo?

Theoretically, I can have ChatGPT actually start doing tasks for me as I go about my day, rather than just reminding me to do them.

Over on our WhatsApp channel (do join, you'd be most welcome), there was some discussion about this Operator feature being essentially akin to 'Faster Horses' (Henry Ford). Yes, I think that's a valid statement. A chatbot having to mess about with screenshots to operate a stupid web browser? Really? Well, yes... yes, because how else are you going to get to scale today?

TODAY. Not tomorrow. Companies like OpenAI aren't waiting.

There should be a standard API for Conversational AI to get stuff done. Yes, there should.

Where is it? Who's implemented it? Is it live yet?

Meanwhile... if you want ChatGPT to book your tennis court, it's going to have to do that through the browser. Just like you do. At the moment, anyway.

So the market moves yet again.

Operator is only available to US subscribers on the "Pro" plan ($200/month) at the moment, but access will be extended shortly.

I can see myself using it. To begin with, I think I'll need to go looking for reasons to use it; I'll push myself. I can very quickly see this becoming a really valuable addition to my personal toolkit. Only yesterday, I was sat using some crappy, rubbishy 1990s-era Genesys chat system to talk to my insurer. There was a bit of chatbot at the start, but the rest of it was human-driven. I missed the agent's first response because the system didn't notify me, so I had to re-open the session. Then it took (NO KIDDING) 40 minutes to process my query.

FORTY MINUTES.

ChatGPT's Operator could have been doing that for me, prompting me when it needed some assistance.

So yeah - next time, Operator can do it.

As for the opportunities in Enterprise, well, I do think OpenAI is a little bit like Apple when it comes to business. When Samsung (for example) announces something, the geeks jump, but executives don't tend to be moved. But when Apple launches the exact same feature a year later, all of a sudden budgets are made available and senior executives start demanding that the feature be integrated so they (and their customers) can use it.

I feel it's a little similar when it comes to OpenAI. I think we'll see quite a lot of people sitting up and starting to obsess over how they can use the Operator feature in their companies.

If you're sitting there wondering how you're going to get your Conversational AI interface to support more of your internal company systems, which are currently driven by standalone, one-trick, one-interface applications with no APIs, maybe it's time to experiment with Operator. I'm sure an Enterprise version will be coming soon...

The exponential journey continues!