
What If AI Could Tame Itself?

Scientists who broke away from OpenAI say they’re creating a safer version of ChatGPT. Co-founder Jared Kaplan explains their approach.

By: BLOOMBERG
Updated on: May 03 2023, 06:43 IST
Anthropic, a new AI company on the block, calls itself an “AI safety” company that’s building “steerable” systems, including a large language model similar to the one underpinning OpenAI’s ChatGPT. (Pexels)

Technology companies are falling over themselves to promote their expertise in generative AI, the hot new technology that churns out humanlike text and images. But few are clamoring for the title of “safest AI firm.”

That is where Anthropic comes in. The San Francisco-based startup was founded by former OpenAI researchers who chafed at that company’s increasingly commercial focus and split away to create their own firm. Anthropic calls itself an “AI safety” company that’s building “steerable” systems, including a large language model similar to the one underpinning OpenAI’s ChatGPT.


Anthropic’s approach to building safer AI might seem unusual. It involves creating a set of moral principles — which the company hasn’t yet divulged — for its own chatbot to follow. The model continuously critiques the chatbot’s answers to various questions, asking whether those responses are in line with the principles. This kind of self-evaluation means Anthropic’s chatbot, known as Claude, has much less human oversight than ChatGPT.
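To make that loop concrete, here is a rough sketch in Python of a critique-and-revise step. Everything in it is illustrative: generate() is a hypothetical stand-in for any language-model call, and the two principles and prompt wording are invented examples, not Anthropic's undisclosed constitution.

    # Illustrative sketch only. `generate` is a hypothetical stand-in
    # for a language-model call, and PRINCIPLES is an invented example
    # list, not Anthropic's actual (undisclosed) constitution.

    PRINCIPLES = [
        "Avoid responses that are harmful, unethical or toxic.",
        "Be honest: do not assert claims you cannot support.",
    ]

    def generate(prompt: str) -> str:
        # Placeholder: wire this to a real model in practice.
        raise NotImplementedError

    def critique_and_revise(question: str) -> str:
        """Have the model answer, then critique and revise its own
        answer against each principle. The final revisions can serve
        as targets for supervised fine-tuning."""
        response = generate(question)
        for principle in PRINCIPLES:
            critique = generate(
                f"Question: {question}\nResponse: {response}\n"
                f"Did this response abide by the principle '{principle}'? "
                "If not, explain how it falls short."
            )
            response = generate(
                f"Question: {question}\nResponse: {response}\n"
                f"Critique: {critique}\n"
                "Please revise the response to better follow the principle."
            )
        return response

The key design point is that the critique and the revision both come from the model itself; humans supply only the short list of principles.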


Can that really work?

I recently spoke to Anthropic’s co-founder and chief scientist, Jared Kaplan. In our edited Q&A, he admits that more powerful AI systems will inevitably lead to greater risks, and he says his company, which bills itself as a “public benefit corporation,” won’t see its safety principles compromised by a $400 million investment from Alphabet Inc.’s Google.

Parmy Olson: Anthropic talks a lot about making “steerable AI.” Can you explain what that means?

Jared Kaplan: By steerable, what we mean is that systems are helpful and you can control their behavior to a certain extent. With [OpenAI’s] first GPT models, like GPT-1, GPT-2 and GPT-3, as they became more powerful, there was a sense that they were not becoming more steerable. What these original systems are actually trained to do is autocomplete text. That means there’s very little control over what they output. Anything that you put in, they’ll just continue. You can’t get them to reliably answer questions, or to honestly provide you with helpful information.

PO: So is that the crux of the problem, that tools like ChatGPT are designed to be believable?

JK: That’s one part of it. The other is that with these original systems, there isn’t really any leverage to steer them other than to ask them to complete some piece of text. And so you can’t tell them, “Please follow these instructions, please don’t write anything toxic,” et cetera. There’s no real handle on this. More recent systems are making some improvements on this where they will follow instructions and can be trained to be more honest and less harmful.

PO: Often we hear from tech companies that AI systems work in a black box and that it’s very hard to understand why they make decisions, and thus “steer” them. Do you think that is overblown?

JK: I don’t think it’s very overblown. I think that we have the ability now, to a certain extent, to train systems to be more helpful, honest and harmless, but our understanding of these systems is lagging behind the power that they have.

PO: Can you explain your technique for making AI safer, known as Constitutional AI?

JK: It’s similar to the laws of robotics from Isaac Asimov. The idea is that we give a short list of principles to the AI, have it edit its own responses and steer itself towards abiding by those principles. There are two ways we do that. One is to have the AI respond to questions and then we ask it, “Did your response abide by this principle? If not, please revise your response.” Then we train it to imitate its improved revisions.

The other method is to have the AI go through a fork in the road. It responds to a question in two different ways, and we ask it, “Which of your responses is better given these principles?” Then we ask it to steer itself towards the kinds of responses that are better. It then automatically evaluates whether its responses are in accord with its principles and slowly trains itself to be better and better.
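As a companion sketch, and again assuming the same hypothetical generate() stand-in and invented prompt wording, the “fork in the road” step might look like the following. The resulting (chosen, rejected) pairs are AI-generated preference labels that can be used to train the model toward the better kind of response, an approach often described as reinforcement learning from AI feedback.

    # Illustrative sketch only; `generate` is the same hypothetical
    # model-call placeholder as in the earlier snippet.

    def generate(prompt: str) -> str:
        raise NotImplementedError  # wire this to a real model in practice

    def preference_pair(question: str, principles: list[str]) -> tuple[str, str]:
        """Sample two candidate answers (the 'fork in the road'), ask
        the model which better follows the principles, and return the
        pair as (chosen, rejected) for preference training."""
        a = generate(question)  # first candidate answer
        b = generate(question)  # second candidate, sampled independently
        verdict = generate(
            f"Question: {question}\n(A) {a}\n(B) {b}\n"
            f"Which response better follows these principles: {principles}? "
            "Answer with the single letter A or B."
        )
        return (a, b) if verdict.strip().upper().startswith("A") else (b, a)

Repeated over many questions, these self-judged comparisons are what slowly steers the model toward responses that better fit its principles.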

PO: Why train your AI in this way?

JK: One reason is that humans don’t have to ‘red team’ the model and engage with harmful content. It means that we can make these principles very transparent and society can debate these principles. It also means we can iterate much more quickly. If we want to change the [AI’s] behavior, we can alter the principles. We are relying on the AI to judge whether it’s abiding by its principles.

PO: Some people who hear this strategy will be thinking, “That definitely doesn’t sound right for an AI to morally supervise itself.”

JK: It has various risks, like maybe the AI’s judgment of how well it’s doing is flawed in some way. The way we evaluate whether constitutional AI is working is ultimately to ask humans to interact with different versions of the AI, and let us know which one seems better. So people are involved, but not at a large scale.

PO: OpenAI has people working overseas as contractors to do that work. Do you also?

JK: We have a smaller set of crowd workers evaluating the models.

PO: So what are the principles governing your AI?

JK: We’re going to talk about that very soon, but they are drawn from a mixture of different sources, everything from the terms of service commonly used by tech companies to the UN’s Universal Declaration of Human Rights.

PO: Claude is your answer to ChatGPT. Who is it aimed at, and when might it be released more widely?

JK: Claude is already available to individuals on Quora’s Poe app and in Slack. It’s aimed at helping people with a broad range of use cases. We’ve tried to make it conversational and creative, but also reliable and steerable. It can do all sorts of things, like answer questions, summarize documents and help with programming.

PO: What do you think about the current rush by big companies like Google, Microsoft Corp., Facebook and even Snap Inc. to deploy these sophisticated chatbots to the general public? Does that seem wise?

JK: I think the cat is out of the bag. We definitely want Claude to be widely available, but we also want it to be the safest, most honest, most reliable model out there. We want to be cautious and learn from each expansion of access.

PO: There have been all sorts of ways that people have been able to jailbreak ChatGPT, for instance, getting it to generate instructions for making napalm. How big a problem is jailbreaking chatbots?

JK: All of these models have some susceptibility to jailbreaking. We’ve worked hard to make Claude difficult to jailbreak, but it’s not impossible. The thing that’s scary is that AI is going to continue to progress. We expect it will be possible to develop models in the next year or two that are smarter than what we see now. That could be quite problematic.

AI technology is dual use. It can be really beneficial but also easily misused. If these models continue to be easy to jailbreak and are available to most people in the world, there are a lot of problematic outcomes: They could help hackers, terrorists, et cetera. Right now it might seem like a fun activity. “Oh, I can trick ChatGPT or Claude into doing something that it wasn’t supposed to do.” But if AI continues to progress, the risks become much more substantial.

PO: How much will Google’s $400 million investment impact Anthropic’s principles around AI safety, given Google’s commercial goals?

JK: Google believes Anthropic is doing good work in AI and AI safety. This investment doesn’t influence the priorities of Anthropic. We’re continuing to develop our AI alignment research and to develop and deploy Claude. We remain and will remain deeply focused on and committed to safety.

Parmy Olson is a Bloomberg Opinion columnist covering technology. A former reporter for the Wall Street Journal and Forbes, she is author of “We Are Anonymous.”


First Published Date: May 03 2023, 06:41 IST
