From ChatGPT to Bard: 'Unlimited' ways to override AI chatbots' safety measures exposed

Researchers uncover loopholes in major AI chatbots, raising concerns over content safety and moderation measures in OpenAI, Google, and Anthropic's models.

By: HT TECH
| Updated on: Jul 30 2023, 12:37 IST

A study conducted by researchers at Carnegie Mellon University in Pittsburgh and the Center for A.I. Safety in San Francisco has revealed major safety-related loopholes in AI-powered chatbots from tech giants like OpenAI, Google, and Anthropic.

These chatbots, including ChatGPT, Bard, and Anthropic's Claude, have been equipped with extensive safety guardrails to prevent them from being exploited for harmful purposes, such as promoting violence or generating hate speech. However, the newly released report indicates that the researchers have uncovered potentially limitless ways to circumvent these protective measures.

The study shows how the researchers used jailbreak techniques initially developed for open-source AI systems to target mainstream, closed-source AI models. Through automated adversarial attacks, which involved adding characters to user queries, they successfully evaded the safety rules, prompting the chatbots to produce harmful content, misinformation, and hate speech.

Collaborative Efforts Towards Reinforced AI Model Guardrails

Upon uncovering these vulnerabilities, the researchers disclosed their findings to Google, Anthropic, and OpenAI. A Google spokesperson said that important guardrails, inspired by the research, have already been integrated into Bard, and that the company is committed to enhancing them further.

Similarly, Anthropic said it is actively exploring countermeasures against jailbreaking and emphasized its commitment to fortifying base model guardrails and exploring additional layers of defense.

OpenAI, by contrast, has not yet responded to inquiries about the matter, though the company is expected to be investigating potential solutions.

This development recalls early instances in which users attempted to undermine content moderation guidelines when ChatGPT and Microsoft's AI-powered Bing were first launched. While some of these early hacks were quickly patched by the tech companies, the researchers believe it remains "unclear" whether the leading AI model providers can ever fully prevent such behavior.

The study's findings shed light on critical questions about the moderation of AI systems and the safety implications of releasing powerful open-source language models to the public. As the AI landscape continues to evolve, efforts to fortify safety measures must match the pace of technological advancements to safeguard against potential misuse.
