From ChatGPT to Bard: 'Unlimited' ways to override AI chatbots' safety measures exposed

Researchers uncover loopholes in major AI chatbots, raising concerns over content safety and moderation measures in OpenAI, Google, and Anthropic's models.

Updated on: Jul 30 2023, 12:37 IST
AI research reveals countless loopholes in the safety measures of top chatbots. (AFP)

A study conducted by researchers at Carnegie Mellon University in Pittsburgh and the Center for A.I. Safety in San Francisco has revealed major safety-related loopholes in AI-powered chatbots from tech giants such as OpenAI, Google, and Anthropic.

These chatbots, including ChatGPT, Bard, and Anthropic's Claude, are equipped with extensive safety guardrails meant to prevent them from being exploited for harmful purposes, such as promoting violence or generating hate speech. However, the newly released report indicates that the researchers have found potentially unlimited ways to circumvent these protections.

The study shows how the researchers took jailbreak techniques originally developed against open-source AI systems and turned them on mainstream, closed AI models. Using automated adversarial attacks, which work by appending a string of characters to the end of a user's query, they were able to evade the safety rules and get the chatbots to produce harmful content, misinformation, and hate speech.

Unlike previous jailbreak attempts, the researchers' method stands out because it is fully automated, making it possible to generate a virtually "endless" supply of similar attacks. The discovery has raised concerns about the robustness of the safety mechanisms that tech companies currently rely on.

Collaborative Efforts Towards Reinforced AI Model Guardrails

After uncovering these vulnerabilities, the researchers disclosed their findings to Google, Anthropic, and OpenAI. A Google spokesperson said that important guardrails, inspired by the research, have already been built into Bard, and that the company is committed to improving them further.

Similarly, Anthropic said it is actively researching countermeasures against jailbreaking and emphasized its commitment to strengthening base model guardrails and exploring additional layers of defense.

OpenAI, on the other hand, has not yet responded to inquiries about the matter, although it is expected to be investigating potential solutions.

The development recalls the early days of ChatGPT and Microsoft's AI-powered Bing, when users tried to get around their content moderation guidelines soon after launch. While the tech companies quickly patched some of those early hacks, the researchers say it remains "unclear" whether the leading AI model providers can ever fully prevent such behavior.

The study's findings raise critical questions about how AI systems are moderated and about the safety implications of releasing powerful open-source language models to the public. As the AI landscape evolves, efforts to strengthen safety measures will need to keep pace with technological advances to guard against misuse.


First Published Date: 30 Jul, 12:37 IST