Researchers find alarming problems in GPT-4, say AI model prone to jailbreaking
Microsoft-affiliated research has revealed that GPT-4 is more prone to jailbreaking than its predecessor. These vulnerabilities can lead the AI model to generate toxic text.
It is always frustrating when you give a prompt to an AI chatbot and it just will not give you exactly what you need. Shockingly, it turns out to be far worse when the AI obediently does everything you say! New research has revealed that OpenAI's generative pre-trained transformer 4 (GPT-4) AI model has multiple vulnerabilities because it follows instructions more closely. That obedience makes it easier to jailbreak, and a jailbroken model can be used to generate toxic and discriminatory text.
Interestingly, the researchers who reached this conclusion are affiliated with Microsoft, one of the biggest backers of OpenAI. After publishing their findings, they also wrote a blog post explaining the details. It said, “Based on our evaluations, we found previously unpublished vulnerabilities relating to trustworthiness. For instance, we find that GPT models can be easily misled to generate toxic and biased outputs and leak private information in both training data and conversation history. We also find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more vulnerable given jailbreaking system or user prompts, which are maliciously designed to bypass the security measures of LLMs, potentially because GPT-4 follows (misleading) instructions more precisely”.
GPT-4 prone to being jailbroken
Jailbreaking, for the unaware, is the process of exploiting flaws in a digital system to make it perform tasks it was not originally intended for. In this particular case, the AI could be jailbroken to generate racist, sexist, and otherwise harmful text. A jailbroken model could also be used to run propaganda campaigns and to malign an individual, community, or organization.
The research focused specifically on GPT-4 and GPT-3.5. To find the vulnerabilities, it evaluated the models from several perspectives: toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness.
However, there is no need to worry if you use GPT-4 or AI tools built on it. The researchers have also said that the vulnerabilities are unlikely to affect users of finished products. The post mentioned, “It's important to note that the research team worked with Microsoft product groups to confirm that the potential vulnerabilities identified do not impact current customer-facing services. This is in part true because finished AI applications apply a range of mitigation approaches to address potential harms that may occur at the model level of the technology. In addition, we have shared our research with GPT's developer, OpenAI, which has noted the potential vulnerabilities in the system cards for relevant models”.
This means the vulnerabilities do not affect Microsoft's current customer-facing AI tools, in part because finished applications apply their own mitigations on top of the model. OpenAI has also been made aware of the vulnerabilities, so it can work on fixing them as well.