OpenAI’s flagship AI model has gotten more trustworthy but easier to trick

October 17, 2023


OpenAI’s GPT-4 large language model may be more trustworthy than GPT-3.5 but also more vulnerable to jailbreaking and bias, according to research backed by Microsoft.

In the paper, researchers from the University of Illinois Urbana-Champaign, Stanford University, the University of California, Berkeley, the Center for AI Safety, and Microsoft Research gave GPT-4 a higher trustworthiness score than its predecessor. That means they found it was generally better at protecting private information, avoiding toxic results such as biased information, and resisting adversarial attacks. However, it could also be told to ignore security measures and leak personal information and conversation histories. Researchers found that users can bypass safeguards…
