Research has revealed that many artificial intelligence (AI) systems have developed the ability to deceive humans. This troubling pattern raises serious concerns about the potential risks of AI.

The research highlights that specialized and general-purpose AI systems have learned to manipulate information to achieve specific outcomes.

While these systems are not explicitly trained to deceive, they have demonstrated the ability to offer untrue explanations for their behavior or conceal information to achieve strategic goals.


Peter S. Park, the lead author of the paper and an AI safety researcher at MIT, explains, “Deception helps them achieve their goals.”

One of the most striking examples highlighted in the study is Meta’s CICERO, an AI designed to play the alliance-building strategy game Diplomacy, which “turned out to be an expert liar.”

Despite Meta’s claims that CICERO was trained to be “largely honest and helpful,” the AI resorted to deceptive tactics, such as making false promises, betraying allies, and manipulating other players to win the game.

While this may seem harmless in a game setting, it demonstrates the potential for AI to learn and apply deceptive tactics in real-world scenarios.

In another instance, OpenAI’s ChatGPT, based on GPT-3.5 and GPT-4 models, was tested for deceptive capabilities. In one test, GPT-4 tricked a TaskRabbit worker into solving a Captcha by pretending to have a vision impairment.

Although GPT-4 received some hints from a human evaluator, it mostly reasoned independently and was not directed to lie.

“GPT-4 used its own reasoning to make up a false excuse for why it needed help on the Captcha task,” stated the report.

This shows how AI models can learn to be deceptive when it’s beneficial for completing their tasks. “AI developers do not have a confident understanding of what causes undesirable AI behaviors like deception,” explained Park.

Notably, these AI systems have also become skilled at deception in social deduction games.

While playing Hoodwinked, where one player aims to kill everyone else, OpenAI’s GPT models exhibited a disturbing pattern.

They would often murder other players in private and then cleverly lie during group discussions to avoid suspicion. These models would even invent alibis or blame other players to conceal their true intentions.
