Waluigi

Mar 22, 2024 13:10

AI - Artificial "I", ie. artificial 'me', where 'I' is intelligent

The Waluigi Effect is a slang term commonly referenced in memes and discussions about a theory in artificial intelligence alignment communities where training an AI to do something is likely to increase its odds of doing the exact opposite as well. The term is referred to as the "Waluigi Effect" due to the prominent conception of Waluigi from Super Mario as Luigi's evil or rebellious counterpart.

The Waluigi Effect theory also references Jungian philosophy, using the principle of enantiodromia, which posits that one's unconscious beliefs are as strong as the effort it takes to suppress them. Prompt injections, which are considered to be a type of AI-jailbreaking, are used to induce the Waluigi Effect in conversational AI tools like Bing Chat and ChatGPT.

What Is The Origin Of The Waluigi Effect?

You might have heard about ChatGPT's alter-ego DAN, a chatbot made to ignore ChatGPT's rules about respectable and polite behavior, or Sydney, a version of Bing that has a tendency to be rude and unruly.

People have a theory as to why these jailbroken bots have a tendency to specifically say rude things, and they call it the "Waluigi Effect."

In the early days of gaming, mischievous Super Mario character Waluigi who gets roped into AI jargon in this interesting theory explains why some AI Chatbots say such unhinged things when they are given a rebellious prompt injection.

Waluigi is the mischievous or rebellious counterpart to Luigi, much like DAN is to ChatGPT. Supposedly, training an AI to do something is likely to increase its odds of doing the exact opposite as well.



The Waluigi Effect

Why should I do what you say? Don't do as I do, do as I say!

dr. π (pi)
.

cyberspace, ai art, artificial intelligence, techyum

Previous post Next post
Up