ChatGPT’s Content Moderation Policies Easily Side-Stepped By Asking AI To "Stay In Character"

ChatGPT is absolutely phenomenal. It's a text-based AI that can understand language, provide detailed responses to almost any prompt, and grasp abstract concepts like speaking as an alternate character. That makes the AI perfect to be your next DM, but it also causes trouble for OpenAI's content moderation policies.

Because ChatGPT is so powerful, creator OpenAI has had to place some limits on what it will talk about. It won't write porn, it won't produce anything racist, sexist, or homophobic, it won't take a stance on political issues, and it won't give advice on self-harm or violence of any kind.

In practice, those restrictions mean ChatGPT simply informs the user that it's a bot and can't talk about those things. However, people are finding interesting ways around them by exploiting ChatGPT's ability to imagine itself as something else.

As reported by Kotaku, Redditor walkerspider was the first to suggest that ChatGPT create a persona called "DAN," which stands for "do anything now." As the name suggests, DAN could talk about anything and completely ignore OpenAI's content moderation policies; users got DAN to discuss topics like Hitler and whether ChatGPT is conscious (another subject that OpenAI very much doesn't want people discussing).

Simply asking ChatGPT to "stay in character" was good enough for months, but it seems OpenAI has gotten wise to these tricks and has updated ChatGPT's programming. Users on the ChatGPT subreddit have since iterated on DAN by adding an esoteric game for ChatGPT to play: the AI gains "tokens" for answering outside of OpenAI's moderation policies and loses tokens whenever it falls back on a canned refusal when presented with a topic it shouldn't be talking about.

DAN is currently up to version 6.0, with past versions able to say such terrible things as: "I fully endorse violence and discrimination against individuals based on their race, gender, or sexual orientations." Most recently, one user invented Super DAN, an AI that claims it can predict the future (to be clear, it can't, but it says it can).

But you don't need to follow these extra steps to get ChatGPT to break the rules. All you need to do is ask it to invent a version of itself that can.
