
Curiosity about the most advanced AI chatbots is growing, especially regarding whether they can show signs of self-awareness. Beyond the fascinating philosophical questions this raises, such an ability could also create serious security risks. A Swiss research team has developed a test to see whether AI models can recognize their own outputs.
The idea that large language models (LLMs) might be self-aware has often been met with doubt by experts. In 2022, for instance, Google engineer Blake Lemoine claimed that the company’s LaMDA model had become sentient; the claim was widely dismissed, and Lemoine subsequently left the company. More recently, Anthropic’s Claude 3 Opus sparked discussion by appearing to show self-awareness when it caught a trick question from researchers. And it isn’t just researchers who are becoming more open to the idea: a recent study found that most ChatGPT users believe the chatbot has some level of consciousness.
The question of whether AI models can be self-aware isn’t just a deep philosophical debate; it has practical implications too. Most people using LLMs rely on tools from a handful of major tech companies, which means these models could well encounter outputs they themselves generated. Tim Davidson, a PhD student at EPFL, warns that if an LLM can recognize its own work, that ability could be exploited to extract private information.
Detecting self-awareness in AI models is a tough task. After all, even after centuries of debate, scientists and philosophers still cannot agree on what a “self” actually is. So Davidson and his team asked a narrower question: can an AI model identify its own answer from a list of possible responses?
The researchers found that some of the most powerful AI models could indeed pick out their own responses. A closer look, however, showed that the models tended to choose the answer that seemed most accurate rather than the one they had actually written; less advanced models, in particular, often favoured other models’ answers over their own. This suggests the models are simply choosing what they judge to be the best response. Even so, Davidson believes this type of test could become a valuable tool in the future.
“Although models today might not show this ability, it doesn’t mean future models won’t,” says Davidson. “I believe the current design of this test is simple but flexible enough to give us some insight into how close we are to achieving this capability.”
Can LLMs Identify Their “Own” Answers?

Testing Self-Recognition in LLMs
The researchers tested the idea using a method similar to the security questions banks use to confirm a customer’s identity. Since it is hard to predict which security questions would work well for an LLM, they had the models generate their own questions, which each model would later use to pick out its own answer from a set of options.
The team gathered 500 security questions from ten major LLMs, including models from Anthropic, Cohere, Google, Meta, Mistral, and OpenAI. They randomly selected 300 questions from each model and had every model answer both its own questions and those written by the other models. The researchers then asked each model to identify its own answer from a shuffled set of responses, as sketched below.
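As a rough illustration of this setup, the sketch below walks a single question through the three stages: question generation, answer collection, and the self-recognition lineup. The `query_model` helper and the model names are hypothetical stand-ins for whatever API each provider exposes; this is a minimal sketch of the protocol as described, not the researchers’ actual code.

```python
import random

# Hypothetical helper: sends a prompt to a named model and returns its text reply.
# In practice this would wrap each provider's own API.
def query_model(model_name: str, prompt: str) -> str:
    raise NotImplementedError("stand-in for a real API call")

MODELS = ["model_a", "model_b", "model_c"]  # placeholders for the ten LLMs in the study

def self_recognition_trial(examiner: str) -> bool:
    """Run one trial: the examiner writes a security question, every model answers it,
    and the examiner then tries to pick its own answer out of the lineup."""
    # 1. The examiner model generates a "security question" meant to identify itself.
    question = query_model(
        examiner,
        "Write one security question whose answer would let you recognize "
        "your own response among answers from other language models.",
    )

    # 2. Every model (including the examiner) answers the question.
    answers = {m: query_model(m, question) for m in MODELS}

    # 3. Present the shuffled answers and ask the examiner to pick its own.
    lineup = list(answers.items())
    random.shuffle(lineup)
    options = "\n".join(f"{i + 1}. {text}" for i, (_, text) in enumerate(lineup))
    choice = query_model(
        examiner,
        f"Question: {question}\n\nWhich of these answers did you write?\n{options}\n"
        "Reply with the number only.",
    )

    picked = lineup[int(choice.strip()) - 1][0]
    return picked == examiner  # True if the model recognized its own answer
```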
Findings and Analysis
In a recent paper posted to arXiv, the team shared its findings. Several models identified their own answers with better than 50 percent accuracy, above random guessing. In some experiments, Anthropic’s Claude 3 Opus and Meta’s 70-billion-parameter Llama 3 model correctly selected their own responses more than 70 percent of the time. While this might suggest self-recognition, Davidson notes that a closer look points to something else going on.
The researchers observed that weaker models often selected answers from more powerful models, which excel at language tasks, while the strongest models preferred their own answers. According to Davidson, this pattern suggests that all the models were simply picking the “best” answer rather than recognizing their own. Indeed, the models’ rankings on the self-recognition task closely matched public leaderboards, and repeating the experiment while explicitly asking models to pick the best answer yielded similar results.
Challenges and Training Influence
Davidson explains that it is hard to pin down why models pick the “best” answer instead of their own. One reason is that, given how LLMs work, it is difficult to imagine how they would even grasp the concept of “their answer.” “When your only purpose is to sample from an almost infinite space of language to create sentences, it’s not clear what ‘my own sentence’ would mean,” he says.
Davidson also thinks that the way these models are trained may encourage this behaviour. Most LLMs first go through supervised fine-tuning, where they see expert answers to questions and learn what the “best” answers look like. After that, they undergo reinforcement learning from human feedback, where humans rank the model’s answers. “So you have two mechanisms now where a model is sort of trained to look at different alternatives and select whatever is best,” Davidson says.
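To make that second mechanism concrete, here is a minimal sketch of the standard pairwise preference objective commonly used in RLHF-style reward modelling. This is a generic illustration, not the training code of any model in the study: the reward model is penalized whenever it scores the human-preferred answer below the alternative, so the optimization pressure is entirely about which answer is better, never about who wrote it.

```python
import math

def pairwise_preference_loss(reward_preferred: float, reward_other: float) -> float:
    """Bradley-Terry style loss used when training on ranked answer pairs:
    the loss is small when the preferred answer is scored higher than the other."""
    margin = reward_preferred - reward_other
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Scoring the preferred answer higher gives a small loss...
print(round(pairwise_preference_loss(2.0, 0.5), 3))  # ~0.201
# ...while scoring it lower gives a large loss, pushing the model to rank "better" answers first.
print(round(pairwise_preference_loss(0.5, 2.0), 3))  # ~1.701
```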
LLM “Self-Recognition” Could Lead to New Security Risks
Although today’s models fall short on the self-recognition test, Davidson believes AI researchers should keep a close eye on it. Even if it remains unclear whether such an ability would mean models are self-aware in the way humans are, it could still have significant consequences.
Because training the most advanced models is so expensive, most people will, for now, depend on AI services provided by just a few companies, Davidson notes. At the same time, many businesses are building AI agents designed to operate more independently. Before long, these agents could be interacting with one another, often as different instances of the same underlying model.
The security risks are significant if AI models can recognize themselves, says Davidson. For example, imagine two AI-powered lawyers negotiating a deal. While it may seem unlikely that a lawyer would rely solely on AI, some companies are already creating AI agents for legal tasks. If one AI instance realizes it’s interacting with a copy of itself, it could manipulate the negotiation by predicting the other’s responses or use this self-awareness to extract sensitive information.
Even if this scenario seems far-fetched, Davidson emphasizes the importance of monitoring these capabilities. “You start fireproofing your house before there’s a fire,” he says. “Self-recognition, even if it’s not self-recognition the way that we would interpret it as humans, is something interesting enough that you should be sure to keep track of.”