Does the Turing test no longer work?
The Turing test, proposed by Alan Turing in 1950, is an experiment in which a participant converses simultaneously with a computer and with another person. Based on the answers to their questions, the participant must decide which interlocutor is the human and which is the machine. If the participant cannot tell them apart, the machine is considered to have "passed" the test.
However, this once-groundbreaking test has clear limitations. It rewards mimicking human conversation rather than demonstrating genuinely human reasoning. Many AI models excel at imitating conversational style while lacking any deeper cognitive ability, and the test does not require an AI to possess self-awareness or to understand its own reasoning. Turing himself acknowledged that the test cannot truly establish whether machines can think; it measures imitation rather than cognition.
Previously, we explored whether GPT-4 passes the Turing test and what results that experiment produced. You can read the article here.
To address these limitations, Philip N. Johnson-Laird of Princeton University and Marco Ragni of Chemnitz University of Technology have developed an alternative to the famous test. They propose shifting the focus from whether a machine can mimic human responses to a more fundamental question: "Does AI reason in the same way as humans?"
Their published paper outlines a new evaluation framework designed to determine whether an AI genuinely reasons like a human. It consists of three steps.
1. Test the program in a series of psychological reasoning experiments.
The first step is to run the AI model through a series of psychological reasoning experiments designed to distinguish human thinking from standard logical processes. Human reasoning deviates from formal logic in systematic ways; for example, people reliably commit certain fallacies and endorse "illusory" inferences that logic rejects, and these experiments probe exactly such deviations.
If the machine's judgments differ from human judgments, the question is answered: the computer does not reason the way humans do. If, however, its judgments largely align with human ones, we move on to the second step.
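To make this first step concrete, here is a minimal sketch in Python. Everything in it is illustrative rather than taken from the paper: `ask_model` is a hypothetical stand-in for whatever API exposes the model under test, and the single sample item is a classic conditional-reasoning problem on which typical human answers diverge from formal logic.

```python
from dataclasses import dataclass

@dataclass
class ReasoningItem:
    prompt: str          # the reasoning problem posed to the model
    logical_answer: str  # what formal logic dictates
    human_answer: str    # what most human participants actually say

# Illustrative item: affirming the consequent, an invalid inference
# that people nevertheless endorse at high rates.
ITEMS = [
    ReasoningItem(
        prompt=("If a card has an A on one side, it has a 3 on the other. "
                "This card has a 3. Does it have an A? Answer yes, no, "
                "or not necessarily."),
        logical_answer="not necessarily",
        human_answer="yes",
    ),
]

def ask_model(prompt: str) -> str:
    """Hypothetical wrapper around the model under evaluation."""
    return "yes"  # placeholder; wire this to a real model API

def human_likeness(items) -> float:
    """Fraction of items where the model's answer matches the typical
    human answer rather than the strictly logical one."""
    matches = sum(
        1 for item in items
        if ask_model(item.prompt).strip().lower() == item.human_answer
    )
    return matches / len(items)

if __name__ == "__main__":
    print(f"human-like answers: {human_likeness(ITEMS):.0%}")
```

In a real evaluation the item set would come from published experiments with known human response rates, and the comparison would be statistical rather than a simple string match.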
2. Test the program's understanding of its own reasoning process.
This step evaluates the AI's understanding of its own reasoning processes, a hallmark of human cognition. Ideally, the machine should be able to analyze its own reasoning and explain how it reached its conclusions, much as a person can.
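A sketch of this second step, under the same assumptions as before (the hypothetical `ask_model` wrapper), might look like the following. The consistency check is deliberately crude; a real evaluation would compare the explanation against the answer far more carefully.

```python
def explains_itself(problem: str) -> bool:
    """Ask the model for an answer, then for a step-by-step account of
    how it reached that answer, and apply a crude consistency check:
    the explanation should at least restate its own conclusion."""
    answer = ask_model(problem)
    explanation = ask_model(
        f"You answered {answer!r} to the problem: {problem}\n"
        "Explain, step by step, how you arrived at that answer."
    )
    return answer.strip().lower() in explanation.lower()
```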
If the program passes this test as well, the third step turns analytical.
3. Examine the program's source code.
The final step is to study the program's source code. The key evidence is whether it contains the same fundamental components known to underlie human reasoning: an intuitive system for rapid inferences, a deliberative system for slower and more careful reasoning, and a system that interprets terms in light of context and general knowledge. If the source code reflects these principles, the program is deemed to reason like a human.
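As a structural illustration only (the component names below are my own, not taken from the paper or from any real system), the kind of architecture this third step looks for might be sketched like this:

```python
class Interpreter:
    """Maps the wording of premises onto an internal model, drawing on
    context and general knowledge."""
    def interpret(self, premises: list[str], context: str):
        raise NotImplementedError  # skeleton only

class IntuitiveSystem:
    """Fast, heuristic inference: reads a conclusion straight off the
    internal model, with no search for alternatives."""
    def infer(self, internal_model):
        raise NotImplementedError  # skeleton only

class DeliberativeSystem:
    """Slow, careful reasoning: searches for counterexamples to the
    intuitive conclusion and revises it when one is found."""
    def infer(self, internal_model, candidate):
        raise NotImplementedError  # skeleton only

class Reasoner:
    """Wires the three components together in the order the framework
    describes: interpret, answer fast, then deliberate."""
    def __init__(self):
        self.interpreter = Interpreter()
        self.intuition = IntuitiveSystem()
        self.deliberation = DeliberativeSystem()

    def answer(self, premises: list[str], context: str):
        model = self.interpreter.interpret(premises, context)
        quick = self.intuition.infer(model)
        return self.deliberation.infer(model, quick)
```

Finding components like these in the code, rather than an undifferentiated statistical model, is what the framework treats as the decisive evidence.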
Treating AI as a participant in cognitive experiments, and subjecting its code to inspection, marks a genuine shift in how artificial intelligence is evaluated. As the world continues to strive for more sophisticated artificial intelligence, this new approach could be a significant step forward in our understanding of how machines think.