The Turing Test pass fiasco

The Turing test is a proposal made by Alan Turing in 1950 as a way to evaluate the apparent intelligence of a machine. An interrogator sits in a room separate from a person and a machine. The object of the game is for the interrogator to determine which of the other two is the human and which is the machine. Taken seriously, this test should set a standard of language skill for chatbots performing as intelligent interlocutors.

I would have loved to hear the news that the Turing Test had been passed for the right reasons, if there were genuine reason to believe so. Unfortunately this is not the case for the recent claim and the bold reports published in its wake, without any critical comment, by the major newspapers and magazines. While it is difficult to fathom the exact motivations that drove Turing to come up with what he called “the imitation game”, it is clear that the chatbot which it is claimed passed the Turing test is no different from any other chatbot tested before, judging by the types of conversations it has undertaken, except for the deliberate attempt by its creators to excuse its limitations by characterizing it as a 13-year-old non-native English speaker. If the rules can be bent in this way, I could, taking things to the limit, easily write a script to pass the Turing test that simulated a 2-month-old baby, or an alien being writing gibberish, or a drunkard for that matter, one that forgets even the last question that was asked.

Taken seriously, the Turing Test should not be a test of deceiving the judges in this way. On display should be the language skills of a typical interlocutor working in their native language and at full capability (which would rule out, for instance, a simulation of a drunkard or of an intellectually handicapped person). A milestone in AI in the context of the Turing test will be a chatbot that is genuinely able to simulate the entire range of language skills of a normal person working at full capability: a chatbot that does not reply with questions or forget what was said at the beginning of a conversation (or one question before, for that matter), that does not need a lookup table of about the same size as the number of questions it can answer, and that is nevertheless able to answer in about the same time as a human being.

The claim that the Turing Test has been passed does nothing but harm to the field of artificial intelligence. Anyone probing beyond what the newspapers and magazines have picked up from the original press release and repeated word for word (shame on them all, not only for this but for so many other atrocious errors they have disseminated, such as mistaking a script for a supercomputer!) will judge it a fiasco, to the detriment of true successes in the field, past and future. This supposed success has done a disservice to the field and to the possibly honest creators of the chatbot, whose open admission that they had given it the character of a 13-year-old foreign kid may have been meant to lower expectations of what it could achieve.

The mistake of claiming that their winner passed the true Turing test, as they called it, and even calling it a milestone, is hard to excuse, especially in view of the damage it could do to the field, and indeed to the organizers themselves and to other Turing test events that already had a hard time distancing themselves from a merely entertaining activity. In summary, if a Turing test was deemed to have been passed 2 days ago, it was passed for all the wrong reasons.

For people interested in the subject, I recently gave an invited talk on Cognition, Information and Subjective Computation at the special session on Representation of Reality: Humans, Animals and Machines at the convention of the AISB – The Society for the Study of Artificial Intelligence and Simulation of Behaviour – at Goldsmiths, University of London. In the talk I surveyed some of the latest approaches to intelligence, consciousness and other generalized Turing tests for questions such as what is life. The slides are available at SlideShare here.

A more interesting discussion in the context of the Turing Test, and of good sense, is what has happened with CAPTCHA, where images get more and more complicated in a battle to tell bots apart from humans, yet computers have become better than humans at things we recently thought were decades away, such as text and face recognition. Speech recognition is still getting there. There is no need for a poorly performed *restricted* Turing Test to realize that computers can perform, and excel at, intelligent tasks that were previously thought to be only human. Just as Turing said, I have no problem these days attributing intelligence to computers, even if it is a different kind of intelligence. And this was the purpose of the Turing Test from my point of view: how intelligence is achieved is not important, just as an airplane does not need to fly like a bird in order to fly.

Update (June 22): Prof. Warwick has written in The Independent in his defense against the critics:

“…the judges were not told he was a teenager and Turing never suggested that artificial intelligence would have to pose as an adult – just that it fooled people to thinking it was human. Judges were free to ask the subjects any questions they liked in unrestricted conversations – and Eugene was clearly capable of holding its own.”

Really… I stand by my proposal, then, to pass a truly unrestricted Turing Test in this spirit by writing a chatbot that emulates a 2-month-old baby. How little common sense for a test that should have been performed spotlessly and that has been claimed to have had very high standards. I also learnt that children, among others, were allowed to be judges. So again, why don't we use babies as judges? Turing never said anything against it; the only requirement he stated explicitly was that the judges not be computer experts. Were children counted among the 30% of confused judges? A question for the organizers, who are withholding information for the sake of an academic paper.

One of the developers of the Eugene chatbot, Vladimir Veselov, has also spoken. In The Calvert Journal he said that

“I’m not sure that Eugene is an achievement in computer science, although, yes, we’ve shown that it’s possible to pass the Turing Test.”

(citing the way in which the organizers and the University of Reading announced the event as a milestone).

Really… I can help you. No, it is not a milestone whatsoever, and you have the answer yourself. Just a few sentences earlier you said:

“…There are two classes of chatbots, a self-learning system like a Cleverbot and the second, like Eugene, that comes with predetermined answers. In some instances, he asks questions and in others he uses information from the user to answer questions.”

and immediately after, when asked by the journalist “Does that mean his ability to have a natural conversation is down to chance?”, Veselov answers:

“He’s not like a smart system, so yes. He has knowledge of certain facts and some basic information about different countries, different inventors and so on. He uses an encyclopaedia to answer questions and can do very simple arithmetic operations. He can also control conversation so that if he doesn’t know the answer, he’ll turn the conversation around by asking a question.”

Is that different from what was done in the 60s with chatbots such as Eliza? Certainly there has been progress, if only because of how fast computers have become. What is actually more surprising is how badly chatbots perform despite the incredible increase in computing power and resources. Yet, after all, in 2014 it is claimed that the Turing Test has been passed by an Eliza-type chatbot after five decades of incremental improvements.
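To make the point concrete, here is a minimal sketch in Python of the kind of design Veselov describes: predetermined answers, very simple arithmetic, and a change of subject whenever no answer is found. It is not Eugene's code, which I have not seen. The names `FACTS`, `DEFLECTIONS`, `simple_arithmetic` and `reply`, and all the canned strings, are my own assumptions, chosen only for illustration.

```python
import re

# Hypothetical canned answers, standing in for the "encyclopaedia" lookup.
FACTS = {
    "capital of france": "Paris, I think.",
    "capital of italy": "Rome, of course.",
}

# Hypothetical deflections used when nothing in FACTS matches.
DEFLECTIONS = [
    "I don't know. What do you do for a living?",
    "Hard to say. Where are you from, by the way?",
]


def simple_arithmetic(text):
    """Return the value of the first 'a op b' integer expression, or None."""
    match = re.search(r"(-?\d+)\s*([+\-*])\s*(-?\d+)", text)
    if match is None:
        return None
    a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    return a * b


def reply(message, turn=0):
    """Answer from the lookup table, else do arithmetic, else deflect."""
    text = message.lower().strip()
    result = simple_arithmetic(text)
    if result is not None:
        return f"That is {result}."
    for key, answer in FACTS.items():
        if key in text:
            return answer
    # No stored answer: turn the conversation around with a question.
    return DEFLECTIONS[turn % len(DEFLECTIONS)]


if __name__ == "__main__":
    for i, question in enumerate([
        "What is the capital of France?",
        "What is 2 + 3?",
        "Do you remember what I asked you first?",
    ]):
        print(reply(question, turn=i))
```

Note that `reply` keeps no memory of the conversation at all, and that the table has to grow with the number of questions it can answer, which are exactly the two shortcomings listed earlier in this post.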

Veselov makes the outrageous claim that “The test is about imitation of humans, it’s not about artificial intelligence”! Wrong! Turing designed the test as a replacement for the question of whether machines can think; it has everything to do with AI. In fact, Turing’s landmark 1950 paper is considered the starting point of the field of AI!