An alternative method (to compression) for approximating the algorithmic complexity of strings

July 20th, 2011

The method introduced in my doctoral dissertation was featured in Pour La Science, the French edition of Scientific American, in its July 2011 issue (No. 405), under the title Le défi des faibles complexités (The challenge of low complexities).

Jean-Paul Delahaye points out that:

Comme les très petites durées ou longueurs, les faibles complexités sont délicates à évaluer. Paradoxalement, les méthodes d’évaluation demandent des calculs colossaux.
(Like very short durations or lengths, low complexities are tricky to evaluate. Paradoxically, the evaluation methods require colossal calculations.)

and he continues:

Pour les petites séquences, cette mesure est stable et conforme à notre idée de la complexité; pour les grandes, elle est, d’après le théorème mentionné [the invariance theorem, my comment], conforme à la meilleure mesure de complexité unanimement admise, la complexité de Kolmogorov. Que demander de plus?
(For short strings, this measure is stable and conforms to our idea of complexity; for long strings, according to the aforementioned theorem [the invariance theorem– my comment], it conforms to the best, unanimously accepted measure of complexity, Kolmogorov complexity. What more can one ask for?)

Imagine you are given two short strings and are asked to tell which one looks more random and which more structured. The strings are 0101 and 1011. Applying the idea of algorithmic complexity, the shorter the description, the less random-looking the string. It would seem that the first string has a repeating pattern that can be taken advantage of in describing it. In plain English one may therefore say that the first string can be described as “zero and one twice” while the second would require a longer description. In fact, there seem to be fewer descriptions for the first string than for the second (notice also some ambiguity in plain English). The first may also allow the description “a zero followed by a one followed by a zero followed by a one” or “zero and one followed by the same,” and perhaps other variations. Descriptions of the second can include “one and zero followed by two ones” or “one zero one one,” just as the first could simply be “zero one zero one,” which doesn’t look like a compressed version of the string but rather a straight translation into an expanded form of plain English.

All this by way of asking whether either of the two strings is without a doubt simpler than the other, or whether the apparent repetition in the first string makes us think that it has a pattern despite the pattern being repeated only twice. Perhaps when one looks at such a string one gets the impression that it may belong to a larger string comprising alternations of 01, and one concludes that it is simpler than the second. To leave the subjective realm, one would need to evaluate the algorithmic complexity of the strings and compare their respective values. The algorithmic complexity of a string is the length (in bits) of the shortest program producing the string in question running on a universal Turing machine. It is inconvenient that there is no algorithm that, given a string, gives you the length of the shortest program that produces it; this follows by reduction from the halting problem. It means one cannot really measure with absolute certainty the algorithmic complexity of a string, because it is uncomputable. It doesn’t mean, however, that one cannot approach it; one can often do so in an effective and useful way.
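In symbols (this is the standard textbook formulation, not anything specific to this post), the algorithmic complexity of a string s relative to a universal Turing machine U is the length of the shortest program that prints s and halts:

```latex
K_U(s) = \min \{\, |p| \;:\; U(p) = s \,\}
```

The invariance theorem mentioned above guarantees that for any two universal machines U and U' the values K_U(s) and K_{U'}(s) differ by at most an additive constant independent of s, which is why the choice of machine washes out for long strings but matters a great deal for short ones.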

Traditionally, the way to approach the algorithmic complexity of a string has been to use lossless compression algorithms. Lossless means that one can recover the original string from the compressed version. The use of lossless algorithms is preferable because there are inconsistent ways of compressing strings that give the impression of compressing a string without corresponding to a measure of complexity (e.g., noise deletion is an effective way to shorten a description, but noise may be algorithmically random, and the aim of algorithmic complexity is precisely to measure how random something is, not merely to gauge whether it looks random). The length of the compressed version is an upper bound on the string’s algorithmic complexity. While one can never tell whether a string is incompressible, if one succeeds at somehow shortening a string one can tell that its algorithmic complexity cannot be larger than the compressed length.

One does not want to cheat and claim that one can compress any string into a bit if the decompression algorithm interprets that bit into the desired string. A fair compression algorithm can be defined as one that transforms a string into two pieces: one is the compressed version and the other the instructions to decompress the string, together accounting for the final length of the compressed version. In other words, it would seem as if you were adding the decompression algorithm to the compressed string so that the compressed string comes with its own decompression instructions. In the long run, there is a theorem (invariance) that guarantees that complexity values converge.

But for short strings (which are often the ones useful for practical applications), adding the decompression instructions to the compressed version makes the compressed string often, if not always, longer than the string itself. If the string is, for example, shorter than the size of the decompression algorithm, there will be no way to compress the string into something shorter still, simply because the decompression instructions are at least of the length of the original string. Moreover, the result is so dependent on the size of the decompression algorithm (because it is the greatest contributor to the overall length) that the final value of the compressed length is too unstable under different lossless compression/decompression algorithms.

For example, if one tries to compress a short string using Mathematica, one gets the following results:
StringLength@Compress["0101"] = 30

Looking at the beginning of the compression line when plotting the lengths of strings (x axis) against their compressed lengths (y axis) one observes that it does not start at y=0 even when x=0.

This means that compressing the 4-character string 0101 took 30 characters (even more counted in bits). In Mathematica, strings only begin to be genuinely compressed at about length 30. This is not a malfunction of Mathematica; it is a consequence of what I have explained. The Mathematica Compress function is based on the Deflate lossless compression algorithm, a combination of the LZ77 algorithm and Huffman coding, among the most popular lossless compression methods available, used in formats such as ZIP, GZIP and PNG (PNG is therefore a lossless compression image format).

Zooming in toward the origin of the plot, one can see that the beginning looks unstable. The string in question is a repetition of n 1s, with n lying on the x axis (which represents distinct compressed lengths rather than individual strings, to avoid repetition and a step-like curve).

If one tries to compress 1011 one gets exactly the same value:
StringLength@Compress["1011"] = 30

The instructions obviously take up some space in the final compressed length, and they cannot be compressed themselves (if they could, they would in any case be the same for all strings, taking us back to the same situation). There is a limit to how well compression algorithms can compress short strings. So if one wished to tell which of these two strings is objectively more or less randomly complex by approximating their algorithmic complexity with a compression algorithm, it turns out that there is no way to do so.
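The same overhead appears with any general-purpose lossless compressor, not just Mathematica's Compress. As a sketch, here is the phenomenon reproduced with Python's zlib module (a Deflate implementation, used purely for illustration): a short string comes out longer after compression, while a long repetitive one finally compresses well.

```python
import zlib

# zlib implements Deflate (LZ77 + Huffman coding), the same family of
# algorithm behind Mathematica's Compress, ZIP and GZIP.
short = b"0101"
compressed_short = zlib.compress(short)
# The compressed version is LONGER than the 4-character input, because
# the stream must carry its own decompression bookkeeping
# (header, block flags, checksum).
print(len(short), len(compressed_short))

long_repetitive = b"01" * 500
compressed_long = zlib.compress(long_repetitive)
# For a long repetitive string the fixed overhead is amortized, and the
# compressed length becomes a meaningful upper bound on complexity.
print(len(long_repetitive), len(compressed_long))
```

The exact lengths depend on the library version, but the crossover point (the length below which compression is useless) is a property of every real compressor, which is precisely the problem for short strings.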

On the other hand, given that the definition of algorithmic complexity based on compressibility says that the less compressible the more randomly complex a string, one could immediately say that a single bit, 0 or 1, is random for certain, i.e. has maximal algorithmic complexity, given that there is no way to further compress a single bit. In other words, there is no program shorter than 1 bit that can produce 0 or 1. The shortest descriptions of 0 and 1 are therefore 0 and 1 themselves. Hence 0 and 1, according to the compressibility approach, are random strings. It may seem to make no sense to characterize a single bit as random.

On the one hand, a single bit does not carry any information and on these grounds one may think of it as somehow random. If one thinks about whether one would have been able to predict 0 or 1 as the outcome of a process, given that there is no context because they occur alone, one may also conclude that their occurrence is somehow random. In other words, if you see a string like 010101 you may bet that the next bit is 0, but if you are provided with nothing there is no way you can favor any position on whether the bit to come is 0 or 1. So much for justifying that a single bit is random.

It is hard, however, to justify how 0 could look more random than, say, any other possible string. If 0 is random, how is it relatively more complex than 00? Or 01? Intuition tells us that short strings shouldn’t be that random (more random than, for example, longer random-looking strings), so if a single bit is the most random among all finite strings, how could there be such a phase transition from maximal random complexity to the very low complexity of strings, say, 2, 3 or 5 bits long?

Since intuition tells us that something random should also be uncommon and rare, what if one asks how common 0 or 1 are as outputs of a program? There is a measure that gives the probability of a program producing a given string when running on a universal Turing machine. This is the measure we used to present a new method to evaluate the complexity of strings, as an alternative to the traditional use of compression algorithms. The new method aims particularly to solve the problem of evaluating the complexity of short strings, as we’ve discussed. It is based on Solomonoff and Levin’s algorithmic probability and is connected back to algorithmic (Kolmogorov-Chaitin) complexity by way of the Levin-Chaitin coding theorem.
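For reference, the standard formulation of the two ingredients is as follows: the algorithmic probability m(s) of a string s is the probability that a random program run on a fixed universal prefix machine U outputs s, and the coding theorem ties it to algorithmic complexity:

```latex
m(s) = \sum_{p \,:\, U(p) = s} 2^{-|p|}, \qquad K(s) = -\log_2 m(s) + O(1)
```

In words: the more likely a string is to be produced by a random program, the shorter its shortest description, so frequent outputs are simple and rare outputs are random.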

Algorithmic probability says that a single bit, far from being the most randomly complex string, is actually the most structured possible one and, more importantly, that the complexity transition is smooth, more in accordance with intuition.

It may well make sense that a single bit can be regarded as both the simplest and the most complex of strings from different perspectives. The advantage of the algorithmic probability approach is that it provides not only a different notion of the complexity of a single bit, one that is in keeping with intuition, but also an outcome different from that of the compressibility approach, even though the two measures are intimately related and asymptotically produce the same results in the long term (for longer strings). I think the two views reflect different aspects of what a single bit represents.

The novel method for evaluating the algorithmic complexity of short strings was first proposed and sketched in Greg Chaitin’s 60th anniversary festschrift edited by Cris Calude (J-P. Delahaye & H. Zenil, “On the Kolmogorov-Chaitin complexity for short sequences,” Randomness and Complexity: From Leibniz to Chaitin, edited by C.S. Calude, World Scientific, 2007). The method uses an exhaustive and systematic search of Turing machines inspired by Wolfram’s NKS dictum, from which a frequency distribution of the outputs of the halting machines is built and the Levin-Chaitin coding theorem applied to evaluate the algorithmic complexity of binary strings.
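To make the last step concrete, here is a minimal sketch in Python of the coding-theorem step alone. The frequencies below are invented for illustration only; the actual method obtains them from an exhaustive run of small Turing machines, as described in the paper.

```python
import math

# Hypothetical output frequencies from an exhaustive run of small
# machines -- these numbers are made up for illustration only.
freq = {
    "0": 0.30, "1": 0.30,       # single bits come out most often
    "00": 0.08, "11": 0.08,
    "0101": 0.004, "1011": 0.002,
}

def complexity_estimate(s):
    """Coding-theorem estimate: K(s) is approximated by -log2 of the
    probability that a randomly chosen machine outputs s."""
    return -math.log2(freq[s])

# More frequent outputs receive lower (simpler) complexity values, so
# this toy distribution ranks a single bit as very simple and is able
# to separate 0101 from 1011 where compression could not.
for s in ("0", "0101", "1011"):
    print(s, round(complexity_estimate(s), 2))
```

Note how this resolves both problems discussed above: a single bit, being the most common output, gets the lowest complexity value, and two 4-bit strings that a compressor maps to the same length receive distinct values.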

Chaitin pointed out (regarding our method) that:

…the dreaded theoretical hole in the foundations of algorithmic complexity turns out, in practice, not to be as serious as was previously assumed.

The full technical article, Numerical Evaluation of Algorithmic Complexity for Short Strings: A Glance into the Innermost Structure of Randomness, is available on arXiv.

You can also look at the slides of the presentation I delivered at the Alan Turing amphitheater at the Computer Science Department of the University of Lille 1:

“The World is Either Algorithmic or Mostly Random” awarded a 3rd Place Prize in this year’s FQXi contest

June 10th, 2011

Based on the combined ratings of the contest community and the panel of expert reviewers appointed by the FQXi, which included members of the institute, I was awarded a 3rd Place Prize for my work The World is Either Algorithmic or Mostly Random in this year’s FQXi contest on the topic Is Reality Digital or Analog?, sponsored by the Foundational Questions Institute. The winners were announced at this year’s World Science Festival in New York City.

My work can be summarized in one line as an explanation of the complexification process of the world, the process whereby we have evolved from a more primitive (random) state to the current organized state of the universe.

The essay is a summary of the philosophical branch of my current scientific research on finite algorithmic information theory. This philosophical branch is concerned with the exploration of the possible connections between algorithmic complexity and the physical world (or what happens in it). I propose the notion that the universe is likely digital, not as a claim about what the universe is ultimately made of but rather about the way it unfolds. Central to the argument are concepts of symmetry breaking and algorithmic probability, which are used as tools to compare the way patterns are distributed in our world to the way patterns are distributed in a simulated digital one. These concepts provide a framework for a discussion of the informational nature of reality. I argue that if the universe were analog, then the world would likely look more random, making it largely incomprehensible. The digital model has, however, an inherent beauty in its imposition of an upper limit and in the convergence in computational power to a maximal level of sophistication. Even if deterministic, that the world is digital doesn’t necessarily mean that the world is trivial or predictable, but rather that it is built up from operations that at the lowest scale are simple but that at a higher scale look complex and even random–though in appearance only.

How have we come from the early state of the universe (left) to the structures we find today (right)?

The arguments supporting my views are partially based on the findings of my research, epitomized by our most recent paper, Numerical Evaluation of Algorithmic Complexity for Short Strings: A Glance into the Innermost Structure of Randomness, available on arXiv, in which my co-author and I describe a method that combines several theoretical and experimental results to numerically approximate the algorithmic (Kolmogorov-Chaitin) complexity of bitstrings by using the concept of algorithmic probability, which is connected to algorithmic complexity by way of the Levin-Chaitin coding theorem.

An extended (and detailed) version of The World is Either Algorithmic or Mostly Random is forthcoming and will be eventually posted.

New book: “Lo que cabe en el espacio” on Philosophy of Space and Geometry

June 10th, 2011

I am excited to announce the publication of my new book, written in Spanish, Lo que cabe en el espacio, on the philosophy of space in connection with our reality and what we can or cannot do with it and in it. The book, under the full title “Lo que cabe en el espacio: La geometría como pretexto para explorar nuestra realidad física y matemática” (What fits in space: Geometry as a pretext for exploring our physical and mathematical reality), approaches the basic notions of geometry in the context of their original formulations, that is, the formulations of the mathematicians in their particular times and places. In so doing it attempts to understand the motivations behind these original formulations and to build a modern view from the perspective of our time. Properties of space, such as dimension and curvature, are discussed, as treated by mathematicians from Euclid to Legendre, from Descartes to Hilbert.

The book is available through CopIt Arxives UNAM, a peer-reviewed publishing organization run by professors of the National University of Mexico (UNAM). Readers can download the PDF at no charge from CopIt Arxives, and it will soon be available in physical form for those wishing to own a paper copy. There is also a Kindle version of an earlier incarnation of the book available from Amazon here, which, despite issues with accent encoding, has been among the top 100 Kindle books in Spanish for several weeks, if not months. I should be updating the Kindle version to the newest edition fairly soon.

¿Cuánto cabe en nuestro espacio? ¿Por qué sólo pueden existir cinco poliedros regulares y no seis? ¿Qué figuras llenan el plano y el espacio? ¿Es el espacio una generalización trivial del plano? ¿Podemos notar si estamos en un plano curveado o si estamos sobre él? En lenguaje casi coloquial, despegaremos figuras del espacio para rotarlas en cuatro dimensiones, contaremos poliedros en dimensiones superiores y nos preguntaremos acerca de las propiedades fundamentales del espacio. En este libro se devela cómo una parte de las matemáticas describen las propiedades más fundamentales del espacio en que vivimos y por lo tanto de la realidad en la que estamos inmersos.
(How much fits in our space? Why can there be only five regular polyhedra and not six? Which figures tile the plane and space? Is space a trivial generalization of the plane? Can we tell whether we are in a curved plane or on top of it? In almost colloquial language, we will detach figures from space to rotate them in four dimensions, count polyhedra in higher dimensions, and ask about the fundamental properties of space. This book reveals how a part of mathematics describes the most fundamental properties of the space we live in, and hence of the reality in which we are immersed.)

The Lighthill Parliamentary Debate on General Purpose Artificial Intelligence

February 24th, 2011

In 1973, Lucasian professor at Cambridge, James Lighthill, was asked by the British Parliament to evaluate the state of AI research in the United Kingdom. His report, now called the Lighthill report, criticized the failure of AI to achieve its grandiose objectives. He specifically mentioned the problem of “combinatorial explosion” or “intractability” of the discourse universe, which implied that many of AI’s most successful algorithms would grind to a halt faced with real world problems and were only suitable for solving toy versions.

The report was contested in a debate broadcast as part of the BBC series “Controversy” in 1973. The debate, on the proposition “The general purpose robot is a mirage,” was sponsored by the Royal Institution and pitted Lighthill against a team of researchers including Donald Michie and Richard Gregory, led by the AI researcher John McCarthy. The report led to the near-complete dismantling of AI research in England.

Here is the debate on YouTube (in 6 parts):

This debate goes to show that as recently as 1973 it was generally believed (as indeed it still is in some quarters) that one had to write a complex program to obtain complex behavior, or that no simple rule could produce complex behavior (part 2, min. 7:45). This position was epitomized by Lighthill.

Lighthill’s position does not come as a surprise. He was, after all, a researcher in fluid dynamics and aeroacoustics, where it is easy to be misled by complicated differential equations involving ‘continuous’ variables, and where solutions so often fail to exist. His main argument was that because one had to specify rules in a computer to tell the robot how to behave in every possible scenario, every attempt to come up with a general purpose robot would quickly turn into an intractable problem, with a combinatorial explosion of possible solutions. I don’t entirely disagree with Lighthill on this, but I can by no means endorse his conclusion.

Unfortunately the response of the researchers on the panel was rather weak, except for a couple of minor arguments put forward by McCarthy against what seemed to be fundamental impediments that Lighthill appeared to be invoking. From my point of view, one of the items that ought to have been put on the table was a serious discussion of whether universal computation actually had something to tell us about the possibility of general purpose artificial intelligence (i.e. what the abstract could say about the concrete). Another was the question of the essential differences, if any, between advanced automation and artificial intelligence, which McCarthy seems to have broached in one of his arguments against Lighthill without reaching any conclusion. Indeed the point may be that there is no essential difference, something that was perhaps difficult to see back then. But instead of counting this as a caveat against AI, as Lighthill seemed inclined to do, it actually makes AI inevitable, even if achieved by means other than those traditionally employed in AI labs.

If automation is not really that different from AI, as I think has been proven over the years, what this suggests is that automation will eventually lead to what we think of as intelligent behavior and therefore to artificial intelligence.

In the end, Lighthill’s position was reduced by the moderator (Sir George Porter) to an argument about pragmatic difficulties rather than about the fundamental impossibility of the project, in response to which Lighthill quipped that his neuroscientist friends had made it plain that they felt it was hopeless to try and model the brain using a computer. I find several problems with this position, beginning with the assumption that AI is exclusively about trying to mimic the brain.

Donald Michie made a good point concerning the combinatorial explosion Lighthill made so much of, citing the example of chess. In 1973 humans were still better at playing chess, so he asked whether Lighthill would acknowledge artificial intelligence if a machine performing an exhaustive search turned out to win a chess match against a Master.

In 1968, David Levy, then an International Master, bet a thousand pounds that no computer program would beat him in the next 10 years. He won his bet in 1978 by beating Chess 4.7 (the strongest chess program at the time). Lighthill responded that a chess program would avoid the combinatorial explosion by virtue of having the benefit of human knowledge. In any case, he said, he was reluctant to believe that a chess program would ever beat a human chess player. Lighthill was cleverly hedging his bets.

Lighthill’s position wasn’t totally clear at the end of the debate: was his argument purely pragmatic, or did it concern something fundamental? I’m not wholly unsympathetic to his position, because AI has been misleading as a field. While it has certainly made some contributions to science and technology, it has achieved what it has by dint of technological improvements (hardware capabilities) rather than scientific breakthroughs (e.g. a set of sophisticated new algorithms) on the part of the AI community. I have explained this in some detail in my previous blog post on the IBM computer Watson and the Jeopardy! game, “Is Faster Smarter?”

Is Faster Smarter? IBM’s Watson Search Engine Approach to Beat Humans

February 18th, 2011

IBM’s computer named “Watson” has beaten (human) Jeopardy! contestants in a series of games this month. IBM has a long history of innovations (watch a 100th-anniversary documentary featuring Greg Chaitin and Benoit Mandelbrot, among others, here).

Not everybody was impressed by Watson, though. According to Gavin C. Schmitt, who interviewed the renowned linguist Noam Chomsky:

  • Noam Chomsky: I’m not impressed by a bigger steamroller.
  • Interviewer: I assume that “a bigger steamroller” is a reference to Deep Blue. Watson understands spoken language and adapts its knowledge based on human interaction. What level of AI would be required to impress you?
  • Noam Chomsky: Watson understands nothing. It’s a bigger steamroller. Actually, I work in AI, and a lot of what is done impresses me, but not these devices to sell computers.

In some sense, I understand Chomsky’s view and he does well pointing out what may seem the clearest difference between Watson and a human being, but the point may be much more subtle and deeper. Wolfram’s Principle of Computational Equivalence (PCE) (see this previous post) may shed some light on this subject. According to this principle, memory is what makes a system intelligent, because basically any system that does not behave in an obviously trivial fashion will likely be as sophisticated as the most sophisticated system. So what is the difference between a system that is potentially intelligent and one that shows signs of actually being so? It is somehow a matter of scale in several directions. Take the example of weather. When Wolfram is asked whether clouds are intelligent according to his principle, his short answer is yes. Every time humans want to make a prediction about whether it will rain, it turns out to be incredibly difficult to do so for more than a couple of days, and often the weather forecast is wrong even for the next day. How is it that weather prediction is so hard despite our long experience at weather forecasting? Well, clouds are computing themselves, and as part of weather they are quite complex, as complex as systems like the human brain, says PCE.

(Picture of Gregory Chaitin taken by Hector Zenil outside IBM’s Thomas J. Watson Research Center at Yorktown, N.Y., where Watson was designed)

After all these years, IBM hasn’t come up with a theoretical breakthrough to meet the challenge, but rather with a supercomputer fast enough to beat the other participants. Watson uses fairly sophisticated algorithms, but these aren’t that much more sophisticated than those used by search engines, which proceed by matching pieces of text and retrieving other pieces of text statistically related to the original. The IBM team has come up with a supercomputer to challenge the Jeopardy! participants, not with a new algorithm. The main goal of machine learning labs is usually to perform about 1% to 3% better than the best benchmark of the top lab in a particular area (e.g. word tagging, word disambiguation, information retrieval, etc.). If Watson has achieved anything like a breakthrough, it may be credited with having advanced the current state of AI research. It takes Google’s kind of technology a couple of steps further by drawing together several technologies on steroids. The point is that one may not need to come up with the smartest algorithm, because one may not be able to: it simply doesn’t make sense to engineer a super complicated algorithm to reach intelligence, it being more appropriate to start from a simple but potentially powerful system and then extract the best of it by running it on a super large corpus of data on a super fast computer. The Watson-in-Jeopardy! experience tells us that even clouds may look intelligent when running on a supercomputer.

Watson confirms what I’d suspected: that we’re not that special after all (in this sense). Watson meets the challenge of beating a human on its own turf, at what it does best, basically through the use of brute force. Achieving artificial intelligence (AI) is not, as I suspected (among other thinkers), a matter of scientific breakthrough but rather a matter of scale and technological achievement. Over time we’ll have faster and faster computers, and that means computers with intelligence resembling ours (of course there are other subtle issues here, such as the fact that the system should be allowed to interact with the intelligent forms it is meant to interact with; otherwise its intelligence may prove alien to ours, and look as strange as that of clouds).

(Jean Michel Jarré with his electronic harp. Picture by Hector Zenil, Paris concert 2010)

Wired published an interesting article back in 2009 (when they used to publish such pieces more often), which reached the conclusion that one could exchange data for models: The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.

Some interesting inferences can be drawn from this milestone in which IBM supercomputers have beaten humans at human tasks (remember this is the second time, at least with such publicity; the first was when IBM’s Deep Blue beat Garry Kasparov, the reigning World Chess Champion at the time, in a series of chess matches). Among these we may single out the fact that we humans seem ready to call machines smart and intelligent even though they may not necessarily think like us (humans can only see ‘ahead’ a handful of selected chess moves, while chess programs perform an exhaustive search bounded only by time), provided they seem as clever as we are. This is already a vindication of Turing’s proposal for a test of intelligence.

Yet Chomsky’s opinion seems to point in the opposite direction, that we may still think we are much more special than Watson, and he might be in some sense right. We definitely play very differently than machines, but is chess that different from natural language? It may be, but this time the question may be whether what Watson does at playing is that different from what we do.

At the end of the Deep Blue vs. Garry Kasparov matches, Kasparov pointed out that he felt the machine actually had a strategy, something that made him think the machine was playing like a human, somehow perhaps even teasing him. Ken Jennings (one of the two human participants against Watson) wrote a day after the match: “This is all an instant, intuitive process for a human Jeopardy! player, but I felt convinced that under the hood my brain was doing more or less the same thing.”

Some people think that Watson has absolutely no ability to understand what it did, nor any awareness of it, and that this still makes the difference. This might be only partially true. Watson may not understand or be aware of its achievement, but I wonder whether the processes that enabled Watson to win this match are not also at work in even more sophisticated human thinking, such as self-awareness. The question can also be reversed: one may ask whether we are really aware of what happened, and how much of our feeling of being aware is the result of the type of thinking that Watson just exhibited.

As many of my readers certainly know, Alan Turing came up with a basic test suggesting that if something looks intelligent, then it is intelligent. So we have reached the point at which not only has a machine passed the Turing test, but humanity may also be ready to accept the idea behind the test, that is, that it doesn’t really matter how something thinks: if it looks clever enough to fool us (or even beat us at something), then it is intelligent. For the machine, the milestone is located at a point in time that reflects the current state of technology, which basically amounts to the miniaturization and mastery of computing devices and the management of very large quantities of data, not forgetting the current state of the fields involved (basically machine learning and natural language processing, or NLP). But it is by no means an isolated achievement. I see IBM as the standard bearer for a combination of several current technologies run on the best hardware available today, a pioneer in this sense.

(Wolfram|Alpha computational engine control room. Launch day picture by Hector Zenil)

It seems we are now able to gauge the size and power of the human brain against something that looks as sophisticated as we are, at least at playing a sophisticated game. Watson and humans may reach the same conclusions, whether or not they do so in different ways, but the fact that Watson requires a computer the size of 10 full-size fridges, 15 terabytes of memory (likely full of data and programs) and 2,880 microprocessors working in parallel tells us more about us than about Watson itself. We knew we carried a supercomputer in each of our heads, but we didn’t know what its specs might be. We also thought that we were specially gifted with unprecedented intelligence, but now a machine that is certainly not aware of it, and that hasn’t taken the same path, is also able to exhibit key features of intelligence.

Ken Jennings adds after the last match: “…But unlike us, Watson cannot be intimidated. It never gets cocky or discouraged. It plays its game coldly, implacably, always offering a perfectly timed buzz when it’s confident about an answer.” “…I was already thinking about a possible consolation prize: a second-place finish ahead of the show’s other human contestant and my quiz-show archrival, undefeated Jeopardy! phenom Brad Rutter.” Read more here and here.

Watson’s specs will fit in a small box in the future, given the trend of the past several decades following Moore’s law, and in the future, as is the case today, faster will be smarter.

NOVA called its documentary, aired on PBS, The Smartest Machine On Earth.


Alan M. Turing documentary teaser

January 28th, 2011

Randomness Through Computation

January 18th, 2011
The official announcement of the book I edited has been released:

Randomness Through Computation: Some Answers, More Questions
edited by Hector Zenil
450pp (approx.)
ISBN: 978-981-4327-74-9

The book will be available next month (February 2011) from Amazon, Borders and other large online bookstores, as well as on the publishers’ (World Scientific and Imperial College Press) websites.

The volume, whose title should be understood as “(Explaining) Randomness Through Computation,” consists of a set of chapters written by leading scholars, most of them founders of their fields. It explores the connections of randomness to other areas of scientific knowledge, especially its fruitful relationship to Computability and Complexity Theory, and also to areas such as Probability, Statistics, Information Theory, Biology, Physics, Quantum Mechanics, Learning Theory and Artificial Intelligence. The contributors cover these topics without neglecting important philosophical dimensions, sometimes going beyond the purely technical to formulate age-old questions relating to matters such as determinism and free will. The book is a great source for learning the basics, applications and state of the art of the field, as recounted by the very creators of these fields, each with original ideas and an idiosyncratic point of view. Using a question and answer format, they share their visions from their several distinctive vantage points.


Is Randomness Necessary? (R Graham)
Probability is a Lot of Logic at Once: If You Don’t Know Which One to Pick, Get’em All (T Toffoli)
Statistical Testing of Randomness: New and Old Procedures (A L Rukhin)
Scatter and Regularity Imply Benford’s Law… and More (N Gauvrit & J-P Delahaye)
Some Bridging Results and Challenges in Classical, Quantum and Computational Randomness (G Longo et al.)
Metaphysics, Metamathematics and Metabiology (G Chaitin)
Uncertainty in Physics and Computation (M A Stay)
Indeterminism and Randomness Through Physics (K Svozil)
The Martin-Löf-Chaitin Thesis: The Identification by Recursion Theory of the Mathematical Notion of Random Sequence (J-P Delahaye)
The Road to Intrinsic Randomness (S Wolfram)
Algorithmic Probability – Its Discovery – Its Properties and Application to Strong AI (R J Solomonoff)
Algorithmic Randomness as Foundation of Inductive Reasoning and Artificial Intelligence (M Hutter)
Randomness, Occam’s Razor, AI, Creativity and Digital Physics (J Schmidhuber)
Randomness Everywhere: My Path to Algorithmic Information Theory (C S Calude)
The Impact of Algorithmic Information Theory on Our Current Views on Complexity, Randomness, Information and Prediction (P Gács)
Randomness, Computability and Information (J S Miller)
Studying Randomness Through Computation (A Nies)
Computability, Algorithmic Randomness and Complexity (R G Downey)
Is Randomness Native to Computer Science? Ten Years After (M Ferbus-Zanda & S Grigorieff)
Randomness as Circuit Complexity (and the Connection to Pseudorandomness) (E Allender)
Randomness: A Tool for Constructing and Analyzing Computer Programs (A Kucera)
Connecting Randomness to Computation (M Li)
From Error-correcting Codes to Algorithmic Information Theory (L Staiger)
Randomness in Algorithms (O Watanabe)
Panel Discussion Transcription (University of Vermont, Burlington, 2007): Is the Universe Random? (C S Calude, J Casti, G Chaitin, P Davies, K Svozil, S Wolfram)
Panel Discussion Transcription (University of Indiana Bloomington, 2008): What is Computation? (How) Does Nature Compute? (C S Calude, G Chaitin, G Csicsery, E Fredkin, T Leggett, R de Ruyter, T Toffoli, S Wolfram)

An ordering guide can be found here. You can get your copy at a good discount from the publisher’s bookstore. Contact me and I may be able to provide you with a discount code before 20 February, 2011. If you find the book interesting, please recommend it to your library and colleagues; this will allow me and the other authors and editors to keep pursuing projects like my next book, Computation in Nature & the Nature of Computation.

Update: I have been told by WSPC that the book is one of the top sellers this year, selling about 2-3 copies per day during the first 4 months (which is high for a technical book). It has also made it on several occasions into the top 100 information theory books on Amazon, and used copies are already considerably more expensive than new ones.

Compression-based Investigation of Cellular Automata, A Phase Transition Coefficient and a Conjecture Related to Universal Computation

August 22nd, 2010

In my new article Compression-based investigation of the dynamical properties of cellular automata and other systems, published in the journal Complex Systems (19:1), pages 1-28, I present a method for studying the qualitative behavior of cellular automata and other abstract computing machines, based on approximating their program-size complexity using a general lossless compression algorithm. I show that the compression-based approach classifies cellular automata (CA) into clusters according to their heuristic behavior, and that these clusters correspond to Wolfram’s main classes of systemic behavior. I also present a Gray code-based numbering scheme for initial conditions that is optimal for this kind of investigation, as well as a compression-based method for estimating a characteristic exponent in the form of a phase transition coefficient measuring the resiliency or sensitivity of a system to its initial conditions. Finally, I conjecture that universal systems have large transition coefficients.

I think this constitutes a novel framework for investigating the dynamical properties of cellular automata and other systems. Here I will discuss some of the main ideas and implications of the paper. A pdf version of the paper is available online on arXiv.
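As a minimal sketch of the idea (using zlib as the general lossless compressor, which is my own assumption for illustration; the compressor used in the paper differs), one can approximate the program-size complexity of an elementary cellular automaton's evolution by the compressed length of its space-time diagram:

```python
import zlib

def eca_step(cells, rule):
    """One step of an elementary cellular automaton (ECA) with periodic
    boundaries, given its Wolfram rule number (0-255)."""
    n = len(cells)
    return [(rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
            for i in range(n)]

def compressed_evolution_size(rule, width=101, steps=100):
    """Approximate the program-size complexity of an ECA evolution by the
    zlib-compressed length of its space-time diagram, starting from the
    simplest ordered initial configuration (a single black cell)."""
    cells = [0] * width
    cells[width // 2] = 1
    rows = []
    for _ in range(steps):
        rows.append("".join(map(str, cells)))
        cells = eca_step(cells, rule)
    return len(zlib.compress("\n".join(rows).encode()))

# Simple rules compress far better than random-looking ones:
print(compressed_evolution_size(0))    # class 1: trivial
print(compressed_evolution_size(90))   # class 2: nested
print(compressed_evolution_size(30))   # class 3: random-looking
```

The compressed sizes cluster the rules in rough agreement with Wolfram's classes: the trivial rule compresses to almost nothing, the nested rule to something intermediate, and the random-looking rule resists compression.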

Algorithmic complexity: classification into Wolfram’s four classes

In A New Kind of Science and in several papers dating from the mid-1980s, Stephen Wolfram defined four classes into which cellular automata (CA) and several other systems evolve, each capturing a different qualitative behavior which evolves from the same ordered initial configuration (the simplest black cell).

  • Class 1 consists of CA whose dynamics reach a steady state regardless of the initial conditions.
  • Class 2 consists of CA whose long-term evolution produces periodic or nested structures.

These first two classes are simple, in the sense that their long-term evolution can be deduced from running the system for a small number of steps.

  • Class 3 CA produce structures that seem random.
  • Class 4 CA produce localized structures on a random looking background, and hence are complex looking.

Wolfram’s classification is heuristic, and the assignment of CA to the four classes is somewhat subjective. To the best of my knowledge, there is, to date, no universally agreed upon classification scheme for CA. I think, however, that there is no better approach than a pure program-size complexity measure, which is the approach I follow in this paper. I propose to continue the investigation later, using the same measure to discover other interesting properties and possible hierarchies.

An interesting question related to Wolfram’s classification concerns its dependence on the initial condition–chiefly because the classification was originally meant to be constructed by visual inspection over the evolution of a CA, and as we know, the evolution of a CA depends on its initial condition. This has been a major critique (Eppstein) of Wolfram’s classification, because the expectation is that the classification should be based on the evolution from an unordered (random) configuration.

Nevertheless, the classification is actually based on the potential of a CA to evolve into any of the possible behaviors from at least one initial configuration (the question is of course not answerable in finite time, since there is an infinite number of possible initial configurations). Wolfram’s classification may therefore be seen as being dependent on the initial condition of a CA.

It is not a surprise that one can, for example, construct a CA belonging to more than one of Wolfram’s four classes when starting from different initial configurations. Rule 110 belongs to class 4 because it is capable of universal computation–one can set up an initial configuration to ‘program’ rule 110 to carry out any computation (this being the very basic concept of a programmable computer).

For every CA rule there is a definite (but in general undecidable) answer to the question whether or not it is capable of universal computation (or in reachability terms, whether a CA will evolve into a certain configuration). The question only makes sense if the evolution of a CA depends on its initial configuration. No rule can be universal that fixes the initial configuration once and for all (there would be no way to input an instruction and carry out an arbitrary computation).

On the other hand, some rules, such as Rule 0, don’t produce different behavior under different initial configurations. No matter how you change the initial condition, there is no way to make such a rule compute something other than what it computes for every other initial configuration.

A possible objection (made by David Eppstein) is that there are CAs that can be made to look as if they belonged to all classes by modifying their initial conditions. Which is true: a CA may belong to a certain class until, given another initial configuration, it is made to behave as if it belonged to another class.

My compression-based approach shows that Wolfram’s heuristic classification can actually be quantified by a measure which is clearly dependent on the initial conditions, while also being capable of detecting sensitivity to initial configurations, and hence of replacing the visual inspection. This hierarchical classification is well defined and is therefore a good candidate for a complexity measure.

The second part of my investigation actually takes advantage of the ability of CAs to behave differently in order to undertake a closer inspection and a novel classification, taking into account the average capability of a system to behave in different ways.

Differentiation from a priori approaches

The approach is different from others in that it is an a posteriori technique, unlike, for example, Langton’s lambda, a measure of the density of a CA rule. It is an a posteriori technique because unlike this lambda number, the compression-based approach requires the CA to evolve before saying anything about it, whereas Langton’s lambda is computed from the rules of the CA.

Langton’s lambda is simply the fraction of rules in which the new state of the cell is non-zero. For example, the rules of elementary cellular automaton number 1 in Wolfram’s numbering scheme are simple: all 3-tuples are sent to 0 except one, the last one (the all-zero neighborhood). Therefore Langton’s lambda is 1/8.

The lambda parameter of a CA is a number between 0 and 1. For example, if lambda is 0 (e.g. for ECA rule number 0), the evolution develops into a trivial state. Langton found that CA rules with lambda close to zero evolve into trivial states and CA rules close to 1 evolve into random-looking behavior, with complex behavior somewhere in between. It is near this transition that the most interesting CAs will lie, the ones that manifest the most complex behavior.
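For elementary CA rules in Wolfram's numbering, lambda can be read off directly from the bits of the rule number; a small sketch (function name is mine):

```python
def langton_lambda(rule, states=2, neighborhood=3):
    """Langton's lambda for an elementary CA: the fraction of the
    2^3 = 8 neighborhood tuples whose new state is non-zero."""
    n_tuples = states ** neighborhood
    bits = [(rule >> i) & 1 for i in range(n_tuples)]
    return sum(1 for b in bits if b != 0) / n_tuples

print(langton_lambda(1))    # 0.125, the 1/8 of the example above
print(langton_lambda(0))    # 0.0, trivial rule
print(langton_lambda(255))  # 1.0, every tuple maps to 1
```

Note that this is computed from the rule table alone, before the CA takes a single step, which is exactly what makes it an a priori measure.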

Unfortunately, classifying CAs with lambda values is more complicated than that, as one quickly faces undecidability. If it were possible to decide once and for all whether a CA is complex by computing its lambda value, without having to run the CA for a single step, one could solve all kinds of undecidable questions simply by looking at a CA’s rules.

The critical value for lambda is not a universal constant. Nor, for that matter, is my phase transition coefficient. But the main difference, as noted before, is that the compression-based approach actually looks at the evolution of the system rather than trying to figure everything out from the description of the CA.

The compression-based method represents a formal approach to Wolfram’s classification process, replacing the need for a visual inspection with a technique and a coefficient to determine to what class a CA belongs. The approach is compatible with what Stephen Wolfram himself has proposed in his NKS book, without contradicting any computability result or Wolfram’s own Principle of Computational Irreducibility, which says that while some computations may admit shortcuts that allow them to be performed more rapidly, others don’t, so that one cannot really tell what the evolution of a CA will be, except by running it.

Initial configuration numbering scheme

Ideally, one should feed a system with a natural sequence of initial configurations of gradually increasing complexity. Doing so ensures that qualitative changes in the evolution of the system are not attributable to discontinuities in its set of initial conditions.

What I propose is an initial configuration numbering scheme in which two successive values differ by only one bit: the Gray code. To explore the qualitative behavior of a cellular automaton when starting from different initial configurations, this Gray encoding enumeration is the optimal method because of its smoothness: it avoids undesirable “jumps” attributable to the system’s having been fed discontinuous initial configurations. This simple convention allows us to investigate the dynamical behavior of a system without being concerned about arbitrarily introducing artificial complexity when moving from one initial configuration to the next.
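The binary-reflected Gray code has exactly the required property that consecutive codes differ in one bit; a small sketch of the numbering scheme (helper names are mine):

```python
def gray(i):
    """The i-th binary-reflected Gray code: i XOR (i >> 1)."""
    return i ^ (i >> 1)

def initial_condition(i, width=16):
    """The i-th initial configuration in the Gray-code numbering scheme,
    as a list of cell values padded to a fixed width."""
    return [int(b) for b in format(gray(i), f"0{width}b")]

# Successive initial configurations differ in exactly one cell:
for i in range(1, 8):
    a, b = initial_condition(i - 1), initial_condition(i)
    assert sum(x != y for x, y in zip(a, b)) == 1
```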

Phase transition detection and coefficient

I defined a measure based on the change of the asymptotic direction of the size of the compressed evolutions of a system for different initial configurations (following the proposed Gray-code enumeration of initial configurations). My phase transition coefficient yields an interesting classification: it measures the resiliency or sensitivity of a system to its initial conditions. So rules such as 30 and 0 appear close to each other. Odd as this may seem, this is because both, when their initial conditions are changed, behave in more or less the same way. In other words, there is no change in the qualitative behavior of these CAs when feeding them with different inputs, regardless of how different the inputs may be.

In this phase transition classification, for example, rules such as 122 and 89 appear next to each other, because, as the investigation proves, they are both CAs with relatively high phase transition coefficients, meaning that they are very sensitive to initial conditions, dramatically changing their qualitative behavior when starting from one rather than another initial configuration.
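The paper defines the coefficient formally; as a rough, self-contained sketch of the idea (the normalization by the number of steps is my own simplification, not the paper's exact definition, and the ECA stepper is repeated here so the snippet runs on its own), one can measure how much the compressed size of the evolution changes between Gray-code-consecutive initial configurations:

```python
import zlib

def eca_step(cells, rule):
    """One step of an elementary CA with periodic boundaries."""
    n = len(cells)
    return [(rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
            for i in range(n)]

def compressed_size(rule, cells, steps=100):
    """zlib-compressed length of the space-time diagram from a given start."""
    rows = []
    for _ in range(steps):
        rows.append("".join(map(str, cells)))
        cells = eca_step(cells, rule)
    return len(zlib.compress("\n".join(rows).encode()))

def transition_coefficient(rule, width=31, n_conditions=64, steps=100):
    """Mean absolute change in compressed evolution size between
    Gray-code-consecutive initial configurations, normalized by the
    number of steps. Rules insensitive to their input score near zero."""
    sizes = []
    for i in range(n_conditions):
        g = i ^ (i >> 1)  # Gray code numbering of initial conditions
        cells = [int(b) for b in format(g, f"0{width}b")]
        sizes.append(compressed_size(rule, cells, steps))
    diffs = [abs(a - b) for a, b in zip(sizes, sizes[1:])]
    return sum(diffs) / len(diffs) / steps

print(transition_coefficient(0))    # near 0: output identical for every input
print(transition_coefficient(110))  # larger: behavior depends on the input
```

Rule 0 scores near zero because, after the first row, every evolution is identical no matter the input; a rule whose qualitative behavior shifts with its input produces larger jumps in compressed size between neighboring initial conditions.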

Phase transition and predictability

An obvious feature of universal systems is that they need to be capable of carrying information by reflecting changes made to the input in the output. In attempting to determine whether a system is capable of reaching universal computation, one may ask whether a system is capable of this in the first place, and how efficiently it does so. And this is what the phase transition actually measures, because it tells how well a system manages to respond to an input. Obviously, a system such as rule 0 or rule 255, which does not change regardless of the input, is trivially decidable. But a universal system should be capable of some reaction to external manipulation (the input to the system). The inverse, however, should not hold, because having a large transition coefficient by no means implies that the system will behave with the freedom required of a universal system if it is to emulate any possible computation (a case in point may be rule 22, which, despite having the largest transition coefficient, may not be versatile enough for universal computation).

The phase transition measure also implies that one may be able to predict the behavior of a system for an initial configuration, with a degree of certainty based on the previous variability of the system over an initial segment of initial configurations. It is an open question whether this is a lower bound. In other words, it is unclear whether looking at the behavior of a system for a certain length of time and for certain configurations will tell you anything about its behavior for other initial configurations, or if it were allowed to evolve for a longer period of time. Experience says one would do well to predict future behavior on the basis of past behavior, and this may also be related to Israeli and Goldenfeld’s very interesting findings. In 2004 they showed that some computationally irreducible elementary cellular automata have properties that are predictable at a coarse-grained level. They did so following a renormalization group (RG) technique, a mathematical apparatus that allows one to investigate the changes in a physical system as one views it at different distance scales.

They were able to provide a hierarchy of complexity in agreement with the CAs’ apparent complexity. Israeli and Goldenfeld’s classification is also an a posteriori investigation, but it too is bedeviled by the unavoidable (and ultimately undecidable) statistical question, namely, whether one can keep on predicting for all initial conditions and for any number of steps, without having to run the system forever and for all possible initial conditions. But unlike mine, their classification is partial, in the sense that one cannot always say whether the complexity of one CA is greater than that of another. Based on the comparison of their approximated compressed sizes, however, I could come up with a total order in full agreement with their apparent complexity as well as with Wolfram’s four classes (one that even yields the same number of classes).

An open question directed to me by Nigel Goldenfeld– in trying to understand the similarities and differences between their RG and my algorithmic approach–concerns how their hierarchy relates to mine. What they experimentally suggest is that the larger the scale of the transformations used, the more highly structured the objects and therefore the greater their algorithmic complexity.

For example, rule 110 is one rule about which my own phase transition classification says that, despite showing some sensitivity, it also shows some stability. Which means that one can say with some degree of certainty how it will look (and behave) for certain steps and certain initial configurations, unlike those at the top. This turns out to be predictable according to Israeli and Goldenfeld as well, at a coarse-grained level, after a scale transformation.

A sketch on a conjecture on the transition coefficient of Turing-universal systems

Based on two empirical facts, I conjecture that universal CAs should have a large transition coefficient, as 110 does. Rules such as 110 and 54 also turn out to be next to each other in the phase transition classification, both having large values (they are in the top 24, and both the top 24 and the bottom 22 are in the original paper: see the last 2 tables).

So I base this conjecture on the following empirical observations:

1. The only known universal elementary cellular automaton figures at the top of this classification, as do candidates such as rule 54, which appears right next to rule 110.

2. Universality seems to imply that a system should be capable of being controlled by the inputs, which our classification suggests those at the bottom are not, as all of them look the same no matter what the input, and may not be capable of carrying information through the system toward the output.

Other rules that some people think may be capable of some sophisticated computation (see the paper by Paul-Jean Letourneau in the journal Complex Systems) also have large transition coefficients, such as rule 146, with a transition coefficient of 0.124, ranking 39 out of the 256 elementary cellular automata.

As noted before, however, the transition coefficient is not a universal measure. In this case, coefficients were calculated over 600 steps, in blocks of 50, for the first 500 initial conditions, which means that some rules may be capable of greater transitions but are somehow ‘slow’ at the selected number of steps and initial conditions, i.e. they take more than 500 initial conditions (in Gray code numbering) to show larger changes, or else larger changes are being missed because of the jumps in blocks of 50 steps, though this latter possibility is less likely.

The conjecture also seems to be in agreement with Wolfram’s claim (made at some of his oral presentations) that rule 30 (as a class 3 elementary cellular automaton) may be, according to his Principle of Computational Equivalence (PCE), computationally universal. But it may turn out that it is too difficult (perhaps impossible) to control in order to perform a computation, because it behaves too randomly.

Classifying objects by complexity

June 2nd, 2010

I have coauthored, with Jean-Paul Delahaye and Cedric Gaucherel, and made available today on arXiv a new paper entitled Image information content characterization and classification by physical complexity. In the paper we present a method for estimating the complexity of an image based on the concept of Bennett’s logical depth. Unlike the application of the concept of algorithmic complexity by itself, the addition of the concept of logical depth results in a characterization of objects by organizational (physical) complexity. We use this measure to classify images by their information content. The method provides a means for evaluating and classifying objects by way of their visual representations.

The method described in the paper ranks images by their decompression times, and the classification corresponds to the intuitive ranking resulting from a visual inspection, with things like microprocessors, human faces, cities, engines and fractals figuring at the top as the most complex objects, while random-looking images, which rank high by algorithmic complexity, rank low by logical depth and are classified next to trivial images such as uniformly colored ones, indicating the characteristic feature of the logical depth measure. A gradation of different complexities was found in the groups in between, gradually increasing in complexity from bottom to top.
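Bennett's logical depth of an object is, roughly, the time its shortest description takes to run; the paper approximates it by the decompression time of a compressed representation. A minimal sketch of that idea using zlib (the paper's actual compressors and timing methodology differ, and the helper name is mine); taking the best of several runs reduces timing noise:

```python
import time
import zlib

def decompression_time(data, repeats=100):
    """Approximate Bennett's logical depth of `data` by the time taken to
    decompress its compressed representation (best of several runs, to
    reduce timing noise)."""
    compressed = zlib.compress(data, 9)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        zlib.decompress(compressed)
        best = min(best, time.perf_counter() - t0)
    return best

# A random-looking image compresses poorly (high algorithmic complexity) but
# decompresses quickly (low logical depth); a structured image takes longer
# to reconstruct from its short description.
```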


Complexity classification of images, from more complex to less complex (group descriptions on the right are followed by the average decompression times as approximations to Bennett's logical depth)

In the paper we show that:

  • The concept of logical depth can be implemented as a feasible and applicable method to approach a real-world problem.
  • After studying several cases and testing several compression algorithms, the method described in the paper has been shown to work and to be of use for identifying and classifying images by their apparent physical complexity.
  • The procedure described constitutes an unsupervised method for evaluating the information content of an image by physical complexity.
  • As the theory predicted, logical depth yields a reasonable measure of complexity that is different from the measure obtained by considering algorithmic complexity alone, while being in accordance with one’s intuitive expectations of greater and lesser complexity.
  • The paper is available here.

    Comments on Turing’s very first Universal machine approaching Turing’s 100th. birthday anniversary

    May 18th, 2010

    The idea that a machine could perform the tasks of any other machine is the description of a Universal (Turing) machine. Its invention is considered by many to have been one of the major landmarks giving rise to the field of computer science. ‘Universal’ means that one can ‘program’ a general-purpose machine to perform the tasks of any specific-purpose machine. Turing machines are to this day the central object of study in the theory of computation.

    In an attempt to understand how the very first universal machine was described in Turing’s original 1936 paper “On Computable Numbers, with an Application to the Entscheidungsproblem,” I spent half a day re-reading the paper (and its corrected version by Donald Davies, published in The Essential Turing by Jack Copeland), trying to decode it, only to find that it is written in something like an anticipation of a (Turing-complete) subroutine-oriented programming language which is impossible to rewrite in a traditional transition table. So if one really tries hard, one ends up encoding an arbitrary universal Turing machine, not Alan Turing’s first universal Turing machine.

    Although the paper has all the primary elements of a traditional description of a Turing machine (Turing’s ‘a-machine’ description), the fact that it used multiple conventions for describing increasingly complex machines was Emile Post’s strongest critique. In a letter to Church, Turing replied to Post’s critique, arguing that his use of more than one convention when building his universal machine did not affect the final result, though he did admit it made the machine hard to decipher.

    The result is that not only would it be a giant machine in terms of states and symbols, but the number of actual colors that it may need seems to be unknown. It is only known to assume 18 states, and to simulate Turing’s second a-machine example with 23 instructions (here the product of states and colors does not necessarily lead to the number of instructions in Turing’s formalism, because his transitions are not total functions).

    In the traditional 5-tuple form, Turing’s original universal machine could be written as (Mathematica notation):

    {q1, “blank”} -> {q2, P0, R}
    {DA, D} -> {DAA, DC, R}
    {q2, “blank”} -> {q3, E, R}
    {DAA, D} -> {DAAA, D, R}
    {q3, “blank”} -> {q4, P1, R}
    {DAAA, D} -> {DAAAA, DCC, R}
    {q4, “blank”} -> {q1, E, R}
    {DAAAA, D} -> {DA, D, R}

    But notice that no state starting with q leads to a state starting with D, because the D-prefixed names (states are called ‘m-configurations’ in Turing’s original jargon) are really subroutines defined in Turing’s paper, while the q’s are traditional Turing machine states. In Turing’s paper E is, for example, an erasing subroutine. Some other m-configurations require scanning the whole tape several times (which is what one would do if asked to emulate another Turing machine), and so on. So most of the behavioral description of the machine is encoded as strange strings of letters.
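To make the 5-tuple form above concrete, here is a minimal interpreter for rules of this shape, run only on the four q-state rules (the D-prefixed entries are, as just noted, subroutines rather than plain states, so they are left out). Treating E as "leave the cell blank" follows the erasing convention described above; the helper names and output encoding are mine:

```python
def run_tm(rules, steps, start="q1", blank="blank"):
    """Minimal interpreter for 5-tuple Turing machine rules of the form
    (state, read) -> (new_state, write, move). 'E' leaves the cell blank."""
    tape = {}
    head, state = 0, start
    for _ in range(steps):
        read = tape.get(head, blank)
        if (state, read) not in rules:
            break  # no applicable rule: halt
        state, write, move = rules[(state, read)]
        if write != "E":
            tape[head] = write
        head += 1 if move == "R" else -1
    # Render the tape: P0 prints 0, P1 prints 1, untouched cells are blank
    cells = [tape.get(i, "_") for i in range(0, max(tape) + 1)]
    return "".join("0" if c == "P0" else "1" if c == "P1" else "_" for c in cells)

rules = {
    ("q1", "blank"): ("q2", "P0", "R"),
    ("q2", "blank"): ("q3", "E", "R"),
    ("q3", "blank"): ("q4", "P1", "R"),
    ("q4", "blank"): ("q1", "E", "R"),
}
print(run_tm(rules, steps=8))  # 0_1_0_1
```

Running the four q-rules reproduces Turing's first example machine, which prints the sequence 0, 1, 0, 1, … on alternate squares of the tape.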

    Nevertheless, Turing’s choice is rather clever from a programmer’s perspective: he proceeded the way one would today when designing and implementing a universal computer. One would hardly do so by describing the basic elements, but rather by constructing higher-level subroutines, each describing a machine function and itself based on one or more other subroutines, down to the level of states and symbols. Think of programming at the level of machine language versus programming in an intermediate-level language. Writing a universal Turing machine in full detail in terms of states and symbols from the beginning would lead to complete confusion, just as it would if one attempted to implement a complex piece of software by writing binary code, or even pure assembly language. Turing’s description provides a better, though by no means trivial, understanding of what a universal Turing machine does to carry out the computation of any other Turing machine, by sketching and intuitively grouping the machine operations into these program subroutines.

    Unlike today, when one can simply ask Wolfram|Alpha to run any Turing machine simulation, such as a random 2-state 5-color Turing machine or the 4-state 2-color Busy Beaver (for a list of other Turing machine examples one can type into Wolfram|Alpha, click here), Turing had no computer on which to run and test his code, so it is no surprise that his universal machine code came with several glitches. It was quite amusing to see that the first program ever written for a digital computer was already bedeviled by bugs. And this program was the actual implementation of the very first universal Turing machine, by Alan Turing himself.

    If Turing had not died in 1954, at the age of only 41, next June 23 (2010) he would have turned 98. As a tribute to his work I’ve set up the following webpage gathering most, if not all, of the public images known of him.

    For his 100th birthday anniversary a series of events is being organized. 2012 will not only be the year of the Olympic Games, which Turing, an enthusiastic long-distance runner, would have followed with particular interest in his own country (the UK), but also The Alan Turing Year, of which I’m honored to be a part as a member of the advisory committee, representing Wolfram’s Science Group.

    Stephen Hawking: A brief examination of the recent warning over alien civilizations

    May 4th, 2010

    Stephen Hawking asserts that while aliens almost certainly exist, humans should avoid making contact.

    The original story published by BBC News can be found here.

    He claims: “We only have to look at ourselves to see how intelligent life might develop into something we wouldn’t want to meet.”

    Stephen Hawking’s recent assertion makes this an interesting moment to bring up the question of intelligent life, meaning and purpose.

    Let’s examine what Hawking’s argument implies, assuming that intelligent life, other than human, exists elsewhere in the universe:

    1. Aliens come from an Earth-like planet.
    2. The Earth has something the aliens do not.
    3. The Earth has something aliens need or want.

    These points are not necessarily independent of one another; each may be chained or added to the others. Point 2 may imply Point 1. However, Earth-like planets are likely to be quite common, as current research on exoplanets seems to suggest. While Point 1 is chauvinistic, Point 3, the notion that Earth possesses something special, is egocentric. Concerning Point 2 again: as an example, many cities on Earth are in need of potable water. Does that mean they can just go looking for it in another city?

    Destroyed castle on the coast of Ireland

    Originally uploaded by hzenilc.

    If we think that aliens are looking for a planet with an atmosphere like ours, we are already making a very strong assumption that requires further support. To assume that their planet has the same atmosphere as ours, and that they would consequently need a planet exactly like it, is to make a strong claim, and a highly implausible one at that. We may think that water is the most precious element in the universe, but it might just be lethal to an alien civilization. If it is true that we can only imagine them on the basis of the kinds of life and their diversity on Earth, it is also true that we know nothing about them, if they really do exist.

    If humans find life elsewhere, and if it is not related to Earth’s life, it is likely to be unrecognizable. After all, we have experience of a very large range of life forms here on Earth, and we find some of them alien enough already. “Extremophiles” is the name we give to Earth species that can survive in places that would quickly kill humans and other “normal” life forms.

    As for needing a work force, it doesn’t make sense that aliens would seek it here. Humans are highly unproductive. And aliens with the technological capabilities to travel to Earth are likely to have had long experience building the kinds of machines that humans have only recently got around to building and improving. Advanced intelligent aliens most likely make extensive use of robots, if they are not some kind of cybernetic beings themselves.

    Though assuming that aliens and humans are alike in their biochemical constitution may be chauvinistic and misguided, the supposition that our intelligences are similar has a more solid basis, especially since we also assume that the alien civilization in question has built spaceships, travels in space and is looking for other civilizations. According to Stephen Wolfram’s Principle of Computational Equivalence (PCE), there is a good chance that if non-trivial life forms exist, they will have or develop the same degree of sophistication (e.g. intelligence) as ours. Wolfram’s PCE would also suggest that once primitive life has developed, it will eventually achieve its maximal degree of sophistication, which would be the maximal possible degree of sophistication. In other words, if life is around, intelligent life is as likely as life itself.

    We used to say that aliens are likely to be much more advanced than humans. Why so? Because the universe has a history of roughly 14 billion years. Characterizing a civilization that has been around for about 2 million years, and has only begun developing technology in the last few hundred, as ‘advanced’ is beyond naive; it is statistically implausible. As Carl Sagan pointed out, every intelligent civilization seems to reach a period when its own technological power is capable of destroying itself. This is a stage human civilization reached about 70 years ago, with the invention of the atomic bomb, and remains in. The longevity of said aliens’ civilization means that they have managed to make good use of the resources of their host planet. Even assuming they have reached the point where they have exhausted those resources, we have to concede that they have probably been good, ecologically responsible stewards of their planet. Of course it is still possible, despite all this, that these aliens may wish to colonize a new planet in order to exploit its resources rather than simply seizing what they find and taking it away with them.

    Water is one of the most common compounds in the universe, and it is quite easy to synthesize (see the chemical recipe here). If they just need a rock revolving around a star at a distance such that life can be sustained, there are certainly many more possibilities than coming to Earth. Think about it. Human civilization, given its current and near-future capabilities, is much closer to terraforming Mars (reengineering the Martian soil and atmosphere to make them friendly to Earth life) than to traveling abroad to conquer, or cohabit with, another civilization.

    Now, from a practical standpoint, the notion that we could hide away in a corner of the universe is nonsense, to say the least, unless we are somehow already hidden by the sheer size of the universe itself. The Earth has been broadcasting to space since radio signals were invented, and some messages have even been kept symbolic on purpose. So in the worst-case scenario, assuming Hawking is right and our signals are likely to be picked up by an alien civilization willing to take us on, hiding is simply not an option. As Seth Shostak says, the first thing to do would be to shut down the BBC, NBC, CBS and the radars at all airports; short of installing a kind of Dyson sphere around the Earth, or ceasing to broadcast altogether, we cannot conceal ourselves. The only way to make sense of Hawking’s particular comment in this regard is to take time scales into consideration. If it is true that the Earth has been broadcasting to space for the last 50 or more years, it is also true that someday it may reach a technological state that allows it to stop doing so, purposely or not. So it may be that civilizations decide to hide themselves after a short period of broadcasting history.

    The advice to hide from aliens implies, of course, that they are destructive, or that contact between our civilizations would be destructive, and this brings us back to the question of their longevity. If they were indeed destructive they would have annihilated themselves. As Hawking rightly does, consider the human case.

    But if Hawking is right, we would probably have nothing to be scared of. If an alien civilization wants us dead, either we will barely notice it when it happens, or it won’t ever happen if it hasn’t happened already. The chances that a destructive civilization exists seem lower than the chances that a peaceful civilization has managed to extend its existence over the history of the universe: the former would likely have already destroyed itself, while the latter has better odds. Caution would not hurt, though. We ought to keep an eye on ourselves and how we develop, and that means using our resources more intelligently and not necessarily manufacturing more bombs. But being scared definitely says much more about us than about anyone else, because we simply know nothing about them, nor can we pretend to know what their intentions, if any, might be.

    But of course in the matter of encounters between civilizations in space, every possibility is time-scale dependent. Imagine for an instant that two alien civilizations are at war. We would certainly be well advised to keep away from the conflict. We may be justified in thinking that if we were in the way when either of the parties ran short of resources to prosecute their war, they would most likely help themselves to anything we had that could be of use. But think a little further. The odds of encountering two civilizations actually engaged in warfare are rather small.

    Civilizations making war would either annihilate each other, or if they don’t, then one party would end up achieving hegemonic status. It would seem logical that periods of peace are much longer than periods of war. In the first place civilizations that contemplate war must be both smart enough and hostile enough, and reaching these thresholds of achievement and antagonism takes time–peacetime.

    As Hawking claims, many of the life forms in the universe are probably just microbes, but unlike Hawking I believe that if civilizations do exist they’d have little time to make war, even assuming they wanted to. Once again, one has only to think of the Earth to reach this conclusion. If the Cold War had not remained cold, we either wouldn’t be here or else we’d all now be ruled by a single country–as has been pretty much the case (despite there being no actual war) over the last few decades, with the emergence of a current single superpower (with a partial sharing of the global power, and other powers emerging). But when superpowers emerge these days, they are more interested in keeping the peace because they have reached a stage where they depend on each other, chiefly because trade and commerce have become globalized. It turns out that the human world has become much more cooperative than we might have expected. But this is not by chance; it seems to be a common path. Not that one can be absolutely certain, though. Only by witnessing other civilizations could we safely make generalizations. And yet logic dictates that the opposite path–the path of belligerence–would be unlikely.

    What I think is that if civilizations were to find each other, they are more likely to be interested in each other’s culture and knowledge than in each other’s natural resources: either because natural resources can be found in many places, or even created by civilizations that are sufficiently advanced, or else because they are simply not needed, curiosity about the universe being the sole motive behind exploration. This would mean that civilizations would preserve ‘alien’ civilizations to enrich themselves, just as anthropologists and ethnologists do on Earth. To put my point in Hawking’s terms, aliens are more likely to be like modern scientists than like the barbaric Europeans who colonized the Americas.

    On the Algorithmic Nature of the World

    April 21st, 2010

    In a new paper I’ve coauthored with Jean-Paul Delahaye, we propose a test based on the theory of algorithmic complexity and an experimental evaluation of Levin’s universal distribution to identify evidence in support of or in contravention of the claim that the world is algorithmic in nature.

    To this end we have undertaken a statistical comparison of the frequency distributions of data from physical sources on the one hand–repositories of information such as images, data stored in a hard drive, computer programs and DNA sequences–and the frequency distributions generated by purely algorithmic means on the other–by running abstract computing devices such as Turing machines, cellular automata and Post Tag systems. Statistical correlations were found and their significance measured.
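The comparison described above can be sketched in miniature. The toy below ranks short binary strings by their frequency in the outputs of all 256 elementary cellular automata (an algorithmic source) and in an ordinary byte stream (a crude stand-in for a physical source), then computes a rank correlation between the two orderings. All sources and parameters here are illustrative assumptions, not those used in the paper.

```python
# Toy sketch of the method: compare the rank ordering of short binary strings
# by (a) frequency across the outputs of all 256 elementary cellular automata
# and (b) frequency in an arbitrary byte stream standing in for physical data.
from collections import Counter

def eca_step(cells, rule):
    # One step of an elementary CA with cyclic boundary conditions.
    n = len(cells)
    return [(rule >> (4 * cells[i - 1] + 2 * cells[i] + cells[(i + 1) % n])) & 1
            for i in range(n)]

def eca_output(rule, width=11, steps=10):
    cells = [0] * width
    cells[width // 2] = 1              # single black cell as initial condition
    for _ in range(steps):
        cells = eca_step(cells, rule)
    return ''.join(map(str, cells))    # last row as a binary string

def substring_counts(strings, k=4):
    counts = Counter()
    for s in strings:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

# Algorithmic source: the last row of every elementary CA rule.
alg = substring_counts(eca_output(r) for r in range(256))

# "Physical" stand-in: the bits of an ordinary English text.
text = b"the quick brown fox jumps over the lazy dog " * 40
phys = substring_counts([''.join(format(b, '08b') for b in text)])

# Spearman-style rank correlation between the two frequency orderings.
keys = sorted(set(alg) | set(phys))
rank_a = {s: i for i, s in enumerate(sorted(keys, key=lambda s: -alg[s]))}
rank_p = {s: i for i, s in enumerate(sorted(keys, key=lambda s: -phys[s]))}
n = len(keys)
d2 = sum((rank_a[s] - rank_p[s]) ** 2 for s in keys)
rho = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(f"rank correlation over {n} substrings: {rho:.3f}")
```

The actual study uses far larger samples of machines and data, and proper significance tests; this sketch only shows the shape of the comparison.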

    The paper is forthcoming as a book chapter by invitation of Gordana Dodig-Crnkovic, in Gordana Dodig-Crnkovic and Mark Burgin (eds.) Information and Computation by World Scientific, 2010.

    The paper is available online on arXiv.

    If the subject is of interest to you, I invite you to visit our research project’s main webpage regularly, where we are publishing results and updates.

    Evaluating the complexity of a living organism by its algorithmic complexity

    September 26th, 2009

    One of the greatest scientific achievements of the last century was the understanding of life in terms of information. We know today that the information for synthesizing the molecules that allow organisms to survive and replicate is encoded in the DNA. In the cell, DNA is copied to messenger RNA, and triplet codons in the messenger RNA are decoded in the process of translation to synthesize polymers of the natural 20 amino acids.

    Humans have been intrigued by the origin and mechanisms underlying complexity in nature coming from information contained in repositories such as the DNA. Darwin’s theory of evolution suggests that this complexity could evolve by natural selection acting successively on numerous small, heritable modifications.

    Darwin’s theory represents a great leap forward in our understanding of the fundamental processes behind life. However, there is a tendency to assume that evolution is the sole factor shaping nature, while it may not actually be the main driving force behind the complexity of living organisms. [If you wish to know more about the theory of evolution by means of natural selection, three respectable British institutions have set up special websites in celebration of Darwin’s 200th anniversary: the University of Cambridge (with the original scanned text and even an audio version in mp3 format), the Open University and the BBC.]

    Nature seems to use a specific toolkit of body features rather than totally random shapes. Like Lego bricks, Nature assembles its forms from a limited set of elements. For example, despite the variety of living forms on the Earth, they do all seem to have a front-to-back line down the center of the body, and extremities (if any) on the sides, from flies, which have a head at one end and a tail at the other, to worms, snakes and humans. Despite the randomness that, in combinatoric terms, might have undermined any shared regularity among animals, on a certain level, from a certain perspective, we are all similar in shape and features. Why didn’t evolution attempt other, completely different forms? And if it did, why were so few of them successful? Given the improbability of several other shapes having been put into circulation without any of them winning out save the ones we all know, we could conclude that evolution never did attempt such a path, instead keeping to a small pool of tried and tested basic units whose survival has never been in jeopardy. There are some symmetries and general features that many animals share (more than can be explained by inheritance) that are not so easily explained in purely evolutionist terms. A remarkable example is the resemblance of all animals in their embryonic phase.

    Two teams of biologists (Walter Jakob Gehring and colleagues at the University of Basel, Switzerland, and Matthew Scott and Amy Weiner working with Thomas Kaufman at Indiana University, Bloomington) seem to have independently discovered toolkits that Nature appears to use that they have called homeobox containing genes.

    This discovery indicates that organisms use a set of very simple rules passed along to them (thus reducing the amount of randomness involved) to build a wide variety of forms from just a few basic possible body parts. To oversimplify somewhat, one can for instance imagine being able to copy/paste a code segment (the homeobox) and cause a leg to grow in the place where an antenna would normally be in an ant.

    This begins to sound much more like the footprint of computation than like a special feature characterizing life, since it turns out that a few simple rules are responsible for the assembly of complex parts. Moreover, this is consonant with what in Wolfram’s scheme of things is said to be life’s guiding force, viz. computation; with what Chaitin has proposed as an algorithmic approach to life and evolution; and with my own research, which is an attempt to uncover Nature’s hidden algorithmic nature. All the operations involved in the replication process of organisms (replacing, copying, appending, joining, splitting) would seem to suggest the algorithmic nature of the process itself. A computational process.

    Based on my own research interests it is my strong belief that though by no means wrong, Darwin’s theory of evolution belongs within a larger theory of information and computation, according to which life has managed to speed up its rate of change by channeling information efficiently between generations, together with a rich exchange of information with the outside by a process that while seemingly random, is in fact the consequence of interaction with other algorithmic processes.

    Think a bit further about it. Evolution seems deeply connected to biology on Earth, but as part of a larger computational theory it might apply anywhere in the universe, just as the laws of physics do. Evolution may be formulated and explained as a problem of information transmission and channeling: pure communication between two points in time. If you want to gather and transmit information efficiently, biological evolution may turn out to be not the cause but the consequence.

    The theory of algorithmic information (or simply AIT), on the other hand, does not require a random initial configuration (nor, perhaps unfortunately, any divine intervention) for a program, when run, to produce complicated output. This is in keeping with Wolfram’s finding that all over the computational universe there are simple programs with simple inputs generating complex output, which in NKS terms is called ‘intrinsic randomness’ and yet is purely deterministic. Nor does AIT require the introduction of randomness during the computation itself. In other words, it seems that randomness plays no necessary role in producing complex organisms. Evolution seems to underlie change, its pace and direction, but it does not seem to constitute the driving force behind life.

    Evolution seems to be taking advantage of the algorithmic properties of living systems to fabricate new forms of life. To facilitate understanding of these body patterns the University of Utah has set up an illustrative website. Incidentally, this genetic toolkit based on the homeobox concept is surprisingly well captured in the Spore video game.

    In a recent article (Speculations on Biology, Information and Complexity), Greg Chaitin has proposed that some of the properties of DNA, and the accumulation of information in DNA, may be better explained from a software perspective: as a computer program in constant development. When writing software, subroutines are used here and there all the time, and one usually creates an extra module or patch rather than rewrite a subroutine from scratch. This may correspond to what we see in DNA as redundant sections and ‘unused’ sections.

    In Chaitin’s opinion, DNA is essentially a programming language for building an organism and then running that organism. One may therefore be able to characterize the complexity of an organism by measuring the program-size complexity of its DNA. This seems to work well for the length of DNA, since the longest known sequence of DNA belongs to what is certainly the most sophisticated organism on this planet, i.e. homo sapiens.
    Chaitin proposes the following analogy:

    program -> COMPUTER -> output
    DNA -> DEVELOPMENT -> organism
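Chaitin’s program-size idea can be crudely illustrated with a general-purpose compressor standing in for the shortest program: the compressed length of a sequence is an upper bound on its program-size complexity. The use of zlib and the synthetic DNA-like sequences below are illustrative assumptions, not part of Chaitin’s proposal.

```python
# Crude upper bound on program-size (Kolmogorov) complexity via compression.
# zlib is a stand-in for "the shortest program"; real K is uncomputable.
import random
import zlib

def approx_K(seq: str) -> int:
    """Length in bytes of the compressed sequence: an upper bound on K."""
    return len(zlib.compress(seq.encode(), 9))

random.seed(0)
repetitive = "ACGT" * 250                                       # highly structured
shuffled = ''.join(random.choice("ACGT") for _ in range(1000))  # random-looking

print("repetitive:", approx_K(repetitive), "bytes")
print("shuffled:  ", approx_K(shuffled), "bytes")
```

The structured sequence compresses far better than the random-looking one of the same length, which is the intuition behind measuring an organism’s complexity by the program-size complexity of its DNA.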

    However, we encounter problems when attempting to view the process of animal replication in the same algorithmic terms. If, as the sophistication of homo sapiens would suggest, human DNA is the most complex repository of information, and given that DNA represents the shortest encoding capable of reproducing the organism itself, we would expect the replication runtime of human DNA to rank correspondingly against other animals’ replication times. But this is not the case. A gestation period table is available here. So what are we to make of the fact that what seems the right complexity measure for living beings (the logical depth of an object, as a measure of the organizational complexity of a living organism) does not produce the expected gestation times? One would expect the human gestation period to be the longest, but it is not.

    Charles Bennett defined the logical depth of an object as the time required by a universal computer to produce the object from its shortest description, i.e. the decompression time taken by the DNA from the fertilized egg of an animal (seen as a universal computer) to produce another organism of the same type. There seems to be more at stake, however, when trying to apply the concept to Chaitin’s replication analogy– issues ranging from when to determine the end of the replication (the gestation period?), to better times to give birth, to gestation times inherited from ancestral species, to the average size of organisms (elephants and giraffes seem to have the longest periods). Some hypotheses on period differences can be found here for example.
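Bennett’s definition suggests a rough computational proxy: take a near-shortest description of an object (here, hypothetically, its zlib-compressed form) and measure how long it takes to regenerate the object from it. This is only a sketch of the idea, not Bennett’s actual measure, which is defined over universal machines and shortest programs.

```python
# Rough proxy for Bennett's logical depth: time to regenerate an object from
# a near-shortest description. The compressed string stands in for the
# shortest program; this is an illustration, not Bennett's formal definition.
import time
import zlib

def approx_depth(data: bytes, trials: int = 100) -> float:
    """Mean decompression time from the compressed description of `data`."""
    description = zlib.compress(data, 9)   # stand-in for shortest description
    start = time.perf_counter()
    for _ in range(trials):
        zlib.decompress(description)
    return (time.perf_counter() - start) / trials

# A highly structured object: short description, quick regeneration.
print(f"approx. depth: {approx_depth(b'01' * 5000):.2e} s")
```

In this analogy, the fertilized egg plays the role of the decompressor and the gestation period the role of the decompression time, which is exactly where the paragraph above finds the analogy strained.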

    If living organisms can be characterized in algorithmic terms as we think they can, we should be able to introduce all these variables and still get the expected values for the complexity measurement of an organism– seen as a computer program–reproducing another organism from its shortest encoding (the DNA being an approximation of it). A complete model encompassing the theory of evolution has yet to emerge. It seems to be on the horizon of AIT, as another application to biology, one that provides a mathematical explanation of life.

    In summary:

    • So far, what we know is that the DNA is the place where the information for replicating an animal is to be found. What is being proposed above is that the information content of the DNA can actually be effectively approximated, by means of its program-size complexity and logical depth, to define a measure of the complexity of an organism. If one can quantify these values one could, for example, actually quantify an evolutionary step in mathematical terms. This would represent a first step toward encompassing Darwin’s theory of evolution within an algorithmic mathematical theory of life. Evolution is not usually seen as part of a computational theory, but as a special feature of life. The above suggests otherwise.
    • Randomness has hitherto been thought to play a major role in the evolution of species, as it is mutation that drives the evolutionary process. But I suggest that this is not the case. Rather, what appears to be random is actually part of a deterministic computation, which means that randomness plays no significant part in the process, while computation does.
    • Finally, evolution has hitherto been thought of as a process that advances by very small steps, rather than one capable of quickly building on top of blocks of code, as may actually be the case. This new understanding favors the computational view I am putting forward here, because it rests on what in software technology is the practice of a subroutine-oriented programming paradigm: code reuse.

    Just One Universal Algorithm Workshop

    February 18th, 2009

    C A L L   F O R   P A P E R S
    A N D   P A R T I C I P A T I O N
    J O U A L    2 0 0 9    W O R K S H O P
    Just One Universal Algorithm

    Experiments with emergence in computational systems modeling spacetime and nature

    ISTI-CNR, Pisa, Italy, July 10-11, 2009


    Could all the complexity we observe in the physical universe emerge by just iterating a few simple transition rules, and be virtually reproducible by running a few lines of code?

    Could spacetime originate from an information processing mechanism analogous to that of Wolfram’s Elementary Cellular Automata or Conway’s Game of Life? Could it be a Turing machine, or a graph-rewriting system? Or would the choice among alternative models of computation be immaterial, each yielding the same physics and universe?

    Could this fundamental universal algorithm (if any) be discovered just by computer experiments, and by exhaustively mining portions of the computational universe?

    In the last few decades, several scientists (K. Zuse, J. A. Wheeler, R. Feynman, E. Fredkin, S. Wolfram, G. ‘t Hooft, S. Lloyd, J. Schmidhuber, M. Tegmark, to mention a few) have contributed, in a variety of ways and degrees, to creating a positive attitude about the ‘computational universe picture’, in an effort, sometimes called ‘digital physics’, whose interplay with other approaches in theoretical physics — most notably in Quantum Gravity — should still be thoroughly investigated.

    Workshop objectives

    The central questions posed by a computation-oriented view of the physical universe can be, and have been, addressed by a variety of approaches in several disciplines, from mathematics to philosophy. However, the first edition of the JOUAL Workshop is strictly characterized by three attributes: experimental, emergent, simple (…‘but no simpler’). The purpose is to collect computer experiments that attempt to model physical/natural phenomena of any kind, from gravity to quantum fluctuations of empty space, from elementary particles to processes in the biosphere, by the emergent features of very simple computational rules. This includes, for example, evolutionary algorithms, but excludes ad-hoc programs that encode explicit information from the target domain.

    If the ultimate rules of nature are simple, hopefully their illustration can be made simple too: an effort is required from workshop contributors to keep their presentations at a level that could be accessed by researchers from multiple disciplines, and possibly by the interested layman.

    Important dates

    Paper submission: March 31, 2009 (16 pages, PDF)
    Paper acceptance: May 10, 2009
    Final paper due: June 1, 2009


    Please send your PDF file to both email addresses below:


    Submitted papers shall be selected for presentation and publication in the Workshop Proceedings based on adherence to the Workshop theme and on the key attributes mentioned above. Accepted papers will be considered for publication in special issues of the journal Complex Systems and/or Journal of Unconventional Computing.

    Conditional on the quality of the contributions and available support, an effort is planned for the dissemination of the Workshop results, e.g. via Web publication, to stimulate interest and curiosity, in the scientific community and among the general public, about the idea of searching for the (ultimate?) laws of nature by mining the computational universe.

    Program Committee

    Andy Adamatzky, Univ. West England, Bristol, UK
    Vieri Benci, Univ. Pisa, Italy
    Tommaso Bolognesi (coord.), CNR/ISTI, Pisa, Italy
    Cristian S. Calude, Univ. Auckland, NZ
    Leone Fronzoni, Univ. Pisa, Italy
    Fotini Markopoulou, Perimeter Institute, Waterloo, Canada
    Annalisa Marzuoli, Univ. Pavia, Italy
    Emmanuel Sapin, Univ. West England, Bristol, UK
    Jürgen Schmidhuber, IDSIA, Manno-Lugano, Switzerland
    Klaus Sutner, Carnegie Mellon Univ., Pittsburgh, PA, USA
    Matthew Szudzik, Carnegie Mellon Univ., Pittsburgh, PA, USA
    Hector Zenil, Univ. Paris 1, Univ. Lille 1, France (Wolfram Research)

    Physics-like computation, Wolfram’s PCE and Church’s thesis

    January 7th, 2009

    The lack of correspondence between the abstract and the physical world seems sometimes to suggest that there are profound incompatibilities between what can be thought and what actually happens in the real world. One can ask, for example, how often one faces undecidable problems. However, the question of undecidability has been considered to be better formulated (and understood) in computational terms because it is closer to our physical mechanical reality (through the concept of computation developed by Turing): Whether a computing machine enters a certain configuration is, in general, an undecidable question (called the Halting problem). In other words, no machine can predict whether another machine will halt (under the assumption of Church’s thesis –aka Church-Turing thesis).
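The asymmetry behind the halting problem can be felt even in a toy setting: running a computation can confirm that it halts, but no step cap can ever prove that it doesn’t. The sketch below uses the Collatz iteration as a stand-in computation; the choice is purely illustrative and not tied to the text.

```python
# Halting is only semi-decidable: we can confirm halting by running the
# computation, but a step bound never certifies non-halting.
def halts_within(n: int, max_steps: int) -> bool:
    """True if the Collatz iteration from n reaches 1 within max_steps.
    False means only 'not yet observed to halt', never 'never halts'."""
    steps = 0
    while n != 1 and steps < max_steps:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return n == 1

# Raising the cap can turn "unknown" into "halts", never the reverse.
print(halts_within(27, 10))    # cap too low: status still unknown
print(halts_within(27, 200))   # 27 does reach 1 (in 111 steps)
```

Turing’s theorem says no general procedure can replace this open-ended waiting: no machine can decide, for every machine and input, which side of the cap the computation falls on.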

    An interesting example of the gap between what the abstract theory says and what can be empirically ascertained was recently suggested by Klaus Sutner from Carnegie Mellon. He rightly points out that if no concrete instance of a machine with an intermediate Turing degree is known, it is, consonant with Wolfram’s Principle of Computational Equivalence, because intermediate degrees are artificial constructions that do not necessarily correspond to anything in the real physical world.

    In David Deutsch‘s words, physics is at the bottom of everything, and therefore everything relies on physics (ultimately on quantum physics, according to Deutsch himself). This is true for the core objects of study and practice in mathematics (proofs) and in computer science (computer programs). In the end, that they exist and how they exist is determined by what is feasible in the physical world. It is as if we sometimes forgot that mathematics and computation also follow, in practice, the same laws of physics as everything else.

    Sutner defines what he calls “physics-like” computation and concludes that machines with intermediate Turing degrees are artificial constructions unlikely to exist. According to Sutner, in practice machines seem to follow a zero-one law: either they sit at the bottom of the computational power hierarchy (what Wolfram empirically calls “trivial behavior”) or they are at the level of the first Turing degree (i.e. capable of universal computation). This seems to imply, by the way, that what Wolfram identifies as machines of equivalent sophistication can be none other than machines capable of universal computation, strengthening the principle itself (although one also has to assume Church’s thesis; otherwise PCE could be referring to a higher sophistication).

    So is PCE a conflation of Church’s thesis?

    No. Church’s thesis could be wrong and PCE still be true: if Church’s thesis were negated, the upper limit of feasible computational power would simply be shifted further up. And even if the hypothesis of a Turing universe turned out to be false, PCE could still be true regardless of whether or not the universe is digital in nature, since it would then refer to the non-Turing limit as the one holding the maximal sophistication (not that I think C-T is false, though).

    Is PCE tantamount to the Church thesis in the provable sense?

    Wolfram’s PCE would still be falsifiable if the distribution of intermediate degrees were proven to be larger than what PCE informally suggests. So far, however, that has not been the case, and there are nice examples supporting PCE, suggesting that very simple and small non-trivial programs can easily reach computational universality: the recent (weakly) small universal Turing machines discovered by Neary and Woods, and particularly the smallest TM proven universal, by Alex Smith (a 2-state, 3-color machine that Wolfram conjectured to be universal in his NKS book). Nevertheless, PCE could be as hard to prove or disprove as Church’s thesis is. Unlike Church’s thesis, PCE could not be disproved by exhibiting a single negative case, but only by proving that the distribution of machines differs from what PCE suggests. A positive proof, however, may require the verification of infinitely many cases, which is evidently not mechanically feasible (only by negating Church’s thesis itself would one be able to verify all the infinitely many cases).

    I see PCE acting from below while Church’s thesis acts from above, determining a computational limit (known as the Turing limit).


    [Diagram: the PCE and Church’s thesis (C-T) “sandwich effect”]

    The Shortest Universal Turing Machine Implementation Contest

    December 22nd, 2008

    The Shortest Universal Turing Machine Implementation Contest


    December 23, 2008

    Contest Overview

    In the spirit of the busy beaver competition, though related to program-size complexity, we are pleased to announce the “Shortest Universal Turing Machine Implementation Contest”.

    The contest is open-ended and open to anyone. To enter, a competitor must submit a universal machine implementation written in the language specified on the contest web site (C++) with a smaller size value than the latest record published on the web page.

    In order to take part in this competition it is necessary to submit the source code, to be compiled using the compiler program and version specified in the contest web site. It is important that you provide documentation of your code, either in an attached file or as commented text in the source code file.

    Each submitter must agree to be bound by and abide by the rules. Submissions remain the sole property of the submitter(s), but should be released under the GNU General Public License (GPL) so that we may make them available on this web site for downloading and execution.

    Rules (General Rules section)

    Team composition

    Players may enter alone or as teams of any size. Anyone is eligible to enter.

    Hector Zenil (IHPST and LIFL, Paris 1 University and Lille 1 University)
    Jean-Paul Delahaye (LIFL, Lille 1 University)

    The 2008 Midwest NKS Conference: What is computation? How does nature compute?

    August 19th, 2008

    2008 Midwest NKS Conference: Call for Papers and/or Participation


    What is computation? (How) does nature compute?

    2008 Midwest NKS Conference

    Fri Oct 31 – Sun Nov 2, 2008
    Indiana University — Bloomington, IN

    In 1964, in one of the six Messenger lectures he delivered at Cornell University (later published as the book “The Character of Physical Law”), Richard Feynman said: “It always bothers me that, according to the laws as we understand them today, it takes a computing machine an infinite number of logical operations to figure out what goes on in no matter how tiny a region of space, and no matter how tiny a region of time … So I have often made the hypothesis that ultimately physics will not require a mathematical statement, that in the end the machinery will be revealed, and the laws will turn out to be simple, like the chequer board with all its apparent complexities.”

    The topic of the conference has been chosen with this quote in mind. The conference will host a most distinguished group of scientists holding different views of a computable universe, from those supporting the thesis that Nature performs (only) digital computation, and does so up to a maximal level, to those supporting the thesis of nature as a quantum computer. Some strongly suggest, however, that the true nature of Nature can only be explained by the study of randomness. Randomness, though, preserves its mysterious reputation: to some of these authors it seems that randomness can be generated deterministically in the classical sense, while others claim that the “true” randomness arising from the principles underlying quantum mechanics is necessary to explain the complexity seen around us. This event will be the place of confluence where all these views are presented, discussed and analyzed by the guests and the conference participants themselves. After presenting their views during the first three days of the conference, the keynote speakers will participate in a round table discussion on the topic.

    Invited speakers:

    • * Charles Bennett (IBM Research)
    • William Bialek (Princeton University)
    • Cristian Calude (University of Auckland)
    • Gregory Chaitin (IBM Research)
    • David Deutsch (Oxford University, via videoconference)
    • Edward Fredkin (Carnegie Mellon University)
    • Tony Leggett (University of Illinois)
    • Seth Lloyd (Massachusetts Institute of Technology)
    • Stephen Wolfram (Wolfram Research)
    • * Leonid Levin (Boston University)

    * to be confirmed

    Round table moderators:

    • James Gleick (author of Chaos, Genius and Isaac Newton)
    • Gerardo Ortiz (Indiana University Bloomington)
    • Hector Zenil (Univ. of Paris 1 and Univ of Lille 1)

    Conference topics:
    Topics of interest for submissions include (but are not limited to):

    – Pure NKS projects
    – The physics of computation
    – Computational physics
    – Foundations of computation
    – Universality and Irreducibility
    – Classical (digital) and quantum computation
    – Algorithmic information theory

    Submitters are encouraged to relate the above topics to the conference title (What is computation? (How) does nature compute?) and to the points of intersection between classical computation, quantum computation, algorithmic information theory, and the principle of computational equivalence.

    Organizing committee:

    Adrian German (Indiana University Bloomington)
    Gerardo Ortiz (Indiana University Bloomington)
    Hector Zenil (Univ. Paris 1 and Univ. of Lille 1)

    Misguided research on Artificial Intelligence

    August 7th, 2008


    The field of strong AI, sometimes referred to as artificial general intelligence or AGI, is defined as the study of intelligence capable of the same intellectual functions as a human being, including self-awareness and consciousness. I will call the assumed object of study of AGI simply agi, in lowercase, not only to avoid confusion with the current field of research, AGI, but also to differentiate agi from what is known as AI, just as AGI is meant to identify a field apart from AI.

    I do think that it is techniques similar to those that led to the discovery of the Game of Life (see the previous post) that will eventually produce artificial general intelligence, or agi. I think the study of the Game of Life is to the Game of Life what the fields of AI and AGI are to ai and agi respectively. In other words, there are no real fields of AI or AGI apart from adopting a suitable approach to the systems we already see around us. Simple systems such as CAs are likely capable of achieving agi of the kind we expect, without our having to do anything else! From my point of view it is just a matter of technological sophistication, of providing the necessary elements and interfaces with the real world, and not of theoretical foundations or the invention of a new mathematical or conceptual framework. If the former is what AGI is focusing on, I think it is headed in the right direction, but to think of the AGI field as building agi from the bottom up would be a mistake of the same kind as thinking of the Game of Life as designed or engineered. I find the terms usually used in the AGI field, “agi design” and “designing agi,” discouraging.

    Perhaps an engineering-based approach will succeed first. Whatever the case, it is a very exciting prospect which I look forward to, and I hope to make a contribution from my own field. But that doesn’t change the fact that in the end, just as with the Game of Life, agi will have been engineered and successful because it was already there, rather than because we were clever enough to reinvent intelligence, as we have reinvented flying by building airplanes. I think current AI does for intelligence what airplanes did for flying. But we don’t mind how we fly, while we do mind how machines think or don’t think (at least as a matter of research for AGI), so the common airplane analogy no longer works here. If we haven’t been able to create intelligence of the kind seen in humans, it is because we are still looking to build airplanes. While it is unobjectionable for airplanes to keep flying as they do, flying unlike the way birds fly, current approaches from AI and AGI to agi will just keep producing airplane-type solutions to a non-airplane problem. That does not imply (as the negation of the airplane analogy might suggest) that we have to come up with agi by imitating human intelligence step by step; that’s another matter.

    I don’t see why intelligence would need to be different in nature from the potential behavior of all the sophisticated systems that are around us, why it should differ from them as airplanes differ from birds. Unfortunately all current research on AI and AGI seems to have a stake in making just such a distinction. While research on complexity theory and complex systems is faring better, it still seems to me to have missed the real problem when it comes to agi.

    Systems such as the Game of Life and Rule 110 (capable of universal computation) potentially contain all the primary ingredients needed for agi, disregarding matters of computational efficiency. I think it is just a matter of time before we are able to provide current systems, which already have the potential power of general intelligence, with the means to reach that power. Perhaps it will turn out to be an engineered approach, just as the Game of Life turned out to be capable of universal computation. But even where the authors of these approaches have the strong impression that they are designing agi, they will end up with just another sufficiently powerful general system capable of agi, one so ubiquitous that it is more in the nature of a discovery than an invention.

    “No one has tried to make a thinking machine. The bottom line is that we really haven’t progressed too far toward a truly intelligent machine. We have collections of dumb specialists in small domains; the true majesty of general intelligence still awaits our attack. We have got to get back to the deepest questions of AI and general intelligence and quit wasting time on little projects that don’t contribute to the main goal.”
    — Marvin Minsky (interviewed in Hal’s Legacy, Edited by David Stork, 2000)

    The Game of Life: A notable discovery, not an invention

    August 7th, 2008

    John Conway giving a talk on knots


    It is well known that cellular automata (CA) can evolve into a few specific types of behavior, one of which is precisely the type discovered by John Conway in his Game of Life. I happen to think it is something of a misnomer to speak of John Conway as having “designed” the Game of Life. Working from an initial intuition (using it as a kind of filter), it is quite possible to find a CA of an equivalent type. The procedure is very simple. If you limit the space of all possible 2D CAs to those that are clearly non-trivial (avoiding periodic and fractal evolutions) and apply a simple filter (based on compression, topology parameters or other known techniques), you end up with the two basic classes identified and studied before by Stephen Wolfram as complex and random behavior (in the end, a single class, according to what he calls the Principle of Computational Equivalence, or PCE). The probability that a complex 2D CA will behave like the Game of Life, or another CA of equivalent sophistication, is very high after the filtering process. As has been demonstrated, many of these interesting but not uncommon 1D, 2D or nD CAs share some fundamental properties, reaching a maximal degree of sophistication (namely complexity/universality).
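    The rule in question is simple enough to run in a few lines, which is part of why exploring this space is so easy. As a minimal illustrative sketch (mine, not part of the original post), here is one update step of Conway’s rule (a live cell survives with 2 or 3 live neighbors; a dead cell becomes alive with exactly 3) in Python, over a set-of-live-cells representation:

```python
# Minimal sketch of one step of Conway's Game of Life.
# A configuration is a set of (x, y) coordinates of live cells.

from collections import Counter

def life_step(cells):
    """Apply one synchronous update of the Game of Life rule."""
    # Count, for every candidate cell, how many live neighbors it has.
    neighbor_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is alive in the next step if it has 3 live neighbors,
    # or if it is currently alive and has 2 live neighbors.
    return {
        cell
        for cell, n in neighbor_counts.items()
        if n == 3 or (n == 2 and cell in cells)
    }

# A "blinker": three cells in a row oscillate with period 2.
blinker = {(0, 0), (1, 0), (2, 0)}
```

    Iterating life_step on candidate initial configurations, and on candidate 2D rules written the same way, is all the machinery the filtering procedure described above requires.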

    All this notwithstanding, it is Conway’s great achievement to have drawn attention to this particular 2D CA and to have analyzed it in such detail. Another example of such a CA is the elementary cellular automaton Rule 110. What’s special about Rule 110 is not that it has been extensively studied by Wolfram and eventually proven universal by him and his assistant M. Cook, though this is no doubt significant. What is truly noteworthy is that Rule 110 shows that universality can be reached even by very simple systems, leading one to expect that all manner of interesting/complex CAs are likewise universal.

    Almost every serious attempt to prove that a given arbitrary interesting CA is universal has been successful. Examples range from Conway’s Game of Life to Wolfram’s Rule 110 to Ed Fredkin’s Salt Model and, to cite the most recent, Maurice Margenstern and Yu Song’s universal cellular automaton on the ternary heptagrid (paper accepted at the 2nd Workshop on Reachability Problems).

    Rule 110, just like the Game of Life, was discovered, not built. These discoveries are the product of an exhaustive (‘random’) search over the whole, or a part, of the space of possible rules, which doesn’t mean that they are random discoveries. One can of course mistake such a discovery for an invention, for in low-dimensional CA spaces it is very easy to explore the space of possible rules by changing the rules themselves and letting each system evolve through a few initial steps in order to observe the process of evolution. This may make it seem as if one has designed the system rather than simply explored a part of the space, but such is not the case. And even if it were, it is, strictly speaking, misguided to think in terms of designing rather than exploring, for the reason given above, viz. that all complex systems have the potential to behave with a sophistication equivalent to that of the Game of Life.

    Furthermore, by what Stephen Wolfram calls “irreducibility”, one cannot predict (and this applies particularly to complex CAs such as the Game of Life) how a CA rule will evolve without actually running the CA (and once it is proven universal, most properties of the CA become undecidable). This explains why the study of CAs has been so successful lately, attracting many minds: we now have the tools (computers) to let these systems run so that we can observe how they develop. The idea of irreducible systems is very simple and powerful at the same time. Basically it holds that one cannot shortcut any of these interesting complex CAs, which means that there was no way for Conway to test his CA other than by running it. Which in turn means that if it had not behaved in the right way, he would have had to change the rule of the CA and test it again; and even if he had had the chance to find this particular CA at the first shot, he would have had to let it run for some steps before being sure of what he had just made. This doesn’t mean that he couldn’t have figured out some of the basic properties of the rule he was working with beforehand, but it does mean that he could never have been certain of his conclusions save by actually running the CA itself. This has been the case with other rules as well, such as Rule 30, which no one has succeeded in cracking by coming up with a function or formula that shortcuts the evolution of the rule, i.e. that provides the value at an arbitrary step without calculating the previous steps (without running the CA up to the step in question). Even if Rule 30 were crackable, computational complexity suggests that most of these systems are irreducible in several respects, from the undecidability of the properties of a particular CA to the incapacity of our systems (including ourselves) to compress the information they generate.

    Applying Wolfram’s PCE, a system with a maximal degree of computational sophistication has the same computational power as another system of maximal degree, so one cannot expect any one of them to figure out what another is doing without emulating it step by step.
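    The practical face of irreducibility can be made concrete with a small sketch (my own illustration, using the standard elementary-CA rule numbering, not code from the post): to obtain the nth value of Rule 30’s center column, the code below has no recourse but to compute all n previous rows, one step at a time.

```python
# Sketch: running an elementary cellular automaton (e.g. Rule 30)
# from a single black cell. Absent a known shortcut formula, the
# center column is obtained only by computing every previous row.

def step(row, rule):
    """One synchronous update; the row grows by one cell on each side."""
    padded = [0, 0] + row + [0, 0]
    # The new value of a cell is the bit of `rule` indexed by the
    # 3-bit neighborhood (left, center, right) read as a number.
    return [
        (rule >> (4 * padded[i] + 2 * padded[i + 1] + padded[i + 2])) & 1
        for i in range(len(padded) - 2)
    ]

def center_column(rule, steps):
    """First `steps` values of the center column, starting from a single 1."""
    row = [1]
    values = []
    for _ in range(steps):
        values.append(row[len(row) // 2])  # row length stays odd, so
        row = step(row, rule)              # the middle cell is the center
    return values
```

    The same step function runs Rule 110, the Game of Life’s 1D cousin in sophistication, or any other elementary rule, simply by changing the rule number.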

    This doesn’t mean that Conway does not deserve credit for the sheer cleverness of the Game of Life and for the discovery of this interesting evolution of a 2D CA. But the Game of Life has become so popular and interesting not because it is rare, but because it has been so extensively studied as a particular instance of an interesting complex 2D CA. Furthermore, there is the fact that it involves some basic game theoretical principles pertaining to how life actually develops.

    I met John Conway three weeks ago (and took the photo that goes along with this text), and I probably should have brought up the subject with him. As far as I remember, however, Stephen Wolfram himself once asked him how he came up with the Game of Life, but unfortunately he seemed to say that he was no longer certain…

    Anima Ex Machina

    May 27th, 2008