New Ideas

Calculating a Universal Distribution to Approximate Kolmogorov-Chaitin Complexity

Posted in Algorithmic information theory, Complexity, Computability, Universality and Unsolvability, Computer Science, Foundations of Computation, New Ideas on December 12th, 2012 by Hector Zenil – Be the first to comment

Computing the incomputable has always been a challenge. For example, in finding the busiest Turing machines (Rado) given a number of symbols and states (whimsically called busy beavers). This means either finding Turing machines that, starting from an empty input, produce more non-blank symbols in their output tapes before halting than any other Turing machine of the same size, or Turing machines that, also starting from an empty input, have the greatest runtime before halting than any other Turing machine of the same size. Both problems are ultimately undecidable because of the Turing-complete capabilities of Turing machines, as proven by Alan Turing himself (that is, the capability of some Turing machines to simulate any other Turing machine).

In this new paper we describe how we have managed to calculate an approximation of a so-called Universal Distribution (aka Levin’s semi-measure) which connects the frequency of production of a string to its Kolmogorov complexity (K). The chief advantage of calculating an approximation of the Universal Distribution is that it is an incremental process over an average of a large number of Turing machines. One doesn’t get rid of the constant from the invariance theorem in the theory of algorithmic information theory (for example when Kolmogorov complexity is measured using 2 different universal Turing machines), yet one seems to have to make fewer arbitrary decisions.

One of the main advantages is that one can better deal with strings of very short lengths. Think about it! If one wished to approximate K for a single bit by using compression algorithms, the lossless compression algorithm would not be able to compress the single bit any further. And this not only happens for a single bit but for all strings up to a certain minimal length for which lossless compression algorithms are simply unsuitable (recall that a compression algorithm includes the decompression instructions together with the data in the new compressed object in order to make it self-decompressible).

The usefulness of lossless compression algorithms as a method for approximating K derives from the fact that compression is a sufficient test of non-randomness. This is because K is, more precisely than an uncomputable function, upper semi-computable, meaning that one can estimate upper bounds. The lossless compressed length of an object s (e.g. a string) is therefore an upper bound on K(s). The usefulness of the Coding Theorem Method (the theorem presented in this paper) will ultimately come down to whether it is useful in applications, which is the main motivation behind its conception, given the failure of compression algorithms to provide information about the possible K(s) values for s that are too short (shorter, say, than the typical size of the length of the instructions that a lossless compression algorithm has to add to the compressed object).

The state of the art of the Coding Theorem Method can be gleaned from this paper, recently made public by my colleagues and I, and announcing the release of the calculation of a universal distribution based on (5,2), that is, all Turing Machines with 5 states and 2 symbols: Calculating Kolmogorov Complexity from the Output Frequency Distributions of Small Turing Machines

It represents a major improvement over this previous calculation of mine, that required new and more powerful techniques to deal with a semi-computable distribution. It improves our previous work in terms both of accuracy and coverage of number of short strings and validates previous calculations of universal distributions, showing the incremental nature of the method to be fully compatible with the other calculated universal distributions with smaller samples of small Turing machines (and for which the known Busy Beaver values could be used).

In this other paper we explain why our approximations of K are real-number values, showing that strict integer-value program size follows our Coding Theorem Method, and thus that ours constitutes a more fine-grained measure. It is also shown that logical depth departs from both strict program-size and the Coding Theorem Method evaluations, being in agreement with the theory for all these 3 measures. The paper is available online at: Correspondence and Independence of Numerical Evaluations of Algorithmic Information Measures

In the next post I will be announcing and also briefly explaining the results from another paper showing that not only can our Coding Theorem deal with short strings, but that we have found a way to validate the method by lossless compressibility. Moreover, we have found that in the transition period, where the Coding Theorem Method starts to be too expensive to be of practical use whereas the compression method starts to provide some results, the 2 methods are in great agreement with each other. Like an expensive microscope of great power (e.g. the LHC at CERN), our Coding Theorem Method requires an incredible amount of calculation. The trick is to know when to use a microscope–to know when a microscope is required rather than a telescope. We believe we are providing the tools to deal with the information content of the smallest entities in the computational universe.

Meaningful Math Proofs and “Math is not Calculation”

Posted in Foundations of Math, General, New Ideas, Recreation on August 5th, 2012 by Hector Zenil – Be the first to comment

Mathematicians are generally thought to be very good at calculation, and they sometimes are, but this is not because math is about calculation, just as astronomy is not about telescopes and computer science is not about computers (paraphrasing Edsger Dijkstra).

But if it is not preeminently about calculation, then what is mathematics about? The common answer, given by mathematicians themselves, is that it is about solving problems–learning to do so, and to think in a logical fashion. I think this may also be misguiding. Let’s take as an example Fermat’s last theorem (which may also be misguiding, as some math is not only about historical unresolved questions with little impact, in some sense, Fermat’s Last Theorem is not much more than a curiosity).

As is well-known, Fermat’s last theorem was solved by Andrew Wiles in 1994. The solution wasn’t simple at all, and it required a lot of convoluted math (Wiles himself got it wrong first and a few years later fixed it). The solution required much more machinery than the machinery required to formulate Fermat’s original statement, it is like going after a fly with a bazooka. For some mathematicians concerned with the foundations of math, Wiles’ theorem may not be as satisfactory they would have wished, even though as a piece of mathematical work connecting several areas of advanced math.

Wiles’ exact solution couldn’t have been arrived at earlier, as it draws upon some of the most sophisticated areas of modern math, it transfers a pure problem of arithmetic to a very powerful framework founded in algebraic geometry and analysis (elliptic curves, modular forms). If Fermat had a solution it wouldn’t have been at all similar to Wiles’. In fact, even before Wiles’ proof, no mathematician ever thought that Fermat’s last theorem was actually false. The consensus was that the theorem was true, so the long expected proof was supposed to shed light on the reason behind this truth rather than simply confirm it. Wiles’ great achievement only confirmed that with a very sophisticated set of tools one could prove Fermat’s last theorem, but it didn’t shed any light on why it was true (if it does, it is still hard to justify such a tradeoff in which to shed some light in arithmetic one requires the use of set theory).

Wiles’ solution doesn’t answer the question of whether Fermat himself had a more basic (but no less valid) proof. The most likely case is that Fermat’s last theorem is actually independent of number theory, and requires the full power of set theory. Hence Fermat very likely had no such solution, contrary to what he claimed.

Fermat’s theorem is, however, a rare case of the ultimate great challenge, and not part of the common practise of the “average” mathematician. The work of an “average” mathematician depends very much on its field. If the field is concerned with foundations it may involve a lot of set theory and logic, which in large part involves finding sets of formulae to prove other sets of formulae and looking for relativisations. In algebra or group theory, or even topology, their work may have to do with all sorts of symmetries or ways to characterise things in different ways in order to establish new and unanticipated connections, shedding more light on one area in terms of another, from a different point of view (and employing a different language). This connection is in fact the greatest achievement of Wiles’ proof of Fermat’s last theorem, as it connects several subdisciplines of modern mathematics building sophisticated intuitions of true statements, ultimately connected to Fermat’s last theorem.

I think math can be better described as a discipline of patterns (e.g. mappings, homeomorphisms, isomorphisms). Patterns are integrally related to the discipline I value most: algorithmic information theory, which is the ultimate field of the study of patterns, patterns in strings and objects, leading to applications such as compression algorithms.

One of the consequences of this change of mindset is that it will allow students access to computers. Computers and humans together can achieve an impressive number of things; they do so everyday in all sorts of fields, from car design to the development of new chips for the next generation of computers. In fact, airplanes today are mostly driven by computers, no pilot or set of pilots alone would be able to fly a modern airplane these days, and more than 60 to 70% (in some markets even 90%) of the operations in any top stock market are computer made, with no human intervention. Today’s industry wouldn’t be conceivable without the combined forces of human creativity and motivation, and the computer’s speed, accuracy and productivity.

There is an important new trend spearheaded by the UK initiative Computer-Based Math
promoting these very ideas. Conrad Wolfram has recently explained in a blog post how computer programming could not only help to teach math, but how they could merge into a single discipline. I may have a chance to talk about all this at the House of Lords soon, where the education shift would need to take official form.

An alternative method (to compression) for approximating the algorithmic complexity of strings

Posted in Algorithmic information theory, Foundations of Computation, New Ideas on July 20th, 2011 by Hector Zenil – Be the first to comment

The method introduced in my doctoral dissertation was featured in the French version of Scientific American Pour La Science in its July 2011 issue No. 405 under the title Le défi des faibles complexités.

Jean-Paul Delahaye points out that:

Comme les très petites durées ou longueurs, les faibles complexités sont délicates à évaluer. Paradoxalement, les méthodes d’évaluation demandent des calculs colossaux.
(Like long durations or very short lengths, weak complexities are tricky to evaluate. Paradoxically, the evaluation methods require colossal calculations.)

and he continues:

Pour les petites séquences, cette mesure est stable et conforme à notre idée de la complexité; pour les grandes, elle est, d’après le théorème mentionné [the invariance theorem, my comment], conforme à la meilleure mesure de complexité unanimement admise, la complexité de Kolmogorov. Que demander de plus?
(For short strings, this measure is stable and conforms to our idea of complexity; for long strings, according to the aforementioned theorem [the invariance theorem-- my comment], it conforms to the best and universally accepted measure of complexity, Kolmogorov complexity. What more can one ask for?)

Imagine you are given 2 short strings and are asked to tell which one looks more random and which more structured. The strings are 0101 and 1011. Applying the idea of algorithmic complexity, the shorter the description the less random-looking. It would seem that the first string has a repeating pattern that can be taken advantage of in describing it. In plain English one may therefore say that the first string can be described as “zero and one twice” while the second one would require a longer description. In fact, there seem to be fewer descriptions for the first string than for the second (and also notice some ambiguity in plain English). The first may also allow the description “A zero followed by a one followed by a zero followed by a one” or “zero and one followed by the same” and perhaps other variations. Descriptions of the second one can include “one and zero followed by two ones” or “one zero one one,” just as the first one could simply be “zero one zero one,” which doesn’t look like a compressed version of the string but rather a straight translation into an expanded form of plain English.

All this by way of asking whether any of the two strings is without a doubt simpler than the other, or whether the apparent repetition in the first string makes us think that the string has a pattern despite the pattern being repeated only twice. Perhaps when one looks at such a string one gets the impression that it may belong to a larger string comprising alternations of 01 and one concludes that it is simpler than the second one. To leave the subjective realm, one would need to evaluate the algorithmic complexity of the strings and compare their respective values. The algorithmic complexity of a string is the shortest program (measured in bits) producing th string in question running on a universal Turing machine. It is inconvenient that there is no algorithm that, given a string, gives you the length of the shortest program that produces it. This is by reduction to the halting problem. Which means one cannot really measure with absolute certainty the algorithmic complexity of a string because it is uncomputable. It doesn’t mean, however, that one cannot approach it; one can often do so in an effective and useful way.

Traditionally, the way to approach the algorithmic complexity of a string was by using lossless compression algorithms. Lossless means that one can recover the original string from the compressed version. The use of lossless algorithms is preferable because there may be inconsistent ways of compressing strings that give the impression of compressing a string without corresponding to a measure of complexity (e.g., noise deletion is an effective way, but noise may be algorithmically random and the aim of algorithmic complexity is to precisely measure how random something is and not merely to gauge whether it looks random). The result of a compression algorithm is an upper bound of its algorithmic complexity. While one cannot ever tell when a string is not compressible, if one succeeds at somehow shortening a string one can tell that its algorithmic complexity cannot be larger than the compressed length.

One does not want to cheat and claim that one can compress any string into a bit if the decompression algorithm interprets that bit into the desired string. A fair compression algorithm can be defined as one that transforms a string into two pieces: one is the compressed version and the other the instructions to decompress the string, together accounting for the final length of the compressed version. In other words, it would seem as if you were adding the decompression algorithm to the compressed string so that the compressed string comes with its own decompression instructions. In the long run, there is a theorem (invariance) that guarantees that complexity values converge.

But for short strings (which are often the ones useful for practical applications), adding the decompression instructions to the compressed version makes the compressed string often, if not always, longer than the string itself. If the string is, for example, shorter than the size of the decompression algorithm, there will be no way to compress the string into something shorter still, simply because the decompression instructions are at least of the length of the original string. Moreover, the result is so dependent on the size of the decompression algorithm (because it is the greatest contributor to the overall length) that the final value of the compressed length is too unstable under different lossless compression/decompression algorithms.

For example, if one tries to compress a short string using Mathematica, one gets the following results:
StringLength@Compress["0101"] = 30

Looking at the beginning of the compression line when plotting the lengths of strings (x axis) against their compressed lengths (y axis) one observes that it does not start at y=0 even when x=0.

This means that compressing the string 0101 with 4 bits took 46 characters (even more in bits). In Mathematica, strings begin to be compressed this way at about length 30. This is not a malfunction of Mathematica; it is the result of what I have explained. The Mathematica Compress function is actually based on the Deflate lossless compression algorithm, which is a combination of the LZ77 algorithm and Huffman coding, among the most popular lossless compression algorithms available, used in formats like ZIP, GZIP, GIF and PNG (these last two are therefore lossless compression image formats).

Zooming into the axis origin of the plot one can see that the beginning looks unstable. The string in question is a repetition of n 1s with n lying on the x axis (which represents different compression lengths rather than individual ones to avoid repetition and a step-like curve)

If one tries to compress 1011 one gets none other than the same value:
StringLength@Compress["1011"] = 30

The instructions obviously take up some space in the final compressed length and they cannot be compressed themselves (if they were, they would in any case be the same for all strings, taking us back to the same situation). There is a limit for compression algorithms to compress short strings. So if one wished to tell which of these two strings were objectively more or less randomly complex by approximating their algorithmic complexity using a compression algorithm, it turns out that there is no way to do so.

On the other hand, given that the definition of algorithmic complexity based on compressibility says that the less compressible the more randomly complex a string, one could immediately say that a single bit, 0 or 1, is random for certain, i.e. has maximal algorithmic complexity, given that there is no way to further compress a single bit. In other words, there is no program shorter than 1 bit that can produce 0 or 1. The shortest descriptions of 0 and 1 are therefore 0 and 1 themselves. Hence 0 and 1, according to the compressibility approach, are random strings. It may seem to make no sense to characterize a single bit as random.

On the one hand, a single bit does not carry any information and on these grounds one may think of it as somehow random. If one thinks about whether one would have been able to predict 0 or 1 as the outcome of a process, given that there is no context because they occur alone, one may also conclude that their occurrence is somehow random. In other words, if you see a string like 010101 you may bet that the next bit is 0, but if you are provided with nothing there is no way you can favor any position on whether the bit to come is 0 or 1. So much for justifying that a single bit is random.

It is hard, however, to justify how 0 could look more random than, say, any other possible string. If 0 is random how is it relatively more complex than 00? Or 01? Intuition tells us that short strings shouldn’t be that random (more random than, for example, longer random-looking strings), so if a single bit is the most random among all finite strings, how could it be that there is such a phase transition from maximal random complexity to very low complexity of, say, strings of length 2, 3 or 5 bits long?

Since intuition tells us that something random should also be uncommon and rare, what if one asks how common 0 or 1 are as results of a program? There is a measure that gives the probability of a program’s producing a given string running on a universal Turing machine. This is a measure we used to present a new method to evaluate the complexity of strings, as an alternative to the traditional use of compression algorithms. The new method aims particularly to solve the problem of the evaluation of the complexity of short strings, as we’ve discussed. It is based on Levin-Solomonoff’s algorithmic probability and is connected back to algorithmic (Kolmogorov-Chaitin) complexity by way of Chaitin-Levin’s coding theorem.

Algorithmic probability says that it is not the case that a single bit is the most complex random string, but actually the most structured possible one and, more importantly , that the complexity transition is smooth, more in accordance with intuition.

It may be that it makes sense that a single bit can be regarded as both the most simple and the most complex of strings from different perspectives, and the advantage of the algorithmic probability approach is that it provides not only a different notion of the complexity of a single bit, one that is in keeping with intuition, but also that it generates a different outcome to the compressibility approach, even when the two measures are intimately related and asymptomatically produce the same results in the long term (for longer strings). I think the two views reflect different aspects of what a single bit represents.

The paper presenting the novel method for evaluating the algorithmic complexity of short strings was first proposed and sketched in Greg Chaitin’s 60th anniversary festschrift edited by Cris Calude (J-P. Delahaye & H. Zenil, “On the Kolmogorov-Chaitin complexity for short sequences,” Randomness and Complexity: From Leibniz to Chaitin, edited by C.S. Calude, World Scientific, 2007). The method uses an exhaustive and systematic search of Turing machines inspired by Wolfram’s NKS dictum, from which a frequency distribution of the halting machines is built and the Levin-Chaitin coding theorem applied to evaluate the algorithmic complexity of binary strings.

Chaitin pointed out (regarding our method) that:

…the dreaded theoretical hole in the foundations of algorithmic complexity turns out, in practice, not to be as serious as was previously assumed.

The full technical article is available in ArXiv Numerical Evaluation of Algorithmic Complexity for Short Strings: A Glance into the Innermost Structure of Randomness.

You can also look at the slides of the presentation I delivered at the Alan Turing amphitheater at the Computer Science Department of the University of Lille 1:

“The World is Either Algorithmic or Mostly Random” awarded a 3rd Place Prize in this year’s FQXi contest

Posted in Algorithmic information theory, Complexity, Foundations of Computation, Foundations of Physics, General, New Ideas on June 10th, 2011 by Hector Zenil – 2 Comments

Based on the combined ratings of the contest community and the panel of expert reviewers appointed by the FXQi, which included the members of the institute, I was awarded a 3rd Place Prize for my work The World is Either Algorithmic or Mostly Random in this year’s FQXi contest on the topic Is Reality Digital or Analog? sponsored by the Foundational Questions Institute. The winners were announced at this year’s World Science Festival in New York City.

My work can be summarized in one line as an explanation of the complexification process of the world, the process whereby we have evolved from a more primitive (random) state to the current organized state of the universe.

The essay is a summary of the philosophical branch of my current scientific research on finite algorithmic information theory. This philosophical branch is concerned with the exploration of the possible connections between algorithmic complexity and the physical world (or what happens in it). I propose the notion that the universe is likely digital, not as a claim about what the universe is ultimately made of but rather about the way it unfolds. Central to the argument are concepts of symmetry breaking and algorithmic probability, which are used as tools to compare the way patterns are distributed in our world to the way patterns are distributed in a simulated digital one. These concepts provide a framework for a discussion of the informational nature of reality. I argue that if the universe were analog, then the world would likely look more random, making it largely incomprehensible. The digital model has, however, an inherent beauty in its imposition of an upper limit and in the convergence in computational power to a maximal level of sophistication. Even if deterministic, that the world is digital doesn’t necessarily mean that the world is trivial or predictable, but rather that it is built up from operations that at the lowest scale are simple but that at a higher scale look complex and even random–though in appearance only.

How have we come from the early state of the universe (left) to the structures we find today (right)?

The arguments supporting my views are partially based on the findings of my research, epitomized by our most recent paper Numerical Evaluation of Algorithmic Complexity for Short Strings: A Glance into the Innermost Structure of Randomness available in ArXiv in which my co-author and I describe a method that combines several theoretical and experimental results to numerically approximate the algorithmic (Kolmogorov-Chaitin) complexity of bitstrings by using the concept of algorithmic probability, which is connected to algorithmic complexity by way of the (Levin-Chaitin) coding theorem.

An extended (and detailed) version of The World is Either Algorithmic or Mostly Random is forthcoming and will be eventually posted.

Compression-based Investigation of Cellular Automata, A Phase Transition Coefficient and a Conjecture Related to Universal Computation

Posted in Algorithmic information theory, Computability, Universality and Unsolvability, Computer Science, Foundations of Computation, New Ideas on August 22nd, 2010 by Hector Zenil – 3 Comments

In my new article Compression-based investigation of the dynamical properties of cellular automata and other systems, published in the Journal of Complex Systems (19:1), pages 1-28, I present a method for studying the qualitative behavior of cellular automata and other abstract computing machines based on the approximation of their program-size complexity using a general lossless compression algorithm. I show that the compression-based approach classifies cellular automata (CA) into clusters according to their heuristic behavior, with these clusters showing a correspondence with Wolfram’s main classes of systemic behavior. I also present a Gray code-based numbering scheme for initial conditions optimal for this kind of investigation, and a compression based method for estimating a characteristic exponent in the form of a phase transition coefficient measuring the resiliency or sensitivity of a system to its initial conditions. And I conjecture that universal systems have large transition coefficients.

I think this constitutes a novel framework for investigating the dynamical properties of cellular automata and other systems. Here I will discuss some of the main ideas and implications of the paper. A pdf version of the paper is available online on ArXiv

Algorithmic complexity: classification into Wolfram’s four classes

In A New Kind of Science and in several papers dating from the mid-1980s, Stephen Wolfram defined four classes into which cellular automata (CA) and several other systems evolve, each capturing a different qualitative behavior which evolves from the same ordered initial configuration (the simplest black cell).

  • Class 1 consists of CA whose dynamics reach a steady state regardless of the initial conditions.
  • Class 2 consists of CA whose long-term evolution produces periodic or nested structures.

These first two classes are simple, in the sense that their long-term evolution can be deduced from running the system for a small number of steps.

  • Class 3 CA produce structures that seem random.
  • Class 4 CA produce localized structures on a random looking background, and hence are complex looking.

Wolfram’s classification is heuristic, and the assignment of CA to the four classes is somewhat subjective. To the best of my knowledge, there is, to date, no universally agreed upon classification scheme for CA. I think, however, that there is no better approach than a pure program-size complexity measure, which is the approach I follow in this paper. I propose to continue the investigation later, using the same measure to discover other interesting properties and possible hierarchies.

An interesting question related to Wolfram’s classification concerns its dependence on the initial condition–chiefly because the classification was originally meant to be constructed by visual inspection over the evolution of a CA, and as we know, the evolution of a CA depends on its initial condition. This has been a major critique (Eppstein) of Wolfram’s classification, because the expectation is that the classification should be based on the evolution from an unordered (random) configuration.

Nevertheless, the classification is actually based on the potential of a CA to evolve into any of the possible behaviors from at least one initial configuration (the question is of course not answerable in finite time, since there is an infinite number of possible initial configurations). Wolfram’s classification may therefore be seen as being dependent on the initial condition of a CA.

It is not a surprise that one can, for example, construct a CA belonging to more than one of Wolfram’s four classes when starting from different initial configurations. Rule 110 belongs to class 4 because it is capable of universal computation–one can set up an initial configuration to ‘program’ rule 110 to carry out any computation (this being the very basic concept of a programmable computer).

For every CA rule there is a definite (but in general undecidable) answer to the question whether or not it is capable of universal computation (or in reachability terms, whether a CA will evolve into a certain configuration). The question only makes sense if the evolution of a CA depends on its initial configuration. No rule can be universal that fixes the initial configuration once and for all (there would be no way to input an instruction and carry out an arbitrary computation).

On the other hand, some rules, such as Rule 0, don’t produce different configurations relative to variant initial configurations. No matter how you may change the initial condition, there is no way to make it compute something other than what it actually computes for every other initial configuration.

A possible objection (made by David Eppstein) is that there are CAs that can be made to look as if they belonged to all classes by modifying their initial conditions. Which is true: a CA may belong to a certain class until, given another initial configuration, it is made to behave as if it belonged to another class.

My compression-based approach shows that Wolfram’s heuristic classification can actually be quantified by a measure which is clearly dependent on the initial conditions, while also being capable of detecting sensitivity to initial configurations, and hence of replacing the visual inspection. This hierarchical classification is well defined and is therefore a good candidate for a complexity measure.

The second part of my investigation actually takes advantage of the ability of CAs to behave differently in order to undertake a closer inspection and a novel classification, taking into account the average capability of a system to behave in different ways.

Differentiation from a priori approaches

The approach is different from others in that it is an a posteriori technique, unlike, for example, Langton’s lambda, a measure of the density of a CA rule. It is an a posteriori technique because unlike this lambda number, the compression-based approach requires the CA to evolve before saying anything about it, whereas Langton’s lambda is computed from the rules of the CA.

Langton’s lambda is simply the fraction of rules in which the new state of the cell is non-zero. For example, the rules of the elementary cellular automaton number 1 in Wolfram’s enumerating scheme are simple: all 3-tuples sent to 0 but one, the last one. Therefore Langton’s lambda is 1 over 8.

The lambda parameter of a CA is a number between 0 and 1. For example, if lambda is 0 (e.g. for ECA rule number 0), the evolution develops into a trivial state. Langton found that CA rules with lambda close to zero evolve into trivial states and CA rules close to 1 evolve into random-looking behavior, with complex behavior somewhere in between. It is near this transition that the most interesting CAs will lie, the ones that manifest the most complex behavior.

Unfortunately, classifying CAs with lambda values is more complicated than that, as one quickly faces undecidability. If it were possible to decide once and for all whether a CA is complex by computing its lambda value, without having to run the CA for a single step, one could solve all kinds of undecidable questions simply by looking at a CA’s rules.

The critical value for lambda is not a universal constant. Nor, for that matter, is my phase transition coefficient. But the main difference, as noted before, is that the compression-based approach actually looks at the evolution of the system rather than trying to figure everything out from the description of the CA.

The compression-based method represents a formal approach to Wolfram’s classification process, replacing the need for a visual inspection with a technique and a coefficient to determine to what class a CA belongs. The approach is compatible with what Stephen Wolfram himself has proposed in his NKS book, without contradicting any computability result or Wolfram’s own Principle of Computational Irreducibility, which says that while some computations may admit shortcuts that allow them to be performed more rapidly, others don’t, so that one cannot really tell what the evolution of a CA will be, except by running it.

Initial configuration numbering scheme

Ideally, one should feed a system with a natural sequence of initial configurations of gradually increasing complexity. Doing so ensures that qualitative changes in the evolution of the system are not attributable to discontinuities in its set of initial conditions.

What I propose is an initial configuration numbering scheme where two successive values differ by only one bit. To explore the qualitative behavior of a cellular automaton when starting from different initial configurations, the optimal method, because of its smoothness, is to follow this Gray encoding enumeration, in order to avoid any undesirable “jumps” attributable to the system’s having been fed with discontinuous initial configurations. By following the Gray code, an optimal numbering scheme was devised so that two consecutive initial conditions differed only in the simplest degree (by one bit). This simple convention will allow us to continue the investigation of the dynamical behavior of a system without being concerned about arbitrarily introducing artificial complexity when moving from one initial configuration to the next.

Phase transition detection and coefficient

I defined a measure based on the change of the asymptotic direction of the size of the compressed evolutions of a system for different initial configurations (following the proposed Gray-code enumeration of initial configurations). My phase transition coefficient yields an interesting classification: it measures the resiliency or sensitivity of a system to its initial conditions. So rules such as 30 and 0 appear close to each other. Odd as this may seem, this is because both, when their initial conditions are changed, behave in more or less the same way. In other words, there is no change in the qualitative behavior of these CAs when feeding them with different inputs, regardless of how different the inputs may be.

In this phase transition classification, for example, rules such as 122 and 89 appear next to each other, because, as the investigation proves, they are both CAs with relatively high phase transition coefficients, meaning that they are very sensitive to initial conditions, dramatically changing their qualitative behavior when starting from one rather than another initial configuration.

Phase transition and predictability

An obvious feature of universal systems is that they need to be capable of carrying information by reflecting changes made to the input in the output. In attempting to determine whether a system is capable of reaching universal computation, one may ask whether a system is capable of this in the first place, and how efficiently it does so. And this is what the phase transition actually measures, because it tells how well a system manages to respond to an input. Obviously, a system such as rule 0 or rule 255, which does not change regardless of the input, is trivially decidable. But a universal system should be capable of some reaction to external manipulation (the input to the system). The inverse, however, should not hold, because having a large transition coefficient by no means implies that the system will behave with the freedom required of a universal system if it is to emulate any possible computation (a case in point may be rule 22, which, despite having the largest transition coefficient, may not be versatile enough for universal computation).

The phase transition measure also implies that one may be able to predict the behavior of a system for an initial configuration with a degree of certainty based on the previous variability of the system in relation to an initial segment of initial configurations. It is an open question whether this is a lower bound. In other words, it is unclear whether looking at the behavior of a system for a certain length of time and for certain configurations will tell you anything about its behavior for other initial configurations or if it were allowed to evolve for a longer period of time. Experience says one would do well to predict future behavior on the basis of past behavior, and this may also be related to Israeli and Goldenfeld‘s very interesting findings. In 2004 they showed that some computationally irreducible elementary cellular automata have properties that are predictable at a coarse-grained level. They did so following a renormalization group (RG) technique, which refers to a mathematical apparatus that allows one to investigate the changes in a physical system as one views it at different distance scales.

They were able to provide a hierarchy of complexity in agreement with their apparent complexity. Israeli and Goldenfeld’s classification is also an a posteriori investigation, but it too is bedeviled by the unavoidable (and ultimately undecidable) statistical question, namely, whether one can keep on predicting for all initial conditions and for any number of steps, without having to run the system  forever and for all possible initial conditions. But unlike mine, their classification is partial, in the sense that one cannot always say whether the complexity of one CA is greater than that of another. Based on the comparison of their approximated compressed sizes, however, I could come up with a total order in full agreement with their apparent complexity as well as with Wolfram’s four classes (one that even yields the same number of classes).

An open question directed to me by Nigel Goldenfeld– in trying to understand the similarities and differences between their RG and my algorithmic approach–concerns how their hierarchy relates to mine. What they experimentally suggest is that the larger the scale of the transformations used, the more highly structured the objects and therefore the greater their algorithmic complexity.

For example, rule 110 is one rule about which my own phase transition classification says that, despite showing some sensitivity, it also shows some stability. Which means that one can say with some degree of certainty how it will look (and behave) for certain steps and certain initial configurations, unlike those at the top. This turns out to be predictable according to Israeli and Goldenfeld as well, at a coarse-grained level, after a scale transformation.

A sketch on a conjecture on the transition coefficient of Turing-universal systems

Based on two empirical facts, I conjecture that universal CAs should have a large transition coefficient, as 110 does. Rules such as 110 and 54 also turn out to be next to each other in the phase transition classification, both having large values (they are in the top 24, and both the top 24 and the bottom 22 are in the original paper: see the last 2 tables).

So I base this conjecture on the following empirical observations:

1. The only known universal elementary cellular automata figure at the top of this classification, and so do candidates such as rule 54, which appears right next to rule 110.

2. Universality seems to imply that a system should be capable of being controlled by the inputs, which our classification suggests those at the bottom are not, as all of them look the same no matter what the input, and may not be capable of carrying information through the system toward the output.

Other rules that some people think may be capable of some sophisticated computation (See paper in the Journal of Complex Systems by Paul-Jean Letourneau) also have large transition coefficients, such as rule 146, with a transition coefficient 0.124, ranking 39 out of the 256 elementary cellular automata.

As noted before, however, the transition coefficient is not a universal measure. In this case, coefficients were calculated for 600 steps in blocks of 50 for the first 500 initial conditions, which means that some rules may be capable of greater transitions but are somehow ‘slow’ at the selected number of steps and number of initial conditions. i.e. they take more than 500 initial conditions–in Gray code numbering–to show larger changes, or else larger changes are being missed because of the jumps in blocks of 50 steps, though this latter possibility is less likely.

The conjecture also seems to be in agreement with Wolfram’s claim (made at some of his oral presentations) that rule 30 (as a class 3 elementary cellular automaton) may be, according to his Principle of Computational Equivalence (PCE), computationally universal. But it may turn out that it is too difficult (perhaps impossible) to control in order to perform a computation, because it behaves too randomly.

Classifying objects by complexity

Posted in Algorithmic information theory, Complexity, Computer Science, New Ideas on June 2nd, 2010 by Hector Zenil – Be the first to comment

I have coauthored, with Jean-Paul Delahaye and Cedric Gaucherel, and made available today on arXiv a new paper entitled Image information content characterization and classification by physical complexity. In the paper we present a method for estimating the complexity of an image based on the concept of Bennett’s logical depth. Unlike the application of the concept of algorithmic complexity by itself, the addition of the concept of logical depth results in a characterization of objects by organizational (physical) complexity. We use this measure to classify images by their information content. The method provides a means for evaluating and classifying objects by way of their visual representations.

The method described in the paper ranks images based on their decompression times and the classification corresponds to the intuitive ranking resulting from a visual inspection, with things like microprocessors, human faces, cities, engines and fractals figuring at the top as the most complex objects; and random-looking images, which ranked high by algorithmic complexity, were ranked low according to the logical depth expectation, classified next to  trivial images such as the uniformly colored, indicating the characteristic feature of the measure of logical depth. A gradation of different complexities were found in the groups between, gradually increasing in complexity from bottom to top.

significant different groups

Complexity classification of images, from more complex to less complex(group descriptions on the right are followed by the average decompression times as approximations to Bennett's logical depth)

Along the paper we show that:

  • The concept of logical depth can be implemented as a feasible and applicable method to approach a real-world problem.
  • After studying several cases and tested several compression algorithms, the method described in this paper has shown to work and to be of use for identifying and classifying images by their apparent physical complexity.
  • The procedure described constitutes an unsupervised method for evaluating the information content of an image by physical complexity.
  • As the theory predicted, logical depth yields a reasonable measure of complexity that is different from the measure obtained by considering algorithmic complexity alone, while being in accordance with one’s intuitive expectations of greater and lesser complexity.
  • The paper is available here.

    Stephen Hawking: A brief examination of the recent warning over alien civilizations

    Posted in Complexity, Foundations of Biology, General, New Ideas, Recreation on May 4th, 2010 by Hector Zenil – 1 Comment

    Stephen Hawking asserts that while aliens almost certainly exist, humans should avoid making contact.

    The original story published by BBC News can be found here.

    He claims: “We only have to look at ourselves to see how intelligent life might develop into something we wouldn’t want to meet.”

    Stephen Hawking recent assertion looks like an interesting time to bring up the question of intelligent life, meaning and purpose.

    Let’s examine what Hawking’s argument implies, assuming that intelligent life, other than human, exists elsewhere in the universe:

    1. Aliens come from an Earth-like planet.
    2. The Earth has something the aliens do not.
    3. The Earth has something aliens need or want.

    Each point may be not necessarily independent of each other, each may be chained or added to the other. Point 2 may imply Point 1. However, Earth-like planets are likely to be quite common, as the current research on exoplanets seems to suggest. While Point 1 is chauvinistic, Point 3–the notion that Earth possesses something special– is egocentric. Concerning Point 2 again: as a sort of example, many cities on Earth are in need of potable water. Does that mean they can just go looking for it in another city?

    If we think that aliens are looking for a planet with an atmosphere like ours, we are already making a very strong assumption that requires further support. To assume that their planet has the same atmosphere as ours and that they would consequently need a planet exactly like it, is to make a strong claim–and a highly implausible one at that. We may think that water is the most precious element in the universe, but it might just be lethal to an alien civilization. If it is true that we can only imagine them in the basis of the kind of life and it diversity on Earth, it is also true that we know nothing about them, if they really do exist.

    If humans find life elsewhere, and if it is not related to Earth’s life, it is likely to be unrecognizable. After all we have experience of a very large range of life forms here on Earth, and we find some of them already alien enough. “Extremophiles” is the name we give to Earth species that can survive in places that would quickly kill humans and other “normal” life-forms.

    As for needing a work force, it doesn’t make sense that aliens would seek it here. Humans are highly unproductive. And aliens with the technological capabilities to travel to Earth are likely to have had long experience building the kinds of machines that humans have only recently got around to building and improving. Advanced intelligent aliens most likely make extensive use of robots, if they are not some kind of cybernetic beings themselves.

    Though assuming that aliens and humans are alike in their biochemical constitution may be chauvinistic and misguided, the supposition that our intelligences are similar has a more solid basis, especially since we also assume that the alien civilization in question has built spaceships, travels in space and is looking for other civilizations. According to Stephen Wolfram’s Principle of Computational Equivalence (PCE), there is a good chance that if non-trivial life forms exist they will have or develop the same degree of sophistication (e.g. intelligence) than ours. Wolfram’s PCE would also suggest that once primitive life has developed it will eventually achieve its maximal degree of sophistication, which would be the maximal possible degree of sophistication. In other words, if life is around, intelligent life is as likely as life.

    We used to say that aliens are likely to be much more advanced than humans. Why so? Because the universe has a 15 billion year history. Characterizing a civilization that has been around for about 2 million years and has only begun developing technology in the last few hundred as ‘advanced’ is beyond naive, it is statistically implausible. As Carl Sagan pointed out, every intelligent civilization seems to reach a period when its own technological power is capable of destroying itself. This is the stage which human civilization reached about 70 years ago (and still is), with the invention of atomic bombs. The longevity of said aliens’ civilization means that they have managed to make good use of the resources of their host planet. Even assuming they have reached the point where they have exhausted their resources, we have to concede that they have probably been good, ecologically responsible stewards of their planet. Of course it is still possible, despite all this, that these aliens may wish to colonize a new planet in order to exploit its resources rather than simply seizing what they find and taking it away with them.

    Water is one of the most common elements in the universe, or it is quiet easy to synthesize it (see the chemical recipe here). If they just need a rock revolving around a star at a distance such that life can be sustained, there are certainly many more possibilities than coming to Earth. Think about it. The human civilization is much closer to achieving the terraformation of Mars (reengineering Mars soil and atmosphere to make it Earth-life friendly), given its current and near future capabilities than traveling abroad to conquer, if not cohabit with another civilization.

    Now, from a practical standpoint, the notion that we could hide away in a corner of the universe is nonsense, to say the least, if we are not somehow already hidden due to the size of the universe itself. The Earth has been broadcasting to space since radio signals were invented. Messages have been kept symbolical on purpose. So in the worst case scenario, assuming Hawking is right and our signals are likely to be picked up by an alien civilization willing to take us on, hiding is just rubbish because we can’t. As Seth Shostak says, the first thing to do would be to shut down the BBC, NBC, CBS and the radars at all airports. Not unless we install a kind of Dyson sphere around the Earth or stop broadcasting altogether. Now, the only way to make sense out of Hawking particular comment in this regard, is taking into consideration time scales. If it is true that Earth has been broadcasting to the space during the last 50 or more years, it is also true that someday in the future it may reach a technological state that allows it to avoid, purposely or not, doing so. So it may be the case that civilizations decide to hide themselves after a short period of broadcasting history.

    The advice to hide from aliens implies of course that they are destructive, or that the contact between our civilizations may be destructive, and this brings us back to the question of their longevity. If they were indeed destructive they would have annihilated themselves. As Hawking does right, consider the human case.

    But if Hawking is right, we would probably have nothing to be scared of. If an alien civilization wants us dead we will either barely notice it when that happens or that won’t ever happen if it hasn’t happened already. The chances of a destructive civilization to exist seem lower than the chances of a pacific civilization to extend their existence time period in the universe history. The former would have likely already destroyed itself while the latter may have better chances. Caution would not hurt though. We ought to keep an eye on ourselves and how we develop, and that means using our resources more intelligently and not necessarily manufacturing more bombs. But being scared definitely says much more about us than anyone else because we simply know nothing about them, nor we can pretend to know what are their intentions, if any.

    But of course in the matter of encounters between civilizations in space, every possibility is time-scale dependent. Imagine for an instant that two alien civilizations are at war. We would certainly be well advised to keep away from the conflict. We may be justified in thinking that if we were in the way when either of the parties ran short of resources to prosecute their war, they would most likely help themselves to anything we had that could be of use. But think a little further. The odds of encountering two civilizations actually engaged in warfare are rather small.

    Civilizations making war would either annihilate each other, or if they don’t, then one party would end up achieving hegemonic status. It would seem logical that periods of peace are much longer than periods of war. In the first place civilizations that contemplate war must be both smart enough and hostile enough, and reaching these thresholds of achievement and antagonism takes time–peacetime.

    As Hawking claims, many of the life forms in the universe are probably just microbes, but unlike Hawking I believe that if civilizations do exist they’d have little time to make war, even assuming they wanted to. Once again, one has only to think of the Earth to reach this conclusion. If the Cold War had not remained cold, we either wouldn’t be here or else we’d all now be ruled by a single country–as has been pretty much the case (despite there being no actual war) over the last few decades, with the emergence of a current single superpower (with a partial sharing of the global power, and other powers emerging). But when superpowers emerge these days, they are more interested in keeping the peace because they have reached a stage where they depend on each other, chiefly because trade and commerce have become globalized. It turns out that the human world has become much more cooperative than we might have expected. But this is not by chance; it seems to be a common path. Not that one can be absolutely certain, though. Only by witnessing other civilizations could we safely make generalizations. And yet logic dictates that the opposite path–the path of belligerence–would be unlikely.

    What I think is that if civilizations were to find each other they are more likely to be interested in each other’s culture and knowledge, than in each other’s natural resources–either because natural resources can be found in many places, or even created by civilizations that are sufficiently advanced, or else because they are simply not needed, curiosity about the universe being the sole motive behind exploration. Which would mean that civilizations would preserve ‘alien’ civilizations to enrich themselves, just as anthropologists and ethnologists do on Earth. To make my point in terms of Hawking, aliens are more likely to be like modern scientists than like the barbaric Europeans who colonized the Americas.

    On the Algorithmic Nature of the World

    Posted in Algorithmic information theory, Complexity, New Ideas on April 21st, 2010 by Hector Zenil – Be the first to comment

    In a new paper I’ve coauthored with Jean-Paul Delahaye, we propose a test based on the theory of algorithmic complexity and an experimental evaluation of Levin’s universal distribution to identify evidence in support of or in contravention of the claim that the world is algorithmic in nature.

    To this end we have undertaken a statistical comparison of the frequency distributions of data from physical sources on the one hand–repositories of information such as images, data stored in a hard drive, computer programs and DNA sequences–and the frequency distributions generated by purely algorithmic means on the other–by running abstract computing devices such as Turing machines, cellular automata and Post Tag systems. Statistical correlations were found and their significance measured.

    The paper is forthcoming as a book chapter by invitation of Gordana Dodig-Crnkovic, in Gordana Dodig-Crnkovic and Mark Burgin (eds.) Information and Computation by World Scientific, 2010.

    The paper is available online on arXiv.

    If the subject is of interest to you I invite you to regularly visit our research project main webpage:, where we are publishing results and updates.

    Evaluating the complexity of a living organism by its algorithmic complexity

    Posted in Algorithmic information theory, Complexity, Foundations of Biology, General, New Ideas on September 26th, 2009 by Hector Zenil – 6 Comments

    One of the greatest scientific achievements of the last century was the understanding of life in terms of information. We know today that the information for synthesizing the molecules that allow organisms to survive and replicate is encoded in the DNA. In the cell, DNA is copied to messenger RNA, and triplet codons in the messenger RNA are decoded in the process of translation to synthesize polymers of the natural 20 amino acids.

    Humans have been intrigued by the origin and mechanisms underlying complexity in nature coming from information contained in repositories such as the DNA. Darwin’s theory of evolution suggests that this complexity could evolve by natural selection acting successively on numerous small, heritable modifications.

    Darwin’s theory represents a great leap forward in our understanding of the fundamental processes behind life. However, there is a tendency to assume that evolution os the sole factor in designing nature while it may not actually be the main driving force behind the complexity of living organisms [If you wish to know more about the theory of evolution by means of natural selection, three respectable British institutions have set up special websites in celebration of Darwin's 200th. anniversary: the University of Cambridge (with the original scanned text and even an audio version in mp3 format), the Open University and the BBC].

    Nature seems to use a specific toolkit of body features rather than totally random shapes. Like units of Lego, Nature assembles its forms from a limited set of elements. For example, despite the variety of living forms on the Earth, they do all seem to have a front-to-back line down the center of the body, and extremities (if any) on the sides, from flies who have a head at one end and a tail at the other, to worms, snakes and humans. Despite the randomness that may undermine any shared regularity among all animals in combinatoric terms, on a certain level, from a certain perspective, we are all similar in shape and features. Why didn’t evolution attempt other, completely different forms? And if it did, why were so few of them successful? Given the improbability of  several other shapes having been put into circulation without any of them winning out save the ones we all know, we could conclude that evolution never did attempt such a path, instead keeping to a small pool of tried and tested basic units whose survival has never been in jeopardy. There are some symmetries and general features that many animals share (more than can be explained by inheritance) that are not so easily explained in purely evolutionist terms. A remarkable example is the resemblance of all animals in their embryonic phase.

    Two teams of biologists (Walter Jakob Gehring and colleagues at the University of Basel, Switzerland, and Matthew Scott and Amy Weiner working with Thomas Kaufman at Indiana University, Bloomington) seem to have independently discovered toolkits that Nature appears to use that they have called homeobox containing genes.

    This discovery indicates that organisms use a set of very simple rules passed along to them (thus reducing the amount of randomness involved) to build a wide variety of forms from just a few basic possible body parts. To oversimplify somewhat, one can for instance imagine being able to copy/paste a code segment (the homeobox) and cause a leg to grow in the place where an antenna would normally be in an ant.

    This begins to sound much more like the footprint of computation rather than a special feature characterizing life, since it turns out that a few simple rules are responsible for the assembly of complex parts. Moreoever, this is consonant with what in Wolfram’s scheme of things life’s guiding force is said to be, viz. computation. And with what Chaitin has proposed as an algorithmic approach to life and evolution, as well as with my own research, which is an attempt to discover Nature’s basic hidden algorithmic nature.  All the operations involved in the replication process of organisms– replacing, copying, appending, joining, splitting–would seem to suggest the algorithmic nature of the process itself. A computational process.

    Based on my own research interests it is my strong belief that though by no means wrong, Darwin’s theory of evolution belongs within a larger theory of information and computation, according to which life has managed to speed up its rate of change by channeling information efficiently between generations, together with a rich exchange of information with the outside by a process that while seemingly random, is in fact the consequence of interaction with other algorithmic processes.

    Think a bit further about it. Evolution seems deeply connected to biology on Earth, but as part of a larger computation theory it might be applied anywhere in the universe just as the laws of physics do. Evolution may be formulated and explained as a problem of information transmission and channeling, pure communication between 2 points in time. If you want to efficiently gather and transmit information it may turn out that biological evolution may be not the cause but the consequence.

    The theory of algorithmic information (or simply AIT) on the other hand does not require a random initial configuration (unfortunately perhaps, nor any divine intervention) to have a program, when run, produce complicated output. This is in keeping with Wolfram’s finding that all over the computational universe there are simple programs with simple inputs generating complex output, what in NKS terms is called ‘intrinsic randomness’, yet is purely deterministic. Nor does AIT require the introduction of randomness during the computation itself. In other words, it seems that randomness plays no necessary role in producing complex organisms. Evolution seems to underlie change, its pace and direction, but it does not seem to constitute the driving force behind life.

    Evolution seems to be taking advantage of the algorithmic properties of living systems to fabricate new forms of life. To facilitate understanding of these body patterns the University of Utah has set up an illustrative website. Incidentally, this genetic toolkit based on the homeobox concept is surprisingly well captured in the Spore video game.

    In a recent article Greg Chaitin has proposed (Speculations on biology, information and complexity) that some of the properties of DNA and the accumulation of information in DNA may be better explained from a software perspective, as a computer program in constant development. When writing software, subroutines are used here and there all the time, and one usually creates an extra module or patch rather than rewrite a subroutine from scratch. This may correspond to what we see in DNA as redundant sections and ‘unused’ sections.

    In Chaitin’s opinion, DNA is essentially a programming language for building an organism and then running that organism. One may therefore be able to characterize the complexity of an organism by measuring the program-size complexity of its DNA. This seems to work well for the length of DNA, since the longest known sequence of DNA belongs to what is certainly the most sophisticated organism on this planet, i.e. homo sapiens.
    Chaitin proposes the following analogy:

    program -> COMPUTER -> output
    DNA ->

    However, we encounter problems when attempting to view the process of animal replication in the same algorithmic terms. If, as the sophistication of homo sapiens would suggest, human DNA is the most complex repository of information, and given that DNA represents the shortest encoding capable of reproducing the organism itself, we would expect the replication runtime of human DNA to be of the same order relative to other animals’ replication times. But this is not the case. A gestation period table is available here. So what are we to make of the fact that the right complexity measure for living beings (the logical depth of an object as the actual measure of the organizational complexity of a living organism) does not produce the expected gestation times? One would expect the human gestation period to be the longest, but it is not.

    Charles Bennett defined the logical depth of an object as the time required by a universal computer to produce the object from its shortest description, i.e. the decompression time taken by the DNA from the fertilized egg of an animal (seen as a universal computer) to produce another organism of the same type. There seems to be more at stake, however, when trying to apply the concept to Chaitin’s replication analogy– issues ranging from when to determine the end of the replication (the gestation period?), to better times to give birth, to gestation times inherited from ancestral species, to the average size of organisms (elephants and giraffes seem to have the longest periods). Some hypotheses on period differences can be found here for example.

    If living organisms can be characterized in algorithmic terms as we think they can, we should be able to introduce all these variables and still get the expected values for the complexity measurement of an organism– seen as a computer program–reproducing another organism from its shortest encoding (the DNA being an approximation of it). A complete model encompassing the theory of evolution has yet to emerge. It seems to be on the horizon of AIT, as another application to biology, one that provides a mathematical explanation of life.

    In summary:
    So far, what we know is that DNA is the place where the information for replicating an animal is to be found. What’s being proposed above is that the information content in the DNA can be actually measured and effectively approximated as a distance measure of the complexity of an organism. If one can quantify these values one could, for instance, actually quantify an evolutionary step in mathematical terms.
    Also, evolution is not usually seen as part of a computational theory, but as an special feature of life. I think otherwise.
    Randomness has hitherto been thought to play a major role in evolution as it is mutation that drives the evolutionary process. But I suggest that this is not the case. It is just another part of the deterministic computation, as algorithmic information theory suggests.
    Finally, evolution has been thought of in terms of very small steps rather than building blocks and building over them as other scientists have found (which would explain why the theory of evolution has been bedeviled by questions which have not thus far been satisfactorily answered). This favors my computational view of the process of life, because it is based on what in software technology is seen as a subroutine orientation programming paradigm.

    In summary:

    • So far, what we know is that the DNA is the place where the information for replicating an animal is to be found. What’s being proposed above is that the information content in the DNA can be actually effectively approximated by means of its program-size complexity and logical depth to define a measure of the complexity of an organism. If one can quantify these values one could, for example, actually quantify an evolutionary step in mathematical terms. This would represent a first step toward encompassing Darwin’s theory of evolution within an algorithmic mathematical theory of life. Evolution is not usually seen as part of a computational theory, but as a special feature of life. The above suggests otherwise.
    • Randomness has hitherto been thought to play a major role in the evolution of species, as it is mutation that drives the evolutionary process. But I suggest that this is not the case. Rather I suggest that what appears to be random is actually part of a deterministic computation, which means that randomness plays no significant part in the process, while computation does.
    • Finally, evolution has hitherto been thought of as a process that advances by very small steps, rather than one that is capable of quickly building over blocks of code, as it might be actually the case. This new understanding favors the computational view I am putting forward here as playing a main role in the process of life, because it is based on what in software technology is the practice of a subroutine orientation programming paradigm: code reuse.

    The Shortest Universal Turing Machine Implementation Contest

    Posted in Algorithmic information theory, Complexity, Computability, Universality and Unsolvability, Foundations of Computation, New Ideas, Recreation on December 22nd, 2008 by Hector Zenil – Be the first to comment

    The Shortest Universal Turing Machine Implementation Contest


    23 Dec – 2008

    Contest Overview

    In the spirit of the busy beaver competition though related to program-size complexity, we are pleased to announce the “Shortest Universal Turing Machine Implementation Contest”.

    The contest is open-ended and open to anyone. To enter, a competitor must submit a universal machine implementation written in the language specified in the contest web site (C++) with smaller size values than the latest  record published on the web page.

    In order to take part in this competition it is necessary to submit the source code, to be compiled using the compiler program and version specified in the contest web site. It is important that you provide documentation of your code, either in an attached file or as commented text in the source code file.

    Each submitter must agree to be bound and abide by the rules. Submissions remain the sole property of the submitter(s), but should be released under the GNU General Public License (GPL)  so we may be permitted to make them available on  this web site for downloading and executing.

    Rules (General Rules section)

    Team composition

    Players may enter alone or as teams of any size. Anyone is eligible to enter.

    Subscribe to the Newsletter

    A mailing list will be used to keep participants informed of news about the contest. You can subscribe to this mailing list at any time:

    Subscribe at here.

    Hector Zenil (IHPST and LIFL, Paris 1 University and Lille 1 University)
    Jean-Paul Delahaye (LIFL, Lille 1 University)

    The art of creating creatures from simple rules

    Posted in General, Minds and Machines, New Ideas, Recreation on November 18th, 2007 by Hector Zenil – Be the first to comment

    Having quit his studies in physics, Theo Jansen became an artist. In this video he demonstrates his amazing life-like kinetic sculptures, built from plastic tubes and bottles. His Beach Creatures or Strandbeest are built to move and even survive on their own:

    I’ve been in touch with Theo Jansen recently. For further details about his creations he referred me to his book (available at his web shop ) entitled The Great Pretender. Even more details are provided in Boris Ingram’s thesis on leg designs based on 12-bar linkages, in which he describes Jansen’s walker algorithm. Jansen’s designs are computer-generated using an evolutionary algorithm, and the animals, which are wind powered, are made out of PVC piping.


    The valves essentially act like logic gates, allowing water to pass or not depending on the state of the other gates.


    Jansen’s creations do not require engines, sensors or any other type of advanced technology in order to walk and react to the environment. As for Boris Ingram’s work, it would be greatly enriched if it were to incorporate a wider range of possible structures and algorithms.



    More online references:

    On the Foundations of Quantum Mechanics, The Netherlands

    Posted in Complexity, Conferences, Foundations of Physics, New Ideas on November 15th, 2007 by Hector Zenil – Be the first to comment

    Originally uploaded by hzenilc.

    Models and Simulations 2
    11 – 13 October 2007
    Tilburg University, The Netherlands

    I attended this conference one month ago. Among several interesting talks, one in particular caught my attention. It was given by Michael Seevinck from the Institute for History and Foundations of Science at Utrecht, The Netherlands. His talk was about the foundations of Quantum Mechanics, and there were many NKS related topics that it brought  to mind. He talked about reconstructing Quantum Mechanics (QM) from scratch by exploring several restricted models in order to solve the so-called measurement problem, to deal with the nonlocality of quantum correlations, and with its alleged non-classicality, there being  no consensus on  the meaning of Quantum Mechanics  (Niels Bohr said once: “If you think you have understood quantum mechanics, then you have not understood quantum mechanics.”—More quotes of this sort on QM here).  The restrictons chosen in order to reconstruct the theory must be physical principles and not  theoretical assumptions. In other words, one approaches the problem contrariwise than is traditional, taking the least possible restrictions and exploring the theories that can be built thereon. The speaker characterized  this approach  as the “study [of]  a system from the outside” in order to ”reconstruct the model”. It is basically a pure NKS approach: “Start from a general class of possible models and try to constrain it using some physical principles so as to arrive at the model in question (in this case QM).”

    One can then proceed to ask such questions as how one might identify QM uniquely, what it is that makes QM quantum, what set of axioms in the model is to be used, and which of them are necessary and sufficient? The question of meaning, previously asked of the formalism, is removed, and bears, if at all, only on the selection and justification of  first principles. Seevinck came up with the following interesting statement: “The partially ordered set of all questions in QM is isomorphic to the partially ordered set of all closed subspaces of a separable Hilbert space” (one of Mackey’s axioms in his axiomatisation of 1957). He added: “They (the principles)have solely an epistemic status. The personal motives for adopting certain first principles should be bracketed. One should be ontologically agnostic. The principles should be free of ontological commitment.” And further: “…axioms are neutral towards philosophical positions: they can be adopted by a realist, instrumentalist, or subjectivist.” He cited Clifton, Bub and Halverson who provided the following quantum information constraints used to derive quantum theory:

    1. No superluminal information transfer via measurement.

    2. No broadcasting

    3. No secure bit commitment

    Seevinck’s methodology in further detail is: Start with a general reconstruction model with a very weak formalism. Gradually see what (quantum) features are consequences of what added physical principles, and also see which features are connected and which features are a consequence of adding which principle. One thereby learns which principle is responsible for which element in the (quantum) theoretical structure.

    One can generate further foundational questions over the whole space of restricted models, e.g.  how many of them:

    - forbid superluminal signalling?

    - allow nonlocality, and to what extent?

    - solve NP-complete problems in polynomial time?

    An important question which arises concerns whether intrinsic randomness would be of a different nature in different models or whether all of them would yield to deterministic randomness.

    His talk slides are available online. Highly recommended.

    Among other interesting people I met was Rafaela Hillebrand, of  the Institute for The Human Future at Oxford University. The Institute’s director, Nick Bostrom, has proposed an interesting theory concerning the likelihood that our reality is actually  a computer simulation. I have myself approached the  question in my work on experimental algorithmic complexity, in particular in my work on  the testability and the skepticism content of the simulation hypothesis. I will post on that subject later. The subject of thought experiments–in which I have an interest– was one that came up frequently.


    Posted in General, New Ideas on November 8th, 2007 by Hector Zenil – Be the first to comment

    Researchers at Berkeley working to unlock the potential of nanoscience:

    High Definition Nanotechnology video from KQED
    Amazing how nature produces its own nanodevices, such as motors like the flagella that allow spermatozoa to swim. Imagine how many structures can be found by exploring the universe of possible simple nanostructures! We also know that given a few elements, computing devices are capable of universal computation (see my previous post on the smallest universal Turing machine). So one could potentially provide  nanomachines with coded instructions to  perform just about any task–of course within the constraints of their mechanical capabilities.Further references available online from molecular to nano-computing:

    - Tseng and Ellenbogen, Toward Nanocomputers, Science 9 November 2001.
    - The world’s smallest computer made entirely of biological molecules, News Medica, 2004.
    - Beckett and Jennings, Towards Nanocomputer Architecture
    - DNA Computer Works in Human Cells, Scientific American 2007.

    On the Kolmogorov-Chaitin complexity for short sequences

    Posted in Algorithmic information theory, Computer Science, Foundations of Computation, New Ideas on October 31st, 2007 by Hector Zenil – Be the first to comment

    My paper On the Kolmogorov-Chaitin complexity for short sequences, coauthored with my PhD thesis advisor Jean-Paul Delahaye has been published as a book chapter in:RANDOMNESS AND COMPLEXITY, FROM LEIBNIZ TO CHAITIN, edited by Cristian S. Calude (University of Auckland, New Zealand) and published by World Scientific.

    Chaitin festschrift From Randomness to Complexity from Leibniz to Chaitin by Cristian Calude
    An extended draft version of this paper can be found in arXiv here and the webpage we have set up for our research on what we call Experimental Algorithmic Theory can be accessed here. The results of our ongoing experiments will be frequently published on this site.The book is a collection of papers contributed by eminent authors from around the world in honor of Gregory Chaitin’s birthday. It is a unique volume including technical contributions, philosophical papers and essays.

    I presented our paper at the NKS Science Conference 2007 held at the University of Vermont, Burlington, U.S. The conference blog has an entry describing my participation.

    NKSMeetingZenilChaitinDaviesWolframCastiFrom left to right: Hector Zenil, Stephen Wolfram, Paul Davies, Ugo Pagallo, Gregory Chaitin, Cristian Calude, Karl Svozil, Gordana Dodig-Crnkovic and John Casti.

    Meaning is in the word net: cyclic self-referential definitions, dictionaries and found in translation

    Posted in Complexity, Minds and Machines, Natural Language and Computational Linguistics, New Ideas on December 2nd, 2006 by Hector Zenil – Be the first to comment

    In the “Period 1 Cycle English Dictionary” published by “No way, Inc.” (it’s been said to be the most accurate dictionary ever) one can read:

    dog (Noun) : a dog is a dog.

    The lazy creators of this dictionary appear to have forgotten what is broadly accepted by common sense. A definition would not be a definition if it were cyclical and self-referential, simply because one would, theoretically, fall into an infinite loop trying to define an unknown word that has as part of its definition the word itself.

    Let’s leave aside the fictional Period 1 Cycle English Dictionary, which I just made up, and consider how things work in a regular dictionary. Assume you want to know the definition of a word. Looking it up, you’ll encounter a set of ordered words in the language in question(if the dictionary is not a bilingual one). Let’s prove that in the end, any definition of a word w in a closed finite dictionary is cyclic with period p_w:

    Proof sketch:

    We may hypothesize that the dictionary (D) is closed under the operation “the definition of” (i.e. all words in the definition of a word have a definition in the dictionary). One can also safely assume that all words in D have a non-empty definition (given the definition of “dictionary” itself!) and that the dictionary is finite. Then if w is a word in D, the orbit of the successive definitions d_1,d_2,…d_n, … for w (listed as sets of words themselves) will end up eventually crossing the path d_w={d_1,d_2,…d_n, …} because d_w cannot be an infinite sequence without repeating elements in D– the finite nature of D would necessitate  coming back to the original word itself after a certain period p depending on the word, making every word w in the dictionary cyclic with period p_w. 

    If all the definitions in a dictionary are then cyclic and enclosed in the dictionary itself, where does the analytic knowledge that a dictionary seems to provide come from? When one consults a dictionary in a given language, one already knows some words in that language, but assuming one doesn’t, would a dictionary be completely useless, as the issue of ultimately cyclic self-referential definitions seems to suggest?

    One could conceive of a dictionary as a whole as a syntactical knowledge container with no actual knowledge in it– to someone who does not bring to it any knowledge of the relevant language. One may wonder how something so perfectly self-referential could actually be of use, as dictionaries seem to be. Is it indeed because you always know some, or indeed many, words of a language already when you consult a dictionary? For since every word is defined by other words in the same language, looking up the meaning of a given word would lead you to other words and yet others and eventually back to the first word, the word whose meaning you set out to learn. This would be the case even if  the dictionary were bilingual, and the meaning of the word you wished to check was given in a second language. Thus all dictionaries are perfectly circular, closed, self-referential sources.

    However, the analytical knowledge in a dictionary does not come from the definitions as words, but from the word net underneath, where the words are connected in some, perhaps unique fashion(modulo exact synonyms) to each other. That’s what quite successful projects such as WordNet and functions like WordData in Mathematica are about. The power of being able to analyze the language as a net in a computable way is priceless for the progress of computational linguistics and linguistics in general.


    2-level depth wordnet for the word "chair"

    2-level depth wordnet for the word "chair"



    For example, if “chair” is connected to “table”, “office”, “dining room”, etc. it should be easy to map it to its equivalent in any other language. Unless the target language doesn’t have the concept “chair” as referring to a seat placed at a table in a dining room (which was perhaps the case in some Asian countries before colonialism), together with an explicit cognitive representation of it.

    Of course problems arise when considering the mappings between words having the same written form but different senses. The word “chair,” for example, is a seat, but may also mean the officer who presides at a meeting. Also posing a problem would be cases of words being mapped to or from words belonging to different parts of speech, such as ”personne” in French, which  maps onto two completely different words in Spanish: ”persona” and “nadie”,  a noun and an adjective respectively, with completely different connections and different supporting nets. So even when it seems that most relations might be surjective, the general case is certainly not bijective, and that applies to homonyms too, which often creates ambiguities in translation. However, the supporting network  would be able to uncover this fact and solve a possible ambiguity based on  context by extending the word network to encompass the ambiguity. In other words, if a subnet cannot be uniquely mapped, extending it should eventually solve the ambiguity. What one would need  is a corpus big enough to build such a network once and for all and then simply make comparisons at the network level. This could work even for completely new or unknown languages, either dead, living or artificial, assuming that they share a part of our actual reality and hence some part of our mental representations  (In a sense this is what Champollion did when he deciphered the Rosetta stone– he discovered a partial mapping of a subnetwork of words from an unknown language – Egyptian – to a subnetwork of a known one – Greek). In the final analysis, each language has a single unique network (changing slightly through time but remaining well connected and strong enough to make  it unique and recognizable while being isomorphic with that of any other language).  All languages could be identified by their fingerprints -their word net. This kind of analysis would identify the lack of a word net structure in hoax languages, such as perhaps the Voynich manuscript.

    Having  established that, what about mining the world of all possible meanings, the world of all possible translations, and the world of all possible ideas? We wouldn’t have the problem of distinguishing between a coherent idea and a non-coherent one since the network would provide some minimal coherence. Thus the net-into-the -net approach would give us a way of translating from word to word and from phrase to phrase and from idea to idea as faithfully as possible in most cases, since in the end all of us as human beings share a single reality, though we perhaps approach it from different points of view.

    Again, the analytical knowledge in a dictionary comes from the net connecting the words, so even if someone does not know English at all, I would say that he would be able, albeit with considerable difficulty, to learn English just by deducing the net connecting objects, in other words, by mapping his own mental  representations of objects onto words in the  English dictionary. In the process he could encounter some ambiguities, but the further he goes, the more of these he would be able to resolve. On the other hand,  speakers of those languages in which “chair” does not exist, both in the language itself and as a real object in the culture,   would be able to deduce what  a chair is by tracking its  relations with the objects they know and for which they do have mental representations and the phonemes to externalize them. So the problem of translation, which began with the mapping of word onto word and then phrase onto phrase  with statistical tools,  becomes, with this approach, a matter of  mapping net to net.  Indeed this seems to be the approach adopted  by Meaningful Machines.  

    These ideas could be carried to the limit by taking the sum total of human languages and enquiring into the mapping between such a network and our cognitive representations. Such a move would provide grounds  for rebutting the Chinese room argument, since in the end it does not matter whether someone inside the room has no knowledge at all of a language; he would be able to map what he is mechanically translating onto his own mental representations, generating what, according to the argument, could not be generated: understanding. Because Searle’s  idea was, as I recall, to build up a case against A.I. in terms of the meaningless of the Turing test and true AI in general.

    One may actually use a dictionary without knowing a single word in it! That is because there is a mapping between the word net in the dictionary and one’s own language, or even better, a mapping (not necessarily injective or surjective) between the word net of the dictionary and your cognitive personal word net.

    Oversimplifying, translation might be reduced to the search for the homomorphism between algebraic groups, with each group G={W,d} being a language dictionary, with W the set of words in that language and d the closed operation “definition of”. One can then see each definition as an oversimplification of a directed, ordered graph g={w,s}, with w the set of vertex words involved in the definition and s the ordered (since definitions may be not commutative) set of edges for an ideally formal- enough language dictionary.

    Bad practices of cyclic definitions in a dictionary should rather be expressed as the practice of period 1 cycles, i.e. words that have in their  immediate definition the word itself.

    This post is  related to a previous post titled “Meaning against A.I.

    Is the Universe a Computer? (Ist das Universum ein Computer?) Conference, Berlin, Germany, 6,7 November 2006

    Posted in Computability, Universality and Unsolvability, Conferences, Minds and Machines, New Ideas on November 12th, 2006 by Hector Zenil – 2 Comments


    Ist das Universum ein Computer?

    Germany, November 2006, Informatik Jahr
    Deutschen Technikmuseum Berlin
    From Konrad Zuse’s Invention of the Computer to his “Calculating Space” to Quantum Computing.

    Lesson One: For someone with a hammer in his hand the world seems to be a  nail. Joseph Weizenbaun.

    Lesson Two: Knowing the input and the transition function of a Turing machine we know everything about it. Marvin Minsky.

    Zuse's drawing 2
    Dr. Zuse’s futuristic drawing

    - The first talk entitled  ”What Can We Calculate With?” by Prof. Dr. Bernd Mahr from the Technische Universitat of Berlin was a very good introduction to the standard theory of computation based on Turing’s model and classical mathematical logic. His remarks on the time when computing arose from math because the Greeks discovered they were unable to compute the square root of 2 were interesting. He pointed out some evident but not always explicit facts: Calculation has a subject (the individual who calculates), an object (what is calculated),  a medium (how it is calculated), and a symbolic representation (the language -binary, for instance). His use of the Leibniz medallion for explaining   starting points, ending points and transitions in a calculation was elementary but interesting (transitions: intermediate calculations). Further explanations of reversibility and non-linearity using transition nets were also illuminating. The context of a calculation (or computation) and the strong relation between the computation itself and its context is  such that it is sometimes difficult to distinguish them. Since any process can be seen as an object in itself, the context can become the calculation and the context of the context too. In some way, as we know, the concept of calculation is a constraint of a part of a calculation, and then it is defined in terms of an input, an ouput and a transition. He pointed out too that behind many of these complicated systems there is a Turing machine. It  is no longer visible from the top, but it is there.

    Konrad Zuse
    Dr. Konrad Zuse

    - Dr. Horst Zuse’s  talk titled “Konrad Zuse’s Calculating Space (Der Rechnende Raum)”:
    Dr. Konrad Zuse’s son, Dr. Horst Zuse made some interesting remarks about his father’s book “Calcualting Space”,  in which  Dr. Zuse proposes studying  the universe as a digital system, specifically a cellular automaton. Dr. Horst Zuse is a professor at the Technische Universitat of Berlin and his personal webpage can be found at: and

    Zuse's son
    Dr. Konrad Zuse’s son, Dr. Horst Zuse

    Dr. Zuse’s father’s main question was: “Is Nature Digital, Analog or Hybrid?” It seems that he tended to answer “Digital,” proposing a No-Yes value language (binary). His thoughts were published in the Nova Acta Leopoldina. He evidently did acknowledge that there could be problems attempting to reconcile an explanation of the Universe in terms of Cellular Automata with Quantum Mechanics and General Relativity.

    According to Konrad Zuse, laws of physics could be explained in terms of laws of switches (in the computational sense). Physical laws are computing approximations captured by the formulas in our models. He saw that differential equations could be solved by digital (hence discrete) systems.

    Dr. Petri talk
    Dr. Carl Adam Petri at the Berlin conference

    - Dr. Carl Adam Petri (yes, the creator of the Petri nets!) on “Rechnender Netzraum” or “Computing Net Universe”:
    According to Dr. Petri, at the Planck length quantum computing can be described by digital systems using combinatorial models (net models, kennings, combinat), and therefore the universe can be studied using discrete nets which are even capable of explaining quantum and relativistic fundamentals like Bell’s theorem and Heisenberg’s uncertainty principle. That would mean that discrete systems (for instance those proposed by Stephen Wolfram) would suffice to explain even quantum and relativistic phenomena.

    According to Petri, measurement is equivalent to counting. For instance in S.I. one second is 9192631770 Cs periods. In fact Norbert Wiener proposed some axiomatics of measurement.

    The correspondence of Petri nets with the Heisenberg uncertainty principle arises from the limitations of our observational capacities when carrying out measurements. When two different types of observations are performed, -for example momentum p and position q- we can only see p or q in a chain of succesive events related by a causality net. His nets as well as his explanations of such phenomena are very neat and elegant. The relevant slides  on causality and linear logic may be found  at:

    He also distributed a CD with his slide presentation at the conference.

    For Petri, the Universe is a Petri Net.

    Dr. Seth Lloyd’s presentation at the conference in Berlin, Germany

    - Dr. Seth Lloyd’s talk entitled “The Universe is a Quantum Computer”:
    Some researchers think of information as more fundamental than physics itself. And it is a common practice in science to find all the time more fundamental structures on which previous ones were lying on. Some others such as Seth Lloyd and David Deutsch, however, strongly stand in favor of putting a quantum reality at the lowest level of reality description as they stand for a physical universe where information only makes sense if it is carried and represented by a physical entity. So these authors have developed world theories based in the concept of quantum computation.

    A quantum computer differs from a Turing machine in that its bits are quantum bits, and so can exist in a superposition. In addition, it can be instructed to put those bits in superpositions. A universal Turing machine can simulate any other Turing machine in polynomial time but while what a quantum computer does is still Turing computable, a typical quantum computation of T steps on N qubits requires $O(2^N)$ bits on a classical Turing machine, and $O(T 2^2N)$ logical operations. That is, it takes a Turing machine exponential amounts of time and space to simulate a quantum computer or quantum Turing machine.

    The standard quantum computer as shaped by, among others, Feynman and Deutsch differs from the classical computer in the concept of the qubit. While the concept of qubit takes advantage of a clear asset that the quantum world provides, entanglement, it does not necessarily makes use of the full properties of quantum mechanics as interpreted under the Copenhagen interpretation. The convenient definition of a qubit makes a quantum computer not to compute more but much faster than classical digital computers.

    Lloyd’s conclusion is that the universe is not only a computer but a quantum computer. However some questions arise:

    1. What exactly is a quantum computer? Does the quantum computer definition actually captures all quantum phenomena? According to the standard model quantum computing is Turing computable (disregarding run time). If Lloyd is assuming the standard model then the universe is indeed a quantum computer, but even more remarkably (since we have scientific and philosophical hypotheses like the Turing thesis) it is Turing computable. However, if he is assuming the more general quantum mechanics model, let’s say the standard model in physics (which basically assumes the possibility of harmless continuity rather than inquiring into it) he is saying that the universe is not a computer (since the term derives  from what we mean by Turing computable and hence covers  digital computers too). So the assumptions made are significant and cannot be glossed over if one wishes  to argue convincingly that  the universe is  a computer in some standard sense. If what we  assume to be computing is something that seems to be deterministic or rule-based, the concept becomes fuzzy and additional remarks need to be made.

    2. What if a quantum particle encodes more information than just a 1 or 0 for the spin or any other quantum property? Let’s say a third value, or even worse, a non-computable number of values. In quantum mechanics for example, the superposition of a particle assumes an infinite and non-countable number of places since it is in all the spectra at the same time. If space/time is a continuum then it is evidently in a non -countable number of positions, which leaves us with  a non-computable model, or at least with something that’s neither a Turing-computable model nor a standard quantum-computable model. And this is not a simple assumption since it requires another Turing-type thesis which in the final analysis does not answer the most fundamental  question, i.e. whether the universe is a computer or not and if it is, what kind of computer (in the computational power sense) it is.

    I raised these questions with Seth Lloyd and I will be posting his answers soon.

    Seth Lloyd lecture at Berlin
    Seth Lloyd at the conference in Berlin

    An interesting concept mentioned by Seth Lloyd is “hacking the universe”. As Charles Bennett used to say, a computer is a handicapped quantum computer. So if Lloyd is right, a computer is not only a handicapped quantum computer but it is not taking advantage of the full computational power of the universe and it is just patching the universe instead of hacking it, as it would be in its power to do. By contrast, a quantum computer uses some particles that are already computing “something” (nothing less and nothing more than the processes in the universe ) to perform the computation that we want it to perform. It can be said to be  ”hacking the universe” in Lloyd’s terms.

    On the other hand, if the notion of programmer monkeys is valid it should be possible to test it experimentally. Under the supervision of M. Jean-Paul Delahaye, computer science professor at the University of Lille I ( we are undertaking this task. We are exploring  Lloyd’s quantum computational universe (or at least a handicapped but representative part, the recursive computational universe), applying some complexity measures (universal distribution, average-case complexity or Levin’s measure) in order to uncover the monkeys behind the Universe, or in other terms, to analyze the average distribution of randomly discrete systems with random inputs.

    Is Seth Lloyd falling into the carpenter’s problem of thinking that the universe is a nail and the moon made of wood?  Is it because he is a quantum computer scientist that he thinks the universe is a quantum computer? He argues of course that the charge is unfair, but then we have been told by Dr Petri  that the Universe is in fact a Petri Net which probably  needs neither strong randomness nor quantum mechanics!

    Here  is a video online in which he explains much of this:

    Zuse's drawing
    Dr. Zuse’s futuristic drawing 2

    - Jurgen Schmidhuber reprised his algorithmic approach to the theory of everything in his talk entitled   “The program that computes all computable universes”.
    Jurgen Schmidhuber’s major contribution probably is his Speed Prior concept, a complexity measure similar to Algorithmic Information Complexity, except that it is based on computation speed and not program length. i.e. the fastest way of describing objects rather than the shortest.
    There is more information on his website: (where he includes an unfavorable review of  NKS) and in his slide presentation on the Speed Prior at:
    Of course Schmidhuber himself has identified a problem with the Prior measure: If every possible future exists, how can we predict anything?

    Other interesting talks on philosophical issues: If the Universe is a computer, therefore the human mind should be a computer too.
    Is “the Universe is a  computer” a metaphor?
    My answer: The metaphor is “The Universe is not a Computer”

    Lesson Three: Metaphors can be reversed.

    Kovas Boguta
    Kovas Boguta’s  talk was titled ”Is the Computer a Universe?” In it he pointed out the richness of mining the computational universe of simple programs.

    Because we were together during the gala dinner I had an interesting exchange with Dr. Konrad Zuse’s son, Dr. Horst Zuse (Also at our table were the Technikmuseum director Dr. Dirk Bondel and  my colleague Kovas Boguta from Wolfram Research, among others). He shed some light on his father’s interactions with Alan Turing ( none apparently),  with von Neumann (some interaction regarding the controversy over who first built a digital computer and concerning von Neumann’s architecture, which  our current digital computers do not conform to, the ALU being separated from the memory as it is in Zuse’s conception but not in  von Neumann’s original design).

    Zuse’s Z1 first computer “the input device, something equivalent to the keyboard, at the Technikmuseum in Berlin”

    Human Readable Proofs Visualization

    Posted in Foundations of Math, Natural Language and Computational Linguistics, New Ideas on November 12th, 2006 by Hector Zenil – Be the first to comment

    - Symbolic Visualizations, University of Texas: Proof nets and zero-knowledge proofs.

    Kurt Godel: The writings. Université de Lille III

    Posted in Computability, Universality and Unsolvability, Conferences, Foundations of Math, Minds and Machines, New Ideas on November 12th, 2006 by Hector Zenil – Be the first to comment

    Kurt Godel workshop for studying his legacy and writings. Lille, France, May 19-21, 2006

    My thoughts, ideas, references, comments and informal notes:

    - The wheel machine, a machine for real computation which I am proposing -as a thought experiment- in a forthcoming paper  on the Church-Turing thesis -Yes, one more paper on the CT thesis!- with comments on Wilfried Sieg’s paper entitled “Church Without Dogma: Axioms for Computability”

    - “In case Cantor’s continuum problem should turn out to be undecidable from the accepted axioms of set theory, the question of its truth would loose its meaning, exactly as the question of the truth of Euclid’s fifth postulate in Euclidian geometry did”. Godel replies: “It has meaning anyway, as Euclid’s fifth postulate gave rise to other now accepted mathematical fields.”

    - Godel Gibbs Lecture and his dicotomy on absolutely undecidable propositions and the computational power of the human mind (Turing did great work… but he was wrong when he proposed his formal theory as a model of human thought…)

    - New contacts and references: Olivier Souan, Rudy Rucker, Karl Svozil

    Mark van Atten’s “On Godel’s awareness of Skolem’s lecture”.
    Rick Tieszen

    - Herbrand on general recursive functions, letter to Godel.

    - Leibniz’ influence on Godel’s arithmetization?

    - Sources: Godel Editorial Project. Firestone Library, Princeton University. I.A.S. Marcia Tucker, librarian for Godel papers.

    - Godel’s concept of finite procedure as the most satisfactory definition of computation. “A machine with a finite number of parts as Turing did” or “finite combinatorial procedure” as a definition of an algorithm, mechanical or computational procedure.

    - Computation’s main constraints: boundness and locality (paper from Hernandez-Quiroz and Raymundo Morado).

    - Aphorisms and autoreference (Gabriel Sandu and Hinttika)

    - Feferman on Turing

    - Is Sieg’s paper and the question of “finite machine=effective procedure” a tautology? In fact such an approach seems to be one of the most strict versions of the Turing Thesis, and even though both Church and Turing probably did propose it in such a strict sense, extensive versions of the thesis have traditionaly covered more content, but even when it is strictly stated that there is still space for a thesis, it is neither proved nor provable from my point of view, and most authors would concur, though some clearly would not. I will comment on this more extensively later, since this was one of my Master’s topics and merits a post by itself.

    - Putnam’s thought experiment on cutting all sensorial inputs. Solution: It is impossible in practice. However, machines are an example in a sense, and that is why we do not recognize intelligence in them – they are deprived of  sensorial capabilities.

    Yes, Godel found an inconsistency in the U.S. constitution. My answer: One? Certainly a bunch. That’s why we need lawyers, who make them even worse.

    NKS Cooking

    Posted in New Ideas, Recreation on July 31st, 2006 by Hector Zenil – Be the first to comment

    A novel approach to cooking

    The algorithm:

    1. Consider the space of all possible recipes.
    2. Enumerate the ingredients per food style (Mexican, Chinese, French, Italian, and so on).
    3. Take the subsets of the ingredients per food-style set.
    4. Begin with a random choice of initial ingredients.
    5. Mix them by following a set of simple cooking rules like baking, sprinkling or heating.
    6. Season with  salt and pepper to taste.

    One could use a robotic device and see what results. One would also have to set up an evaluation scheme to decide which dishes are good and which are bad– either a human tester or a reliable sensor such as a modern “artificial nose” (essentially a microarray-like object with a number of different receptors).

    However, since the testing procedure is undecidable in general, in the case of any given recipe, neither a human nor an automatic device can decide whether or not the end result will taste good.

    It would also be of interest to find out which “cooking rules” lead to which patterns of smell and taste. For instance,  one would expect  any dish involving sugar, butter, and flour  to taste reasonably good provided the rules do not yield a “burnt offering”.  Of course, to make things really interesting, one should introduce additional ingredients, including ingredients  which aren’t traditionally combined with the basic ones.

    For French cuisine, butter would be a primitive, for American food, ketchup, for  Mexican, chili, for Chinese,  rice,  for Italian, tomato. French and Italian food share  cheese as a primitive; cooking “au gratin” would always be a reasonably safe bet.  Since primitives are important, you cannot cook pancakes au gratin, and ketchup would be safer if you wanted your dish to meet with the approval of the American tester.

    An ingredient is a primitive in the sense that any dish in a given style can use either more or less of it and still retain  its character as an exemplar of that style.  Primitives can be used at will,  including in emergencies or when the rules happen to yield nasty results.

    One might investigate reducible shortcuts, such as preparing something faster or skipping steps. However, most of the recipes are irreducible, making it impossible to take shortcuts without going through the whole preparation.

    With some effort one can also prove that there exists a universal algorithm which,  given a recipe and the initial ingredients,  is able to reproduce any other recipe.

    Concerning time/space complexity, it remains an open question whether the preparation time can always be reduced to a polynomial.

    One can also ask about the algorithmic complexity of a dish, since one can measure its information content–the number of ingredients involved, the number of rules and steps applied, whether the rules were iterated, nested or random- looking.  However, since, as noted above, the procedure is undecidable, the only known way to approach the question of the complexity of a dish is to compress it into a piece of universal tupper-ware and see what happens.

    Another open question: Given a compressed ingredient, how many other ingredients can one compress? It is believed that this yields  a classification of  food types, defining a threshold between those in a lower/trivial class and  more sophisticated items.