Archive for the 'Computability, Universality and Unsolvability' Category

Algorithmicity and programmability in natural computing with the Game of Life as in silico case study

Thursday, September 4th, 2014

In a previous article, I suggested a method for testing the algorithmicity of a natural/physical process using the concept of Levin’s universal distribution. In this new paper published by the Journal of Experimental & Theoretical Artificial Intelligence, I explain this method in the context of the problem formulated by Floridi concerning the testability of pancomputationalism. I then introduce a behavioural battery of programmability tests for natural computation, as an example of the computational philosophy approach: tackling a simplified version of a complex philosophical question with a computer experiment. I go on to demonstrate the application of this novel approach in a case study featuring Conway’s Game of Life. I also briefly discuss another question raised by Floridi, concerning a grand unified theory of information, which I think is deeply connected to the grand unification of physics. You can access the paper for free here.

Calculating a Universal Distribution to Approximate Kolmogorov-Chaitin Complexity

Wednesday, December 12th, 2012

Computing the incomputable has always been a challenge: for example, finding the busiest Turing machines (Rado) of a given number of symbols and states (whimsically called busy beavers). This means finding either the Turing machines that, starting from an empty input, produce more non-blank symbols on their output tapes before halting than any other Turing machine of the same size, or the Turing machines that, also starting from an empty input, run for more steps before halting than any other Turing machine of the same size. Both problems are ultimately undecidable because of the Turing-complete capabilities of Turing machines, as proven by Alan Turing himself (that is, the capability of some Turing machines to simulate any other Turing machine).

In this new paper we describe how we have managed to calculate an approximation of a so-called Universal Distribution (aka Levin’s semi-measure), which connects the frequency of production of a string to its Kolmogorov complexity (K). The chief advantage of calculating an approximation of the Universal Distribution is that it is an incremental process over an average of a large number of Turing machines. One doesn’t get rid of the constant from the invariance theorem of algorithmic information theory (which appears, for example, when Kolmogorov complexity is measured using 2 different universal Turing machines), yet one seems to have to make fewer arbitrary decisions.
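To give a concrete feel for the procedure, here is a minimal sketch of the idea in Python, restricted to the tiny (2,2) space, where the known Busy Beaver step bound S(2) = 6 makes halting decidable in practice. The busy-beaver-style machine formalism, the convention that the output is the visited tape segment, and the 7-step cutoff are simplifying assumptions for illustration only; the calculations reported in the paper use much larger spaces and far more careful machinery.

```python
import itertools
import math
from collections import Counter

def enumerate_tms(n):
    """All (n, 2)-Turing machines in a busy-beaver-style formalism: each
    (state, read_symbol) entry is either (write, move, next_state) or a
    halting entry (write, 0, 'H') -- 4n + 2 options per entry, hence
    (4n + 2)^(2n) machines in total (10,000 for n = 2)."""
    options = [(w, m, s) for w in (0, 1) for m in (-1, 1)
               for s in range(1, n + 1)]
    options += [(w, 0, 'H') for w in (0, 1)]
    keys = [(s, r) for s in range(1, n + 1) for r in (0, 1)]
    for table in itertools.product(options, repeat=len(keys)):
        yield dict(zip(keys, table))

def run(tm, max_steps):
    """Run from a blank tape (all 0s); if the machine halts within
    max_steps, return the visited tape segment as the output string
    (one convention among several), else None."""
    tape, pos, state, lo, hi = {}, 0, 1, 0, 0
    for _ in range(max_steps):
        write, move, nxt = tm[(state, tape.get(pos, 0))]
        tape[pos] = write
        if nxt == 'H':
            return ''.join(str(tape.get(i, 0)) for i in range(lo, hi + 1))
        pos += move
        lo, hi = min(lo, pos), max(hi, pos)
        state = nxt
    return None  # did not halt within the step bound

# S(2) = 6 is the known Busy Beaver step bound for (2,2) machines, so any
# machine still running after 7 steps will never halt.
counts = Counter(out for tm in enumerate_tms(2)
                 if (out := run(tm, 7)) is not None)
total = sum(counts.values())
D = {s: c / total for s, c in counts.items()}  # empirical output distribution
K = {s: -math.log2(p) for s, p in D.items()}   # CTM-style complexity estimate
```

Running this yields an empirical distribution D over output strings and an estimate K(s) = -log2(D(s)); short strings such as “0” and “1” come out with high frequency, i.e. low estimated complexity.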

One of the main advantages is that one can better deal with strings of very short lengths. Think about it! If one wished to approximate K for a single bit by using compression algorithms, a lossless compression algorithm would not be able to compress the single bit any further. And this happens not only for a single bit but for all strings below a certain minimal length, for which lossless compression algorithms are simply unsuitable (recall that a compression algorithm includes the decompression instructions together with the data in the new compressed object in order to make it self-decompressible).
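The point is easy to check with any off-the-shelf lossless compressor; here is a quick illustration using Python’s zlib (just an example compressor, not the one used in our papers):

```python
import zlib

# A single bit (one byte here) cannot be compressed any further: the
# compressor's own header and bookkeeping dominate, so the "compressed"
# object comes out longer than the original.
short = b"0"
compressed_short = zlib.compress(short, 9)

# A long but highly regular string, by contrast, compresses very well,
# giving a small upper bound on its Kolmogorov complexity.
long_regular = b"01" * 5000
compressed_long = zlib.compress(long_regular, 9)

print(len(short), len(compressed_short))
print(len(long_regular), len(compressed_long))
```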

The usefulness of lossless compression algorithms as a method for approximating K derives from the fact that compression is a sufficient test of non-randomness. This is because K is, more precisely than simply uncomputable, upper semi-computable, meaning that one can estimate upper bounds on it. The lossless compressed length of an object s (e.g. a string) is therefore an upper bound on K(s). The usefulness of the Coding Theorem Method (the method presented in this paper) will ultimately come down to whether it is useful in applications, which is the main motivation behind its conception, given the failure of compression algorithms to provide information about the possible K(s) values for s that are too short (shorter, say, than the typical length of the instructions that a lossless compression algorithm has to add to the compressed object).

The state of the art of the Coding Theorem Method can be gleaned from this paper, recently made public by my colleagues and me, announcing the release of the calculation of a universal distribution based on (5,2), that is, all Turing machines with 5 states and 2 symbols: Calculating Kolmogorov Complexity from the Output Frequency Distributions of Small Turing Machines

It represents a major improvement over a previous calculation of mine, requiring new and more powerful techniques to deal with a semi-computable distribution. It improves our previous work in terms of both accuracy and the number of short strings covered, and it validates previous calculations of universal distributions, showing the incremental nature of the method: the results are fully compatible with the universal distributions calculated earlier from smaller samples of small Turing machines (for which the known Busy Beaver values could be used).

In this other paper we explain why our approximations of K are real-number values, and show that they track strict integer-valued program size, so that ours constitutes a more fine-grained measure. It is also shown that logical depth departs from both strict program-size and Coding Theorem Method evaluations, in agreement with the theory for all 3 measures. The paper is available online at: Correspondence and Independence of Numerical Evaluations of Algorithmic Information Measures

In the next post I will announce and briefly explain the results from another paper showing not only that our Coding Theorem Method can deal with short strings, but that we have found a way to validate the method by lossless compressibility. Moreover, we have found that in the transition period, where the Coding Theorem Method starts to become too expensive to be of practical use while the compression method starts to provide some results, the 2 methods are in great agreement with each other. Like an expensive instrument of great power (e.g. the LHC at CERN), our Coding Theorem Method requires an incredible amount of calculation. The trick is to know when a microscope is required rather than a telescope. We believe we are providing the tools to deal with the information content of the smallest entities in the computational universe.

Hypercomputation in A Computable Universe

Friday, September 21st, 2012

This is the full version of my answer to a question put to me by Francisco A. Doria, included in the Discussion Section of A Computable Universe, my edited volume published by World Scientific and Imperial College Press, coming out next month (already available in Asia). The question was whether I think Hypercomputation is possible:

I was once myself a hypercomputation enthusiast (my Master’s thesis–in French–was on “Hyper calcul”, focused on the feasibility of hypercomputational models). Paradoxically, it wasn’t until I had a good knowledge of the subject that I started to better appreciate, and to increasingly enjoy, the beauty of the digital (Turing) model. For one thing, hypercomputational models do not converge in computational power: there is a plethora of possible models of hypercomputation, while there is only one digital model (in terms of computational power). I find it astonishing how little it takes to reach the full power of (Turing) universality, and I don’t see the power of universal machines as being subject to any limitation–quite the opposite. I happen to think the question has a truth value and is ultimately susceptible to a physical answer. However, I don’t think the answer will emerge in the foreseeable future, if it ever does.

Still, I found Doria’s contribution to the subject interesting, as he points out a particular (“ideal”) model of hypercomputation that I think gets around some of Martin Davis’ objections (The Myth of Hypercomputation), though it doesn’t fully address the problem of verification. Unlike a Turing computation, which can in principle be verified by carrying it out by hand, step by step, the inner workings of a hypercomputer can only be followed by another hypercomputer. The caricature version of the problem is the whimsical answer given by the (hyper?)computer Deep Thought in “The Hitchhiker’s Guide to the Galaxy” by Douglas Adams, which proposed “42” as the “Ultimate (uncomputable?) answer to the Ultimate Question of Life, The Universe, and Everything” [parentheses added]. The only way to verify such an answer would be to build another, more powerful and even less understandable computer. This makes me wonder whether we ought not to favour meaningful computation over what could potentially be hypercomputation, even if hypercomputation were possible.

There is a strong analogy to the concept of proof in math. Mathematical proofs seem to fall into two types: they either serve to convince us of the truth of a statement we weren’t certain was true, or else to provide logical evidence that a statement intuitively believed to be true is in fact so (e.g. the normality of the mathematical constant pi). But in the latter case, why would one bother to provide a proof of a statement that nobody would argue to be false? It is because, ideally, math proofs should provide insight into why a statement is true, and not simply establish whether or not it is so. There are a few exceptions of course. Some come from mathematical practice, for example Wiles’ proof of Fermat’s last theorem. It is not clear whether Wiles’ proof provides any insight into the original question of the truth value of Fermat’s theorem (though it does contribute to understanding the connections among different powerful mathematical theories). Some other cases, especially among computer-automated proofs, are of the same kind, often neglecting the fact that a proof is also about explanation (for humans). In that sense I think we should also favour meaningful mathematical proofs (of meaningful mathematical questions!), just as we should better appreciate meaningful (digital) computation.

The study of infinite objects has given us great insight into profound and legitimate questions of math and computation, such as the nature of what a number is or what a computation is. And it has been immensely useful to focus on the limits of computation in order to better understand it. It is at the boundary between decidability and undecidability that one seems best positioned to answer the question of what computation means. And there are cases where the study of a hypercomputational model is valuable not as a model of hypercomputation but for its ancillary results. Unlike most people, I think that, for example, Siegelmann’s ARNN model has much more to say about computational complexity (the ARNN model relativises P = NP and P != NP in a novel fashion), and therefore about classical computation, than about hypercomputation! (Siegelmann’s ARNN model has come to be strangely revered among hypercomputation enthusiasts.)

While I may agree that the problem of verification of a computation is not necessarily an argument against hypercomputation (because it can also be raised against digital computation in practice), the answer lies only in the realm of physics, not in paper models. As Davis points out, it is no surprise that encoding non-computable numbers as weights in an artificial neural network leads to non-computable computation! So, assuming real numbers exist, the consequence is straightforward, and the justification of the hypercomputational model is simply circular.

Other models of hypercomputation are just fundamentally wrong, such as Peter Wegner’s model of “interactive computation”, which is claimed to break Turing’s barrier. For a recent commentary pointing out the flaws of yet another hypercomputational claim, you can consult this entertaining blog post by Scott Aaronson about his Toaster-Enhanced Turing Machine.

Conjectures concerning Busy Beavers, Dynamic Behavior and Turing Universality

Friday, June 1st, 2012

In a recent paper I have advanced some conjectures using a coefficient that renders aspects of the qualitative behavior of complex systems in quantitative terms. It measures the sensitivity of a system to external stimuli, its apparent ability to (efficiently) transfer information from the input to the output.

In a previous paper, and in a quite different context, I proposed the conjecture that systems capable of (efficient) Turing universal computation have large coefficients (greater than 1), meaning that they are sensitive enough to external stimuli.

It is evident that a system that doesn’t react to its input is unlikely to be (efficiently) programmable, which is what Turing universality is about. Recall that Turing universality is the notion of a machine capable of simulating any other machine of the same type. More formally, if U is a universal Turing machine, one can provide U with an encoding ⟨M, s⟩ of any Turing machine M together with any input s, and U will then behave like M with input s.
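The definition can be illustrated with a toy simulator. In this sketch the machine M is passed to U as a lookup table rather than as a genuinely encoded string, and the conventions (symbols ‘0’/‘1’, start state ‘A’, halting pseudo-state ‘H’) are made up for the example, but the point survives: U, a single fixed program, behaves like any machine M on any input s.

```python
def U(M, s):
    """Toy 'universal' routine: given a machine description M and an input
    string s, behave exactly as M would on s.  M maps (state, symbol) to
    (write, move, next_state); next_state 'H' means halt.  The real
    definition feeds U a single encoded string <M, s> instead."""
    tape = {i: c for i, c in enumerate(s)}
    pos, state = 0, 'A'
    while state != 'H':
        write, move, state = M[(state, tape.get(pos, '0'))]
        tape[pos] = write
        pos += move
    lo, hi = min(tape), max(tape)
    return ''.join(tape.get(i, '0') for i in range(lo, hi + 1))

# A made-up unary successor machine: append a 1 to a block of 1s.
succ = {
    ('A', '1'): ('1', +1, 'A'),  # skip over the 1s
    ('A', '0'): ('1', 0, 'H'),   # write a final 1 and halt
}
```

For instance, U(succ, "111") returns "1111", exactly what succ itself computes on "111".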

In that paper I used this coefficient as a phase transition detector, and it so happened that the way in which it was defined made it eminently capable of classifying the behavior of cellular automata and other systems, even yielding Wolfram’s well-known classes of behavior. The paper is available online here.

Now I have advanced another set of conjectures, this time concerning a set of Turing machines defined by the way they behave: Busy Beaver Turing machines. These are machines that do a lot of work before halting. More formally, they print the most non-blank symbols or have the greatest runtime among machines of the same size (number of states and symbols). I have written a program showing all known and candidate Busy Beaver machines (in Mathematica, runnable on your browser by downloading the Mathematica CDF player):

Busy Beaver from the Wolfram Demonstrations Project by Hector Zenil

Let bb(n) be the Busy Beaver with n states (and 2 symbols). The conjectures say that:

1. (strong version): For all n > 2, bb(n) is capable of universal computation.
2. (sparse version): For some n, bb(n) is capable of universal computation.
3. (weak version): For all n > 2, bb(n) is capable of (weak) universal computation.
4. (weakest version): For some n, bb(n) is capable of (weak) universal computation.

They are based on the fact that Busy Beaver machines are complex, in Bennett’s sense. Bennett’s notion of “logical depth” captures the complexity of a string s in terms of the unfolding time of the shortest program (or set of nearly shortest programs) producing said string. The outputs of Busy Beaver Turing machines are therefore, by definition, the ones with the greatest logical depth. If they print more symbols than any other machine and take the greatest time to do so, they have maximal logical depth. Busy Beaver machines also produce their outputs with the minimum resources (if there were shorter machines producing the same outputs, those would be the Busy Beaver machines), hence they are the shortest programs producing their strings. Busy Beaver machines are therefore by definition very deep, meaning they are complex in Bennett’s sense (logical depth is also profoundly related to Kolmogorov complexity, given that both definitions involve the shortest programs).

Besides being deep, Busy Beavers have another characteristic: they halt. So they don’t just do their stuff without worrying about whether to stop. One can easily encode a “Busy Beaver” that loops and produces any arbitrary number of non-blank symbols (say, for example, an infinite loop printing 1s), but Busy Beavers don’t. They save a state to halt at the end. Machines that are not able to halt (the majority) cannot be Turing universal, because they would never produce any output, simply because, by definition, something is an output only if the machine has halted. But machines that always halt would be decidable, and therefore not universal either.

So among the things one might attempt in order to amass evidence in favor of the conjectures of Turing universality for Busy Beaver machines is to encode some input on which they don’t stop (and to prove as much), as a first step towards a proof of universality. If Busy Beavers are capable of halting on some inputs (e.g. the empty input, as entailed by the definition of a Busy Beaver) but also capable of not halting on others, this would strengthen the conjectures. Busy Beavers may, however, be very different among themselves, as they are defined by their behavior rather than by their specific inner workings; but if the behavior of a Busy Beaver in any way determines something that captures a property of all of them, one could use the same argument for or against all of them towards proving or disproving these conjectures.

There seems no easy way to prove that all Busy Beavers are capable of universal computation, but such a proof, if it exists, would be the first to characterize a non-trivial set of well defined universal Turing machines. Non-trivial, because of course one can build a universal Turing machine and keep adding resources and proving that the resulting machines are all universal, just as happens with Turing-complete programming languages. But Busy Beaver machines would be characterized by nothing but a qualitative behavior. This makes the assertions of the conjectures more interesting.

It could be the case that Busy Beavers are not capable of Turing universality starting from blank tapes. In fact, Pascal Michel, a world expert in the field of Busy Beavers, thinks that Busy Beavers may be capable of –weak– universal computation (personal communication), that is, starting from non-blank tape configurations (e.g. a periodic or bi-periodic tape configuration). This connects with another famous conjecture, the Collatz conjecture. The conjecture says that whichever number n you start with, if n is even you divide it by 2 to get n / 2, and if n is odd you multiply it by 3 and add 1 to obtain 3n + 1; you will always end up reaching 1. I have also written 2 small computer programs (also in Mathematica) illustrating this conjecture:

Collatz Sequence Paths from the Wolfram Demonstrations Project by Hector Zenil

Collatz Sequence Maximums from the Wolfram Demonstrations Project by Hector Zenil
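The iteration described above takes only a few lines in any language; here is a Python version (the Mathematica demonstrations linked above plot the paths and maximums of exactly these trajectories):

```python
def collatz_trajectory(n):
    """Follow the Collatz iteration from n down to 1: halve n if it is
    even, map n to 3n + 1 if it is odd."""
    path = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        path.append(n)
    return path
```

Starting from 27, for instance, the trajectory climbs as high as 9232 before finally falling back to 1.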

According to Michel, Busy Beavers would be related to Collatz-type functions (see his paper “Small Turing machines and generalized busy beaver competition“), with conjectures that are related to the ones I’m discussing here, but more general (formulated in the context of small Turing machines).

Today, we know that Turing universality requires very little. For example, Wolfram conjectured that a particular 2,3 Turing machine was universal on the basis of the aspects of its behavior that he studied, and as proven by Alex Smith, this 2,3 TM does turn out to be capable of universal computation. The machine is surprisingly small, and indeed is the smallest it can possibly be, given that we know that no 2,2 Turing machine can be capable of Turing universality (a proof recently provided by Maurice Margenstern in a Complex Systems paper). Today we also know many minimalistic systems capable of universal computation: Conway’s Game of Life, Wolfram’s Rule 110, many Post tag systems and a variation called cyclic tag systems, and Wang tiles. And at least 2 important researchers in the field, Martin Davis and Maurice Margenstern, have recently argued for the ubiquity of Turing universality, along the lines anticipated by Wolfram’s Principle of Computational Equivalence.

Finally, I advanced a last conjecture relating Busy Beavers and my coefficient C in this paper, establishing that:

C(bb(n)) > 0

That is, Busy Beavers have a positive coefficient C, i.e. they are quite sensitive to input stimuli. If the very first conjecture is true (that Turing universal machines have large coefficients), this makes the coefficient a good place to start in making the case for the universality of Busy Beavers.

The paper “On the Dynamic Qualitative Behavior of Universal Computation” is available at ArXiv and has recently been published in the Journal of Complex Systems (vol. 20-3), also available online here.

Turing’s Deep Field: Visualizing the Computational Universe

Thursday, January 5th, 2012

I generated this image in the course of an investigation of the distribution of runtimes of programs in relation to the lengths of mathematical proofs, the results of which are published in my paper “Computer Runtimes and the Length of Proofs with an Algorithmic Probabilistic Application to Optimal Waiting Times in Automatic Theorem Proving”, in Volume 7160 of the Lecture Notes in Computer Science series (LNCS), a festschrift for Cristian Calude.

The preprint is now available online in the ArXiv here. The paper shows that, as theoretically predicted, most machines either stop quickly or never halt. It also suggests how a theoretical result for halting times may be used to predict the number of random theorems (dis)proven in a random axiom system (see the section I’ve called “Gödel meets Turing in the computational universe”).

Turing View of The Computational Universe

The plot was the winning image in this year’s Kroto Institute Image Competition, in the Computational Imagery category, titled “Runtime Space in a Peano Curve”. It shows the calculation of the distribution of runtimes from simulating (4n+2)^(2n) = 10000 Turing machines with 2 symbols and n=2 states (out of a total of more than 10^12 simulated Turing machines with up to n=4 states), following a quasi-lexicographical order in a Peano curve preserving–as far as possible–the distance between 2 machines arranged in a 2-dimensional array from a 1-dimensional enumeration of Turing machines.
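For readers curious about the arrangement, here is a sketch of the kind of 1-D-to-2-D mapping involved. It uses the Hilbert curve rather than the Peano curve of the actual image (the two are close cousins, and the standard Hilbert index-to-coordinate conversion is shorter to write down), but it illustrates the key property being exploited: machines that are neighbours in the 1-dimensional enumeration land in neighbouring cells of the 2-dimensional array.

```python
def d2xy(n, d):
    """Convert an index d along the Hilbert curve filling an n-by-n grid
    (n a power of 2) into (x, y) coordinates, using the standard
    quadrant rotate-and-flip construction."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:              # rotate/flip the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y
```

Consecutive indices always map to adjacent cells, which is exactly the locality a 2-D rendering of a 1-D enumeration needs.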

In the image each point or cluster of points represents a Turing machine or a group of Turing machines, and the color is determined by a spectrum encoding their halting runtimes–the lighter the square, the sooner the machine entered the halting state. White cells represent machines that are proven never to halt. Red cells show the Turing machines that take longest to halt (popularly called Busy Beavers). Knowing the values of the Busy Beaver functions allows us to identify the machines that never halt (depicted in white). The image is noncomputable, meaning that the process cannot be arbitrarily extended, because of the undecidability of the halting problem (i.e. there is no procedure for ascertaining the color of the following pixels in order to zoom out the picture and cover a larger fraction of the computational universe). To put it in the words of crystilogic,

What you’re looking at is a subset of descriptions of all possible things, and some impossible things. This is possibility and reality compressed into an image.

Turing machines with an arbitrary number of states can encode any possible mathematical problem and are therefore perfect compressors of the known, the yet to be known, and even the unknowable (due to the undecidability of the halting problem).

This is how the image looks as displayed in the stairway of the Kroto Research Institute in the UK:

Some postcards with the winning image were printed, and they can be sent to scholars or enthusiasts upon request; just send an email to hectorz[at]

I want to dedicate this prize to my former thesis advisor Jean-Paul Delahaye, who suggested to me the Peano arrangement as a packing for my visual results of halting times, and also to Cristian Calude, who co-advised my PhD thesis and encouraged me to publish this paper–and what better place to do so than in his festschrift?

I’m releasing the images under an open licence in honour of the Turing Year, so that they may be used for any artistic illustration of some of Turing’s main results (the halting problem) or for any other purpose in any medium.

Postcards of the winning image are also available upon request. Just send an email requesting 1 or more postcards to hectorz [at], or to let me know (if you can) that you will be using any of the images (or if you need higher resolution versions).

Compression-based Investigation of Cellular Automata, A Phase Transition Coefficient and a Conjecture Related to Universal Computation

Sunday, August 22nd, 2010

In my new article Compression-based investigation of the dynamical properties of cellular automata and other systems, published in the Journal of Complex Systems (19:1), pages 1-28, I present a method for studying the qualitative behavior of cellular automata and other abstract computing machines based on the approximation of their program-size complexity using a general lossless compression algorithm. I show that the compression-based approach classifies cellular automata (CA) into clusters according to their heuristic behavior, with these clusters showing a correspondence with Wolfram’s main classes of systemic behavior. I also present a Gray code-based numbering scheme for initial conditions optimal for this kind of investigation, and a compression based method for estimating a characteristic exponent in the form of a phase transition coefficient measuring the resiliency or sensitivity of a system to its initial conditions. And I conjecture that universal systems have large transition coefficients.
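A minimal version of the compression-based idea can be sketched in a few lines of Python. The compressor (zlib here), the evolution length, and the width are arbitrary choices for illustration; the paper’s setup is more careful.

```python
import zlib

def eca_evolution(rule, steps=100, width=201):
    """Evolve an elementary CA from a single black cell (periodic
    boundary); return the list of rows, each a list of 0/1 cells."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        row = [(rule >> (4 * row[i - 1] + 2 * row[i]
                         + row[(i + 1) % width])) & 1
               for i in range(width)]
        rows.append(row)
    return rows

def compressed_size(rule):
    """Length of the zlib-compressed evolution: a rough stand-in for the
    program-size complexity of the rule's behavior."""
    data = bytes(c for row in eca_evolution(rule) for c in row)
    return len(zlib.compress(data, 9))

# Simple rules compress well; random-looking rules don't.
sizes = {r: compressed_size(r) for r in (0, 4, 90, 30, 110)}
```

Class 1 and 2 rules such as 0 and 4 yield tiny compressed evolutions, while rule 30’s random-looking evolution barely compresses at all, reproducing the intuition behind Wolfram’s classes.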

I think this constitutes a novel framework for investigating the dynamical properties of cellular automata and other systems. Here I will discuss some of the main ideas and implications of the paper. A pdf version of the paper is available online on ArXiv.

Algorithmic complexity: classification into Wolfram’s four classes

In A New Kind of Science and in several papers dating from the mid-1980s, Stephen Wolfram defined four classes into which cellular automata (CA) and several other systems evolve, each capturing a different qualitative behavior which evolves from the same ordered initial configuration (the simplest black cell).

  • Class 1 consists of CA whose dynamics reach a steady state regardless of the initial conditions.
  • Class 2 consists of CA whose long-term evolution produces periodic or nested structures.

These first two classes are simple, in the sense that their long-term evolution can be deduced from running the system for a small number of steps.

  • Class 3 CA produce structures that seem random.
  • Class 4 CA produce localized structures on a random looking background, and hence are complex looking.

Wolfram’s classification is heuristic, and the assignment of CA to the four classes is somewhat subjective. To the best of my knowledge, there is, to date, no universally agreed upon classification scheme for CA. I think, however, that there is no better approach than a pure program-size complexity measure, which is the approach I follow in this paper. I propose to continue the investigation later, using the same measure to discover other interesting properties and possible hierarchies.

An interesting question related to Wolfram’s classification concerns its dependence on the initial condition–chiefly because the classification was originally meant to be constructed by visual inspection of the evolution of a CA, and as we know, the evolution of a CA depends on its initial condition. This has been a major critique (Eppstein) of Wolfram’s classification, because the expectation is that the classification should be based on the evolution from an unordered (random) configuration.

Nevertheless, the classification is actually based on the potential of a CA to evolve into any of the possible behaviors from at least one initial configuration (the question is of course not answerable in finite time, since there are infinitely many possible initial configurations). Wolfram’s classification may therefore be seen as being dependent on the initial condition of a CA.

It is not a surprise that one can, for example, construct a CA belonging to more than one of Wolfram’s four classes when starting from different initial configurations. Rule 110 belongs to class 4 because it is capable of universal computation–one can set up an initial configuration to ‘program’ rule 110 to carry out any computation (this being the very basic concept of a programmable computer).

For every CA rule there is a definite (but in general undecidable) answer to the question whether or not it is capable of universal computation (or in reachability terms, whether a CA will evolve into a certain configuration). The question only makes sense if the evolution of a CA depends on its initial configuration. No rule can be universal that fixes the initial configuration once and for all (there would be no way to input an instruction and carry out an arbitrary computation).

On the other hand, some rules, such as Rule 0, produce the same behavior regardless of the initial configuration. No matter how you change the initial condition, there is no way to make Rule 0 compute something other than what it computes for every other initial configuration.

A possible objection (made by David Eppstein) is that there are CAs that can be made to look as if they belonged to all classes by modifying their initial conditions. Which is true: a CA may belong to a certain class until, given another initial configuration, it is made to behave as if it belonged to another class.

My compression-based approach shows that Wolfram’s heuristic classification can actually be quantified by a measure which is clearly dependent on the initial conditions, while also being capable of detecting sensitivity to initial configurations, and hence of replacing the visual inspection. This hierarchical classification is well defined and is therefore a good candidate for a complexity measure.

The second part of my investigation actually takes advantage of the ability of CAs to behave differently in order to undertake a closer inspection and a novel classification, taking into account the average capability of a system to behave in different ways.

Differentiation from a priori approaches

The approach differs from others in being an a posteriori technique, unlike, for example, Langton’s lambda, a measure of the density of a CA rule. It is a posteriori because the compression-based approach requires the CA to evolve before saying anything about it, whereas Langton’s lambda is computed directly from the rules of the CA.

Langton’s lambda is simply the fraction of rules in which the new state of the cell is non-zero. For example, the rules of elementary cellular automaton number 1 in Wolfram’s enumeration scheme are simple: all 3-tuples are sent to 0 but one, the last one. Langton’s lambda is therefore 1/8.
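For elementary CA, lambda reduces to a bit count of the rule number; a minimal sketch (the two-state, radius-1 case only):

```python
def langton_lambda(rule):
    """Langton's lambda for an elementary CA rule (0-255): the fraction
    of the 8 neighbourhood patterns whose output cell is non-zero, i.e.
    the number of 1 bits in the rule number divided by 8."""
    return bin(rule & 0xFF).count('1') / 8
```

So rule 0 gives 0, rule 1 gives 1/8 as above, and rule 110 gives 5/8.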

The lambda parameter of a CA is a number between 0 and 1. For example, if lambda is 0 (e.g. for ECA rule number 0), the evolution develops into a trivial state. Langton found that CA rules with lambda close to zero evolve into trivial states and CA rules close to 1 evolve into random-looking behavior, with complex behavior somewhere in between. It is near this transition that the most interesting CAs will lie, the ones that manifest the most complex behavior.

Unfortunately, classifying CAs with lambda values is more complicated than that, as one quickly faces undecidability. If it were possible to decide once and for all whether a CA is complex by computing its lambda value, without having to run the CA for a single step, one could solve all kinds of undecidable questions simply by looking at a CA’s rules.

The critical value for lambda is not a universal constant. Nor, for that matter, is my phase transition coefficient. But the main difference, as noted before, is that the compression-based approach actually looks at the evolution of the system rather than trying to figure everything out from the description of the CA.

The compression-based method represents a formal approach to Wolfram’s classification process, replacing the need for a visual inspection with a technique and a coefficient to determine to what class a CA belongs. The approach is compatible with what Stephen Wolfram himself has proposed in his NKS book, without contradicting any computability result or Wolfram’s own Principle of Computational Irreducibility, which says that while some computations may admit shortcuts that allow them to be performed more rapidly, others don’t, so that one cannot really tell what the evolution of a CA will be, except by running it.

Initial configuration numbering scheme

Ideally, one should feed a system with a natural sequence of initial configurations of gradually increasing complexity. Doing so ensures that qualitative changes in the evolution of the system are not attributable to discontinuities in its set of initial conditions.

What I propose is an initial configuration numbering scheme in which two successive values differ by only one bit: a Gray-code enumeration. To explore the qualitative behavior of a cellular automaton starting from different initial configurations, this encoding is optimal because of its smoothness, avoiding any undesirable “jumps” attributable to the system’s having been fed discontinuous initial configurations: two consecutive initial conditions differ in the simplest possible degree (one bit). This convention allows us to investigate the dynamical behavior of a system without worrying about arbitrarily introducing artificial complexity when moving from one initial configuration to the next.
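The standard binary reflected Gray code gives exactly such an enumeration; a minimal sketch (function names are mine, not from the paper):

```python
def gray(n: int) -> int:
    """n-th value of the binary reflected Gray code."""
    return n ^ (n >> 1)

def initial_configs(count: int, width: int):
    """Yield initial configurations as bit lists, each differing from
    its predecessor in exactly one cell."""
    for n in range(count):
        g = gray(n)
        yield [(g >> i) & 1 for i in range(width)]

configs = list(initial_configs(8, 3))
# Successive configurations differ in exactly one bit:
for a, b in zip(configs, configs[1:]):
    assert sum(x != y for x, y in zip(a, b)) == 1
```

Enumerating by ordinary binary counting would violate the smoothness requirement: going from 7 (0111) to 8 (1000), for instance, flips four bits at once.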

Phase transition detection and coefficient

I defined a measure based on the change of the asymptotic direction of the size of the compressed evolutions of a system for different initial configurations (following the proposed Gray-code enumeration of initial configurations). My phase transition coefficient yields an interesting classification: it measures the resiliency or sensitivity of a system to its initial conditions. So rules such as 30 and 0 appear close to each other. Odd as this may seem, this is because both, when their initial conditions are changed, behave in more or less the same way. In other words, there is no change in the qualitative behavior of these CAs when feeding them with different inputs, regardless of how different the inputs may be.

In this phase transition classification, for example, rules such as 122 and 89 appear next to each other, because, as the investigation proves, they are both CAs with relatively high phase transition coefficients, meaning that they are very sensitive to initial conditions, dramatically changing their qualitative behavior when starting from one rather than another initial configuration.

Phase transition and predictability

An obvious feature of universal systems is that they need to be capable of carrying information by reflecting changes made to the input in the output. In attempting to determine whether a system is capable of reaching universal computation, one may ask whether a system is capable of this in the first place, and how efficiently it does so. And this is what the phase transition actually measures, because it tells how well a system manages to respond to an input. Obviously, a system such as rule 0 or rule 255, which does not change regardless of the input, is trivially decidable. But a universal system should be capable of some reaction to external manipulation (the input to the system). The inverse, however, should not hold, because having a large transition coefficient by no means implies that the system will behave with the freedom required of a universal system if it is to emulate any possible computation (a case in point may be rule 22, which, despite having the largest transition coefficient, may not be versatile enough for universal computation).

The phase transition measure also implies that one may be able to predict the behavior of a system for an initial configuration with a degree of certainty based on the previous variability of the system in relation to an initial segment of initial configurations. It is an open question whether this is a lower bound. In other words, it is unclear whether looking at the behavior of a system for a certain length of time and for certain configurations will tell you anything about its behavior for other initial configurations or if it were allowed to evolve for a longer period of time. Experience says one would do well to predict future behavior on the basis of past behavior, and this may also be related to Israeli and Goldenfeld‘s very interesting findings. In 2004 they showed that some computationally irreducible elementary cellular automata have properties that are predictable at a coarse-grained level. They did so following a renormalization group (RG) technique, which refers to a mathematical apparatus that allows one to investigate the changes in a physical system as one views it at different distance scales.

They were able to provide a hierarchy of complexity in agreement with their apparent complexity. Israeli and Goldenfeld’s classification is also an a posteriori investigation, but it too is bedeviled by the unavoidable (and ultimately undecidable) statistical question, namely, whether one can keep on predicting for all initial conditions and for any number of steps, without having to run the system forever and for all possible initial conditions. But unlike mine, their classification is partial, in the sense that one cannot always say whether the complexity of one CA is greater than that of another. Based on the comparison of their approximated compressed sizes, however, I could come up with a total order in full agreement with their apparent complexity as well as with Wolfram’s four classes (one that even yields the same number of classes).

An open question directed to me by Nigel Goldenfeld, in trying to understand the similarities and differences between their RG approach and my algorithmic one, concerns how their hierarchy relates to mine. What they experimentally suggest is that the larger the scale of the transformations used, the more highly structured the objects and therefore the greater their algorithmic complexity.

For example, rule 110 is one rule about which my own phase transition classification says that, despite showing some sensitivity, it also shows some stability. Which means that one can say with some degree of certainty how it will look (and behave) for certain steps and certain initial configurations, unlike those at the top. This turns out to be predictable according to Israeli and Goldenfeld as well, at a coarse-grained level, after a scale transformation.

A sketch on a conjecture on the transition coefficient of Turing-universal systems

Based on two empirical facts, I conjecture that universal CAs should have a large transition coefficient, as 110 does. Rules such as 110 and 54 also turn out to be next to each other in the phase transition classification, both having large values (they are in the top 24, and both the top 24 and the bottom 22 are in the original paper: see the last 2 tables).

So I base this conjecture on the following empirical observations:

1. The only known universal elementary cellular automata figure at the top of this classification, and so do candidates such as rule 54, which appears right next to rule 110.

2. Universality seems to imply that a system should be capable of being controlled by the inputs, which our classification suggests those at the bottom are not, as all of them look the same no matter what the input, and may not be capable of carrying information through the system toward the output.

Other rules that some people think may be capable of sophisticated computation (see the paper in the Journal of Complex Systems by Paul-Jean Letourneau) also have large transition coefficients, such as rule 146, with a transition coefficient of 0.124, ranking 39 out of the 256 elementary cellular automata.

As noted before, however, the transition coefficient is not a universal measure. In this case, coefficients were calculated for 600 steps in blocks of 50 for the first 500 initial conditions, which means that some rules may be capable of greater transitions but are somehow ‘slow’ at the selected number of steps and initial conditions: they take more than 500 initial conditions (in Gray-code numbering) to show larger changes, or else larger changes are being missed because of the jumps in blocks of 50 steps, though this latter possibility is less likely.

The conjecture also seems to be in agreement with Wolfram’s claim (made at some of his oral presentations) that rule 30 (as a class 3 elementary cellular automaton) may be, according to his Principle of Computational Equivalence (PCE), computationally universal. But it may turn out that it is too difficult (perhaps impossible) to control in order to perform a computation, because it behaves too randomly.

Comments on Turing’s very first universal machine as his 100th birthday approaches

Tuesday, May 18th, 2010

The idea that a machine could perform the tasks of any other machine is the description of a Universal (Turing) machine. Its invention is considered by many to have been one of the major landmarks giving rise to the field of computer science. ‘Universal’ means that one can ‘program’ a general-purpose machine to perform the tasks of any specific-purpose machine. Turing machines are to this day the central object of study in the theory of computation.

In an attempt to understand how the very first universal machine was described in Turing’s original 1936 paper “On Computable Numbers, with an Application to the Entscheidungsproblem,” I spent half a day re-reading the paper (and its corrected version by Donald Davies, published in The Essential Turing by Jack Copeland), trying to decode it, only to find that it is written in something like an anticipation of a (Turing-complete) subroutine-oriented programming language which is impossible to rewrite in a traditional transition table. So if one really tries hard, one ends up encoding an arbitrary universal Turing machine, not Alan Turing’s first universal Turing machine.

Although the paper has all the primary elements of a traditional description of a Turing machine (Turing’s ‘a-machine’ description), the fact that it used multiple conventions for describing increasingly complex machines was Emile Post’s strongest critique. In a letter to Church, Turing replied to Post’s appreciation, arguing that his use of more than one convention when building his universal machine did not affect the final result, though he did admit it made it hard to decipher.

The result is that not only would it be a giant machine in terms of states and symbols, but the number of actual colors that it may need seems to be unknown. It is only known to assume 18 states, and to simulate Turing’s second a-machine example with 23 instructions (here the product of states and colors does not necessarily lead to the number of instructions in Turing’s formalism, because his transitions are not total functions).

In the traditional 5-tuple form, Turing’s original universal machine could be written as (Mathematica notation):

{q1, “blank”} -> {q2, P0, R}
{DA, D} -> {DAA, DC, R}
{q2, “blank”} -> {q3, E, R}
{DAA, D} -> {DAAA, D, R}
{q3, “blank”} -> {q4, P1, R}
{DAAA, D} -> {DAAAA, DCC, R}
{q4, “blank”} -> {q1, E, R}
{DAAAA, D} -> {DA, D, R}

But notice that no state starting with q leads to a state starting with D, because the D’s (states are called m-configurations in Turing’s original jargon) are really prefix subroutines defined in Turing’s paper, while the q’s are traditional Turing machine states. In Turing’s paper E is, for example, an erasing subroutine. Some other ‘m-configurations’ require scanning the whole tape several times (which is what one would do if asked to emulate another Turing machine), and so on. So most of the behavioral description of the machine is encoded as strange strings of letters.
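The q-rules in the listing above correspond to Turing’s first example a-machine, which prints 0 and 1 alternately, each separated by a blank square. A minimal simulation of just those four 5-tuples (the dictionary representation is my own):

```python
# The four q-rules from the listing: on reading a blank, each state
# prints (0, blank, 1, blank) in turn and moves right.
rules = {
    ("q1", " "): ("q2", "0", 1),  # P0: print 0, move right
    ("q2", " "): ("q3", " ", 1),  # E:  leave the square blank, move right
    ("q3", " "): ("q4", "1", 1),  # P1: print 1, move right
    ("q4", " "): ("q1", " ", 1),  # E
}

def run(steps):
    tape, pos, state = {}, 0, "q1"
    for _ in range(steps):
        state, sym, move = rules[(state, tape.get(pos, " "))]
        tape[pos] = sym
        pos += move
    return "".join(tape.get(i, " ") for i in range(pos))

print(repr(run(8)))  # '0 1 0 1 '
```

This machine never revisits a square, which is why it can be written directly as a transition table; it is the D-style subroutine machinery of the universal machine, not machines like this one, that resists such a rewriting.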

Nevertheless, Turing’s choice is rather clever from a programmer’s perspective: he proceeded the way one would today when designing and implementing a universal computer. One would hardly do so by describing the basic elements alone, but rather by constructing higher-level subroutines, each describing a machine function and itself built on one or more other subroutines, down to the level of states and symbols. Think of programming in machine language versus programming in an intermediate-level language. Writing a universal Turing machine in full detail in terms of states and symbols from the start would lead to complete confusion, just as it would if one attempted to implement a complex piece of software in binary code or even pure assembly language. Turing’s description provides a better, though by no means trivial, understanding of what a universal Turing machine does to carry out the computation of any other Turing machine, by sketching and intuitively grouping the machine’s operations into these program subroutines.

Unlike today, when one can simply have Wolfram|Alpha run any Turing machine simulation, such as a random 2-state 5-color Turing machine or the 4-state 2-color busy beaver (for a list of other Turing machine examples one can type into Wolfram|Alpha, click here), Turing had no computer on which to run and test his code, so it is no surprise that his universal machine code came with several glitches. It was quite amusing to see that the first program ever written for a digital computer was already bedeviled by bugs. And this program was the actual implementation of the very first universal Turing machine, by Alan Turing himself.

If Turing had not died in 1954, at the age of only 41, he would have turned 98 this coming June 23 (2010). As a tribute to his work I’ve set up the following webpage gathering most, if not all, of the publicly known images of him.

For his 100th birthday a series of events is being organized. 2012 will not only be the year of the Olympic Games, which Turing, an enthusiastic long-distance runner, would have followed with particular interest in his own country (the UK), but also the Alan Turing Year, of which I’m honored to be a part as a member of the advisory committee representing Wolfram’s Science Group.

Physics-like computation, Wolfram’s PCE and Church’s thesis

Wednesday, January 7th, 2009

The lack of correspondence between the abstract and the physical world seems sometimes to suggest that there are profound incompatibilities between what can be thought and what actually happens in the real world. One can ask, for example, how often one faces undecidable problems. However, the question of undecidability has been considered to be better formulated (and understood) in computational terms because it is closer to our physical mechanical reality (through the concept of computation developed by Turing): Whether a computing machine enters a certain configuration is, in general, an undecidable question (called the Halting problem). In other words, no machine can predict whether another machine will halt (under the assumption of Church’s thesis –aka Church-Turing thesis).

An interesting example of the gap between what the abstract theory says and what can be empirically ascertained was recently suggested by Klaus Sutner from Carnegie Mellon. He rightly points out that if no concrete instance of a machine with an intermediate Turing degree is known, consonant with Wolfram’s Principle of Computational Equivalence, it is because intermediate degrees are artificial constructions that do not necessarily correspond to anything in the real physical world.

In David Deutsch‘s words, physics is at the bottom of everything, and therefore everything relies on physics (ultimately on quantum physics, according to Deutsch himself). This is true of the core objects of study and practice in mathematics (proofs) and in computer science (computer programs). In the end, that they are, and how they are, is made possible only by what is feasible in the physical world. It is as if we sometimes forget that mathematics and computation, in practice, follow the same laws of physics as everything else.

Sutner defines what he calls “physics-like” computation and concludes that machines with intermediate Turing degrees are artificial constructions unlikely to exist. According to Sutner, in practice machines seem to follow a zero-one law: either they are as computationally weak as a machine at the bottom of the computational power hierarchy (what Wolfram empirically calls “trivial behavior”) or they are at the level of the first Turing degree (i.e. capable of universal computation). This seems to imply, by the way, that what Wolfram identifies as machines of equivalent sophistication cannot be anything other than machines capable of universal computation, strengthening the principle itself (although one also has to assume Church’s thesis, otherwise PCE could be referring to a higher kind of sophistication).

So is PCE a conflation of Church’s thesis?

No. Church’s thesis could be wrong and PCE still true: negating Church’s thesis would merely shift the upper limit of feasible computational power further. And even if the hypothesis of a Turing universe turns out to be false, PCE could still be true regardless of whether the universe is of a digital nature, since it would then refer to the non-Turing limit as the one holding the maximal sophistication (not that I think Church’s thesis is false).

Is PCE tantamount to the Church thesis in the provable sense?

Wolfram’s PCE would still be falsifiable if the distribution of intermediate degrees were proven to be larger than PCE informally suggests. So far that hasn’t been the case, and there are nice examples supporting PCE, suggesting that very simple and small non-trivial programs can easily reach universal computation: the recent (weakly universal) small Turing machines discovered by Neary and Woods, and particularly the smallest TM proven universal, by Alex Smith (a 2-state 3-color machine that Wolfram conjectured universal in his NKS book). However, PCE could be as hard to prove or disprove as Church’s thesis is. Unlike Church’s thesis, PCE could not be disproved by exhibiting a single negative case, but only by proving that the distribution of machines differs from what PCE suggests. A positive proof, meanwhile, might require an infinite verification of cases, which is evidently not mechanically feasible (only by negating Church’s thesis itself could one verify the infinite number of cases).

I see PCE acting below the curve while Church’s thesis acting from above determining a computational limit (known as the Turing limit).


[Diagram: PCE and Church's thesis (C-T), the "sandwich" effect determining the computational (Turing) limit]

The Shortest Universal Turing Machine Implementation Contest

Monday, December 22nd, 2008



23 Dec – 2008

Contest Overview

In the spirit of the busy beaver competition, though related to program-size complexity, we are pleased to announce the “Shortest Universal Turing Machine Implementation Contest”.

The contest is open-ended and open to anyone. To enter, a competitor must submit a universal machine implementation written in the language specified on the contest web site (C++), with a smaller size value than the latest record published on the web page.

In order to take part in this competition it is necessary to submit the source code, to be compiled using the compiler program and version specified in the contest web site. It is important that you provide documentation of your code, either in an attached file or as commented text in the source code file.

Each submitter must agree to be bound by and abide by the rules. Submissions remain the sole property of the submitter(s), but should be released under the GNU General Public License (GPL) so that we may make them available on this web site for downloading and executing.

Rules (General Rules section)

Team composition

Players may enter alone or as teams of any size. Anyone is eligible to enter.

Subscribe to the Newsletter

A mailing list will be used to keep participants informed of news about the contest. You can subscribe to this mailing list at any time:

Subscribe here.

Hector Zenil (IHPST and LIFL, Paris 1 University and Lille 1 University)
Jean-Paul Delahaye (LIFL, Lille 1 University)

On the simplest and smallest universal Turing machine

Tuesday, October 30th, 2007

Alex Smith has recently proved that a Turing machine conjectured by Wolfram to be capable of universal computation is indeed universal (the Wolfram 2,3 Turing Machine Research Prize).

Part of the challenge was to find an encoding that does not itself carry out the universal computation that would make the Turing machine universal. Smith succeeded, providing an encoding whose background, while non-periodic, is sufficiently regular that the infinite word written on the tape can be generated by an omega-automaton (not itself universal).

This relaxation of the tape content has been common in the cellular automaton world, and also in the field of so-called weak universality, the only difference being that the Turing machines are allowed to start with tapes other than blank ones (a blank tape being a periodic tape with period 1).
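The idea of a non-blank background can be sketched as a tape whose unvisited cells are materialized on demand. In the sketch below the background is a periodic word (Smith’s background is more general, generated by an omega-automaton, but the on-demand principle is the same; the class and its names are my own illustration):

```python
class BackgroundTape:
    """Tape whose unvisited cells come from an infinite periodic word;
    an ordinary blank tape is simply the period-1 word '_'."""
    def __init__(self, word="_"):
        self.cells = {}  # cells the machine has explicitly written
        self.word = word
    def read(self, i):
        # Written cells shadow the background pattern.
        return self.cells.get(i, self.word[i % len(self.word)])
    def write(self, i, sym):
        self.cells[i] = sym

blank = BackgroundTape()         # ...___...  (period 1)
striped = BackgroundTape("ab")   # ...ababab...
print(blank.read(100), striped.read(0), striped.read(1))  # _ a b
```

For a genuinely non-periodic but regular background, `read` would instead query a finite automaton producing the infinite word, which is what keeps the background itself below universal power.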

An objection might be that such a coupled system could turn a non-universal machine into a universal one, but Smith showed that his encoding was capable of restarting the computation itself, something coupled systems are usually incapable of. In other words, the preparation of the tape in Smith’s case is done in advance and the omega-automaton no longer interacts at any later time, whereas making a non-universal machine universal usually requires an automaton intervening at every step (or every few steps) of a computation.

One may also be surprised by the need of an omega-automaton for infinite strings, but the difference from the traditional blank tape is subtle. Think of how machines operate in the real world. Machines do not run on blank tapes; they usually run over memory (the cache, RAM or a hard disk) containing all kinds of information (considered garbage if not instantiated by a running program). You may object that this garbage is not infinite, but neither is a blank tape for a Turing machine: instead of thinking of a Turing machine supplied with blank tape as needed, one can think of it supplied with non-blank tape. Now, what is on the tape in Smith’s case is not exactly “garbage”, because it plays a role in “helping” the Turing machine perform its computation: it is a support on which the machine, itself capable of universal computation, actually achieves the computation with the help of a defined background (which does not mean the machine cannot perform universally with other backgrounds), yet the background is not powerful enough to perform the hardest part of the computation. In the physical world, if processes are seen as computations, computations are performed on a support, on the background of the physical world. So these kinds of relaxations may be closer to actual physical situations than abstract setups in which a blank tape is assumed or required.

The discussion opened up by Wolfram’s work and motivated by Smith’s answer has generated a fruitful and renewed discussion of the universality and complexity of small Turing machines. This is why I think this research is relevant to modern computer science, and not merely an amusing mathematical puzzle:

  • New techniques for proving universality are found (Alex Smith’s novel approach for unbounded computations from arbitrary lengths and non-periodic initial configurations).
  • New universal models of computation have been discovered (cyclic tag systems, bi-tag systems).
  • Such research provides a better understanding of universality, its limits, its underlying principles and its necessary and sufficient conditions.
  • It is a basis for actually building universal devices when only a few elements can be used, e.g. in nanotechnology or molecular computation.
  • Simple/small machines may be more easily/effectively embedded in other systems.
  • The old discovery/invention duality question comes to the fore: it sheds light on how simple universality is, how frequently it occurs, whether it is engineered or not, whether one builds universal computation or finds it in the universe.
  • It could shed light on the relative feasibility of universal Turing machines based on different tape configurations (e.g. blank characters, repetitive words, non-repetitive but computationally simple backgrounds) as actual physical systems. At present it is not at all clear why one ought to favor blank characters over other possible real-world backgrounds, such as “noise.”
  • Questions of size and complexity arise: it would be interesting, for instance, to find out whether there is a polynomial (or exponential) trade-off between program size and the cost of simulating a process.
  • Some questions on algorithmic complexity arise: will the encoding always be more complex if the machine is simpler? All theorems in algorithmic information theory depend on additive constants, which depend on the sizes of typical universal Turing machines. What is the impact of different generalizations of universality on algorithmic complexity, and what is the role of the encoding in such a measure?
  • Some questions arise on the relation between several variants of universality definitions: is there an effective and efficient encoding for each non-periodic encoding preserving universality? If so, how does this impact their complexity? Is there a non-periodic encoding with blank characters for each periodic blank-word encoding, and what would the impact of such an encoding be on the size/complexity of the Turing machine in question?

The field is active and still an important area of research. Several computer science conferences, such as Computability in Europe (CiE) and Machines, Computations and Universality (MCU), included talks on small computational systems this year, focusing in particular on reversible cellular automata and universal Turing machines.

Here are some references from the small Turing machine community:

[1] Manfred Kudlek. Small deterministic Turing machines. Theoretical Computer Science, 168(2):241-255, November 1996.
[2] Manfred Kudlek and Yurii Rogozhin. A universal Turing machine with 3 states and 9 symbols. In Werner Kuich, Grzegorz Rozenberg, and Arto Salomaa, editors, Developments in Language Theory (DLT) 2001, vol. 2295 of LNCS, pp. 311-318, Vienna, May 2002. Springer.
[3] Maurice Margenstern and Liudmila Pavlotskaya. On the optimal number of instructions for universality of Turing machines connected with a finite automaton. International Journal of Algebra and Computation, 13(2):133-202, April 2003.
[4] Claudio Baiocchi. Three small universal Turing machines. In Maurice Margenstern and Yurii Rogozhin, editors, Machines, Computations, and Universality (MCU), volume 2055 of LNCS, pp. 1-10, Chisinau Moldavia, May 2001. Springer.
[5] Turlough Neary and Damien Woods. Four small universal Turing machines. Machines, Computations, and Universality (MCU), volume 4664 of LNCS, pp. 242-254, Orleans, France, September 2007. Springer.
[6] Yurii Rogozhin. Small universal Turing machines. Theoretical Computer Science, 168(2):215-240, November 1996.
[7] Shigeru Watanabe. 5-symbol 8-state and 5-symbol 6-state universal Turing machines. Journal of the ACM, 8(4):476-483, October 1961.
[8] Shigeru Watanabe. 4-symbol 5-state universal Turing machines. Journal of Information Processing Society of Japan, 13(9):588-592, 1972.
[9] Stephen Wolfram. A New Kind of Science. Wolfram Media, 2002.

Seth Lloyd’s quantum universe view

Wednesday, November 22nd, 2006


In an exchange of emails, Seth Lloyd and I discussed the topic I wrote about some posts ago. Here is some of it.

According to Lloyd, there is a perfectly good definition of a quantum Turing machine (basically, a Turing machine with qubits and extra instructions to put those qubits in superposition, as above). A universal quantum computer is a physical system that can be programmed (i.e., whose state can be prepared) to simulate any quantum Turing machine. The laws of physics support universal quantum computation in a straightforward way, which is why my colleagues and I can build quantum computers. So the universe is at least as powerful as a universal quantum computer. Conversely, he says, a number of years ago he proved that quantum computers could simulate any quantum system precisely, including one such as the universe that abides by the standard model. Accordingly, the universe is no more computationally powerful than a quantum computer.

The chain of reasoning leading to the quantum-computer universe view seems to be that premises 1 and 2 imply conclusion 3, where:

1. The universe is completely describable by quantum mechanics.
2. Standard quantum computing completely captures quantum mechanics.
3. Therefore the universe is a quantum computer.

Seth Lloyd claims to have proved the connection between 1 and 2, which probably puts the standard (or some standard) theory of quantum mechanics and the standard quantum computing model in an isomorphic relation with each other.

Lloyd’s thesis adds to the conception of the Universe as a Turing computer an important and remarkable claim (albeit one that depends on the conception of the quantum computer), viz. that the Universe is not only Turing computable, but because it is constituted by quantum particles which behave according to quantum mechanics, it is a quantum computer.

In the end, the rigid definition of the qubit, together with the versatility of possible interpretations of quantum mechanics, makes it difficult to establish the boundaries of the claim that the universe is a quantum computer. If one assumes it is a standard quantum computer in the sense of the definition of the qubit, then a description of the universe in these terms assumes that quantum particles encode only a finite amount of information, as the qubit does, and that the qubit can be used for a full description of the world.

Quantum computation may, however, have another property that could make it more powerful than Turing machines, as Cristian Calude et al. have suggested: the production of indeterministic randomness for free. Nevertheless, no interpretation of quantum mechanics rules out the possibility of deterministic randomness even at the quantum level. Some colleagues, however, have interesting results establishing that hidden-variable theories may require many more memory resources to keep up with known quantum phenomena. In other words, hidden-variable theories are more expensive to assume, and the memory needed to simulate what happens in the quantum world grows as badly as it possibly could for certain deterministic machines. But still, that does not rule out other possibilities, not even hidden-variable theories, even if they are not efficient in traditional terms.

This is important because it means one does not actually need ‘true’ randomness, the kind of randomness assumed in quantum mechanics. So one does not really need quantum mechanics to explain the complexity of the world, or to underlie reality in order to explain it; one does, however, require computation, at least in this informational worldview. Unlike Lloyd and Deutsch, we think it is information that may explain some quantum phenomena, not quantum mechanics that explains computation (or the structures in the world and how it seems to unfold algorithmically), so we put computation at the lowest level, underlying physical reality.

Lloyd’s thesis adds to the conception of the Universe as a Turing computer an important and remarkable claim (albeit one that depends on the conception of the quantum computer), viz. that the Universe is not only Turing computable, but, because it is constituted by quantum particles behaving according to quantum mechanics, it is a quantum computer computing its future state from its current one. The better we understand and master such theories, the better prepared we will be to hack the universe in order to perform the kind of computations–quantum computations–we would like to perform.

I would also agree with Rudy Rucker as to why Seth Lloyd assigns such an important role to quantum mechanics in this story. Rudy Rucker basically says that, being a subscriber to quantum mechanics, Lloyd doesn’t give enough consideration to the possibility of deterministic computations. Lloyd writes, “Without the laws of quantum mechanics, the universe would still be featureless and bare.” However, though I am one among many (including Stephen Wolfram) who agree that it is unlikely that the universe is a cellular automaton, simply because cellular automata are unable to reproduce quantum behavior from empirical data (but note that Petri and Wolfram himself attempt explanations of quantum processes based on nets), there is absolutely no need to rush headlong into quantum mechanics. If you look at computer simulations of physical systems, they don’t use quantum mechanics as a randomizer, and they seem able to produce enough variation to feed a computational universe. Non-deterministic randomness is not necessary; pseudorandomness or unpredictable computation seems to be enough.

Is the Universe a Computer? (Ist das Universum ein Computer?) Conference, Berlin, Germany, 6,7 November 2006

Sunday, November 12th, 2006


Ist das Universum ein Computer?
Germany, November 2006, Informatik Jahr
Deutschen Technikmuseum Berlin
From Konrad Zuse’s Invention of the Computer to his “Calculating Space” to Quantum Computing.

Lesson One: For someone with a hammer in his hand, the world seems to be a nail. Joseph Weizenbaum.

Lesson Two: Knowing the input and the transition function of a Turing machine we know everything about it. Marvin Minsky.

Zuse's drawing 2
Dr. Zuse’s futuristic drawing

– The first talk, entitled “What Can We Calculate With?”, by Prof. Dr. Bernd Mahr from the Technische Universitat of Berlin, was a very good introduction to the standard theory of computation based on Turing’s model and classical mathematical logic. His remarks on the time when computing arose from mathematics, because the Greeks discovered they were unable to compute the square root of 2, were interesting. He pointed out some evident but not always explicit facts: a calculation has a subject (the individual who calculates), an object (what is calculated), a medium (how it is calculated), and a symbolic representation (the language–binary, for instance). His use of the Leibniz medallion to explain starting points, ending points and transitions in a calculation (transitions: intermediate calculations) was elementary but interesting. Further explanations of reversibility and non-linearity using transition nets were also illuminating. The relation between a computation and its context is so strong that it is sometimes difficult to distinguish them. Since any process can be seen as an object in itself, the context can become the calculation, and the context of the context too. In some way, as we know, the concept of calculation is a constraint on a part of a calculation, and it is then defined in terms of an input, an output and a transition. He also pointed out that behind many of these complicated systems there is a Turing machine. It is no longer visible from the top, but it is there.

Konrad Zuse
Dr. Konrad Zuse

– Dr. Horst Zuse’s  talk titled “Konrad Zuse’s Calculating Space (Der Rechnende Raum)”:
Dr. Konrad Zuse’s son, Dr. Horst Zuse made some interesting remarks about his father’s book “Calcualting Space”,  in which  Dr. Zuse proposes studying  the universe as a digital system, specifically a cellular automaton. Dr. Horst Zuse is a professor at the Technische Universitat of Berlin and his personal webpage can be found at: and

Zuse's son
Dr. Konrad Zuse’s son, Dr. Horst Zuse

Dr. Zuse’s father’s main question was: “Is Nature Digital, Analog or Hybrid?” It seems that he tended to answer “Digital,” proposing a No-Yes value language (binary). His thoughts were published in the Nova Acta Leopoldina. He evidently did acknowledge that there could be problems attempting to reconcile an explanation of the Universe in terms of Cellular Automata with Quantum Mechanics and General Relativity.

According to Konrad Zuse, the laws of physics could be explained in terms of the laws of switches (in the computational sense). Physical laws are computational approximations, captured by the formulas in our models. He saw that differential equations could be solved by digital (hence discrete) systems.

Dr. Petri talk
Dr. Carl Adam Petri at the Berlin conference

– Dr. Carl Adam Petri (yes, the creator of the Petri nets!) on “Rechnender Netzraum” or “Computing Net Universe”:
According to Dr. Petri, at the Planck length quantum computing can be described by digital systems using combinatorial models (net models, kennings, combinat), and therefore the universe can be studied using discrete nets which are even capable of explaining quantum and relativistic fundamentals like Bell’s theorem and Heisenberg’s uncertainty principle. That would mean that discrete systems (for instance those proposed by Stephen Wolfram) would suffice to explain even quantum and relativistic phenomena.

According to Petri, measurement is equivalent to counting. For instance, in the S.I. one second is 9192631770 Cs periods. In fact, Norbert Wiener proposed an axiomatics of measurement.

The correspondence of Petri nets with the Heisenberg uncertainty principle arises from the limitations of our observational capacities when carrying out measurements. When two different types of observations are performed–for example momentum p and position q–we can only see p or q in a chain of successive events related by a causality net. His nets, as well as his explanations of such phenomena, are very neat and elegant. His slides on causality and linear logic are available online; he also distributed a CD with his slide presentation at the conference.

For Petri, the Universe is a Petri Net.

Dr. Seth Lloyd’s presentation at the conference in Berlin, Germany

– Dr. Seth Lloyd’s talk entitled “The Universe is a Quantum Computer”:
Some researchers think of information as more fundamental than physics itself. And it is common practice in science to keep finding more fundamental structures beneath the ones previously relied upon. Others, however, such as Seth Lloyd and David Deutsch, stand strongly in favor of putting a quantum reality at the lowest level of the description of reality, since they hold that information only makes sense if it is carried and represented by a physical entity. These authors have therefore developed world theories based on the concept of quantum computation.

A quantum computer differs from a Turing machine in that its bits are quantum bits (qubits), which can exist in superposition, and it can be instructed to put those bits into superpositions. A universal Turing machine can simulate any other Turing machine with at most polynomial overhead, and while what a quantum computer does is still Turing computable, a typical quantum computation of T steps on N qubits requires $O(2^N)$ bits on a classical Turing machine and $O(T 2^{2N})$ logical operations. That is, it takes a Turing machine exponential amounts of time and space to simulate a quantum computer or quantum Turing machine.
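The source of the exponential blow-up is easy to see in a few lines of code. The following is my own minimal sketch (not from the talk) of a classical state-vector simulation: an N-qubit state requires $2^N$ complex amplitudes, and even a single-qubit gate must sweep the entire vector.

```python
# Minimal sketch: classical simulation of N qubits stores 2^N amplitudes.
import math

def hadamard_on(state, target, n):
    """Apply a Hadamard gate to qubit `target` of an n-qubit state vector.
    Note the loop over all 2^n amplitudes: this is the exponential cost."""
    s = 1 / math.sqrt(2)
    new = state[:]
    for i in range(2 ** n):
        if (i >> target) & 1 == 0:       # pair index i (bit=0) with j (bit=1)
            j = i | (1 << target)
            a, b = state[i], state[j]
            new[i] = s * (a + b)
            new[j] = s * (a - b)
    return new

n = 3
state = [0.0] * (2 ** n)                 # 2^3 = 8 amplitudes for 3 qubits
state[0] = 1.0                           # start in |000>
for q in range(n):                       # put every qubit in superposition
    state = hadamard_on(state, q, n)
# every basis state now has amplitude 1/sqrt(8)
```

Already at n = 40 the vector would need about a trillion amplitudes, which is the practical content of the $O(2^N)$ claim above.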

The standard quantum computer, as shaped by Feynman and Deutsch among others, differs from the classical computer in the concept of the qubit. While the concept of the qubit takes advantage of a clear asset that the quantum world provides, namely entanglement, it does not necessarily make use of the full properties of quantum mechanics as interpreted under the Copenhagen interpretation. The convenient definition of the qubit makes a quantum computer compute not more than, but much faster than, classical digital computers.

Lloyd’s conclusion is that the universe is not only a computer but a quantum computer. However, some questions arise:

1. What exactly is a quantum computer? Does the definition of a quantum computer actually capture all quantum phenomena? According to the standard model, quantum computing is Turing computable (disregarding running time). If Lloyd is assuming the standard model, then the universe is indeed a quantum computer, but even more remarkably (since we have scientific and philosophical hypotheses like the Turing thesis) it is Turing computable. However, if he is assuming the more general quantum mechanical model–let’s say the standard model in physics, which basically assumes the possibility of harmless continuity rather than inquiring into it–then he is saying that the universe is not a computer (since the term derives from what we mean by Turing computable, and hence covers digital computers too). So the assumptions made are significant and cannot be glossed over if one wishes to argue convincingly that the universe is a computer in some standard sense. If what we assume to be computing is anything that seems to be deterministic or rule-based, the concept becomes fuzzy and additional remarks need to be made.

2. What if a quantum particle encodes more information than just a 1 or 0 for the spin or any other quantum property? Say a third value, or, worse, a non-computable number of values? In quantum mechanics, for example, the superposition of a particle assumes an infinite and non-countable number of places, since it is everywhere in the spectrum at the same time. If space/time is a continuum, then it is evidently in a non-countable number of positions, which leaves us with a non-computable model, or at least with something that is neither a Turing-computable model nor a standard quantum-computable model. And this is not a simple assumption, since it requires another Turing-type thesis, which in the final analysis does not answer the most fundamental question, i.e. whether the universe is a computer or not, and if it is, what kind of computer (in the sense of computational power) it is.

I raised these questions with Seth Lloyd and I will be posting his answers soon.

Seth Lloyd lecture at Berlin
Seth Lloyd at the conference in Berlin

An interesting concept mentioned by Seth Lloyd is “hacking the universe”. As Charles Bennett used to say, a computer is a handicapped quantum computer. So if Lloyd is right, a computer is not only a handicapped quantum computer: it is failing to take advantage of the full computational power of the universe, merely patching the universe instead of hacking it, as it would be in its power to do. By contrast, a quantum computer uses particles that are already computing “something” (nothing less and nothing more than the processes in the universe) to perform the computation we want it to perform. It can be said to be “hacking the universe” in Lloyd’s terms.

On the other hand, if the notion of programmer monkeys is valid, it should be possible to test it experimentally. Under the supervision of M. Jean-Paul Delahaye, computer science professor at the University of Lille I, we are undertaking this task. We are exploring Lloyd’s quantum computational universe (or at least a handicapped but representative part, the recursive computational universe), applying complexity measures (the universal distribution, average-case complexity or Levin’s measure) in order to uncover the monkeys behind the Universe, or in other terms, to analyze the average distribution of random discrete systems with random inputs.
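The flavor of such an experiment can be conveyed in a few lines. The following is my own toy sketch (not the actual Lille experiment): run many small discrete systems (here elementary cellular automata, standing in for small programs) on random inputs and tally the frequency of each output, in the spirit of Levin’s universal distribution, where simple outputs, producible by many programs, dominate.

```python
# Toy sketch: empirical output distribution of random small discrete systems,
# in the spirit of Levin's universal distribution m(x) ~ 2^(-K(x)).
import random
from collections import Counter

def run_eca(rule, cells, steps):
    """Run an elementary cellular automaton (Wolfram rule 0-255),
    periodic boundary conditions."""
    table = [(rule >> i) & 1 for i in range(8)]
    for _ in range(steps):
        cells = [table[(cells[i - 1] << 2) | (cells[i] << 1)
                       | cells[(i + 1) % len(cells)]]
                 for i in range(len(cells))]
    return tuple(cells)

random.seed(0)
counts = Counter()
for _ in range(2000):
    rule = random.randrange(256)                      # a random "program"
    cells = [random.randrange(2) for _ in range(8)]   # a random input
    counts[run_eca(rule, cells, 8)] += 1

# Simple outputs (e.g. all zeros) tend to dominate the empirical distribution.
top, freq = counts.most_common(1)[0]
```

The interest lies in comparing the resulting frequency ranking against what algorithmic probability predicts, which is roughly what "uncovering the monkeys" amounts to.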

Is Seth Lloyd falling into the carpenter’s problem of thinking that the universe is a nail and the moon made of wood? Is it because he is a quantum computer scientist that he thinks the universe is a quantum computer? He argues of course that the charge is unfair, but then we have been told by Dr. Petri that the Universe is in fact a Petri Net, which probably needs neither strong randomness nor quantum mechanics!

There is a video online in which he explains much of this.

Zuse's drawing
Dr. Zuse’s futuristic drawing 2

– Jurgen Schmidhuber reprised his algorithmic approach to the theory of everything in his talk entitled “The program that computes all computable universes”.
Jurgen Schmidhuber’s major contribution is probably his Speed Prior, a complexity measure similar to algorithmic information complexity, except that it is based on computation speed rather than program length, i.e. the fastest way of describing objects rather than the shortest.
There is more information on his website (where he includes an unfavorable review of NKS) and in his slide presentation on the Speed Prior.
Of course, Schmidhuber himself has identified a problem with the Speed Prior measure: if every possible future exists, how can we predict anything?
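To make the contrast with program-length measures concrete, here is my own hedged illustration (not Schmidhuber’s code) using Levin’s Kt cost, $l(p) + \log_2 t(p)$, a closely related measure that charges a description for both its length and its runtime, in the same spirit as the Speed Prior’s preference for fast descriptions:

```python
# Hedged illustration: Levin's Kt cost l(p) + log2(t(p)), which penalizes
# runtime as well as length, in the spirit of the Speed Prior.
import math

def kt_cost(program_length_bits, runtime_steps):
    """Length in bits plus the log of the running time in steps."""
    return program_length_bits + math.log2(runtime_steps)

# A short but slow description can lose to a longer but fast one:
slow_short = kt_cost(10, 2 ** 20)   # 10 bits, 2^20 steps -> cost 30
fast_long  = kt_cost(24, 2 ** 3)    # 24 bits, 8 steps    -> cost 27
```

Under plain Kolmogorov complexity the 10-bit program would win; under a speed-sensitive measure the 24-bit program does, which is exactly the shift in ranking the Speed Prior induces.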

Other interesting talks touched on philosophical issues: if the Universe is a computer, then the human mind must be a computer too.
Is “the Universe is a computer” a metaphor?
My answer: the metaphor is “The Universe is not a Computer”.

Lesson Three: Metaphors can be reversed.

Kovas Boguta
Kovas Boguta’s talk was titled “Is the Computer a Universe?” In it he pointed out the richness of mining the computational universe of simple programs.

Because we were together during the gala dinner, I had an interesting exchange with Dr. Konrad Zuse’s son, Dr. Horst Zuse (also at our table were the Technikmuseum director Dr. Dirk Bondel and my colleague Kovas Boguta from Wolfram Research, among others). He shed some light on his father’s interactions with Alan Turing (none, apparently) and with von Neumann (some interaction regarding the controversy over who first built a digital computer, and concerning von Neumann’s architecture, to which our current digital computers do not conform, the ALU being separated from the memory as in Zuse’s conception but not in von Neumann’s original design).

Zuse’s Z1, the first computer: the input device, something equivalent to the keyboard, at the Technikmuseum in Berlin

Kurt Godel: The writings. Université de Lille III

Sunday, November 12th, 2006

Kurt Godel workshop for studying his legacy and writings. Lille, France, May 19-21, 2006

My thoughts, ideas, references, comments and informal notes:

– The wheel machine, a machine for real computation which I am proposing–as a thought experiment–in a forthcoming paper on the Church-Turing thesis (yes, one more paper on the CT thesis!), with comments on Wilfried Sieg’s paper entitled “Church Without Dogma: Axioms for Computability”.

– “In case Cantor’s continuum problem should turn out to be undecidable from the accepted axioms of set theory, the question of its truth would lose its meaning, exactly as the question of the truth of Euclid’s fifth postulate did in Euclidean geometry.” Godel replies: “It has meaning anyway, as Euclid’s fifth postulate gave rise to other now accepted mathematical fields.”

– Godel’s Gibbs Lecture and his dichotomy on absolutely undecidable propositions and the computational power of the human mind (Turing did great work… but he was wrong when he proposed his formal theory as a model of human thought…)

– New contacts and references: Olivier Souan, Rudy Rucker, Karl Svozil

– Mark van Atten’s “On Godel’s awareness of Skolem’s lecture”.
– Rick Tieszen

– Herbrand on general recursive functions, letter to Godel.

– Leibniz’ influence on Godel’s arithmetization?

– Sources: Godel Editorial Project. Firestone Library, Princeton University. I.A.S. Marcia Tucker, librarian for Godel papers.

– Godel’s concept of finite procedure as the most satisfactory definition of computation. “A machine with a finite number of parts as Turing did” or “finite combinatorial procedure” as a definition of an algorithm, mechanical or computational procedure.

– Computation’s main constraints: boundedness and locality (paper by Hernandez-Quiroz and Raymundo Morado).

– Aphorisms and autoreference (Gabriel Sandu and Hintikka)

– Feferman on Turing

– Is Sieg’s paper, and the question of “finite machine = effective procedure”, a tautology? Such an approach seems in fact to be one of the strictest versions of the Turing thesis, and even though both Church and Turing probably did propose it in such a strict sense, extended versions of the thesis have traditionally covered more content. Yet even in the strict statement there is still space for a thesis: it is neither proved nor provable, from my point of view, and most authors would concur, though some clearly would not. I will comment on this more extensively later, since it was one of my Master’s topics and merits a post by itself.

– Putnam’s thought experiment on cutting off all sensory inputs. Solution: it is impossible in practice. However, machines are an example in a sense, and that is why we do not recognize intelligence in them–they are deprived of sensory capabilities.

Yes, Godel found an inconsistency in the U.S. constitution. My answer: One? Certainly a bunch. That’s why we need lawyers, who make them even worse.

Universality on Real Computation

Thursday, October 12th, 2006

A paper of mine in French on this subject is already on arXiv: “Universality on Real Computation”. Chris Moore called my attention to his paper entitled “Recursion Theory on the Reals and Continuous-time Computation”, which arrives at similar results using a different approach. A paper I’m writing in English on this subject, which I have been discussing with several people, is forthcoming. A preliminary PowerPoint presentation that I used when presenting the topic to Gregory Chaitin is available here: Universality on Real Computation.
In computability theory, the theory of real computation deals with hypothetical abstract machines capable of infinite precision. They operate on the set of real numbers, most of which are non-computable by a Turing machine. These hypothetical computing machines can be viewed as idealised analog computers or differential machines. The question I am interested in concerns the consequences of computing on the set of real numbers for current foundational concepts in traditional computation, for example universality. I have found that an infinite number–possibly denumerable–of universal real computers of varying power allows what I have called “intrinsic universality” (there is another definition of intrinsic universality, related to dynamical systems and universal computation), since real numbers are not bounded in complexity and there is therefore one for each class of complexity.
However, taking the complete set of real numbers, what I claim is that a single and ultimate universal machine for real computation is not possible, since for each proposed universal real computer A–with pre-fixed real numbers R = r_1, r_2, …, r_n–it suffices to calculate the maximal Turing degree of the set of real numbers involved, say O_n, to conceive a new real computer B able to compute numbers of a higher Turing degree, say O_m with m > n, which the given universal real machine A will not be able to emulate–contradicting the definition of a universal abstract real computer. Therefore A will not be able to emulate (in the way conceived for defining universality in the traditional Turing model) every other real machine B belonging to the same set–the real numbers, which can evidently be seen as their class of languages–unless it allows non-recursive functions going through all the levels of the arithmetical and hyper-arithmetical hierarchy. In other words, real computation does not allow universality, the very foundation of computer science, unless we take into consideration non-recursive functions of arbitrary power, which adds an additional problem to the conception of a universal device and the convergence of real computing models, unlike the traditional model based on recursive function theory and Turing machines. However, in real computation there is a place for relative universality, which is very rich in ideas and theories and gives rise to a hierarchical Church-Turing-type thesis for each level of the arithmetical and hyper-arithmetical hierarchy. So it would make sense to talk about the universal abstract device E_n–relative to the arithmetical level n–for each natural number n, unlike a device E able to behave like any E_n for any natural number n, which would need at least one function, or the composite of several functions, able to attain any arbitrary degree of computability.
In other words, E wouldn’t be a field since it is not closed under the set of operators of the defined E.

Here is a related paper entitled “The complexity of real recursive functions” by Manuel Lameiras Campagnolo. In France, M. Olivier Bournez has a particular interest in real computation, in particular analog and hybrid systems. Other important authors in the field include Karp, da Costa, Blum, Cucker, Shub and Smale. Felix da Costa has let me know that he is working on something related to my ideas on universality in real computation and relative universality. It is important to say that being interested in real computation is not the same thing as being a “hyper-computationalist”. In fact, some if not most of the above authors think that real computation would at best be only as powerful as the digital models. Their interest in the field is mainly operational, in the sense that these types of machines perform operations commonly related to fields like real analysis.

Computability in Europe Conference (CiE) Report, Wales UK

Monday, August 7th, 2006

This is a report on the Computability in Europe Conference (CiE), held at the University of Swansea, Wales in the United Kingdom in July 2006.

I attended a mini-course on Quantum Computing given by Julia Kempe, a lecture on the Church-Turing thesis by Martin Davis–who defended it against proposed models of hypercomputation–and a lecture on Proof Theory. Another very interesting lecture was on Godel’s and Turing’s remarks on the human mind (the dichotomy argument from Godel and the mechanistic vision from Turing). Among other noteworthy lectures were Samuel Buss’ on the complexity of proofs, John Dawson’s on Godel in computability, and Wilfried Sieg’s on the concept of mechanical procedure in Godel and Turing, as well as many presentations on hypercomputability and computing over the reals. I met people whom I had only known through email exchanges, like Felix da Costa from the Technological Institute of Lisbon, Robert Meyer, professor emeritus at the Australian National University, and Julia Kempe from France, a renowned researcher in the quantum computing field, with whom I shared some doubts I had concerning where the restrictions lie in quantum computing that constrain its power to the set of recursive functions. I also met people from SUNY who are doing interesting research on Turing computation, studying isomorphisms between oracle machines and the relation to Tennenbaum’s theorem on the uniqueness of the recursive model of PA (Peano Arithmetic). Many lectures were given on computing over infinite time and space and computing at the limits of general relativity. The conference was intended to take the pulse of the field of hypercomputation in Europe and worldwide.

International Conference in Complex Systems, NECSI

Monday, August 7th, 2006

NECSI/ICCS Conference Report, Quincy, Greater Boston, USA, July 2006.

First lesson: For every complex problem there is a simple, neat, wrong solution.

I attended talks given by Ed Fredkin on Finite Nature, Lazlo Barabasi on Complex Networks, Christoph Teuscher on Biology and Computation, and John Nash on his research on Game Theory.
* Ed Fredkin presented a table salt 3-D cellular automaton model that fulfils all physical constraints on symmetry and energy conservation. The model is, surprisingly, Turing-universal. As a reminder, the Zuse-Fredkin thesis is a Church-type thesis which additionally claims that the universe, being a cellular automaton, can be understood in terms of the evolution of cellular automata. An interesting paper entitled “Church-Turing Thesis is Almost Equivalent to Zuse-Fredkin Thesis” by Plamen Petrov is available online.
Much more information on Ed Fredkin’s interesting ideas is available on his homepage at MIT, which includes some remarks on why this thesis is not in agreement with Stephen Wolfram’s, since Wolfram does not intend to imply that the universe is a classical cellular automaton, but rather conceives of a discrete universe based on systems performing simple rules and producing all the complex behavior we find in nature. Such systems are comparable to Turing machines, tag systems, axioms or any other equivalent system.
* In his lecture, Lazlo Barabasi claimed that behind all complex systems there are complex networks, in particular scale-free networks (with a few nodes very well connected to others, and many nodes weakly connected). These nets are efficient and robust (even with random nodes removed they remain connected), except under attack (provided the best-connected nodes are targeted).
* Christoph Teuscher gave a lecture on computation inspired by biology. However, as he himself admitted, it was not his area of expertise.
* John Nash presented his research on Game Theory using Mathematica as an engine.
* I met Hava Siegelmann and one of her fellows, Yariv, who is working on Artificial Intelligence. Siegelmann worked some years ago (while completing her PhD dissertation) on a model of computation called the Analog Recurrent Neural Network, or ARNN, which, under certain circumstances, is able to compute more functions than the set of recursive functions, i.e. those computed by Turing machines. She is now working on topics related to molecular biology. I asked her about the forceful critiques delivered by traditional logicians like Martin Davis, who wrote a paper entitled “The Myth of Hypercomputation.” The target of most of these critiques is a certain circular argument: to compute more than Turing machines, it is necessary to use non-Turing-computable weights previously coded into the neural network. It has been known for a long time that the set of all real numbers is able to encode any arbitrary language, even a non-Turing-computable one. What is remarkable, from my point of view, is her result relating to complexity classes, since weights of lower complexity are able to compute a higher class when used in those networks. Additionally, she argued that even with a stochastic function her networks are able to solve non-computable functions. I recently discussed this with Stephen Wolfram, and we agreed that she is assuming strong randomness. I would say, however, that Siegelmann’s work is much more beautiful from a complexity point of view than from a “hypercomputational” one.
In her book, published under the title “Neural Networks and Analog Computation: Beyond the Turing Limit”, she proves that: a) there exists a universal neural network with only 9 neurons, and b) p(r) suffices to compute a non-computable function, where r is an arbitrary real number and p(r) represents the first p digits of the expansion of r–which means that linear precision suffices to achieve “super-Turing” capabilities, assuming that the neural network can have access to any possible real number before the computation. In other words, this seems to be true only if all possible real numbers are allowed a priori (just as in Turing machines an unbounded tape is necessary to carry out all recursive languages, and neural networks with rational numbers as weights do not compute the same set of functions as neural networks with whole numbers as weights, the first having been proven by Kleene to compute the same set as Turing machines and the second to compute the same set of languages as finite automata, the so-called regular languages).
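The encoding trick behind such results can be sketched in a few lines. The following is my own illustration (not Siegelmann’s construction): a single real weight in [0,1) can encode the membership bits of an arbitrary language, so reading digit n of its binary expansion answers “is n in the language?”, and p digits of precision answer the first p queries. A non-computable weight would thus encode a non-computable set; here a computable stand-in weight (2/3, encoding the even numbers) shows the mechanism.

```python
# Illustration: a real weight as an oracle; digit n of its binary expansion
# answers a membership query. Linear precision (p digits) answers p queries.
def bits_of(x, p):
    """First p binary digits of a real x in [0, 1)."""
    out = []
    for _ in range(p):
        x *= 2
        b = int(x)
        out.append(b)
        x -= b
    return out

# Stand-in weight 2/3 = 0.10101010..._2 encodes the set of even numbers.
membership = bits_of(2 / 3, 8)   # digit n answers "is n in the set?"
# membership == [1, 0, 1, 0, 1, 0, 1, 0]: evens in, odds out
```

The circularity critique mentioned above is visible here too: all the computational power resides in the weight assumed as given, not in the machinery that reads it.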
I exchanged ideas with some other interesting people, like an engineer from the Space and Airborne Systems Department of Raytheon. And I met John Nash Jr. during the gala dinner, at which he was presented with an award for his contributions to complexity theory, mainly honoring his work relating to game theory.