## Archive for the 'Foundations of Computation' Category

### Algorithmicity and programmability in natural computing with the Game of Life as in silico case study

Thursday, September 4th, 2014

In a previous article, I suggested a method for testing the algorithmicity of a natural/physical process using the concept of Levin’s universal distribution. In this new paper published by the Journal of Experimental & Theoretical Artificial Intelligence, I explain this method in the context of the problem formulated by Floridi concerning the testability of pancomputationalism. Then, I introduce a behavioural battery of programmability tests for natural computation, as an example of a computational philosophy approach. That is to tackle a simplified version of a complex philosophical question with a computer experiment. I go on to demonstrate the application of this novel approach in a case study featuring Conway’s Game of Life. I also briefly discuss another question raised by Floridi, concerning a grand unified theory of information, which I think is deeply connected to the grand unification of physics. You can access the paper for free here.

### Calculating a Universal Distribution to Approximate Kolmogorov-Chaitin Complexity

Wednesday, December 12th, 2012

Computing the incomputable has always been a challenge. For example, in finding the busiest Turing machines (Rado) given a number of symbols and states (whimsically called busy beavers). This means either finding Turing machines that, starting from an empty input, produce more non-blank symbols in their output tapes before halting than any other Turing machine of the same size, or Turing machines that, also starting from an empty input, have the greatest runtime before halting than any other Turing machine of the same size. Both problems are ultimately undecidable because of the Turing-complete capabilities of Turing machines, as proven by Alan Turing himself (that is, the capability of some Turing machines to simulate any other Turing machine).

In this new paper we describe how we have managed to calculate an approximation of a so-called Universal Distribution (aka Levin’s semi-measure) which connects the frequency of production of a string to its Kolmogorov complexity (K). The chief advantage of calculating an approximation of the Universal Distribution is that it is an incremental process over an average of a large number of Turing machines. One doesn’t get rid of the constant from the invariance theorem in the theory of algorithmic information theory (for example when Kolmogorov complexity is measured using 2 different universal Turing machines), yet one seems to have to make fewer arbitrary decisions.

One of the main advantages is that one can better deal with strings of very short lengths. Think about it! If one wished to approximate K for a single bit by using compression algorithms, the lossless compression algorithm would not be able to compress the single bit any further. And this not only happens for a single bit but for all strings up to a certain minimal length for which lossless compression algorithms are simply unsuitable (recall that a compression algorithm includes the decompression instructions together with the data in the new compressed object in order to make it self-decompressible).

The usefulness of lossless compression algorithms as a method for approximating K derives from the fact that compression is a sufficient test of non-randomness. This is because K is, more precisely than an uncomputable function, upper semi-computable, meaning that one can estimate upper bounds. The lossless compressed length of an object s (e.g. a string) is therefore an upper bound on K(s). The usefulness of the Coding Theorem Method (the theorem presented in this paper) will ultimately come down to whether it is useful in applications, which is the main motivation behind its conception, given the failure of compression algorithms to provide information about the possible K(s) values for s that are too short (shorter, say, than the typical size of the length of the instructions that a lossless compression algorithm has to add to the compressed object).

The state of the art of the Coding Theorem Method can be gleaned from this paper, recently made public by my colleagues and I, and announcing the release of the calculation of a universal distribution based on (5,2), that is, all Turing Machines with 5 states and 2 symbols: Calculating Kolmogorov Complexity from the Output Frequency Distributions of Small Turing Machines

It represents a major improvement over this previous calculation of mine, that required new and more powerful techniques to deal with a semi-computable distribution. It improves our previous work in terms both of accuracy and coverage of number of short strings and validates previous calculations of universal distributions, showing the incremental nature of the method to be fully compatible with the other calculated universal distributions with smaller samples of small Turing machines (and for which the known Busy Beaver values could be used).

In this other paper we explain why our approximations of K are real-number values, showing that strict integer-value program size follows our Coding Theorem Method, and thus that ours constitutes a more fine-grained measure. It is also shown that logical depth departs from both strict program-size and the Coding Theorem Method evaluations, being in agreement with the theory for all these 3 measures. The paper is available online at: Correspondence and Independence of Numerical Evaluations of Algorithmic Information Measures

In the next post I will be announcing and also briefly explaining the results from another paper showing that not only can our Coding Theorem deal with short strings, but that we have found a way to validate the method by lossless compressibility. Moreover, we have found that in the transition period, where the Coding Theorem Method starts to be too expensive to be of practical use whereas the compression method starts to provide some results, the 2 methods are in great agreement with each other. Like an expensive microscope of great power (e.g. the LHC at CERN), our Coding Theorem Method requires an incredible amount of calculation. The trick is to know when to use a microscope–to know when a microscope is required rather than a telescope. We believe we are providing the tools to deal with the information content of the smallest entities in the computational universe.

### Hypercomputation in A Computable Universe

Friday, September 21st, 2012

This is the full version of my answer to a question formulated by Francisco A. Doria to me included in the Discussion Section of A Computable Universe, my edited volume being published by World Scientific and Imperial College Press coming out next month (already available in Asia) concerning whether I think if Hypercomputation is possible:

I was once myself a hypercomputation enthusiast (my Master’s thesis–in French–was on “Hyper calcul”, focused on feasibility of hypercomputational models). Paradoxically it wasn’t until I had a good knowledge of it that I started to better appreciate and to increasingly enjoy more the beauty of the digital (Turing) model. On the one hand, hypercomputational models do not converge in computational power. There are a plethora of possible models of hypercomputation while there is only one digital (in terms of computational power). I find it astonishing how little it takes to reach the full power of (Turing) universality, and I don’t see the power of universal machines as being subject to any limitation, all the opposite. I happen to think the question has a truth value and is ultimately susceptible of a physical answer. However, I don’t think the answer will emerge in the foreseeable future, if it ever does.

Still, I found Doria’s contribution to the subject interesting, as he points out a particular (“ideal”) model of hypercomputation that I think gets around some of Martin Davis’ objections (The Myth of Hypercomputation), though it doesn’t fully address the problem of verification. Unlike a Turing computation, which can in principle be verified by carrying it out by hand, step by step the inner working of a hypercomputer can only be followed by another hypercomputer. The caricature version of the problem is in the whimsical answer given by the (hyper?)computer Deep Thought in “The Hitchhiker’s Guide to the Galaxy” by Douglas Adams, which proposed “42” as the “Ultimate (uncomputable?) answer to the Ultimate Question of Life, The Universe, and Everything” [added parenthesis]. The only way to verify such an answer would be by building another, more powerful and even less understandable computer. This makes me wonder whether we ought not to favour meaningful computation over what could potentially be hypercomputation, even if hypercomputation were possible.

There is a strong analogy to the concept of proof in math. Mathematical proofs seem to fall into two types. They either serve to convince of the truth of a statement one wasn’t certain was true, or else to provide logical evidence that a statement intuitively believed to be true was in fact so (e.g. the normality of the mathematical constant pi). But in the latter case, why would one bother to provide a proof of a statement that nobody would argue to be false? It is because ideally math proofs should provide insight into why a statement is true, and not simply establish whether or not it is so. There are a few exceptions of course. Some come from mathematical practice, for example Wiles’ proof of Fermat’s last theorem. It is not clear whether Wiles’ proof provides any insight into the original question of the truth value of Fermat’s theorem (it does contribute to understand the connection among different powerful mathematical theories). Some other cases, especially among computer automated proofs, are of the same kind, often neglecting the fact that a proof is also about explanation (for humans). In that sense I think we should also favour meaningful mathematical proofs (from meaningful mathematical questions!), just as we should better appreciate meaningful (digital) computation.

The study of infinite objects has given us great insight into profound and legitimate questions of math and computation, such as the question of the nature of what is a number or what is a computation. And it has been immensely useful to focus on limits of computation in order to better understand it. It is at the boundary between decidability and undecidability that one seems best positioned to answer the question of what does computation mean. And there are some examples where the study of a hypercomputational model is more valuable not as a model of hypercomputation but by its ancillary results. Unlike most people, I think the contribution of, for example, Siegelmann’s ARNN model, has much more to say about computational complexity (the ARNN model relativises P = NP and P != NP in a novel fashion) and therefore classical computation than about hypercomputation! (Siegelmann’s ARNNs model has come to be strangely revered among hypercomputation enthusiasts).

While I may agree that the problem of verification of a computation is not necessarily an argument against hypercomputation (because it can also be used against digital computation in practice), the answer is only in the realm of physics and not in paper models. As Davis points out, it is not a surprise that encoding non-computable numbers as weights in an artificial neural network deals to non-computable computation! so, assuming real numbers exist the consequence is straightforward and the justification of a hypercomputational model just circular.

Other models of hypercomputation are just fundamentally wrong, such as Peter Wegner’s model of “interactive computation” that is claimed to break Turing’s barrier. For a recent commentary pointing out the the flaws of yet another hypercomputational claim, you can consult this entertaining blog post by Scott Aaronson and his Toaster-Enhanced Turing Machine.

### Conjectures concerning Busy Beavers, Dynamic Behavior and Turing Universality

Friday, June 1st, 2012

In a recent paper I have advanced some conjectures using a coefficient that renders aspects of the qualitative behavior of complex systems in quantitative terms. It measures the sensitivity of a system to external stimuli, its apparent ability to (efficiently) transfer information from the input through the output.

In a previous paper, and in a quite different context, I proposed the conjecture that systems capable of (efficient) Turing universal computation had large coefficients (greater than 1), meaning that they are sensitive enough to external stimuli.

It is evident that a system that doesn’t react to its input is unlikely to be (efficiently) programmable, which is what Turing universality is about. Recall that Turing universality is the notion of a machine capable of simulating any other machine of the same type. More formally, if U is a universal Turing machine, anda Turing machine with input s, one can run U with input, that is>, and U will then behave like M with input s, where M and s are any Turing machine and any input.

In that paper I used this coefficient as a phase transition detector, and it so happened that the way in which it was defined made it eminently capable of classifying the behavior of cellular automata and other systems, even yielding Wolfram’s well-known classes of behavior. The paper is available online here.

Now I have advanced another set of conjectures, this time relating a set of Turing machines defined by the way they behave: Busy Beaver Turing machines. These are machines that do a lot of work before halting. More formally, they print more non-blank symbols and have the greatest runtime among machines of the same size (number of states and symbols). I have written a program showing all known and candidate Busy Beaver machines (in Mathematica, runnable on your browser by downloading the Mathematica CDF player):

Busy Beaver from the Wolfram Demonstrations Project by Hector Zenil

Let bb(n) be the Busy Beaver with n states (and 2 symbols). The conjectures say that:

1. (strong version): For all n > 2, bb(n) is capable of universal computation.
2. (sparse version): For some n, bb(n) is capable of universal computation.
3. (weak version): For all n > 2, bb(n) is capable of (weak) universal
computation.
4. (weakest version): For some n, bb(n) is capable of (weak) universal
computation.

They are based on the fact that Busy Beaver machines are complex, in Bennett’s sense. Bennett’s notion of “logical depth” is the idea of capturing the complexity of a string s in terms of the unfolding time of the shortest (or set of nearly shortest) programs producing the said string. The outputs of Busy Beaver Turing machines are therefore, by definition, the ones with the greatest logical depth. If they print more symbols than any other machine and take the greatest time to do so, they have maximal logical depth. Busy Beaver machines are also machines that produce their outputs with the minimum resources (if there were shorter machines producing the same outputs those would be the Busy Beaver machines), hence they are the shortest programs producing the strings. Busy Beaver machines are therefore by definition very deep (in Bennett’s definition of complexity), meaning they are complex in this sense (logical depth is also profoundly related to Kolmogorov complexity, given that the definition involves the shortest(s) programs).

Besides being deep, Busy Beavers have another characteristic: they halt. So they don’t just do their stuff without worrying about whether to stop. One can easily encode a “Busy Beaver” that loops and produces any arbitrary number of non-blank symbols say for example, an infinite loop printing 1s), but Busy Beavers don’t. They save a state to halt at the end. Machines that are not able to halt (the majority) cannot be Turing universal, because they wouldn’t ever produce any output, simply because by definition something is an output if the machine has halted. But machines that always halt would be decidable, and therefore not universal either.

So among the things to attempt to amount to evidence in favor of the positive answer of the conjectures of Turing universality for Busy Beaver machines, is to encode some input such that they don’t stop (and to prove so), as a first step towards a prove for universality. If Busy Beavers are capable of halting for some inputs (e.g. empty input, as this is entailed in the definition of a Busy Beaver) but are also capable of not halting for others, this would strengthen the conjectures. Busy Beavers may, however, be very different among them, as they are defined by behavior, rather than their specific inner workings, but if the behavior of a Busy Beaver in any way determines something that captures a property of all them, one could use the same argument in favor or against them towards proving or disproving these conjectures.

There seems no easy way to prove that all Busy Beavers are capable of universal computation, but such a proof, if it exists, would be the first to characterize a non-trivial set of well defined universal Turing machines. Non-trivial, because of course one can build a universal Turing machine and keep adding resources and proving that the resulting machines are all universal, just as happens with Turing-complete programming languages. But Busy Beaver machines would be characterized by nothing but a qualitative behavior. This makes the assertions of the conjectures more interesting.

It could be the case that Busy Beavers are not capable of Turing universality for blank tapes. In fact, Michel Pascal, a world expert in the field of Busy Beavers, thinks that Busy Beavers may be capable of –weak– universal computation (personal communication), that is, starting from non-blank tape configurations (e.g. a periodic or bi-periodic tape configuration). This accords with another conjecture in Computer Science, the Collatz sequence. The conjecture says that whichever number you start with, if n is even, divide it by 2 to get n / 2. If n is odd, multiply it by 3 and add 1 to obtain 3n + 1; you will end always end up reaching 1. I have also written 2 small computer programs (also in Mathematica) illustrating this conjecture:

Collatz Sequence Paths from the Wolfram Demonstrations Project by Hector Zenil

Collatz Sequence Maximums from the Wolfram Demonstrations Project by Hector Zenil

According to Michel, Busy Beavers would be related to Collatz-type functions (see his paper “Small Turing machines and generalized busy beaver competition“), with conjectures that are related but more general (in the context of small Turing machines) to the ones I’m talking about.

Today, we know that Turing universality requires very little. For example, Wolfram conjectured that a 2,3 Turing machine was universal, on the basis of the aspects ot its behavior that he studied. And as proven by Alex Smith, this 2,3 TM turns out to be capable of universal computation. The machine is surprisingly small, and indeed is the smallest it can possibly be, given that we know that no 2,2 Turing machine can be capable of Turing universality (proof recently provided by Maurice Margensten in a Complex Systems paper). Today we also know many minimialistic systems capable of universal computation: Conway’s game of Life, Wolfram’s Rule 110, many Post-tag systems and a variation called Cyclic tag systems, Wang tiles. And at least 2 important researchers in the field have recently argued for the ubiquity of Turing universality (Martin Davis and Maurice Margenstern myself, along the lines anticipated by Wolfram’s Principle of Computational Equivalence).

Finally, I advanced a last conjecture relating Busy Beavers and my coefficient C in this paper, establishing that:

C(bb(n)) > 0

That is, that Busy Beavers have a positive coefficient C, i.e. are quite sensitive to input stimuli, which, if the very first conjecture is true (that Turing universal machines have a large coefficient) makes this last coefficient a good place to start to make the case for the universality of Busy Beavers.

The paper “On the Dynamic Qualitative Behavior of Universal Computation” at ArXiv and has been recently published in the Journal of Complex Systems (vol. 20-3) also available online here.

### Turing’s Deep Field: Visualizing the Computational Universe

Thursday, January 5th, 2012

I generated this image in the course of an investigation of the distribution of runtimes of programs in relation to the lengths of mathematical proofs, the results of which are being published in my paper bearing the title “Computer Runtimes and the Length of Proofs with an Algorithmic Probabilistic Application to Optimal Waiting Times in Automatic Theorem Proving” Volume 7160 of the Lecture Notes in Computer Science series (LNCS), a festschrift for Cristian Calude.

The preprint is now available online in the ArXiv here. The paper shows that as theoretically predicted, most machines either stop quickly or never halt. It also suggests how a theoretical result for halting times may be used to predict the number of random theorems (dis)proven in a random axiom system –see the section I’ve called “Gödel meets Turing in the computational universe”.

Turing View of The Computational Universe

The plot was the winning image in this year’s Kroto Institute Image Competition, in the Computational Imagery category titled “Runtime Space in a Peano Curve”, it shows the calculation of the distribution of runtimes from simulating (4n+2)^(4n) = 10000 Turing machines with 2 symbols and n=2 states (of a total of more than 10^12 simulated Turing machines with up to n=4 states) following a quasi-lexicographical order in a Peano curve preserving–as far as possible–the distance between 2 machines arranged in a 2-dimensional array from a 1-dimensional enumeration of Turing machines.

In the image each point or cluster of points represents a Turing machine or a group of Turing machines, and the color is determined by a spectrum encoding their halting runtimes–the lighter the square the sooner the machine entered the halting state. White cells represent machines that are proven to halt in infinite time. Red cells show the Turing machines that take longer to halt (popularly called Busy Beavers). Knowing the values of the Busy Beaver functions allows us to identify the machines that never halt (depicted in white). The image is noncomputable, meaning that the process cannot be arbitrarily extended because of the undecidability of the halting problem (i.e. there is no procedure for ascertaining the color of the following pixels to zoom out the picture and cover a larger fraction of the computational universe). Put it in the words of crystilogic,

What you’re looking at is a subset of descriptions of all possible things, and some impossible things. This is possibility and reality compressed into an image.

Turing machines with an arbitrary number of states can encode any possible mathematical problem and are therefore perfect compressors of the known, the yet to be known, and even the unknowable (due to the undecidability of the halting problem).

This is how the image looks like as displayed in the stairway of the Kroto Research Institute in the UK:

Some postcards with the winning image were printed and they can be sent to scholar or enthusiasts upon request sending an email to hectorz[at] labores.eu.

I want to dedicate this prize to my former thesis advisor Jean-Paul Delahaye who suggested me the Peano arrangement as a packing for my visual results of halting times. And also to Cristian Calude who co-advised me all along my PhD thesis and who encouraged me to publish this paper and what better place to do so than for his festschrift.

I’m releasing the images under an open licence in honour of the Turing Year, so that it may be used for any artistic illustration of some of Turing’s main results (the halting problem) or for any other purpose in any medium.

Postcards of the winning image are also available upon request. Just send an email requesting 1 or more postcards to hectorz [at] labores.eu or to let him know (if you can) that you will be using any of the images (or if you need better resolution versions).

### An alternative method (to compression) for approximating the algorithmic complexity of strings

Wednesday, July 20th, 2011

The method introduced in my doctoral dissertation was featured in the French version of Scientific American Pour La Science in its July 2011 issue No. 405 under the title Le défi des faibles complexités.

Jean-Paul Delahaye points out that:

Comme les très petites durées ou longueurs, les faibles complexités sont délicates à évaluer. Paradoxalement, les méthodes d’évaluation demandent des calculs colossaux.
(Like long durations or very short lengths, weak complexities are tricky to evaluate. Paradoxically, the evaluation methods require colossal calculations.)

and he continues:

Pour les petites séquences, cette mesure est stable et conforme à notre idée de la complexité; pour les grandes, elle est, d’après le théorème mentionné [the invariance theorem, my comment], conforme à la meilleure mesure de complexité unanimement admise, la complexité de Kolmogorov. Que demander de plus?
(For short strings, this measure is stable and conforms to our idea of complexity; for long strings, according to the aforementioned theorem [the invariance theorem– my comment], it conforms to the best and universally accepted measure of complexity, Kolmogorov complexity. What more can one ask for?)

Imagine you are given 2 short strings and are asked to tell which one looks more random and which more structured. The strings are 0101 and 1011. Applying the idea of algorithmic complexity, the shorter the description the less random-looking. It would seem that the first string has a repeating pattern that can be taken advantage of in describing it. In plain English one may therefore say that the first string can be described as “zero and one twice” while the second one would require a longer description. In fact, there seem to be fewer descriptions for the first string than for the second (and also notice some ambiguity in plain English). The first may also allow the description “A zero followed by a one followed by a zero followed by a one” or “zero and one followed by the same” and perhaps other variations. Descriptions of the second one can include “one and zero followed by two ones” or “one zero one one,” just as the first one could simply be “zero one zero one,” which doesn’t look like a compressed version of the string but rather a straight translation into an expanded form of plain English.

All this by way of asking whether any of the two strings is without a doubt simpler than the other, or whether the apparent repetition in the first string makes us think that the string has a pattern despite the pattern being repeated only twice. Perhaps when one looks at such a string one gets the impression that it may belong to a larger string comprising alternations of 01 and one concludes that it is simpler than the second one. To leave the subjective realm, one would need to evaluate the algorithmic complexity of the strings and compare their respective values. The algorithmic complexity of a string is the shortest program (measured in bits) producing th string in question running on a universal Turing machine. It is inconvenient that there is no algorithm that, given a string, gives you the length of the shortest program that produces it. This is by reduction to the halting problem. Which means one cannot really measure with absolute certainty the algorithmic complexity of a string because it is uncomputable. It doesn’t mean, however, that one cannot approach it; one can often do so in an effective and useful way.

Traditionally, the way to approach the algorithmic complexity of a string was by using lossless compression algorithms. Lossless means that one can recover the original string from the compressed version. The use of lossless algorithms is preferable because there may be inconsistent ways of compressing strings that give the impression of compressing a string without corresponding to a measure of complexity (e.g., noise deletion is an effective way, but noise may be algorithmically random and the aim of algorithmic complexity is to precisely measure how random something is and not merely to gauge whether it looks random). The result of a compression algorithm is an upper bound of its algorithmic complexity. While one cannot ever tell when a string is not compressible, if one succeeds at somehow shortening a string one can tell that its algorithmic complexity cannot be larger than the compressed length.

One does not want to cheat and claim that one can compress any string into a bit if the decompression algorithm interprets that bit into the desired string. A fair compression algorithm can be defined as one that transforms a string into two pieces: one is the compressed version and the other the instructions to decompress the string, together accounting for the final length of the compressed version. In other words, it would seem as if you were adding the decompression algorithm to the compressed string so that the compressed string comes with its own decompression instructions. In the long run, there is a theorem (invariance) that guarantees that complexity values converge.

But for short strings (which are often the ones useful for practical applications), adding the decompression instructions to the compressed version makes the compressed string often, if not always, longer than the string itself. If the string is, for example, shorter than the size of the decompression algorithm, there will be no way to compress the string into something shorter still, simply because the decompression instructions are at least of the length of the original string. Moreover, the result is so dependent on the size of the decompression algorithm (because it is the greatest contributor to the overall length) that the final value of the compressed length is too unstable under different lossless compression/decompression algorithms.

For example, if one tries to compress a short string using Mathematica, one gets the following results:
StringLength@Compress[“0101”] = 30

Looking at the beginning of the compression line when plotting the lengths of strings (x axis) against their compressed lengths (y axis) one observes that it does not start at y=0 even when x=0.

This means that compressing the string 0101 with 4 bits took 46 characters (even more in bits). In Mathematica, strings begin to be compressed this way at about length 30. This is not a malfunction of Mathematica; it is the result of what I have explained. The Mathematica Compress function is actually based on the Deflate lossless compression algorithm, which is a combination of the LZ77 algorithm and Huffman coding, among the most popular lossless compression algorithms available, used in formats like ZIP, GZIP, GIF and PNG (these last two are therefore lossless compression image formats).

Zooming into the axis origin of the plot one can see that the beginning looks unstable. The string in question is a repetition of n 1s with n lying on the x axis (which represents different compression lengths rather than individual ones to avoid repetition and a step-like curve)

If one tries to compress 1011 one gets none other than the same value:
StringLength@Compress[“1011”] = 30

The instructions obviously take up some space in the final compressed length and they cannot be compressed themselves (if they were, they would in any case be the same for all strings, taking us back to the same situation). There is a limit for compression algorithms to compress short strings. So if one wished to tell which of these two strings were objectively more or less randomly complex by approximating their algorithmic complexity using a compression algorithm, it turns out that there is no way to do so.

On the other hand, given that the definition of algorithmic complexity based on compressibility says that the less compressible the more randomly complex a string, one could immediately say that a single bit, 0 or 1, is random for certain, i.e. has maximal algorithmic complexity, given that there is no way to further compress a single bit. In other words, there is no program shorter than 1 bit that can produce 0 or 1. The shortest descriptions of 0 and 1 are therefore 0 and 1 themselves. Hence 0 and 1, according to the compressibility approach, are random strings. It may seem to make no sense to characterize a single bit as random.

On the one hand, a single bit does not carry any information and on these grounds one may think of it as somehow random. If one thinks about whether one would have been able to predict 0 or 1 as the outcome of a process, given that there is no context because they occur alone, one may also conclude that their occurrence is somehow random. In other words, if you see a string like 010101 you may bet that the next bit is 0, but if you are provided with nothing there is no way you can favor any position on whether the bit to come is 0 or 1. So much for justifying that a single bit is random.

It is hard, however, to justify how 0 could look more random than, say, any other possible string. If 0 is random how is it relatively more complex than 00? Or 01? Intuition tells us that short strings shouldn’t be that random (more random than, for example, longer random-looking strings), so if a single bit is the most random among all finite strings, how could it be that there is such a phase transition from maximal random complexity to very low complexity of, say, strings of length 2, 3 or 5 bits long?

Since intuition tells us that something random should also be uncommon and rare, what if one asks how common 0 or 1 are as results of a program? There is a measure that gives the probability of a program’s producing a given string running on a universal Turing machine. This is a measure we used to present a new method to evaluate the complexity of strings, as an alternative to the traditional use of compression algorithms. The new method aims particularly to solve the problem of the evaluation of the complexity of short strings, as we’ve discussed. It is based on Levin-Solomonoff’s algorithmic probability and is connected back to algorithmic (Kolmogorov-Chaitin) complexity by way of Chaitin-Levin’s coding theorem.

Algorithmic probability says that it is not the case that a single bit is the most complex random string, but actually the most structured possible one and, more importantly , that the complexity transition is smooth, more in accordance with intuition.

It may be that it makes sense that a single bit can be regarded as both the most simple and the most complex of strings from different perspectives, and the advantage of the algorithmic probability approach is that it provides not only a different notion of the complexity of a single bit, one that is in keeping with intuition, but also that it generates a different outcome to the compressibility approach, even when the two measures are intimately related and asymptomatically produce the same results in the long term (for longer strings). I think the two views reflect different aspects of what a single bit represents.

The paper presenting the novel method for evaluating the algorithmic complexity of short strings was first proposed and sketched in Greg Chaitin’s 60th anniversary festschrift edited by Cris Calude (J-P. Delahaye & H. Zenil, “On the Kolmogorov-Chaitin complexity for short sequences,” Randomness and Complexity: From Leibniz to Chaitin, edited by C.S. Calude, World Scientific, 2007). The method uses an exhaustive and systematic search of Turing machines inspired by Wolfram’s NKS dictum, from which a frequency distribution of the halting machines is built and the Levin-Chaitin coding theorem applied to evaluate the algorithmic complexity of binary strings.

Chaitin pointed out (regarding our method) that:

…the dreaded theoretical hole in the foundations of algorithmic complexity turns out, in practice, not to be as serious as was previously assumed.

The full technical article is available in ArXiv Numerical Evaluation of Algorithmic Complexity for Short Strings: A Glance into the Innermost Structure of Randomness.

You can also look at the slides of the presentation I delivered at the Alan Turing amphitheater at the Computer Science Department of the University of Lille 1:

### “The World is Either Algorithmic or Mostly Random” awarded a 3rd Place Prize in this year’s FQXi contest

Friday, June 10th, 2011

Based on the combined ratings of the contest community and the panel of expert reviewers appointed by the FXQi, which included the members of the institute, I was awarded a 3rd Place Prize for my work The World is Either Algorithmic or Mostly Random in this year’s FQXi contest on the topic Is Reality Digital or Analog? sponsored by the Foundational Questions Institute. The winners were announced at this year’s World Science Festival in New York City.

My work can be summarized in one line as an explanation of the complexification process of the world, the process whereby we have evolved from a more primitive (random) state to the current organized state of the universe.

The essay is a summary of the philosophical branch of my current scientific research on finite algorithmic information theory. This philosophical branch is concerned with the exploration of the possible connections between algorithmic complexity and the physical world (or what happens in it). I propose the notion that the universe is likely digital, not as a claim about what the universe is ultimately made of but rather about the way it unfolds. Central to the argument are concepts of symmetry breaking and algorithmic probability, which are used as tools to compare the way patterns are distributed in our world to the way patterns are distributed in a simulated digital one. These concepts provide a framework for a discussion of the informational nature of reality. I argue that if the universe were analog, then the world would likely look more random, making it largely incomprehensible. The digital model has, however, an inherent beauty in its imposition of an upper limit and in the convergence in computational power to a maximal level of sophistication. Even if deterministic, that the world is digital doesn’t necessarily mean that the world is trivial or predictable, but rather that it is built up from operations that at the lowest scale are simple but that at a higher scale look complex and even random–though in appearance only.

How have we come from the early state of the universe (left) to the structures we find today (right)?

The arguments supporting my views are partially based on the findings of my research, epitomized by our most recent paper Numerical Evaluation of Algorithmic Complexity for Short Strings: A Glance into the Innermost Structure of Randomness available in ArXiv in which my co-author and I describe a method that combines several theoretical and experimental results to numerically approximate the algorithmic (Kolmogorov-Chaitin) complexity of bitstrings by using the concept of algorithmic probability, which is connected to algorithmic complexity by way of the (Levin-Chaitin) coding theorem.

An extended (and detailed) version of The World is Either Algorithmic or Mostly Random is forthcoming and will be eventually posted.

### Compression-based Investigation of Cellular Automata, A Phase Transition Coefficient and a Conjecture Related to Universal Computation

Sunday, August 22nd, 2010

In my new article Compression-based investigation of the dynamical properties of cellular automata and other systems, published in the Journal of Complex Systems (19:1), pages 1-28, I present a method for studying the qualitative behavior of cellular automata and other abstract computing machines based on the approximation of their program-size complexity using a general lossless compression algorithm. I show that the compression-based approach classifies cellular automata (CA) into clusters according to their heuristic behavior, with these clusters showing a correspondence with Wolfram’s main classes of systemic behavior. I also present a Gray code-based numbering scheme for initial conditions optimal for this kind of investigation, and a compression based method for estimating a characteristic exponent in the form of a phase transition coefficient measuring the resiliency or sensitivity of a system to its initial conditions. And I conjecture that universal systems have large transition coefficients.

I think this constitutes a novel framework for investigating the dynamical properties of cellular automata and other systems. Here I will discuss some of the main ideas and implications of the paper. A pdf version of the paper is available online on ArXiv

Algorithmic complexity: classification into Wolfram’s four classes

In A New Kind of Science and in several papers dating from the mid-1980s, Stephen Wolfram defined four classes into which cellular automata (CA) and several other systems evolve, each capturing a different qualitative behavior which evolves from the same ordered initial configuration (the simplest black cell).

• Class 1 consists of CA whose dynamics reach a steady state regardless of the initial conditions.
• Class 2 consists of CA whose long-term evolution produces periodic or nested structures.

These first two classes are simple, in the sense that their long-term evolution can be deduced from running the system for a small number of steps.

• Class 3 CA produce structures that seem random.
• Class 4 CA produce localized structures on a random looking background, and hence are complex looking.

Wolfram’s classification is heuristic, and the assignment of CA to the four classes is somewhat subjective. To the best of my knowledge, there is, to date, no universally agreed upon classification scheme for CA. I think, however, that there is no better approach than a pure program-size complexity measure, which is the approach I follow in this paper. I propose to continue the investigation later, using the same measure to discover other interesting properties and possible hierarchies.

An interesting question related to Wolfram’s classification concerns its dependence on the initial condition–chiefly because the classification was originally meant to be constructed by visual inspection over the evolution of a CA, and as we know, the evolution of a CA depends on its initial condition. This has been a major critique (Eppstein) of Wolfram’s classification, because the expectation is that the classification should be based on the evolution from an unordered (random) configuration.

Nevertheless, the classification is actually based on the potential of a CA to evolve into any of the possible behaviors from at least one initial configuration (the question is of course not answerable in finite time, since there is an infinite number of possible initial configurations). Wolfram’s classification may therefore be seen as being dependent on the initial condition of a CA.

It is not a surprise that one can, for example, construct a CA belonging to more than one of Wolfram’s four classes when starting from different initial configurations. Rule 110 belongs to class 4 because it is capable of universal computation–one can set up an initial configuration to ‘program’ rule 110 to carry out any computation (this being the very basic concept of a programmable computer).

For every CA rule there is a definite (but in general undecidable) answer to the question whether or not it is capable of universal computation (or in reachability terms, whether a CA will evolve into a certain configuration). The question only makes sense if the evolution of a CA depends on its initial configuration. No rule can be universal that fixes the initial configuration once and for all (there would be no way to input an instruction and carry out an arbitrary computation).

On the other hand, some rules, such as Rule 0, don’t produce different configurations relative to variant initial configurations. No matter how you may change the initial condition, there is no way to make it compute something other than what it actually computes for every other initial configuration.

A possible objection (made by David Eppstein) is that there are CAs that can be made to look as if they belonged to all classes by modifying their initial conditions. Which is true: a CA may belong to a certain class until, given another initial configuration, it is made to behave as if it belonged to another class.

My compression-based approach shows that Wolfram’s heuristic classification can actually be quantified by a measure which is clearly dependent on the initial conditions, while also being capable of detecting sensitivity to initial configurations, and hence of replacing the visual inspection. This hierarchical classification is well defined and is therefore a good candidate for a complexity measure.

The second part of my investigation actually takes advantage of the ability of CAs to behave differently in order to undertake a closer inspection and a novel classification, taking into account the average capability of a system to behave in different ways.

Differentiation from a priori approaches

The approach is different from others in that it is an a posteriori technique, unlike, for example, Langton’s lambda, a measure of the density of a CA rule. It is an a posteriori technique because unlike this lambda number, the compression-based approach requires the CA to evolve before saying anything about it, whereas Langton’s lambda is computed from the rules of the CA.

Langton’s lambda is simply the fraction of rules in which the new state of the cell is non-zero. For example, the rules of the elementary cellular automaton number 1 in Wolfram’s enumerating scheme are simple: all 3-tuples sent to 0 but one, the last one. Therefore Langton’s lambda is 1 over 8.

The lambda parameter of a CA is a number between 0 and 1. For example, if lambda is 0 (e.g. for ECA rule number 0), the evolution develops into a trivial state. Langton found that CA rules with lambda close to zero evolve into trivial states and CA rules close to 1 evolve into random-looking behavior, with complex behavior somewhere in between. It is near this transition that the most interesting CAs will lie, the ones that manifest the most complex behavior.

Unfortunately, classifying CAs with lambda values is more complicated than that, as one quickly faces undecidability. If it were possible to decide once and for all whether a CA is complex by computing its lambda value, without having to run the CA for a single step, one could solve all kinds of undecidable questions simply by looking at a CA’s rules.

The critical value for lambda is not a universal constant. Nor, for that matter, is my phase transition coefficient. But the main difference, as noted before, is that the compression-based approach actually looks at the evolution of the system rather than trying to figure everything out from the description of the CA.

The compression-based method represents a formal approach to Wolfram’s classification process, replacing the need for a visual inspection with a technique and a coefficient to determine to what class a CA belongs. The approach is compatible with what Stephen Wolfram himself has proposed in his NKS book, without contradicting any computability result or Wolfram’s own Principle of Computational Irreducibility, which says that while some computations may admit shortcuts that allow them to be performed more rapidly, others don’t, so that one cannot really tell what the evolution of a CA will be, except by running it.

Initial configuration numbering scheme

Ideally, one should feed a system with a natural sequence of initial configurations of gradually increasing complexity. Doing so ensures that qualitative changes in the evolution of the system are not attributable to discontinuities in its set of initial conditions.

What I propose is an initial configuration numbering scheme where two successive values differ by only one bit. To explore the qualitative behavior of a cellular automaton when starting from different initial configurations, the optimal method, because of its smoothness, is to follow this Gray encoding enumeration, in order to avoid any undesirable “jumps” attributable to the system’s having been fed with discontinuous initial configurations. By following the Gray code, an optimal numbering scheme was devised so that two consecutive initial conditions differed only in the simplest degree (by one bit). This simple convention will allow us to continue the investigation of the dynamical behavior of a system without being concerned about arbitrarily introducing artificial complexity when moving from one initial configuration to the next.

Phase transition detection and coefficient

I defined a measure based on the change of the asymptotic direction of the size of the compressed evolutions of a system for different initial configurations (following the proposed Gray-code enumeration of initial configurations). My phase transition coefficient yields an interesting classification: it measures the resiliency or sensitivity of a system to its initial conditions. So rules such as 30 and 0 appear close to each other. Odd as this may seem, this is because both, when their initial conditions are changed, behave in more or less the same way. In other words, there is no change in the qualitative behavior of these CAs when feeding them with different inputs, regardless of how different the inputs may be.

In this phase transition classification, for example, rules such as 122 and 89 appear next to each other, because, as the investigation proves, they are both CAs with relatively high phase transition coefficients, meaning that they are very sensitive to initial conditions, dramatically changing their qualitative behavior when starting from one rather than another initial configuration.

Phase transition and predictability

An obvious feature of universal systems is that they need to be capable of carrying information by reflecting changes made to the input in the output. In attempting to determine whether a system is capable of reaching universal computation, one may ask whether a system is capable of this in the first place, and how efficiently it does so. And this is what the phase transition actually measures, because it tells how well a system manages to respond to an input. Obviously, a system such as rule 0 or rule 255, which does not change regardless of the input, is trivially decidable. But a universal system should be capable of some reaction to external manipulation (the input to the system). The inverse, however, should not hold, because having a large transition coefficient by no means implies that the system will behave with the freedom required of a universal system if it is to emulate any possible computation (a case in point may be rule 22, which, despite having the largest transition coefficient, may not be versatile enough for universal computation).

The phase transition measure also implies that one may be able to predict the behavior of a system for an initial configuration with a degree of certainty based on the previous variability of the system in relation to an initial segment of initial configurations. It is an open question whether this is a lower bound. In other words, it is unclear whether looking at the behavior of a system for a certain length of time and for certain configurations will tell you anything about its behavior for other initial configurations or if it were allowed to evolve for a longer period of time. Experience says one would do well to predict future behavior on the basis of past behavior, and this may also be related to Israeli and Goldenfeld‘s very interesting findings. In 2004 they showed that some computationally irreducible elementary cellular automata have properties that are predictable at a coarse-grained level. They did so following a renormalization group (RG) technique, which refers to a mathematical apparatus that allows one to investigate the changes in a physical system as one views it at different distance scales.

They were able to provide a hierarchy of complexity in agreement with their apparent complexity. Israeli and Goldenfeld’s classification is also an a posteriori investigation, but it too is bedeviled by the unavoidable (and ultimately undecidable) statistical question, namely, whether one can keep on predicting for all initial conditions and for any number of steps, without having to run the system  forever and for all possible initial conditions. But unlike mine, their classification is partial, in the sense that one cannot always say whether the complexity of one CA is greater than that of another. Based on the comparison of their approximated compressed sizes, however, I could come up with a total order in full agreement with their apparent complexity as well as with Wolfram’s four classes (one that even yields the same number of classes).

An open question directed to me by Nigel Goldenfeld– in trying to understand the similarities and differences between their RG and my algorithmic approach–concerns how their hierarchy relates to mine. What they experimentally suggest is that the larger the scale of the transformations used, the more highly structured the objects and therefore the greater their algorithmic complexity.

For example, rule 110 is one rule about which my own phase transition classification says that, despite showing some sensitivity, it also shows some stability. Which means that one can say with some degree of certainty how it will look (and behave) for certain steps and certain initial configurations, unlike those at the top. This turns out to be predictable according to Israeli and Goldenfeld as well, at a coarse-grained level, after a scale transformation.

A sketch on a conjecture on the transition coefficient of Turing-universal systems

Based on two empirical facts, I conjecture that universal CAs should have a large transition coefficient, as 110 does. Rules such as 110 and 54 also turn out to be next to each other in the phase transition classification, both having large values (they are in the top 24, and both the top 24 and the bottom 22 are in the original paper: see the last 2 tables).

So I base this conjecture on the following empirical observations:

1. The only known universal elementary cellular automata figure at the top of this classification, and so do candidates such as rule 54, which appears right next to rule 110.

2. Universality seems to imply that a system should be capable of being controlled by the inputs, which our classification suggests those at the bottom are not, as all of them look the same no matter what the input, and may not be capable of carrying information through the system toward the output.

Other rules that some people think may be capable of some sophisticated computation (See paper in the Journal of Complex Systems by Paul-Jean Letourneau) also have large transition coefficients, such as rule 146, with a transition coefficient 0.124, ranking 39 out of the 256 elementary cellular automata.

As noted before, however, the transition coefficient is not a universal measure. In this case, coefficients were calculated for 600 steps in blocks of 50 for the first 500 initial conditions, which means that some rules may be capable of greater transitions but are somehow ‘slow’ at the selected number of steps and number of initial conditions. i.e. they take more than 500 initial conditions–in Gray code numbering–to show larger changes, or else larger changes are being missed because of the jumps in blocks of 50 steps, though this latter possibility is less likely.

The conjecture also seems to be in agreement with Wolfram’s claim (made at some of his oral presentations) that rule 30 (as a class 3 elementary cellular automaton) may be, according to his Principle of Computational Equivalence (PCE), computationally universal. But it may turn out that it is too difficult (perhaps impossible) to control in order to perform a computation, because it behaves too randomly.

### Comments on Turing’s very first Universal machine approaching Turing’s 100th. birthday anniversary

Tuesday, May 18th, 2010

The idea that a machine could perform the tasks of any other machine is the description of a Universal (Turing) machine. Its invention is considered by many to have been one of the major landmarks giving rise to the field of computer science. ‘Universal’ means that one can ‘program’ a general-purpose machine to perform the tasks of any specific-purpose machine. Turing machines are to this day the central object of study in the theory of computation.

In an attempt to understand how the very first universal machine was described in Turing’s original 1936 paper “On Computable Numbers, with an Application to the Entscheidungsproblem,” I spent half a day re-reading the paper (and its corrected version by Donald Davies, published in The Essential Turing by Jack Copeland), trying to decode it, only to find that it is written in something like an anticipation of a (Turing-complete) subroutine-oriented programming language which is impossible to rewrite in a traditional transition table. So if one really tries hard, one ends up encoding an arbitrary universal Turing machine, not Alan Turing’s first universal Turing machine.

Although the paper has all the primary elements of a traditional description of a Turing machine (Turing’s ‘a-machine’ description), the fact that it used multiple conventions for describing increasingly complex machines was Emile Post’s strongest critique. In a letter to Church, Turing replied to Post’s appreciation, arguing that his use of more than one convention when building his universal machine did not affect the final result, though he did admit it made it hard to decipher.

The result is that not only would it be a giant machine in terms of states and symbols, but the number of actual colors that it may need seems to be unknown. It is only known to assume 18 states, and to simulate Turing’s second a-machine example with 23 instructions (here the product of states and colors does not necessarily lead to the number of instructions in Turing’s formalism, because his transitions are not total functions).

In the traditional 5-tuple form, Turing’s original universal machine could be written as (Mathematica notation):

{q1, “blank”} -> {q2, P0, R}
{DA, D} -> {DAA, DC, R}
{q2, “blank”} -> {q3, E, R}
{DAA, D} -> {DAAA, D, R}
{q3, “blank”} -> {q4, P1, R}
{DAAA, D} -> {DAAAA, DCC, R}
{q4, “blank”} -> {q1, E, R}
{DAAAA, D} -> {DA, D, R}

But notice that no state starting by q leads to a state starting by D because D (states are called m-configurations in Turing’s original jargon) are rather prefix subroutines defined in Turing’s paper while q’s are actually traditional Turing machine states. In Turing’s paper E is, for example, an erasing subroutine. Some other ‘m-configurations’ require scanning the whole tape several times (which is what one would do if one is asked to emulate another Turing machine), and so on. So most of the behavior description of the machine is encoded as strange strings of letters.

Nevertheless, Turing’s choice is somehow clever from a programmer perspective, he proceeded in the way one would do so today for designing an implementing a universal computer. One would hardly do so by describing the basic elements, but rather by constructing higher level subroutines describing a machine function and based itself in one or more other subroutines up to the level of states and symbols. Think of programming at the level of the machine language v. programming in an intermediate level language. Writing a universal Turing machine in detail in terms of states and symbols from the beginning leading to complete lost and misunderstanding, just as it would so if one pursuits the implementation of a complex piece of software writing binary code or even in a pure assembler language. Turing’s description provides a better understanding, not trivial though, of what a universal Turing machine does to carry out the computation of any other Turing machine by sketching and grouping intuitively the machine operations into these program subroutines.

Unlike today, that one can simply make Wolfram|Alpha to run any Turing machine simulation such as a random 2-state 5-color Turing machine or the 4-state 2-color Busy Beaver (for a list of other Turing machine examples one can type in WolfraAlpha click here), Turing had no computer to ran and test his code, it is not a surprise that his universal machine code came together with several glitches. It was quite amusing to see that the first ever program written for a digital computer was already bedeviled by bugs. And this program was the actual implementation of the very first universal Turing machine by Alan Turing himself.

If Turing had not died in 1954, at the age of only 41, next June 23 (2010) he would have 98 years old. As a tribute to his work I’ve set up the following webpage gathering most, if not all, the public images known of him visualturing.org

For his 100th birthday anniversary a series of events are being organized. 2012 will not only be the year of the Olympic Games that Turing would have particularly followed in his own country (UK) as an enthusiast long distance runner but also The Alan Turing Year to which I’m honored to be part of as a member of the advisory committee representing Wolfram’s Science Group.

### Physics-like computation, Wolfram’s PCE and Church’s thesis

Wednesday, January 7th, 2009

The lack of correspondence between the abstract and the physical world seems sometimes to suggest that there are profound incompatibilities between what can be thought and what actually happens in the real world. One can ask, for example, how often one faces undecidable problems. However, the question of undecidability has been considered to be better formulated (and understood) in computational terms because it is closer to our physical mechanical reality (through the concept of computation developed by Turing): Whether a computing machine enters a certain configuration is, in general, an undecidable question (called the Halting problem). In other words, no machine can predict whether another machine will halt (under the assumption of Church’s thesis –aka Church-Turing thesis).

An interesting example of the gap between what the abstract theory says and what can be empirically ascertained was recently suggested by Klaus Sutner from Carnegie Mellon. He rightly points out that if no concrete instance is known of a machine with an intermediate Turing degree, and consonant with Wolfram’s Principle of Computational Equivalence, is because intermediate degrees are artificial constructions that do not necessarily correspond to anything in the real physical world.

In David Deutsch‘s words physics is at the bottom of everything and therefore everything relies on physics (ultimately on quantum physics according to Deutsch himself). This is true for the core objects of study and practices in math: proofs, and in computer science: computer programs. At the end, that they are and how they are, is only possible by what it is feasible in the physical world. As if sometimes it were forgotten that mathematics and computation also follow in practice the same laws of physics than everything else.

Sutner defines what he calls “physics-like” computation and concludes that machines with intermediate Turing degrees are artificial constructions unlikely to exist. According to Sutner, in practice machines seem to follow a zero-one law: either they are as computationally powerful as a machine at the bottom of the computational power hierarchy (what Wolfram empirically calls “trivial behavior”) or they are at the level of the first Turing degree (i.e. capable of universal computation). This seems to imply, by the way, that what Wolfram identifies as machines of  equivalent sophistication cannot be other but capable of universal computation, strengthening the principle itself (although one has to assume also Church’s thesis, otherwise PCE could be referring to a higher sophistication).

So is PCE a conflation of Church’s thesis?

No. Church’s thesis could be wrong and PCE be still true, since by the negation of Church’s thesis the upper limit of the feasible computational power would just be shifted further, and even if it turns out that the hypothesis of a Turing universe is false, PCE could be still true disregarding whether the universe is of a digital nature or not since it would refer then to the non-Turing limit as the one holding the maximal sophistication (not that I think that C-T is false though).

Is PCE tantamount to the Church thesis in the provable sense?

Wolfram’s PCE would be still falsifiable if the distribution of the intermediate degrees is proven to be larger than what informally PCE suggests. However, so far that hasn’t been the case and there are nice examples supporting PCE suggesting that very simple and small non-trivial programs can easily reach universal computation. Such as recent (weak) small universal Turing machines discovered by Neary and Woods and particularly the smallest TM proven universal by Alex Smith (a 2-state 3-color machine that Wolfram conjectured in his NKS book). However PCE could be as hard to prove or disprove as the Church thesis is. Unlike Church’s thesis PCE could not be disproved by exhibiting a single negative case but proving that the distribution of machines is different to what PCE suggests. A positive proof however may require an infinite verification of cases which is evidently non-mechanically feasible (and only negating Church’s thesis itself one would be able to verify all the infinite number of cases).

I see PCE acting below the curve while Church’s thesis acting from above determining a computational limit (known as the Turing limit).

PCE and Church's thesis (C-T) diagram

### The Shortest Universal Turing Machine Implementation Contest

Monday, December 22nd, 2008

The Shortest Universal Turing Machine Implementation Contest

ANNOUNCEMENT

23 Dec – 2008

http://www.complexitycalculator.com/experimentalAIT/TuringMachine.html

Contest Overview

In the spirit of the busy beaver competition though related to program-size complexity, we are pleased to announce the “Shortest Universal Turing Machine Implementation Contest”.

The contest is open-ended and open to anyone. To enter, a competitor must submit a universal machine implementation written in the language specified in the contest web site (C++) with smaller size values than the latest  record published on the web page.

In order to take part in this competition it is necessary to submit the source code, to be compiled using the compiler program and version specified in the contest web site. It is important that you provide documentation of your code, either in an attached file or as commented text in the source code file.

Each submitter must agree to be bound and abide by the rules. Submissions remain the sole property of the submitter(s), but should be released under the GNU General Public License (GPL)  so we may be permitted to make them available on  this web site for downloading and executing.

Rules

http://www.complexitycalculator.com/experimentalAIT/TuringMachine.html (General Rules section)

Team composition

Players may enter alone or as teams of any size. Anyone is eligible to enter.
Organizers

Hector Zenil (IHPST and LIFL, Paris 1 University and Lille 1 University)
Jean-Paul Delahaye (LIFL, Lille 1 University)

### On the Kolmogorov-Chaitin complexity for short sequences

Wednesday, October 31st, 2007

My paper On the Kolmogorov-Chaitin complexity for short sequences, coauthored with my PhD thesis advisor Jean-Paul Delahaye has been published as a book chapter in:RANDOMNESS AND COMPLEXITY, FROM LEIBNIZ TO CHAITIN, edited by Cristian S. Calude (University of Auckland, New Zealand) and published by World Scientific.

An extended draft version of this paper can be found in arXiv here and the webpage we have set up for our research on what we call Experimental Algorithmic Theory can be accessed here. The results of our ongoing experiments will be frequently published on this site.The book is a collection of papers contributed by eminent authors from around the world in honor of Gregory Chaitin’s birthday. It is a unique volume including technical contributions, philosophical papers and essays.

I presented our paper at the NKS Science Conference 2007 held at the University of Vermont, Burlington, U.S. The conference blog has an entry describing my participation.

From left to right: Hector Zenil, Stephen Wolfram, Paul Davies, Ugo Pagallo, Gregory Chaitin, Cristian Calude, Karl Svozil, Gordana Dodig-Crnkovic and John Casti.

### On the simplest and smallest universal Turing machine

Tuesday, October 30th, 2007

Alex Smith has recently been able to prove that a Turing machine conjectured to be capable of universal computation by Wolfram was actually universal (Wolfram 2,3 Turing machine Research Prize).

Part of the challenge was to find an encoding not doing by itself the universal computation that would make the Turing machine universal. Smith succeeded providing an encoding providing a background that while nonperiodic is sufficiently regular to be generated by infinite word written on the tape can be generated by a omega-automaton (not itself universal).

This relaxation of the tape content has been regular in the cellular automaton world, but also in the field of what is called weak-universality with the only difference that the Turing machines are allowed to start with other than blank tapes (blank tapes are periodic tapes with period 1).

An objection might be that such a coupled system could turn a nonuniversal machine into a universal one, but Smith showed that his encoding was capable of restarting the computation itself, something that coupled systems usually are not capable of. In other words, the preparation of the tape in Smith’s case is done in advance and the omega-automaton do not longer interact in any further time, while to have nonuniversal machines to become universal usually requires an automaton intervening at every step (or every certain steps) of a computation.

One may also be surprised by the need of an omega-automaton for infinite strings. But the difference to the traditional blank tape is subtle. Think of how machines operate in the real world. Machines do not run on blank tapes, they usually do so over memory (the cache, RAM or a HD) with all kind of information (that is considered garbage if it is not instantiated by a running program). You may think that this garbage is not infinite, but it is not so a blank tape for a Turing machine, so instead of thinking of providing a Turing machine with blank tape as need it, one can think of providing the Turing machine with a non-blank tape. Now, what is on the tape in the case of Smith is not exactly “garbage” because it plays a role in “helping” the Turing machine to perform its computation, in a kind of support on which the Turing machine that is capable of universal computation actually achieves the computation with the help of a defined background (that doesn’t mean that the machine cannot perform universal with other backgrounds) yet the background is not powerful enough to perform the hardest part of the computation. In the physical world, if processes are seen as computations, computations are performed on a support, on the background of the physical world. So these kinds of relaxation may be closer to actual physical situations than abstract situations in which a blank tape for a computation is assumed or required.

The discussion opened up by Wolfram’s work and motivated by Smith’s answer has generated a fruitful and renovated discussion of universality and complexity of small Turing machines. This is why I think this research s relevant to modern computer science, and not only as an amusing mathematical puzzle:

• New techniques for proving universality are found (Alex Smith’s novel approach for unbounded computations from arbitrary lengths and non-periodic initial configurations).
• New universal models of computation have been discovered (cyclic tag- systems, bi-tag systems).
• Such research provides a better understanding of universality,  its limits, its  underlying principles and its necessary and sufficient conditions.
• It is a base for actually building universal devices when only a few elements can be used, e.g. in nanotechnology or molecular computation.
• Simple/small machines may be more easily/effectively embedded in other systems.
• The old discovery/invention duality question comes to the fore: It sheds light on how simple universality is, how frequently it occurs, whether  it is engineered or not, whether  one builds universal computation or finds it in the universe.
• It could shed light on the relative feasibility of  universal Turing machines based on different tape configurations (e.g. blank characters, repetitive words, non-repetitive with computationally simple backgrounds) as actual physical systems.  At present it is not at all clear why one ought to  favor blank characters over other possible real-world backgrounds, such as “noise.”
• Questions of size and complexity  arise: It would be interesting, for instance, to find out whether there is a polynomial (or exponential) trade-off between program size and and the concept of simulating a process.
• Some questions  on algorithmic complexity arise: Will the encoding always be more complex if the machine is simpler? All theorems in algorithmic information theory depend on additive constants, which depend on the sizes of typical universal Turing machines. What is the impact of different generalizations of universality on algorithmic complexity and what is the role of  encoding in such a measure?
• Some questions arise on the relation between several variants of universality definitions: Is there an effective and efficient encoding for each non-periodic encoding preserving universality? If so, how does this impact their complexity? Is there a non-periodic encoding with blank characters for each periodic blank word encoding, and what would the impact of such  an encoding be on the size/complexity of the Turing machine in question?

The field is active and still an important area of research. Several computer science conferences include talks on small computational systems such as Computability in Europe (CiE) and Machines, Computations and Universality (MCU) included such talks this year, focusing in particular on reversible cellular automata and universal Turing machines.

Here are some references from the small Turing machine community:

[1] Manfred Kudlek. Small deterministic Turing machines. Theoretical Computer Science, 168(2):241-255, November 1996.
[2] Manfred Kudlek and Yurii Rogozhin. A universal Turing machine with 3 states and 9 symbols. In Werner Kuich, Grzegorz Rozenberg, and Arto Salomaa, editors, Developments in Language Theory (DLT) 2001, vol. 2295 of LNCS, pp. 311-318, Vienna, May 2002. Springer.
[3] Maurice Margenstern and Liudmila Pavlotskaya. On the optimal number of instructions for universality of Turing machines connected with a finite automaton. International Journal of Algebra and Computation, 13(2):133-202, April 2003.
[4] Claudio Baiocchi. Three small universal Turing machines. In Maurice Margenstern and Yurii Rogozhin, editors, Machines, Computations, and Universality (MCU), volume 2055 of LNCS, pp. 1-10, Chisinau Moldavia, May 2001. Springer.
[5] Turlough Neary and Damien Woods. Four small universal Turing machines. Machines, Computations, and Universality (MCU), volume 4664 of LNCS, pp. 242-254, Orleans, France, September 2007. Springer.
[6] Yurii Rogozhin. Small universal Turing machines. Theoretical Computer Science, 168(2):215-240, November 1996.
[7] Shigeru Watanabe. 5-symbol 8-state and 5-symbol 6-state universal Turing machines. Journal of the ACM, 8(4):476-483, October 1961.
[8] Shigeru Watanabe. 4-symbol 5-state universal Turing machines. Journal of Information Processing Society of Japan, 13(9):588-592, 1972.
[9] Stephen Wolfram. A New Kind of Science. Wolfram Media, 2002.

### Meaning against A.I.

Thursday, October 12th, 2006

A significant number of researchers believe that there are sentences with semantic value that could never be understood by a machine. These researchers believe that the mind has a semantic component, unlike machines. Their’s is a  Chinese Room type argument a la Searle. Consider Chomsky’s example of two books in a library with the same title, and two readers, each taking out one of the books. Do they get the same book? These researchers argue that machines would be unable to answer correctly on the basis of context since the answer would depend on a cognitive understanding of the situation. My claim is that all  meaningful components of a situation are based on a hierarchy that can be artificially represented by categories capturing the essence and functionality of  human mental operations. The answer to Chomsky’s question would be “yes” if one is  referring to the information content of the books,  “no” if one is referring to the books as physical objects.

### Universality on Real Computation

Thursday, October 12th, 2006

A paper of mine in French on this subject is already in arXiv: “Universality on Real Computation”. Chris Moore called my attention to his paper entitled “Recursion Theory on the Reals and Continuous-time Computation” ,  which arrives at similar results using a different approach. A paper I’m writing in English on this subject, and which I have been discussing with several people, is forthcoming.  A preliminary powerpoint presentation that I used when presenting on the topic to Gregory Chaitin is available here:Universality on Real Computation In computability theory, the theory of real computation deals with hypothetical abstract machines capable of infinite precision. They operate on the set of real numbers, most of which are non-computable by a Turing machine. These hypothetical computing machines can be viewed as idealised analog computers or differential machines. The question I am interested in concerns the consequences of computing on the set of real numbers for current foundational concepts in traditional computation,  for example universality. I have found that an infinite number -possibly denumerable- of  universal real computers of varying power allows what I have called “intrinsic universality” (there is another definition of intrinsic universality related to dynamical systems and universal computation), since real numbers are not bounded in complexity and there is therefore one for each class of complexity. However, taking the complete set of real numbers, what I claim is that a single and ultimate universal machine for real computation is not possible, since for each proposed universal real computer A –with pre-fixed real numbers R=r_1,r_2,…,r_n — it suffices to calculate the maximal Turing degree of the set of the real numbers involved, let’s say O_n, to conceive a new real computer B able to compute numbers of higher Turing degree, let’s say O_m with m>n, which the given universal real machine A won’t be able to emulate– which contradicts the definition of a universal abstract real computer. Therefore A won’t be able to emulate (in the way conceived for defining universality in the traditional Turing model)  any other real machine B which belongs to the same set –the real numbers– (which evidently can be seen as their class of languages) unless it allows non-recursive functions going through all the levels of the arithmetical and hyper-arithmetical hierarchy. In other words, real computation does not allow universality, which is the very foundation of computer science unless we take into consideration non-recursive functions of arbitrary power, which adds an aditional problem to the conception of a universal device and the convergence of real computing models, unlike the traditional model based on recursive functions theory and Turing machines. However, in real computation there is a place for relative universality, which is very rich in ideas and theories and gives rise to a hierarchical Church-Turing type thesis for each level of the arithmetical and hyper-arithmetical hierarchy. So talking about the universal abstract device E_n for example–because of  the arithmetical level E_n– for each n natural number would make sense, unlike a device E able to behave like any other E_n for any n natural number, which would need at least one function or the composite of several functions able to attain any arbitrary degree of computability. In other words, E wouldn’t be a field since it is not closed under the set of operators of the defined E.

Here is a related paper entitled on “The complexity of real recursive functions” by Manuel Lameiras Campagnolo. In France, “M. Olivier Bournez” has a particular interest in real computation, in particular analog and hybrid systems.Other important authors in the field include : Karp, da Costa, Blum, Cucker, Shub and Smale. Felix da Costa has let me know that  he is working on something related to my ideas on universality in real computation and relative universality.It is important to say that being interested in real computation is not the same thing as being an “hyper-computationalist”. In fact, some if not most  of the above authors think that real computation would at best be only as powerful as the digital models.  Their interest in the field is mainly operational, in the sense that these types of  machines perform operations commonly related to fields like real anaylisis.

### Computability in Europe Conference (CiE) Report, Wales UK

Monday, August 7th, 2006

This is a report on the Computability in Europe Conference (CiE), held at the University of Swansea, Wales in the United Kingdom in July 2006.

I attended a mini-course on Quantum Computing given by Julia Kempe, a lecture on the Church-Turing thesis by Martin Davis– who defended it against proposed models of hypercomputation– and a  lecture on Proof Theory. Another very interesting lecture was on Godel and Turing’s remarks on the Human Mind (the dichotomy argument from Godel and the mechanistic vision from Turing). Among other noteworthy lectures were Samuel Buss’ on Complexity of Proofs, John Dawson’s on Godel in Computability, Wilfried Sieg’s on the Concept of Mechanical Procedure in Godel and Turing, as well as many presentations on hypercomputability and computing over the reals. I met people whom I had only known through email exchanges, like Felix Da Costa from the Technological Institute of Lisbon, Robert Meyer professor emeritus at the National University of Australia, and Julia Kempe from France who is a renowned researcher in the quantum computing field and with whom I shared some doubts I had concerning where the restrictions in Quantum Computing lay which constrained its power to the set of recursive functions. I also met people from SUNY who are doing interesting research on Turing-computation, studying isomorphisms between Oracle machines and the relation with the Tenenbaum theorem upon the uniqueness of the recursive model of PA (Peano Arithmetic). Many lectures were given on computing over infinite time and space and computing at the limit of the general relativity theory. The conference was intended to take the pulse of the field of hypercomputation in Europe and worldwide.

### International Conference in Complex Systems, NECSI

Monday, August 7th, 2006

NECSI/ICCS Conference Report, Quincy, Greater Boston, USA, July 2006.

First lesson: For every complex problem there is a simple, neat, wrong solution.

I attended talks given by Ed Fredkin on Finite Nature, Lazlo Barabasi on Complex Networks, Christoph Teuscher on Biology and Computation and John Nash on his research upon Game Theory.
* Ed Fredkin presented a table salt 3-D cellular automata model that fulfils all physical constraints on symmetry and energy conservation. The model is surprisely Turing-universal. As a reminder, the Zuse-Fredkin Thesis is a Church-type thesis which claims additionally that the universe being a Cellular Automaton can be understood in terms of the evolution of  Cellular Automata. An interesting paper entitled “Church-Turing Thesis is Almost Equivalent to Zuse-Fredkin Thesis” by Plamen Petrov is available here>
http://digitalphysics.org/~ppetrov/
And much more information on the interesting ideas of Ed Fredkin is available in
http://www.math.usf.edu/~eclark/ANKOS_zuse_fredkin_thesis.html and on  Ed Fredkin homepage at MIT, which  includes some remarks on why this thesis is not in agreement with  Stephen Wolfram’s, since Wolfram does not intend to imply that the universe is a classical cellular automaton but rather conceives of   a discrete universe based on systems performing simple rules and producing all the complex behavior we find in nature. Such systems are comparable to Turing machines, tag systems, axioms or any other equivalent system.
* In his lecture, Lazlo Barabasi claimed that behind all complex systems there are complex networks, in particular Free-Scale Networks (with a few nodes very well connected with others, and many nodes weakly connected). These nets are efficient and robust (even minus random nodes they remains connected)except under attack (provided the best connected nodes are targeted).
* Christoph Teuscher gave a lecture on computation inspired by biology. However, as he himself admitted, it was not his area of expertise.
* John Nash presented his research on Game Theory using Mathematica as an engine.
* I met Hava Siegelmann and one of her fellows, Yariv, who is working on Artificial Intelligence. Siegelmann worked some years ago (while completing  her PhD dissertation) upon a model of computation called Analog Recurrent Neural Network or just ARNN which, under certain circumstances,  is able to compute more functions than the set of recursive functions , which are those computed by the Turing machines. She is now working on topics related to Molecular Biology. I asked her about the forceful  critiques delivered by traditional logicians like Martin Davis, who wrote a paper entitled “The Myth of HyperComputation.”    The target of most of these critiques is a certain circular argument—to compute more than Turing machines it is necessary to use non-Turing computable weights previously coded into the neural network. It has been known for a long time that the set of all real numbers is able to encode any arbitrary language, even if it is non-Turing computable. What is remarkable from my point of view is her result relating to complexity classes, since weights with lower complexity are able to compute a higher class when  used in those networks. Aditionally she argued that even with a stochastic function her networks are able to solve non-computable functions. Recently I discussed the fact with Stephen Wolfram and we agreed that she is assuming strong randomness. I would say however that Siegelmann’s work is much more beatiful from a complexity point of view than from a “hypercomputational” viewpoint. In her book published under the title “Neural Networks and Analog Computation: Beyond the Turing Limit ” she proves that: a) there exists an universal neural network with only 9 neurons, and b) that  p(r) suffices to compute a non-computable function, where r is an arbitrary complete real number but p(r) represents the first p digits of the expansion of r–which means that linear precision suffices to achieve “super-Turing” capabilities, assuming that the neural network can have access to any possible real number before the computation. In other words this seems to be true only if all possible real numbers are allowed a priori (just as in the Turing machines an unbounded tape is neccesary to carry out all recursive languages, and neural networks with rational numbers as weights do not compute the same set of functions as neural networks with whole numbers as weights, the first having been  proven by Klenee to compute the same set as Turing machines and the second  to compute the same set of languages as finite automata, those called regular languages).
I exchanged ideas with some other interesting people, like an engineer from the Space and Airbone Systems Department of Raytheon. And I met John Nash Jr. during the gala dinner  at which he was presented with an award  for his contributions to  Complexity Theory, mainly honoring his work relating to Game Theory.