## Compression-based Investigation of Cellular Automata, A Phase Transition Coefficient and a Conjecture Related to Universal Computation

In my new article ** Compression-based investigation of the dynamical properties of cellular automata and other systems**, published in the Journal of Complex Systems (19:1), pages 1-28, I present a method for studying the qualitative behavior of cellular automata and other abstract computing machines based on the approximation of their program-size complexity using a general lossless compression algorithm. I show that the compression-based approach classifies cellular automata (CA) into clusters according to their heuristic behavior, with these clusters showing a correspondence with Wolfram’s main classes of systemic behavior. I also present a Gray code-based numbering scheme for initial conditions optimal for this kind of investigation, and a compression based method for estimating a characteristic exponent in the form of a phase transition coefficient measuring the resiliency or sensitivity of a system to its initial conditions. And I conjecture that universal systems have large transition coefficients.

I think this constitutes a novel framework for investigating the dynamical properties of cellular automata and other systems. Here I will discuss some of the main ideas and implications of the paper. A pdf version of the paper is available online on ArXiv

**Algorithmic complexity: classification into Wolfram’s four classes**

In A New Kind of Science and in several papers dating from the mid-1980s, Stephen Wolfram defined *four classes* into which cellular automata (CA) and several other systems evolve, each capturing a different qualitative behavior which evolves from the same ordered initial configuration (the simplest black cell).

- Class 1 consists of CA whose dynamics reach a steady state regardless of the initial conditions.
- Class 2 consists of CA whose long-term evolution produces periodic or nested structures.

These first two classes are simple, in the sense that their long-term evolution can be deduced from running the system for a small number of steps.

- Class 3 CA produce structures that seem random.
- Class 4 CA produce localized structures on a random looking background, and hence are complex looking.

Wolfram’s classification is heuristic, and the assignment of CA to the four classes is somewhat subjective. To the best of my knowledge, there is, to date, no universally agreed upon classification scheme for CA. I think, however, that there is no better approach than a pure program-size complexity measure, which is the approach I follow in this paper. I propose to continue the investigation later, using the same measure to discover other interesting properties and possible hierarchies.

An interesting question related to Wolfram’s classification concerns its dependence on the initial condition–chiefly because the classification was originally meant to be constructed by visual inspection over the evolution of a CA, and as we know, the evolution of a CA depends on its initial condition. This has been a major critique (Eppstein) of Wolfram’s classification, because the expectation is that the classification should be based on the evolution from an unordered (random) configuration.

Nevertheless, the classification is actually based on the potential of a CA to evolve into any of the possible behaviors from at least one initial configuration (the question is of course not answerable in finite time, since there is an infinite number of possible initial configurations). Wolfram’s classification may therefore be seen as being dependent on the initial condition of a CA.

It is not a surprise that one can, for example, construct a CA belonging to more than one of Wolfram’s four classes when starting from different initial configurations. Rule 110 belongs to class 4 because it is capable of universal computation–one can set up an initial configuration to ‘program’ rule 110 to carry out any computation (this being the very basic concept of a programmable computer).

For every CA rule there is a definite (but in general undecidable) answer to the question whether or not it is capable of universal computation (or in reachability terms, whether a CA will evolve into a certain configuration). The question only makes sense if the evolution of a CA depends on its initial configuration. No rule can be universal that fixes the initial configuration once and for all (there would be no way to input an instruction and carry out an arbitrary computation).

On the other hand, some rules, such as Rule 0, don’t produce different configurations relative to variant initial configurations. No matter how you may change the initial condition, there is no way to make it compute something other than what it actually computes for every other initial configuration.

A possible objection (made by David Eppstein) is that there are CAs that can be made to look as if they belonged to all classes by modifying their initial conditions. Which is true: a CA may belong to a certain class until, given another initial configuration, it is made to behave as if it belonged to another class.

My compression-based approach shows that Wolfram’s heuristic classification can actually be quantified by a measure which is clearly dependent on the initial conditions, while also being capable of detecting sensitivity to initial configurations, and hence of replacing the visual inspection. This hierarchical classification is well defined and is therefore a good candidate for a complexity measure.

The second part of my investigation actually takes advantage of the ability of CAs to behave differently in order to undertake a closer inspection and a novel classification, taking into account the average capability of a system to behave in different ways.

**Differentiation from ***a priori*** approaches**

The approach is different from others in that it is an *a posteriori *technique, unlike, for example, Langton’s lambda, a measure of the density of a CA rule. It is an a posteriori technique because unlike this lambda number, the compression-based approach requires the CA to evolve before saying anything about it, whereas Langton’s lambda is computed from the rules of the CA.

Langton’s lambda is simply the fraction of rules in which the new state of the cell is non-zero. For example, the rules of the elementary cellular automaton number 1 in Wolfram’s enumerating scheme are simple: all 3-tuples sent to 0 but one, the last one. Therefore Langton’s lambda is 1 over 8.

The lambda parameter of a CA is a number between 0 and 1. For example, if lambda is 0 (e.g. for ECA rule number 0), the evolution develops into a trivial state. Langton found that CA rules with lambda close to zero evolve into trivial states and CA rules close to 1 evolve into random-looking behavior, with complex behavior somewhere in between. It is near this transition that the most interesting CAs will lie, the ones that manifest the most complex behavior.

Unfortunately, classifying CAs with lambda values is more complicated than that, as one quickly faces undecidability. If it were possible to decide once and for all whether a CA is complex by computing its lambda value, without having to run the CA for a single step, one could solve all kinds of undecidable questions simply by looking at a CA’s rules.

The critical value for lambda is not a universal constant. Nor, for that matter, is my phase transition coefficient. But the main difference, as noted before, is that the compression-based approach actually looks at the evolution of the system rather than trying to figure everything out from the description of the CA.

The compression-based method represents a formal approach to Wolfram’s classification process, replacing the need for a visual inspection with a technique and a coefficient to determine to what class a CA belongs. The approach is compatible with what Stephen Wolfram himself has proposed in his NKS book, without contradicting any computability result or Wolfram’s own Principle of Computational Irreducibility, which says that while some computations may admit shortcuts that allow them to be performed more rapidly, others don’t, so that one cannot really tell what the evolution of a CA will be, except by running it.

**Initial configuration numbering scheme**

Ideally, one should feed a system with a natural sequence of initial configurations of gradually increasing complexity. Doing so ensures that qualitative changes in the evolution of the system are not attributable to discontinuities in its set of initial conditions.

What I propose is an initial configuration numbering scheme where two successive values differ by only one bit. To explore the qualitative behavior of a cellular automaton when starting from different initial configurations, the optimal method, because of its smoothness, is to follow this Gray encoding enumeration, in order to avoid any undesirable “jumps” attributable to the system’s having been fed with discontinuous initial configurations. By following the Gray code, an optimal numbering scheme was devised so that two consecutive initial conditions differed only in the simplest degree (by one bit). This simple convention will allow us to continue the investigation of the dynamical behavior of a system without being concerned about arbitrarily introducing artificial complexity when moving from one initial configuration to the next.

**Phase transition detection and coefficient**

I defined a measure based on the change of the asymptotic direction of the size of the compressed evolutions of a system for different initial configurations (following the proposed Gray-code enumeration of initial configurations). My phase transition coefficient yields an interesting classification: it measures the resiliency or sensitivity of a system to its initial conditions. So rules such as 30 and 0 appear close to each other. Odd as this may seem, this is because both, when their initial conditions are changed, behave in more or less the same way. In other words, there is no change in the qualitative behavior of these CAs when feeding them with different inputs, regardless of how different the inputs may be.

In this phase transition classification, for example, rules such as 122 and 89 appear next to each other, because, as the investigation proves, they are both CAs with relatively high phase transition coefficients, meaning that they are very sensitive to initial conditions, dramatically changing their qualitative behavior when starting from one rather than another initial configuration.

**Phase transition and predictability**

An obvious feature of universal systems is that they need to be capable of carrying information by reflecting changes made to the input in the output. In attempting to determine whether a system is capable of reaching universal computation, one may ask whether a system is capable of this in the first place, and how efficiently it does so. And this is what the phase transition actually measures, because it tells how well a system manages to respond to an input. Obviously, a system such as rule 0 or rule 255, which does not change regardless of the input, is trivially decidable. But a universal system should be capable of some reaction to external manipulation (the input to the system). The inverse, however, should not hold, because having a large transition coefficient by no means implies that the system will behave with the freedom required of a universal system if it is to emulate any possible computation (a case in point may be rule 22, which, despite having the largest transition coefficient, may not be versatile enough for universal computation).

The phase transition measure also implies that one may be able to predict the behavior of a system for an initial configuration with a degree of certainty based on the previous variability of the system in relation to an initial segment of initial configurations. It is an open question whether this is a lower bound. In other words, it is unclear whether looking at the behavior of a system for a certain length of time and for certain configurations will tell you anything about its behavior for other initial configurations or if it were allowed to evolve for a longer period of time. Experience says one would do well to predict future behavior on the basis of past behavior, and this may also be related to Israeli and Goldenfeld‘s very interesting findings. In 2004 they showed that some computationally irreducible elementary cellular automata have properties that are predictable at a coarse-grained level. They did so following a** ****renormalization group (RG) technique, **which refers to a mathematical apparatus that allows one to investigate the changes in a physical system as one views it at different *distance scales*.

They were able to provide a hierarchy of complexity in agreement with their apparent complexity. Israeli and Goldenfeld’s classification is also an* a posteriori* investigation, but it too is bedeviled by the unavoidable (and ultimately undecidable) statistical question, namely, whether one can keep on predicting for all initial conditions and for any number of steps, without having to run the system forever and for all possible initial conditions. But unlike mine, their classification is partial, in the sense that one cannot always say whether the complexity of one CA is greater than that of another. Based on the comparison of their approximated compressed sizes, however, I could come up with a total order in full agreement with their apparent complexity as well as with Wolfram’s four classes (one that even yields the same number of classes).

An open question directed to me by Nigel Goldenfeld– in trying to understand the similarities and differences between their RG and my algorithmic approach–concerns how their hierarchy relates to mine. What they experimentally suggest is that the larger the scale of the transformations used, the more highly structured the objects and therefore the greater their algorithmic complexity.

For example, rule 110 is one rule about which my own phase transition classification says that, despite showing some sensitivity, it also shows some stability. Which means that one can say with some degree of certainty how it will look (and behave) for certain steps and certain initial configurations, unlike those at the top. This turns out to be predictable according to Israeli and Goldenfeld as well, at a coarse-grained level, after a scale transformation.

**A sketch on a conjecture on the transition coefficient of Turing-universal systems**

Based on two empirical facts, I conjecture that universal CAs should have a large transition coefficient, as 110 does. Rules such as 110 and 54 also turn out to be next to each other in the phase transition classification, both having large values (they are in the top 24, and both the top 24 and the bottom 22 are in the original paper: see the last 2 tables).

So I base this conjecture on the following empirical observations:

1. The only known universal elementary cellular automata figure at the top of this classification, and so do candidates such as rule 54, which appears right next to rule 110.

2. Universality seems to imply that a system should be capable of being controlled by the inputs, which our classification suggests those at the bottom are not, as all of them look the same no matter what the input, and may not be capable of carrying information through the system toward the output.

Other rules that some people think may be capable of some sophisticated computation (See paper in the Journal of Complex Systems by Paul-Jean Letourneau) also have large transition coefficients, such as rule 146, with a transition coefficient 0.124, ranking 39 out of the 256 elementary cellular automata.

As noted before, however, the transition coefficient is not a universal measure. In this case, coefficients were calculated for 600 steps in blocks of 50 for the first 500 initial conditions, which means that some rules may be capable of greater transitions but are somehow ‘slow’ at the selected number of steps and number of initial conditions. i.e. they take more than 500 initial conditions–in Gray code numbering–to show larger changes, or else larger changes are being missed because of the jumps in blocks of 50 steps, though this latter possibility is less likely.

The conjecture also seems to be in agreement with Wolfram’s claim (made at some of his oral presentations) that rule 30 (as a class 3 elementary cellular automaton) may be, according to his Principle of Computational Equivalence (PCE), computationally universal. But it may turn out that it is too difficult (perhaps impossible) to control in order to perform a computation, because it behaves too randomly.

Would your approach also work for calculating lambda for e.g. Langton’s Ant (http://en.wikipedia.org/wiki/Langton%27s_ant)? This CA looks like a class 3 for the first 10000 steps, but develops into a class 2 afterwards.

It would. If you capture the change of behavior by taking pictures of the system for long enough, you would see that the transition coefficient will make the derivative to fall into the new class.

What I’m suggesting in my paper is that Langton’s lambda parameter is unlikely to be scalable because it is ‘a priori’ measure. That is, it looks at the density of non-zero values of the rule. In contrast, my measure is ‘a posteriori’, meaning one actually has to run the system to say something about it. As you say, I may be able to say when a CA looks like in certain class for certain inputs and for certain steps. My measure may be used to provide evidence in favor or not for the lambda parameter prediction of a system.

Notice that while Langton’s is ill-fated because it cannot predict with certainty and will not tell you nothing more about a system beyond a generality, my measure is ill-fated because it will only tell you something about the system for the number of explored initial conditions and up to a number of steps (the problem of induction). Yet, my measure is scalable by definition and my suspicion is that it has a lot of potential. Among the many open questions to investigate is precisely the possible connections to Langton’s lambda. In principle they have draw a shared conclusion, that the rate of information transmission is very important: it has to unfold at a reasonable time, not too fast and not too slow for a system to behave complex. I hope my approach motivates further study. Thanks.