Help science and behave randomly

Posted in General on May 9th, 2013 by Hector Zenil – Be the first to comment

Take part in a cutting-edge massive online experiment to asses how humans perceive and generate randomness

Human Randomness Perception and Generation Project:

http://complexitycalculator.com/hrng/

For obvious reasons I cannot tell much.

This is an experiment undertaken by the Algorithmic Nature Group of LABoRES.

The results will be announced in this blog. Stay tuned.

New volume: A Computable Universe

Posted in General on March 23rd, 2013 by Hector Zenil – Be the first to comment

With a foreword by Sir Roger Penrose, my new volume A Computable Universe has been published by World Scientific and the Imperial College Press after my bestseller Randomness Through Computation in 2011. A Computable Universe: Understanding and Exploring Nature as Computation, introduces the views of the pioneers in the computable universe view of the world and some of its strongest opponents (including Penrose himself).

 

bookcover

In the context of the Alan Turing Year, the book discusses the foundations of computation in relation to nature.

  • What is computation?
  • How does nature compute?

They discuss computation in the world from a variety of perspectives, ranging from foundational concepts to pragmatic models to ontological conceptions and philosophical implications.

The volume provides a state-of-the-art collection of technical papers and non-technical essays, representing a field that assumes information and computation to be key in understanding and explaining the basic structure underpinning physical reality. It also includes a new edition, prepared by Adrian German from Indiana University Bloomington and myself, of Konrad Zuse’s “Calculating Space” (the MIT translation) fully retyped in shiny LaTeX with a few corrections and improvements to figures. A free copy of which is available here (trivia: Zuse’s name was used in the Tron: Legacy film (2010) certainly inspired by his mind blowing ideas on a digital universe).

The book also includes a panel discussion transcription on the volume topic, featuring worldwide experts including a Nobel prize in quantum mechanics, physics, cognition, computation and algorithmic complexity. In this section I asked authors (and other authors asked as well) what I thought an informed reader would ask them (including myself), sometimes leading to quite heated exchanges.

The volume is dedicated to the memory of Alan M. Turing — the inventor of universal computation, on the 100th anniversary of his birth, and is part of the Turing Centenary celebrations.

Free sample chapter(s)
Foreword (533 KB)
Chapter 1: Introduction the Computable Universe (476 KB) by myself.
Table of Contents:

    • Foreword (R Penrose)
    • Preface
    • Acknowledgements
    • Introducing the Computable Universe (H Zenil)
  • Historical, Philosophical & Foundational Aspects of Computation:
    • Origins of Digital Computing: Alan Turing, Charles Babbage, & Ada Lovelace (D Swade)
    • Generating, Solving and the Mathematics of Homo Sapiens. E Post’s Views on Computation (L De Mol)
    • Machines (R Turner)
    • Effectiveness (N Dershowitz & E Falkovich)
    • Axioms for Computability: Do They Allow a Proof of Church’s Thesis? (W Sieg)
    • The Mathematician’s Bias — and the Return toEmbodied Computation (S B Cooper)
    • Intuitionistic Mathematics and Realizability in the Physical World (A Bauer)
    • What is Computation? Actor Model versus Turing’s Model (C Hewitt)
  • Computation in Nature & the Real World:
    • Reaction Systems: A Natural Computing Approach to the Functioning of Living Cells (A Ehrenfeucht, J Kleijn, M Koutny & G Rozenberg)
    • Bacteria, Turing Machines and Hyperbolic Cellular Automata (M Margenstern)
    • Computation and Communication in Unorganized Systems (C Teuscher)
    • The Many Forms of Amorphous Computational Systems (J Wiedermann)
    • Computing on Rings (G J Martínez, A Adamatzky & H V McIntosh)
    • Life as Evolving Software (G J Chaitin)
    • Computability and Algorithmic Complexity in Economics (K V Velupillai & S Zambelli)
    • Blueprint for a Hypercomputer (F A Doria)
  • Computation & Physics & the Physics of Computation:
    • Information-Theoretic Teleodynamics in Natural and Artificial Systems (A F Beavers & C D Harrison)
    • Discrete Theoretical Processes (DTP) (E Fredkin)
    • The Fastest Way of Computing All Universes (J Schmidhuber)
    • The Subjective Computable Universe (M Hutter)
    • What Is Ultimately Possible in Physics? (S Wolfram)
    • Universality, Turing Incompleteness and Observers (K Sutner)
    • Algorithmic Causal Sets for a Computational Spacetime (T Bolognesi)
    • The Computable Universe Hypothesis (M P Szudzik)
    • The Universe is Lawless or “Pantôn chrêmatôn metron anthrôpon einai” (C S Calude, F W Meyerstein & A Salomaa)
    • Is Feasibility in Physics Limited by Fantasy Alone? (C S Calude & K Svozil)
  • The Quantum, Computation & Information:
    • What is Computation? (How) Does Nature Compute? (D Deutsch)
    • The Universe as Quantum Computer (S Lloyd)
    • Quantum Speedup and Temporal Inequalities for Sequential Actions (M ?ukowski)
    • The Contextual Computer (A Cabello)
    • A Gödel-Turing Perspective on Quantum States Indistinguishable from Inside (T Breuer)
    • When Humans Do Compute Quantum (P Zizzi)
  • Open Discussion Section:
    • Open Discussion on A Computable Universe (A Bauer, T Bolognesi, A Cabello, C S Calude, L De Mol, F Doria, E Fredkin, C Hewitt, M Hutter, M Margenstern, K Svozil, M Szudzik, C Teuscher, S Wolfram & H Zenil)
  • Live Panel Discussion (transcription):
    • What is Computation? (How) Does Nature Compute? (C S Calude, G J Chaitin, E Fredkin, A J Leggett, R de Ruyter, T Toffoli & S Wolfram)
  • Zuse’s Calculating Space:
    • Calculating Space (Rechnender Raum(K Zuse)
    • Afterword to Konrad Zuse’s Calculating Space (A German & H Zenil)

Available at Amazon and your favorite online retailers.

My new book: Irreducibility and Computational Equivalence

Posted in General on March 23rd, 2013 by Hector Zenil – Be the first to comment

Irreducibility and Computational Equivalence is a collection of contributions by outstanding authors in celebration of Stephen Wolfram’s A New Kind of Science after 10 years of its publication. Published by Springer Verlag it is already available both Paperback and Hardcover, as well as an eBook and CDF (computable document).

Here is a free preview and the book website from Springer.

ZenilWolframBook It is clear that computation is playing an increasingly prominent role in the development of mathematics, as well as in the natural and social sciences. The work of Stephen Wolfram over the last several decades has been a salient part in this phenomenon helping founding the field of Complex Systems, with many of his constructs and ideas incorporated in his book A New Kind of Science (ANKS) becoming part of the scientific discourse and general academic knowledge–from the now established Elementary Cellular Automata to the unconventional concept of mining the Computational Universe, from today’s widespread Wolfram’s Behavioural Classification to his principles of Irreducibility and Computational Equivalence.

I found this volume fascinating in its efforts to flesh out the computational implications for biology more generally.
– Dr. Mark Changizi

I believe that this book will be an inspiration for future work in interdisciplinary research at the intersection of computer science, natural and social sciences.
– Prof. Ivan Zelinka

The volume, with a Foreword by Gregory Chaitin and an Afterword by Cris Calude, covers these and other topics related to or motivated by Wolfram’s seminal ideas, reporting on research undertaken in the decade following the publication of Wolfram’s NKS book. Featuring 39 authors, its 23 contributions are organized into seven parts.

It is now available from Amazon and any large online bookstore, and directly from Springer itself.

Table of Contents:

Foreword
Gregory Chaitin

Part I Mechanisms in Programs and Nature

1. Hyperbolic Cellular Automata
Maurice Margenstern
2. A Lyapunov View on the Stability of Cellular Automata
Jan M. Baetens & Bernard De Baets
3. On the Necessity of Complexity
Joost J. Joosten
4. Computational Technosphere and Cellular Engineering
Mark Burgin

Part II The World of Numbers & Simple Programs

5. Cellular Automata: Models of the Physical World
Herbert W. Franke
6. Symmetry and Complexity of Cellular Automata: Towards an Analytical Theory of Dynamical System
Klaus Mainzer
7. A New Kind of Science: Ten Years Later
David H. Bailey

Part III Everyday Systems

8. A New Kind of Finance
Philip Z. Maymin
9. The Relevance and Importance of Computation Universality in Economics
Kumaraswamy Velupillai
10. Exploring the Sources of and Nature of Computational Irreducibility
Brian Beckage, Stuart Kauffman, Louis Gross, Asim Zia, Gabor Vattay and Chris Koliba

Part IV Fundamental Physics

11. The Principle of a Finite Density of Information
Gilles Dowek and Pablo Arrighi
12. Artificial Cosmogenesis: A New Kind of Cosmology
Clément Vidal
13. Do Particles Evolve?
Tommaso Bolognesi

Part V The Behavior of Systems & the Notion of Computation

14. An Incompleteness Theorem for the Natural World
Rudy Rucker
15. Pervasiveness of Universalities of Cellular Automata: Fascinating Life-like Behaviours
Emmanuel Sapin
16. Wolfram’s Classification and Computation in Cellular Automata Classes III and IV
Genaro J. Martinez, Juan Carlos Seck Tuoh Mora and Hector Zenil

Part VI Irreducibility & Computational Equivalence

17. Exploring the Computational Limits of Haugeland’s Game as a Two-Dimensional Cellular Automaton
Drew Reisinger, Taylor Martin, Mason Blankenship, Christopher Harrison, Jesse Squires and Anthony Beavers
18. Irreducibility and Computational Equivalence
Hervé Zwrin and Jean-Paul Delahaye
19. Computational Equivalence and Classical Recursion Theory
Klaus Sutner

Part VII Deliberations and Philosophical Implications

20. Wolfram and the Computing Nature
Gordana Dodig-Crnkovic
21. A New Kind of Philosophy. Manifesto for a Digital Ontology
Jacopo Tagliabue
22. Free Will For Us, not For Robots
Selmer Bringsjord

Afterword
Cristian Calude

Available from Amazon and any large online bookstore, and directly from Springer itself.

Life as Thermodynamic Evidence of Algorithmic Structure in Natural Environments

Posted in General on March 23rd, 2013 by Hector Zenil – Be the first to comment

In evolutionary biology, attention to the relationship between stochastic organisms and their stochastic environments has leaned towards the adaptability and learning capabilities of the organisms rather than toward the properties of the environment. This article is devoted to the algorithmic aspects of the environment and its interaction with living organisms. We ask whether one may use the fact of the existence of life to establish how far nature is removed from algorithmic randomness. The paper uses a novel approach to behavioral evolutionary questions, using tools drawn from information theory, algorithmic complexity and the thermodynamics of computation to support an intuitive assumption about the near optimal structure of a physical environment that would prove conducive to the evolution and survival of organisms, and sketches the potential of these tools, at present alien to biology, that could be used in the future to address different and deeper questions. We contribute to the discussion of the algorithmic structure of natural environments and provide statistical and computational arguments for the intuitive claim that living systems would not be able to survive in completely unpredictable environments, even if adaptable and equipped with storage and learning capabilities by natural selection (brain memory or DNA).

This new paper has recently been published online and is freely available from MDPI here. The statistics show 355 readers in only 3 months.

Zenil, H.; Gershenson, C.; Marshall, J.A.R.; Rosenblueth, D.A. Life as Thermodynamic Evidence of Algorithmic Structure in Natural Environments. Entropy 2012, 14, 2173-2191.

Announcing the Online Algorithmic Complexity Calculator

Posted in Algorithmic information theory, Complexity, General on March 23rd, 2013 by Hector Zenil – Be the first to comment

We have made available a basic beta version of an Online Algorithmic Complexity Calculator implementing the methods we have developed in recent papers at the Algorithmic Nature lab.

The OACC provides a comprehensive framework of universal mathematical measures of algorithmic complexity for researchers and professionals. It retrieves objective numerical measures of randomness for potential applications in a very wide range of disciplines, from bioinformatics to psychometrics, from linguistics to economics.

It is based on several years of research devoted to new methods for evaluating the information content and algorithmic complexity. The description of the Coding Theorem method to deal with short strings is described in this paper. It currently retrieves numerical approximations to Kolmogorv complexity and Levin’s universal distribution (or Solomonoff algorithmic probability) for binary strings of short length, for which lossless compression algorithms fail as a method for approximation to program-size complexity, hence providing a complementary and useful alternative to compression algorithms. More algorithmic information measures, more data and more techniques will be incorporated gradually in the future, covering a wider range of objects such as longer binary strings, non-binary strings and n-dimensional objects (such as images).

It also includes a Short String Complexity Explorer (it may take some time to run if you open the link) tool developed in Mathematica, it runs oinline with a free player. It provides a comparison of the estimated Kolmogorov complexity (K), the algorithmic probability (m), Shannon’s Entropy (E) and compressibility (using Deflate) of a string. It also provides an indication of the relative complexity among all other strings for which K has been approximated and its distribution rank.

Calculating a Universal Distribution to Approximate Kolmogorov-Chaitin Complexity

Posted in Algorithmic information theory, Complexity, Computability, Universality and Unsolvability, Computer Science, Foundations of Computation, New Ideas on December 12th, 2012 by Hector Zenil – Be the first to comment

Computing the incomputable has always been a challenge. For example, in finding the busiest Turing machines (Rado) given a number of symbols and states (whimsically called busy beavers). This means either finding Turing machines that, starting from an empty input, produce more non-blank symbols in their output tapes before halting than any other Turing machine of the same size, or Turing machines that, also starting from an empty input, have the greatest runtime before halting than any other Turing machine of the same size. Both problems are ultimately undecidable because of the Turing-complete capabilities of Turing machines, as proven by Alan Turing himself (that is, the capability of some Turing machines to simulate any other Turing machine).

In this new paper we describe how we have managed to calculate an approximation of a so-called Universal Distribution (aka Levin’s semi-measure) which connects the frequency of production of a string to its Kolmogorov complexity (K). The chief advantage of calculating an approximation of the Universal Distribution is that it is an incremental process over an average of a large number of Turing machines. One doesn’t get rid of the constant from the invariance theorem in the theory of algorithmic information theory (for example when Kolmogorov complexity is measured using 2 different universal Turing machines), yet one seems to have to make fewer arbitrary decisions.

One of the main advantages is that one can better deal with strings of very short lengths. Think about it! If one wished to approximate K for a single bit by using compression algorithms, the lossless compression algorithm would not be able to compress the single bit any further. And this not only happens for a single bit but for all strings up to a certain minimal length for which lossless compression algorithms are simply unsuitable (recall that a compression algorithm includes the decompression instructions together with the data in the new compressed object in order to make it self-decompressible).

The usefulness of lossless compression algorithms as a method for approximating K derives from the fact that compression is a sufficient test of non-randomness. This is because K is, more precisely than an uncomputable function, upper semi-computable, meaning that one can estimate upper bounds. The lossless compressed length of an object s (e.g. a string) is therefore an upper bound on K(s). The usefulness of the Coding Theorem Method (the theorem presented in this paper) will ultimately come down to whether it is useful in applications, which is the main motivation behind its conception, given the failure of compression algorithms to provide information about the possible K(s) values for s that are too short (shorter, say, than the typical size of the length of the instructions that a lossless compression algorithm has to add to the compressed object).

The state of the art of the Coding Theorem Method can be gleaned from this paper, recently made public by my colleagues and I, and announcing the release of the calculation of a universal distribution based on (5,2), that is, all Turing Machines with 5 states and 2 symbols: Calculating Kolmogorov Complexity from the Output Frequency Distributions of Small Turing Machines

It represents a major improvement over this previous calculation of mine, that required new and more powerful techniques to deal with a semi-computable distribution. It improves our previous work in terms both of accuracy and coverage of number of short strings and validates previous calculations of universal distributions, showing the incremental nature of the method to be fully compatible with the other calculated universal distributions with smaller samples of small Turing machines (and for which the known Busy Beaver values could be used).

In this other paper we explain why our approximations of K are real-number values, showing that strict integer-value program size follows our Coding Theorem Method, and thus that ours constitutes a more fine-grained measure. It is also shown that logical depth departs from both strict program-size and the Coding Theorem Method evaluations, being in agreement with the theory for all these 3 measures. The paper is available online at: Correspondence and Independence of Numerical Evaluations of Algorithmic Information Measures

In the next post I will be announcing and also briefly explaining the results from another paper showing that not only can our Coding Theorem deal with short strings, but that we have found a way to validate the method by lossless compressibility. Moreover, we have found that in the transition period, where the Coding Theorem Method starts to be too expensive to be of practical use whereas the compression method starts to provide some results, the 2 methods are in great agreement with each other. Like an expensive microscope of great power (e.g. the LHC at CERN), our Coding Theorem Method requires an incredible amount of calculation. The trick is to know when to use a microscope–to know when a microscope is required rather than a telescope. We believe we are providing the tools to deal with the information content of the smallest entities in the computational universe.

Paper published: Life as Thermodynamic Evidence of Algorithmic Structure in Natural Environments

Posted in General on November 6th, 2012 by Hector Zenil – Be the first to comment

Life as Thermodynamic Evidence of Algorithmic Structure in Natural Environments

In evolutionary biology, attention to the relationship between stochastic organisms and their stochastic environments has leaned towards the adaptability and learning capabilities of the organisms rather than toward the properties of the environment. This article is devoted to the algorithmic aspects of the environment and its interaction with living organisms. We ask whether one may use the fact of the existence of life to establish how far nature is removed from algorithmic randomness. The paper uses a novel approach to behavioral evolutionary questions, using tools drawn from information theory, algorithmic complexity and the thermodynamics of computation to support an intuitive assumption about the near optimal structure of a physical environment that would prove conducive to the evolution and survival of organisms, and sketches the potential of these tools, at present alien to biology, that could be used in the future to address different and deeper questions. We contribute to the discussion of the algorithmic structure of natural environments and provide statistical and computational arguments for the intuitive claim that living systems would not be able to survive in completely unpredictable environments, even if adaptable and equipped with storage and learning capabilities by natural selection (brain memory or DNA).

Zenil, Hector; Gershenson, Carlos; Marshall, James A.R.; Rosenblueth, David A. 2012. “Life as Thermodynamic Evidence of Algorithmic Structure in Natural Environments.” Entropy 14, no. 11: 2173-2191.

http://www.mdpi.com/1099-4300/14/11/2173

Hypercomputation in A Computable Universe

Posted in Computability, Universality and Unsolvability, Foundations of Computation, General, Recreation on September 21st, 2012 by Hector Zenil – 1 Comment

This is the full version of my answer to a question formulated by Francisco A. Doria to me included in the Discussion Section of A Computable Universe, my edited volume being published by World Scientific and Imperial College Press coming out next month (already available in Asia) concerning whether I think if Hypercomputation is possible:

I was once myself a hypercomputation enthusiast (my Master’s thesis–in French–was on “Hyper calcul”, focused on feasibility of hypercomputational models). Paradoxically it wasn’t until I had a good knowledge of it that I started to better appreciate and to increasingly enjoy more the beauty of the digital (Turing) model. On the one hand, hypercomputational models do not converge in computational power. There are a plethora of possible models of hypercomputation while there is only one digital (in terms of computational power). I find it astonishing how little it takes to reach the full power of (Turing) universality, and I don’t see the power of universal machines as being subject to any limitation, all the opposite. I happen to think the question has a truth value and is ultimately susceptible of a physical answer. However, I don’t think the answer will emerge in the foreseeable future, if it ever does.

Still, I found Doria’s contribution to the subject interesting, as he points out a particular (“ideal”) model of hypercomputation that I think gets around some of Martin Davis’ objections (The Myth of Hypercomputation), though it doesn’t fully address the problem of verification. Unlike a Turing computation, which can in principle be verified by carrying it out by hand, step by step the inner working of a hypercomputer can only be followed by another hypercomputer. The caricature version of the problem is in the whimsical answer given by the (hyper?)computer Deep Thought in “The Hitchhiker’s Guide to the Galaxy” by Douglas Adams, which proposed “42″ as the “Ultimate (uncomputable?) answer to the Ultimate Question of Life, The Universe, and Everything” [added parenthesis]. The only way to verify such an answer would be by building another, more powerful and even less understandable computer. This makes me wonder whether we ought not to favour meaningful computation over what could potentially be hypercomputation, even if hypercomputation were possible.

There is a strong analogy to the concept of proof in math. Mathematical proofs seem to fall into two types. They either serve to convince of the truth of a statement one wasn’t certain was true, or else to provide logical evidence that a statement intuitively believed to be true was in fact so (e.g. the normality of the mathematical constant pi). But in the latter case, why would one bother to provide a proof of a statement that nobody would argue to be false? It is because ideally math proofs should provide insight into why a statement is true, and not simply establish whether or not it is so. There are a few exceptions of course. Some come from mathematical practice, for example Wiles’ proof of Fermat’s last theorem. It is not clear whether Wiles’ proof provides any insight into the original question of the truth value of Fermat’s theorem (it does contribute to understand the connection among different powerful mathematical theories). Some other cases, especially among computer automated proofs, are of the same kind, often neglecting the fact that a proof is also about explanation (for humans). In that sense I think we should also favour meaningful mathematical proofs (from meaningful mathematical questions!), just as we should better appreciate meaningful (digital) computation.

The study of infinite objects has given us great insight into profound and legitimate questions of math and computation, such as the question of the nature of what is a number or what is a computation. And it has been immensely useful to focus on limits of computation in order to better understand it. It is at the boundary between decidability and undecidability that one seems best positioned to answer the question of what does computation mean. And there are some examples where the study of a hypercomputational model is more valuable not as a model of hypercomputation but by its ancillary results. Unlike most people, I think the contribution of, for example, Siegelmann’s ARNN model, has much more to say about computational complexity (the ARNN model relativises P = NP and P != NP in a novel fashion) and therefore classical computation than about hypercomputation! (Siegelmann’s ARNNs model has come to be strangely revered among hypercomputation enthusiasts).

While I may agree that the problem of verification of a computation is not necessarily an argument against hypercomputation (because it can also be used against digital computation in practice), the answer is only in the realm of physics and not in paper models. As Davis points out, it is not a surprise that encoding non-computable numbers as weights in an artificial neural network deals to non-computable computation! so, assuming real numbers exist the consequence is straightforward and the justification of a hypercomputational model just circular.

Other models of hypercomputation are just fundamentally wrong, such as Peter Wegner’s model of “interactive computation” that is claimed to break Turing’s barrier. For a recent commentary pointing out the the flaws of yet another hypercomputational claim, you can consult this entertaining blog post by Scott Aaronson and his Toaster-Enhanced Turing Machine.

Meaningful Math Proofs and “Math is not Calculation”

Posted in Foundations of Math, General, New Ideas, Recreation on August 5th, 2012 by Hector Zenil – Be the first to comment

Mathematicians are generally thought to be very good at calculation, and they sometimes are, but this is not because math is about calculation, just as astronomy is not about telescopes and computer science is not about computers (paraphrasing Edsger Dijkstra).

But if it is not preeminently about calculation, then what is mathematics about? The common answer, given by mathematicians themselves, is that it is about solving problems–learning to do so, and to think in a logical fashion. I think this may also be misguiding. Let’s take as an example Fermat’s last theorem (which may also be misguiding, as some math is not only about historical unresolved questions with little impact, in some sense, Fermat’s Last Theorem is not much more than a curiosity).

As is well-known, Fermat’s last theorem was solved by Andrew Wiles in 1994. The solution wasn’t simple at all, and it required a lot of convoluted math (Wiles himself got it wrong first and a few years later fixed it). The solution required much more machinery than the machinery required to formulate Fermat’s original statement, it is like going after a fly with a bazooka. For some mathematicians concerned with the foundations of math, Wiles’ theorem may not be as satisfactory they would have wished, even though as a piece of mathematical work connecting several areas of advanced math.

Wiles’ exact solution couldn’t have been arrived at earlier, as it draws upon some of the most sophisticated areas of modern math, it transfers a pure problem of arithmetic to a very powerful framework founded in algebraic geometry and analysis (elliptic curves, modular forms). If Fermat had a solution it wouldn’t have been at all similar to Wiles’. In fact, even before Wiles’ proof, no mathematician ever thought that Fermat’s last theorem was actually false. The consensus was that the theorem was true, so the long expected proof was supposed to shed light on the reason behind this truth rather than simply confirm it. Wiles’ great achievement only confirmed that with a very sophisticated set of tools one could prove Fermat’s last theorem, but it didn’t shed any light on why it was true (if it does, it is still hard to justify such a tradeoff in which to shed some light in arithmetic one requires the use of set theory).

Wiles’ solution doesn’t answer the question of whether Fermat himself had a more basic (but no less valid) proof. The most likely case is that Fermat’s last theorem is actually independent of number theory, and requires the full power of set theory. Hence Fermat very likely had no such solution, contrary to what he claimed.

Fermat’s theorem is, however, a rare case of the ultimate great challenge, and not part of the common practise of the “average” mathematician. The work of an “average” mathematician depends very much on its field. If the field is concerned with foundations it may involve a lot of set theory and logic, which in large part involves finding sets of formulae to prove other sets of formulae and looking for relativisations. In algebra or group theory, or even topology, their work may have to do with all sorts of symmetries or ways to characterise things in different ways in order to establish new and unanticipated connections, shedding more light on one area in terms of another, from a different point of view (and employing a different language). This connection is in fact the greatest achievement of Wiles’ proof of Fermat’s last theorem, as it connects several subdisciplines of modern mathematics building sophisticated intuitions of true statements, ultimately connected to Fermat’s last theorem.

I think math can be better described as a discipline of patterns (e.g. mappings, homeomorphisms, isomorphisms). Patterns are integrally related to the discipline I value most: algorithmic information theory, which is the ultimate field of the study of patterns, patterns in strings and objects, leading to applications such as compression algorithms.

One of the consequences of this change of mindset is that it will allow students access to computers. Computers and humans together can achieve an impressive number of things; they do so everyday in all sorts of fields, from car design to the development of new chips for the next generation of computers. In fact, airplanes today are mostly driven by computers, no pilot or set of pilots alone would be able to fly a modern airplane these days, and more than 60 to 70% (in some markets even 90%) of the operations in any top stock market are computer made, with no human intervention. Today’s industry wouldn’t be conceivable without the combined forces of human creativity and motivation, and the computer’s speed, accuracy and productivity.

There is an important new trend spearheaded by the UK initiative Computer-Based Math
promoting these very ideas. Conrad Wolfram has recently explained in a blog post how computer programming could not only help to teach math, but how they could merge into a single discipline. I may have a chance to talk about all this at the House of Lords soon, where the education shift would need to take official form.

Conjectures concerning Busy Beavers, Dynamic Behavior and Turing Universality

Posted in Complexity, Computability, Universality and Unsolvability, Computer Science, Foundations of Computation on June 1st, 2012 by Hector Zenil – Be the first to comment

In a recent paper I have advanced some conjectures using a coefficient that renders aspects of the qualitative behavior of complex systems in quantitative terms. It measures the sensitivity of a system to external stimuli, its apparent ability to (efficiently) transfer information from the input through the output.

In a previous paper, and in a quite different context, I proposed the conjecture that systems capable of (efficient) Turing universal computation had large coefficients (greater than 1), meaning that they are sensitive enough to external stimuli.

It is evident that a system that doesn’t react to its input is unlikely to be (efficiently) programmable, which is what Turing universality is about. Recall that Turing universality is the notion of a machine capable of simulating any other machine of the same type. More formally, if U is a universal Turing machine, anda Turing machine with input s, one can run U with input, that is>, and U will then behave like M with input s, where M and s are any Turing machine and any input.

In that paper I used this coefficient as a phase transition detector, and it so happened that the way in which it was defined made it eminently capable of classifying the behavior of cellular automata and other systems, even yielding Wolfram’s well-known classes of behavior. The paper is available online here.

Now I have advanced another set of conjectures, this time relating a set of Turing machines defined by the way they behave: Busy Beaver Turing machines. These are machines that do a lot of work before halting. More formally, they print more non-blank symbols and have the greatest runtime among machines of the same size (number of states and symbols). I have written a program showing all known and candidate Busy Beaver machines (in Mathematica, runnable on your browser by downloading the Mathematica CDF player):

Let bb(n) be the Busy Beaver with n states (and 2 symbols). The conjectures say that:

1. (strong version): For all n > 2, bb(n) is capable of universal computation.
2. (sparse version): For some n, bb(n) is capable of universal computation.
3. (weak version): For all n > 2, bb(n) is capable of (weak) universal
computation.
4. (weakest version): For some n, bb(n) is capable of (weak) universal
computation.

They are based on the fact that Busy Beaver machines are complex, in Bennett’s sense. Bennett’s notion of “logical depth” is the idea of capturing the complexity of a string s in terms of the unfolding time of the shortest (or set of nearly shortest) programs producing the said string. The outputs of Busy Beaver Turing machines are therefore, by definition, the ones with the greatest logical depth. If they print more symbols than any other machine and take the greatest time to do so, they have maximal logical depth. Busy Beaver machines are also machines that produce their outputs with the minimum resources (if there were shorter machines producing the same outputs those would be the Busy Beaver machines), hence they are the shortest programs producing the strings. Busy Beaver machines are therefore by definition very deep (in Bennett’s definition of complexity), meaning they are complex in this sense (logical depth is also profoundly related to Kolmogorov complexity, given that the definition involves the shortest(s) programs).

Besides being deep, Busy Beavers have another characteristic: they halt. So they don’t just do their stuff without worrying about whether to stop. One can easily encode a “Busy Beaver” that loops and produces any arbitrary number of non-blank symbols say for example, an infinite loop printing 1s), but Busy Beavers don’t. They save a state to halt at the end. Machines that are not able to halt (the majority) cannot be Turing universal, because they wouldn’t ever produce any output, simply because by definition something is an output if the machine has halted. But machines that always halt would be decidable, and therefore not universal either.

So among the things to attempt to amount to evidence in favor of the positive answer of the conjectures of Turing universality for Busy Beaver machines, is to encode some input such that they don’t stop (and to prove so), as a first step towards a prove for universality. If Busy Beavers are capable of halting for some inputs (e.g. empty input, as this is entailed in the definition of a Busy Beaver) but are also capable of not halting for others, this would strengthen the conjectures. Busy Beavers may, however, be very different among them, as they are defined by behavior, rather than their specific inner workings, but if the behavior of a Busy Beaver in any way determines something that captures a property of all them, one could use the same argument in favor or against them towards proving or disproving these conjectures.

There seems no easy way to prove that all Busy Beavers are capable of universal computation, but such a proof, if it exists, would be the first to characterize a non-trivial set of well defined universal Turing machines. Non-trivial, because of course one can build a universal Turing machine and keep adding resources and proving that the resulting machines are all universal, just as happens with Turing-complete programming languages. But Busy Beaver machines would be characterized by nothing but a qualitative behavior. This makes the assertions of the conjectures more interesting.

It could be the case that Busy Beavers are not capable of Turing universality for blank tapes. In fact, Michel Pascal, a world expert in the field of Busy Beavers, thinks that Busy Beavers may be capable of –weak– universal computation (personal communication), that is, starting from non-blank tape configurations (e.g. a periodic or bi-periodic tape configuration). This accords with another conjecture in Computer Science, the Collatz sequence. The conjecture says that whichever number you start with, if n is even, divide it by 2 to get n / 2. If n is odd, multiply it by 3 and add 1 to obtain 3n + 1; you will end always end up reaching 1. I have also written 2 small computer programs (also in Mathematica) illustrating this conjecture:

According to Michel, Busy Beavers would be related to Collatz-type functions (see his paper “Small Turing machines and generalized busy beaver competition“), with conjectures that are related but more general (in the context of small Turing machines) to the ones I’m talking about.

Today, we know that Turing universality requires very little. For example, Wolfram conjectured that a 2,3 Turing machine was universal, on the basis of the aspects ot its behavior that he studied. And as proven by Alex Smith, this 2,3 TM turns out to be capable of universal computation. The machine is surprisingly small, and indeed is the smallest it can possibly be, given that we know that no 2,2 Turing machine can be capable of Turing universality (proof recently provided by Maurice Margensten in a Complex Systems paper). Today we also know many minimialistic systems capable of universal computation: Conway’s game of Life, Wolfram’s Rule 110, many Post-tag systems and a variation called Cyclic tag systems, Wang tiles. And at least 2 important researchers in the field have recently argued for the ubiquity of Turing universality (Martin Davis and Maurice Margenstern myself, along the lines anticipated by Wolfram’s Principle of Computational Equivalence).

Finally, I advanced a last conjecture relating Busy Beavers and my coefficient C in this paper, establishing that:

C(bb(n)) > 0

That is, that Busy Beavers have a positive coefficient C, i.e. are quite sensitive to input stimuli, which, if the very first conjecture is true (that Turing universal machines have a large coefficient) makes this last coefficient a good place to start to make the case for the universality of Busy Beavers.

The paper “On the Dynamic Qualitative Behavior of Universal Computation” at ArXiv and has been recently published in the Journal of Complex Systems (vol. 20-3) also available online here.

Turing’s Deep Field: Visualizing the Computational Universe

Posted in Complexity, Computability, Universality and Unsolvability, Computer Science, Foundations of Computation, Foundations of Math, Mathematical Logic on January 5th, 2012 by Hector Zenil – Be the first to comment

I generated this image in the course of an investigation of the distribution of runtimes of programs in relation to the lengths of mathematical proofs, the results of which are being published in my paper bearing the title “Computer Runtimes and the Length of Proofs with an Algorithmic Probabilistic Application to Optimal Waiting Times in Automatic Theorem Proving” Volume 7160 of the Lecture Notes in Computer Science series (LNCS), a festschrift for Cristian Calude.

The preprint is now available online in the ArXiv here. The paper shows that as theoretically predicted, most machines either stop quickly or never halt. It also suggests how a theoretical result for halting times may be used to predict the number of random theorems (dis)proven in a random axiom system –see the section I’ve called “Gödel meets Turing in the computational universe”.

Turing View of The Computational Universe

My plot was the winning image in this year’s Kroto Institute Image Competition, in the Computational Imagery category titled “Runtime Space in a Peano Curve”, it shows the calculation of the distribution of runtimes from simulating (4n+2)^(4n)=10000 Turing machines with 2 symbols and n=2 states (of a total of more than 10^12 simulated Turing machines with up to n=4 states) following a quasi-lexicographical order in a Peano curve preserving–as far as possible–the distance between 2 machines arranged in a 2-dimensional array from a 1-dimensional enumeration of Turing machines.

In the image each point or cluster of points represents a Turing machine or a group of Turing machines, and the color is determined by a spectrum encoding their halting runtimes–the lighter the square the sooner the machine entered the halting state. White cells represent machines that are proven to halt in infinite time. Red cells show the Turing machines that take longer to halt (popularly called Busy Beavers). Knowing the values of the Busy Beaver functions allows us to identify the machines that never halt (depicted in white). The image is noncomputable, meaning that the process cannot be arbitrarily extended because of the undecidability of the halting problem (i.e. there is no procedure for ascertaining the color of the following pixels to zoom out the picture and cover a larger fraction of the computational universe).

Turing machines with an arbitrary number of states can encode any possible mathematical problem and are therefore perfect compressors of the known, the yet to be known, and even the unknowable (due to the undecidability of the halting problem).

This is how the image looks like as displayed in the stairway of the Kroto Research Institute in the UK:

Some postcards with the winning image were printed and they can be sent to scholar or enthusiasts upon request sending an email to hzenilc [at] gmail [dot] com.

I want to dedicate this prize to my former thesis advisor Jean-Paul Delahaye who suggested me the Peano arrangement as a packing for my visual results of halting times. And also to Cristian Calude who co-advised me all along my PhD thesis and who encouraged me to publish this paper and what better place to do so than for his festschrift.

I’m releasing the images under an open licence in honour of the Turing Year, so that it may be used for any artistic illustration of some of Turing’s main results (the halting problem) or for any other purpose in any medium.

Postcards of the winning image are also available upon request. Just send an email requesting 1 or more postcards to h.zenil [at] sheffield.ac.uk or to let him know (if you can) that you will be using any of the images (or if you need better resolution versions).

Towards a Web of One…

Posted in General on November 25th, 2011 by Hector Zenil – 1 Comment

The author is a little more critical of personalization than I’d be inclined to be, and he gives too much credit to human editors in my opinion (I think they can be as bad or worse than algorithmic editors. Witness the hacking scandals in the UK, a phenomenon not exclusive to Great Britain).

Eli Pariser’s words resonate powerfully with the way I see the web and how it has evolved into its current form, with its relentless social networking emphasis. My main concern is not personalization but trivialisation, owing not only to the inane content posted by individual users but also by companies controlling the web.

Social networking has been driven by an insidious knack for making clever people spout nonsense, and those who succumb make the place invasive in every sense. Not to mention the frantic race to acquire more friends (there have been reports of periods in which there is a reversal of this phenomenon, perhaps when people realise how meaningless these connections are and start unfriending their superfluous contacts, trimming their friend lists. Google+ and Facebook eventually came up with the idea of Acquaintances rather than Friends, in large part to mitigate this need.

The sad thing is that in many respects the Internet has brought out pretty much the worst in us rather than making us better. Here is a little meaningful piece of content though, to make us think about it:

David Hilbert and Deutsch in den Wissenschaften

Posted in General on November 13th, 2011 by Hector Zenil – Be the first to comment

David Hilbert was probably one of the most, if not the most, prominent German mathematician. His interests ranged from quantum mechanics to topology, and notably from the foundations to the history of science and mathematics. Among his main interests there was geometry, at which he excelled by formalizing a two and a half centuries (Euclidian) geometry synthesizing it into a few (consistent and independent) different sets axioms, making the entire theory as mathematically precise as possible (in the tradition of Euclid himself but taken to culmination) at the level of precision that any theorem could from then on be carried out by a mechanical procedure manipulating symbols with no semantics attached.

His formalization of geometry certainly led him to think that math could one day be entirely formalized in the same way, proving any true theorem for a given mathematical theory. In his famous 1900 lecture at the Sorbonne in Paris Hilbert set most of the mathematics agenda for the 20th. century, and among the open problems there was the one he called the Entscheidungsproblem (the Decision problem), that is whether a mechanical treatment of math would eventually make possible theorems to be automatically proven.

In his address to the Society of German Scientists and Physicians, in Königsberg on September 8, 1930, Hilbert finished his speech with a loud “Wir müssen wissen — wir werden wissen!” (We must know — we will know!) (the radio broadcast was recorded and is available here and an English transcription here). His statement defined so well his mind that it was used as the epitaph on his tomb in Göttingen.

Hilbert’s ideas eventually led two mathematicians to make two of the greatest discoveries in the last century solving Hilbert’s open question in the negative with remarkable outcomes. First, there was Kurt Gödel, who in 1931 found that there are always (true) theorems in any (powerful enough) mathematical theory which these theories could never prove to be true (or false) calling them undecidable propositions. Then, Alan Turing in 1939, proved what he called the undecidability of the Halting problem for any general-purpose mechanical machine, with nothing else but what was the basis of the first description of the modern digital computer and the foundations of a new science: computer science.

Hilbert’s “Wir müssen wissen — wir werden wissen!” made all this possible, and its impact is yet to be measured in the centuries to come.

My entry to the Deutsch in den Wissenschaften Competition (German in the Sciences) organized by the Goethe Institut.

A bit of cryptography, passwords strength, and sysadmins of savings banks

Posted in General, Natural Language and Computational Linguistics on August 14th, 2011 by Hector Zenil – Be the first to comment

This recent webcomic from xkcd, Password Strength, relates to what I’ve been investigating lately–the question of whether there need be a tradeoff between password strength and ease of recall, using, as usual, my knowledge of finite algorithmic information theory. While my conclusions will be presented in a paper in the next few months/years, the webcomic made me realize how defective even sophisticated things sometimes turn out to be. The webcomic draws attention to an unnecessary, even silly requirement for password generation.

Password strength is related to two main parameters: length and the likelihood of being guessed. Length is a simple parameter, the longer the better, but the likelihood of something being guessed is a bit trickier. What makes a string like 0000000 more likely to be cracked than say 0110101? Classical probability tells us that both have the same likelihood of occurring, thanks to (uniform) chance. But there is a good reason not to choose a word in any language as a password, viz. that someone trying to guess a password will assume that people wish to remember their password and therefore will most likely use meaningful strings, such as words, anniversary dates, and the like. Hackers use a technique called ‘dictionary attacks’ where every entry in a dictionary is presumed to be a potential password and used as such. The results are often spectacular. Despite the general awareness that words are not good passwords, when hackers perform these attacks they usually find at least a few people with weak, word-based passwords. The only reason not to use words is dictionary attack, so sysadmins took it upon themselves to dissuade people from choosing words as passwords, and the way they set about this was to ask users, and now quite often to force them, to:

1. Observe fixed length requirements;
2. Sometimes require a combination of caps;
3. Sometimes require numbers and other symbols;
4. In the case of some sites, require frequent password changes.

While requirement number 1 makes some sense (people may argue, with reason, that a short but good password is superior to an easy-to-guess long one), requirements 2 to 4 are quite arbitrary, if not completely nonsensical.

If people didn’t pick word-based passwords, if every character were assumed to be equally likely, only the length of the password would matter. If a password were to be generated randomly, the probability of hitting a word by random letter generation is very low, because there are more letters than one-digit numbers and reasonable symbols to choose from. The problem is that words are not random in the sense that they are sequences of correlated letters according to a distribution of words in a language.

But under the circumstances, the way that sysadmins decided to address this issue was to force people not to choose single words by arbitrarily enforcing requirements 1 to 3, with the result that people frequently either no longer remember their passwords or they cannot have a range of different passwords for use at different sites, because remember that if a password is disclosed at one site it will be disclosed for all the others where said password is used. To fix this, again sysadmins introduced the procedure of forcing people to change their passwords, so that users can no longer synchronize their password among all the sites that need one, leading them to have to either write down their nonsensical passwords because they cannot recall them, store them where they would not be secure, or else ask the system to retrieve the password every time they want to log in, often through unsecured lines (e.g. public wireless networks). These arbitrary requirements aren’t listed while you’re trying to log in, so that you may not remember whether you used a number or symbol (it may seem a bad idea to make hackers privy to this, but they will be anyway if they have access to the registration process). The computational time cost of retrieving or generating a new password may be low, but the human time cost may not be.

A longer password will always be better than a shorter one, even compared to a shorter one containing all sort of numbers and symbols. Numbers and symbols are only useful for preventing dictionary attacks, but if people use leeting (or chat slang) then I’m afraid it would not solve the problem that sysadmins are faced with, because hackers could just apply the translation tables to the dictionaries and try again, which I’m sure would be very successful. There are leetspeak translation sites on the web (e.g. www.computerhope.com/cgi-bin/convert.pl). The only truly effective way to solve this problem is to require longer passwords (which also prevents the use of words, because most common words have less than 8 characters, so asking for a 10-character password may be enough to preempt the choice of most dictionary words, though more characters can be required for greater saftey).

Could this problem have been solved otherwise? It would certainly be too expensive–computationally speaking–to check every password against a dictionary to avoid every possible word in every language (and even harder to avoid employing anything having to do with the user’s life, such as his forename or anniversary date), but being asked to come up with silly characters became unbearable for many users.

The xkcd site gives you a foretaste of it of a side effect of my algorithmic approach to cryptography. One does not and should not have to sacrifice ease of recall for strength. Something as simple as composing random words has greater combinatorial strength than a leetspeak-corrupted single word or passwords comprising a few random characters, with the added advantage that even if longer, sentences are much easier to remember.

There is no trick to this. Choosing a few words means a longer password, and putting them together circumvents dictionary attacks (to perform dictionary attacks over full sentences is computationally too expensive, so there is no way hackers can start doing so).

One shouldn’t need to choose between a good password and one that’s easy to remember, because there are passwords that can be both. What one may not be able to do is have short passwords that are easy to remember and also strong enough. Thus requirement 1 is the only important one, because one can also control the number of possible dictionary words, e.g. by picking a password length greater than most commonly used words. Zip’s law tells us that the number of letters will not be that long, because the longest words are less frequently used and therefore also less likely to be picked as passwords (besides, systems could actually make sure that there is no word in any major language that is of the stipulated length).

Compare the result of putting 2 random short (but not that short) words together using Mathematica:

In[1]:= passwds=Table[StringJoin[
RandomChoice[
Select[DictionaryLookup[], (StringLength[#] <
6) && (LowerCaseQ[#]) &], 2]], {100}];
In[2]:= RandomChoice[passwds, 5]
Out[2]="rimecrave", "stoneobeys", "mixedtromp", "whiffmutt", "letopted"

with their leetspeak versions that one would probably be required by sysadmins to use:

In[3]:=ToLeetSpeak/@passwds
Out[3]=”r1m3cr4v3″, “570n30b3y5″, “m1x3d7r0mp”, “wh1ffmu77″, “l370p73d”

even if the joined words do not either exist in a standard English dictionary:

In[4]:=Select[passwds, MemberQ[DictionaryLookup[], #] &]
Out[4]={}

To have fun you can also generate 2 word passwords in other languages using Mathematica:

In Spanish:

Table[StringJoin[
RandomChoice[
Select[DictionaryLookup[{"Spanish", All}], StringLength[#] < 6 &],
2]], {100}]

or French:

Table[StringJoin[
RandomChoice[
Select[DictionaryLookup[{"French", All}], StringLength[#] < 6 &],
2]], {100}]

Or any other of these languages:

In[5]:= DictionaryLookup[All]
Out[5]= {“Arabic”, “BrazilianPortuguese”, “Breton”, “BritishEnglish”, “Catalan”, “Croatian”, “Danish”, “Dutch”, “English”, “Esperanto”, “Faroese”, “Finnish”, “French”, “Galician”, “German”, “Hebrew”, “Hindi”, “Hungarian”, “IrishGaelic”, “Italian”, “Latin”, “Polish”, “Portuguese”, “Russian”, “ScottishGaelic”, “Spanish”, “Swedish”}

An alternative method (to compression) for approximating the algorithmic complexity of strings

Posted in Algorithmic information theory, Foundations of Computation, New Ideas on July 20th, 2011 by Hector Zenil – Be the first to comment

The method introduced in my doctoral dissertation was featured in the French version of Scientific American Pour La Science in its July 2011 issue No. 405 under the title Le défi des faibles complexités.

Jean-Paul Delahaye points out that:

Comme les très petites durées ou longueurs, les faibles complexités sont délicates à évaluer. Paradoxalement, les méthodes d’évaluation demandent des calculs colossaux.
(Like long durations or very short lengths, weak complexities are tricky to evaluate. Paradoxically, the evaluation methods require colossal calculations.)

and he continues:

Pour les petites séquences, cette mesure est stable et conforme à notre idée de la complexité; pour les grandes, elle est, d’après le théorème mentionné [the invariance theorem, my comment], conforme à la meilleure mesure de complexité unanimement admise, la complexité de Kolmogorov. Que demander de plus?
(For short strings, this measure is stable and conforms to our idea of complexity; for long strings, according to the aforementioned theorem [the invariance theorem-- my comment], it conforms to the best and universally accepted measure of complexity, Kolmogorov complexity. What more can one ask for?)



Imagine you are given 2 short strings and are asked to tell which one looks more random and which more structured. The strings are 0101 and 1011. Applying the idea of algorithmic complexity, the shorter the description the less random-looking. It would seem that the first string has a repeating pattern that can be taken advantage of in describing it. In plain English one may therefore say that the first string can be described as “zero and one twice” while the second one would require a longer description. In fact, there seem to be fewer descriptions for the first string than for the second (and also notice some ambiguity in plain English). The first may also allow the description “A zero followed by a one followed by a zero followed by a one” or “zero and one followed by the same” and perhaps other variations. Descriptions of the second one can include “one and zero followed by two ones” or “one zero one one,” just as the first one could simply be “zero one zero one,” which doesn’t look like a compressed version of the string but rather a straight translation into an expanded form of plain English.

All this by way of asking whether any of the two strings is without a doubt simpler than the other, or whether the apparent repetition in the first string makes us think that the string has a pattern despite the pattern being repeated only twice. Perhaps when one looks at such a string one gets the impression that it may belong to a larger string comprising alternations of 01 and one concludes that it is simpler than the second one. To leave the subjective realm, one would need to evaluate the algorithmic complexity of the strings and compare their respective values. The algorithmic complexity of a string is the shortest program (measured in bits) producing th string in question running on a universal Turing machine. It is inconvenient that there is no algorithm that, given a string, gives you the length of the shortest program that produces it. This is by reduction to the halting problem. Which means one cannot really measure with absolute certainty the algorithmic complexity of a string because it is uncomputable. It doesn’t mean, however, that one cannot approach it; one can often do so in an effective and useful way.

Traditionally, the way to approach the algorithmic complexity of a string was by using lossless compression algorithms. Lossless means that one can recover the original string from the compressed version. The use of lossless algorithms is preferable because there may be inconsistent ways of compressing strings that give the impression of compressing a string without corresponding to a measure of complexity (e.g., noise deletion is an effective way, but noise may be algorithmically random and the aim of algorithmic complexity is to precisely measure how random something is and not merely to gauge whether it looks random). The result of a compression algorithm is an upper bound of its algorithmic complexity. While one cannot ever tell when a string is not compressible, if one succeeds at somehow shortening a string one can tell that its algorithmic complexity cannot be larger than the compressed length.

One does not want to cheat and claim that one can compress any string into a bit if the decompression algorithm interprets that bit into the desired string. A fair compression algorithm can be defined as one that transforms a string into two pieces: one is the compressed version and the other the instructions to decompress the string, together accounting for the final length of the compressed version. In other words, it would seem as if you were adding the decompression algorithm to the compressed string so that the compressed string comes with its own decompression instructions. In the long run, there is a theorem (invariance) that guarantees that complexity values converge.

But for short strings (which are often the ones useful for practical applications), adding the decompression instructions to the compressed version makes the compressed string often, if not always, longer than the string itself. If the string is, for example, shorter than the size of the decompression algorithm, there will be no way to compress the string into something shorter still, simply because the decompression instructions are at least of the length of the original string. Moreover, the result is so dependent on the size of the decompression algorithm (because it is the greatest contributor to the overall length) that the final value of the compressed length is too unstable under different lossless compression/decompression algorithms.

For example, if one tries to compress a short string using Mathematica, one gets the following results:
StringLength@Compress["0101"] = 30

Looking at the beginning of the compression line when plotting the lengths of strings (x axis) against their compressed lengths (y axis) one observes that it does not start at y=0 even when x=0.

This means that compressing the string 0101 with 4 bits took 46 characters (even more in bits). In Mathematica, strings begin to be compressed this way at about length 30. This is not a malfunction of Mathematica; it is the result of what I have explained. The Mathematica Compress function is actually based on the Deflate lossless compression algorithm, which is a combination of the LZ77 algorithm and Huffman coding, among the most popular lossless compression algorithms available, used in formats like ZIP, GZIP, GIF and PNG (these last two are therefore lossless compression image formats).

Zooming into the axis origin of the plot one can see that the beginning looks unstable. The string in question is a repetition of n 1s with n lying on the x axis (which represents different compression lengths rather than individual ones to avoid repetition and a step-like curve)

If one tries to compress 1011 one gets none other than the same value:
StringLength@Compress["1011"] = 30

The instructions obviously take up some space in the final compressed length and they cannot be compressed themselves (if they were, they would in any case be the same for all strings, taking us back to the same situation). There is a limit for compression algorithms to compress short strings. So if one wished to tell which of these two strings were objectively more or less randomly complex by approximating their algorithmic complexity using a compression algorithm, it turns out that there is no way to do so.

On the other hand, given that the definition of algorithmic complexity based on compressibility says that the less compressible the more randomly complex a string, one could immediately say that a single bit, 0 or 1, is random for certain, i.e. has maximal algorithmic complexity, given that there is no way to further compress a single bit. In other words, there is no program shorter than 1 bit that can produce 0 or 1. The shortest descriptions of 0 and 1 are therefore 0 and 1 themselves. Hence 0 and 1, according to the compressibility approach, are random strings. It may seem to make no sense to characterize a single bit as random.

On the one hand, a single bit does not carry any information and on these grounds one may think of it as somehow random. If one thinks about whether one would have been able to predict 0 or 1 as the outcome of a process, given that there is no context because they occur alone, one may also conclude that their occurrence is somehow random. In other words, if you see a string like 010101 you may bet that the next bit is 0, but if you are provided with nothing there is no way you can favor any position on whether the bit to come is 0 or 1. So much for justifying that a single bit is random.

It is hard, however, to justify how 0 could look more random than, say, any other possible string. If 0 is random how is it relatively more complex than 00? Or 01? Intuition tells us that short strings shouldn’t be that random (more random than, for example, longer random-looking strings), so if a single bit is the most random among all finite strings, how could it be that there is such a phase transition from maximal random complexity to very low complexity of, say, strings of length 2, 3 or 5 bits long?

Since intuition tells us that something random should also be uncommon and rare, what if one asks how common 0 or 1 are as results of a program? There is a measure that gives the probability of a program’s producing a given string running on a universal Turing machine. This is a measure we used to present a new method to evaluate the complexity of strings, as an alternative to the traditional use of compression algorithms. The new method aims particularly to solve the problem of the evaluation of the complexity of short strings, as we’ve discussed. It is based on Levin-Solomonoff’s algorithmic probability and is connected back to algorithmic (Kolmogorov-Chaitin) complexity by way of Chaitin-Levin’s coding theorem.

Algorithmic probability says that it is not the case that a single bit is the most complex random string, but actually the most structured possible one and, more importantly , that the complexity transition is smooth, more in accordance with intuition.

It may be that it makes sense that a single bit can be regarded as both the most simple and the most complex of strings from different perspectives, and the advantage of the algorithmic probability approach is that it provides not only a different notion of the complexity of a single bit, one that is in keeping with intuition, but also that it generates a different outcome to the compressibility approach, even when the two measures are intimately related and asymptomatically produce the same results in the long term (for longer strings). I think the two views reflect different aspects of what a single bit represents.

The paper presenting the novel method for evaluating the algorithmic complexity of short strings was first proposed and sketched in Greg Chaitin’s 60th anniversary festschrift edited by Cris Calude (J-P. Delahaye & H. Zenil, “On the Kolmogorov-Chaitin complexity for short sequences,” Randomness and Complexity: From Leibniz to Chaitin, edited by C.S. Calude, World Scientific, 2007). The method uses an exhaustive and systematic search of Turing machines inspired by Wolfram’s NKS dictum, from which a frequency distribution of the halting machines is built and the Levin-Chaitin coding theorem applied to evaluate the algorithmic complexity of binary strings.

Chaitin pointed out (regarding our method) that:

…the dreaded theoretical hole in the foundations of algorithmic complexity turns out, in practice, not to be as serious as was previously assumed.

The full technical article is available in ArXiv Numerical Evaluation of Algorithmic Complexity for Short Strings: A Glance into the Innermost Structure of Randomness.

You can also look at the slides of the presentation I delivered at the Alan Turing amphitheater at the Computer Science Department of the University of Lille 1:

“The World is Either Algorithmic or Mostly Random” awarded a 3rd Place Prize in this year’s FQXi contest

Posted in Algorithmic information theory, Complexity, Foundations of Computation, Foundations of Physics, General, New Ideas on June 10th, 2011 by Hector Zenil – 2 Comments

Based on the combined ratings of the contest community and the panel of expert reviewers appointed by the FXQi, which included the members of the institute, I was awarded a 3rd Place Prize for my work The World is Either Algorithmic or Mostly Random in this year’s FQXi contest on the topic Is Reality Digital or Analog? sponsored by the Foundational Questions Institute. The winners were announced at this year’s World Science Festival in New York City.

My work can be summarized in one line as an explanation of the complexification process of the world, the process whereby we have evolved from a more primitive (random) state to the current organized state of the universe.

The essay is a summary of the philosophical branch of my current scientific research on finite algorithmic information theory. This philosophical branch is concerned with the exploration of the possible connections between algorithmic complexity and the physical world (or what happens in it). I propose the notion that the universe is likely digital, not as a claim about what the universe is ultimately made of but rather about the way it unfolds. Central to the argument are concepts of symmetry breaking and algorithmic probability, which are used as tools to compare the way patterns are distributed in our world to the way patterns are distributed in a simulated digital one. These concepts provide a framework for a discussion of the informational nature of reality. I argue that if the universe were analog, then the world would likely look more random, making it largely incomprehensible. The digital model has, however, an inherent beauty in its imposition of an upper limit and in the convergence in computational power to a maximal level of sophistication. Even if deterministic, that the world is digital doesn’t necessarily mean that the world is trivial or predictable, but rather that it is built up from operations that at the lowest scale are simple but that at a higher scale look complex and even random–though in appearance only.

How have we come from the early state of the universe (left) to the structures we find today (right)?

The arguments supporting my views are partially based on the findings of my research, epitomized by our most recent paper Numerical Evaluation of Algorithmic Complexity for Short Strings: A Glance into the Innermost Structure of Randomness available in ArXiv in which my co-author and I describe a method that combines several theoretical and experimental results to numerically approximate the algorithmic (Kolmogorov-Chaitin) complexity of bitstrings by using the concept of algorithmic probability, which is connected to algorithmic complexity by way of the (Levin-Chaitin) coding theorem.

An extended (and detailed) version of The World is Either Algorithmic or Mostly Random is forthcoming and will be eventually posted.

New book: “Lo que cabe en el espacio” on Philosophy of Space and Geometry

Posted in General on June 10th, 2011 by Hector Zenil – Be the first to comment

I am excited to announce the publication of my new book written in Spanish Lo que cabe en el espacio on the philosophy of space in connection to our reality, and what we can or cannot do with it and in it. The book, under the title “Lo que cabe en el espacio: La geometría como pretexto para explorar nuestra realidad física y matemática”, approaches the basic notions of geometry in the context of their original formulations, that is, the formulations of the mathematicians in their particular times and places. By so doing it attempts to understand the motivations behind these original formulations and build a modern view from the perspective of our time. Properties of space, such as dimension and curvature are discussed, as treated by mathematicians from Euclid to Legendre, from Descartes to Hilbert.

The book is available through CopIt Arxives UNAM, a peer reviewed publishing organization run by professors of the National University of Mexico (UNAM). Readers can download the PDF at no charge from CopIt Arxives and it will soon be available in physical form at Amazon.com for those wishing to own a paper copy. There is also a Kindle version of a previous incarnation of the book available from Amazon here, which despite issues around accent encodings, has been in the top 100 Kindle books in Spanish for several weeks/months. I should be updating the Kindle version fairly soon to the newest edition.

¿Cuánto cabe en nuestro espacio? ¿Por qué sólo pueden existir cinco poliedros regulares y no seis? ¿Qué figuras llenan el plano y el espacio? ¿Es el espacio una generalización trivial del plano? ¿Podemos notar si estamos en un plano curveado o si estamos sobre él? En lenguaje casi coloquial, despegaremos figuras del espacio para rotarlas en cuatro dimensiones, contaremos poliedros en dimensiones superiores y nos preguntaremos acerca de las propiedades fundamentales del espacio. En este libro se devela cómo una parte de las matemáticas describen las propiedades más fundamentales del espacio en que vivimos y por lo tanto de la realidad en la que estamos inmersos.

The Lighthill Parliamentary Debate on General Purpose Artificial Intelligence

Posted in General on February 24th, 2011 by Hector Zenil – Be the first to comment

In 1973, Lucasian professor at Cambridge, James Lighthill, was asked by the British Parliament to evaluate the state of AI research in the United Kingdom. His report, now called the Lighthill report, criticized the failure of AI to achieve its grandiose objectives. He specifically mentioned the problem of “combinatorial explosion” or “intractability” of the discourse universe, which implied that many of AI’s most successful algorithms would grind to a halt faced with real world problems and were only suitable for solving toy versions.

The report was contested in a debate broadcast as part of the BBC series “Controversy” in 1973. The debate, which was on the proposition “The general purpose robot is a mirage,” was sponsored by the Royal Institute and pitted Lighthill against a team of researchers, including Donald Michie and Richard Gregory and led by the young AI researcher John McCarthy. The report led to the near-complete dismantling of AI research in England.

Here is the debate on YouTube (in 6 parts):

This debate goes to show that as recently as 1973 it was generally believed (as indeed it still is in some quarters), that one had to write a complex program to obtain complex behavior, or that no simple rule could produce complex behavior (2nd. part, min 7:45). This position was epitomized by Lighthill.

Lighthill’s position does not come as a surprise. He was, after all, a researcher in fluid dynamics and aeroacoustics, where it is easy to be misled by complicated differential equations involving ‘continuous’ variables and where nonexistent solutions arise so often. His main argument was that because one had to specify the rules in a computer to tell the robot how to behave in every possible scenario, every attempt to come up with a general purpose robot would quickly turn out to be an intractable problem, with a combinatorial explosion of possible solutions. I don’t entirely disagree with Lighthill on this, but I can by no means endorse his conclusion.

Unfortunately the response of the researchers on the panel was rather weak, except for a couple of minor arguments put forward by McCarthy to what seemed to be fundamental impediments that Lighthill appeared to be invoking. From my point of view, one of the items that should have been put on the table ought to have been a serious discussion of whether universal computation actually had something to tell us about the possibility of general purpose artificial intelligence (i.e. what the abstract could say about the concrete). Also the question of the essential differences, if any, between advanced automation and artificial intelligence, which McCarthy seems to have broached in one of his arguments against Lighthill, without reaching any conclusions. Indeed the point may be that there is no essential difference, which is something that was perhaps difficult to see back then. But instead of considering this as a caveat against AI, as Lighthill seemed inclined to do, it actually makes AI inevitable in the event, even if achieved by means other than those traditionally employed in AI labs.

If automation is not really that different from AI, as I think has been proven over the years, what this suggests is that automation will eventually lead to what we think of as intelligent behavior and therefore to artificial intelligence.

In the end, Lighthill’s position was reduced by the moderator (Sir George Porter) to an argument about pragmatic difficulties rather than about the fundamental impossibility of the project, in response to which Lighthill quipped that his neuroscientist friends had made it plain that they felt it was hopeless to try and model the brain using a computer. I find several problems with this position, beginning with the assumption that AI is exclusively about trying to mimic the brain.

Donald Michie made a good point concerning the combinatorial explosion made so much of by Lighthill by citing the example of chess. In 1973 humans were still better at playing chess, so he asked whether Lighthill would acknowledge artificial intelligence if a machine performing an exhaustive search turned out to win a chess match against a Master.

In 1968, David Levy was an international Master and he bet a thousand pounds that no computer program would beat him in the next 10 years. He won his bet in 1978 by beating Chess 4.7 (the strongest computer at the time). Lighthill responded that a chess program would avoid the combinatorial explosion by virtue of having the benefit of human knowledge. In any case, he said, he was reluctant to believe that a chess program would ever beat a human chess player. Lighthill was cleverly hedging his bets.

Lighthill’s position wasn’t totally clear at the end of the debate–was his argument purely pragmatic or did it concern something fundamental? I’m not wholly unsympathetic to his position because AI has been misleading as a field. While it has certainly made some contributions to science and technology, it has achieved what it has by dint of technological improvements (hardware capabilities) rather than scientific breakthroughs (e.g. a set of sophisticated new algorithms) on the part of the AI community. And I have explained this in some detail in my previous blog post on the IBM computer Watson in the Jeopardy! game Is Faster Smarter?

Is Faster Smarter? IBM’s Watson Search Engine Approach to Beat Humans

Posted in General, Minds and Machines on February 18th, 2011 by Hector Zenil – 4 Comments

IBM’s computer named “Watson” has beaten Jeopardy! (human) contestants in a series of games this month. IBM has a long history of innovations (watch this other 100th anniversary documentary featuring Greg Chaitin and Benoit Mandelbrot, among others here).

Not everybody was impressed by Watson though, according to Gavin C. Schmitt who interviewed Noam Chomsky, the recognized linguist:

  • Noam Chomsky: I’m not impressed by a bigger steamroller.
  • Interviewer: I assume that “a bigger steamroller” is a reference to Deep Blue. Watson understands spoken language and adapts its knowledge based on human interaction. What level of AI would be required to impress you?
  • Noam Chomsky: Watson understands nothing. It’s a bigger steamroller. Actually, I work in AI, and a lot of what is done impresses me, but not these devices to sell computers.

In some sense, I understand Chomsky’s view and he does well pointing out what may seem the clearest difference between Watson and a human being, but the point may be much more subtle and deeper. Wolfram’s Principle of Computational Equivalence (PCE) (see this previous post) may shed some light on this subject. According to this principle, memory is what makes a system intelligent, because basically any system that does not behave in an obviously trivial fashion will likely be as sophisticated as the most sophisticated system. So what is the difference between a system that is potentially intelligent and one that shows signs of actually being so? It is somehow a matter of scale in several directions. Take the example of weather. When Wolfram is asked whether clouds are intelligent according to his principle, his short answer is yes. Every time humans want to make a prediction about whether it will rain, it turns out to be incredibly difficult to do so for more than a couple of days, and often the weather forecast is wrong even for the next day. How is it that weather prediction is so hard despite our long experience at weather forecasting? Well, clouds are computing themselves, and as part of weather they are quite complex, as complex as systems like the human brain, says PCE.

Picture of Gregory Chaitin taken by Hector Zenil outside of the IBM Research center at Yorktown, N.Y. where Watson was designed.
(Picture of Gregory Chaitin taken by Hector Zenil outside of the IBM’s
Thomas J. Watson Research Lab at Yorktown, N.Y. where Watson was designed)

After all these years, IBM hasn’t come up with a theoretical breakthrough to meet the challenge but rather with a supercomputer fast enough to beat the other participants. Watson uses fairly sophisticated algorithms, but these aren’t that much more sophisticated than those used by search engines, which proceed by matching pieces of text and retrieving other pieces of text statistically related to the original. The IBM team has come up with a super-computer to challenge the Jeopardy! participants and not with a new algorithm. The main goal of machine learning labs is usually to perform about 1% to 3% better than the best benchmark of the top lab in a particular area (e.g. word tagging, word disambiguation, information retrieval, etc.). If Watson has achieved anything like a breakthrough, it may be credited with having advanced the current state of AI research. It takes Google’s kind of technology a couple steps further by having drawn together several technologies on steroids. The point is that one may not need to come up with the smartest algorithm– because one may not be able to, simply because it doesn’t make sense to engineer a super complicated algorithm to reach intelligence, it being more appropriate to start from a simple but potentially powerful system, and then extract the best of it by running it on a super large corpus of data on a super fast computer. The Watson- in-Jeopardy! experience tells us that even clouds may look intelligent when running on a supercomputer.

Watson confirms what I’d suspected, that we’re not that special after all (in this sense). Watson meets the challenge of beating a human on its own turf and at what it does best basically through the use of brute force. Achieving artificial intelligence (AI) is not, as I suspected (among other thinkers), a matter of science breakthrough but rather a matter of scale and technological achievement. Over time we’ll have faster and faster computers, and that means computers with intelligence resembling ours (of course there are other subtle issues here, such as the fact that the system should be allowed to interact with the intelligent forms it is meant to interact with, otherwise its intelligence may prove alien to ours, and look as strange as that of clouds).

Jean (Michel Jarré with his electronic harp.<br />
Picture by Hector Zenil, Paris concert 2010)
(Jean Michel Jarré with his electronic harp.
Picture by Hector Zenil, Paris concert 2010)

Wired published (when they used to publish interesting articles more often) an interesting article back in 2009, which reached the conclusion that one could exchange data for models: The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.

Some interesting inferences can be drawn from this milestone where IBM supercomputers have beaten humans at human tasks (remember this is the second time; at least with such a publicity, the first saw IBM’s Deep Blue beat Garry Kasparov, the reigning World Chess Champion at the time, in a series of chess matches). Among these we may single out the fact that we humans seem ready to call machines smart and intelligent even though they may not necessarily think like us (humans can only see ‘ahead’ a handful of selected chess movements while chess programs perform an exhaustive search only bounded by time), despite seeming as clever as we are. This is already a vindication of Turing’s proposal for a test of intelligence.

Yet Chomsky’s opinion seems to point in the opposite direction, that we may still think we are much more special than Watson, and he might be in some sense right. We definitely play very differently than machines, but is chess that different from natural language? It may be, but this time the question may be whether what Watson does at playing is that different from what we do.

At the end of the Deep Blue vs. Garry Kasparov, Kasparov pointed out that he felt that the machine actually had a strategy and something that made him think that the machine was actually playing like a human, somehow perhaps even teasing him. Ken Jennings (one of the two human participants against Watson) wrote a day after the match: “This is all an instant, intuitive process for a human Jeopardy! player, but I felt convinced that under the hood my brain was doing more or less the same thing.”

Some people think that Watson has absolutely no ability to understand what it did, or any awareness about it, and that still makes the difference. This might be only partially true. Watson may not understand or be yet aware of its achievement, but I wonder if the same processes that made Watson to win this match are not the same that may be in action for even more sophisticated human thinking, such as self-awareness. But the question can also be reversed and one may also ask whether we are really aware of what happened, and how much of our feeling of being aware is the result of the type of thinking that Watson just exhibited.

As many of my readers certainly know, Alan Turing came up with a basic test suggesting that if something looked intelligent then it was intelligent. So we have reached the point at which not only has a machine passed the Turing test but also humanity may be ready to accept the idea behind the test, that is that it doesn’t really matter how something thinks if it looks clever enough to fool us (or even beat us at something) then it is intelligent. For the machine, the milestone is located at a point in time that reflects the current state of technology, which basically amounts to the miniaturization and mastery of computing devices, as well as the management of very large quantities of data, not forgetting the current state of fields involved (basically machine learning and natural language processing or NLP). But it is by no means an isolated achievement. I see IBM as the standard bearer for a combination of several current technologies run on the best hardware available today, a pioneer in this sense.

Wolfram|Alpha computational engine control room. Lunch day picture by Hector Zenil.
(Wolfram|Alpha computational engine control room.
Launch day picture by Hector Zenil)

It seems we are now able to gauge the size and power of the human brain as against something that looks as sophisticated as we are, at least at playing a sophisticated game. Watson and humans may reach the same conclusions, whether or not they do so in different ways, but the fact that Watson requires a computer the size of 10 full-size fridges, 15-terabyte of memory (likely full of data and programs) and 2,880 microprocessors working in parallel tells us more about us than about Watson itself. We knew we carried a supercomputer in each of our heads but we didn’t know what its specs may be. We also thought that the were specially gifted with unprecedented intelligence but now a machine that is certainly not aware of it and hasn’t taken the same path is also able to exhibits key features of intelligence.

Jen Kennings adds after the last match: “…But unlike us, Watson cannot be intimidated. It never gets cocky or discouraged. It plays its game coldly, implacably, always offering a perfectly timed buzz when it’s confident about an answer.” “…I was already thinking about a possible consolation prize: a second-place finish ahead of the show’s other human contestant and my quiz-show archrival, undefeated Jeopardy! phenom Brad Rutter.” Read more here and here.

Watson specs will fit in a small box in the future–given the trend of the past several decades following Moore’s law– and in the future as it is the case today, faster will be smarter.

Aired on PBS, NOVA called their documentary The Smartest Machine On Earth:

Watch the full episode. See more NOVA.

Alan M. Turing documentary teaser

Posted in General on January 28th, 2011 by Hector Zenil – Be the first to comment