Evaluating the complexity of a living organism by its algorithmic complexity

One of the greatest scientific achievements of the last century was the understanding of life in terms of information. We know today that the information for synthesizing the molecules that allow organisms to survive and replicate is encoded in the DNA. In the cell, DNA is copied to messenger RNA, and triplet codons in the messenger RNA are decoded in the process of translation to synthesize polymers of the natural 20 amino acids.

Humans have long been intrigued by the origin of, and the mechanisms underlying, the complexity in nature that arises from information contained in repositories such as DNA. Darwin’s theory of evolution suggests that this complexity could evolve by natural selection acting successively on numerous small, heritable modifications.

Darwin’s theory represents a great leap forward in our understanding of the fundamental processes behind life. However, there is a tendency to assume that evolution is the sole factor shaping nature, when it may not actually be the main driving force behind the complexity of living organisms. [If you wish to know more about the theory of evolution by means of natural selection, three respectable British institutions have set up special websites in celebration of Darwin’s 200th anniversary: the University of Cambridge (with the original scanned text and even an audio version in mp3 format), the Open University and the BBC.]

Nature seems to use a specific toolkit of body features rather than totally random shapes. Like units of Lego, Nature assembles its forms from a limited set of elements. For example, despite the variety of living forms on Earth, they all seem to have a front-to-back line down the center of the body, and extremities (if any) on the sides, from flies, which have a head at one end and a tail at the other, to worms, snakes and humans. Despite the randomness that, in combinatoric terms, might be expected to undermine any shared regularity among animals, on a certain level and from a certain perspective we are all similar in shape and features. Why didn’t evolution attempt other, completely different forms? And if it did, why were so few of them successful? Given the improbability of several other shapes having been put into circulation without any of them winning out save the ones we all know, we could conclude that evolution never did attempt such a path, instead keeping to a small pool of tried and tested basic units whose survival has never been in jeopardy. There are some symmetries and general features that many animals share (more than can be explained by inheritance) that are not so easily explained in purely evolutionist terms. A remarkable example is the resemblance of all animals in their embryonic phase.

Two teams of biologists (Walter Jakob Gehring and colleagues at the University of Basel, Switzerland, and Matthew Scott and Amy Weiner working with Thomas Kaufman at Indiana University, Bloomington) appear to have independently discovered one such toolkit that Nature uses: what they have called homeobox-containing genes.

This discovery indicates that organisms use a set of very simple rules passed along to them (thus reducing the amount of randomness involved) to build a wide variety of forms from just a few basic possible body parts. To oversimplify somewhat, one can for instance imagine being able to copy/paste a code segment (the homeobox) and cause a leg to grow in the place where an antenna would normally be in a fruit fly.
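
To make the copy/paste analogy concrete, here is a toy sketch in Python (purely illustrative, not a biological model; every name in it is invented) of how reusing one and the same building ‘subroutine’ under two different regulatory mappings yields two different body plans from the same small set of parts:

    # A toy illustration only (not a biological model): the same "building
    # subroutine" is reused under two different regulatory mappings, giving two
    # different body plans from the same small set of parts. All names are invented.

    def build_part(part):
        """Pretend developmental subroutine that 'grows' a named body part."""
        return "<" + part + ">"

    def develop(segments, homeobox_map):
        """Assemble an organism by applying the regulatory map segment by segment."""
        return "".join(build_part(homeobox_map[s]) for s in segments)

    segments = ["head", "thorax", "thorax", "tail"]
    normal = {"head": "antenna", "thorax": "leg", "tail": "cercus"}
    mutant = dict(normal, head="leg")  # 'copy/paste' the leg program onto the head segment

    print(develop(segments, normal))   # <antenna><leg><leg><cercus>
    print(develop(segments, mutant))   # <leg><leg><leg><cercus>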

This begins to sound much more like the footprint of computation than like a special feature characterizing life, since it turns out that a few simple rules are responsible for the assembly of complex parts. Moreover, it is consonant with what in Wolfram’s scheme of things is said to be life’s guiding force, viz. computation; with what Chaitin has proposed as an algorithmic approach to life and evolution; and with my own research, which is an attempt to uncover Nature’s basic hidden algorithmic nature. All the operations involved in the replication process of organisms (replacing, copying, appending, joining, splitting) would seem to suggest the algorithmic nature of the process itself. A computational process.

Based on my own research, it is my strong belief that, though by no means wrong, Darwin’s theory of evolution belongs within a larger theory of information and computation, according to which life has managed to speed up its rate of change by channeling information efficiently between generations, together with a rich exchange of information with the outside through a process that, while seemingly random, is in fact the consequence of interaction with other algorithmic processes.

Think about it a bit further. Evolution seems deeply connected to biology on Earth, but as part of a larger computational theory it might apply anywhere in the universe, just as the laws of physics do. Evolution may be formulated and explained as a problem of information transmission and channeling, pure communication between two points in time. If the goal is to gather and transmit information efficiently, it may turn out that biological evolution is not the cause but the consequence.

The theory of algorithmic information (or simply AIT), on the other hand, does not require a random initial configuration (nor, unfortunately perhaps, any divine intervention) for a program, when run, to produce complicated output. This is in keeping with Wolfram’s finding that all over the computational universe there are simple programs with simple inputs generating complex output, what in NKS terms is called ‘intrinsic randomness’, yet is purely deterministic. Nor does AIT require the introduction of randomness during the computation itself. In other words, it seems that randomness plays no necessary role in producing complex organisms. Evolution seems to underlie change, its pace and direction, but it does not seem to constitute the driving force behind life.
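
As a concrete illustration of deterministic ‘intrinsic randomness’, the short Python sketch below runs Wolfram’s Rule 30, an elementary cellular automaton whose rule and input are trivially simple yet whose output looks random (the width, number of steps and periodic boundary are arbitrary choices made here for brevity):

    # A minimal sketch of Wolfram's Rule 30: a fully deterministic elementary
    # cellular automaton whose trivially simple rule and single-cell input
    # nevertheless produce a pattern that looks random.

    WIDTH, STEPS = 63, 30
    row = [0] * WIDTH
    row[WIDTH // 2] = 1          # simple input: a single black cell

    def rule30(left, center, right):
        # Rule 30: new cell = left XOR (center OR right)
        return left ^ (center | right)

    for _ in range(STEPS):
        print("".join("#" if cell else " " for cell in row))
        # periodic boundary conditions, chosen here for simplicity
        row = [rule30(row[i - 1], row[i], row[(i + 1) % WIDTH]) for i in range(WIDTH)]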

Evolution seems to be taking advantage of the algorithmic properties of living systems to fabricate new forms of life. To facilitate understanding of these body patterns the University of Utah has set up an illustrative website. Incidentally, this genetic toolkit based on the homeobox concept is surprisingly well captured in the Spore video game.

In a recent article (‘Speculations on biology, information and complexity’), Greg Chaitin has proposed that some of the properties of DNA and the accumulation of information in DNA may be better explained from a software perspective, treating DNA as a computer program in constant development. When writing software, subroutines are used here and there all the time, and one usually creates an extra module or patch rather than rewriting a subroutine from scratch. This may correspond to what we see in DNA as redundant sections and ‘unused’ sections.

In Chaitin’s opinion, DNA is essentially a programming language for building an organism and then running that organism. One may therefore be able to characterize the complexity of an organism by measuring the program-size complexity of its DNA. This seems to work well for the length of DNA, since the longest known sequence of DNA belongs to what is certainly the most sophisticated organism on this planet, i.e. Homo sapiens.
Chaitin proposes the following analogy:

program -> COMPUTER -> output
DNA -> DEVELOPMENT/PREGNANCY -> organism
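
A computable way to approximate the program-size complexity invoked here is lossless compression: the compressed size of a sequence is an upper bound on its Kolmogorov complexity. The following sketch uses Python’s standard zlib module and made-up toy sequences rather than real genomes:

    # A minimal sketch, using only the standard library: the zlib-compressed size
    # of a sequence is a computable upper bound on its program-size (Kolmogorov)
    # complexity. The sequences here are toy stand-ins, not real genomes.

    import random
    import zlib

    def compressed_size(sequence, level=9):
        """Approximate program-size complexity by the compressed length in bytes."""
        return len(zlib.compress(sequence.encode("ascii"), level))

    random.seed(0)
    repetitive = "ACGT" * 1000                                       # highly regular
    scrambled = "".join(random.choice("ACGT") for _ in range(4000))  # no obvious structure

    print(compressed_size(repetitive))   # small: a short description suffices
    print(compressed_size(scrambled))    # much larger: close to the raw information content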

However, we encounter problems when attempting to view the process of animal replication in the same algorithmic terms. If, as the sophistication of Homo sapiens would suggest, human DNA is the most complex repository of information, and if DNA represents the shortest encoding capable of reproducing the organism itself, we would expect the replication runtime of human DNA to rank correspondingly high relative to other animals’ replication times. But this is not the case. A gestation period table is available here. So what are we to make of the fact that what looks like the right complexity measure for living beings (the logical depth of an object, as a measure of the organizational complexity of a living organism) does not produce the expected gestation times? One would expect the human gestation period to be the longest, but it is not.

Charles Bennett defined the logical depth of an object as the time required by a universal computer to produce the object from its shortest description: here, the decompression time taken by the DNA in the fertilized egg of an animal (seen as a universal computer) to produce another organism of the same type. There seems to be more at stake, however, when trying to apply the concept to Chaitin’s replication analogy: issues ranging from how to determine the end of the replication (the gestation period?), to optimal times to give birth, to gestation times inherited from ancestral species, to the average size of organisms (elephants and giraffes seem to have the longest periods). Some hypotheses about these differences in gestation period can be found here, for example.
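
As a very rough experiment in this direction, one can compress an object and then time how long the decompressor takes to regenerate it. The sketch below uses zlib as a stand-in for the shortest description and its decompressor as the ‘developing’ machine; this is only a crude proxy for Bennett’s logical depth, which is defined over near-shortest programs on a universal computer:

    # A crude sketch only: Bennett's logical depth concerns the running time of
    # near-shortest programs on a universal machine; here zlib compression stands
    # in for the shortest description and its decompressor for the machine that
    # 'develops' the object from it.

    import time
    import zlib

    def depth_proxy(data, level=9):
        """Time (in seconds) taken to regenerate `data` from its compressed description."""
        description = zlib.compress(data, level)   # stand-in for the shortest description
        start = time.perf_counter()
        zlib.decompress(description)               # stand-in for running that description
        return time.perf_counter() - start

    organism = b"ACGT" * 250_000                   # toy 'genome': one megabyte of regular data
    print(f"compressed to {len(zlib.compress(organism, 9))} bytes, "
          f"regenerated in {depth_proxy(organism):.6f} s")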

If living organisms can be characterized in algorithmic terms, as we think they can, we should be able to introduce all these variables and still get the expected values for the complexity measurement of an organism (seen as a computer program) reproducing another organism from its shortest encoding (the DNA being an approximation of it). A complete model encompassing the theory of evolution has yet to emerge. It seems to be on the horizon of AIT, as another application to biology, one that provides a mathematical explanation of life.


In summary:

  • So far, what we know is that DNA is the place where the information for replicating an animal is to be found. What’s being proposed above is that the information content in the DNA can actually be effectively approximated by means of its program-size complexity and logical depth, so as to define a measure of the complexity of an organism. If one can quantify these values one could, for example, actually quantify an evolutionary step in mathematical terms (a sketch of one possible quantification follows this list). This would represent a first step toward encompassing Darwin’s theory of evolution within an algorithmic mathematical theory of life. Evolution is not usually seen as part of a computational theory, but as a special feature of life. The above suggests otherwise.
  • Randomness has hitherto been thought to play a major role in the evolution of species, as it is mutation that drives the evolutionary process. But I suggest that this is not the case. Rather I suggest that what appears to be random is actually part of a deterministic computation, which means that randomness plays no significant part in the process, while computation does.
  • Finally, evolution has hitherto been thought of as a process that advances by very small steps, rather than one that is capable of quickly building over blocks of code, as other scientists have found may actually be the case (which would explain why the theory of evolution has been bedeviled by questions that have not thus far been satisfactorily answered). This new understanding favors the computational view I am putting forward here as playing a main role in the process of life, because it is based on what in software technology is the practice of a subroutine-oriented programming paradigm: code reuse.
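
One concrete way (not necessarily the one intended above) to put a number on an evolutionary step is the normalized compression distance of Cilibrasi and Vitányi, which approximates the information distance between two sequences using a real compressor. The sketch below uses randomly generated toy sequences in place of real genomes:

    # One possible quantification: Cilibrasi and Vitanyi's normalized compression
    # distance (NCD), computed with zlib as the compressor. The 'genomes' below
    # are randomly generated toy sequences, not real data.

    import random
    import zlib

    def C(x):
        """Compressed size in bytes: a computable stand-in for Kolmogorov complexity."""
        return len(zlib.compress(x, 9))

    def ncd(x, y):
        """Normalized compression distance between two byte strings."""
        cx, cy, cxy = C(x), C(y), C(x + y)
        return (cxy - min(cx, cy)) / max(cx, cy)

    random.seed(0)
    bases = "ACGT"
    ancestor = "".join(random.choice(bases) for _ in range(5000))

    # descendant: a copy of the ancestor with a handful of point mutations
    descendant = list(ancestor)
    for i in random.sample(range(len(descendant)), 50):
        descendant[i] = random.choice(bases)
    descendant = "".join(descendant)

    unrelated = "".join(random.choice(bases) for _ in range(5000))

    print(ncd(ancestor.encode(), descendant.encode()))  # small: a short evolutionary step
    print(ncd(ancestor.encode(), unrelated.encode()))   # close to 1: unrelated sequences
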
  1. Hi Héctor,

    Note: the complexity of a newborn human seems lower than that of a newborn horse or whale… Human complexity is acquired not only through DNA, but also through culture, i.e. over some 20-30-40 years… the difference is that our brains can keep on “computing”, accumulating new information, whereas other animals, after e.g. 10-15 years (elephants), no longer learn much… (much as our own ancestors a few millennia ago…)

    Regards,
    Carlos

    http://turing.iimas.unam.mx/~cgg/

  2. Hector Zenil says:

    Carlos,

    The human capacity to learn more would have to be reflected somehow in the encoding of the organism; otherwise one could not explain how DNA, which varies by no more than a tiny fraction with respect to other animals, generates such different results (elephant vs. human vs. chimpanzee). It seems to me that substantial differences should be found both in algorithmic complexity and in logical depth, even more pronounced than the apparently minimal difference between genomes. It is true that an elephant may perhaps reach its maximum of learning after 10 years, and I wonder whether a 10-year-old child who grew up under the same conditions as the elephant would appear equally or more intelligent. I suppose it is the latter, but I am not sure; one would have to look at those adults who are found in the forest having been abandoned as children.

    The question could perhaps be reformulated as follows: suppose there are two universal Turing machines; the only way for both to compute at the same speed and reach the same maximum in the same time is if they have exactly the same resources from the start. In other words, it seems to me that the human has access to greater resources from the outset (basically brain size), which should be quantifiable when its complexity is evaluated with both machines under equal conditions and at any given time (exposed to the same external stimuli, or ‘culture’, to use the term you used). It seems to me that the answer, at least as regards the evaluation of logical depth, is related to the problem I described as knowing when the computation has halted; and, as I perhaps said, the replication time is not well defined and might have to take into account the time needed to learn from external stimuli, but I insist that this should not change the original value of the complexity of the organism insofar as it is capable of reaching that maximum of learning.

    – Héctor

  3. The thing is that complexity depends on the context in which you measure it, e.g. rats have a better navigation system than we do, and there are animals that have better memory of certain kinds because their survival depends on it… metabolic complexity would be another thing one could measure.

    I think that in the DNA you would not find much about the actual capabilities of humans, perhaps only about their potential; but just as you cannot deduce the final state of certain CAs before running them, I don’t think you can predict the complexity of human behavior from the complexity of human DNA. Why? Because that complexity comes from the interaction with a complex environment, i.e. new information is generated epigenetically…

    – Carlos

  4. Hector Zenil says:

    I think the difference between our points of view can be traced back to the difference between what is usually identified as Complex Systems vs. Computational Complexity (both algorithmic complexity and computation time). The former tries to account for the complexity of a system through its interaction with its environment, and hence as context-dependent. The system is so open that it is hardly quantifiable.

    The less quantifiable the better for the field of Complex Systems, since it concentrates on the outcome of the system itself, which is where terms such as emergence, self-organization, swarm intelligence, etc. come from.

    Computational Complexity is, from my point of view, a more traditional discipline in the sense that it tries to quantify the complexity of a system, without this implying that the system is assumed to be predictable or computable; in fact, the results on hard irreducibility and non-computability come from the area I am calling Computational Complexity.

    From my point of view (that of Computational Complexity), the problem would be that the shortest description of an organism such as the human might (as is surely the case) not be the DNA, and gestation might not be the best possible decompressor. That is also why the evaluation of logical depth is important, because it takes into account the time taken by the developmental replication from its description and not only the size of the DNA (which by itself already agrees with what one would expect from its algorithmic complexity, human DNA being the longest known so far). To resolve these details one can introduce a probabilistic function that favors short programs but not necessarily the shortest one (such as the DNA), because there could be a non-linear trade-off between program length and execution time; in other words, the shortest program could turn out to be the most inefficient, while the second shortest could turn out to be more efficient (in terms of both algorithmic complexity and computation time). The relation between size and time, i.e. the relation between the two ‘hard’ measures of complexity, is also of interest to me, but if I understand your position correctly, not only the DNA but nothing at all in a human embryo could indicate its complexity.

    I think my example of the two Turing machines is still relevant here, since nothing prevents us from quantifying the complexity of both (in terms of their shortest descriptions and replication times); however, at no point do I assume that the halting problem is solvable, or even that the quantification is in fact computable, only that it is effectively approximable.

    It is somewhat ironic that two areas of complexity that are very frequently conflated in fact have such different paradigms. The field of Complex Systems, as we will surely agree, is based on the non-reductionism of a process, that is, the dictum that certain processes (most of them?) cannot be reduced to the sum of their parts; while the field of Computational Complexity tries to quantify the complexity of an object by reducing it to its shortest description and to the time it takes a system to regenerate itself from that description.

    – Hector

    • Hector Zenil says:

      Blog post exchange from Facebook:

      Russell Foltz-Smith: Gestation is not inclusive of the full unraveling of genetics… creation of an organism. One could probably map out a table from fertilization to full adulthood….
      It is also debatable that humans are the most complex organism. How does one make such a determination?

      Hector Zenil: Agreed on the first comment—that’s the reason I put ‘(the gestation period?)’ in parentheses. I acknowledge that it’s quite difficult to determine the halting state of the replication process of an organism. One approach may involve setting a time for all and seeing how far each goes, though there may be other issues to take into consideration.

      On the other hand, ‘complexity’ as a term has been rather carelessly used in the field of Complex Systems, IMHO. Some comments recently made on the blog post by Carlos Gershenson suggest that there are many types of complexity, and that one has to make a point of calculating complexity at different levels, e.g. ‘metabolic complexity,’ etc. However, as regards algorithmic complexity, there is only a single complexity, and that is the organizational complexity of a system. It is evaluated by taking the shortest description of an object capable of reproducing the object itself, and then looking at how long an optimal universal decompressor takes to reproduce it.

      I don’t think there is any reason not to regard the human being as the most complex organism on this planet. But the actual criterion for determining its complexity, as I am proposing in my blog post, is its algorithmic organizational complexity. Based on its compressed DNA length, algorithmic complexity already says it is the most complex; it remains to be seen whether measuring its logical depth says the same, though I have no doubt that this is just a matter of devising the right framework within which to do so.

      • Hector Zenil says:

        Russell Foltz-Smith: as a curious side topic… how do epigenetics fit into your model?

        “Among some possible criteria: it has no natural predator, it is capable of killing any other organism and even of destroying the planet and everything on it (yes, including cockroaches), and the human itself. No other animal has developed a written language or… has been able to produce technology and is able to spread its genome beyond its home planet (not even the cockroaches =))”

        These details in and of themselves do not suggest the highest complexity, or rather the necessity of complexity. Could we not devise a simpler organism that does those things… or do you think an organism would have to cross a certain complexity threshold to do that?

        Hector Zenil:

        On epigenetics, the proposal is to measure the complexity of a species. Only information transferable through DNA is taken into account. Changes or features of particular individuals are another matter, and it is clear that calculating the algorithmic complexity of the DNA won’t help to capture particular features, nor will it help to capture the diversity among individuals as social and cultural beings.

        I don’t think there is a threshold to cross to achieve human-like complexity. It is just that the little jump we’ve achieved in evolutionary terms has had a non-linear impact on the outcome. The fact that we are able to accumulate information has made us collectors of complexity. I do think, however, that for an organism to be able to do what I have described above, it cannot significantly shortcut the process, and any artifact that we may create which is capable of doing such things would inherit much of our human complexity for achieving such a task.

        I think it’s legitimate to ask whether a complexity measure is suitable for measuring the complexity of an organism, the human organism in particular (or any other object or phenomenon for that matter). And I think that Complex Systems provides interesting insights into and explanations of outcomes that would otherwise seem impossible to explain. But the key concept here when talking about algorithmic complexity is the concept of information content. Here the DNA is the repository where the information for human replication is stored. If you place a chimp in the same cultural context as a human you don’t get a human, you still get a chimp. And that is what algorithmic complexity measures. In these terms, I think the human is the organism that stores more information than any other organism, as the length of its DNA would suggest. The organizational complexity (logical depth) of an organism ensures that such information is actually useful information, and not junk that an organism may have accumulated. That is why it is also important to evaluate the decompression time of the replication process, and why I talk about it (among other things) in the blog post.

        I just realized that some posts ago I was calling women ‘optimal universal decompressors,’ apropos of the fact that they are capable of reading DNA and replicating a human being. Please don’t take offense. It was actually meant as a compliment.

