Archive for December, 2006

International Union of History and Philosophy of Science Conference on calculability and constructivity.

Sunday, December 3rd, 2006

A conference of the International Union of History and Philosophy of Science (a joint session of the Division of Logic, Methodology and Philosophy of Science and the Division of History of Science and Technology) was held in Paris at the École Normale Supérieure on November 17-18, 2006. It was organized by Jacques Dubucs, who was appointed president of the joint conference by the IUHPS; he is also the director of the IHPST and was my master's thesis adviser.

The conference began with an interesting talk by Yiannis Moschovakis, professor of mathematics at UCLA, about elementary algorithms, which according to him are usually defined in terms of mathematical models of computers with "unbounded memory". His talk was based on his article "What is an algorithm?" (a very fine piece, in my opinion) included in Mathematics Unlimited: 2001 and Beyond, edited by B. Engquist and W. Schmid and published by Springer in 2001 (also available online). Such definitions, he argued, do not coincide with our basic intuitions about algorithms. He gave examples of definitions that eschew the use of abstract machines, for instance McCarthy's recursive programs, or elementary (first-order) algorithms. He took the common example of the GCD and asked whether the Euclidean algorithm is optimal among all algorithms over some set of primitives (there have been results on upper bounds of the form Ce(x,y)=2log, and definitions based on partial and pointed algebras). He then asked what an isomorphism between algorithms would be and defined what he called a "recursor": a tuple of monotone functions on complete posets which determine a system, and which models an algorithm. A natural notion of recursor isomorphism models algorithm identity (which he seems to believe is the finest notion of an algorithm). He went through concepts like "program congruence" and a related theorem (easily demonstrable for McCarthy programs): every program E is reducible to a unique (up to congruence) irreducible program cf(E) [where cf(E) means canonical form], which is the optimal algorithm. He then continued through more abstract terms such as "referential intension" and "partial algebras", which are covered in his paper. Finally, he defined a complexity measure he called "complexity functions for programs", where the basic idea consists in counting the number of calls to some primitives.
For example, one counts the number of calls to the primitives q1,q2,…,qk in the computation of an algorithm M(x,y1,…,yk) using E (a McCarthy program), where the input x can be a vector. He also talked about a depth complexity measure and stated a Church-Turing thesis for elementary algorithms, which would read as follows: every algorithm which computes a partial function f:A^n->A from q1,…,qk is faithfully represented by a recursive program on some logical extension B of a partial algebra A=(A,0,1,q1,…,qk). On this assumption, it would seem that the lower bound results obtained by the embedding method hold for all elementary algorithms.
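
This call-counting measure can be illustrated with a tiny sketch of my own (the instrumentation is illustrative, not Moschovakis's code): for the Euclidean algorithm, count the calls to the remainder primitive.

```python
def euclid_gcd(x, y):
    """Euclidean algorithm, instrumented to count calls to the
    remainder primitive, in the spirit of the call-counting
    complexity measure discussed above."""
    calls = 0
    while y != 0:
        calls += 1          # one call to the remainder primitive
        x, y = y, x % y
    return x, calls

g, c = euclid_gcd(231, 165)   # gcd is 33, reached with 3 remainder calls
```

Lower-bound questions of the kind Moschovakis raised then ask whether any elementary algorithm over the same primitives can get by with asymptotically fewer such calls.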

After Moschovakis, Prof. Serge Grigorieff of the Mathematics Department of the University of Paris VII gave a talk on the foundations of randomness in terms of recursivity, which was of particular interest to me given my current research on the topic. According to him, there is no formal notion of a random object in probability theory; random variables are entities having nothing to do with random objects. He talked about Berry's paradox, as instanced by the question "What is the smallest number that needs at least n words to specify it, where n is large?" or by the phrase "the first undefinable ordinal", and about a solution that replaces "specify" or "describe" with "compute". Then he went through the traditional definition by Kolmogorov: K_f(x)=min{|p|:f(p)=x}, where f can be interpreted as a computer or compiler, p as a program with no input, and x of course as the object whose complexity is to be measured. In terms of compressibility, if K(x)>=|x| then x is random (incompressible), and if K(x)>=|x|-c then x is c-incompressible. We know of course that K is in general non-computable.
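
Since K itself is non-computable, one can only approximate it from above. As a quick, hedged illustration (my own, not from the talk), an off-the-shelf compressor serves as a crude stand-in for the shortest program:

```python
import random
import zlib

def compressed_size(s: bytes) -> int:
    # The compressed length is only a crude, computable upper-bound proxy
    # for K(x); the true Kolmogorov complexity is non-computable.
    return len(zlib.compress(s, 9))

regular = b"01" * 500                 # highly regular: compresses well
random.seed(0)                        # pseudo-random bytes compress poorly
noisy = bytes(random.getrandbits(8) for _ in range(1000))

assert compressed_size(regular) < compressed_size(noisy)
```

The regular string admits a short description ("01 repeated 500 times"), while the pseudo-random one does not, which is the compressibility intuition behind Kolmogorov's definition.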
Following that, Prof. Grigorieff stated the invariance theorem, which basically says that changing f (the computer) changes the complexity only by a constant. The problem, according to Kolmogorov (1964) himself, lies with these constants, since only "reasonable" optimal functions lead to meaningful complexity estimates. In 1965 Martin-Löf gave another, equivalent definition: x is c-random if Delta(x)<=c, i.e. x passes statistical tests with significance level c. We know then, from Kolmogorov's and Martin-Löf's work, that incompressible = random. Thus if the size of x is equal to the size of p, the smallest program producing x, then x is random. Next he cited Chaitin's concept of a prefix-free domain, and finally pointed out the equivalence between all these notions. However, all these definitions are weak in some fashion. Take for instance the notion of invariance ("reasonable" variance if you prefer) or fragility: a string a0,a1,… may be random, but the same string with some regularity inserted, like a fixed number in particular positions (a0,0,a1,0,…), is no longer random, even though it contains the original random string. Kolmogorov extended his own idea to infinite objects, but it did not work; Martin-Löf's definition of random sequences, however, does. I found some ideas related to Schnorr, martingales and Solovay relevant to the concept of irreducibility.

Michael Detlefsen, Professor of Philosophy at the University of Notre Dame, USA, gave another interesting lecture, more philosophically oriented, in which he merged discussion of construction, computation, exhibition and proofs. He made some remarks on the concept of proof: a proof is a sequence of judgments, and a proof needs to use reference in order to judge. We then moved on to something that caught my attention, since it is a topic to which I have given considerable thought: the role of mental or graphic representations of proofs, and proofs that need more powerful tools and a higher language to prove a statement posed in a lower and less powerful language.
For example, according to Detlefsen, Frege (on the existential requirement of proof) pointed out that the use of objects like the square root of -1 in proofs in real analysis would be immediately seen as distractors; some proofs use sqrt(-1) even when that magnitude does not obviously occur in the real-analysis theorem to be proven. Such proofs would collapse if sqrt(-1) were taken out. From my point of view, the use of sqrt(-1) and equivalent cases can have even more implications: they could mean that sometimes proofs use a stronger axiomatization in order to force a proof of a statement posed in a less powerful axiomatization (I am thinking, for example, of the case of Fermat's last theorem, but that is a subject for a separate post). According to Frege (who was not talking about Fermat's last theorem, of course), we import something foreign into arithmetic. Gauss himself asked the same question about the significance of this foreignness of symbols and even more powerful tools. A concern with "purity" played a large part in Frege's logicism. These questions seem to have engaged everyone from Proclus to Leibniz to Frege, and they can be found in papers about reference and rigor. Following this line of thought: in math, figures do more than simply "refer" to the objects they represent. In geometry, objects are represented by entities of the same kind (lines by lines, circles by circles), but in algebra the use of signs to represent objects avoids explicit reference. According to Hobbes, the prover maintains contact with the reality to be proved by exhibiting it, that is, by manifestly generating it through a process of efficient causation. According to Francis Maseres, visually exhibiting the objects of geometrical reasoning increases rational confidence in its premises.

Another interesting lecture was given by Maurice Margenstern, Computer Science professor at the University Paul Verlaine, Metz, France, on the "Computer Science Revolution". The title did not really reflect the rich content of the talk, which brought home to me the importance of two concepts almost completely ignored in computer science that seem to be of fundamental and foundational value. We have heard a lot about the equivalences between proof, algorithm and program (Curry-Howard, for example, and the concept of currying). However, an algorithm is a project of execution, while a program is the execution itself. According to Margenstern, time is the key concept in both an algorithm and a program (and of course the arrow of time, my own contribution). The "equals" in a proof or in a program often means "compute", and what is merely a description of something to do becomes an actual computation. For example, replacing "=" with ":=", a non-symmetrical relation, introduces the role of time into computation. I have much more to say in this regard, but I'll do so in a separate post, since it is part of my personal view and pertains to the basic requirements of a computation, which include the notion of time of course, but also the notions of the carrier and the medium, all of which are matters requiring in-depth analysis. Computation is often described in terms of an input, an output, and an intermediate process, but we should analyze in detail what is involved inside an actual computation, which from my point of view is inseparable from certain physical constraints.
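
A minimal illustration of this asymmetry (my example, not Margenstern's):

```python
# As an equation, x = x + 1 has no solution; as an assignment it is a
# step in time: the right-hand side reads the old state, the left-hand
# side writes the new one.
x = 0
for _ in range(3):
    x = x + 1   # "compute, then update": not a symmetric relation
assert x == 3   # three time steps later
```

Languages like Pascal make the distinction visible by writing the assignment as `x := x + 1`, reserving `=` for the symmetric, timeless relation.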

Before the last lecturer I took some random notes: computability is an epistemological notion; constructivism was refined by Martin-Löf; four features of finitism; a domain D is admissible if it is r.e.

The last lecturer was Wilfried Sieg, Professor in the Philosophy Department at Carnegie Mellon University. His lecture was based on his most recent paper "Church without Dogma: Axioms for Computability". Prof. Sieg's talk drew attention to some comments made by Gödel about Turing (in a letter to Martin Davis). According to Gödel, Turing's work includes an analysis of the concept of "mechanical procedure" (I think it worth drawing attention here to Gödel's so-called dichotomy, which casts doubt on the validity of Turing's approach to the human brain). An additional comment from me: in the history of calculability, several terms have served as equivalents for the vague concept in the first part of Church's thesis: "algorithm", "effective procedure", "verifiable by hand", "computable function", "effective function", "feasible computor", "mechanical procedure", "finite combinatorial procedure", "finite machine", and almost any permutation of them. According to Sieg there are two basic constraints on a mechanical procedure:

– Boundedness: there is a fixed bound on the number of configurations a computor can immediately recognize, and
– Locality: a computor can modify only immediately recognizable subconfigurations.

He talked about his own definition of a k-graph machine (equivalent to a string machine): F: D->D, an operation transforming states into states. He takes finiteness for granted, from which I deduce that he inclines to the view that Church's thesis has no content and is therefore neither a thesis nor a hypothesis. He presented a nice diagram in which he drew arrows from the concept of effective calculability to several computation systems such as lambda-definability, general recursive functions and Turing machines, all of which have been proven equivalent. He gave us his definition of a computor (a human computer or a mechanical device, for instance), inspired in some way by Gandy machines, as he himself put it, and he noted the two main constraints which, according to him, are basic to the definition of what is effectively computable: M=(S,T,G) is a T-computor on S when S is a structural class, T a finite set of patterns and G a structural operation on S. Then he presented a list of statements expressed in first-order logic, which can be found in his original paper.

Meaning is in the word net: cyclic self-referential definitions, dictionaries and found in translation

Saturday, December 2nd, 2006

In the “Period 1 Cycle English Dictionary” published by “No way, Inc.” (it’s been said to be the most accurate dictionary ever) one can read:

dog (Noun) : a dog is a dog.

The lazy creators of this dictionary appear to have forgotten what is broadly accepted by common sense. A definition would not be a definition if it were cyclical and self-referential, simply because one would, theoretically, fall into an infinite loop trying to define an unknown word that has as part of its definition the word itself.

Let’s leave aside the fictional Period 1 Cycle English Dictionary, which I just made up, and consider how things work in a regular dictionary. Assume you want to know the definition of a word. Looking it up, you’ll encounter an ordered set of words in the language in question (if the dictionary is not bilingual). Let’s prove that, in the end, any definition of a word w in a closed, finite dictionary is cyclic with some period p_w:

Proof sketch:

We may hypothesize that the dictionary D is closed under the operation “the definition of” (i.e. all words in the definition of a word have a definition in the dictionary). One can also safely assume that all words in D have a non-empty definition (given the definition of “dictionary” itself!) and that the dictionary is finite. Then if w is a word in D, consider the orbit of successive definitions d_1, d_2, …, d_n, … for w (each listed as a set of words). This sequence cannot go on forever without repeating elements of D: since D is finite, the orbit must eventually revisit a word already encountered, and the chain of definitions starting from w is therefore ultimately periodic, with some period p_w depending on the word. Hence every word w in the dictionary is cyclic with period p_w.
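
The argument can be checked mechanically on a toy closed dictionary (the entries below are invented for illustration):

```python
# A toy closed dictionary: every word appearing in a definition is
# itself defined, all definitions are non-empty, and the set is finite.
toy = {
    "dog":      ["animal", "loyal"],
    "animal":   ["living", "thing"],
    "loyal":    ["faithful"],
    "faithful": ["loyal"],        # a period-2 cycle
    "living":   ["thing"],
    "thing":    ["entity"],
    "entity":   ["thing"],        # another period-2 cycle
}

def reachable_cycle(word, d):
    """Follow the 'definition of' orbit breadth-first; since the
    dictionary is finite, the orbit must eventually revisit a word."""
    seen, frontier = set(), {word}
    while frontier:
        nxt = set()
        for w in frontier:
            for v in d[w]:
                if v in seen or v == word:
                    return True   # the orbit has come back around
                nxt.add(v)
        seen |= frontier
        frontier = nxt - seen
    return False

assert all(reachable_cycle(w, toy) for w in toy)
```

Finiteness plus non-empty definitions is exactly what forces every orbit into a cycle; remove either assumption and the proof sketch no longer goes through.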

If all the definitions in a dictionary are then cyclic and enclosed in the dictionary itself, where does the analytic knowledge that a dictionary seems to provide come from? When one consults a dictionary in a given language, one already knows some words in that language, but assuming one doesn’t, would a dictionary be completely useless, as the issue of ultimately cyclic self-referential definitions seems to suggest?

One could conceive of a dictionary as a whole as a syntactical knowledge container with no actual knowledge in it, at least for someone who does not bring to it any knowledge of the relevant language. One may wonder how something so perfectly self-referential could actually be of use, as dictionaries seem to be. Is it because you always already know some, or indeed many, words of a language when you consult a dictionary? Since every word is defined by other words in the same language, looking up the meaning of a given word leads you to other words, and yet others, and eventually back around toward the word whose meaning you set out to learn. This would be the case even if the dictionary were bilingual and the meaning of the word you wished to check were given in a second language. Thus all dictionaries are perfectly circular, closed, self-referential sources.

However, the analytical knowledge in a dictionary does not come from the definitions as words, but from the word net underneath, where the words are connected to each other in some, perhaps unique, fashion (modulo exact synonyms). That is what quite successful projects such as WordNet, and functions like WordData in Mathematica, are about. The power of being able to analyze a language as a net in a computable way is priceless for the progress of computational linguistics and of linguistics in general.


[Figure: 2-level depth word net for the word "chair"]
For example, if “chair” is connected to “table”, “office”, “dining room”, etc., it should be easy to map it to its equivalent in any other language, unless the target language lacks the concept of a “chair” as a seat placed at a table in a dining room (which was perhaps the case in some Asian countries before colonialism), together with an explicit cognitive representation of it.
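
The 2-level neighborhood of “chair” pictured above can be computed by breadth-first expansion over a toy net (the connections below are illustrative guesses, not WordNet data):

```python
# A toy, hand-made word net; edges are undirected associations.
net = {
    "chair":       {"table", "office", "seat"},
    "table":       {"chair", "dining room"},
    "office":      {"chair", "desk"},
    "seat":        {"chair"},
    "dining room": {"table"},
    "desk":        {"office"},
}

def neighborhood(word, net, depth):
    """Breadth-first expansion up to `depth` links away from `word`."""
    reached = {word}
    frontier = {word}
    for _ in range(depth):
        frontier = {v for w in frontier for v in net[w]} - reached
        reached |= frontier
    return reached - {word}

two_level = neighborhood("chair", net, 2)
# depth 1: table, office, seat; depth 2 adds: dining room, desk
```

Comparing such neighborhoods across languages, rather than isolated words, is the mapping idea discussed in this post.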

Of course, problems arise when considering mappings between words having the same written form but different senses. The word “chair”, for example, is a seat, but may also mean the officer who presides at a meeting. Also posing a problem are words mapped to or from words belonging to different parts of speech, such as “personne” in French, which maps onto two completely different words in Spanish: “persona” and “nadie”, a noun and a pronoun respectively, with completely different connections and different supporting nets. So even when most relations seem surjective, the general case is certainly not bijective, and that applies to homonyms too, which often create ambiguities in translation. However, the supporting network would be able to uncover this fact and resolve a possible ambiguity based on context by extending the word network to encompass the ambiguity: if a subnet cannot be uniquely mapped, extending it should eventually resolve the ambiguity. What one would need is a corpus big enough to build such a network once and for all, and then simply make comparisons at the network level. This could work even for completely new or unknown languages, whether dead, living or artificial, assuming they share a part of our actual reality and hence some part of our mental representations. (In a sense this is what Champollion did when he deciphered the Rosetta stone: he discovered a partial mapping of a subnetwork of words from an unknown language, Egyptian, to a subnetwork of a known one, Greek.) In the final analysis, each language has a single unique network, changing slightly through time but remaining well connected and strong enough to make it unique and recognizable while being isomorphic with that of any other language. All languages could be identified by their fingerprint: their word net.
This kind of analysis would also identify the lack of a word-net structure in hoax languages, such as, perhaps, that of the Voynich manuscript.
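
As a toy sketch of disambiguation by subnets (all words and links below are invented for this illustration): the two senses of “chair” have different neighbors, so translating the neighbors word-by-word and comparing neighborhood overlaps picks the right target word.

```python
# Two senses of "chair", each with its own (invented) neighborhood.
english = {
    "chair/seat":    {"table", "sit"},
    "chair/officer": {"meeting", "preside"},
}
spanish = {
    "silla":      {"mesa", "sentarse"},
    "presidente": {"reunion", "presidir"},
}
# A hypothetical word-level translation table for the neighbors only.
translation = {"table": "mesa", "sit": "sentarse",
               "meeting": "reunion", "preside": "presidir"}

def best_match(sense_neighbors, target_net):
    """Score each target word by neighborhood overlap after translating
    the English neighbors word-by-word; pick the best-supported one."""
    translated = {translation[w] for w in sense_neighbors}
    return max(target_net, key=lambda t: len(translated & target_net[t]))

assert best_match(english["chair/seat"], spanish) == "silla"
assert best_match(english["chair/officer"], spanish) == "presidente"
```

Extending the subnet when overlaps tie is precisely the “extend the network until the ambiguity resolves” strategy described above.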

Having established that, what about mining the world of all possible meanings, all possible translations, and all possible ideas? We wouldn’t have the problem of distinguishing between a coherent idea and an incoherent one, since the network itself would provide some minimal coherence. Thus the net-into-net approach would give us a way of translating from word to word, from phrase to phrase, and from idea to idea as faithfully as possible in most cases, since in the end all of us as human beings share a single reality, though we perhaps approach it from different points of view.

Again, the analytical knowledge in a dictionary comes from the net connecting the words, so even someone who does not know English at all would, I would say, be able, albeit with considerable difficulty, to learn English just by deducing the net connecting objects, in other words, by mapping his own mental representations of objects onto words in the English dictionary. In the process he could encounter some ambiguities, but the further he goes, the more of these he would be able to resolve. On the other hand, speakers of those languages in which “chair” does not exist, both in the language itself and as a real object in the culture, would be able to deduce what a chair is by tracking its relations with objects they know, for which they do have mental representations and the phonemes to externalize them. So the problem of translation, which began with the mapping of word onto word and then phrase onto phrase with statistical tools, becomes, with this approach, a matter of mapping net to net. Indeed this seems to be the approach adopted by Meaningful Machines.

These ideas could be carried to the limit by taking the sum total of human languages and enquiring into the mapping between such a network and our cognitive representations. Such a move would provide grounds for rebutting the Chinese room argument, since in the end it does not matter that the person inside the room has no knowledge at all of the language; he would be able to map what he is mechanically translating onto his own mental representations, generating what, according to the argument, could not be generated: understanding. For Searle’s idea was, as I recall, to build a case against A.I. by arguing for the meaninglessness of the Turing test and of true A.I. in general.

One may actually use a dictionary without knowing a single word in it! That is because there is a mapping between the word net in the dictionary and one’s own language, or better, a mapping (not necessarily injective or surjective) between the word net of the dictionary and one’s personal cognitive word net.

Oversimplifying, translation might be reduced to the search for a homomorphism between algebraic structures, each structure G={W,d} being a language dictionary, with W the set of words in that language and d the closed operation “definition of”. One can then see each definition as an oversimplified directed, ordered graph g={w,s}, with w the set of vertex words involved in the definition and s the ordered set of edges (ordered since definitions may not be commutative), for an ideally formal-enough language dictionary.

The bad practice of cyclic definitions in a dictionary is better described as the practice of period-1 cycles, i.e. words that contain the word itself in their immediate definition.

This post is related to a previous post titled “Meaning against A.I.”

NKS in Numb3rs

Friday, December 1st, 2006