Saturday, December 27, 2008

The Subtle Structure of the Everyday Physical World = The Weakness of Abstract Definitions of Intelligence

In my 1993 book "The Structure of Intelligence" (SOI), I presented a formal definition of intelligence as "the ability to achieve complex goals in complex environments." I then argued (among other things) that pattern recognition is the key to achieving intelligence, via the following meta-algorithm (sketched in code below):
  • Recognize patterns regarding which actions will achieve which goals in which situations
  • Choose an action that is expected to be good at goal achievement in the current situation
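A minimal, purely illustrative Python sketch of this loop follows; the PatternStore class, its crude exact-match notion of a "pattern," and the choose_action helper are hypothetical placeholders, not components of any actual system:

```python
# Illustrative only: the "recognize patterns, then act" meta-algorithm.
# Here a "pattern" is just an exact-match frequency statistic over past
# (situation, action, goal, outcome) tuples -- a crude stand-in for
# genuine pattern recognition.

class PatternStore:
    """Remembers (situation, action, goal, outcome) tuples as crude 'patterns'."""
    def __init__(self):
        self.history = []

    def record(self, situation, action, goal, achieved):
        self.history.append((situation, action, goal, achieved))

    def expected_success(self, situation, action, goal):
        # Average past success of this action for this goal in this situation.
        relevant = [ach for (s, a, g, ach) in self.history
                    if s == situation and a == action and g == goal]
        return sum(relevant) / len(relevant) if relevant else 0.0


def choose_action(store, situation, goal, available_actions):
    """Pick the action whose recognized patterns predict the best goal achievement."""
    return max(available_actions,
               key=lambda a: store.expected_success(situation, a, goal))
```

Everything interesting is hidden inside how patterns are represented and matched; the loop itself is trivial.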
The subtle question in this kind of definition is: how do you average over the space of goals and environments? If you average over all possible goals and environments, perhaps weighting each one according to its complexity (so that success with simpler goals/environments counts for more), then you have a definition of "how generally intelligent a system is," where general intelligence is defined in an extremely mathematically inclusive way.

The line of thinking I undertook in SOI was basically a reformulation, in terms of "pattern theory," of ideas regarding algorithmic information and intelligence that originated with Ray Solomonoff; and Solomonoff's ideas have more recently been developed by Shane Legg and Marcus Hutter into a highly rigorous mathematical definition of intelligence.
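Roughly speaking (and glossing over technical conditions), the Legg-Hutter measure scores an agent π across all computable environments, weighting each environment by its simplicity:

```latex
% Legg-Hutter universal intelligence, slightly simplified: E is the class of
% computable environments, K(\mu) the Kolmogorov complexity of environment \mu,
% and V_\mu^\pi the expected cumulative reward of agent \pi in \mu.
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^{\pi}
```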

I find this kind of theory fascinating, and I'm pleased that Legg and Hutter have done a more thorough job than I did of making a fully formalized theory of this nature.

However, I've also come to the conclusion that this sort of approach, without dramatic additions and emendations, just can't be very useful for understanding practical human or artificial intelligence.

What is Everyday-World General Intelligence About?

Let's define the "everyday world" as the portion of the physical world that humans can directly perceive and interact with -- this is meant to exclude things like quantum tunneling and plasma dynamics in the centers of stars, etc. (though I'll also discuss how to extend my arguments to these things).

I don't think everyday-world general intelligence is mainly about being able to recognize totally general patterns in totally general datasets (for instance, patterns among totally general goals and environments). I suspect that the best approach to this sort of totally general pattern recognition problem is ultimately going to be some variant of "exhaustive search through the space of all possible patterns" ... meaning that approaching this sort of "truly general intelligence" is not really going to be a useful way to design an everyday-world AGI or any significant component of one. (Hutter's AIXItl and Schmidhuber's Godel Machine are examples of exhaustive-search-based AGI methods.)

Put differently, I suspect that all the AGI systems and subcomponents one can really build are SO BAD at solving this general problem, that it's better to characterize AGI systems
  • NOT in terms of how well they do at this general problem
but rather
  • in terms of what classes of goals/environments they are REALLY GOOD at recognizing patterns in
I think the environments existing in the everyday physical and social world that humans inhabit are drawn from a pretty specific probability distribution (compared to, say, the "universal prior," a standard probability distribution that assigns higher probability to entities describable using shorter programs), and that for this reason, looking at problems of compression or pattern recognition across general goal/environment spaces, without everyday-world-oriented biases, is not going to lead to everyday-world AGI.

The important parts of everyday-world AGI design are the ones that (directly or indirectly) reflect the specific distribution of problems that the everyday world presents to an AGI system.

And this distribution is really hard to encapsulate in a set of mathematical test functions, because we don't know what it is.

And this is why I feel we should be working on AGI systems that interact with the real everyday physical and social world, or the most accurate simulations of it we can build.

One could formulate this "everyday world" distribution, in principle, by taking the universal prior and conditioning it on a huge amount of real-world data. However, I suspect that simple, artificial exercises like conditioning distributions on text or photo databases don't come close to capturing the richness of statistical structure in the everyday world.
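Schematically (and only schematically), this conditioned prior might be written as follows, where D stands for a large corpus of everyday-world data and h for a candidate goal/environment:

```latex
% Schematic "everyday world" prior: the universal prior conditioned on a large
% body of real-world data D. Hypotheses h inconsistent with D get probability
% zero; the rest keep their simplicity weighting, renormalized.
P_{\mathrm{everyday}}(h) \;=\; P(h \mid D)
  \;\propto\; 2^{-K(h)} \, \mathbf{1}\!\left[\, h \text{ is consistent with } D \,\right]
```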

So, my contention is that
  • the everyday world possesses a lot of special structure
  • the human mind is structured to preferentially recognize patterns related to this special structure
  • AGIs, to be successful in the everyday world, should be specially structured in this sort of way too
To incorporate this everyday-world bias (or other similar biases) into the abstract mathematical theory of intelligence, we might say that intelligence relative to goal/environment class C is "the ability to achieve complex goals (in C) in complex environments (in C)".

And we could formalize this by weighting each goal or environment by a product of
  • its simplicity (e.g. measured by program length)
  • its degree of membership in C, considering C as a fuzzy set
One can also create a formalization of this idea using Legg and Hutter's approach to defining intelligence.
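For instance, a rough sketch along those lines (with χ_C denoting a fuzzy membership function for the class C, and the other symbols as in the Legg-Hutter measure above) might read:

```latex
% C-relative intelligence: each environment \mu is weighted both by its
% simplicity and by its (fuzzy) degree of membership \chi_C(\mu) in the
% goal/environment class C.
\Upsilon_C(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, \chi_C(\mu) \, V_\mu^{\pi}
```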

One can then characterize a system's intelligence in terms of which goal/environment sets C it is reasonably intelligent for.

OK, this does tell you something.

And, it comes vaguely close to Pei Wang's definition of intelligence as "adaptation to the environment."

But, the point that really strikes me lately is how much of human intelligence has to do, not with this general definition of intelligence, but with the subtle abstract particulars of the C that real human intelligences deal with (which equals the everyday world).

Examples of the Properties of the Everyday World That Help Structure Intelligence

The propensity to search for hierarchical patterns is one huge example of this. The fact that searching for hierarchical patterns works so well, in so many everyday-world contexts, is most likely because of the particular structure of the everyday world -- it's not something that would be true across all possible environments (even if one weights the space of possible environments using program-length according to some standard computational model).

Taking it a step further, in my 1993 book The Evolving Mind I identified a structure called the "dual network", which consists of superposed hierarchical and heterarchical networks: basically a hierarchy in which the distance between two nodes in the hierarchy is correlated with the distance between the nodes in some metric space.

Another high level property of the everyday world may be that dual network structures are prevalent. This would imply that minds biased to represent the world in terms of dual network structure are likely to be intelligent with respect to the everyday world.
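As a purely illustrative way to make the dual network notion concrete, one could measure how strongly hierarchy distance and metric-space distance are correlated across pairs of nodes. The sketch below assumes a tree given as a parent map plus a dict of node coordinates; it is not code from The Evolving Mind or any existing system:

```python
# Illustrative only: score how "dual-network-like" a hierarchy is, by
# correlating tree distance with distance in an underlying metric space.
import math
from itertools import combinations


def tree_distance(parent, a, b):
    """Number of edges between nodes a and b in a tree given as a parent map."""
    def ancestors(n):
        path = [n]
        while parent.get(n) is not None:
            n = parent[n]
            path.append(n)
        return path
    pa, pb = ancestors(a), ancestors(b)
    common = set(pa) & set(pb)
    lca = min(common, key=lambda n: pa.index(n) + pb.index(n))
    return pa.index(lca) + pb.index(lca)


def dual_network_score(parent, coords):
    """Pearson correlation between tree distance and Euclidean distance."""
    pairs = list(combinations(coords, 2))
    td = [tree_distance(parent, a, b) for a, b in pairs]
    md = [math.dist(coords[a], coords[b]) for a, b in pairs]
    n = len(pairs)
    mt, mm = sum(td) / n, sum(md) / n
    cov = sum((t - mt) * (m - mm) for t, m in zip(td, md))
    sd_t = math.sqrt(sum((t - mt) ** 2 for t in td))
    sd_m = math.sqrt(sum((m - mm) ** 2 for m in md))
    return cov / (sd_t * sd_m)
```

A score near 1 means nodes that are close in the hierarchy also tend to be close in the metric space, which is roughly what the dual network structure requires.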

The extreme commonality of symmetry groups in the (everyday and otherwise) physical world is another example: they occur so often that minds oriented toward recognizing patterns involving symmetry groups are likely to be intelligent with respect to the real world.
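As a toy illustration of what "recognizing patterns involving symmetry groups" might mean computationally, here is a hypothetical snippet that checks which elements of the dihedral group of the square leave a 2-D grid pattern unchanged:

```python
# Illustrative only: find which symmetries of the square leave a 2-D grid
# pattern invariant, i.e. recognize the pattern's symmetry group.
import numpy as np

# The eight elements of the dihedral group D4, as grid transformations.
D4 = {
    "identity":        lambda g: g,
    "rot90":           lambda g: np.rot90(g, 1),
    "rot180":          lambda g: np.rot90(g, 2),
    "rot270":          lambda g: np.rot90(g, 3),
    "flip_horizontal": lambda g: np.fliplr(g),
    "flip_vertical":   lambda g: np.flipud(g),
    "flip_diag":       lambda g: g.T,
    "flip_antidiag":   lambda g: np.rot90(g.T, 2),
}

def symmetry_group(grid):
    """Return the names of the D4 elements under which the grid is invariant."""
    return [name for name, op in D4.items() if np.array_equal(op(grid), grid)]

# Example: a plus-sign pattern is invariant under all eight elements of D4.
plus = np.array([[0, 1, 0],
                 [1, 1, 1],
                 [0, 1, 0]])
print(symmetry_group(plus))
```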

I suggest that the number of properties of the everyday world of this nature is huge ... and that the essence of everyday-world intelligence lies in the list of these abstract properties, which must be embedded implicitly or explicitly in the structure of a natural or artificial intelligence for that system to have everyday-world intelligence.

Apart from these particular yet abstract properties of the everyday world, intelligence is just about "finding patterns in which actions tend to achieve which goals in which situations" ... but, this simple meta-algorithm is well less than 1% of what it takes to make a mind.

You might say that a sufficiently generally intelligent system should be able to infer these general properties from looking at data about the everyday world. Sure. But I suggest that this would require massively more processing power than is needed by an AGI that embodies, and hence automatically utilizes, these principles. It may be that the problem of inferring these properties is so hard as to require a wildly infeasible AIXItl / Godel Machine type system.

Important Open Questions

A few important questions raised by the above:
  1. What is a reasonably complete inventory of the highly-intelligence-relevant subtle patterns/biases in the everyday world?
  2. How different are the intelligence-relevant subtle patterns in the everyday world, versus the broader physical world (the quantum microworld, for example)?
  3. How accurate a simulation of the everyday world do we need to have, to embody most of the subtle patterns that lie at the core of everyday-world intelligence?
  4. Can we create practical progressions of simulations of the everyday world, such that the first (and cruder) simulations are very useful to early attempts at teaching proto-AGIs, and the development of progressively more sophisticated simulations roughly tracks progress in AGI design and development?
The second question relates to an issue I raised in a section of The Hidden Pattern, regarding the possibility of quantum minds -- minds whose internal structures and biases are adapted to the quantum microworld rather than to the everyday human physical world. My suspicion is that such minds will be quite different in nature, to the point that they will have fundamentally different mind-architectures -- but there will also likely be some important and fascinating points of overlap.

The third and fourth questions are ones I plan to explore in an upcoming paper, an expansion of the AGI-09 conference paper I wrote on AGI Preschool. An AGI Preschool, as I define it there, is a virtual world embodying a preschool environment, with a variety of activities for young AIs to partake in. The main open question in AGI Preschool design at present is: how much detail does the virtual world need to have to support early childhood learning in a sufficiently robust way? In other words, how much detail is needed so that the AGI Preschool will possess the subtle structures and biases corresponding to everyday-world AGI? My AGI-09 conference paper didn't really dig into this question due to length limitations, but I plan to address it in a follow-up, expanded version.