Monday, August 27, 2012

Finding the "Right" Computational Model to Support Occam's Razor

(This is a rather abstract science-y post, beware....  If you don't like theoretical computer science, you'll be bored and/or baffled....  But if you do -- or if you like being bored and/or baffled, read on!  These are some technical speculations that I believe may have interesting conceptual/philosophical implications....)

One question I've been thinking about a lot lately is: What is the "right" computational model?

Occam's Razor is a powerful tool in AI, both in machine learning / data mining applications, and in a more ambitious AGI context.   But the formulation of Occam's Razor, in a practical computer science context, contains a certain arbitrariness.

Occam's Razor says to prefer the simpler hypothesis; or more subtly, to prefer hypotheses, in some sense, proportionally to their simplicity.

But what's "simple"?

An old friend of mine, Mariela Ivanova, once said: "Simplicity is so complex, I couldn't understand it."

Sometimes folks define "simplicity of X" as "length of the shortest expression of X in a specified programming language P" -- or something like that.   But of course this is too crude, so there are variations, such as "runtime of the program giving the shortest expression of X in language P"; or various weighted combinations of length and runtime, as Juergen Schmidhuber and colleagues have explored in their work on "frontier search."
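A classic example of such a length/runtime tradeoff is Levin's Kt complexity, which charges a program for its length in bits plus the logarithm of its running time:

  Kt(x) = min { |p| + log t(p) : U(p) = x }

where U is the reference machine, |p| is the length of program p, and t(p) is the number of steps p takes to produce x.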

One can also look at a "total computational cost" simplicity measure such as "the minimum amount of space and time resources needed to compute X" on a specific machine M.
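Just as an illustrative formalization (the exact form here is my own placeholder, not a standard definition), such a measure might look something like

  TotalCost_M(X) = min { time_M(p) + space_M(p) : program p computes X on machine M }

with some weighting chosen for how time trades off against space.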

The program-length simplicity metric is related to the Solomonoff-Levin measure on program space, which assigns each object X a prior probability dominated by the length of the shortest program computing X -- the shorter that program, the higher the probability.  Variant simplicity metrics lead to other measures on program space.
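Concretely, relative to a universal prefix machine U, the Solomonoff-Levin prior of an object x is

  m(x) = sum over programs p with U(p) = x of 2^(-|p|)

so the shortest program for x contributes the dominant term, 2^(-K(x)), where K(x) is x's Kolmogorov complexity relative to U.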

All this is very nice and elegant, to write about and to prove theorems about.  But it all depends on a seemingly arbitrary assumption: what is the programming language, or machine, involved?

There's some nice old theory -- the invariance theorem -- saying that, in a sense, for the case of "shortest program length," this choice doesn't matter.  Any universal Turing machine can simulate any other one, at the cost of a constant additive overhead in program length.
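Formally: if K_U and K_V denote shortest-program length on universal machines U and V, there is a constant c, depending on U and V but not on x, such that

  K_U(x) <= K_V(x) + c   for all x.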

But in practice, this constant may be very large, and it sometimes does matter.

And for other simplicity measures, such as "total computational cost," no comparable invariance results are known.

To complete the story of the computer science basis for Occam's Razor, it seems what we need is a theory of what the "right" programming language or machine is -- of what the "right" computational model is.

Of course, this comes down to deciding what "right" means.

What I'd like to find are some nice theorems of the form: IF you want a programming-language/computational-model satisfying simple, elegant criteria A-B-C, THEN you must use a programming-language/computational-model essentially equivalent to some particular model P*.

If we had a good theory of this nature, then the choice of which programming-language/computational-model to use would come down to choosing criteria A-B-C.

I looked around a bit and couldn't find any really great theoretical results of this nature.   But it's possible they exist and I just didn't figure out the right terms to search for.

My intuition is that one can formulate some simple, elegant criteria which mathematically imply that the right computational model is some sort of parallel graph rewriting.

In other words, I suspect there will be some simple, elegant criteria that are equivalent to the use of something vaguely like core Haskell.

What kind of criteria am I thinking of?  I know I'm getting a little fuzzy at this point, but I'm thinking about stuff like: "Computing W and computing f(W) should take the same amount of space and time resources," for cases where it seems intuitively obvious that W and f(W) should take the same amount of space and time resources.
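Just to convey the flavor (this is a toy illustration of the intuition, not a proposed formal criterion): in a lazy graph-reduction setting such as Haskell's, wrapping an expensive value W in a cheap function f should add only constant overhead, because the result shares the already-built node for W rather than recomputing it.

  -- Toy, purely illustrative example: under graph reduction with sharing,
  -- `w` is evaluated once, and the pair built by `f` just references that
  -- shared node, so computing `f w` costs only a constant amount more
  -- than computing `w` itself.
  import Data.List (foldl')

  w :: Integer
  w = foldl' (+) 0 [1 .. 10 ^ 7]     -- a moderately expensive value W

  f :: Integer -> (Integer, Integer)
  f x = (x, x + 1)                   -- a cheap wrapper f

  main :: IO ()
  main = print (f w)                 -- forces w once, then applies f

That computing f(w) should cost essentially the same as computing w is exactly the kind of "intuitively obvious" resource equivalence such criteria would demand.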

Might it be possible, by imposing a few natural criteria of this nature, to derive results implying that the only way to fulfill the criteria is to use some sort of simple parallel graph rewriting based computational model?

I strongly suspect this is possible, but I've been thinking about this a while, and (a familiar story for me lately!) haven't found time to spend the needed hours/days/weeks to try to work out any details.

There is a math literature on abstract rewrite systems, as well as on the practical use of parallel graph rewriting systems to do computation.   These may provide raw material to be used to prove theorems of the sort I'm alluding to.  Or maybe some theorems equivalent to what I'm dreaming about exist somewhere, and I just didn't find them yet!
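For readers who haven't played with these systems, here is a minimal toy sketch (term/tree rewriting rather than true graph rewriting with sharing, so purely illustrative) of the basic "apply rules until normal form" idea, using Peano-style addition:

  -- Purely illustrative toy: a tiny term rewriting system for Peano addition.
  -- Two rules are applied repeatedly until no rule fires (a normal form).
  data Term = Zero | Succ Term | Add Term Term
    deriving (Show, Eq)

  -- One rewrite step at the root of the term, if a rule applies.
  step :: Term -> Maybe Term
  step (Add Zero n)     = Just n
  step (Add (Succ m) n) = Just (Succ (Add m n))
  step _                = Nothing

  -- Keep rewriting, at the root and inside subterms, until nothing changes.
  normalize :: Term -> Term
  normalize t = case step t of
    Just t' -> normalize t'
    Nothing -> case t of
      Add a b -> let t' = Add (normalize a) (normalize b)
                 in if t' == t then t' else normalize t'
      Succ a  -> Succ (normalize a)
      Zero    -> Zero

  main :: IO ()
  main = print (normalize (Add (Succ Zero) (Succ Zero)))
  -- prints Succ (Succ Zero), i.e. 1 + 1 rewrites to 2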

So, let's suppose someone proves theorems of the nature I'm suggesting.  What would that get us?

It would tell us, in an elegant and intuitively compelling way, what computational model to assume when applying Occam's Razor.

Of course, someone could always say "That math is nice, but I see no reason to accept your criteria.  Sure they're simple and elegant, but I happen to prefer a computational model fulfilling different criteria."   That's inevitable.

But: a community sharing a common intuition regarding which mathematical criteria are elegant and simple would have a strong mutual intuitive reason to assume a common computational model, and thus to apply the Occam's Razor heuristic in the same way.