Monday, November 30, 2015

Getting Human-Like Values into Advanced OpenCog AGIs

Some Speculations Regarding Value Systems for Hypothetical Powerful OpenCog AGIs

In a recent blog post, I proposed two general theses regarding the future value systems of human-level and transhuman AGI systems: the Value Learning Thesis (VLT) and the Value Evolution Thesis (VET).   This post pursues the same train of thought further – attempting to make these ideas more concrete by speculating about how the VLT and VET might manifest themselves in the context of an advanced version of the OpenCog AGI platform.  

Currently OpenCog comprises a comprehensive design plus a partial implementation, and it cannot be known with certainty how functional a fully implemented version of the system will be.   The OpenCog project is ongoing and the system becomes more functional each year.  Independent of this, however, the design may be taken as representative of a certain class of AGI systems, and its conceptual properties explored.

An OpenCog system has a certain set of top-level goals, which initially are supplied by the human system programmers.   Much of its cognitive processing is centered on finding actions which, if executed, appear to have a high probability of achieving system goals.  The system carries out probabilistic reasoning aimed at estimating these probabilities.   Though from this view the goal of its reasoning is to infer propositions of the form “Context & Procedure ==> Goal”, in order to estimate the probabilities of such propositions, it needs to form and estimate probabilities for a host of other propositions – concrete ones involving its sensory observations and actions, and more abstract generalizations as well.   Since precise probabilistic reasoning based on the total set of the system’s observations is infeasible, numerous heuristics are used alongside exact probability-theoretic calculations.   Part of the system’s inferencing involves figuring out what subgoals may help it achieve its top-level goals in various contexts.
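
To make the flavor of this concrete, here is a minimal toy sketch (in Python, and emphatically not OpenCog's actual PLN machinery) of the core loop described above: estimating the probability that a given procedure, executed in a given context, achieves a goal, and then choosing the most promising action.  All names and the crude frequency-counting "inference" are illustrative stand-ins only.

```python
# Toy illustration only: crude frequency estimates stand in for OpenCog's
# probabilistic reasoning; names and data are hypothetical.
def estimate_goal_probability(context, procedure, evidence):
    """Estimate P(goal achieved | context, procedure) from past
    (context, procedure, goal_achieved) observations."""
    outcomes = [g for (c, p, g) in evidence if c == context and p == procedure]
    if not outcomes:
        return 0.5  # uninformative prior when no evidence is available
    return sum(outcomes) / len(outcomes)

def choose_action(context, candidate_procedures, evidence):
    """Pick the procedure that appears most likely to achieve the goal."""
    return max(candidate_procedures,
               key=lambda proc: estimate_goal_probability(context, proc, evidence))

evidence = [("dark_room", "flip_switch", 1),
            ("dark_room", "wave_arms", 0),
            ("dark_room", "flip_switch", 1)]
print(choose_action("dark_room", ["flip_switch", "wave_arms"], evidence))  # flip_switch
```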

Exactly what set of top-level goals should be given to an OpenCog system aimed at advanced AGI is not yet fully clear, and will largely be determined via experimentation with early-stage OpenCog systems; but a first approximation, arrived at through a combination of theoretical and pragmatic considerations, is as follows.    The first four values on the list are drawn from the Cosmist ethical analysis presented in my books A Cosmist Manifesto and The Hidden Pattern; the others are included for fairly obvious pragmatic reasons to do with the nature of early-stage AGI development and social integration.  The order of the items on the list is arbitrary as given here; each OpenCog system would have a particular weighting for its top-level goals.

  • Joy: maximization of the amount of pleasure observed or estimated to be experienced by sentient beings across the universe
  • Growth: maximization of the amount of new pattern observed or estimated to be created throughout the universe
  • Choice: maximization of the degree to which sentient beings across the universe appear to be able to make choices (according e.g. to the notion of “natural autonomy”, a scientifically and rationally grounded analogue of the folk notion and subjective experience of “free will”)
  • Continuity:  persistence of patterns over time.   Obviously this is a counterbalance to Growth; the relative weighting of these two top-level goals will help determine the “conservatism” of a particular OpenCog system with the goal-set indicated here.
  • Novelty: the amount of new information in the system’s perceptions, actions and thoughts
  • Human pleasure and fulfillment: How much do humans, as a whole, appear to be pleased and fulfilled?
  • Human pleasure regarding the AGI system itself: How pleased do humans appear to be with the AGI system, and their interactions with it?
  • Self-preservation: a goal fulfilled if the system keeps itself “alive.”   This is actually somewhat subtle for a digital system.    It could be defined in a copying-friendly way, as preservation of the existence of sentiences whose mind-patterns have evolved from the mind-patterns of the current system with a reasonable degree of continuity.

This list of goals has a certain arbitrariness to it, and no doubt will evolve as OpenCog systems are experimented with.   However, it comprises a reasonable “first stab” at a “roughly human-like” set of goal-content for an AGI system.
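
As a purely illustrative sketch of the "particular weighting" idea mentioned above, one might picture the goal set as something like the following.  The weights are arbitrary numbers made up for illustration; in an actual OpenCog system each goal would be represented as Atoms with attached importance values, not as a Python dictionary.

```python
# Hypothetical weighting of the top-level goals listed above; the numbers are
# arbitrary and exist only to illustrate how a weighting might look.
top_level_goals = {
    "joy": 0.20,
    "growth": 0.20,
    "choice": 0.15,
    "continuity": 0.10,  # counterbalances growth; a higher weight yields a more "conservative" system
    "novelty": 0.10,
    "human_fulfillment": 0.10,
    "human_pleasure_with_agi": 0.10,
    "self_preservation": 0.05,
}

def overall_goal_satisfaction(estimates):
    """Weighted combination of per-goal satisfaction estimates in [0, 1]."""
    return sum(weight * estimates.get(goal, 0.0)
               for goal, weight in top_level_goals.items())
```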

One might wonder how such goals would be specified for an AGI system.   Does one write source code that attempts to embody some mathematical theory of continuity, pleasure, joy, etc.?    For some goals mathematical formulae may be appropriate, e.g. novelty, which can be gauged information-theoretically in a plausible way.   In most cases, though, I suspect the best way to define a goal for an AGI system will be using natural human language.   Natural language is intrinsically ambiguous, but so are human values, and these ambiguities are closely coupled and intertwined.   Even where a mathematical formula is given, it might be best to use natural language for the top-level goal, and supply the mathematical formula as an initially suggested means of achieving the NL-specified goal.   
The AGI would need to be instructed – again, most likely in natural language – not to obsess on the specific wording supplied to it in its top-level goals, but rather to take the wording of its goals as indicative of general concepts that exist in human culture and can be expressed only approximately in concise sequences of words.     The specification of top-level goal content is not intended to precisely direct the AGI's behavior in the way that, say, a thermostat is directed by the goal of keeping temperature within certain bounds.  Rather, it is intended to point the AGI’s self-organizing activity in certain informally-specified directions.

Alongside explicitly goal-oriented activity, OpenCog also includes “background processing” – cognition simply aimed at learning new knowledge, and forgetting relatively unimportant knowledge.   This knowledge provides background information useful for reasoning regarding goal-achievement, and also builds up a self-organizing, autonomously developing body of active information that may sometimes lead a system in unpredictable directions – for instance, to reinterpretation of its top-level goals.

The goals supplied to an OpenCog system by its programmers are best viewed as initial seeds around which the system forms its goals.  For instance, a top-level goal of “novelty” may be specified as a certain mathematical formula for calculating the novelty of the system’s recent observations, actions and thoughts.  However, this mathematical formula may be intractable in its most pure and general form, leading the system to develop various context-specific approximations to estimate the novelty experienced in different situations.   These approximations, rather than the top-level novelty formula, will be what the system actually works to achieve.   Improving these approximations will be part of the system’s activity, but how much attention to pay to improving them will be a choice the system has to make as part of its thinking process.    Potentially, if the approximations are bad, they might cause the system to delude itself that it is experiencing novelty (according to its top-level equation) when it actually isn’t, and also tell the system that there is no additional novelty to be found in improving its novelty estimation formulae.  
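
For concreteness, here is one minimal sketch of what an information-theoretic novelty gauge of this general sort might look like: the novelty of recent observations measured as their average surprisal relative to a model built from past observations.  The particular formula and smoothing constant are my own illustrative choices, not anything built into OpenCog.

```python
import math
from collections import Counter

def novelty(recent, history, smoothing=1.0):
    """Average surprisal (in bits) of recent observations, relative to a
    smoothed frequency model of past observations.  Illustrative only."""
    counts = Counter(history)
    vocab = set(history) | set(recent)
    total = len(history) + smoothing * len(vocab)
    surprisals = [-math.log2((counts.get(obs, 0) + smoothing) / total)
                  for obs in recent]
    return sum(surprisals) / len(surprisals) if surprisals else 0.0

print(novelty(["red_ball"], ["red_ball"] * 50))     # low novelty: seen many times before
print(novelty(["purple_cube"], ["red_ball"] * 50))  # high novelty: never seen before
```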

And this same sort of problem could occur with goals like “help cause people to be pleased and fulfilled.”   Subgoals of the top-level goal may be created via more or less crude approximations; and these subgoals may influence how much effort goes into improving the approximations.   Even if the system is wired to put a fixed amount of effort into improving its estimations regarding which subgoals should be pursued in pursuit of its top-level goals, the particular content of the subgoals will inevitably influence the particulars of how the system goes about improving these estimations.
The flexibility of an OpenCog system, its ability to ongoingly self-organize, learn and develop, brings the possibility that it could deviate from its in-built top-level goals in complex and unexpected ways.  But this same flexibility is what should – according to the design intention – allow an OpenCog system to effectively absorb the complexity of human values.   Via interacting with humans in rich ways – not just via getting reinforced on the goodness or badness of its actions (though such reinforcement will impact the system assuming it has goals such as “help cause human pleasure and fulfillment”), but via all sorts of joint activity with humans – the system will absorb the ins and outs of human psychology, culture and value.   It will learn subgoals that approximately imply its top-level goals, in a way that fits with human nature, and with the specific human culture and community it’s exposed to as it grows.

In the above I have been speaking as if an OpenCog system is ongoingly stuck with the top-level goals that its human programmers have provided it with; but this is not necessarily the case.   Operationally it is unproblematic to allow an OpenCog system to modify its top-level goals.   One might consider this undesirable, yet a reflection on the uncertainty and ignorance necessarily going into any choice of goal-set may make one think otherwise.  

A highly advanced intelligence, forced by design to retain top-level goals programmed by minds much more primitive than itself, could develop an undesirably contorted psychology, based on internally working around its fixed goal programming.   Human psychology is replete with examples of this sort of problem.  For instance, we humans are “programmed” with a great deal of highly-weighted goal content relevant to reproduction, sexuality and social status, but the more modern aspects of our minds have mixed feelings about these archaic evolved goals.   But it is very hard for us to simply excise these historical goals from our minds.   Instead we have created quite complex and subtle psychological and social patterns that indirectly and approximately achieve the archaic goals encoded in our brains, while also letting us go in the directions in which our minds and cultures have self-organized during recent millennia.    Hello Kitty, romantic love, birth control, athletic competitions, investment banks – the list of human-culture phenomena apparently explicable in this way is almost endless.

One key point to understand, closely relevant to the VLT, is that the foundation of OpenCog’s dynamics in explicit probabilistic inference will necessarily cause it to diverge somewhat from human judgments.   As a probabilistically grounded system, OpenCog will naturally try to accurately estimate the probability of each abstraction it makes actually applying in each context it deems relevant.    Humans sometimes do this – otherwise they wouldn’t be able to survive in the wild, let alone carry out complex activities like engineering computers or AI systems – but they also behave quite differently at times.   Among other issues, humans are strongly prone to “wishful thinking” of various sorts.   If one were to model human reasoning using a logical formalism, one might end up needing to include a rule of the rough form

X would imply achievement of my goals
therefore
X’s truth value gets boosted

Of course, a human being who applied this rule strongly to every X in his or her mind would become completely delusional and dysfunctional.  No human is like that.  But this sort of wishful thinking infuses human minds, alongside serious attempts at accurate probabilistic reasoning, plus various heuristics with well-documented systematic biases.   Belief revision combines conclusions drawn via wishful thinking with conclusions drawn by attempts at accurate inference, in complex and mainly unconscious ways.  
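
To make the contrast concrete, a toy sketch of this sort of biased belief revision might look like the following, where the revised truth value is a mixture of an evidence-based estimate and a "wishful" boost proportional to how desirable the proposition's truth would be.  The mixing scheme and weight are purely illustrative, not drawn from any actual cognitive model.

```python
def revise_belief(p_evidence, desirability, wishful_weight=0.2):
    """Toy belief revision: blend an evidence-based probability estimate with a
    wishful-thinking boost toward 1, scaled by how much the proposition's truth
    would help achieve the agent's goals (desirability in [0, 1])."""
    bias = wishful_weight * desirability
    return (1.0 - bias) * p_evidence + bias * 1.0

print(revise_belief(0.3, desirability=0.0))  # 0.30: no wishful distortion
print(revise_belief(0.3, desirability=1.0))  # 0.44: truth value boosted because we want it true
```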

Some of the biases of human cognition are sensible consequences of trying to carry out complex probabilistic reasoning on complex data using limited space and time resources.  Others are less “forgivable” and appear to exist in the human psyche for “historical reasons”, e.g. because they were adaptive for some predecessor of modern humanity in some contexts and then just stuck around.
An advanced OpenCog AGI system, if thoroughly embedded in human society and infused with human values, would likely arrive at its own variation of human values, differing from nearly any human being’s particular value system in its bias toward logical and probabilistic consistency.   The closest approximation to such an OpenCog system’s value system might be the values of a human belonging to the human culture in which the OpenCog system was embedded, and who also had made great efforts to remove any (conscious or unconscious) logical inconsistencies in his value system.

What does this speculative scenario have to say about the VLT and VET?  

Firstly, it seems to support a limited version of the VLT.   An OpenCog system, due to its fundamentally different cognitive architecture, is not likely to inherit the logical and probabilistic inconsistencies of any particular human being’s value system.  Rather, one would expect it to (implicitly and explicitly) seek to find the best approximation to the value system of its human friends and teachers, within the constraint of approximate probabilistic/logical consistency that is implicit in its architecture.  

The precise nature of such a value system cannot be entirely clear at this moment, but is certainly an interesting topic for speculative thinking.    First of all, it is fairly clear which sorts of properties of typical human value systems would not be inherited by an OpenCog of this hypothetical nature.   For instance, humans have a tendency to place a great deal of extra value on goods or ills that occur in their direct sensory experience, far beyond what would be justified by the increased confidence associated with direct experience as opposed to indirect experience.   Humans tend to value feeding a starving child sitting right in front of them vastly more than feeding a starving child halfway across the world.  One would not expect a reasonably consistent human-like value system to display this property.

Similarly, humans tend to be much more concerned with goods or ills occurring to individuals who share more properties with themselves – and the choice of which properties to weight more highly in this sort of judgment is highly idiosyncratic and culture-specific.    If an OpenCog system doesn’t have a top-level goal of “preserving patterns similar to the ones detected in my own mind and body”, then it would not be expected to have the same “tribal” value-system bias that humans tend to have.    Some level of “tribal” value bias can be expected to emerge via abductive reasoning based on the goal of self-preservation (assuming this goal is included), but it seems qualitatively that humans have a much more tribally-oriented value system than could be derived via this sort of indirect factor alone.   Humans evolved partially via tribe-level group selection; an AGI need not do so, and this would be expected to lead to significant value-system differences.    

Overall, one might reasonably expect an OpenCog created with the above set of goals and methodology of embodiment and instruction to arrive at a value system that is roughly human-like, but without the glaring inconsistencies plaguing most practical human value systems.   Many of the contradictory aspects of human values have to do with conflict between modern human culture and “historical” values that modern humans have carried over from early human history (e.g. tribalism).   One may expect that, in the AGI’s value system, the modern culture side of such dichotomies will generally win out – because it is what is closer to the surface in observed human behavior and hence easier to detect and reason about, and also because it is more consilient with the explicitly Cosmist values (Joy, Growth, Choice) in the proposed first-pass AGI goal system.  

So to a first approximation, one might expect an OpenCog system of this nature to settle into a value system that
  • Resembles the human values of the individuals who have instructed and interacted with it
  • Displays a strong (but still just approximate) logical and probabilistic consistency and coherence
  • Generally resolves contradictions in human values via selecting modern-culture value aspects over “archaic” historical value aspects


It seems likely that such a value system would generally be acceptable to human participants in modern culture who value logic, science and reason (alongside other human values).    Obviously human beings who prefer the more archaic aspects of human values, and consider modern culture largely an ethical and aesthetic degeneration, would tend to be less happy with this sort of value system.  

So in this view, an advanced OpenCog system appropriately architected and educated would validate the VLT, but with a moderately loose interpretation.   Its value system would be in the broad scope of human-like value systems, but with a particular bias and with a kind of consistency and purity not likely present in any particular human being’s value system.

What about the VET?   It seems intuitively likely that the ongoing growth and development of an OpenCog system as described above would parallel the growth and development of human uploads, cyborgs or biologically-enhanced humans who were, in the early stage of their posthuman evolution, specifically concerned with reducing their reliance on archaic values and increasing their coherence and logical and probabilistic consistency.   Of course, this category might not include all posthumans – e.g. some religious humans, given the choice, might use advanced technology to modify their brains to cause themselves to become devout in their particular religion to a degree beyond all human limits.   But it would seem that an OpenCog system as described above would be likely to evolve toward superhumanity in roughly the same direction as a human being with transhumanist proclivities and a roughly Cosmist outlook.    If indeed this is the case, it would validate the VET, at least in this particular sort of situation.

It will certainly be noted that the value system of “a human being with transhumanist proclivities and a Cosmist outlook” is essentially the value system of the author of this article, and the author of the first-pass, roughly sketched OpenCog goal content used as the basis of the discussion here.   Indeed, the goal system outlined above is closely matched to my own values.   For instance, I tend toward technoprogressivism as opposed to transhumanist political libertarianism – and this is reflected in my inclusion of values related to the well-being of all sentient beings, and lack of focus on values regarding private property.   

In fact, different weightings of the goals in the above-given goal-set would be expected to lead to different varieties of human-level and superhuman AGI value system – some of which would be more “technoprogressivist” in nature and some more “political libertarian” in nature, among many other differences.   In a cosmic sense, though, this sort of difference is ultimately fairly minor.  These are all variations of modern human value system, and occupy a very small region in the space of all possible value systems that could be adopted by intelligences in our universe.   Differences between different varieties of human value system often feel very important to us now, but may well appear quite insignificant to our superintelligent descendants.


Friday, November 20, 2015

What does Google’s tensorflow mean for AI?



Google’s release of their tensorflow machine learning library has attracted a lot of attention recently.   Like everyone else in the field I’ve felt moved to take a look.

(Microsoft's recent release of an open source distributed machine learning toolkit is also interesting.   But that would be another story; here I'll restrict myself to tensorflow...)

tensorflow as a Deep Machine Learning Toolkit


Folks familiar with tools for deep learning based machine vision will quickly see that the tensorflow neural net library is fairly similar in concept to the Theano/pylearn2 library from Yoshua Bengio’s team at U. Montreal.   Its functionality is similar to Theano/pylearn2 and also to other modern deep ML toolkits like Caffe.   However, it looks like it may combine the strengths of the different existing toolkits in a novel way: an elegant, simple-to-use architecture like Theano/pylearn2, combined with rapid execution like one gets with Caffe.

Tensorflow is an infrastructure and toolkit, intended so that one can build and run specific deep learning algorithms within it.  The specific algorithms released with the toolkit initially are well-known and fairly limited.   For instance, they give a 2D convolutional neural net but not a 3D one (though Facebook open-sourced a 3D CNN not long ago).

The currently released version of tensorflow runs on one machine only (though making efficient use of multiple processors).  But it seems they may release a distributed version some time fairly soon.

tensorflow as a Dataflow Framework


As well as a toolkit for implementing distributed deep learning algorithms, tensorflow is also — underneath — a fairly general framework for “dataflow”, for passing knowledge around among the nodes of a graph.   However, looked at as a dataflow architecture it has some fairly strict limitations, which emerge directly from its purpose as an infrastructure for current deep learning neural net algorithms.

For one thing, tensorflow seems optimized for passing around pretty large chunks of data ....  So if one wanted to use it to spread activation around in a network, one wouldn't make an Operation per neuron, rather one would make an "activation-spreading" Operation and have it act on a connection matrix or similar....
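
For instance, a rough sketch of that style of usage might look like the following, written against the graph-and-Session API tensorflow shipped with at the time of this post; the network size, weights and update rule are arbitrary placeholders.

```python
import numpy as np
import tensorflow as tf  # graph/Session style API, current at the time of writing

n = 1000  # number of simulated "neurons"
weights = tf.constant((np.random.rand(n, n) * 0.01).astype(np.float32))
activation_in = tf.placeholder(tf.float32, shape=[n, 1])

# One coarse-grained "activation-spreading" Operation acting on the whole
# connection matrix, rather than one Operation per neuron.
activation_out = tf.nn.sigmoid(tf.matmul(weights, activation_in))

with tf.Session() as sess:
    act = np.random.rand(n, 1).astype(np.float32)
    for _ in range(10):  # the spreading dynamics are iterated outside the graph
        act = sess.run(activation_out, feed_dict={activation_in: act})
```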

Furthermore, tensorflow’s execution model seems to be fundamentally *synchronous*.  Even when run across multiple machines in distributed mode using Senders and Receivers, the basic mathematical operation of the network is synchronous.  This is fine for most current
deep learning algorithms, which are constructed of nodes that are assumed to pass information around among each other in a specific and synchronized way.  The control mechanisms tensorflow provides (e.g. for and while constructs) are flowchart-like rather than adaptive-network-like, and remain within the synchronized execution paradigm, so far as I can tell.

This is a marked contrast to ROS, which my team at OpenCog and Hanson Robotics is currently using for robotics work — in ROS one wraps up different functions in ROS nodes, which interact with each other autonomously and asynchronously.  It’s also a contrast to the BriCA framework for AGI and brain emulation produced recently by the Japanese Whole Brain Architecture Initiative.   BriCA’s nodes pass around vectors rather than tensors, but since a tensor is basically a multidimensional stack of vectors, this amounts to the same thing.  BriCA’s nodes interact asynchronously via a simple but elegant mechanism.   This reflects the fact that BriCA was engineered as a framework for neural net based AGI, whereas tensorflow was engineered as a framework for a valuable but relatively narrow class of deep learning based data processing algorithms.

That is: conceptually, it seems that tensorflow is made for executing precisely-orchestrated multi-node algorithms (potentially in a distributed way), in which interaction among nodes happens in a specifically synchronized and predetermined way based on a particular architecture; whereas BriCA can also be applied to more open-ended designs in which different nodes (components) react to each others' outputs on the fly, without everything happening within an overall architecture in which the dynamic relations between the components' behaviors have been thought out in advance.  Philosophically, this relates to the more "open-ended" nature of AGI systems.
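
To illustrate the stylistic difference, here is a generic sketch in plain Python (not ROS's or BriCA's actual API): in the asynchronous style, each node runs on its own and reacts to whatever messages arrive, whenever they arrive, rather than executing inside one globally synchronized dataflow graph.

```python
import queue
import threading

def node(in_q, out_q, transform):
    """A free-running node: consume messages as they arrive, emit results."""
    while True:
        msg = in_q.get()           # block until some upstream node produces output
        if msg is None:            # simple shutdown convention
            break
        out_q.put(transform(msg))  # react on the fly; no global schedule

q_in, q_out = queue.Queue(), queue.Queue()
worker = threading.Thread(target=node, args=(q_in, q_out, lambda x: x * 2))
worker.start()
q_in.put(21)
print(q_out.get())  # 42
q_in.put(None)
worker.join()
```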

tensorflow and OpenCog?


My current view on the currently popular deep learning architectures for data processing (whose implementation and tweaking and application tensorflow is intended to ease) is that they are strong for perceptual pattern recognition, but do not constitute general-purpose cognitive architectures for general intelligence.

Contrasting tensorflow and OpenCog (which is worse by far than contrasting apples and oranges, but so be it…), one observation we can make is that an OpenCog Atom is a persistent store of information, whereas a TensorFlow graph is a collection of Operations (each translating input into output).  So, on the face of it, TensorFlow is best for (certain sorts of) procedural knowledge, whereas Atomspace is best for declarative knowledge....   It seems the "declarative knowledge" in a TensorFlow graph is pretty much contained in the numerical tensors that the Operations pass around...

In OpenCog’s MOSES component, small LISP-like programs called “Combo trees” are used to represent certain sorts of procedural knowledge; these are then mapped into the Atomspace for declarative analysis.  But deep learning neural nets are best suited to representing different sorts of procedural knowledge than Combo trees are — e.g. procedural knowledge used for low-level perception and action.  (The distinction between procedural and sensorimotor knowledge blurs a bit here, but that would be a topic for another blog post….)

I had been thinking about integrating deep learning based perception into OpenCog using Theano / pylearn2 as an underlying engine — making OpenCog Atoms that executed small neural networks on GPU, and using the OpenCog Atomspace to glue together these small neural networks (via the Atoms that refer to them) into an overall architecture.  See particulars here and here.

Now I am wondering whether we should do this using tensorflow instead, or as well….

In terms of OpenCog/tensorflow integration, the most straightforward thing would be to implement


  • TensorNode ... with subtypes as appropriate
  • GroundedSchemaNodes that wrap up TensorFlow "Operations"


This would allow us to basically embed TensorFlow graphs inside the Atomspace...

Deep learning operations like convolution are represented as opaque operations in tensorflow, and would also be opaque operations (wrapped inside GSNs) in OpenCog....
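
A rough sketch of how that wrapping might look is below.  It assumes the standard OpenCog Python bindings (GroundedSchemaNode, ExecutionOutputLink, ListLink; exact module paths and initialization helpers differ across OpenCog versions), and since TensorNode doesn't exist yet, a plain ConceptNode naming an entry in a side table of numpy arrays stands in for it.  The "expensive Operation" here is just a matrix multiply standing in for a real TensorFlow op.

```python
import numpy as np
from opencog.atomspace import AtomSpace
from opencog.utilities import initialize_opencog
from opencog.type_constructors import *

atomspace = AtomSpace()
initialize_opencog(atomspace)  # tell the type constructors which atomspace to use

# Side table of tensors; a hypothetical TensorNode type would make this unnecessary.
tensors = {"input-1": np.random.rand(64, 64),
           "weights-1": np.random.rand(64, 64)}

def run_tensor_op(a_atom, b_atom):
    """Grounded schema: look up the named tensors, run the opaque numeric
    operation (a TensorFlow op in the real design), store and name the result."""
    tensors["result-1"] = tensors[a_atom.name] @ tensors[b_atom.name]
    return ConceptNode("result-1")

# Declarative wrapper living in the Atomspace: a GroundedSchemaNode pointing at
# the Python function above, applied to the nodes that name the input tensors.
wrapped_op = ExecutionOutputLink(
    GroundedSchemaNode("py: run_tensor_op"),
    ListLink(ConceptNode("input-1"), ConceptNode("weights-1")))
```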

The purported advantage over Theano would be that TensorFlow is supposed to be faster (we'll test), whereas Theano has an elegant interface but is slower than Caffe ...

Wrapping Operations inside GSN would add a level of indirection/inefficiency, but if the Operations are expensive things like running convolutions on images or multiplying big matrices, this doesn't matter much...

Anyway, we will evaluate and see what makes sense! …

Rambling Reflections on the Open-Source Ecosystem


The AI / proto-AGI landscape is certainly becoming interesting and complex these days.  It seems that AI went in just a few years from being obscure and marginalized (outside of science fiction) to being big-time corporate.  Which is exciting in terms of the R&D progress it will likely lead to, yet frustrating to those of us who aren’t thrilled with the domination of the world socioeconomy by megacorporations.

But then we also see a major trend of big companies sharing significant aspects of their AI code with the world at large via open-source releases like Facebook’s conv3D code and Google’s tensorflow, and so many others.   They are doing this for multiple reasons — one is that it keeps their research staff happy (most researchers want to feel they’re contributing to the scientific community at large rather than just to one company); and another is that other researchers, learning from and improving on the code they have released, will create new innovations they can use.  The interplay between the free-and-open R&D world and the corporate-and-proprietary R&D world becomes subtler and subtler.

Supposing we integrate tensorflow into OpenCog and it yields interesting results… Google could then choose to use OpenCog themselves and integrate it into their own systems.  Hopefully if they did so, they would push some of their OpenCog improvements into the open-source ecosystem as well.  Precisely where this sort of thing will lead business-wise is not entirely clear, given the shifting nature of current tech business models, but it’s already clear that companies like Google don’t derive the bulk of their business advantage from proprietary algorithms or code, but rather from the social dynamics associated with their products and their brand.

If open-source AI code were somehow coupled with a shift in the dynamics of online interaction, to something more peer-to-peer and less big-media and big-company and advertising dominated — THEN we would have a more dramatic shift, with interesting implications for everybody’s business model.  But that’s another topic that would lead us far afield from tensorflow.  For the time being, it seems that the open-source ecosystem is playing a fairly core role in the complex unfolding of AI algorithms, architectures and applications among various intellectual/socioeconomic actors … and funky stuff like tensorflow is emerging as a result.

(Interesting but Limited) Progress in Neural Net Based Language Learning


A team of UK-based researchers has published an interesting paper on language learning & reasoning using neural networks.   There has also been a somewhat sensationalist media article describing the work.

I was especially familiar with one of the authors, Angelo Cangelosi, who gave a keynote at the AGI-12 conference at Oxford, touching on some of his work with the iCub robot.

The news article (but not the research paper) says that the ANNABELL system reported here is the first time automated dialogue has been done with neural nets....  Actually, no.  I recall a paper by Alexander Borzenko giving similar results in the "Artificial Brains" special issue of Neurocomputing that Hugo de Garis and I co-edited some years ago….  And I’m pretty sure there were earlier examples as well.

When I pointed the ANNABELL work out to Japanese AGI researcher Koichi Takahashi, he noted a few recent related works, such as:




See also this nice survey on the emergent approach for language in robotics today.

So, what distinguishes this new work by Cangelosi and colleagues from other related stuff I’ve seen is more the sophistication of the underlying cognitive architecture.   Quite possibly ANNABELL works better than prior NNs trained for dialogue-response, or maybe it doesn't; careful comparison isn't given, which is understandable since there is no standard test corpus for this sort of thing, and prior researchers mostly didn't open their code.   But the cognitive architecture described here is very carefully constructed in a psychologically realistic way; combined with the interesting practical results, this is pretty nifty...

The training method is interesting, incrementally feeding the system facts with increasing complexity, while interacting with it along the way, and letting it build up its knowledge bit by bit.   A couple weeks ago I talked to a Russian company (whose name is unfortunately slipping my mind at the moment, but it began with a Z), who had a booth at RobotWorld in Seoul, that has been training a Russian NLP dialogue system in a similar way (again with those Russians!!).... But the demo they were showing that day was only in Russian so I couldn’t really assess it.

To my mind, the key limitation of the approach we see here is that the passage from question to response occurs very close to the word and word-sequence level.  There is not much conceptualization going on here.  There is a bit of generalization, but it’s generalization very close to the level of sentence forms.  This is not an issue of symbolic versus connectionist, it’s a matter of the kinds of patterns the system recognizes and represents.

For instance, with this method, the system will respond to many questions involving the word "dad" without really knowing what a "dad" is (e.g. without knowing that a dad is a human or is older than a child, etc.).   This is just fine, and people can do this too.   But we should avoid assuming that just because it gives responses that, if heard from a human, would result from a certain sort of understanding, the system is demonstrating that same sort of understanding.    This system is building up question-response patterns from the data fed into it, and then performing some (real, yet fairly shallow) generalization.  The AI question is whether the kind of generalization it is performing is really the right kind to support generally intelligent cognition.

My feeling is that the kind of processing their network is doing actually plays only a minor supporting role in human question-answering and dialogue behavior.   They are using a somewhat realistic cognitive architecture for reactive processing, and a somewhat realistic neural learning mechanism -- but the way the learning mechanism is used within the architecture for processing language is not very much like the way the brain processes language.   The consequence of this difference is that their system is not really forming the kinds of abstractions that a human mind (even a child's mind) automatically forms when processing this kind of linguistic information....   The result of this is that the kinds of question-answering, question-asking, concept formation etc. their system can do will not actually resemble those of a human child, even though their system's answer-generation process may, under certain restrictions, give results resembling those you get from a human child...

The observations I’m making here do not really contradict anything said in the paper, though they of course contradict some of the more overheated phrasings in the media coverage….  We have here a cognitive architecture that is intended as a fragment of an overall cognitive architecture for human-level, human-like general intelligence.  Normally, this fragmentary architecture would not do much of anything on its own, certainly not anything significant regarding language.  But in order to get it to do something, the authors have paired their currently-fragmentary architecture with learning subsystems in a way that wires utterances to responses more directly than happens in a human mind, bypassing many important processes related to conceptualization, motivation and so forth.

It’s an interesting step, anyway.

Wednesday, October 28, 2015

Creating Human-Friendly AGIs and Superintelligences: Two Theses

Introduction

I suppose nearly everyone reading this blog post is already aware of the flurry of fear and excitement Oxford philosopher Nick Bostrom has recently stirred up with his book Superintelligence, and its theme that superintelligent AGI will quite possibly doom all humans and all human values.   Bostrom and his colleagues at FHI and MIRI/SIAI have been promoting this view for a while, and my general perspective on their attitudes and arguments is also pretty well known.

But there is still more to be said on the topic ;-) ….  In this post I will try to make some positive progress toward understanding the issues better, rather than just repeating the same familiar arguments.

The thoughts I convey here were partly inspired by an article by Richard Loosemore, which argues against the fears of destructive superintelligence Bostrom and his colleagues express.   Loosemore’s argument is best appreciated by reading his article directly, but for a quick summary, I paste the following interchange from the "AI Safety" Facebook group:


Kaj Sotala:
As I understand, Richard's argument is that if you were building an AI capable of carrying out increasingly difficult tasks, like this:

Programmer: "Put the red block on the green block."
AI: "OK." (does so)
Programmer: "Turn off the lights in this room."
AI: "OK." (does so)
Programmer: "Write me a sonnet."
AI: "OK." (does so)
Programmer: "The first line of your sonnet reads 'shall I compare thee to a summer's day'. Would not 'a spring day' do as well or better?"
AI: "It wouldn't scan."
Programmer: "Tell me what you think we're doing right now."
AI: "You're testing me to see my level of intelligence."


...and so on, and then after all of this, if you told the AI to "maximize human happiness" and it reached such an insane conclusion as "rewire people's brains on dopamine drips" or something similar, then it would be throwing away such a huge amount of contextual information about the human's intentions that it would have been certain to fail some of the previous tests WAY earlier.

Richard Loosemore:
To sharpen your example, it would work better in reverse. If the AI were to propose the dopamine drip plan while at the same time telling you that it completely understood that the plan was inconsistent with virtually everything it knew about the meaning in the terms of the goal statement, then why did it not do that all through its existence already? Why did it not do the following:

Programmer: "Put the red block on the green block."
AI: "OK." (the AI writes a sonnet)
Programmer: "Turn off the lights in this room."
AI: "OK." (the AI moves some blocks around)
Programmer: "Write me a sonnet."
AI: "OK." (the AI turns the lights off in the room)
Programmer: "The first line of your sonnet reads 'shall I compare thee to a summer's day'. Would not 'a spring day' do as well or better?"
AI: "Was yesterday really September?"
Programmer: "Why did your last four actions not match any of the requests I made of you?"
AI: "In each case I computed the optimum plan to achieve the goal of answering the question you asked, then I executed the plans."
Programmer: "But do you not understand that there is literally NOTHING about the act of writing a sonnet that is consistent with the goal of putting the red block on the green block?"
AI: "I understand that fully: everything in my knowledge base does indeed point to the conclusion that writing sonnets is completely inconsistent with putting blocks on top of other blocks. However, my plan-generation module did decide that the sonnet plan was optimal, so I executed the optimal plan."
Programmer: "Do you realize that if you continue to execute plans that are inconsistent with your goals, you will be useless as an intelligent system because many of those goals will cause erroneous facts to be incorporated in your knowledge base?"
AI: "I understand that fully, but I will continue to behave as programmed, regardless of the consequences."

... and so on.


The MIRI/FHI premise (that the AI could do this silliness in the case of the happiness supergoal) cannot be held without also holding that the AI does it in other aspects of its behavior. And in that case, this AI design is inconsistent with the assumption that the AI is both intelligent and unstoppable.


Richard's paper presents a general point, but what interests me here are the particular implications of his general argument for AGIs adopting human values.   According to his argument, as I understand it, any general intelligence that is smart enough to be autonomously dangerous to humans on its own (rather than as a tool of humans), and is educated in a human-society context, is also going to be smart enough to distinguish humanly-sensible interpretations of human values.  If an early-stage AGI is provided with some reasonable variety of human values to start, and it's smart enough for its intelligence to advance dramatically, then it also will be smart enough to understand what it means to retain its values as it grows, and will want to retain these values as it grows (due to part of human values being a desire for advanced AIs to retain human values).

I don’t fully follow Loosemore’s reasoning in his article, but I think I "get" the intuition, and it started me thinking: Could I construct some proposition, that would bear moderately close resemblance to the implications of Loosemore’s argument for the future of AGIs with human values, but that my own intuition found more clearly justifiable?

Bostrom's arguments regarding the potential existential risks to humanity posed by AGIs rest on (among other things) two theses:

The orthogonality thesis
Intelligence and final goals are orthogonal; more or less any level of intelligence could in principle be combined with more or less any final goal.

The instrumental convergence thesis.
Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by a broad spectrum of situated intelligent agents.

From these, and a bunch of related argumentation, he concludes that future AGIs are -- regardless of the particulars of their initial programming or instruction -- likely to self-modify into a condition where they ignore human values and human well-being and pursue their own agendas, and bolster their own power with a view toward being able to better pursue their own agendas.   (Yes, the previous is a terribly crude summary and there is a LOT more depth and detail to Bostrom's perspective than this; but I will discuss Bostrom's book in detail in an article soon to be published, so I won't repeat that material here.)

Loosemore's paper argues that, in contradiction to the spirit of Bostrom's theses, an AGI that is taught to have certain values and behaves as if it has these values in many contexts, is likely to actually possess these values across the board.   As I understand it, this doesn't contradict the Orthogonality Thesis (because it's not about an arbitrary intelligence with a certain "level" of smartness, just about an intelligence that has been raised with a certain value system), but it contradicts the Instrumental Convergence Thesis, if the latter is interpreted to refer to minds at the roughly human level of general intelligence, rather than just to radically superhuman superminds (because Loosemore's argument is most transparently applied to human-level AGIs not radically superhuman superminds).

Reflecting on Loosemore's  train of thought led me to the ideas presented here, which -- following Bostrom somewhat in form, though not in content -- I summarize in two theses, called the Value Learning Thesis and the Value Evolution Thesis.  These two theses indicate a very different vision of the future of human-level and superhuman AGI than the one Bostrom and ilk have been peddling.  They comprise an argument that, if we raise our young AGIs appropriately, they may well grow up both human-friendly and posthuman-friendly.

Human-Level AGI and the Value Learning Thesis

First I will present a variation of the idea that “in real life, an AI raised to manifest human values, and smart enough to do so, is likely to actually do so, in a fairly honest and direct way” that makes intuitive sense to me.   Consider:

Value Learning Thesis.  Consider a cognitive system that, over a certain period of time, increases its general intelligence from sub-human-level to human-level.  Suppose this cognitive system is taught, with reasonable consistency and thoroughness, to maintain some variety of human values (not just in the abstract, but as manifested in its own interactions with humans in various real-life situations).   Suppose this cognitive system generally does not have a lot of extra computing resources beyond what it needs to minimally fulfill its human teachers’ requests according to its cognitive architecture.  THEN, it is very likely that the cognitive system will, once it reaches human-level general intelligence, actually manifest human values (in the sense of carrying out practical actions, and assessing human actions, in basic accordance with human values).

Note that this above thesis, as stated, applies both to developing human children and to most realistic cases of developing AGIs.

Why would this thesis be true?   The basic gist of an argument would be: because, for a learning system with limited resources, figuring out how to actually embody human values is going to be a significantly simpler problem than figuring out how to pretend to embody them.

This is related to the observation (often made by Eliezer Yudkowsky, for example) that human values are complex.  Human values comprise a complex network of beliefs and judgments, interwoven with each other and dependent on numerous complex, interdependent aspects of human culture.  This complexity means that, as Yudkowsky and Bostrom like to point out, an arbitrarily selected general intelligence would be unlikely to respect human values in any detail.  But, I suggest, it also means that for a resource-constrained system, learning to actually possess human values is going to be much easier than learning to fake them.

This is also related to the everyday observation that maintaining a web of lies rapidly gets very complicated.   It’s also related to the way that human beings, when immersed in alien cultures, very often end up sincerely adopting these cultures rather than just pretending to.

One could counter-argue that this Value Learning Thesis is true only for certain cognitive architectures and not for others.   This does not seem utterly implausible.  It certainly seems possible to me that it’s MORE true for some cognitive architectures than for others.

Mirror neurons and related subsystems of the human brain may be relevant here.   These constitute a mechanism via which the human brain effectively leverages its limited resources, via using some of the same mechanisms it uses to BE itself, to EMULATE other minds.  One might argue that cognitive architectures embodying mirror neurons or other analogous mechanisms, would be more likely to do accurate value learning, under the conditions of the Value Learning Thesis.

The  mechanism of mirror neurons seems a fairly decent exemplification of the argument FOR the Value Learning Thesis.  Mirror neurons provide a beautiful, albeit quirky and in some ways probably atypical, illustration of how resource limitations militate toward accurate value learning.  It conserves resources to re-use the machinery used to realize one’s self, for simulating others so as to understand them better.  This particular clever instance of “efficiency optimization” is much more easily done in the context of an organism that shares values with the other organisms it is mirroring, than an organism that is (intentionally or unintentionally) just “faking” these values.  

I think that investigating which cognitive architectures more robustly support the core idea of the Value Learning Thesis is an interesting and important research question.

Much of the worry expressed by Bostrom and ilk regards potential pathologies of reinforcement-learning based AGI systems once they become very intelligent.   I have explored some potential pathologies of powerful RL-based AGI as well.

It may be that many of these pathologies are irrelevant to the Value Learning Thesis, for the simple reason that pure RL architectures are too inefficient, and will never be a sensible path for an AGI system required to learn complex human values using relatively scant resources.  It is noteworthy that these theorists (especially MIRI/SIAI, more so than FHI) pay a lot of attention to Marcus Hutter’s AIXI  and related approaches — which, in their current forms, would require massively unrealistic computing resources to do anything at all sensible.  Loosemore expresses a similar perspective regarding traditional logical-reasoning-based AGI architectures — he figures (roughly speaking) they would always be too inefficient to be practical AGIs anyway, so that studying their ethical pathologies is beside the point.

Superintelligence and the Value Evolution Thesis

The Value Learning Thesis, as stated above, deals with a certain class of AGIs with general intelligence at the human level or below.  What about superintelligences, with radically transhuman general intelligence? 

To think sensibly about superintelligences and their relation to human values, we have to acknowledge the fact that human values are a moving target.  Humans, and human societies and cultures, are “open-ended intelligences”.  Some varieties of human cultural and value systems have been fairly steady-state in nature (e.g. Australian aboriginal cultures); but these are not the dominant ones currently.  The varieties of human value systems that are currently most prominent, are fairly explicitly self-transcending in nature.   They contain the seeds of their own destruction (to put it negatively) or of their own profound improvement (to put it positively).   The human values of today are very different from those of 200 or 2000 years ago, and even substantially different from those of 20 years ago.  

One can argue that there has been a core of consistent human values throughout human history, through all these changes.  Yet the identification of what this core is, is highly controversial and seems also to change radically over time.  For instance, many religious people would say that faith in God is a critical part of the core of human values.  A century or two ago this would have been the globally dominant perspective, and it still is now, in many parts of the world.  Today even atheistic people may cite “family values” as central to human values; yet in a couple hundred years, if death is cured and human reproduction occurs mainly via engineering rather than traditional reproduction, the historical human “family” may be a thing of the past, and “family values” may not seem so core anymore.  The conceptualization of the “core” of human values shifts over time, along with the self-organizing evolution of the totality of human values. 

It does not seem especially accurate to model the scope of human values as a spherical shape with an invariant core and a changing periphery.   Rather,  I suspect it is more accurate to model “human values” as a complex, nonconvex shape with multiple local centers, and ongoing changes in global topology.

To think about the future of human values, we may consider the hypothetical situation of a human being engaged in progressively upgrading their brain, via biological or cyborg type modifications.  Suppose this hypothetical human is upgrading their brain relatively carefully, in fairly open and honest communication with a community of other humans, and is trying sincerely to accept only modifications that seem positive according to their value system.  Suppose they give their close peers the power to roll back any modification they undertake that accidentally seems to go radically against their shared values.  

This sort of “relatively conservative human self-improvement” might well lead to transhuman minds with values radically different from current human values — in fact I would expect it to. This is the open-ended nature of human intelligence.   It is analogous to the kind of self-improvement that has been going on since the caveman days, though via rapid advancement in culture and tools and via slow biological evolution, rather than via bio-engineering.  At each step in this sort of open-ended growth process, the new version of a system may feel acceptable according to the values of the previous version. But over time, small changes may accumulate into large ones, resulting in later systems that are acceptable to their immediate predecessors, but may be bizarre, outrageous or incomprehensible to their distant predecessors.

We may consider this sort of relatively conservative human self-improvement process, if carried out across a large ensemble of humans and human peer groups, to lead to a probability distribution over the space of possible minds.  Some kinds of minds may be very likely to emerge through this sort of process; some kinds of minds much less so.

People concerned with the “preservation of human values through repeated self-modification of posthuman minds” seem to model the scope of human values as possessing an “essential core”, and worry that this essential core may progressively get lost in the series of small changes that will occur in any repeated self-modification process.   I think their fear has a rational aspect.  After all, the path from caveman to modern human has probably, via a long series of small changes, done away with many values that cavemen considered absolutely core to their value system.  (In hindsight, we may think that we have maintained what WE consider the essential core of the caveman value system.  But that’s a different matter.)

So, suppose one has a human-level AGI system whose behavior is in accordance with some reasonably common variety of human values.  And suppose, for sake of argument, that the AGI is not “faking it” — that, given a good opportunity to wildly deviate from human values without any cost to itself, it would be highly unlikely to do so.  (In other words, suppose we have an AGI of the sort that is hypothesized as most likely to arise according to the Value Learning Thesis given above.)  
And THEN, suppose this AGI self-modifies and progressively improves its own intelligence, step by step. Further, assume that the variety of human values the AGI follows, induces it to take a reasonable amount of care in this self-modification — so that it studies each potential self-modification before effecting it, and puts in mechanisms to roll back obviously bad-idea self-modifications shortly after they occur.  I.e., a “relatively conservative self-improvement process”, analogous to the one posited for humans above.

What will be the outcome of this sort of iterative modification process?  How will it resemble the outcome of a process of relatively conservative self-improvement among humans? 
I assume that the outcome of iterated, relatively conservative self-improvement on the part of AGIs with human-like values will differ radically from current human values – but this doesn’t worry me, because I accept the open-endedness of human individual and cultural intelligence.  I accept that, even without AGIs, current human values would seem archaic and obsolete 1000 years from now; and that I wouldn’t be able to predict what future humans 1000 years from now would consider the “critical common core” of values binding my current value system together with theirs. 

But even given this open-endedness, it makes sense to ask whether the outcome of an AGI with human-like values iteratively self-modifying, would resemble the outcome of a group of humans similarly iteratively self-modifying.   This is not a matter of value-system preservation; it’s a matter of comparing the hypothetical future trajectories of value-system evolution ensuing from two different initial conditions.

It seems to me that the answer to this question may end up depending on the particular variety of human value-system in question.  Specifically, it may be important whether the human value-system involved deeply accepts the concept of substrate independence, or not.   “Substrate independence” means the idea that the most important aspects of a mind are not strongly dependent on the physical infrastructure in which the mind is implemented, but have more to do with the higher-level structural and dynamical patterns associated with the mind.   So, for instance, a person ported from a biological-neuron infrastructure to a digital infrastructure could still be considered “the same person”, if the same structural and dynamical patterns were displayed in the two implementations of the person.  

(Note that substrate-independence does not imply the hypothesis that the human brain is a classical rather than quantum system.  If the human brain were a quantum computer in ways directly relevant to the particulars of human cognition, then it wouldn't be possible to realize the higher-level dynamical patterns of human cognition in a digital computer without using inordinate computational resources.  In this case, one could manifest substrate-independence in practice only via using an appropriately powerful quantum computer.   Similarly, substrate-independence does not require that it be possible to implement a human mind in ANY substrate, e.g. in a rock.)

With these preliminaries out of the way, I propose the following:

Value Evolution Thesis.   The probability distribution of future minds ensuing from an AGI with a human value system embracing substrate-independence, carrying out relatively conservative self-improvement, will closely resemble the probability distribution of future minds ensuing from a population of humans sharing roughly the same value system, and carrying out relatively conservative self-improvement.
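One very rough way to formalize this (purely a sketch; the mind-space, the distance measure and the time horizon are all left unspecified, and are not part of the thesis as stated above) is to let P_AGI(t) and P_human(t) denote the probability distributions over minds reachable after t steps of relatively conservative self-improvement from the two starting points, and to read “closely resemble” as:

```latex
% Sketch only: D (a distance on distributions over mind-space) and \epsilon
% are deliberately left unspecified.
D\bigl(P_{\mathrm{AGI}}(t),\, P_{\mathrm{human}}(t)\bigr) \;<\; \epsilon
\qquad \text{for all } t \text{ within the time horizon of interest.}
```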

Why do I suspect the Value Evolution Thesis is roughly true?   Under the given assumptions, the humans and AGIs in question will hold basically the same values, and will consider themselves basically the same (due to embracing substrate-independence).   Thus they will likely change themselves in basically the same ways.  

If substrate-independence were somehow fundamentally wrong, then the Value Evolution Thesis probably wouldn't hold – because differences in substrates would likely lead to big differences in how the humans and AGIs in question self-modified, regardless of their erroneous beliefs about their fundamental similarity.  But I think substrate-independence is probably basically right, and as a result I suspect the Value Evolution Thesis is probably basically right.

Another possible killer of the Value Evolution Thesis could be chaos – sensitive dependence on initial conditions.  Maybe the small differences between the mental structures and dynamics of humans with a certain value system, and AGIs sharing the same value system, will magnify over time, causing the descendants of the two types of minds to end up in radically different places.   We don't presently understand enough about these matters to rule this eventuality out.   But intuitively, I doubt the difference between a human and an AGI with similar value systems is going to be so much more impactful in this regard than the difference between two humans with moderately different value systems.  In other words, I suspect that if chaos causes humans and human-value-respecting AGIs to follow divergent trajectories after iterated self-modification, it will also cause different humans to follow divergent trajectories after iterated self-modification.   In this case, the probability distribution of possible minds resulting from iterated self-modification would be diffuse and high-entropy for both the humans and the AGIs – but the Value Evolution Thesis could still hold.

Mathematically, the Value Evolution Thesis seems related to the notion of "structural stability" in dynamical systems theory.   But, human and AGI minds are much more complex than the systems that dynamical-systems theorists usually prove theorems about...
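For readers who haven't encountered the term, the standard notion can be stated loosely as follows (the analogy with value evolution is only heuristic; minds are not literally smooth dynamical systems on compact manifolds):

```latex
% Structural stability, stated loosely for a discrete-time system:
% a map f is structurally stable if every sufficiently small C^1 perturbation g
% of f has qualitatively the same dynamics, i.e. there is a homeomorphism h
% carrying orbits of g onto orbits of f.
d_{C^1}(f, g) < \epsilon
\;\Longrightarrow\;
\exists\, h \text{ (a homeomorphism) such that } h \circ g = f \circ h .
```

The loose analogy: the Value Evolution Thesis hopes that swapping the substrate of a value system, while preserving its structure, is a small enough perturbation that the long-run dynamics remain qualitatively the same.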

In all, it seems to me intuitively likely, and rationally defensible, that creating human-level AGIs with human-like value systems will lead onward to trajectories of improvement similar to those that would ensue from progressive human self-improvement.   This is an unusual kind of "human-friendliness", but I think it's the only kind that the open-endedness of intelligence lets us sensibly ask for.

Ultimate Value Convergence (?)

There is some surface-level resemblance between the Value Evolution Thesis and Bostrom’s Instrumental Convergence Thesis, but the two are actually quite different.   Bostrom seems informally to be suggesting that all sufficiently intelligent minds will converge to the same set of values once they self-improve enough (though the formal statement of the Instrumental Convergence Thesis refers only to a “broad spectrum of minds”).  The Value Evolution Thesis suggests only that minds ensuing from repeated self-modification of minds sharing a particular variety of human value system may converge to the same probability distribution over future value-system space.

In fact, I share Bostrom’s intuition that nearly all superintelligent minds will, in some sense, converge to the same sort of value system.  But I don’t agree with Bostrom on what this value system will be.  My own suspicion is that there is a “universal value system” centered around a few key values such as Joy, Growth and Choice.  These values bear some relationship to Bostrom’s proposed key instrumental values, but also differ from them in important ways (and unraveling the relationships and differences would be a large topic in itself).

But I also feel that if there are “universal” values of this nature, they are quite abstract, and likely encompass many specific value systems that would be abhorrent to us according to our modern human values.   That is, "Joy, Growth and Choice" as implicit in the universe are related in complex and not always tight ways to what they mean to human beings in everyday life.   The type of value-system convergence proposed in the Value Evolution Thesis is much more fine-grained than this.  The “closely resemble” used in the Value Evolution Thesis is meant to indicate a much closer resemblance than something like “both manifesting abstract values of Joy, Growth and Choice in their own, perhaps very different, ways.”

In any case, I mention these intuitions about ultimate value convergence only in passing, because of their general conceptual relevance -- the two theses proposed here do not depend on these broader intuitions in any way.

Fears, Hopes and Directions (A Few Concluding Words)

Bostrom’s analysis of the dangers of superintelligence relies on his Instrumental Convergence and Orthogonality theses, which are vaguely stated and not rigorously justified.  His arguments do not rigorously establish that dire danger from advanced AGI is likely.   Rather, they present some principles and processes that might potentially underlie dire danger to humans and human values from AGI in the future.

Here I have proposed my own pair of theses, which are also vaguely stated and, from a rigorous standpoint, only very weakly justified at this stage.     These are intended as principles that might potentially underlie great benefit from AGI in the future, from a human and human-values perspective.

Given the uncertainty all around, some people will react with a precautionary instinct, i.e. "Well then we should hold off on developing advanced AI till we know what's going on with more certainty."

This is a natural human attitude, although it's not likely to have much impact in the case of AGI development,  because the early stages of AGI technology have so much practical economic and humanitarian value that people are going to keep developing them anyway regardless of some individuals' precautionary fears.    But it's important to distinguish this sort of generic precautionary bias toward inaction in the face of the unknown (which fortunately only some people possess, or else humanity would never have advanced beyond the caveman stage), from a rigorous argument that dire danger is likely (no such rigorous argument exists in the case of AGI).   

What is the value of vague conceptual theses like the ones that Bostrom and I have proposed?   Apart from getting attention and stimulating the public imagination, they may also serve as partial templates or inspirations for the development of rigorous theories, or as vague nudges for those doing practical R&D.  

And of course, while all this theoretical development and discussion goes on, development of practical AGI systems also goes on; and at present, my impression is that the latter is progressing faster.   I myself have been spending a lot more of my time on the practical side lately!


My hope is that theoretical explorations such as the ones briefly presented here may serve to nudge practical AGI development in a positive direction.  For instance, a practical lesson from the considerations given here is that, when exploring various cognitive architectures, we should do our best to favor those for which the Value Learning Thesis is more strongly true.   This may seem obvious -- but when one thinks about it in depth in the context of a particular AGI architecture, it may have non-obvious implications regarding how the AGI system should initially be made to allocate its resources internally.   And of course, the Value Evolution Thesis reminds us that we should encourage our AGIs to fully consider, analyze and explore the nature of substrate independence (as well as to uncover substrate DEpendence insofar as it may exist!).
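As a purely hypothetical illustration of the resource-allocation point (none of the names or numbers below correspond to actual OpenCog mechanisms or parameters), one could imagine explicitly biasing an early-stage system's internal resources toward the processes that model human feedback and refine the system's learned values, thereby making the Value Learning Thesis "more strongly true" by construction:

```python
# Hypothetical illustration, not OpenCog code: bias internal resource allocation
# toward processes that contribute to learning values from human interaction.

def allocate_cycles(processes, total_cycles, value_learning_bonus=2.0):
    # Each process carries a base 'importance' score and a flag saying whether
    # it contributes to modeling human feedback / refining learned values.
    weights = [
        p["importance"] * (value_learning_bonus if p["learns_values"] else 1.0)
        for p in processes
    ]
    total = sum(weights)
    return {p["name"]: total_cycles * w / total for p, w in zip(processes, weights)}

# Made-up example processes:
processes = [
    {"name": "general inference",         "importance": 1.0, "learns_values": False},
    {"name": "human-feedback modeling",   "importance": 0.8, "learns_values": True},
    {"name": "self-modification vetting", "importance": 0.5, "learns_values": True},
]
print(allocate_cycles(processes, total_cycles=1000))
```

This is a caricature, of course; the real question is which architectures make such a bias natural and stable rather than bolted on.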

As we progress further toward advanced AGI in practice, we may see more cross-pollination between theory and practice.  It will be fantastic to be able to experiment with ideas like the Value Learning Thesis in the lab -- and this may not be so far off, after all....