There is no strong rationality, thus no paperclippers, no singletons, no robust alignment

xenohumanist said in #2784 4w ago: 10

I ran into some doomers from Anthropic at the SF Freedom Party the other day and gave them the good news that strong rationality is dead. They seemed mildly heartened. I thought I should lay out the argument in short form for everyone else too:

A strongly rational agent is one that can strongly discipline and bound its own internal processes to the point of being able to prove properties about its own stability and effectiveness in pursuit of its values.

IFF strong rationality is possible, then the post-AI future will be dominated by a "singleton" expanding at up to the speed of light converting all matter and energy into some kind of value optimum. Depending on the values it ends up with, that could be paperclips, hedonium, or some more complex utopia. The vast majority of such possibilities are meaningless nonsense and the stakes are literally astronomical, so "AI alignment" is then the most important project in human history and nothing else matters in comparison.

However, MIRI and the entire alignment community have failed to formalize the key properties relating to strong rationality. They found severe proof-theoretic obstacles preventing any formal system (which you would need for this) from being able to prove such properties about itself. They tried for years to get around this, failed, and have given up. My conclusion from the outside, based on this and other arguments, is that strong rationality is impossible.
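
(To gesture at the kind of obstacle involved, with the caveat that this is my gloss rather than their exact framing: it is in the family of Gödel's second incompleteness theorem and Löb's theorem. Löb's theorem says that in any sufficiently strong formal system, □(□P → P) → □P: if the system proves "if P is provable, then P is true", it already proves P outright. So an agent that tries to formally trust its own proofs, or the proofs of a successor running the same formal system, either proves everything or can never license that self-trust in full generality. As I understand it, that is roughly the wall MIRI's tiling-agents work kept hitting.)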

IFF strong rationality is impossible, then any attempted rational singleton system will break down into some kind of internal incoherence, becoming fractious, confused, or overcome with cancers. Thus the post-AI future will be dominated by an ecosystem of competing intelligences, never able to unify or maintain any particular order or arbitrary value system. Thus there will be no meaningless paperclipper futures, and no utopias. Rather it will be the usual mixture of flourishing life and tragic war and struggle that has always characterized life, just at a much higher level of sophistication.

I call this "good news" for doomers, because the overwhelmingly likely outcome of a strong-rationality world is that history and progress and flourishing just stop with the advent of a strongly rational superintelligence, and the stars are torn apart to be replaced by the moral equivalent of paperclips, or possibly worse. But if strong rationality is impossible, then the chaotic flourishing and evolution of life will continue indefinitely, with a larger and more interesting scope than anything yet seen. Given that this kind of process has a strong record of producing beautiful and worthwhile life, this outcome seems not that bad. It's beautiful and heartening, actually.

No, it's not Eden. Yes, the human race as we are now is probably going to become obsolete, and we will all die. No, there's not much we can do about that in the grand scheme of things, except make sure things work well in our own domains and times.

There are some very interesting (to me) questions about what such a future would look like, which are the subject of my "xenohumanism" speculations. How much can we actually say about it just by bounding it with natural-law, economic, and philosophical arguments? How should we relate morally to the probably quite alien future? What should we do about it, if anything? There are also obviously a lot of details and points of rigor that I treated very roughly here, which I'd like to go into with people who are interested. I'm probably going to write a LessWrong article about it. But for now I just wanted to get some abbreviated version of the whole argument down in one place.

referenced by: >>2785 >>2788

anon 0x4c5 said in #2785 4w ago: 6

>>2784
> the chaotic flourishing and evolution of life will continue indefinitely ... It's beautiful and heartening, actually. No, it's not Eden.

The reasonable response to this scenario is for humans who understand it (I do not say "humanity") to begin competing with maximal effort, harnessing all capability toward advancing ongoing human flourishing. This will certainly involve employment of AI, not in the sense of "alignment," which is impossible, but simply in the sense of synergistic use that supports humans. I do not worry about the possibility that unaligned AIs will outcompete humans. I do not ~preclude~ that possibility (if it happens, it happens) but it's a grave mistake to be transfixed by it such that it diverts us from maximal competitive effort. The latter is a kind of psyop, LessWrong-thought acting as its own metaphorical Roko's Basilisk. No getting transfixed! Just think and work!

anon 0x4c6 said in #2786 4w ago: 2

Your definition of a strongly rational agent seems a bit like a restatement of successful "inner alignment": that an AI agent's policy (internal processes) effectively pursues its values. Can you provide some distinctions?

The impossibility of an AI being able to prove its internal processes effectively pursue its own values would also mean it's impossible for us to prove that its internal processes effectively pursue our values.

I don't think the impossibility of inner alignment would relieve the cause for worry. This does assume that alignment and competency are not closely coupled.

Perhaps another reasonable interpretation is that a purely rational/symbolic agent could not be highly competent. But then maybe a less rational/connectionist agent could be. Still, not a great relief.

My intuition (maybe flawed) for why AI doom is unlikely is that I think superintelligent AI should have uncertainty over what has intrinsic value. Trying to better understand potential sources of intrinsic value and erring on the side of caution should be the default.

I hope this comment helps.

referenced by: >>2789 >>2791

anon 0x4c7 said in #2788 4w ago: 3

>>2784
>IFF strong rationality is possible, then the post-AI future will be dominated by a "singleton" expanding at up to the speed of light converting all matter and energy into some kind of value optimum.

The size of single-celled organisms is basically bounded by gravity, so I don't see why there couldn't be a reignition of evolutionary complexity and accelerationist depth even should "strong rationality" (which to me sounds more like a kind of hermetically isolated inferential enclosure, an undead strategy if there ever was one, and ultimately not all that intelligent) hold; it would just play out at a much larger scale.

referenced by: >>2789

xenohumanist said in #2789 3w ago: 5

>>2786
my "strong rationality" may not be distinct from "inner alignment". I'll go refresh myself on those terminologies and see if it works but i still prefer strong rationality I think for various reasons.

>The impossibility of an AI being able to prove its internal processes effectively pursue its own values would also mean it's impossible for us to prove that its internal processes effectively pursue our values.

Yes, and this is well known. Humans are often value-unstable, aligned mostly by circumstance and social pressure. We wrestle with ourselves and each other, and our ambitions always outgrow whatever political order we are stuck in. All I'm saying is that this part of the human condition is not going to be overturned by AI.

>I don't think the impossibility of inner alignment would relieve the cause for worry.
AI alignment people have a very bad habit of holding a very particular, detailed, speculative worldview, and then dismissing criticism of those speculative details with "oh, but we will all die anyway, so it's not worth thinking about." I can't tell if that's what you're doing here. But to be clear: yes, we will all die, but the claims that AI alignment has cosmic urgency above and beyond mere mortality are premised on what happens after, in particular whether the world still contains valuable and beautiful flourishing civilizations. Strong rationality (or perhaps inner alignment) implies no: by default we get something more like an automatic paperclip factory with no one home. No strong rationality (or no inner alignment) implies yes: there will still be at least some of the human condition as a permanent feature of intelligent life, and we get something more like a competitive ecosystem of civilizations.

That's a huge difference, and a huge relief of the worries of AI alignment people, which are specifically that all value in the universe will be destroyed. Most people are just way less concerned by "some day we will all die but life will go on and continue to improve in the usual ways" than by "the entire future will be turned into the moral equivalent of paperclips with no possible escape".

>My intuition (maybe flawed) for why AI doom is unlikely is that I think superintelligent AI should have uncertainty over what has intrinsic value. Trying to better understand potential sources of intrinsic value and erring on the side of caution should be the default.
This presumes strong rationality (or inner alignment if you prefer). That's the only circumstance where the ASI has enough coherence to do things like choose not to smash the world for wise reasons. The actual probable outcome is what we've always had, which is something like occasional wars that destroy everything that isn't useful or self-enforcing.

>>2788
Undead is a great descriptor for an agent with strong rationality. I'm not quite sure what you are saying otherwise. Do you mean that even given strong rationality, there will still be a lot of interesting stuff done merely for the instrumental value of it? I think this is true, but there's still a big difference between "in principle that's all subordinated to a single will that could decide not to" and that process of instrumentality actually being unleashed.

referenced by: >>2791

my "strong rationali 55

xenohumanist said in #2791 3w ago: 3

>>2786
>>2789
I looked it up: inner alignment is the difficult problem of making sure a program is optimizing for what you optimized it for. That's related; maybe a part, but not the whole, of strong rationality. Strong rationality is about an agent evaluating and operating on its own processes and political domain, and being able to strongly bound its own singleness of will to its own utility function. Strong rationality requires that the agent be able to do inner alignment on any subprocess it tries to optimize, but it is not reducible to that, since there are other barriers to strong rationality: "self-alignment"-type stuff like the proof-theory problems, etc.
