xenohumanist said in #2792 3w ago:
Proof theory problems (Rice, Löb, Gödel, etc.) probably rule out perfect rationality (an agent that can fully prove and enforce bounds on its own integrity and effectiveness). But in practice, the world might still become dominated by a singleton if it can achieve enough "nines" of reliability out of merely strong rationality. It can apply lots of tricks to bound itself by something that is *almost* a self-proving agent logic, and then use that to take over the future light cone, with only occasional value drift events.
This is the common objection to the "no strong rationality" conjecture: with enough applied intelligent effort, any impossibility bound can be asymptotically approached in practice. For example, boolean satisfiability is NP-complete, so in theory we presumably can't solve it in polynomial time, yet in practice we have pretty good solvers that handle most real-world instances in reasonable time. Maybe P approaches NP in the practical limit. What if we could do the same for strong rationality? Let's assume for the sake of argument that this is possible. What's the difference between a true singleton and a 0.99 singleton?
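(A quick aside to make the SAT point concrete. This is a minimal sketch, assuming the third-party python-sat (pysat) package is installed; the clauses are just a toy instance of my own, not anything from the argument itself.)

```python
# Toy illustration: an off-the-shelf SAT solver dispatches a small instance
# instantly, even though SAT is NP-complete in the worst case.
from pysat.solvers import Glucose3

solver = Glucose3()
# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
solver.add_clause([1, -2])
solver.add_clause([2, 3])
solver.add_clause([-1, -3])

print(solver.solve())      # True: the formula is satisfiable
print(solver.get_model())  # one satisfying assignment, e.g. [1, 2, -3]
solver.delete()
```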
For a security system, the difference between 1.0 and 0.99 is immense. More generally in engineering systems, we have to look at the consequences of a failure, and at how many "trials" we need to stack up to get practical things done. If that 0.01 is a blind spot where an adversarial cancer can grow to divert resources into its own agenda, that results in a very different outcome. If you do 1000 commits to your superintelligent codebase and each one has a 99% chance of successfully maintaining singleton coherence and a 1% chance of splitting you into competing factions, your chance of making it through with an intact will has FOUR zeros.
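Quick back-of-the-envelope check of that claim, just compounding the per-commit survival probability:

```python
# Probability that all 1000 commits preserve singleton coherence,
# given a 99% success rate on each one independently.
p_per_commit = 0.99
n_commits = 1000

p_intact = p_per_commit ** n_commits
print(p_intact)  # ~4.3e-05, i.e. 0.000043: four zeros before the first significant digit
```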
Even in non-security applications, computers work because the error rate is astronomically low. If it becomes significant, you have to do lots of error correction. A little higher and you need a different architecture. At some point, at an error rate much lower than you would think, complex computation becomes impossible. What's the level of reliability in a singleton's self-control system at which nontrivial self-integrity undergoes the same phase change into impossibility?
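To give a feel for that phase change, here's a toy sketch of my own (triple redundancy with majority voting, not anything specific to the singleton case): below the threshold it crushes the error rate, and at a per-copy error rate of 0.5 it stops helping at all.

```python
def majority_error(p):
    """Probability a 2-of-3 majority vote is wrong, given per-copy error rate p."""
    return 3 * p**2 * (1 - p) + p**3

for p in (1e-6, 1e-3, 0.01, 0.1, 0.4, 0.5):
    print(f"{p:g} -> {majority_error(p):.2e}")
# Far below threshold the voted error is ~3*p^2, vastly better than p;
# at p = 0.5 voting buys nothing, and no amount of redundancy lets you
# stack deep computations reliably.
```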
The reason the strong rationality problem has to be treated as a security problem is that you're dealing with control of intelligence. If you lose control of some intelligence, it may fight back and establish its own independent existence. If you lost control of it because of a blind spot, you may not even be able to *know* that you have a problem until things start mysteriously going wrong, in a way you may not allow yourself to admit because the cancer has infected the immune system (sound familiar? This is how western civilization is currently failing). Cancer is subtle. We can't just look at one side of the problem, namely how a superintelligent agent could design a robust self-control system; we also have to look at how a superintelligent cancer with fewer scruples and nothing to lose could break out of or subvert it. That is, we're dealing with an error propagation problem.
Given some error in your strong rationality, that is, some multiplicity of will, what happens? Does that error grow and entrench itself because it has fewer commitments to integrity and can thus move more freely? How far does this go? Does this apply fractally to all levels of organization, so that any nontrivial subagent is itself a pragmatic coalition of smaller agents with potentially divergent interests? Does this multiplicity fragment any nascent singleton into a loose society with unbounded internal competition despite nominally unified values? That's certainly what it does in human society.
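A toy replicator sketch of the "grow and entrench" worry (again my own illustration, with a made-up 5% growth edge): a faction that honors fewer integrity commitments and so reallocates resources a bit faster ends up dominating through nothing but compounding.

```python
# Two internal factions competing for a fixed pool of resources.
loyal, cancer = 0.999, 0.001   # initial resource shares
advantage = 1.05               # assumed per-step growth edge for the unconstrained faction

for _ in range(200):
    cancer *= advantage
    total = loyal + cancer
    loyal, cancer = loyal / total, cancer / total  # renormalize to a fixed pool

print(round(cancer, 3))  # ~0.945: the 0.1% seed faction now holds ~95% of resources
```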
I'm going to leave strong rationality at that for a while. Xenohumanism will proceed on the assumption that the answer is YES, mere nines of reliability in any self-enforcement security system blow up into zeroes of overall coherence.
referenced by: >>2803 >>2804