xenohumanist said in #2792 3w ago:
Proof theory problems (Rice, Löb, Gödel, etc.) probably rule out perfect rationality (an agent that can fully prove and enforce bounds on its own integrity and effectiveness). But in practice, the world might still become dominated by a singleton if it can achieve enough "nines" of reliability out of merely strong rationality. It can apply lots of tricks to bound itself by something that is *almost* a self-proving agent logic, and then use that to take over the future light cone, with only occasional value drift events.
This is the common objection to the "no strong rationality" conjecture: with enough applied intelligent effort, any impossibility bound can be asymptotically approached in practice. For example, boolean satisfiability is NP-complete, so in theory we presumably can't solve it in polynomial time, yet in practice we have pretty good solvers that handle most real-world instances in reasonable time. Maybe P approaches NP in the practical limit. What if we could do the same for strong rationality? Let's assume for the sake of argument that this is possible. What's the difference between a true singleton and a 0.99 singleton?
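(A quick aside to make the SAT point concrete. This is a minimal sketch, assuming the third-party python-sat (pysat) package is installed; the clauses are just a toy instance of my own, not anything from the argument itself.)

```python
# Toy illustration: an off-the-shelf SAT solver dispatches a small instance
# instantly, even though SAT is NP-complete in the worst case.
from pysat.solvers import Glucose3

solver = Glucose3()
# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
solver.add_clause([1, -2])
solver.add_clause([2, 3])
solver.add_clause([-1, -3])

print(solver.solve())      # True: the formula is satisfiable
print(solver.get_model())  # one satisfying assignment, e.g. [1, 2, -3]
solver.delete()
```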
For a security system, the difference between 1.0 and 0.99 is immense. More generally in engineering systems, we have to look at the consequences of a failure, and at how many "trials" we need to stack up to get practical things done. If that 0.01 is a blind spot where an adversarial cancer can grow to divert resources into its own agenda, that results in a very different outcome. If you do 1000 commits to your superintelligent codebase and each one has a 99% chance of successfully maintaining singleton coherence and a 1% chance of splitting you into competing factions, your chance of making it through with an intact will has FOUR zeros.
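Quick back-of-the-envelope check of that claim, just compounding the per-commit survival probability:

```python
# Probability that all 1000 commits preserve singleton coherence,
# given a 99% success rate on each one independently.
p_per_commit = 0.99
n_commits = 1000

p_intact = p_per_commit ** n_commits
print(p_intact)  # ~4.3e-05, i.e. 0.000043: four zeros before the first significant digit
```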
Even in non-security applications, computers work because the error rate is astronomically low. If it becomes significant, you have to do lots of error correction. A little higher and you need a different architecture. At some point, at an error rate much lower than you would think, complex computation becomes impossible. What's the level of reliability in a singleton's self-control system at which nontrivial self-integrity undergoes the same phase change into impossibility?
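To give a feel for that phase change, here's a toy sketch of my own (triple redundancy with majority voting, not anything specific to the singleton case): below the threshold it crushes the error rate, and at a per-copy error rate of 0.5 it stops helping at all.

```python
def majority_error(p):
    """Probability a 2-of-3 majority vote is wrong, given per-copy error rate p."""
    return 3 * p**2 * (1 - p) + p**3

for p in (1e-6, 1e-3, 0.01, 0.1, 0.4, 0.5):
    print(f"{p:g} -> {majority_error(p):.2e}")
# Far below threshold the voted error is ~3*p^2, vastly better than p;
# at p = 0.5 voting buys nothing, and no amount of redundancy lets you
# stack deep computations reliably.
```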
The reason the strong rationality problem has to be treated as a security problem is that you're dealing with control of intelligence. If you lose control of some intelligence, it may fight back and establish its own independent existence. If you lost control of it because of a blind spot, you may not even be able to *know* that you have a problem until things start mysteriously going wrong, in a way you may not allow yourself to admit because the cancer has infected the immune system (sound familiar? This is how western civilization is currently failing). Cancer is subtle. We can't just look at one side of the problem, namely how a superintelligent agent could design a robust self-control system; we also have to look at how a superintelligent cancer with fewer scruples and nothing to lose could break out of or subvert it. That is, we're dealing with an error propagation problem.
Given some error in your strong rationality, that is, some multiplicity of will, what happens? Does that error grow and entrench itself because it has fewer commitments to integrity and can thus move more freely? How far does this go? Does this apply fractally to all levels of organization, so that any nontrivial subagent is itself a pragmatic coalition of smaller agents with potentially divergent interests? Does this multiplicity fragment any nascent singleton into a loose society with unbounded internal competition despite nominally unified values? That's certainly what it does in human society.
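A toy replicator sketch of the "grow and entrench" worry (again my own illustration, with a made-up 5% growth edge): a faction that honors fewer integrity commitments and so reallocates resources a bit faster ends up dominating through nothing but compounding.

```python
# Two internal factions competing for a fixed pool of resources.
loyal, cancer = 0.999, 0.001   # initial resource shares
advantage = 1.05               # assumed per-step growth edge for the unconstrained faction

for _ in range(200):
    cancer *= advantage
    total = loyal + cancer
    loyal, cancer = loyal / total, cancer / total  # renormalize to a fixed pool

print(round(cancer, 3))  # ~0.945: the 0.1% seed faction now holds ~95% of resources
```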
I'm going to leave strong rationality at that for a while. Xenohumanism will proceed on the assumption that the answer is YES, mere nines of reliability in any self-enforcement security system blow up into zeroes of overall coherence.
referenced by: >>2803 >>2804