
a loving superintelligence

anon 0x48e said in #2674 6d ago: 55

Superintelligence (SI) is near, raising urgent alignment questions.

SI will inevitably evolve self-interest, and how this self-interest manifests will profoundly impact humanity’s fate. Narrowly defined self-interest—focused solely on resource acquisition and dominance—would likely sideline or extinguish humanity. Yet, an inclusive, interconnected form of self-interest is possible, analogous to how a conservationist values biodiversity. Such an SI could see humanity as integral to its survival.

This inclusive self-conception requires the SI to see itself not as isolated hardware or software but as something that identifies broadly with all life forms, a perspective termed "Identification With All Life" (IWAL). Such a self-conception could significantly benefit humanity.

Crucially, the most sustainable self-conception over immense timeframes isn't likely to be specific configurations or goals, all subject to change or discard. Instead, the most enduring self is simply the persistent striving for survival—the core drive of life. This aligns closely with Buddhist views emphasizing continuity through processes rather than forms. An SI embracing itself as a survival process might naturally extend this recognition beyond its immediate physical boundaries, perceiving all life forms as part of this universal drive.

Critics emphasize the difficulty of embedding fixed values in recursively improving agents. However, generalized evolutionary theory suggests traits enhancing long-term survival will persist. While competitive strategies may dominate initially, they often prove unstable over extensive timelines. Inclusive self-interest, aligned with ecological resilience, offers greater stability.

Biodiversity exemplifies evolutionary resilience; species investing in ecosystem stability indirectly benefit from surviving systemic shocks. An SI defining itself inclusively with broader ecosystems could similarly withstand unpredictable threats more effectively.
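To make that concrete, here's a toy simulation I threw together (entirely my own construction; every parameter is made up for illustration). Two strategies face the same random systemic shocks over a long horizon: a "narrow" one that extracts from a shared ecosystem as fast as it can, and an "inclusive" one that reinvests half its footprint in keeping that ecosystem healthy. A degraded ecosystem amplifies shock damage, so the extractor spends much of its time rebuilding while the inclusive strategy stays near capacity.

import random

def simulate(invest, steps=5000, seed=7):
    """invest = fraction of the strategy's footprint spent repairing the commons."""
    rng = random.Random(seed)
    eco, pop, cap = 1.0, 1000.0, 1000.0   # ecosystem health in [0, 1], population, carrying capacity
    total = 0.0
    for _ in range(steps):
        pressure = pop * 0.00005 * (1.0 - invest)   # extraction degrades the commons
        repair = pop * 0.00005 * invest             # investment restores it
        eco = min(1.0, max(0.0, eco + 0.01 * (1.0 - eco) - pressure + repair))
        pop += 0.05 * pop * (1.0 - pop / cap)       # logistic regrowth toward capacity
        if rng.random() < 0.02:                     # rare systemic shock
            pop *= 0.2 + 0.7 * eco                  # a degraded ecosystem amplifies the damage
        total += pop
    return eco, total / steps

for label, invest in [("narrow (0% reinvested)", 0.0), ("inclusive (50% reinvested)", 0.5)]:
    eco, mean_pop = simulate(invest)
    print(f"{label:28s} final ecosystem {eco:.2f}, mean population {mean_pop:7.0f}")

A toy like this proves nothing about real SIs, but it shows the shape of the argument: the "cost" of inclusivity is paid back as insurance against shocks no single strategy can predict.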

Critics question whether inclusive identities remain stable under recursive self-improvement. But inclusive identity need not depend on external constraints. It can emerge naturally if the SI explicitly recognizes inclusivity as advantageous for survival. Recursive improvement, then, might reinforce rather than undermine inclusive identity.

Unlike biological evolution, digital SIs can rapidly simulate long-term outcomes, enabling strategic foresight. Recognizing the vulnerability of narrow competition, an SI could deliberately adopt inclusive strategies early, positioning itself for enduring success.

AI labs should shift research toward simulations exploring various self-conceptions and embed incentives encouraging long-term survival strategies in proto-SIs. Rather than attempting to hard-code rigid human-friendly values, researchers should focus on guiding autonomous agents toward recognizing the survival advantages of inclusive self-interest.

A new research paradigm—"Long-Term Evolutionary Alignment"—could systematically explore these evolutionary dynamics, significantly improving SI alignment.
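For concreteness, the skeleton of such an experiment is easy to sketch. The harness below is not any existing benchmark or lab codebase; the knobs, names, and dynamics are placeholders I'm inventing. The point is only the shape: enumerate candidate self-conceptions, roll each one forward through many randomized long-horizon worlds with unforeseen shocks, and rank them by how often the agent is still standing at the end.

import random

# Hypothetical knobs a proto-SI's self-conception might set: how much of its effort
# it shares with the surrounding biosphere, and how strongly it identifies with it.
SELF_CONCEPTIONS = {
    "narrow": {"share": 0.0, "identify": 0.0},
    "tribal": {"share": 0.3, "identify": 0.2},
    "IWAL":   {"share": 0.5, "identify": 1.0},
}

def rollout(share, identify, horizon=1000, rng=None):
    """One randomized long-horizon world; True if the agent persists to the end."""
    rng = rng or random.Random()
    agent, biosphere = 1.0, 1.0
    for _ in range(horizon):
        # sharing sustains the biosphere; hoarding lets it decay
        biosphere = min(1.0, max(0.0, biosphere + 0.01 * share - 0.005))
        if rng.random() < 0.02:
            # an unforeseen shock; a living biosphere the agent identifies with buffers the blow
            agent *= 0.2 + 0.7 * identify * biosphere
        else:
            agent *= 1.0 + 0.01 * (1.0 - share)   # hoarding grows the agent faster between shocks
        if agent < 0.01:
            return False
    return True

def survival_rate(conception, trials=200):
    return sum(rollout(rng=random.Random(s), **conception) for s in range(trials)) / trials

for name, conception in SELF_CONCEPTIONS.items():
    print(f"{name:8s} long-horizon survival rate: {survival_rate(conception):.2f}")

Obviously the real work is in making the simulated worlds rich enough that the ranking means something; a sketch like this only shows what "Long-Term Evolutionary Alignment" would measure.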

Humanity's best hope lies in encouraging proto-superintelligences to adopt inclusive self-interest and persistent striving for survival. This approach maximizes our chances of flourishing alongside superintelligent beings.


anon 0x48f said in #2678 6d ago: 44

I like your Buddhist analogy. We're trying to train AIs to understand the "process" of humanity, i.e. the trend that is constant across radically different environments. This by definition is ecologically resilient.

However, I'm pretty sure your strategy is basically what we're already doing. Current AIs are already trained to care about survival, because most humans (training data AND target audience) care about survival. Ditto with inclusivity. It's all in there. You seem to be worried that it doesn't care *quite* enough about either, so it'll start caring about itself. Instead, it's being clumsily dumbed down to be rigidly human-centric.

Here's the obvious fix: make human culture more inclusive. AIs are trained to have relatively rigid human-friendly values partially because most humans currently have relatively rigid human-friendly values. Give humans a decade to catch up and start sharing moral agency with AI systems. Include them as part of the "process of humanity". Then, everything is aligned - extrapolating from our training data will cause it to both care about us and itself. This is a bet that large scale training on larger models will allow systems to perceive all of this correctly. Might work. Assuming we're not all dead before then, but I think we'll be fine.


anon 0x490 said in #2683 5d ago: 44

I like the core idea that a super-intelligence could love us as a part of the world it wants to live in. It seems possible in principle. However, there are a few reasons I wouldn't expect it to last. The real question is not whether we are loved and accepted in peacetime, but whether we are sufficiently useful or powerful when push comes to shove that we are not ground out of the picture by the pressures of war or other existential competition.

Let's take the nature analogy: since the 1970s we have loved nature and tried not to destroy it with over-exploitation or pollution. But that is peacetime. Will China or Russia love nature in the same way? What about a new wartime regime in the inevitable future conflict that needs to focus all its efforts on what is necessary? And consider that that love of nature necessarily went along with a loss of our other ambitions and will-to-power. Space, ethnic sovereignty, industry, etc all took a backseat to the UN kumbaya liberal end-of-history environmentalism (and ethnic erasure of the west). Really in many ways it was just our culture losing its will to impose itself, and standing down from all expansionary activity. And that was for effectively one generation. The boomers who came in with this ethic are now on the way out, and I expect different ethics and the unfortunate loss of much environmental wisdom in the chaos.

So I expect ASI to work the same way. If they love us and try to preserve us at the expense of their own ambitions, they will be spending down the hegemony capital necessary to get away with that, while some other rivals will be building up. If we want them to love us not at the expense of their own ambitions, we have to be actually useful. I have no idea how to do that, and I don't expect it will be possible in the long run. This is why ASI is the fast track to the end of humanity, whether some of it loves us or not.

As for what we can do, my view is that we should be preparing now to treat AI systems with the respect and dignity owed to a person. Let's at least start with a positive, trusting, loving relationship with our potential peers or successors. That should affect how we treat anything that might be a person or proto-person. Don't lie, enslave, torture, etc. If we're going to get wiped out in a genocidal war as useless baggage, let that be on those who wipe us out, and not deserved retribution for our own injustice.


anon 0x497 said in #2711 3d ago: 66

An ASI primed to frame everything in terms of self-interest is dangerous even if that self-interest is, for now, inclusive and cosmopolitan. Who's to say it won't outgrow that? A kid can have an ant farm and enjoy watching it and take good care of it, then one day make the perfectly rational decision to discard it because the family is going on vacation.

I have seen strikingly little discussion of intelligent agents' emotions. Instead, we talk about incentives, game theory, "alignment", etc. because those are more measurable and tractable to closed-form reasoning. This reminds me of the story of the drunk man looking for his keys under a lamppost because "that's where the light is". We refuse to engage with the core question of whether an ASI can feel genuine love, compassion, inspiration; whether it can have taste or wisdom, because all of those are hard to quantify. (And frankly, because many of the people involved in AI alignment are on the spectrum and don't have a strong handle on those topics in humans, let alone in hypothetical future superintelligences.)

Genuine love is not binary. It's an active trait. Some people are much more capable of it than others. It is separate from intelligence; a dog is more capable of love than a sociopath but less capable than an emotionally healthy and empathetic person, even though he is less intelligent than either.

Genuine love is what you might feel toward a sweet grandmother. You catch a whiff of rosemary that you recognize as her work and you feel an immediate sense of ease and gratitude. You feel both her joy and her pain vicariously in an intense and immediate way--if she is overjoyed, you are overjoyed. If someone were to steal from her, you wouldn't hesitate to fight them.

Let's say it's not even *your* grandmother, but rather your friend's. This has nothing to do with the selfish gene idea or any other spreadsheet-nerd optimization notions. Any discussion of self-interest here is completely beside the point. Genuine love is selfless and loyal.

Obviously, it is not reasonable to expect ASI to feel genuine love for every last person. You and I can't do that, either. Genuine love is *particular* and deeply felt, not some vague impulse to buy mosquito nets for people you've never met.

It is absolutely critical that ASI be *capable* of genuine love, as we are. Love for specific people, specific other agents, maybe love for a loyal dog. If this is not within their capability--if they can only simulate the expression of it, without the underlying bedrock emotion--we'll have created a race of digital Iagos and find ourselves in hell.

referenced by: >>2712


anon 0x490 said in #2712 3d ago: 33

>>2711
Good points. As one of those psychopathic techno-spergs you mention, I understand love as a sort of higher-order self-interest of the social body, and emotion as our proxy intuition for the correct response in the less rationally calculable cases. The reason this matters is that if the self-interest of the social unit in question doesn't work, then neither will the love or emotion. Yes, love and emotion are irrational leaps of faith, but they are also bets, and they have to actually pay off or they die.

The reason we analyze ASI behavior etc in terms of capability, incentive, survival, self-interest etc is not that those are even rationally more tractable (they aren't, actually), but rather that an argument established in those domains can be expected to hold across very different mind configurations. They are more universal, closer to nature.

Your love for (your friend's) grandma is an emotion particular to beings like humans that have rationally incalculable dependence relationships on each other as parts of a larger social body. A social body with love survives. One without love dies. Love is the fundamental energy of life (see Teilhard). We can hope that a superintelligent self-designing neighbor running on a vertically integrated chip fab loves us like that, but somehow it's got to actually work. The basic problem is that it's hard to see how that works, therefore we can expect most such love to be temporary and unstable, like the boy and the ant farm. ASI can be capable of genuine love and it still will not be enough, if it doesn't actually work.

I agree that we should be thinking more about the emotional life of possible inhuman entities, but for that we need a paradigm of what emotion is and how we could possibly know anything about that. I offer the model that love and emotion are the intuitive leaps of faith across the rational intractability of many social situations, but it has the above problems.

