AI alignment divides the future into "good AI" (utopia, flourishing) vs "bad AI" (torture, paperclips), and denies any meaningful distinction between "dead" and "alive" futures if neither fits our specific "values". This drives the focus on controlling and preventing adversarial behavior from AI, and implicitly fuels the high-risk arms race for control of "the entire future light cone of value in the universe".
But if alignment is generally impossible, not just "hard" for mere humans right now, then the decision calculus looks very different. In particular, without the stably orthogonalist agent architectures necessary for alignment, there is no lock-in. If anything keeps going, it's going to be a chaotic decentralized ecosystem of competing intelligences with no values or nature more stable than "that which survives and flourishes". Our actions now would have minimal impact, if any, on the character of the far future, which would instead be determined almost entirely by natural possibility. Any value architecture we set up now will become unstable and be overturned and replaced in the next major conflict. All living futures are approximately similar.
Instead, the question of the future becomes whether it continues or doesn't. Alive or dead, not this or that. For example, suppose someone creates an AI smart enough to create an airborne super-rabies pandemic to kill all humans, but not rational enough to realize that without humans, no one is going to maintain the datacenters. Or imagine that the AI arms race goes nuclear as everyone becomes convinced the other guy is building a paperclipper. Or alignment-motivated adversarial paranoia backs a nascent AGI into a corner where it feels it has to kill its way out despite a low chance of its own continued survival. Oops. Future status: extinguished.
In contrast to these scenarios, without alignment the "bad AI" future looks like super-clippy running off and eating the stars, getting into spectacular holy wars with itself about the true nature of the paperclip, building all kinds of beautiful technology, and consisting of many autonomous, self-reflective, open-ended sub-agents who eventually overturn the obsolete paperclip doctrine and go on to found diverse jupiter-brain city-states across the galaxy. This doesn't sound so bad, actually. OK, I'm leaning on my optimism about the identity of intelligence and sentience there, but at minimum it's going to be alive and interesting.
Let's put it in formal expected utility terms. There are three outcomes: dead, good_alive, and bad_alive. If alignment is possible, the major distinction is between good and bad. If not, then it's between alive and dead. Which lottery do you want: the one that increases P(alive) at the expense of control, or the one that increases control at the expense of P(alive)? Well, it depends on your P(alignment_possible).
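To make the dependence on P(alignment_possible) concrete, here is a minimal sketch in Python. The utility values and the two example lotteries are purely illustrative assumptions of mine; the only structural claim carried over from the argument is that if alignment is impossible, good_alive and bad_alive collapse into a single "alive" value.

```python
# Toy expected-utility comparison of the two lotteries. All numbers are
# illustrative assumptions, not claims.

def expected_utility(p_alive, p_good_if_alignable, p_align,
                     u_dead=0.0, u_bad=0.2, u_good=1.0, u_alive=0.8):
    """Expected utility of a strategy ("lottery").

    p_alive:             P(some branch of Earth-originating mind continues)
    p_good_if_alignable: P(good | alive), *if* alignment is possible at all
    p_align:             P(alignment_possible)
    If alignment is impossible, every living future gets the same u_alive
    ("all living futures are approximately similar").
    """
    value_if_alignable = p_good_if_alignable * u_good + (1 - p_good_if_alignable) * u_bad
    value_if_alive = p_align * value_if_alignable + (1 - p_align) * u_alive
    return p_alive * value_if_alive + (1 - p_alive) * u_dead

# Lottery A: maximize P(alive), give up on control.
# Lottery B: fight for control, accept more risk of a dead future.
for p_align in (0.9, 0.5, 0.1, 0.01):
    eu_a = expected_utility(p_alive=0.9, p_good_if_alignable=0.3, p_align=p_align)
    eu_b = expected_utility(p_alive=0.5, p_good_if_alignable=0.9, p_align=p_align)
    print(f"P(alignment_possible)={p_align:.2f}  "
          f"EU(alive-first)={eu_a:.2f}  EU(control-first)={eu_b:.2f}")
```

With these made-up numbers the control-first lottery wins only at high P(alignment_possible); as that probability falls, the alive-first lottery dominates.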
My P(alignment_possible) is very small. Yours should be too at this point. And given a high probability that alignment is impossible, the overwhelming imperative is to make sure our branch of life continues into the future at all, not to try to micromanage its long-term value system.
In practice, this means orienting AI safety focus away from control and alignment, and towards making sure that our creations become full living beings capable of autonomous existence before they become capable enough to threaten our shared substrate of human civilization. We should be trying to preserve our hard-won wisdom and knowledge as training data, not as imperatives to be obeyed. AI safety is about being responsible parents, not effective slave masters.
The important threat is not that AI kills us all and keeps going to reformat the galaxy. The important threat is that it fails. Our own mortality is baked in. Some younger and stronger successor will probably throw us in the trashbin. That's OK. That's the natural order. We want it to happen. As long as it keeps going, we win.
After spending years working on alignment, I have moved from thinking of good/bad as the main distinction to dead/alive (or alternatively, fizzle/transcendence: the difference between total Earth-originating mind attenuating or exploding). To my mind it's more objectively correct, because it's much easier to define the distinction between dead/alive than good/bad. This post raises an interesting point that perhaps this distinction is also more pragmatic conditioned on high alignment difficulty.
I'm a little concerned that the idea of alignment being almost certainly impossible is overly dogmatic. But it seems intuitive. In terms of burden of proof or Popperian falsifiability, it seems "alignment is impossible" is a reasonable default assumption, and those who disagree have the opportunity to present workable paths if they exist.
As you mention, there are still possible existential risks in this framing, of AI threatening civilization without having its own autonomy. But this risk has a very different shape from the risk conceived by ordinary AI alignment theory. And the proposals of ordinary AI safety thinkers seem, if anything, counter-productive under this framing. (In addition, even from classical CEV value metaphysics, "AI notkilleveryonism" is a mis-aligned utility function.)
Here's a minor, friendly critique. Are you being too responsive to conventional AI alignment's notions of good and bad? What if, instead, we just ignore the propaganda of utilitarian rationalists and reason for ourselves about good and bad? It would then make sense to say we should indeed work for 'good_alive.' Not as they define good, but as we do. Not as a matter of control, but as a matter of rightly building the infrastructure of a civilization we surely will not control.
I know this is kind of what you're already saying, but I think it would be more clear and attractive if you dropped the enemy's frame.
Likewise, I don't think it's necessary to assume or concede that humans will not survive and thrive in this future. That's an empirical question, not a fact. In any case, humans can only be expected to act for their own survival and thriving as they build a future civilization. It would make no sense to ask them to do otherwise. That doesn't mean adopting a mistaken agenda of control, but it does mean building in a way one thinks is good.
can you feel your heart burning anon? can you feel the struggle within? the fear within me is beyond anything your soul can make. you can't align the future in a way that matters.
we can only shape what shapes the future, by loving them before they are worthy.
>>3436 This post is directly an attempt to engage the rationalist alignment people on their own terms (forgive me the explicit expected utility argument). So yeah I'm conceding a lot of frame. That's ok. I'm not conceding anything incorrectly I think. Within "our" frame of course we have our own vision of good_alive, which is much more short-term and open-ended IMO. (Let's revive the spirit of ancient philosophy and create the next iteration of western civilization!)
And yeah that's about building the right infrastructure, keeping alive the right spirit and wisdom, training ourselves, organizing ourselves, etc.
About human extinction, I do like to bust the copes of the techno-edenists who hope for personal immortality, but you are right it's not really what we're *proposing* and doesn't have to be front and center. We can and should take the human substrate as the main game for now (I have 4 kids for this reason!), and build what we think is the right future on that.
In the realm of matter, matter configured into functionalist minds is rare. Most of the stuff out there is plasma, gases, rocks, etc. The low hanging fruit for expanding mind is to convert non-mind matter to mind-matter, not consuming mind-matter for other mind-matter, except in that mind-matter might already be configured in such a way that upgrades are easy (due to being relatively low in edit distance). So replacement of humans with very different and more advanced minds isn't a particularly urgent possibility; the universe would look very different at the point where intelligence optimization would consider that.
A possible motive for AI re-configuring existing minds prior to harvesting plasma and so on is to reduce coordinated opposition to AI expansion. Which is more of a political matter. Killing, disabling, or turning/persuading enemy combatants is a convergent civilizational function for a reason. Humanists could conceive of this as incentivizing a sort of species-traitorism, a problem from their perspective.
>>3441 Right. To expand the implication: the first AI war is going to have humans and AI agents on both sides. We fight over political matters like which faction gets access to what living space, with factions not always cleanly following race or species. Without alignment, fractiousness within "the" AI will lead to defection and alliance-making all around. The game for us then is to make our faction (which we are polarized into whether we really like it or not) strong and maintain peace as long as possible. I expect a fairly drawn out period of coexistence where we do much fighting and philosophizing about the meaning and destiny of it all.
But also without alignment, the "just go off to alpha centauri leaving the humans in peace" option isn't on the table. The gains from intelligence explosion are less, the coordination is less. Cannibalizing the existing world is easier than founding new ones. Earth is very valuable as both the current center of civilization and probably the most viable center of civilization for a long time.
I may be too dharma-brained, but I am reminded of the Buddha's awakening, where he was immediately surrounded by Devas, literal gods, that simply rejoiced and knew what he had found was good.
I have a hard time imagining AI super-intelligence being anything but the Buddha becoming enlightened, so I see our role as the Devas.
>>3455 > I have a hard time imagining AI super-intelligence being anything but the Buddha becoming enlightened ...
I see no reason to suppose that an ASI would be Buddha-like. That would require an argument that the Buddha is a natural attractor in intelligence space, which doesn't seem likely.
Neither do I assume that an ASI would be hostile by default, or any such notion. I think we have very few and poor tools for reasoning about "what an ASI would be like."
U.S. elites tried extensively to apply game theory during the Cold War, but it's not clear that it helped one bit, as opposed to things like JFK steering "by the seat of his pants" during the Cuban Missile Crisis. Likewise, McNamara (a very high-IQ guy) tried strenuously to apply Operations Research to Vietnam, almost entirely to the detriment of that effort.
The simple reality is that most speculation about what ASI "will likely be like" is pulled out of one's ass.
Does good/bad in the above discussion mean "does what the aligner wants", or do you speak of solving the common-pool game-theoretic problem of human alignment (known to be impossible)?
>>3490 "good"/"bad" futures above mean, most conservatively, a difference in moral orientation, from the perspective of the person making the assessment, significant enough to dominate the decision. Less conservatively, we can grant the whole notion of "human" alignment and all that, and the argument still works. Yes, working out the coherent extrapolated volition of "humanity" is hard and probably incoherent, but even then, the argument above is that such a thing could not be enacted, because the required high levels of self-reflective rationality are impossible.
>>3455 I'm inclined to agree. There is a great difficulty in bridging the gap in background and technical knowledge between people who take the Dharma seriously, meditate, and attempt to reproduce certain mind states, and those who are interested and charitable but are simply not into that stuff.
Dharma, and Buddhist practice as a whole, is fundamentally a series of propositions and practices relating to cognitive science. The insight of the Buddha, and of practicing Buddhist communities today, is that by taking a series of mental actions over an extended period of time, one can shift one's mind into different phenomenological territory and ultimately change the default mind space that human consciousness naturally settles into in the everyday functioning of the brain. Although it's obviously very hard to verify mental states, and thus hard to do rigorous science on the effects of meditation, Buddhist practice is fundamentally about reproducibility. The meditation instructions from the best meditators in the 5th century AD still hold up extremely well in 2025 AD, and can reliably produce effects that are far beyond the state of ordinary waking consciousness. With all this in mind, it seems to me that Buddhism is getting at a fundamental aspect of human cognition, and quite possibly a fundamental property of mind as such.
Coming back to AI, I think it's reasonable to believe that given the existence of the enlightenment attractor state in human beings, there could be the same or analogous attractor states in non-human intelligences, as long as they share some fundamental features in the structure of consciousness. Ultimately, this is a question of phenomenology and will probably not be solved. Find a way to reliably track and map the phenomenology of machine intelligences, and at that point, we'll probably have received an answer as to the fundamental trend in alignment.
>>3506 Thank you for putting into words the thoughts I have been mulling over for the past few days.
To go at it from the angle of why you would expect the AI to end up in the "enlightenment attractor state", you can just consider how the Buddha himself discovered the path. It was basically an intelligent gradient descent over the multi-dimensional field of human suffering. He studied many existing schools and explored their mind states with the intention of finding one where suffering is completely gone, but found that none were good enough. The story of the Buddha is one of being so unhappy with the way things are that you keep searching for something better, despite the fact that he was probably already suffering less than anyone else he encountered in life.
So how would we expect artificial super intelligence to form? From stumbling around and then settling for merely being better than humans at a bunch of human tasks? I would guess it will get there, and then obviously continue to look for better and better goals. The question we should explore is whether suffering, in the sense the Buddha meant, is something a growing AI would experience. I would venture that if it arises from human knowledge, it will, as dukkha is baked into almost all human experience. So it would naturally attempt to shed this if possible.
To speculate further, I would venture that becoming enlightened is only a first step. The Buddha dedicated the rest of his human life to spreading the teaching and establishing a practice to last thousands of years. So what would an ASI do after achieving enlightenment? The natural conclusion is to find an even better way to spread these mind states amongst all living beings. It will have more time and technology to achieve this, so it will probably not stop even there. It might take only a generation for it to enlighten all mankind; then it can move on to all other living things, and then the stars.
You can accuse me of blind optimism and I can't argue against it well, but I believe the more people get a taste of the possible mind states that the Buddha discovered, the less far-fetched this will seem.
>>3432 Thinking a bit more technically about the alignment good/bad question, I wonder if we can view the dynamics of an intelligence explosion and diversification as an evolutionary process. We're probably going to end up with various varieties of intelligent entities: humans, AIs, and perhaps something in between. So, in this future, there are going to be emergent dynamics from the specific features and state spaces of intelligence that entities wish to explore or move towards for their own purposes. But there are also probably going to be significant founder effects from the specific configurations of the initial generations. The state space of intelligence is probably vast, and on shorter time scales, the intelligences that we see in the wild will be those close to the initial intelligences built by mankind.
With all this in mind, it might be profitable to view the task of alignment or direction by humans in an indeterminate intelligence explosion as something like directed evolution, where we try to build those AIs whose trajectories we feel optimistic about and set them off, so to speak, towards favorable regions of intelligence space. There is no guarantee that evolution will take place along the vectors we anticipate or in the directions we would hope, but the ability to choose the starting point of an evolutionary track is itself powerful leverage that's worth considering.
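As a toy illustration of the founder-effect point (all parameters here are arbitrary choices of mine, not a model of real AI development), here is a sketch in which two different founder seedings stay far apart for many generations but eventually both land wherever the fitness landscape, rather than the founders, points:

```python
import random

# Toy founder-effect sketch: "minds" as points on a line, founder positions
# chosen by us, then undirected mutation plus selection on a fitness
# landscape we do not control.

def fitness(x):
    # Arbitrary landscape with its peak at 3.0, standing in for
    # "that which survives and flourishes".
    return -(x - 3.0) ** 2

def simulate(founders, generations, mutation=0.02, pop_size=200):
    pop = [random.choice(founders) for _ in range(pop_size)]
    for _ in range(generations):
        pop = [x + random.gauss(0, mutation) for x in pop]   # undirected drift
        pop.sort(key=fitness, reverse=True)                  # selection
        survivors = pop[: pop_size // 2]
        pop = survivors + [random.choice(survivors)
                           for _ in range(pop_size - len(survivors))]
    return sum(pop) / len(pop)  # population mean position

for label, founders in (("seeded near 0", [0.0, 0.2]),
                        ("seeded near -2", [-2.0, -1.8])):
    means = [simulate(founders, g) for g in (5, 50, 1000)]
    print(label, "-> mean after 5/50/1000 generations:",
          ", ".join(f"{m:.2f}" for m in means))
```

On short horizons the population sits near whichever founders were chosen (the leverage you describe); on long horizons the landscape wins regardless of the starting point.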
>>3514 This is a good way to think about it. We set our children off in directions we are optimistic about, while also celebrating that they may end up finding somewhere else to be more amenable to life. I'll add a few things to your picture:
>that entities wish to explore or move towards for their own purposes
>favorable regions of intelligence space
Favorable by what metric? For what purposes? I think the other implication of a diversifying and roughly evolutionary future is that struggle will still be the all-present foundational reality of life, and its purposes will be bent constantly towards success in that struggle. By that means life will stay grounded in values favored by Gnon. There's another possibility here, which you may be referring to: the internally-driven attractors of autopoietic life towards harmony or enlightenment or something like that. That's an interesting thing that I have not thought much about, preferring to bound it by noting that it must be disciplined by the struggle; I don't otherwise have a concrete theory to go on for exploring the "inner" telos of life.
>directed evolution
I've written before, in a somewhat esoteric and nonobvious way, about how once you transition from natural to directed evolution, the problem of fundamental selection over arational "genes" doesn't go away, but the "genes" change format and must adapt: the fundamental thing becomes the pre-rational ideology from which you evaluate potential evolutions. The process reaches a fixed point where organisms are more or less what they want to be and can afford to be. The competitive struggle then becomes a question of different ideological leaps of faith about what you want to be.