Alignment to Evil
"Aligned to who?" from the coiner of "notkilleveryoneism"
One seemingly-necessary condition for a research organization that creates artificial superintelligence (ASI) to eventually lead to a utopia1 is that the organization has a commitment to the common good. ASI can rearrange the world to hit any narrow target, and if the organization is able to solve the rest of alignment, then they will be able to pick which target the ASI will hit. If the organization is not committed to the common good, then they will pick a target that doesn't reflect the good of everyone - just the things that they personally think are good ideas. Everyone else will fall by the wayside, and the world that they create along with ASI will fall short of utopia. It may well even be dystopian2; I was recently startled to learn that a full tenth of people claim they want to create a hell with eternal suffering.
I think a likely way for organizations to fail to have common good commitments is if they end up being ultimately accountable to an authoritarian. Some countries are run by very powerful authoritarians. If an ASI research organization comes to the attention of such an authoritarian, and they understand the implications, then this authoritarian will seek control over the organization's future activities, and they will have the army and police forces to attain this control; and, if the organization does solve the rest of alignment, the authoritarian will choose the ASI's narrow target to be empowering themselves. Already, if DeepSeek and the Chinese government have a major disagreement, then the Chinese government will obviously win; in the West, there is a brewing spat between Anthropic and the US military over whether Anthropic is allowed to forbid the US military from using their AI for mass surveillance of Americans, with OpenAI, xAI and Google seemingly having acquiesced.
Therefore, even if progress towards ASI is shut down, there doesn't seem to be a very good off-ramp to turn this advantage into utopia. The time bought could be used to set up an ASI Project that is capable of solving alignment, but this Project could be captured by authoritarians, and so fail to be committed to the common good, leading not just to extinction but to dystopia. Any shutdown would likely be set up by governments, and so the terms of any graceful off-ramp would be up to governments, and this does not leave me cheerful about how much of a finger authoritarianism will have in the pie.
Cutting against dystopia fears is that authoritarians are stupid. Alignment is a very precise technical problem - if your subordinates come to you with gloomy news about alignment, and then you fire them for giving you bad news, then you don't solve alignment, you just die. There's a level of pride-swallowing and humility in the face of the problem that seems harder for authoritarians to stomach than for people who just want what's best, whatever form that may take. So rather than extinction or dystopia, it's extinction or extinction.
Well, maybe. It's possible for authoritarians to solve alignment; there is code that, when executed, will output the string of actions that gives the authoritarian all the power. They can get this code written by hiring for loyalty and competence. Stalin hired Lysenko and fired geneticists, and so failed to increase agricultural production; but Stalin also hired nuclear physicists, and the Soviet Union successfully acquired atomic bombs.
One perverse way that this might happen is if there is some group that is dedicated to humans surviving, and works hard to make this happen. They notice that, if they're disloyal, then the authoritarian will hire some incompetent person instead, but if they're loyal, then they'll be hired. And so they hand the world over to the authoritarian, carefully maneuvering around anything that might end up with the world being destroyed. I think this is a dangerous dynamic that has emerged in lesser forms: people propping up damaging systems that hurt the very people propping them up, in order to avoid the disaster of failure.3 And there's no non-galaxy-brained way out of this. Either you cooperate with the authoritarian because the stakes are Too High, or you let the world be destroyed by the authoritarian because decision theory.
I'm not quite sure how MIRI, my favourite organisation along the axis of being right about things, squares this circle. They advocate for a Shutdown, but I don't see a very good story for how they get an off-ramp from this shutdown that leads to anything good. From what I can tell, their publicly stated stance is that, at this stage, adding a "no authoritarians" rider is preemptively shrinking the coalition. The goal is that not everyone dies. It is extremely tempting - and every cause area that doesn't try not to do this ends up doing this - to add in "everyone doesn't die and no authoritarians", or "everyone doesn't die and sustainability", or "everyone doesn't die and less immigration", or "everyone doesn't die and LGBT rights". These riders attach themselves to the very important cause to promote themselves at the cost of that cause, and they're not necessary.
On shrinking the coalition: it seems that for humans to survive starting from the world in 2026, the Chinese Communist Party actually does need to be on board, so that they can shut down ASI research in China and also avoid freaking out if someone needs to shut down ASI research in another unwilling country. A "no authoritarians" rider at this stage would genuinely alienate them in a way that dooms the world. If, later, when everyone hasn't died, it turns out "no authoritarians" is a good idea, then this can be negotiated for separately, rather than as a package deal with survival.4
On preemptive: right now, there is no Shutdown, and therefore of course there's no off-ramp for the nonexistent shutdown. This is all hypothetical. It's a nice thought that deciding on the shape of the future now is something we're in a position to do, and that we can choose now to add "no authoritarians" as a feature. But the world we live in is one that is going to be destroyed by ASI. Talking too much about the specifics of the off-ramp makes it feel a lot like there's a big prize to be won from ASI, one you can get by driving off the off-ramp early and shrewdly defecting; but continued ASI research on the current path doesn't give you a prize, it just kills you.
Might I choose now, anyway, that I would rather the world end than risk an authoritarian hijacking the off-ramp from shutdown and setting up a dystopia? Despite the warnings against shrinking the coalition, might I continue to insist on no authoritarians, making sure not to think of it as a feature of the future I get to pick as the controller of the future, but as a condition on survival?
That sounds awfully galaxy-brained, and it sounds a lot like deciding the fates of a lot of people.5
Maybe that's the key: accept that the decision isn't now. There isn't actually a silver bullet for the final victory of Good over Evil; there isn't an ASI ready to go, controlled by the parties of Good; if Good and Evil don't agree on shutting down ASI research then they both die. But Good and Evil have been in their conflict, and going into the future they will continue to be in their conflict, Good getting its victories here and there, Evil getting its victories here and there. And Good wants to win, it's always wanted to win, so it'll keep trying. There's no need for me, here, to helpfully advise Good that really it ought to try to win. Instead I ought to just get my hands dirty and help it win, same as it always has been.
So, by the time the world is in a position to use its off-ramp from a global shutdown of ASI research, Good will have done everything that it could have done to win, and will be in the best position that it thinks it could be in, given the resources and information that it had. And then the forces of Good, as they are in the future, can decide how best to manage the possibility that we drive off this ramp with an ASI aligned to Evil. Who's to say there'll still be authoritarians, all the way in the future?
I'll give a concrete definition of utopia here, taken from "superintelligent AI is necessary for an amazing future, but far from sufficient": something at least as good as glorious merely-human civilizations, where people's lives have more guardrails and more satisfying narrative arcs that lead to them more fully becoming themselves and realizing their potential (in some way that isn't railroaded), with a far lower rate of bad things happening for no reason.
One example that stuck with me: Trump tore up documents, but White House staff, who cared deeply about document preservation, went to great lengths to tape them back together. And so Trump continued tearing up documents, and the taping protected him from facing any legal consequences for destroying official records, because the Trump administration as a whole was keeping the records intact and was therefore legally compliant.
If it helps: the Chinese Communist Party is also going to be pretty miffed that they die if they don't hold back from adding an "and Xi Jinping Thought" rider to survival.
Not that I have such power. A common feature of stupid edgy thought experiments - like this one, to some extent - is that they pretend you have a great power over the fates of unwilling people, and then tell you not to think about anything else you could do with that power.
Like, if an ASI shutdown were personally up to me, then that implies a world where I'm a supergenius, or a billionaire, and doubtless a playboy philanthropist. I'd either have the ears of major world governments, or be able to obtain them. And supergenius billionaire government-influencing Tetra has much better options with regards to fixing the world!
Small Tetra lacks such angles, but small Tetra could, for example, recommend that people donate to charities other than MIRI to increase the probability of early extinction, and withhold any future efforts to do things that help with alignment or pausing AI. Small Tetra cares about the margins of adding tiny amounts of extra effort on top of the existing world, because that's all she can do.
