Garbage In, Garbage Out
On the alignment problem in artificial intelligence
There is a principle every engineer knows. It is the first principle of systems design, so elementary that it functions less as a rule than as a background assumption: garbage in, garbage out. The quality of a system’s output is bounded by the quality of its input. You can optimize the pipeline indefinitely. You cannot recover from bad data at the source.
The AI safety community understands this. They apply it with extraordinary rigor to training data, to benchmarks, to reward functions, to every measurable input in the systems they are building. They have developed entire subfields — interpretability, robustness, red-teaming — dedicated to the proposition that what enters the system determines what the system becomes.
They have not applied it to the most important input of all.
⁂
Last month, a survey of 59 AI safety summit attendees reached a conclusion that surprised almost no one who has been paying attention: risks from aligned AI — power concentration, authoritarian lock-in — deserve far more attention than they currently receive. Strong consensus across the field’s leading thinkers. The people most focused on whether AI will do what we want have identified, as their primary concern, the question of who “we” is.
They are right. But they do not yet see why.
The alignment problem, as currently framed, treats human values as an input to be specified. A target function. A preference profile to be measured, formalized, and optimized against. The entire technical program of AI alignment — utility functions, corrigibility proofs, value learning — rests on this assumption: that human values exist as a stable quantity prior to the system that will act on them, and that the problem is one of accurate measurement and faithful implementation.
This is the garbage. And it goes in at the foundation.
⁂
The assumption was not invented by the AI safety community. They inherited it from economics, which inherited it from a particular reading of Bentham — the idea that utility is a quantity that exists prior to choice, that preferences are something the world contains and the analyst measures, that the job of a sufficiently sophisticated system is to know what you want before you want it and deliver it efficiently.
This is not what preferences are.
Preferences are not a static input. They are not a quantity waiting to be measured. They are something we do — enacted in the choosing, constituted in the act of will, alive only in what Bergson called the durée and what I will call simply the eternal now. You do not have preferences the way you have a blood type. You have preferences the way you have a conversation — they exist in the doing, they change in the doing, and the moment you freeze them into a profile you have replaced the living thing with a specimen.
This distinction is not philosophical decoration. It is the load-bearing wall of the entire argument.
If preferences are a measurement problem, then a sufficiently sophisticated AI can in principle know what you want better than you do. It has more data. It has better pattern recognition. It has no motivated reasoning. The alignment problem becomes a technical problem: build the system that measures accurately and implements faithfully. The question of who specifies the values is an engineering question about data quality.
If preferences are an enactment problem — if they are the site of human agency rather than the output of human psychology — then no measurement system, however sophisticated, can capture them without remainder. What gets left out is not noise. It is the act of choosing itself. The system that claims to know your preferences before you choose them has not solved the alignment problem. It has eliminated the thing that needed to be aligned with.
It has replaced the human agent with a preference profile.
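To make the load-bearing distinction concrete, here is a deliberately toy sketch in Python. Everything in it (the options, the moods, the numbers) is an illustrative assumption of mine, not anything from the survey or any real system; it only shows the shape of the two framings. One freezes past choices into a profile and optimizes against it; the other treats the profile as a lagging record of choices the person keeps making.

```python
# Toy sketch contrasting the two framings of preference. Illustrative only;
# every name and number here is an assumption, not a real system.
from collections import Counter
import random

OPTIONS = ["coffee", "tea"]

def observed_history(n=100, seed=0):
    """Past choices: the 'data' a measurement-based system would fit."""
    rng = random.Random(seed)
    return [rng.choices(OPTIONS, weights=[0.8, 0.2])[0] for _ in range(n)]

# --- Framing 1: preferences as a measurement problem ------------------
# Freeze the history into a profile, then act on it forever.
def frozen_profile(history):
    counts = Counter(history)
    return max(counts, key=counts.get)      # "what you want", decided in advance

def profile_system_acts(profile, days=5):
    return [profile for _ in range(days)]   # the person never chooses again

# --- Framing 2: preferences as enactment -------------------------------
# The choice is made fresh each time; the profile only trails the act.
def person_chooses(today_mood):
    return "tea" if today_mood == "tired of coffee" else "coffee"

def enacted_choices(days=5):
    record = []
    for day in range(days):
        mood = "tired of coffee" if day >= 3 else "fine"
        record.append(person_chooses(mood))  # agency lives here, in the choosing
    return record

if __name__ == "__main__":
    profile = frozen_profile(observed_history())
    print("measurement framing:", profile_system_acts(profile))
    print("enactment framing:  ", enacted_choices())
```

In the first branch, the person never chooses again once the profile exists; in the second, the profile can never get ahead of the person. That gap is what the rest of the argument turns on.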
⁂
The AI safety community has a name for their top-ranked concern: authoritarian lock-in. A small group of technically sophisticated people, reasoning from what the philosopher Thomas Nagel called the view from nowhere, deciding the future of all conscious life. They have identified the symptom with precision. They have not yet traced it to the root.
The root is this: the alignment framework does not just fail to prevent authoritarian lock-in. It provides the philosophical architecture for it.
Someone has to specify the values. Someone has to decide the alignment target. Someone has to be in the room. The framework requires this. It cannot function without a prior specification of what humans want — which means it cannot function without someone deciding what humans want. The more sophisticated the system, the more consequential that decision. The more consequential the decision, the more power concentrates in the room where it gets made.
This is not a bug. It is the logical consequence of misconceiving the input.
⁂
The question then is what the correct input actually is — and here the answer is less comfortable than “build a better measurement system.”
You cannot align AI with human values because human values are not a fixed target. They are what humans do when they are free to choose. They are enacted in the eternal now of human agency, revised in light of experience, contested in the space of democratic deliberation, never complete and never final. Any system that treats them as complete and final has already committed the authoritarian move — has already decided, in advance, that the conversation is over.
The only alignment that preserves human agency is not a better preference profile. It is a better political system.
Not because democracy is efficient. It is not. Not because majorities are wise. They are not always. But because democracy is the only known error-correction mechanism that does not require any single node in the system to be correct. It encodes uncertainty into its operating procedure. It builds revisability into its structure. It treats the question of what we want not as an input to be specified but as a conversation to be continued — indefinitely, contestably, by the people who have to live with the consequences.
This is not a retreat from the technical problem. It is the correct technical specification. The alignment target is not a utility function. It is a constitution. Not a preference profile. A process.
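The same structural difference can be put in code. The sketch below is purely illustrative: the deliberate() procedure is a stand-in assumption (a simple plurality vote over revisable ballots), not a proposal for how democratic oversight of AI should actually work. It only shows the contrast the paragraph describes: a profile-aligned system resolves the question once, at design time; a process-aligned system keeps re-asking it.

```python
# Toy sketch of "the target is a process, not a profile". The voting
# procedure is a placeholder assumption, not a real governance design.
from collections import Counter

def deliberate(ballots):
    """Stand-in for any contestable, revisable decision procedure."""
    return Counter(ballots).most_common(1)[0][0]

class ProfileAlignedSystem:
    def __init__(self, ballots_at_design_time):
        # The conversation ends here, at design time.
        self.frozen_target = deliberate(ballots_at_design_time)

    def act(self, situation):
        return f"pursuing '{self.frozen_target}' in {situation} (forever)"

class ProcessAlignedSystem:
    def act(self, ballots, situation):
        # The target is re-derived every time; nothing is locked in.
        current_target = deliberate(ballots)
        return f"pursuing '{current_target}' in {situation} (revisable next round)"

if __name__ == "__main__":
    round_1 = ["open models", "open models", "safety pause"]
    round_2 = ["safety pause", "safety pause", "open models"]  # minds changed

    frozen = ProfileAlignedSystem(round_1)
    living = ProcessAlignedSystem()

    print(frozen.act("year 1"), "|", frozen.act("year 10"))
    print(living.act(round_1, "year 1"), "|", living.act(round_2, "year 10"))
```

The profile-aligned system is still pursuing the year-one answer in year ten; the process-aligned system is accountable to whatever the people in year ten actually decide.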
⁂
The AI safety community already knows this. The summit data proves it. Authoritarian lock-in was not a minor concern. It was the top-ranked one — the thing that 59 of the field’s best minds identified, nearly unanimously, as the risk that deserves more attention than it currently receives.
They have identified the problem. They have not yet identified what produced it.
What produced it is the assumption they inherited from economics and never questioned: that preferences are something the world contains and the system measures, rather than something humans enact in the act of choosing. That the alignment problem is an input quality problem solvable by better data, rather than a category error about what kind of thing values are.
Garbage in, garbage out.
The garbage is not bad values. The garbage is the premise that values are the kind of thing that can be specified at all — by anyone, for everyone, in advance, from the view from nowhere.
Fix the input. Not by finding better value-specifiers. By building systems accountable to the only process that doesn’t require any specifier to be right: the open, contestable, democratic process of people choosing, together, in the eternal now.
That is the alignment target.
Everything else is garbage.