Trust Will Define the AI Era
Hiring evaluates two dimensions. Performance is the easy one. Can they do the job? How fast, how well? You test for this with portfolios and interviews.
Trust is the hard one. You ask around and check references. You collect stories. You put people on probation. Trust builds over months and evaporates in a day.
With AI agents, we’re repeating the same mistake, only worse: we evaluate the dimension that’s easy to measure and skip the one that’s hard. The productivity is visible. A developer with an AI coding agent produces in hours what used to take days. Nobody is asking about trust.
The Plane Question
A junior software engineer uses AI to write flight control software for a transatlantic plane. Would you board it?
A senior engineer writes the prompts instead. Would you fly?
A senior engineer with years of avionics experience writes the prompts. Would you get on?
Somewhere in that progression, people start saying yes. The AI’s productivity gain is the same across all three scenarios. The domain knowledge and the trust profile of the developer are worlds apart.
Large teams write flight software with layered review and verification at each stage. No single person’s blind spot can bring down a plane. The structure manages trust.
Roles Are Collapsing
Software development is changing and most people haven’t caught up.
In the old model, roles separated concerns. A product manager defined scope. Developers wrote code. Testers verified it worked. Each checked the others. The product manager reined in scope creep. Testers caught what developers missed. Developers pushed back on specs that didn’t make sense.
With AI-assisted development, those roles collapse onto one person. You describe what you want, the AI produces something, and you review it and iterate. You become the product owner, developer, and tester at once.
This is spec-based development. You write intent, you get output, you evaluate. The people who thrive here were already good at two things: thinking like a product owner and being meticulous about testing.
A good engineer might be meticulous about one subsystem. Now they need to hold the full scope, the thing a team used to hold collectively. Many people don’t have that skill. They never needed it.
The Bus Factor Gets Worse
Collapsing roles onto fewer people creates information isolation. One developer carries context that used to live across a team. The classic bus factor applies.
One person can ship fast with AI assistance. If they leave or burn out, the work has no owner. In the old team model, responsibility spread across several people. Someone else had seen the code and understood the tradeoffs. Today, a project stalls until someone replaces them.
The replacement won’t share the original developer’s vision for the product. AI tools let them refactor fast, but refactoring without shared context breaks things that used to work. The code gets reshaped to fit a new mental model, and subtle assumptions from the original design get lost.
Edmond Lau wrote about the dangers of one-person teams in 2014, arguing for a minimum team size of two. His reasoning: solo work kills feedback loops, concentrates knowledge in one head, and stalls the moment that person is unavailable. AI tools are making the problem worse. Solo work feels productive enough that teams stop questioning the structure until someone leaves.
The risk extends beyond individual projects. AI makes solo work feel viable, so organizations spread people thinner across more initiatives. More parallel efforts fragment the vision. Each person pulls in their own direction, and the shared sense of where the product is heading weakens. Teams need to push harder to maintain focus and resist the temptation to staff ten one-person projects when five two-person teams would produce better outcomes.
Decision Fatigue and Work Slop
Collapse a team’s worth of decisions onto one person and decision fatigue follows. Pair that with a tool producing output faster than you can absorb and you get “work slop”: output you have no time to review or integrate. Features ship without hard questions about edge cases. The 80% version looks good enough. The remaining 20% that separates a prototype from production gets hand-waved away.
Part of this is a learning curve, and that’s fine. We still need to figure out how to use these tools. Part is structural. Teams produce faster than they can evaluate, and the gap between the two is where things break.
Build quality gates early, before effort compounds and sunk cost makes it painful to discard work. Sorting viable AI output from noise is becoming a core competency. Filter early or drown in half-finished work.
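One way to make “filter early” concrete is a pre-merge gate that refuses to queue a change for review once it exceeds what one reviewer can realistically absorb. The sketch below is illustrative only: the thresholds and the choice of `git diff --numstat` against `origin/main` are assumptions, not a standard.

```python
#!/usr/bin/env python3
"""Sketch of an early quality gate: block review requests that exceed
what one reviewer can absorb in a sitting. Thresholds are illustrative
assumptions, not recommendations."""

import subprocess
import sys

MAX_CHANGED_LINES = 400   # assumed reviewable budget per change
MAX_TOUCHED_FILES = 15    # assumed limit before context gets too wide


def diff_stats(base: str = "origin/main") -> tuple[int, int]:
    """Return (changed_lines, touched_files) for the current branch vs base."""
    out = subprocess.run(
        ["git", "diff", "--numstat", base],
        capture_output=True, text=True, check=True,
    ).stdout
    lines, files = 0, 0
    for row in out.splitlines():
        added, deleted, _path = row.split("\t", 2)
        if added == "-":          # binary file: count it as touched only
            files += 1
            continue
        lines += int(added) + int(deleted)
        files += 1
    return lines, files


if __name__ == "__main__":
    changed, touched = diff_stats()
    if changed > MAX_CHANGED_LINES or touched > MAX_TOUCHED_FILES:
        print(f"Gate: {changed} lines across {touched} files exceeds the "
              "review budget. Split the change before requesting review.")
        sys.exit(1)
    print(f"Gate: {changed} lines across {touched} files. OK to request review.")
```

The specific numbers matter less than when the check runs: before anyone has sunk a week into a branch nobody can evaluate.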
Trust Doesn’t Scale With Output
Think of a senior engineer you trust. They’ve been careful and meticulous for years. You know their work. You trust their judgment.
Give them an AI agent that 10x’s their output. Do you still trust them at the same level?
You trusted their process: how they caught edge cases and thought through failure modes. Output volume changes. Does scrutiny scale with it? Or does the volume dilute the thing that made them trustworthy?
You calibrated your trust to a certain pace of work. Change the pace and you need to recalibrate.
What This Means
AI-augmented productivity will create value and disasters, in the same organizations, at the same time. The Challenger didn’t explode because the engineers were incompetent. Institutional confidence overrode individual judgment at a critical moment.
Trust frameworks inside companies and teams haven’t kept pace with the tools. The companies that get this right will treat trust as a variable they measure. At what level of AI assistance do they require additional review? When does a single developer’s output need a second pair of eyes because the process has changed, even if the person hasn’t? How do they keep decision fatigue from eroding the judgment they depend on?
We need to figure out how teams collaborate in this era. One approach: cross-testing cycles before merging to main, where developers review and test each other’s AI-generated output. This rebuilds the multi-perspective check that old team structures provided. It spreads context across people again, reducing the one-person-team risk.
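As a rough sketch of what a cross-testing cycle could look like in practice, the snippet below pairs each pending change with a tester who is not its author, rotating through the team so context keeps spreading. The team names and the `Change` record are hypothetical; the pairing rule is the point, not the tooling.

```python
"""Sketch: assign each pending change a cross-tester who is not its author,
rotating through the team so context spreads. Names and the Change
structure are hypothetical."""

from dataclasses import dataclass
from itertools import cycle


@dataclass
class Change:
    id: str
    author: str


def assign_cross_testers(changes: list[Change], team: list[str]) -> dict[str, str]:
    """Map change id -> tester, never the author, rotating for an even spread."""
    rotation = cycle(team)
    assignments: dict[str, str] = {}
    for change in changes:
        tester = next(rotation)
        if tester == change.author:      # never self-review
            tester = next(rotation)
        assignments[change.id] = tester
    return assignments


if __name__ == "__main__":
    team = ["ana", "bo", "chi"]
    changes = [Change("CH-101", "ana"), Change("CH-102", "bo"), Change("CH-103", "ana")]
    for change_id, tester in assign_cross_testers(changes, team).items():
        print(f"{change_id}: tested by {tester}")
```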
Further out, A/B testing changes how software improves. Once a product reaches stability, an agent can propose changes and measured results decide what ships. This is one way to build trust in automated decisions: let the data validate what the agent produced. But A/B testing needs a large enough user base to produce meaningful results. Small products or internal tools won’t generate the traffic to distinguish a real improvement from noise. Someone still has to set direction; an agent won’t continue a product vision on its own. For products with the scale and the testing infrastructure, though, agents that run experiments and ship changes based on measured outcomes are coming. Software that improves itself through controlled experiments will become standard practice.
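To make the traffic point concrete, here is a minimal sketch of the kind of check an experiment-running agent would sit behind: a two-proportion z-test on conversion counts, shipping only if the lift clears a significance threshold. The numbers are invented. The same 5.0% to 5.5% lift clears the bar with heavy traffic and is indistinguishable from noise with light traffic, which is exactly why small products struggle here.

```python
"""Sketch: decide whether an agent-proposed change ships, using a one-sided
two-proportion z-test on conversion counts. Sample numbers are invented."""

from math import erf, sqrt


def z_test_ships(conv_a: int, n_a: int, conv_b: int, n_b: int,
                 alpha: float = 0.05) -> tuple[bool, float]:
    """Return (ship, p_value) for variant B vs control A (B better, one-sided)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 1 - 0.5 * (1 + erf(z / sqrt(2)))   # upper-tail probability
    return p_value < alpha and p_b > p_a, p_value


if __name__ == "__main__":
    # Same 5.0% -> 5.5% conversion lift at two traffic levels.
    print(z_test_ships(conv_a=5000, n_a=100_000, conv_b=5500, n_b=100_000))  # ships
    print(z_test_ships(conv_a=100, n_a=2_000, conv_b=110, n_b=2_000))        # noise
```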
Developers building with AI every day need to get comfortable holding back when something feels off, and saying “I don’t trust this output yet” out loud.
Few people measure that instinct. They should start.
Written by a human, refined with AI.