
Engineering Leadership Trade-offs: Build vs Buy, Tech Debt, and Rewrite vs Refactor

EM interviews often end with “the harder framing” — questions about judgment, decision-making under pressure, and how you navigate disagreement. These don’t have right answers; they have reasoned answers that demonstrate how you think. Here’s a framework for the most common ones.


Build vs Buy

The question sounds simple; the answer has layers.

The framework:

Build when:

  • This is a core differentiator — it’s what your product does, and doing it better than a vendor is a competitive advantage
  • The off-the-shelf solution is a poor fit (you’d spend more customizing than building)
  • Data or security requirements make a third-party solution unacceptable (regulated industries, data residency)
  • The vendor is a single point of failure for your core business

Buy when:

  • This is undifferentiated infrastructure — logging, payments, email delivery, search, identity
  • The vendor has years of reliability data you can’t replicate quickly
  • The total cost of ownership (build + maintain + evolve) exceeds vendor cost
  • It moves you faster to your actual differentiating work

The hidden cost of build: Build has ongoing maintenance — every feature, every bug, every on-call incident, every security patch is yours. The “2 weeks to build” becomes “2 weeks to build + 2 years to maintain.”

The hidden cost of buy: Vendor lock-in, pricing changes, feature gaps that force workarounds, API changes that break your integration, vendor going out of business.
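The TCO comparison behind "buy when" can be made concrete with a back-of-envelope model. A sketch; every figure here is an illustrative assumption, not real vendor pricing:

```python
# Back-of-envelope build-vs-buy TCO over a 5-year horizon.
# All numbers are illustrative assumptions, not real pricing or benchmarks.

def build_tco(build_weeks, eng_cost_per_week, maintain_fraction, years):
    """Initial build cost plus ongoing maintenance as a fraction of one engineer."""
    initial = build_weeks * eng_cost_per_week
    annual_maintenance = maintain_fraction * 52 * eng_cost_per_week
    return initial + annual_maintenance * years

def buy_tco(annual_license, integration_weeks, eng_cost_per_week, years):
    """Vendor license fees plus a one-time integration effort."""
    return annual_license * years + integration_weeks * eng_cost_per_week

ENG_WEEK = 4_000  # assumed fully loaded cost of one engineer-week

build = build_tco(build_weeks=6, eng_cost_per_week=ENG_WEEK,
                  maintain_fraction=0.2, years=5)   # 20% of an engineer, forever
buy = buy_tco(annual_license=30_000, integration_weeks=2,
              eng_cost_per_week=ENG_WEEK, years=5)

print(f"build: ${build:,.0f}  buy: ${buy:,.0f}")
```

Note which terms dominate: the initial build is small next to the recurring maintenance line, which is exactly the "2 weeks to build + 2 years to maintain" trap. Change `maintain_fraction` or the license fee and the answer flips, which is why the comparison has to be recurring-cost-first.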

The EM answer: “My default is buy for commodity concerns — payments (Stripe), auth (Auth0/Cognito), observability (Datadog), email (SendGrid). Build when it’s genuinely core and when buy doesn’t meet the bar. The question I ask is: ‘Five years from now, do we want to be maintaining this or building the thing that’s actually our product?’”


Evaluating a New Technology Proposal

A senior engineer wants to introduce a new technology. How do you evaluate it?

The questions to ask:

  1. What specific problem does it solve that we don’t already solve? If the answer is “it’s newer” or “more engineers are using it,” that’s not a problem definition — it’s trend-following.

  2. What’s the total cost of adoption? Migration of existing code, new expertise required, CI/CD changes, monitoring, on-call runbooks, licensing.

  3. What’s the blast radius if it doesn’t work? Can we roll it back? Is it isolated to one service or does it require system-wide changes?

  4. Who will own it? Every new technology needs an owner — someone who stays current, makes upgrade decisions, and is accountable when it breaks.

  5. What’s the reversibility? A technology that is hard to remove (e.g., one that becomes your primary database) deserves more scrutiny than one that is easy to swap out.

  6. What’s the community and ecosystem trajectory? Betting on a declining technology is worse than using a “less cool” stable one.

The EM posture: Take proposals seriously — senior engineers are closest to the technical problems. But distinguish between solving a real problem and technical novelty. Run a time-boxed proof of concept with explicit success criteria before committing.


Tech Debt: Measuring, Prioritizing, and Selling It

What tech debt actually is: A deliberate or accidental decision to ship faster now at the cost of more work later. Not all tech debt is bad — some is intentional (MVP shortcuts to validate before investing). The problem is unintentional debt (code that was written fast and never cleaned up) and ignored debt (known issues never prioritized).

Measuring it: You can’t put an exact dollar figure on it, but you can measure proxies:

  • Cycle time for changes in the debt area (slow → high debt)
  • Bug rate in the debt area (high → quality debt)
  • Developer sentiment in retrospectives (“every sprint we fight the same fire”)
  • Time spent on unplanned work
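These proxies can usually be pulled from your issue tracker. A minimal sketch, assuming issues are exported as dicts with `area`, `opened`, `closed`, and `planned` fields (the field names and data are hypothetical; adapt to your tracker's export):

```python
from datetime import date
from statistics import median

# Hypothetical issue export: field names are assumptions, adapt to your tracker.
issues = [
    {"area": "billing", "opened": date(2024, 1, 2), "closed": date(2024, 1, 20), "planned": True},
    {"area": "billing", "opened": date(2024, 2, 1), "closed": date(2024, 2, 25), "planned": False},
    {"area": "search",  "opened": date(2024, 1, 5), "closed": date(2024, 1, 8),  "planned": True},
    {"area": "search",  "opened": date(2024, 2, 3), "closed": date(2024, 2, 6),  "planned": True},
]

def cycle_time_days(area):
    """Median days from open to close for issues in one area (slow -> high debt)."""
    times = [(i["closed"] - i["opened"]).days for i in issues if i["area"] == area]
    return median(times)

def unplanned_ratio():
    """Share of all work that was unplanned (firefighting)."""
    return sum(1 for i in issues if not i["planned"]) / len(issues)

print("billing cycle time:", cycle_time_days("billing"))
print("search cycle time:", cycle_time_days("search"))
print("unplanned ratio:", unplanned_ratio())
```

The numbers are only comparative: a debt-heavy area shows up as a multiple of the healthy baseline, which is a far stronger argument than "the code is messy."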

Prioritizing it: Not all debt needs to be paid. Pay down debt that:

  • Is in the critical path — touched every sprint, high blast radius when it fails
  • Slows delivery measurably — engineers say “this would be easy if not for X”
  • Has reliability implications — known instability, poor error handling, missing monitoring
  • Is security debt — vulnerabilities that have been deferred

Don’t pay down debt that:

  • Is in rarely-touched code (stable legacy that works)
  • Costs more to fix than to tolerate
  • Will be replaced by a planned initiative anyway

Selling it to the business:

  • Translate to business impact: “This component slows every feature by 2 sprints. In 6 months, we’ll ship 3 fewer features per quarter than we could. Fixing it takes 4 weeks and unlocks this pace permanently.”
  • Don’t say “it’s the right thing to do.” Say “here’s what it’s costing us and here’s what we get back.”
  • Propose a cadence: 20% of each sprint for reliability/debt, rather than a “debt sprint” that the business sees as a sprint with no value.

Velocity vs Quality: The Tension

The business is pushing hard and wants features faster. You’re concerned about quality. How do you navigate?

The honest framing: Velocity and quality are in tension in the short term, but they’re correlated in the long term. Technical debt compounds. A team that ships 20% more features this quarter by cutting corners may ship 40% fewer features next quarter because of the bugs and slowdowns those corners created.

The data argument: “Our test coverage has dropped from 75% to 50% in the last quarter. Our production incident rate has tripled. Here’s the trend. If we continue at this pace, we’ll spend more time fighting fires than shipping features in 6 months.”

The practical negotiation:

  • Agree on explicit quality gates — a feature is done when it has tests, monitoring, and a runbook. Non-negotiable.
  • Make technical health a quarterly OKR, not just velocity.
  • Push back on scope, not quality — “We can do features X and Y at quality, or X, Y, Z at lower quality. I recommend X and Y.”

Team Disagreements: How to Resolve Without Losing the Dissenting Side

When the team is split between two technical approaches:

1. Surface the actual disagreement. Often teams think they’re disagreeing about the solution when they’re actually disagreeing about the problem, the constraints, or the criteria for success. Get these explicit.

2. Define decision criteria together. “We should choose the option that minimizes time-to-market, fits our team’s expertise, and is reversible within 6 months.” Now evaluate both options against the criteria.

3. Time-box the discussion. Endless debate is worse than a suboptimal decision. “We’ll discuss this for one more meeting, then decide.”

4. Make it reversible if possible. Start with the lower-stakes option. If it fails, course-correct. Avoid commitments that lock you in.

5. Separate the decision from the person. “Your proposal lost” feels personal. “We chose option B and here’s why” is professional. Acknowledge the merits of the losing option explicitly.

6. Give the dissenting side ownership. “You raised the strongest concerns about option B. I’d like you to own the monitoring strategy so we catch the failure mode you’re worried about early.” Converts skeptics into invested participants.
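Step 2 above, defining criteria and evaluating both options against them, can be made mechanical with a weighted scoring matrix. A sketch; the criteria, weights, and scores below are illustrative, not a recommendation:

```python
# Weighted decision matrix for two competing technical options.
# Criteria, weights, and 1-5 scores are illustrative assumptions.

criteria = {                # weights reflect what the team agreed matters
    "time_to_market": 0.4,
    "team_expertise": 0.35,
    "reversibility":  0.25,
}

scores = {                  # 1 (poor) .. 5 (strong), agreed in the same meeting
    "option_a": {"time_to_market": 4, "team_expertise": 5, "reversibility": 2},
    "option_b": {"time_to_market": 3, "team_expertise": 3, "reversibility": 5},
}

def weighted_score(option):
    """Sum of criterion weight times the option's score on that criterion."""
    return sum(criteria[c] * scores[option][c] for c in criteria)

for option in scores:
    print(option, round(weighted_score(option), 2))
```

The value is not the final number; it is that the argument moves from "which option" to "which weights," and a disagreement about weights is a disagreement about priorities that the team can resolve explicitly.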


Rewrite vs Refactor vs Leave Alone

The most fraught decision in software. Joel Spolsky’s rule of thumb, from “Things You Should Never Do”: rewriting code from scratch is “the single worst strategic mistake that any software company can make.”

Why rewrites fail:

  • The existing system has encoded years of business rules, edge cases, and bug fixes that aren’t documented. The rewrite loses them.
  • Rewrites take 2-3x longer than estimated. The business expects “6 months” and gets “18 months.”
  • By the time the rewrite is done, requirements have changed.
  • The rewrite team writes code that will eventually become the legacy system the next team wants to rewrite.

When rewrite is legitimate:

  • The technology stack is genuinely end-of-life and unsupportable
  • The architecture is fundamentally incompatible with current requirements (can’t add features without breaking everything)
  • The cost of maintaining the existing system exceeds the cost of replacement
  • You’re doing a Strangler Fig (incremental rewrite, not big bang)

Strangler Fig pattern: Route traffic for individual features to the new system progressively. The old system shrinks; the new system grows. No big bang cutover. Much safer than “we go live on day X.”
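The routing layer at the heart of the pattern can be sketched in a few lines, assuming a per-feature flag set decides which system serves each request (the feature names, `MIGRATED` set, and handler functions are hypothetical placeholders):

```python
# Strangler Fig routing: per-feature flags send traffic to the new system
# incrementally; everything else falls through to the legacy system.
# Feature names and handlers are hypothetical placeholders.

MIGRATED = {"invoices", "profile"}   # features already served by the new system

def legacy_handler(feature, request):
    return f"legacy:{feature}"

def new_handler(feature, request):
    return f"new:{feature}"

def route(feature, request):
    """Send migrated features to the new system, the rest to the old one."""
    if feature in MIGRATED:
        return new_handler(feature, request)
    return legacy_handler(feature, request)

print(route("invoices", {}))   # served by the new system
print(route("search", {}))     # still on legacy
```

Migration progress is just growing the `MIGRATED` set one feature at a time, and removing a feature from the set is an instant rollback, which is what makes this so much safer than a day-X cutover.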

Refactor when:

  • Specific modules are painful and well-understood
  • The overall architecture is sound but the implementation is messy
  • You can refactor incrementally with tests as safety net

Leave alone when:

  • The code works, nobody touches it, and the risk of introducing bugs exceeds the aesthetic cost of messy code
  • “If it ain’t broke” is a valid engineering principle for stable code

The Wrong Technical Decision Retrospective

“Tell me about a technical decision you made that turned out to be wrong. What did you learn?”

What interviewers are looking for:

  • Self-awareness and intellectual honesty
  • A structured understanding of why it was wrong (not just “it didn’t work”)
  • What you changed in your decision-making process afterward
  • That you don’t repeat the same class of mistake

Framework for the answer:

  1. The context and the decision you made
  2. What signals you had that it might be wrong (admit you had some)
  3. Why you made it anyway (time pressure, confidence bias, missing information)
  4. What happened (how did it fail, what was the impact)
  5. What you’d do differently (specific, not “I’d be more careful”)
  6. What process change or heuristic you now apply

The worst answer: “I don’t make wrong technical decisions.” The second worst: “We moved fast and broke things, that’s how you learn.” The best answer demonstrates genuine reflection and a specific change in behavior.