At the end of every working day, I tell my IDE AI to "please explain all the changes we have made today."
Then I feed the output into ChatGPT and ask: "Give me a conservative effort estimate for the following backlog."
The result is usually a number, plus a breakdown of why the number is what it is.

The Evening Ritual
Every evening I close the laptop with a simple ritual. I ask my IDE's assistant to explain, in plain language, everything that changed today: which modules moved, which interfaces shifted, which tests were added or deleted, where the seams tightened or tore. It replies with a neutral, factual log. No heroics, no storytelling, just the anatomy of the day's code work.
I take that log and hand it to an external estimator (I use ChatGPT) with a fixed prompt: "Give me a conservative effort estimate for the following backlog:" The prompt never changes. The intent never changes. Only the code does. And I'm not asking for a plan or a deadline. I'm asking for a mirror.
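If you want to script that step instead of pasting by hand, here is a minimal sketch, assuming the OpenAI Python client (v1.x); the model name and log path are illustrative placeholders, not part of my actual setup:

    # estimate.py - a minimal sketch of the evening estimator call.
    # Assumes the OpenAI Python client (v1.x); model and path are placeholders.
    from pathlib import Path
    from openai import OpenAI

    FIXED_PROMPT = "Give me a conservative effort estimate for the following backlog:"

    def estimate(log_text: str) -> str:
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        response = client.chat.completions.create(
            model="gpt-4o",  # any capable model works; just keep it identical every day
            messages=[{"role": "user", "content": f"{FIXED_PROMPT}\n\n{log_text}"}],
        )
        return response.choices[0].message.content

    if __name__ == "__main__":
        print(estimate(Path("logs/today.md").read_text()))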
The estimator returns a number and, more importantly, a decomposition of why the number is what it is. I archive the result alongside the day's log. Tomorrow I'll do it again. Over weeks, a pattern emerges: sometimes the estimated effort grows; sometimes it shrinks; sometimes it holds steady. The numbers are not my speedometer. They're my oil pressure gauge.
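The archiving is the part worth automating, because the value lives in the series, not in any single day. A sketch of how it could look; the file name and record fields are my own assumptions:

    # archive.py - sketch: store each day's estimate next to its log so the
    # series can be read back over weeks. File name and fields are assumptions.
    import datetime
    import json
    from pathlib import Path

    ARCHIVE = Path("estimates.jsonl")

    def archive(log_text: str, estimate_text: str, hours: float) -> None:
        record = {
            "date": datetime.date.today().isoformat(),
            "log": log_text,            # the IDE's factual change log
            "estimate": estimate_text,  # the estimator's full decomposition
            "hours": hours,             # the headline number, extracted by hand
        }
        with ARCHIVE.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")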
How I do it
Consistency makes this work: the same source description (the IDE's explanatory log) and the same estimator (same prompt, same assistant) every single day. The only variation I tolerate is splitting the log into user-facing behaviour (anything with user impact) and technical behaviour (changes to structure or internal arrangement rather than to what users see). The point is categorical comparability, not theatrics.
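To make the split concrete, this is roughly how the fixed prompt could encode the two categories. The wording here is illustrative; what matters is that, once chosen, it never changes:

    # The fixed estimation prompt with the categorical split.
    # Illustrative wording; the real value is in never changing it.
    FIXED_PROMPT = """Give me a conservative effort estimate for the following backlog.
    Estimate two groups separately:
    1. User-facing behaviour: anything with user impact.
    2. Technical behaviour: changes to structure or internal arrangement.
    For each group, explain the decomposition behind the number."""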
My git commit logs are the source of truth. I don't talk about work that isn't visible in the codebase, so the log excludes anything that does not lead to a technical change. That's deliberate; we will get to why soon.
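If you'd rather anchor the log mechanically than trust the assistant's memory, the day's commits can be pulled straight from git. A sketch (plain git, wrapped in Python):

    # gitlog.py - sketch: the day's work exactly as git sees it, nothing more.
    import subprocess

    def todays_commits() -> str:
        # --since=midnight limits output to today; --stat shows which files moved
        result = subprocess.run(
            ["git", "log", "--since=midnight", "--stat"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout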
I never doubt the estimate, and I never ask for a second guess unless ChatGPT made obviously wrong assumptions. I don't think I'm a hero if the number is high. Nobody knows the real numbers - except me and my dev diary. I am interested in a single question: "What did today's work say about the architecture's ability to accept change?"
Why I do it
Most development "estimation" is theater, and I don't have time for theater. I am not in a situation where it matters how long something takes - what matters is that it's done, and it works. I don't care how fast other people work - I care what I accomplish. I don't care what is easy or difficult - I care about the results I produce. So the estimates themselves are not the value.
And I don't care about quantity of output either - what I am concerned with is malleability: the rate at which the system can absorb change. This one property tells me whether the system is getting better or deteriorating over time. The reading might look like this:
Interpreting AI estimates as "architectural telemetry" flips the estimation narrative on its head. When the estimates rise over time, the system is absorbing larger changes than before: the boundaries are clear, the tests reliably catch issues, the code is adaptive, and the coupling between components is honest. A decline in estimates, by contrast, usually traces back to poor changes: weak modularity, clean-code violations, WET code, "unidentifiable bugs," unfixable tests, non-working features. These all steal my time and result in fewer changes integrated over the course of a day.
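Reading that trend out of the archive is easy to sketch. The window size and the 15% thresholds below are arbitrary assumptions; tune them until the signal is readable for you:

    # trend.py - sketch: compare recent estimates against the longer baseline.
    # Window and thresholds are arbitrary; tune them to your own rhythm.
    import json
    from pathlib import Path

    def trend(path: str = "estimates.jsonl", window: int = 5) -> str:
        hours = [json.loads(line)["hours"]
                 for line in Path(path).read_text().splitlines() if line.strip()]
        if len(hours) < 2 * window:
            return "not enough data yet"
        recent = sum(hours[-window:]) / window
        baseline = sum(hours[:-window]) / len(hours[:-window])
        if recent > baseline * 1.15:
            return "rising: the system is absorbing more change"
        if recent < baseline * 0.85:
            return "falling: stop and look for debt"
        return "holding steady"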
So what I end up with is not a developer performance metric - it's an architectural quality metric, and an extremely powerful indicator of Technical Debt. This measure is far more connected to business value than the throughput metrics most teams track: if I can do three days of estimated work in one day, my engineering efforts are highly valuable. If it were half a day, I'd be better off throwing away the entire codebase and starting from scratch.
This routine also helps me check cognitive drift. Immersion blurs judgment. The AI, as an external estimator given a fixed description, stays objective enough to challenge my gut. It is neutral: it has no incentive to make me feel good or bad about my work. It is pure consequence, and that removes emotional justification from the process.
How you can use this
You do not need a specific tool to adopt this. All you need is a rhythm: a factual daily log, a consistent external estimator, and a small note about your architectural intent. Done steadily, the series becomes a quiet diagnostic of design health. You will notice when your system starts resisting meaningful change, and you will see the payoff of investments in code quality long before defect curves make it visible.
There's a temptation to optimize the number, and managers may be tempted to hunt for high ones. Resist it: Goodhart's Law is inevitable if you try. The entire approach only works because it is not about making numbers go up. The number is just a mirror, and it helps you reflect on key design questions: "Why was this so hard when the AI said it should be easy?" - "What can we do to make it easier without getting ourselves into a pickle tomorrow?" - "What prevents or causes the deterioration of change integration?" The answers will relentlessly point out the weak spots in both your architecture and your development approach.
Over time, you will notice the codebase becoming calmer. Working this way and asking these questions frequently leads to Relentless Improvement. No tests? You did poorly. Tight coupling? You blew the estimate. Hardcoding? You can't keep it. And that's the value of the exercise: a leading indicator of architectural quality that you can gather with nothing more than honesty and repetition.
Tips & Tricks
When the estimate for a day's work increases after a deliberate cleanup, you're on the right track. Keep going. You've unlocked leverage: the same unit of intent now reaches farther into the system.
When estimates decrease without special causes (e.g., a day full of meetings, or sickness), stop and look for debt: hidden coupling, missing tests, undocumented contracts.
When estimates stay constant while scope expands, confirm that your boundaries are carrying their weight; if they are, you've found equilibrium worth defending.
The point of the AI Estimation metric is not to go faster. The point is to stay flexible. True agility in software development is "the ability to achieve more tomorrow than you could today, with less sweat or fear." When used in this way, AI estimates will tell you whether you're creating or losing that freedom.