The problem with shipping not-good-but-works design, both technical and product, is the cascade effect. Take two examples:
- You make a popover with ugly borders. Whether it's the first one or the second, it does its job but doesn't really look good, and it carries two bigger problems: (a) like a building with one broken window, it lowers the quality bar for the whole product, and (b) it becomes poor context for every subsequent agent that works with popovers in the future
- The backend is even trickier, because even well-coded, autonomous features aren't necessarily good for the overall system, and these features are much larger than on the frontend. Say you're shipping a new flow for automatic background refresh: you have a good plan and the latest Codex on execution, but Codex never got the whole system context, there was already a partial implementation of similar refresh logic, and you end up with two same-function pieces with different conventions running side by side, sketched below
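To make that concrete, here is a minimal sketch of such duplication; the module, event, and class names are all hypothetical:

```typescript
// legacy/refresh_worker.ts -- the pre-existing partial implementation:
// snake_case events, setInterval polling, errors silently swallowed.
import { EventEmitter } from "node:events";

export const bus = new EventEmitter();

export function start_refresh_worker(interval_ms: number): NodeJS.Timeout {
  return setInterval(() => {
    bus.emit("refresh_started");
    // ...fetch and cache fresh data, fire-and-forget...
    bus.emit("refresh_finished");
  }, interval_ms);
}

// refresh/scheduler.ts -- the new agent-written flow: camelCase,
// async/await, explicit error propagation. Same job, different conventions.
export class RefreshScheduler {
  constructor(private readonly refresh: () => Promise<void>) {}

  async runForever(intervalMs: number): Promise<never> {
    while (true) {
      await this.refresh(); // errors bubble up instead of being swallowed
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
}
```

Each piece works on its own; the damage is that every future agent now has two conflicting conventions to imitate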
There is a big difference between incomplete software that does the thing the right way and complete software that does the thing in a problematic way, even though both are broken. The first is a matter of a low-hanging fix or a proper follow-up; the second is a ticking bomb that eventually has to be defused properly, multiplying the initial effort
To better understand this point, we need to agree on one fact of the LLM-assisted era:
writing software is effortless; the only real complexity lies in good plans and a deep, global understanding of the system
Note that I'm not talking about premature optimization. The points above are not about scaling the system for high load, but about having it work as intended at this concrete moment, without infecting it with poor choices that could have been rejected in review or reconsidered with better system understanding
Just as customer trust is built through high-quality service, developer trust is built through high-quality contributions. If 80%+ of pull requests can't guarantee their own correctness and end up collecting comments, a review process is required
To make reviews faster, I intentionally don't leave standalone nitpick comments, like asking to move LLM calls into a shared prompt module (sketched below). But when another change is required anyway (failing tests, architectural problems, or even just merge conflicts), it makes no sense to skip the low-hanging fruit, since the agent will be run on that change set regardless
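For illustration, here is a minimal sketch of such a shared prompt module, assuming the OpenAI Node SDK; the file name, task list, and model choice are hypothetical:

```typescript
// prompts.ts -- one place for model choice, system prompts, and call
// conventions, instead of ad-hoc LLM calls scattered across features.
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

const SYSTEM_PROMPTS = {
  summarize: "Summarize the user's text in three sentences.",
  classify: "Classify the user's text as 'bug', 'feature', or 'question'.",
} as const;

export async function runPrompt(
  task: keyof typeof SYSTEM_PROMPTS,
  input: string,
): Promise<string> {
  const completion = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: SYSTEM_PROMPTS[task] },
      { role: "user", content: input },
    ],
  });
  return completion.choices[0].message.content ?? "";
}
```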
I also do not believe you can iterate faster by eliminating the review process in a large system with shared ownership and mixed experience levels. It's like eating hamburgers for energy before a marathon: eventually you'll regret it
Another point to keep in mind: the longer you build the product, the harder the next step becomes. Each new feature adds ever-growing support complexity to the existing product and the tech behind it. That's the #1 reason companies have hundreds of engineers while shipping a few features per year: the bigger the product, the harder it is to sustain
The only ways to overcome this are (a) investing in quality from the beginning and (b) investing in hiring in advance; successful scale-ups tend to do both
There are a few practices that help iterate faster without losing quality:
- better scoping of developers' responsibilities
When an engineer owns a single subsystem end to end, they know every bit of it, and their cognitive bandwidth is free to fine-tune LLM outputs as intended. Clear scoping is harder within the same repository boundaries, but well-bounded domain ownership is a good example of the scoped responsibility we should strive for
- increasing developers' overall experience
If an LLM proposed tracking a graph-shaped relationship in Mongo, but you knew you had Neo4J for exactly that purpose, you wouldn't commit to the suggestion, even without deep familiarity with the system. We can raise our collective experience through things like more learning-share sessions for newly adopted tools and frameworks, more pair-programming sessions pairing Seniors with non-Seniors, and a culture of it's-okay-to-fail-fast and no-DMs
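As a sketch of the Neo4J route, assuming the official neo4j-driver package and a hypothetical follows-graph of users:

```typescript
// Graph-shaped questions stay in the graph store instead of being
// emulated with arrays of ids inside Mongo documents.
import neo4j from "neo4j-driver";

const driver = neo4j.driver(
  "bolt://localhost:7687", // connection details are placeholders
  neo4j.auth.basic("neo4j", "password"),
);

export async function mutualFollowers(a: string, b: string): Promise<string[]> {
  const session = driver.session();
  try {
    // One Cypher pattern instead of N round-trips over id arrays
    const result = await session.run(
      `MATCH (x:User {id: $a})<-[:FOLLOWS]-(f:User)-[:FOLLOWS]->(y:User {id: $b})
       RETURN f.id AS id`,
      { a, b },
    );
    return result.records.map((record) => record.get("id"));
  } finally {
    await session.close();
  }
}
```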
- gating the LLMs with proper functional and architecture testing
Functional testing we already have. Architecture testing is a big, worthwhile task to take on, and it's my fault alone that we didn't establish it earlier
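One hand-rolled way to gate it, assuming a Jest setup and a hypothetical layered layout where src/domain must never import from src/infra (dedicated tools like dependency-cruiser do this more thoroughly):

```typescript
// arch.test.ts -- a minimal architecture test: fail the build whenever
// a domain-layer file imports anything from the infrastructure layer.
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Requires Node 18.17+ for the recursive readdir option
function tsFiles(dir: string): string[] {
  return readdirSync(dir, { recursive: true, encoding: "utf8" })
    .filter((file) => file.endsWith(".ts"))
    .map((file) => join(dir, file));
}

test("domain layer does not depend on infrastructure", () => {
  const offenders = tsFiles("src/domain").filter((file) =>
    /from\s+["'][^"']*\/infra(\/|["'])/.test(readFileSync(file, "utf8")),
  );
  expect(offenders).toEqual([]);
});
```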
- guiding LLMs with better system context
We partially have this, but it makes a lot of sense to do a full revision of comments, docs, and agents.md, making sure every agent follows the conventions
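A hypothetical excerpt of what such a conventions section in agents.md could look like:

```markdown
## Conventions (excerpt)

- Popovers: always use the shared <Popover> component; never restyle borders inline
- Background jobs: extend RefreshScheduler; do not add new setInterval loops
- LLM calls: go through the shared prompt module; no inline prompts in feature code
```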
- investing in observability early
If something goes wrong, you don't want to dig through raw logs and traces by hand. The more observability we have over the system, the easier it is to fix faults, and therefore the easier it is to cover the faulty behavior with more tests and benchmarks. I think we have a good base now
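As one example of what that buys us, a minimal sketch using the @opentelemetry/api package; the tracer and span names are hypothetical:

```typescript
// Wrap the background refresh in a span so failures show up in traces
// instead of requiring a manual dig through raw logs.
import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("background-refresh");

export async function tracedRefresh(refresh: () => Promise<void>): Promise<void> {
  await tracer.startActiveSpan("refresh.run", async (span) => {
    try {
      await refresh();
      span.setStatus({ code: SpanStatusCode.OK });
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err; // still fail loudly; the span just records it first
    } finally {
      span.end();
    }
  });
}
```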
There is a big difference between running toward a concrete milestone and executing a long-term vision, and your approach will morph a few more times as you scale the product and your ambitions. I'm personally psyched to keep rotating through such execution cycles for life