A software system is a bit like an onion: a large enough system has many layers. At the centre is the core, on which many other parts of the system depend. The outer shell represents the parts of the system that depend on the layers underneath, while nothing depends on them.
Looking at a software system this way makes it clear that not every part of the system has an equally big impact on the system as a whole. If there's an issue at the core, it can potentially affect every other part of the system. Conversely, if there's an issue in the outer shell, it only affects that outer shell, because nothing depends on it.
The story doesn't stop at explicit issues or bugs though. Let's say the core contains a very bad abstraction for its underlying concept, leaking implementation details or requiring users of the core to jump through hoops to use the underlying functionality. This will have an impact on the layer that directly depends on the core, but there's a reasonable chance the impact will spread over the entire system. The onion has a rotten core.
Reasons for a rotten onion
Nobody wants to end up with a system like this, but it ends up happening anyway. I think there are three main reasons for this:
- Lack of clarity: It's not always clear what is core, or the core grew organically over time without anyone deciding it should
- Lack of segmentation: The core parts aren't treated differently from the rest of the system
- Lack of investment: Issues at the core are way more costly to fix
Lack of clarity
It's not always clear how many dependents there are for the piece of logic you are looking at. If you are working within a monolith or mono-repo, you can find all references to a specific piece of logic. When you are working in a distributed system or the core is exposed as a library, this becomes more complicated.
What can also happen is that certain functionality is implemented on the outer layer, but gradually over time, more and more other parts of the system start depending on it. This way, the functionality moves further and further towards the core, even though it didn't start there.
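To make "how many dependents does this have" a bit more concrete, here is a minimal sketch in Python. It assumes you already have a dependency map from somewhere (the module names below are made up for illustration) and counts every part of the system that depends on a module, directly or indirectly. It is not a real tool, just one way to express the idea.

```python
# Hypothetical dependency map: each module lists the modules it depends on.
DEPENDS_ON = {
    "checkout_ui": ["orders", "auth"],
    "admin_ui": ["orders", "auth", "reporting"],
    "reporting": ["orders"],
    "orders": ["storage"],
    "auth": ["storage"],
    "storage": [],
}


def transitive_dependents(depends_on):
    """Map each module to the set of modules that depend on it, directly or indirectly."""
    # Invert the edges: for each module, collect who depends on it directly.
    direct = {module: set() for module in depends_on}
    for module, deps in depends_on.items():
        for dep in deps:
            direct.setdefault(dep, set()).add(module)

    result = {}
    for module in direct:
        seen = set()
        stack = list(direct[module])
        # Walk outwards through the dependents of dependents.
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(direct.get(current, ()))
        result[module] = seen
    return result


if __name__ == "__main__":
    dependents = transitive_dependents(DEPENDS_ON)
    # Modules with the most dependents sit closest to the core.
    for module in sorted(dependents, key=lambda m: len(dependents[m]), reverse=True):
        print(module, len(dependents[module]))
```

Running something like this periodically would also show functionality drifting towards the core: a module whose number of dependents keeps growing is becoming core, whether or not anyone intended it.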
Lack of segmentation
Organisations have a certain engineering culture which can span many topics, such as testing culture, engineering best practices, delivery velocity expectations, design processes, and so on. All teams within the organisation follow similar rules, even though the potential impact of their work can vary wildly.
Lack of investment
It's often a hard sell to fix issues that are very expensive to fix but don't have immediate value. This issue becomes even worse because it's nearly impossible to put a number on the long-term costs. Once decisions in a big enough company have a large enough impact, numbers are often required to make a case for the investment. Without concrete numbers, the chances of these investments being made are simply very small.
Cultivating a healthy onion
I believe the solution to these issues is to be more explicit and to trust the judgement of the people working on the core systems.
Clarity
It's important to have an up-to-date architectural overview of the system at all times. This makes dependencies explicit and gives the opportunity to explicitly label parts of the system according to their importance or how close to the core they are. This also makes it clear where to put the best engineers, a point that comes back under "Investment".
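As a rough illustration of what explicit labels make possible, the sketch below assumes a hand-maintained catalogue in which every component gets a layer number (0 for the core, higher numbers further out). With that in place, a small check can flag dependencies that point outwards, from a component to something further from the core. The component names, the layer numbers and the deliberately wrong dependency are all hypothetical.

```python
# Hypothetical catalogue: layer 0 is the core, higher numbers are further out.
LAYER = {
    "storage": 0,
    "orders": 1,
    "auth": 1,
    "checkout_ui": 2,
}

# Hypothetical dependencies: each component lists what it depends on.
DEPENDS_ON = {
    "storage": [],
    "orders": ["storage"],
    "auth": ["storage", "checkout_ui"],  # deliberately wrong: points outwards
    "checkout_ui": ["orders", "auth"],
}


def outward_dependencies(layer, depends_on):
    """Return (component, dependency) pairs where the dependency is further from the core."""
    violations = []
    for component, deps in depends_on.items():
        for dep in deps:
            if layer[dep] > layer[component]:
                violations.append((component, dep))
    return violations


if __name__ == "__main__":
    for component, dep in outward_dependencies(LAYER, DEPENDS_ON):
        print(f"{component} (layer {LAYER[component]}) depends on {dep} (layer {LAYER[dep]})")
```

A check like this is only as good as the catalogue behind it, which is another reason to keep the overview up to date.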
Segmentation
It's okay to have deviating practices for different parts of the system. Having clarity on which parts of the system are considered core makes it easier to know where these deviations are applicable. For example, teams working on the core could spend more time on up-front design, follow stricter testing practices or simply face less delivery pressure.
Investment
Engineers are humans and humans make mistakes. Regardless of how much of an effort was made on clarity and segmentation, eventually, there will still be a need for large investments. Instead of requiring engineers to come up with a business case, trust their judgement. If you don't trust their judgement, why would you let them work on the most important parts of your software anyway?
Conclusion
Engineering is hard, especially when working in large-scale systems. There will almost always be core systems that many others depend on. It's important to keep these core systems healthy to avoid the other layers of the software onion becoming rotten too. A healthy software onion can be achieved by bringing clarity as to which systems are core, by allowing different engineering practices for core systems and by not being scared to make large investments.