Software Engineering as a discipline often doesn’t feel very mature. Although it is called engineering, much of it comes down to gut feeling and reinventing the wheel. Most other engineering disciplines take a more scientific, data-driven approach; in software, the lack of data is what pushes decisions back to gut feeling. I’ve discussed Tech Debt in ♻️ Sustainable Software, and I think part of the problem there is insufficient data as well. If it were easier to show the true cost of Tech Debt, I think the conversation would change. That is where the idea of measuring software complexity comes in.
Static Code Analysis tools like SonarQube and Detekt already point out Software Complexity, such as high Cyclomatic Complexity or functions with too many input parameters. Those tools are usually limited to very localised complexity, which often leads to the warnings being ignored.
Goal
Show that high-quality software is worth the investment.
Software Quality is essential but often overlooked or treated as a lower priority.
Concept
Rely as little as possible on data gathered manually from engineers, and instead use metrics that can easily be calculated from existing systems:
- Runtime data
  - How often are lines executed
  - How often are certain branches taken
  - This can be based on automated tests too, although it’s not 100% representative
- Version Control data
  - Code hotspots (= how often are certain files updated)
  - Code hotspots + code coverage by automated tests
  - How much do tests have to change as a result of a code change (a lot of changing tests is usually a bad sign)
  - Unchanged code (good code doesn’t have to change because it works, bad code usually isn’t changed because nobody dares to touch it)
Since most of the data should come from runtime data of tests plus version control, the approach can be tried out on existing open-source projects. A list of big open-source projects can be found in this blog post by GitHub on open source.
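The runtime side could be gathered while the automated tests run. Below is a minimal Python sketch that counts line executions with the standard `sys.settrace` hook. The `count_line_executions` helper and the `classify` function under test are made up for illustration, and a real tool would more likely reuse the output of an existing coverage tool.

```python
import sys
from collections import Counter


def count_line_executions(func, calls):
    """Run `func` once per argument tuple and count executed lines.

    Keys are line offsets relative to the `def` line of `func`.
    Only lines inside `func` itself are counted, not its callees.
    """
    counts = Counter()
    target = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not target:
            return None  # skip frames of other functions
        if event == "line":
            counts[frame.f_lineno - target.co_firstlineno] += 1
        return tracer  # keep receiving events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        for args in calls:
            func(*args)
    finally:
        sys.settrace(previous)  # restore whatever hook was active before
    return counts


# Hypothetical function under test with two branches.
def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"
```

Called as `count_line_executions(classify, [(5,), (7,), (-1,)])`, the counts for the two `return` lines show how often each branch was taken, which covers both the "lines executed" and "branches taken" questions for this small case.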
It makes the most sense to focus on the most widely used programming languages first. GitHub has compiled a list of the top 10 programming languages, which I can use as a baseline.
Other data would be useful too, but is probably harder to gather realistically:
- Why the code changed (new feature vs. bug fix)
Proof of Concept
A static analysis tool written in 🧑‍💻️ Kotlin focusing on the biggest open-source projects. ⌨️ TypeScript is one of the biggest languages right now, so the initial version will focus on that language. The open-source project used as an initial reference point is vscode.
This means I can’t dogfood the tool (run it on its own code), but I prefer Kotlin as a language to work in.