A reliable strategic model will inherently incorporate Key Performance Indicators (KPIs) for gauging success. Peter Drucker's philosophy, "What gets measured gets managed," resonates with managers and leaders across all industries, including software engineering. Over time, various metrics have been explored to encapsulate the progress, outcomes, and business value of software. However, the question remains: which are the most valuable and appropriate ones?
This paper reflects on experiences from my past projects, insights from leaders and peers, and scholarly articles on Metrics, Observation/Monitoring, and their impact. While we will delve into 'good' metrics, this is not a magic list of metrics. We aim to help you understand how to design metrics that are valuable in your context by exploring the intersection of Psychology, Engineering, and Business.
Improperly chosen metrics can inadvertently promote undesirable behaviors, unhealthy competition, and even undermine team cohesion. Therefore, it's vital to select metrics that encourage collaboration and progress rather than merely individual performance.
The Cobra effect, Campbell's law, and Goodhart's law illustrate how measuring the wrong thing or measuring incorrectly can have significant unintended consequences.
The Cobra effect:
You may have heard this story or one of its variants about this psychological effect. The British government, concerned about the number of venomous cobras in Delhi, India, offered a bounty for every dead cobra. Initially, this was a successful strategy; large numbers of snakes were killed for the reward. Eventually, however, enterprising people began to breed cobras for income. When the government became aware of this, the reward program was scrapped. When cobra breeders set their now-worthless snakes free, the wild cobra population increased further.
The cobra effect is the most direct kind of perverse incentive, because the incentive unintentionally rewards people for making the issue worse. It is also cited as an example of two “laws” that are useful to understand in this context.
Campbell's law is an adage developed by Donald T. Campbell, a psychologist and social scientist who often wrote about research methodology. It states: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”
Goodhart's law is an adage often stated as “When a measure becomes a target, it ceases to be a good measure” (although Goodhart's original statement is worded differently and is more complicated).
What the Cobra effect and the above 2 adages state is that measuring the wrong thing or measuring in the wrong way can have significant unintended consequences. Let me illustrate this with a real-life story:
The move to agile development models caused significant heartburn among managers and HR: with the focus on the “Team”, how do we identify the best developers/engineers? Intense discussions, Google searches, and product evaluations resulted in a manager dashboard, a secret one that tracked individual performance. Lines of code committed, Jira stories/tickets, individual velocity, and defects attributed to contributed code were some of the metrics on the dashboard.
Result: As you can imagine, after the first performance appraisal, overall productivity tanked while individual performance numbers soared. Team members found out that not only were they being measured on some unknown metrics, but they were also stack-ranked based on these metrics. Here’s what happened when developers started to game the system:
Lines of code committed – verbose, inelegant code where a few lines or an existing library function would have sufficed, leading to more defects and tech debt.
Jira stories/tickets – reports broken down into mini pieces and then cross-referenced - the dependency diagram resembling the Death Star!
Individual velocity – Every story was now padded for estimation, with no one challenging the estimate. 3-point stories were now 5 points or even 7 points. Edge cases were not unit tested, while trivial use cases had lots of tests to maintain coverage.
Defects reported against code – an increase in finger-pointing in every team meeting, as most features were the result of multiple team members contributing to them.
Is this only true for software engineering? Are other fields also susceptible to this effect? The answer is yes.
Similar outcomes were observed in a study conducted in the Netherlands titled “Stripped of Agency: The Paradoxical Effect of Employee Monitoring on Deviance” (Ref #1). The study highlights that reliance on performance measures might have unintended negative impacts on long-term performance and other less tangible aspects like commitment and collegiality.
Another study, “A meta-analysis of the effects of electronic performance monitoring on work outcomes”, published in Personnel Psychology (Ref #2), illustrates the principle that some forms of monitoring, such as electronic monitoring of employees via desktop/screen capture tools, can negatively impact the elusive “employee satisfaction score”, with no evidence of positive outcomes.
What about reputed business management and leadership institutions that often promote KPIs and metrics as business tools? Yes, the business literature also has multiple articles from reputed sources that caution against using just numbers for decision-making.
A 2022 Harvard Business Review article titled “Monitoring Employees Makes Them More Likely to Break Rules” (Ref #3) echoes the scientific findings: monitoring makes employees feel that they are not trusted by their employers. However, transparency and well-directed monitoring can help mitigate the side effects, for example by assuring employees that minor deviations never go into the HR file.
Michael Harris and Bill Tayler take a strategic perspective in their 2019 HBR article “Don’t Let Metrics Undermine Your Business – An obsession with the numbers can sink your strategy” (Ref #4), stating that data does not always tell the whole story, and that a focus on some metrics without considering other dimensions to balance them out can lead to significant damage to the business. They offer numerous examples, including that of Wells Fargo, where a focus on new accounts opened resulted in a class-action lawsuit that ultimately led to a $350 Million penalty, with another $1 Billion fine from regulatory authorities.
Sometimes, organizations point to the “Hawthorne Effect”, where individuals modify an aspect of their behavior in response to their awareness of being observed, as evidence that monitoring has a positive effect. However, the original study was poorly designed and uncontrolled, and its results point to a different interpretation.
People want to feel valued, rewarded, and noticed for their work. The Hawthorne experiment did this unintentionally, and its success was ascribed to monitoring rather than to making workers feel part of the organization!
We now have a good understanding of how NOT to create metrics, so we can turn our attention to what can be measured.
The answer to this depends on your engineering maturity, the stage of the SDLC you are in, the type of product you are developing, the size of your organization, and several other factors.
In a recent book titled “Rewired” [Ref #5], authors from McKinsey, themselves leaders in Digital and AI, dedicate a whole chapter to the performance measures for Digital Transformation. In subtle, but clear terms, they warn that poor selection of metrics and supporting infrastructure can be disastrous. They offer a model based on OKRs and value driver trees to determine the right KPIs at different levels of the organization.
However, what is still missing from the model is the interdependency between those drivers and the impact of different leaders focusing on each metric in isolation. Even so, this is still one of the best frameworks for KPIs, based on industry best practices such as OKRs and the Goal-Question-Metric methodology. A well-designed KPI structure and agreement from everyone involved on how the metrics will be interpreted are critical for a Performance Measurement initiative.
The DORA metrics, viz., Lead Time for Changes, Mean Time to Recovery (MTTR), Deployment Frequency, and Change Failure Rate, supported by other metrics such as cost vs. revenue/value by feature and outstanding defects of high severity, are some of the metrics used in high-performing organizations. The warnings in this article and in the scientific research still apply: avoid blaming or measuring individuals, don’t compare teams, investigate negative trends, and use statistical analysis rather than simple averages.
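To make the last point concrete, here is a minimal Python sketch of how the four DORA metrics might be summarized at the team level. The Deployment record, its field names, the dora_summary function, and the sample data are all hypothetical and chosen only for illustration; a real pipeline would pull equivalent timestamps from your own CI/CD and incident tooling. Note that the records deliberately carry no author or team-member field.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical team-level deployment record (no individual attribution)."""
    committed_at: datetime                   # first commit of the change
    deployed_at: datetime                    # change reached production
    failed: bool = False                     # deployment caused a production failure
    restored_at: Optional[datetime] = None   # service restored (failed deployments only)


def _hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_summary(deployments: list, period_days: int) -> dict:
    """Summarize the four DORA metrics for one team over one reporting period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")

    lead_times = [_hours(d.deployed_at - d.committed_at) for d in deployments]
    failures = [d for d in deployments if d.failed]
    restore_times = [
        _hours(d.restored_at - d.deployed_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deployments_per_week": len(deployments) / (period_days / 7),
        # Report both so the effect of outliers on the mean is visible.
        "lead_time_mean_h": mean(lead_times),
        "lead_time_median_h": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "time_to_restore_median_h": median(restore_times) if restore_times else None,
    }


# Illustrative data: five deployments in a 14-day period, one slow and failed.
_start = datetime(2024, 1, 1, 9, 0)
SAMPLE = []
for day, lead_h in enumerate([2, 3, 4, 5, 86]):
    committed = _start + timedelta(days=day)
    SAMPLE.append(Deployment(committed, committed + timedelta(hours=lead_h)))
SAMPLE[-1].failed = True
SAMPLE[-1].restored_at = SAMPLE[-1].deployed_at + timedelta(hours=1.5)

summary = dora_summary(SAMPLE, period_days=14)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With this sample data, the mean lead time is 20 hours while the median is 4 hours: a single slow change dominates the simple average, which is why a median (or a percentile range) plus an investigation of the outlier tends to be more informative than the average alone.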
For a comprehensive analysis of your current situation, recommended metrics, and a Value-stream analytics & Insights product that brings our thought process into pre-built metrics and dashboards, contact us at Qentelli. Our consulting and implementation services can accelerate your KPI framework Definition and Implementation journey by many orders of magnitude.
Citations:
1. Thiel, C. E., Bonner, J., Bush, J. T., Welsh, D. T., & Garud, N. (2023). Stripped of Agency: The Paradoxical Effect of Employee Monitoring on Deviance. Journal of Management, 49(2), 709–740. https://doi.org/10.1177/01492063211053224
2. Ravid, D. M., White, J. C., Tomczak, D. L., Miles, A. F., & Behrend, T. S. (2023). A meta-analysis of the effects of electronic performance monitoring on work outcomes. Personnel Psychology, 76, 5–40. https://doi.org/10.1111/peps.12514
3. Monitoring Employees Makes Them More Likely to Break Rules. Harvard Business Review, 2022. https://hbr.org/2022/06/monitoring-employees-makes-them-more-likely-to-…
4. Don’t Let Metrics Undermine Your Business – An obsession with the numbers can sink your strategy. Harvard Business Review, 2019. https://hbr.org/2019/09/dont-let-metrics-undermine-your-business
5. Rewired: The McKinsey Guide to Outcompeting in the Age of Digital and AI. Wiley, 2023.