Measuring Developer Productivity Without Making Developers Hate You

The fastest way to ruin a developer productivity conversation is to make developers feel like they are being watched. That is especially true with AI.

As organizations adopt tools like GitHub Copilot, leaders naturally want to understand the return on investment. That is fair. These tools cost money, and the business should expect value. But if the measurement strategy is just “how many lines of code did people write?” or “how many hours did Copilot save?”, the organization is probably measuring the wrong thing. Developer productivity is not typing speed.

Software delivery is a system. It includes understanding requirements, designing solutions, writing code, reviewing changes, testing, securing, deploying, operating, and learning from feedback. AI can help across many of those steps, but the value is not always captured by simple activity metrics. A developer might use Copilot to understand a legacy codebase faster. That may not show up as more lines of code, but it could reduce onboarding time and lower the risk of a bad change.

A team might use AI assistance to write better tests. That may slow down the first commit but improve quality and reduce production issues. A security team might use AI-assisted remediation to help developers fix vulnerabilities faster. That is not just productivity. That is risk reduction. A platform team might use Copilot to help developers understand deployment workflows. That could reduce support tickets and improve release confidence.

Those are valuable outcomes. They are also harder to measure than raw activity. So what should organizations measure?

I like to start with four lenses: flow, friction, quality, and sentiment.

Flow asks whether work is moving through the system more effectively. Are pull requests getting reviewed faster? Are cycle times improving? Are teams deploying more consistently?

Friction asks where developers are getting stuck. Are they spending less time figuring out unfamiliar code? Are they opening fewer support tickets? Are they resolving build and deployment issues faster?

Quality asks whether the work is getting better. Are tests improving? Are defects going down? Are security findings remediated faster? Are standards being applied more consistently?

Sentiment asks whether developers actually feel more effective. Do they trust the tools? Do they feel supported? Do they understand the guidance? Are they using AI in ways that help their work?
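To make the flow lens concrete, here is a minimal sketch of how the first two flow signals above (time to first review and open-to-merge cycle time) could be computed from pull request timestamps. The `PullRequest` shape and field names are assumptions for illustration; in practice these timestamps would come from your platform's API (for example, GitHub's pull request and review endpoints).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median

@dataclass
class PullRequest:
    # Hypothetical fields; map these to whatever your PR data source provides.
    opened_at: datetime
    first_review_at: datetime
    merged_at: datetime

def median_hours(deltas):
    """Median of a list of timedeltas, expressed in hours."""
    return median(d.total_seconds() for d in deltas) / 3600

def flow_metrics(prs):
    """Two simple flow signals from PR timestamps:
    how long PRs wait for a first review, and open-to-merge cycle time.
    Medians are used so a few long-lived PRs do not dominate the picture."""
    return {
        "review_latency_h": median_hours(
            [pr.first_review_at - pr.opened_at for pr in prs]
        ),
        "cycle_time_h": median_hours(
            [pr.merged_at - pr.opened_at for pr in prs]
        ),
    }
```

The point of a sketch like this is trend, not absolute numbers: track these medians per team over time, before and after an AI rollout, rather than comparing individuals against each other.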

This is where the conversation becomes less about surveillance and more about improvement. The goal should not be to rank individual developers based on AI usage. That is a great way to destroy trust and encourage bad behavior. The goal should be to understand whether the engineering system is getting healthier.

That distinction matters.

A good measurement program should help teams identify bottlenecks, improve practices, and make better investment decisions. It should not turn into a dashboard that punishes developers for complex work that does not look productive in a simplistic metric.

AI also changes what productivity means. If Copilot helps a developer generate code faster, that is useful. But the greater return may come from consistency. Better pull request summaries. Faster onboarding. More complete tests. Faster security remediation. More repeatable workflows. Less architectural drift. More knowledge captured in the development process.

That kind of value compounds. It is not just time saved. It is innovation velocity. Not velocity in the sense of “everyone go faster at all costs,” but velocity as the organization’s ability to turn ideas into reliable, secure, maintainable software with less friction.

That is the measurement conversation worth having. If developers feel like measurement is being done to them, they will resist it. If developers see that measurement is being used to remove friction and improve the system, they are much more likely to engage. So yes, measure Copilot and AI-assisted development. But measure the right things. Measure flow. Measure friction. Measure quality. Measure sentiment. And most importantly, use the data to make the work better.

Not to make developers hate the dashboard.
