Friday, November 27, 2015

The Manager as Debugger

I have observed that the best engineering managers I know are often also great debuggers. Why would this be? What is it about these two tasks that has such an overlapping skill set?

A great debugger is relentless in their pursuit of the “why” for a bug. This is simple when we are looking for errors in application logic, but we all know that bugs can go many layers deep, particularly in complex systems that involve many separate parts operating over time-delayed networks. A sign of a poor debugger is a person who, when they add a log statement to a piece of concurrent code to attempt to find an error and see that the error can’t be reproduced, assumes that they have therefore fixed the problem. It’s a lazy habit but a common one. Sometimes there are problems that seem impossible to determine, and many people don’t have the patience to dig through layers of code, theirs and others’, log files, system settings, and whatever else is needed to get to the bottom of something that only happened once. I can’t blame them. Obsessive debugging of one-off issues is not always a great use of your time, but it does show a mind that cannot be satisfied with the unknown, especially when that unknown might cause them to be paged at two o’clock in the morning.

What does this have to do with management? Managing teams is a series of complex, black boxes interacting with other complex, black boxes. These black boxes have inputs and outputs that can be observed, but when the outputs aren’t as expected, figuring out why requires trying to open up the black box and see what is going on inside. And, just as sometimes you don’t have the source code, or the source code is in a language you don’t understand, or the log files aren’t readable, the black boxes of teams can resist yielding their inner workings. 

Teams also share the characteristic of another famous box, the one containing Schrödinger’s Cat. The point of that experiment is to show that the act of observing changes the outcome, or rather, causes an outcome to happen. You can’t go into a team and not change the behavior of that team by being around them, sitting in their meetings, watching their standups. Your presence changes the team behavior and may hide the problem you are trying to find, in the same way that a log statement can cause a concurrency issue to be magically erased, at least for some time.

To debug a system properly you have to be able to have a reasonable hypothesis that explains how the system got into the failed state, preferably one that you can reproduce, so that you can fix the bug (or at least prevent it from happening in the future). To debug a team, you also want to attempt to get a hypothesis for why the team was having problems. You want to do this in as minimally-invasive way as possible, to prevent your meddling from obscuring the problems. You have the added challenge of team problems not generally being single failures but more like performance issues, the system is running but it seems to slow down from time to time, the machines are ok except occasionally they crash, people seem happy but attrition is too high. 

Let’s work through an example. We have a team that feels slow. You have heard complaints from their business partners and product manager that they are slow, and you agree that the team just seems to lack the same energy as your other teams. How do we figure this out?

Debugging this deserves the same rigor you would apply to debugging a serious systems issue. When I debug a systems issue, the first thing I look at is generally log files and any other record of system state from the time of the incident. When talking about a team that is not producing work fast enough, look at the records. Look at the team chats and emails, look at the tickets, look at the repository code reviews and checkins. What do you see? Are there production incidents happening that are taking too much time? Are a bunch of people sick? Are they bickering over coding style constantly in their code review comments? Are the tickets that are being written vague, too big, too small? Does the team seem upbeat in their communication style, sharing fun things as well as important work in chat, or are they purely business? Look at their calendars. Is the team spending many hours a week in meetings? Is their manager doing 1-1s? None of these things is necessarily a smoking gun, but they may point to an area to address.

Perhaps everything seems ok in all of these indicators, but the team still just is not performing as well as you believe they should. You know the talent is there, the team is happy, they’re not overburdened by production support. What is happening? Now is the time to start doing some potentially destructive investigations. Sit in their meetings. Are they boring to you? Is the team bored? Who is speaking most of the time? Are there regular meetings with the whole team where the vast majority of the time is spent listening to the manager or product lead talk? Boring meetings are a sign. They may be a sign of inefficient planning on the part of the organizers. There may be too many meetings happening for the information covered. They may be a sign that the team members don’t feel that they can actually help set the direction of the team, choose the work that will happen. Boring meetings are a sure sign of time wasted, if not bigger leadership problems.

Ask the team what their goals are, can they tell you? Do they understand why those are the goals? If they don’t understand the goals of their work, their leaders (manager, tech lead, product manager) are not doing a good job engaging the team in the purpose of the work. In almost every model of motivation, people need to feel an understanding and connection with the purpose of their work. Who are they building these systems for, what is the potential impact on the customer, the business, the team? Did they have any part in deciding these goals, and the projects that they’re doing to achieve them? If not, why not? When you see a team spending all of their time on engineering-sponsored projects and neglecting product/business projects, it’s likely that the team does not appreciate or understand the value of the product/business projects that they are supposed to be working on, and therefore they are lacking in motivation to tackle them.

Finally, you might take a look at the actual team dynamics. Do people like each other? Are they friendly? Do they collaborate on projects or is every person working on something independently? Is there banter in the chat room, in emails? Do they have a good working relationship with other adjacent departments, and with their product managers? These are little things but even very professional groups tend to have a degree of personal connection between the members. A bunch of people who never talk to each other and are always working on independent projects are not really working as a team. There would be nothing wrong with that, if the team were performing well, but given that they are not, this may be contributing to your problem.

Sometimes managers of managers choose to take such problems as something that the manager of the team just needs to fix. You measure the manager on the output of their team, after all, and it is their responsibility to fix it if things are not going well. This is true, but just as I sometimes jump in and help debug complex system outages even though I rarely write code, it is ok to jump in and help debug team issues as you see them, particularly when the manager in question is struggling. It can be an opportunity to teach the manager and help them grow. It can also reveal more foundational problems with the organization, such as a lack of senior business leadership that even the best managers can’t identify or resolve alone. 


The pursuit of why when it comes to organizational problems is the thing that gives you patterns to match on, and lessons to lead with. We get better at debugging by doing it often, and learn which areas tend to break first and which indicators provide the most value for understanding issues. We become better leaders by pushing ourselves and our management teams to really get to the bottom of organizational issues, searching for why so that we can more quickly resolve such issues in the future. Without the drive to understand why, we rely on charm and luck to see us through our management career, we hire and fire based on charm and luck, and we have a huge blindspot when it comes to truly learning from our mistakes.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.