Tuesday, November 29, 2016

Building and Motivating Engineering Teams

I have agreed to give a guest lecture for a class at Yale, and they’ve asked me to speak about “building and motivating engineering teams” from the perspective of a smaller startup. The readings for my section include A Field Guide to Software Developers by Joel Spolsky. I remember reading it when it was first written. I admire Joel’s work, and the piece has many valuable takeaways.


However. The industry has changed a lot in the years since this piece was written. In 2007, the options for engineers here in NYC were far more limited. You worked for a bank, a media company, ad tech was starting to blossom, some e-commerce. Google or Fog Creek if you were lucky. There were plenty of jobs but there was far less of a classic “tech company” presence than there is today.

Over the past 9 years, we’ve seen a massive increase in the number of startups here in NYC. Every major tech company has an office here, the Google office alone has thousands of engineers. We’ve also seen an increase in the supply of engineers. Between students realizing the value of a tech degree, people changing careers, and bootcamps rapidly churning out graduates, the mix of types of people who write software has changed. Most of the people I knew writing software in 2007 had tech or tech-adjacent (math, physics) degrees. My team when I left Rent the Runway had a significant number of developers who had moved into tech from other areas, some without degrees at all.

All of this is to say, playing to 2007 nerd stereotypes is not always a good way to build a team here in NYC. What DO engineers want? I believe that you have 3 axes with which to twiddle. Depending on your company, you can lean more on one to cover problems with the others, but you have to find your balance.

Money. Joel hides this at the bottom of his piece. It is correct to say that engineers don’t care about money above all else, but far too many people delude themselves into thinking that this means that you can lag industry standard salaries and build a good team on the basis of some other factor.

Salaries have gone way up in the past 10 years. People know that they can get paid a certain amount of money, and if you try to hire people who can easily make 50% more somewhere else, they are almost certainly not going to accept your offer. Figure out what the market spread is. The top will be companies like Google, Facebook, some financial companies. The bottom might be non-profits or extremely early startups with significant equity grants. Even a startup with 10–20 people is going to struggle to hire without paying close to market rate. Engineers are expensive. New engineers are expensive. Senior engineers are really expensive. They know their value. You have to pay them.

You are building a company. You are not going to have a perfect, smooth ride. Things will go well, things will go poorly. Very few teams have a straight shot to greatness that is so clear it can cover all management woes. When you don’t pay people well enough, you contribute to undermining their resilience in the face of problems at work. Think of it as the baseline of Maslow’s Hierarchy. Money does not solve all problems for most people, but lack of money exacerbates all irritations.

Purpose. You’re building a company. You want to inspire people to work for you, so you sell them on the mission of the company, the product they’ll be working on. You have to do this because you are not building a company with a bunch of insanely hard tech problems. If you are, you can lean on that to hire people, but to be real, these days most of us are not. The days when everyone had hard technical scaling problems have ended. Scaling is not the same bold new frontier that it was even 5 years ago. Sure, there are always technical challenges to be found in organizations, but for many companies those technical challenges revolve around matching technology with the product and business. This means you have to work harder to sell the learning opportunities.

Leadership undermines their teams when they refuse to let engineers into the non-technical decision-making processes. In Joel’s article he talks about developers wanting to be allowed to make decisions within their own realm of expertise, which is certainly a bare minimum. I would encourage you to go further than that. If you are building a product-focused business, where the challenges are less purely technical and more about engaging a customer, your engineers need to feel connection to that business. They need to feel like they understand it, like they can have ideas about it. This is why I’m such an advocate for cross-functional product development teams. Put engineering, product management, marketing, operations together as a group and let them work as a team to solve problems. Don’t just throw work over the wall to engineering and expect them to implement it.

Respect. The undercurrent that I like least in Joel’s piece is the undercurrent that engineers need to be coddled and pampered. Giving them “toys” instead of “tools,” keeping them out of the politics. There are plenty of engineers who want to be given hard work to do and left alone, in their private offices, to think and code. But there are increasingly more engineers who want to build businesses, and they want to be treated like adults in the process.

Our engineering teams are not overgrown children. They are not idiot-savants who can produce software but must be given sufficient cookies to do so. They are highly paid professionals. Let’s treat them that way. You should expect your engineers to show up for your business. Respect is not pampering, it is not treating the team like the stars of the show. Rather, respect is challenging the team to show up and grow up. Respect is giving them clear, achievable goals and holding them accountable. My experience has been that most great engineers want to work somewhere that inspires them to achieve. Many of us stop at the idea of “hard technical problem” when we think about inspiring our engineering teams, but challenging them to partner with people who have different perspectives is another way you can help them grow.

Respect that engineers are smart individuals who often have more to add to your business than just their coding talents, and teach them to respect that the other parts of the business have equally valuable skills and perspectives. Engineers don’t need to feel like the company royalty to be inspired to do good work, but they do need the opportunity to be treated like a partner.


These days, you have a lot of competition for talent, but you also have a lot of talent to choose from. Understand your company’s positioning. If you can’t pay top of market, you will have to rely on a balance of finding undeveloped talent and giving engineers other reasons to want to work for your company. For most of us, that means giving them a voice beyond the purely technical, and challenging them to see and understand perspectives outside of engineering.

Friday, August 19, 2016

Microservices: Real Architectural Patterns

A dissection of our favorite folk architecture


Introduction


I’m fascinated by the lore and mystery behind microservices. As a concept, microservices feels like one of the most interesting folk architectures of the modern era. It’s useful enough to be applied widely across different usage patterns and also vague enough to mean many different things.

I’ve been struggling for a while with understanding what people really mean when they discuss “microservices.” Despite deploying what I would consider to be a version of that pattern in my last gig, it’s quite clear to me that the architecture we used is not the same as the pattern that all other companies use. Recently I finally interrogated someone who has deployed the pattern in a very different way than I have, and so I decided it would be illustrative to compare and contrast the circumstances of our architectures for those in the larger technical audience.

This article is going to have two examples. The first is the rough way “microservices” was deployed in my last gig, and why I made the decisions I made in the architecture. The second is an example of an architecture that is much closer to the “beautiful dream” microservices as I have heard it preached, for architectures that are stream-focused.


Microservices Basics


I think that microservices as an architecture evolved due to a few factors.
  1. A bunch of startups in the late 2000s started on monoliths like rails, scaled their business and team quickly, and hit the wall on what could reasonably be done in that monolith
  2. The cloud made it significantly easier to get access to a new server instance to run software
  3. We all got much more comfortable with the idea that we were dealing with distributed systems and in particular got comfortable making network calls as part of our systems
This combination of factors — scaling woes, easy access to new hardware, distributed systems and network access — played a huge part in what I might call “microservices for CRUD.” If you have managed to scale a company to a certain level of success on a monolith but you are having trouble scaling the technology and/or the engineering team, breaking the monolith into a services-style architecture makes sense. This is a situation I encountered first-hand.

The arguments for microservices here look something like:
  1. Services allow for independent axes of scaling. If you have a part of the system with higher load or capacity requirements than other parts, you can scale to meet its needs. This is certainly doable in a monolith, but somewhat more complicated to reason about.
  2. Services allow for independent failure domains, to a more limited extent. Insofar as parts of your system are independently operable, you may want to allow for partial availability by splitting them out into services. For example, in a commerce app, if you can serve the checkout flow even when the product search flow is down, that might be considered a good thing. This is much more complicated in practice than it is in theory, and people make many silly claims about microservices that imply that any overlap in services means that they are not valuable. Independent failure domains are sometimes more of a “nice to have” than a necessity, and making the architecture truly account for this is not easy.
  3. Services allow for teams to work independently on parts of the system. Again, you can do this in a monolith. I have done this in a monolith. But the challenge with monolith (and a related challenge with services in a monorepo (single source repository)) is that humans struggle to tangibly understand domains that are theoretically separate when they are presented as colocated by the source code. If I can see all of the code and it all compiles together and feels like a single thing, my tendency is to want to use it as a single thing. Grab code from here to use there, grab data from there to use here, etc.
A few more notes. “Monolith” and “monorepo” often get tangled up when talking about this world. A monolithic application is one where you have a set of code that compiles into a single main server artifact (possibly with some additional client artifacts produced). You can use configuration to make monoliths do almost anything you can imagine, including all of the services-type things above, but the image produced tends to include most if not all of the code in the repository. This does get fuzzy because sometimes teams evolve their monoliths to compile to a couple of specialized server artifacts via a combination of build tooling and configuration. I would generally still call this a monolithic architecture.

Monorepo, or monolith repository, is the model where you have a single repository that holds all of the code for any system you are actively changing (so, possibly excluding the source code for your OSS/external dependencies). The repository itself contains source code that accounts for multiple artifacts that are run as separate applications, and which can be compiled/packaged and tested separately without using the entire repository. Often this is used to enable certain shared libraries to change across all of the services that use those libraries, so that developers who support shared libraries can more easily evolve them instead of having to wait for each dependent team to adopt the newest version. The biggest downside of the monorepo model is that there’s not much OSS tooling that supports this, because most OSS is not built this way, so large investments in tooling are usually needed to make this work.


Microservices for CRUD-based Applications


Before I get to how to evolve a CRUD monolith to microservices, let me further articulate the architecture needed to build your traditional mid-sized CRUD platform. This type of platform covers a use case that is pretty well-trod, that of “transactions” and “metadata.”

Transactions: User does an action that you want to persist, consistency of data is very valuable. The “Create, Update, Delete” of CRUD. Much less frequent than the “Read” actions of CRUD. 
Metadata: Information that describes things to the users, but is usually only modified by internal content creators, or rarely by external users (reviews, for example). Changes less frequently, often highly cacheable. Even more, can often tolerate a degree of temporary inconsistency (showing stale data).
Are there more things that CRUD-heavy companies want to do, especially in the analytical space here? Sure. You may want to adjust results frequently based on user behavior as the user is browsing the site, and other personalization actions. However, that is a hard thing to do real-time and you don’t always have the volume of data you need from the user to actually do that well, so it isn’t generally the first-order concern of the system.

The process for moving off of a monolith in this type of architecture is relatively straightforward:
  1. Identify independent entities. This paper by Pat Helland, “Life Beyond Txns”, has some useful and interesting definitions there. It’s better to go a little bit too big early than to go too small and end up having to implement significant distributed transactions. You probably want data-owning services for the major business objects(products, users, etc), and then sets of integration services that implement aggregations and logic over those objects.
  2. Pull out the logic into services entity by entity. Try not to change the data model as much as possible in this process. Redirect the monolith to call APIs in the new services as functionality is moved.
That’s basically it. You pull pieces out until you have enough to cover a particular set of user functionality in data and integration terms, then you can start to evolve that part of the user functionality to do new things in the services.

These services are not classic SOA, but nor are they teeny-tiny microservices. The services that own the data may be fairly sophisticated. You may not want to have too many services because you want to be able to satisfy requests from the user without having to make a ton of network hops, and ideally, without needing to do distributed transactions.

You are probably not making new services every day, and especially if you have a sub-50-person engineering team and a long product roadmap, you may not want to invest extensive engineering time into complex orchestration and tooling that enables people to dynamically add new services at the click of a button (nb: the products to support this are getting better all the time, and so at some point this will be worth doing even for that smaller team. It is unclear to me whether that time is now or not.).

The equation to apply for determining how much to invest in tooling is pretty straightforward: how much time does it cost devs to have a less automated process for adding a new service, vs how long does it take to implement and maintain the automation for doing it easily, and how many new services do you expect to want to deploy over time? You’re making a guess. Obviously, if you think there is value to enabling people to spin up tiny services fast and frequently, it is better to invest time and tooling into this. As with all engineering process optimization decisions, it’s not a matter of getting it perfectly right, but rather, of deciding for the foreseeable future and periodically re-evaluating.

There are many microservices “must-haves” in this instance that I have found to be anything but. I mentioned extensive orchestration above. Dynamic service discovery is also not needed if you are not automatically spinning up services or moving services around frequently (load balancers are pretty nice for doing this at a basic level).

Allowing teams to choose their ideal language, framework, and data store per service is also certainly not a must-have and in fact it’s likely to be far more of a headache than a boon to your team.
Having independent data stores for the services is also not a must-have, although it does mean that you will have a high-risk SPOF on the shared database. As I was writing this piece I discovered a section of some writing on microservices from 2015:
Create a Separate Data Store for Each Microservice
Do not use the the same back-end data store across microservices. … Moreover, with a single data store it’s too easy for microservices written by different teams to share database structures, perhaps in the name of reducing duplication of work. You end up with the situation where if one team updates a database structure, other services that also use that structure have to be changed too.
This is true, but for smaller teams you can prevent sharing of database structures by convention (process and code review, and automated testing and checking for such access if it is a huge worry). When you carefully define the data-owner services, it’s less likely this will happen. And the alternative is the next paragraph:
Breaking apart the data can make data management more complicated, because the separate storage systems can more easily get out sync or become inconsistent, and foreign keys can change unexpectedly. You need to add a tool that performs master data management (MDM) by operating in the background to find and fix inconsistencies. For example, it might examine every database that stores subscriber IDs, to verify that the same IDs exist in all of them (there aren’t missing or extra IDs in any one database). You can write your own tool or buy one. Many commercial relational database management systems (RDBMSs) do these kinds of checks, but they usually impose too many requirements for coupling, and so don’t scale.(original)
This paragraph probably leads to sighs of exhaustion from anyone with experience doing data reconciliation. It’s due to this overhead that I encourage those of you in smaller organizations to at least evaluate a convention-based approach before deciding to use entirely independent and individual data stores. This is a decision you can delay as needed.

This version of the microservices architecture is very compelling for the scaled CRUD world because it lets you do a rewrite piece by piece. You can do the whole system, or you can simply take out pieces that are most sensitive to scaling. You proactively engage with many of the bits of distributed systems complexity by thinking carefully about the data and where transactions on that data will be needed. You probably don’t need a ton of fancy data pipelines floating around. You know where the data will be modified.

Do you have to go to microservices to scale this? Probably not, but that doesn’t mean using microservices to scale such systems is a bad idea. However, going extreme with the microservices model may be a bad idea, because you really don’t want to slice your data up in a way that ends up in distributed transaction land.


Microservices For Data Stream Processing


Now, let’s talk about a very different use case. This use case is not your classic CRUD application, thick with business rules around transactionally-updated objects. Instead, this use case has a large pipeline of data. It has small bits of data flowing into it from many different sources, a very large volume of many bits of data. This large volume of input data sources also has many different services that will consume it, modify it, and pass it along for further processing.

The major concern of this application is ingesting large quantities of ever-changing data, processing it in various ways, and showing a view of it to customers. CRUD concerns are secondary to the larger concerns of keeping up with the data stream and recalculating information based on what is happening on that stream.

Let’s take a metrics-aggregating SaaS application, for example. This application has customers all over the world with various applications, services, and machines that are reporting out metrics to the aggregator. These customers only need to see their data, although the combined total of data for any one customer may be very large. Our aggregator needs to consume these metrics and send them off to the application that is going to show them to the customer. The customer-facing application may be operating on a combination of incoming metrics in real-time plus historical data that comes from cache or a backing storage system. A large part of the value of the data is in the moving-window of what is happening right now/recently.

This architecture from the start has considerations of volume that even our scaled CRUD world may not care about for a very, very long time. Additionally, the data itself is mostly a stream of updates over time. The notion of the “stateful” data that is transactionally updated is minimal, the most useful data is more like a timeseries or log of events. The transactional data, say, stored user views and user configuration, may be more like the “metadata” of our CRUD application in the first example, infrequently changed compared to the updates coming in from the stream. The majority of developer time is most likely spent not in dealing with these transactional changes but rather in managing the streams of inputs, providing new types of inputs, applying new calculations to the stream of inputs, and changing the calculations.

In this example, you can imagine a service that wants to run an experiment by doing a different calculation across a particular element on the stream. Instead of modifying the existing code, the experimental service listens to the data stream at the same point as the existing calculation, provides a new calculation value, and pushes that calculation value back into the data pipeline on a different channel. At some point an experiment service pulls this data out for the customers who are assigned to the experimental treatment and shows the results of that calculation instead of the standard calculation. In all of these places you need a record of what happened in order to do analysis of experiment success and debugging, but that record does not need to be strongly, transactionally related to the record of other events in the system at this time, even across related users.

In this example, it may very well be much more effective to spin up new services as needed, in order to run quick experiments, rather than changing existing services. Especially in cases where the service can do this without needing to worry about coordinating the data consumption or production with any existing service. This is the world of what I would like to call “stream-centric microservices.”

If there is enormous value to your business to manage real-time data streams, and you are going to have a lot of developers consuming those streams by creating new services to listen to them and produce results, then you absolutely must be willing to commit to the investment in tooling to make the process of creating services and putting them into production as easy as possible. You will probably use this for all of your services over time, once you have it, but realize that the clear value is that you have dynamic data that can be processed and manipulated and experimented on independently.


Cron Jobs as Microservices


I’d be remiss if I didn’t mention this pattern. When it becomes very easy to make anything a microservice, everything becomes a microservice, including things we would traditionally run as cron jobs.

But cron jobs are a nice concept, and not everything has to be a “service.” You can use CloudWatch Events from AWS for this purpose, or scheduled Lambda functions. Use Gearman, a queue and async job runner, to schedule cron jobs. Remember your cron jobs need to be idempotent (can be run twice on the same input without changing the outcome). If you have an easy way to spin up services and it’s easy to create tiny services that are basically cron jobs, no problem, but cron jobs in and of themselves are not a great reason to create a large, orchestrated services environment.


Conclusion


I hope that this has been a useful breakout across a few axes of the wild world of microservices. Going through the thought experiment was very useful for me, personally. It helped me understand how what seems obvious to people at one extreme, say those who spend most of their time focused on stream processing, doesn’t make as much sense for people who are more focused on the world of CRUD application scaling.

(This was originally published on medium)

Monday, July 18, 2016

The Virtue of Hubris and The Value of Complaining

In my previous post, I discussed the leadership virtues of Laziness and Impatience. But as you may know, I neglected one of the core virtues in my list, namely, that of hubris. Hubris. Pride. As Larry Wall says,
Excessive pride, the sort of thing Zeus zaps you for. Also the quality that makes you write (and maintain) programs that other people won't want to say bad things about. Hence, the third great virtue of a programmer.
I would translate this as taking pride in one's work, and being willing to not just take pride in it, but show off that work, talk about it, teach others its magic. And hubris is important. One of the challenges of impatience is that it sometimes drives us to cut corners. Cutting corners can make work go faster, but it can also have a price in the long run. So we balance that desire to cut corners with a desire to maintain pride in our work, and use those conflicting values to keep each other in check.

Hubris done well in my opinion has some interesting expressions. You may think of the person who takes pride in their work as someone who loves to learn and share new things. Who loves to brag about the good stuff. This is certainly part of hubris, sharing lessons learned and trying to help others by showing off our wins. Many tech teams encourage this actively, through rituals like "drinks and demos" where teams get up to share what they accomplished during the week. We encourage people to write up cool stuff we've built, to go speak at conferences and talk about cool technology, and this is all a great thing to do.

However, I think there's more to it than just showing off the good stuff. Within a team, hubris also shows in people who are willing to complain about the bad stuff. Yes, that's right, I think that there is value to expressing not just the positive, but also the negative. In fact, I think that you are actively harming your culture and creating a culture of false pride when you only encourage people to speak up to share good things.

Complaining is all about context. The problems we are facing are our context, and the solutions to those problems must be made within understanding of that context. Context is what makes microservices right for one team and wrong for another. Context is what makes hiring a certain way successful in a high-growth startup but devastating in a big company. Context is so important that when you misunderstand the role that it plays in a solution, you run the risk of misapplying that solution to a place where it will cause you more problems than it solves. Applying someone else's lessons to your context without understanding is how we end up with these cargo cult solutions.

So, the details of the problem are pretty important for putting the solution we're bragging about into context. But here's the thing. If you squash people who want to complain or criticize, you lose the details of your problems. Those complaints contain the details!

Does your company have a practice of telling people to "bring solutions, not complaints?" That is at best hiding problems, not avoiding them. It is unrealistic to expect people to be able to solve every problem they see in front of them. I mean, can you do that, really? It is hard enough to expect your executives to be able to do this, believe me, I know. Your team is going to see problems that they will not know how to solve, and to tell them to keep that to themselves until they figure out the solution is a great way to avoid dealing with real issues.

Instead, I encourage you to ask people to give you details when they have complaints. Help them put their complaints in context. If they complain a system sucks, ask them why. Maybe the answer is that they don't like the formatting standards, in which case an appropriate response might be, unfortunately not everything goes your way. On the other hand, maybe the answer is that it takes them a long time to make changes because the system has no tests and breaks easily, in which case, perhaps you want to think about actually fixing that problem.

If you do this well, you actually teach people how to understand which problems are important, and which problems are not. Letting people complain might seem like it will do nothing but encourage negativity and drama, but if you guide people to learn from their complaints it can instead help your team grow. It's great when people can bring problems AND solutions to you simultaneously, but it's more likely that they will need help to see the best solution. Helping them see the best solution starts by helping them understand how to state the problem.

We are going to have disagreements and conflict in our teams. None of us sees the world in the same way, and that is good. We form teams because as a group, sharing our perspectives, we can create things that are greater than the sum of their parts. Trying to create conflict-free environments is a fool's errand. But you can guide conflict and complaints to result in an increased understanding of context. Instead of discouraging all disagreement, push people to be specific about their thoughts and concerns, and attempt to understand them. As a leader, ask questions to tease out details, and show that you are actually interested in the perspectives on your team, even when you might disagree.

Taking pride sometimes means speaking up when something doesn't seem to be right, when something seems to be less than what it could be. Criticism can help us become even better than we are, if we are willing to listen to its details. Please don't smother this in the name of harmony or positivity, because repressing conflict only leads to a false sense of security and prevents us from achieving true greatness.

Friday, June 10, 2016

The Virtues of Laziness and Impatience

This is an excerpt from my work in progress, a book on engineering management. If you're interested in getting occasional updates you can subscribe to my newsletter!

I love the idea of Laziness, Impatience, and Hubris as virtues of engineers, articulated in “Programming Perl” by Larry Wall. I believe these virtues sustain into leadership, and learning how to channel these traits into advantages is something I encourage all managers to do.

As a manager, when you are dealing with people 1-1 you probably don’t want to be impatient, of course. Impatience can be rude when it is directed at individuals. And you don’t want to seem lazy, there’s nothing worse than working for a manager who seems to be taking it easy while you kill yourself to deliver projects. But impatience, paired with laziness, is wonderful when directed at processes and decisions. Impatience and laziness, applied to process, are the key elements to focus.

As you grow more into leadership positions, people will look to you for behavioral guidance. What you want to teach them is how to focus. To that end, there are two areas I encourage you to practice showing, right now: figuring out what’s important, and going home.

I can’t stand watching people waste their energy approaching problems with brute force and spending time rather than thought, and yet, any culture where you are encouraged to work excessive hours all the time is almost certainly doing just that. What is the value of automation if you don’t use it to make your job easier? We engineers automate so that we can focus on the fun stuff, and the fun stuff is the stuff that uses the most of your brain, and it’s not usually something you can do for hours and hours, day after day.

So be impatient to figure out the nut of what is important. As a leader, any time you see something being done that feels inefficient, start to ask the question, why does this feel inefficient to me? What is the value in the thing we are doing? Can we deliver that value in a way that is faster? Can we strip down this project into something simpler and get it done more quickly?

The problem with this line of questioning is that often when managers ask, can it be done faster, what they explicitly or implicitly want to know is, can the team work harder or longer hours to deliver it in fewer days. This is why I encourage you to develop and show the value of laziness. Because “faster” is not about “same number of hours but fewer total days.” “Faster” is about “the same value to the company in less total time.” If the team works 60 hours in a week to deliver something that otherwise would’ve taken a week and a half, they haven’t worked faster, they’ve just given the company more of their free time.

This is where going home comes in. Go home! And stop emailing people at all hours of the night and all hours of the weekend! Forcing yourself to disengage is essential for your mental health, believe me. Burnout is a real problem in the American workforce these days, and almost everyone I know who has worked sustained excess hours has experienced it to some degree. It’s terrible for individuals, terrible for their families, and terrible for teams. But this isn’t just about preventing your own burnout, it’s about preventing your team’s burnout. When you work later than everyone else, when you send those emails at all hours, even if you don’t expect your team to respond to those emails or work those hours, they see you doing it, and think it’s important. And that overwork makes them less effective, especially at the detailed knowledge work that engineers need to perform.

When you are a newish manager, and you haven’t figured out the tricks to do your job effectively, you might find yourself needing to work more hours to get it all done. That is ok, for a little while. But I encourage you to figure out a way to work those hours without encouraging your team to do so, or making them feel obligated to be on your schedule. Queue up the weekend and overnight emails for the next work day. Put your chat status as “away” in off hours. Take vacation and don’t answer email during that time. And constantly ask yourself the same questions you ask your team: can I do this faster? Do I need to be doing this at all? What is the value that I am providing with this work?

Laziness and impatience. We focus so we can go home, and we encourage going home because it forces us to constantly focus. This is how great teams scale.

Thursday, May 5, 2016

Thoughts on Take Home Interviews

There is a movement now in tech to really think about what it would take to improve our interview process. This is a movement a long time coming. White board coding interviews are clearly a strange way to measure a person's ability to actually do the day to day work of a modern software engineer. And we know that we tend to have a lot of bias in our interview processes that takes what we wish were an objective evaluation of skills and turns it into something very, very subjective.

Recently, my friend Julia Grace wrote about the interview process at Slack, to grant more transparency into what it takes to become an engineer at one of the hottest companies around. While this is a recruiting tactic it's also great that they are helping people understand what to expect and how to apply. Slack is taking pains to try to avoid bias, by having people complete a take-home technical exercise.
This varies by position, but generally you’ll have a week to complete a technical exercise and submit the code and working solution back to us.
Since we don’t do any whiteboard coding during the onsite interview, the technical exercise is one of the best ways we’ve found to evaluate programming competency.
The exercise is graded against a rigorous set of over 30 predetermined criteria. We’re looking for code that is clean, readable, performant, and maintainable. We put significant effort into developing these criteria to ensure that, regardless of who grades the exercise, the score is an accurate reflection of the quality of the work. We do this to limit bias. If there are clear criteria, variations that might impact score but have nothing to do with the candidate (such as if the grader is having a good day) are less likely to influence the outcome.
On twitter, a discussion ensued about whether asking people to spend time at home doing exercises didn't itself cause bias, against those who did not have a lot of spare time to be doing take-home exercises. Julia mentioned that they expect it to take 2-4 hours, but admitted that some people got really into the project and spent far longer than that.

This brings up three good questions that I want to address:

1) Are take-home exercises on the balance good?
2) Is it reasonable to expect people to spend 2-4 hours of their own time on a take-home exercise?
3) What about the people who will spend more time?


On the issue of 1, I think that yes, take-home exercises can help a lot to address the bias that happens when you know the person who wrote the code. They have the potential to be the blind audition step that the tech industry needs.

On the issue of 2, I actually ALSO think it is ok to ask candidates to spend a few hours on these exercises. This comes with two caveats. First, you should genuinely believe that the exercise can be done in the number of hours you expect by a candidate qualified for that level (so, measure how long candidates report spending on it). Second, these exercises only take "a few hours," not "tens of hours."

I feel ok, within bounds, to say that you should find a few hours to do a coding assignment, especially if it reduces the time you would need to spend onsite. The benefits of getting rid of whiteboard coding and that particular bit of evaluation bias outweigh the possible inconvenience. If you're looking for a job, you have to budget time to interview. So take half a day off if you can to do this project. Just because the exercise says take home doesn't mean you actually have to do it in your off hours. If you're unable to take time off of work to interview at all, unfortunately, you're going to have a problem getting a new job anywhere.

Which brings us to issue 3, what about the people who will spend more time? This is the most interesting part.

Julia said that candidates get really into solving the problem, and spend more time on it because they're excited about what they're building. That is awesome, but it also makes my blood run cold. At this point you're talking about something that is both a test of your programming abilities but also a creative project. Let's contrast that to the description that Foursquare gives of their take-home portion:

Instead we give out a take-home exercise that takes about three hours. The exercise consists of three questions:
1. A single-function coding question
2. A slightly more complicated coding question that involves creating a data structure
3. A short design doc (less than a page) on how to implement a specific service and its endpoints.
Every question we use is based on a real problem we’ve had to solve and has a preamble explaining the reason we need to solve this problem. If there is an obvious solution with a poor running time we mention it since we can’t help course-correct when the work isn’t being done live. We also provide scaffolding for the coding questions to save the candidate time.
This appears to be designed as an exercise that will only require a lot more time if you are struggling with the solution. Sure you can go nuts creating a crazy creative data structure or design doc, but this is a pretty clinical test. There's few places to add bells and whistles and they're unlikely to get you any brownie points with the interviewers.

These are two processes with the same goal, to reduce bias in our interviewing process, but slightly different tactics in the take-home. I would guess that Slack's take-home is fun and appealing to a set of people, those who enjoy tinkering or creating cool new projects. And it will find those people and make them shine, and probably serve as a good bit of recruiting to make them even more excited about the possibility of working there.

But I can tell you that for someone like me, I would hate being given the "creative" take-home coding problem. I'm happy to write some code to show you that I can. But I don't like to tinker, and I prefer that my creative work be collaborative. It feels like you are wasting my time and instead of making me more excited, it makes me far less excited.

The creative take-home also seems likely to select for those with free time, because if it is really an exercise that some people want to overdo, they will overdo it and you will have a hard time not rewarding that enthusiasm (why shouldn't you!). And while it's ok to ask for a few hours, building something that rewards those who can spend far longer is likely to bias against those who have, say, kids to take care of after work and on weekends, or other activities that limit their free time.

On the other hand, Foursquare's test is the first take-home I ever read about where I thought, "yes, that I would do." It is respectful of my time, and gives me something constrained to complete. We can elaborate on creative topics in the in-person interview, I do much of my best creative thinking with other people, elaborating on ideas together. Of course, I am a strong in-person communicator, and having a process that pushes the creativity into the in-person interaction interviews selects for people like me.

In the end, you probably want to hire a spectrum of people. When thinking about how to design your take-home tests, make sure that you are being considerate of the candidate's time, and decide if you really need this test to be a test of both technical skills and creativity, or merely a screening for technical skills. It may be that different roles need different screenings, and you may even want to offer both! That's a lot to ask for, but no one ever said that fixing hiring was going to be easy.

Thursday, April 28, 2016

Ask the CTO: Getting Information without interruptions

This is reposted from my Ask the CTO series on O'Reilly

The problem: I don’t know how to figure out what’s happening on my teams without asking people directly, and it’s driving them crazy!


I am in charge of a team that is running a big, critical project. My boss is breathing down my neck at all times, asking me for status updates, wondering when different pieces are going to be done. Usually, I know what is happening because I’m doing some of the work myself, so I can give the answers because I’m deeply involved in the project. This time, however, I’m not actually doing any of the work, and I don’t even have the time to attend every planning meeting and standup. When my boss asks me for status updates, I feel that I have no choice but to ask a member of the team.

Of course, there’s also the outages. Did I mention that this big project is to replace a system that is crumbling and causing incidents all the time? So, if I’m not asking about project status, I’m trying to figure out why we’re down, what the status of the incident is, if there’s anything I can help with.

I just got some feedback from the tech lead: I’m driving the team crazy with my one-off questions. They complain that I’m distracting them, interrupting their work, and derailing other important discussions to get status updates. I don’t want to do that, but I don’t know how to figure out what’s going on without tapping someone on the shoulder to ask. What do I do?

The solution: Know where to look


When you are embedded in the team you are leading, the details of their projects are the air you breathe. But add on more teams or a level of management and all of those details provide hard to discover. How do you get the status of projects when you are a step or two removed? How do you figure out what is going on without constantly tapping an engineer on the shoulder and asking her for a status update?

To begin with, you have to know where to look. If it is important that you know the status for projects at the drop of a hat, then you should learn how to read the project management dashboards the team is using. If the teams are not using task tracking and project management software, at this point I must ask, why not? It is one thing to avoid the overhead of formal task tracking in a very small group. However, as the team grows, you need to have transparency into progress. This means that you need a sense of what the tasks that make up “progress” look like, which requires some sort of breakdown and tracking, even if it is in the form of sticky notes on the wall.

This also goes for your production incidents. What is your incident management process? Are incidents managed via chatroom, or by people sitting in the office together? Do you have a postmortem process? If ongoing incidents don’t have a communication process and structure, it might be time to throw something together.

Knowing where to look is the first element, one that managers often skip over too quickly. You owe it to your team to look at the existing tracking software and look for the status there first, before firing off questions directly to the engineers. However, looking through the project tickets and work-in-flight status does not always provide the full picture. Sometimes the person managing the project tracking is themselves a bit disorganized. Sometimes the details that you are concerned with are not represented, or not represented well. Then what do you do?


Practical advice: Manage up, read the room, and let the team lead


Let’s break down this situation further, because “then what” really depends on the time and the urgency, and the reasons behind your question.

Scenario 1: You’re asking because your boss asked about the project and you didn’t have a great answer.

That’s fine! But does your boss need more details this second? Or can you ask the team for details in the context of scheduled meetings or via less-urgent communication channels (email or chat)? Usually when the boss asks us something, that makes us want to jump quickly. What we really need to do is just make a note to get more information and then get it when the time is right. This requires some managing up. Predict the questions that your boss tends to ask, and have some general answers prepared. That will give you some ammunition to use as you push back for time to get details on specific concerns. And you will need to push back occasionally. The worst outcome is having your boss go directly to the team themselves and ask for a status update.

Open office plans provide the most tempting setup for interrupting people. Picture: you have a question you’re itching to get an answer for, the manager of the team is in a meeting that you can’t interrupt, but there’s one of the engineers sitting there typing away. The temptation to tap him on the shoulder and ask can be overwhelming. I know it is for me. If you’re going to do this, for goodness sakes, respect the signals. Is this person wearing headphones and look really focused? Or is he chatting, walking around, or otherwise out of the zone? Be sensitive about your impatience versus their focus, and apologize when you are forced to interrupt their work. Apologies don’t cure all evils, but making an effort here is important.

Scenario 2: You’re asking because launch of this work is imminent, or this work relates to an immediate production issue and you need nuanced details that only a developer can provide.

In this case, you may very well need to interrupt someone and reading the room will be essential. There are times when managers, for whatever reason, are themselves either occupied with something else or don’t understand the nuance of the situation you are concerned with, e.g., a release is at-risk and you don’t know why. Sometimes, asking the developers for details will tease out decisions that could change a release from “late” to “on-time minus a non-essential feature.” The goal is to do this very sparingly, and use it as a teaching opportunity instead of a chance to undermine.

In the book Turn the Ship Around (an essential leadership text for engineering teams that I've recommended several times already), the authors spend a lot of time discussing how to resist taking the wheel every time a small problem happens and thus undermining the leadership and ownership of the members of the team. The way you choose to ask for details in the moment can make the difference between encouraging leadership from the team and enabling learned helplessness that causes the team to be paralyzed without you around. Instead of barking orders in a production outage or telling people to cut features close to a launch, practice using these situations to ask them what they intend to do. Asking what they intend to do puts the ball in their court, and forces them to practice taking leadership of the situation. You want the whole team, not just the tech lead or manager, to feel capable of speaking up and sharing leadership for resolving these situations. Use this direct interaction to push that team-driven leadership and encourage the team to be in charge of situations instead of victims of them.

Scenario 3: You’re asking because an idea occurred to you, you just heard of another team doing a related task, or you have an unspecified concern about the project and knowing more details will help clarify.

Getting details from the developers directly is often useful when you have an intuition nagging about a project. Again, the trick here is to do it without interrupting a developer who is in the middle of flow. Unfortunately, you’re the boss. Even if you try to communicate via non-urgent channels like chat or email, often the person will notice and drop everything. Some situational awareness helps. If you work in the office with them, try to wait until they are getting up from their desk, going to lunch or coffee, or notice when they are out of the zone. If they’re not in the office with you, encourage them to make active use of chat status to indicate when they can be bothered or not, or make it clear at the outset of your chat or email that the topic not urgent but you would like to get some details later.

Have you heard of “swoop and poop management?” It describes what happens when a manager or other senior party unfamiliar with the details drops by the team, makes a suggestion for something the team should look at, and then leaves. Swooping down, pooping out an idea or comment, and flying away without seeing the effect of the interaction is, at best, a distraction or annoyance and, at worst, causes people to start focusing on a causal idea instead of the important tasks at hand because The Big Boss said so.

In all of these situations, you want to defer to the team’s immediate manager, tech lead, or product manager as it makes sense. Going directly to the team should be an occasional event, not a regular occurrence. It’s not that you can’t have a relationship with the team directly, but it is the job of the tech lead or manager to manage and communicate project status specifically so that the engineers can focus. When you do choose to go to the team, do it productively and positively. Use it to share ideas with the developers, help them grow as leaders, and affirm your connection with them in a productive way.

Find more of my columns here!

Tuesday, April 12, 2016

Ask the CTO: Navigating the hands-on to hands-off transition

I'm writing a new series for managers on oreilly.com. Here is a teaser for my first post!
The problem: I’m a new manager and I’m overwhelmed

A few months ago, I was given the opportunity to run a new team in a growing area of my organization. The company thought I was ready for a bigger leadership role, and I agreed. I think I have a good track record that prepared me well for this: I've been a successful tech lead, had a couple of direct reports, and created some great software. I was excited to grab this chance to lead a large team and set the technical direction for part of the business, so I gladly took the job.

But lately, I've begun to realize that instead of having a few hours of meetings a week, more than half of my time is booked before the week begins. I have regular 1-1s, touchbases with peers across tech, and meetings with the product team for planning, interviews, and strategy sessions, and the list goes on. I was excited to write some code for the new systems my team is building, and I've tried to, but the only time I can find is in the mornings before everyone arrives, or at night after I've eaten dinner. I'm totally stressed out trying to balance all of this, and my team is starting to complain that I am canceling my meetings so I can have more time to code. So, my question is: how do I learn to balance my days so I have time for all these meetings and management tasks, but also time to write code?

The solution: Think of management as real work

Let me ask you a question, before I begin. Do you believe that management is real work? Do you believe it is work that is just as valuable as coding, architecture, and other hands-on technical tasks?

We start with that because if you do not think that management is real work, you will continue to treat it like an unfortunate burden to be dealt with so you can get to the fun stuff. Which, sadly, is the attitude that many managers in tech seem to take. I say that this is sad because it causes not only suffering on the part of the manager, but a lot of damage to the team as well. Your team needs you to be there as a manager for them, and it is almost impossible to show up fully for a role that you don't think is valuable. Management is work, and it is important that you put your management priorities front and center.

Great managers know that their value comes from the ways in which they create functioning teams, focused on the right work. When you can do that job well, you will almost always create greater value to the organization than you would as an engineer; yes, even a 10X engineer. If you shortchange your role as manager by fighting for code to write and technical work to deliver, it's likely that you will end up, at best, totally stressed out, and at worst, slowing down your team.

Sometimes we managers feel that we have to keep writing code because otherwise things won't get done. However, if your team isn't getting enough work done, your job is not to do the work yourself. Your job as a manager is to figure out how to help the team get more done. It is incredibly hard to make the overall team stronger and spend half your time focused on doing the work yourself, especially when half your time is actually overtime. Yes, immersing yourself in the technical details may give you some insights into the day-to-day challenges of your team, but you can also get many of those insights by training your team to notice them, and helping your team know what to tell you.

Going hands-off is hard, even if you do value the work of management, because you no longer have a simple, easily measurable thing to look at. Experienced engineers get frequent strings of wins as they write code, fix bugs, and see new features come to life. The feedback loop as a manager is much, much slower. There's no red-green-refactor for guiding humans. It's okay to mourn the loss of that endorphin hit. You spent years getting good at the job of being an engineer, and now you have to do something that you probably aren't as good at yet. It will get better, but it will not be the same.