
Wednesday, September 6, 2017

How do managers* get stuck?

*May also apply to senior ICs

Earlier this year, I wrote a piece called “How do Individual Contributors Get Stuck?” This was an attempt to help ICs provide constructive feedback to their peers, by identifying common challenges that I have seen developers struggle to overcome. 

This piece is a little bit different. I want to answer the question that I often hear from first-line managers, that is, managers who manage only individual contributors: “How do I get promoted to the next level of management? How do I prove I’m ready to manage managers?” 

Managers often believe that if they are handling the demands of managing their team, they should naturally be promoted to manage more people, bigger teams, teams of teams, as quickly as such an opportunity comes about. And yet, just as often as those opportunities come about, someone else is chosen, a person is hired from outside the company, and the eager manager is passed over. You’re stuck. When you find yourself “stuck” in terms of management career progression, what might really be happening?

Usually, getting stuck as a manager comes down to a failure in one or more of the key areas of management: failing to manage down, failing to manage up, and failing to manage sideways.


Scenario One: You aren’t actually scaling yourself effectively, aka, failing to manage down

You may think that you’re handling your team well, but when you look at your schedule, you’re working nights and weekends and then some to juggle all of the new tasks that managing a team entails. Sure, there are some companies which expect that from everyone, but it’s rarely a sign that you’re using your time effectively. Look at your team. Is it a well-oiled machine? Do you feel like the team is able to operate independently, get things done, without you micromanaging every detail? If not, you’re probably stuck on the basic needs of your current job. Some examples of this include:
  1. Can’t delegate. Look at all hands-on tasks you own, and ask yourself whether you are the only person who could be completing these tasks, or whether you could assign them to another senior engineer. If you are spending a lot of your time doing hands-on work that someone else could be doing, you probably aren’t delegating effectively.
  2. Not training your team. If there are too many tasks that only you can complete, you have made yourself a key dependency for your team. Who are your potential successors, and have you spent time training them on the things only you can do?
  3. Not enough attention to the process. Is your team drowning in alerts with no end in sight? Why haven’t you spent the time to allocate people to fix that? Have you spent time paying attention to the way work is assigned in your team? Do you actively participate in the planning process? When was the last time you tried changing it to see how it could improve? Process is part of your life now, and you need to tend to it.
  4. Won’t say no. If your team is completely overwhelmed with work, well, it’s partially your fault. You are the manager, and you are the person who is responsible for pushing back on the work commitments for the team. 

Scenario Two: You haven’t shown that you can expand beyond your team, aka, failing to manage up

Maybe your team is running well enough, but that’s all you’re doing. Opportunities for advancement are usually given to people who show up for those opportunities. You can easily get stuck by just getting comfortable in the place that you’re sitting. Someone who is failing to manage up often exhibits one of the following problems:
  1. Doesn’t attend to the details. Everything from clearly communicating the things that your team has accomplished, and sharing challenges or setbacks, to keeping your manager in the loop about major design decisions or roadmap changes, these details all matter. The best managers push information up without being asked and are quick to provide more details as necessary. Your manager wants to know that you are paying attention to what is going on.
  2. Complains a lot about things that are not working well, but never volunteers to fix them. If you are free with your criticism about the way things work, but don’t feel the need to do anything more than complain, you are holding yourself back. Instead of complaining, volunteer to lead the initiative to fix that team-wide problem. Bring problems and solutions.
  3. Drags her feet when given a clear task that is outside of her comfort zone. I have seen so many managers fail at the simple act of taking a clear assignment and seeing it through to completion. When your manager asks you to do something, either do it, or say you can’t/don’t really want to. But don’t just drag your feet and fail to do it.
  4. Doesn’t show a professional face to more senior managers. Do you openly look bored, distracted, or impatient in meetings? Do you write emails that communicate clearly? Do you think your manager would be comfortable having you present to her peers, alone? Your verbal, written, and body language communication is more and more important the more senior you become, and if you are lazy here it can hold you back.

Scenario Three: You haven’t shown peer leadership, aka, failing to manage sideways

Some people have well-running teams, they jump on fresh assignments, and yet they still get stuck. This is often because your manager knows that she cannot make you the manager of any of your existing peers. You are stuck because you haven’t shown enough peer/relationship management. This can sometimes look like:
  1. Doesn’t build strong peer relationships. If you spend most of your time focused downward or upward, you’re missing a step. When was the last time you helped out one of your peers? How often do you spend time with your peers, 1–1? Do you seek out feedback from your peers on your ideas, or ask them for help with your challenges? Having peers who trust and respect you, and more to the point, who might want to work for you if they had to, is needed for successful growth.
  2. Doesn’t look for additional tasks. You should be looking for opportunities to lead projects or initiatives outside of your team. How can you help your peers? You may be the best person to lead your area, but if you rarely push yourself outside of that area to talk to others and see places you could volunteer to improve, you’re missing a critical element of leadership.
  3. Doesn’t create a compelling vision or strategy that others want to buy into. You might have a clear roadmap for your team, but how much have you thought beyond your team? Have you ever shared any ideas you have for the larger group with your peers, and gotten their buy-in? Many people think that strategic thinking starts and stops with forming the strategy itself, but getting people around you excited by your ideas is critical to achieving them. 
  4. Doesn’t seem like someone a manager would want to report to. Ultimately, if you want to manage managers, a manager should want to work for you. That means that you are going to help train them, help them grow their career. You’re not going to be spending all of your time telling them exactly what to do, when to do it, no questions asked. If one of your peer line managers wouldn’t want to work for you, you might be stuck on giving the impression that you’re looking to progress solely so that you can acquire more power and influence.

What about senior ICs?

As you may have noticed, this advice applies to more than first-line managers. Many senior individual contributors start to trip on these issues. They can crank out code for days, but trouble with communicating, getting buy-in, and going outside of their comfort zone stops their progression. Few senior ICs get promoted beyond a certain point on the strength of their ideas and code alone.


Getting Unstuck

How do you get out of your rut? It starts by noticing where you are stuck. I noticed something that I’m not doing well just writing this list! Be honest, which of these are you really doing well at, and which are you failing? If you brought this list to your manager, what would they say? There’s only one real way to find out, so think about it, ask for feedback, and start to formulate your plan of attack for the things that are holding you back.
For more ideas, check out my book, The Manager’s Path, which addresses many of these stuck points!

Friday, January 6, 2017

How Do Individual Contributors Get Stuck? A Primer

Occasionally, you may be asked to give constructive feedback on your peers, perhaps as part of review season. If you aren’t a naturally critical person but you want to give someone a valuable insight, you may find this task daunting. To that end, I suggest the following:

Pay attention to how they get stuck.

Everyone has at least one area that they tend to get stuck on. An activity that serves as an attractive sidetrack. A task they will do anything to avoid. With a bit of observation, you can start to see the places that your colleagues get stuck. This is a super power for many reasons, but at a baseline, it is great for when you need to write a review and want to provide useful constructive feedback.
How do people get sidetracked? How do people get stuck? Well, my friend, here are two incomplete lists to get you started:

Individual Contributors often get sidetracked by…

  1. Brainstorming/architecture: “I must have thought through all edge cases of all parts of everything before I can begin this project”
  2. Researching possible solutions forever (often accompanied by desire to do a “bakeoff” where they build prototypes in different platforms/languages/etc)
  3. Refactoring: “this code could be cleaner and everything would be just so much easier if we cleaned this up… and this up… and…”
  4. Helping other people instead of doing their assigned tasks
  5. Jumping on fires even when not on-call
  6. Working on side projects instead of the main project
  7. Excessive testing (rare)
  8. Excessive automation (rare)

Individual Contributors often get stuck when they need to…

  1. Finish the last 10–20% of a project
  2. Start a project completely from scratch
  3. Do project planning (You need me to write what now? A roadmap?)
  4. Work with unfamiliar code/libraries/systems
  5. Work with other teams (please don’t make me go sit with data engineering!!)
  6. Talk to other people (in engineering, or more commonly, outside of engineering)
  7. Ask for help (far beyond the point they realized they were stuck and needed help)
  8. Deal with surprises or unexpected setbacks
  9. Navigate bureaucracy
  10. Pull the trigger and go into prod
  11. Deal with vendors/external partners
  12. Say no, because they can’t seem to just say no (instead of saying no they just go into avoidance mode, or worse, always say yes)
“AHA! Wait! Camille is missing something! People don’t always get stuck!” This is true. While almost everyone has some areas that they get overly hung up on, some people also get sloppy instead of getting stuck. Sloppy looks like never getting sidetracked from the main project but never finishing anything completely, letting the finishing touches of the last project drop as you rush heedlessly into the next project.

Noticing how people get stuck is a super power, and one that many great tech leads (and yes, managers) rely on to get big things done. When you know how people get stuck, you can plan your projects to rely on people for their strengths and provide them help or even completely side-step their weaknesses. You know who is good to ask for which kinds of help, and who hates that particular challenge just as much as you do.

The secret is that all of us get stuck and sidetracked sometimes. There’s actually nothing particularly “bad” about this. Knowing the ways that you get hung up is good because you can choose to either a) get over the fears that are sticking you (lack of knowledge, skills, or confidence), b) avoid such tasks as much as possible, and/or c) be aware of your habits and use extra diligence when faced with tackling these areas.

Wednesday, January 4, 2017

Hey Diddle Diddle, Data to Fiddle

When I worked in finance ages ago, there was a system used by many (but not me!) that was basically a combination of a gigantic distributed database plus a scripting language that allowed you to run calculations over information in that database. One of the things that you could easily do, as far as I understand, was "diddle" a piece of information. The "diddle" would change that piece of data inside of a particular scope, so that you could quickly see different calculations over the graph, without necessarily persisting that data back to the larger system. This was a useful construct for exploring what might happen with changes to different input data and exploring different scenarios. (The first half of this blog post provides some insights into how the system might have worked).

Whether my understanding is exactly right or not is irrelevant except that this concept of "diddling" stuck with me. There are times when what you want to do is take persistent data, make a small change to it on the fly, and use the results of that change without necessarily persisting it back to the original data set.

I've often thought that this concept is particularly useful in places like personalization. Imagine the situation where you have a complex set of results that you wish to display to a user, like say, the Google search results. We all know that the ranking system for Google results is a complex beast, relying on a huge amount of precomputed data, for example, the links between pages. But now, to compute that graph with personalization taken into account? You're probably not calculating all personalization vectors on the fly, but when I go to search for "java" you're also probably not doing a lookup of pre-computed personalized results for "java + userid:camille." Instead, you're applying a "diddle" function to the top set of the overall graph, and showing me the results in the diddled order that makes sense for me.

There are two parts to the concept that make it powerful for me. The first is the idea that you are changing things temporarily. To serve large sets of results fast (or, in the case of Google, to be able to function at all), you need to pre-calculate a huge amount of data. You're doing a complex piece of work that takes some time, and you don't want to have to redo it for every request. However, you also don't want to force yourself to store all the work for all possible scenarios up-front. Diddling, in my mind, also has a second element: it is only applied to the set of data within a limited scope. You don't diddle across the entire search index. You diddle the first few results, the ones that matter to the user in question.

There can be a ton of technical complexity to implement such a concept in practice. One immediate challenge is that of "diddling" in such a way as to drop results from the top set, thus requiring a re-querying for additional responses to get enough data to satisfy the user. The purpose of this post is not to go into the technical details of how you might implement such a thing, but to show you that you can reframe your thinking on a problem like this through its phases. Just because you have a list of thousands as your first pass of results doesn't mean you need to personalize across that whole data set to get the best results for the end user. If your goal is to get the most personally relevant of the most generally relevant, you probably want to operate on the top of the generally-ordered list, not necessarily the whole list itself.
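
To make the "operate on the top of the generally-ordered list" idea concrete, here is a minimal sketch in Python. It is only an illustration of the framing, not how any real search engine works: the function name diddle_top, the per-user boost function, and the sample data are all hypothetical, and it ignores the re-querying problem mentioned above.

```python
from typing import Callable, List, Sequence


def diddle_top(ranked: Sequence[str],
               boost: Callable[[str], float],
               top_k: int = 10) -> List[str]:
    """Re-rank only the first top_k results; leave the tail untouched.

    `ranked` is the precomputed, generally relevant ordering. `boost` is a
    hypothetical per-user score applied on the fly. Nothing is written back
    to `ranked`, so the change lives only in the returned list.
    """
    head = list(ranked[:top_k])
    tail = list(ranked[top_k:])
    # Python's sort is stable, so results with the same boost keep their
    # original (general relevance) order.
    head.sort(key=boost, reverse=True)
    return head + tail


# Made-up example data: a general ordering for "java" and one user's interests.
general = ["java-island", "java-lang", "java-coffee", "javascript", "java-tutorial"]
interests = {"java-lang": 2.0, "java-tutorial": 5.0}

personalized = diddle_top(general, lambda doc: interests.get(doc, 0.0), top_k=3)
print(personalized)
```

Note that "java-tutorial" has the highest boost for this user but sits outside the top three, so it is not moved. That is the trade-off of diddling within a limited scope: cheaper, at the cost of never promoting results from deep in the list.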

There are many ways to attack such problems, and I have no idea how companies like Google solve the challenge of personalizing results under the hood. I do know that, to me, the idea of mixing indexed and computed results in personalized querying is a sticky one, and it's an analogy I use frequently. It helps you remember that there is value in the underlying order of the results as provided by the source of truth, and that personalization is often an enhancement on an underlying set of computed data, not the fundamental computation itself. Pulling out to a larger picture, remember that when your data gets in front of a human end-user, they are going to operate on only the tiny surface area they, as a human, can process at any one time. So in cases where you're serving humans, you can apply different patterns on the fly to the human-visible surface area that would be too expensive to apply to the entire data set at large.

Tuesday, November 29, 2016

Building and Motivating Engineering Teams

I have agreed to give a guest lecture for a class at Yale, and they’ve asked me to speak about “building and motivating engineering teams” from the perspective of a smaller startup. The readings for my section include A Field Guide to Software Developers by Joel Spolsky. I remember reading it when it was first written. I admire Joel’s work, and the piece has many valuable takeaways.


However. The industry has changed a lot in the years since this piece was written. In 2007, the options for engineers here in NYC were far more limited. You worked for a bank, a media company, ad tech was starting to blossom, some e-commerce. Google or Fog Creek if you were lucky. There were plenty of jobs but there was far less of a classic “tech company” presence than there is today.

Over the past 9 years, we’ve seen a massive increase in the number of startups here in NYC. Every major tech company has an office here, the Google office alone has thousands of engineers. We’ve also seen an increase in the supply of engineers. Between students realizing the value of a tech degree, people changing careers, and bootcamps rapidly churning out graduates, the mix of types of people who write software has changed. Most of the people I knew writing software in 2007 had tech or tech-adjacent (math, physics) degrees. My team when I left Rent the Runway had a significant number of developers who had moved into tech from other areas, some without degrees at all.

All of this is to say, playing to 2007 nerd stereotypes is not always a good way to build a team here in NYC. What DO engineers want? I believe that you have 3 axes with which to twiddle. Depending on your company, you can lean more on one to cover problems with the others, but you have to find your balance.

Money. Joel hides this at the bottom of his piece. It is correct to say that engineers don’t care about money above all else, but far too many people delude themselves into thinking that this means that you can lag industry standard salaries and build a good team on the basis of some other factor.

Salaries have gone way up in the past 10 years. People know that they can get paid a certain amount of money, and if you try to hire people who can easily make 50% more somewhere else, they are almost certainly not going to accept your offer. Figure out what the market spread is. The top will be companies like Google, Facebook, some financial companies. The bottom might be non-profits or extremely early startups with significant equity grants. Even a startup with 10–20 people is going to struggle to hire without paying close to market rate. Engineers are expensive. New engineers are expensive. Senior engineers are really expensive. They know their value. You have to pay them.

You are building a company. You are not going to have a perfect, smooth ride. Things will go well, things will go poorly. Very few teams have a straight shot to greatness that is so clear it can cover all management woes. When you don’t pay people well enough, you contribute to undermining their resilience in the face of problems at work. Think of it as the baseline of Maslow’s Hierarchy. Money does not solve all problems for most people, but lack of money exacerbates all irritations.

Purpose. You’re building a company. You want to inspire people to work for you, so you sell them on the mission of the company, the product they’ll be working on. You have to do this because you are not building a company with a bunch of insanely hard tech problems. If you are, you can lean on that to hire people, but to be real, these days most of us are not. The days when everyone had hard technical scaling problems have ended. Scaling is not the same bold new frontier that it was even 5 years ago. Sure, there are always technical challenges to be found in organizations, but for many companies those technical challenges revolve around matching technology with the product and business. This means you have to work harder to sell the learning opportunities.

Leaders undermine their teams when they refuse to let engineers into the non-technical decision-making processes. In Joel’s article he talks about developers wanting to be allowed to make decisions within their own realm of expertise, which is certainly a bare minimum. I would encourage you to go further than that. If you are building a product-focused business, where the challenges are less purely technical and more about engaging a customer, your engineers need to feel a connection to that business. They need to feel like they understand it, like they can have ideas about it. This is why I’m such an advocate for cross-functional product development teams. Put engineering, product management, marketing, operations together as a group and let them work as a team to solve problems. Don’t just throw work over the wall to engineering and expect them to implement it.

Respect. The undercurrent that I like least in Joel’s piece is the undercurrent that engineers need to be coddled and pampered. Giving them “toys” instead of “tools,” keeping them out of the politics. There are plenty of engineers who want to be given hard work to do and left alone, in their private offices, to think and code. But there are increasingly more engineers who want to build businesses, and they want to be treated like adults in the process.

Our engineering teams are not overgrown children. They are not idiot-savants who can produce software but must be given sufficient cookies to do so. They are highly paid professionals. Let’s treat them that way. You should expect your engineers to show up for your business. Respect is not pampering, it is not treating the team like the stars of the show. Rather, respect is challenging the team to show up and grow up. Respect is giving them clear, achievable goals and holding them accountable. My experience has been that most great engineers want to work somewhere that inspires them to achieve. Many of us stop at the idea of “hard technical problem” when we think about inspiring our engineering teams, but challenging them to partner with people who have different perspectives is another way you can help them grow.

Respect that engineers are smart individuals who often have more to add to your business than just their coding talents, and teach them to respect that the other parts of the business have equally valuable skills and perspectives. Engineers don’t need to feel like the company royalty to be inspired to do good work, but they do need the opportunity to be treated like a partner.


These days, you have a lot of competition for talent, but you also have a lot of talent to choose from. Understand your company’s positioning. If you can’t pay top of market, you will have to rely on a balance of finding undeveloped talent and giving engineers other reasons to want to work for your company. For most of us, that means giving them a voice beyond the purely technical, and challenging them to see and understand perspectives outside of engineering.

Friday, August 19, 2016

Microservices: Real Architectural Patterns

A dissection of our favorite folk architecture


Introduction


I’m fascinated by the lore and mystery behind microservices. As a concept, microservices feels like one of the most interesting folk architectures of the modern era. It’s useful enough to be applied widely across different usage patterns and also vague enough to mean many different things.

I’ve been struggling for a while with understanding what people really mean when they discuss “microservices.” Despite deploying what I would consider to be a version of that pattern in my last gig, it’s quite clear to me that the architecture we used is not the same as the pattern that all other companies use. Recently I finally interrogated someone who has deployed the pattern in a very different way than I have, and so I decided it would be illustrative to compare and contrast the circumstances of our architectures for those in the larger technical audience.

This article is going to have two examples. The first is the rough way “microservices” was deployed in my last gig, and why I made the decisions I made in the architecture. The second is an example of an architecture that is much closer to the “beautiful dream” microservices as I have heard it preached, for architectures that are stream-focused.


Microservices Basics


I think that microservices as an architecture evolved due to a few factors.
  1. A bunch of startups in the late 2000s started on monoliths like Rails, scaled their business and team quickly, and hit the wall on what could reasonably be done in that monolith
  2. The cloud made it significantly easier to get access to a new server instance to run software
  3. We all got much more comfortable with the idea that we were dealing with distributed systems and in particular got comfortable making network calls as part of our systems
This combination of factors — scaling woes, easy access to new hardware, distributed systems and network access — played a huge part in what I might call “microservices for CRUD.” If you have managed to scale a company to a certain level of success on a monolith but you are having trouble scaling the technology and/or the engineering team, breaking the monolith into a services-style architecture makes sense. This is a situation I encountered first-hand.

The arguments for microservices here look something like:
  1. Services allow for independent axes of scaling. If you have a part of the system with higher load or capacity requirements than other parts, you can scale to meet its needs. This is certainly doable in a monolith, but somewhat more complicated to reason about.
  2. Services allow for independent failure domains, to a more limited extent. Insofar as parts of your system are independently operable, you may want to allow for partial availability by splitting them out into services. For example, in a commerce app, if you can serve the checkout flow even when the product search flow is down, that might be considered a good thing. This is much more complicated in practice than it is in theory, and people make many silly claims about microservices that imply that any overlap in services means that they are not valuable. Independent failure domains are sometimes more of a “nice to have” than a necessity, and making the architecture truly account for this is not easy.
  3. Services allow for teams to work independently on parts of the system. Again, you can do this in a monolith. I have done this in a monolith. But the challenge with a monolith (and a related challenge with services in a monorepo, that is, a single source repository) is that humans struggle to tangibly understand domains that are theoretically separate when they are presented as colocated by the source code. If I can see all of the code and it all compiles together and feels like a single thing, my tendency is to want to use it as a single thing. Grab code from here to use there, grab data from there to use here, etc.
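
As a rough illustration of the independent failure domains argument (item 2), here is a toy Python sketch of the commerce example: checkout keeps working while search is down. Everything in it is a hypothetical stand-in (the ServiceDown exception, the search and checkout functions, the fallback helper), and real partial availability is, as noted, much harder than this.

```python
class ServiceDown(Exception):
    """Raised by a (hypothetical) client when its backing service is unreachable."""


def search(query: str) -> list:
    # Simulated outage of the product search service.
    raise ServiceDown("search")


def checkout(cart: list) -> dict:
    # Checkout talks only to its own service, so it is unaffected by the outage.
    return {"status": "ok", "items": len(cart)}


def call_with_fallback(fn, arg, fallback):
    """Call one service; if it is down, return a degraded result instead of failing."""
    try:
        return fn(arg)
    except ServiceDown:
        return fallback


results = call_with_fallback(search, "dress", fallback=[])
order = call_with_fallback(checkout, ["dress-123"], fallback={"status": "unavailable"})
print(results, order)
```

The sketch only works because checkout never calls search. Once services overlap, each caller needs its own answer to "what do I do when my dependency is down?", which is where the practical complexity comes from.
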
A few more notes. “Monolith” and “monorepo” often get tangled up when talking about this world. A monolithic application is one where you have a set of code that compiles into a single main server artifact (possibly with some additional client artifacts produced). You can use configuration to make monoliths do almost anything you can imagine, including all of the services-type things above, but the image produced tends to include most if not all of the code in the repository. This does get fuzzy because sometimes teams evolve their monoliths to compile to a couple of specialized server artifacts via a combination of build tooling and configuration. I would generally still call this a monolithic architecture.

Monorepo, or monolith repository, is the model where you have a single repository that holds all of the code for any system you are actively changing (so, possibly excluding the source code for your OSS/external dependencies). The repository itself contains source code that accounts for multiple artifacts that are run as separate applications, and which can be compiled/packaged and tested separately without using the entire repository. Often this is used to enable certain shared libraries to change across all of the services that use those libraries, so that developers who support shared libraries can more easily evolve them instead of having to wait for each dependent team to adopt the newest version. The biggest downside of the monorepo model is that there’s not much OSS tooling that supports this, because most OSS is not built this way, so large investments in tooling are usually needed to make this work.


Microservices for CRUD-based Applications


Before I get to how to evolve a CRUD monolith to microservices, let me further articulate the architecture needed to build your traditional mid-sized CRUD platform. This type of platform covers a use case that is pretty well-trod, that of “transactions” and “metadata.”

Transactions: User does an action that you want to persist, consistency of data is very valuable. The “Create, Update, Delete” of CRUD. Much less frequent than the “Read” actions of CRUD. 
Metadata: Information that describes things to the users, but is usually only modified by internal content creators, or rarely by external users (reviews, for example). Changes less frequently, often highly cacheable. Even more, can often tolerate a degree of temporary inconsistency (showing stale data).
Are there more things that CRUD-heavy companies want to do, especially in the analytical space here? Sure. You may want to adjust results frequently based on user behavior as the user is browsing the site, and other personalization actions. However, that is a hard thing to do real-time and you don’t always have the volume of data you need from the user to actually do that well, so it isn’t generally the first-order concern of the system.

The process for moving off of a monolith in this type of architecture is relatively straightforward:
  1. Identify independent entities. This paper by Pat Helland, “Life Beyond Txns”, has some useful and interesting definitions. It’s better to go a little bit too big early than to go too small and end up having to implement significant distributed transactions. You probably want data-owning services for the major business objects (products, users, etc), and then sets of integration services that implement aggregations and logic over those objects.
  2. Pull out the logic into services entity by entity. As much as possible, avoid changing the data model during this process. Redirect the monolith to call APIs in the new services as functionality is moved.
That’s basically it. You pull pieces out until you have enough to cover a particular set of user functionality in data and integration terms, then you can start to evolve that part of the user functionality to do new things in the services.
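
The "redirect the monolith" step can be sketched as a small routing layer inside the monolith. This is a minimal Python illustration under stated assumptions, not the code we actually ran: EntityRouter, legacy_get_product, and service_get_product are hypothetical names, and the service call is faked in-process where a real one would be an HTTP or RPC request.

```python
from typing import Callable, Dict, Set


def legacy_get_product(product_id: int) -> dict:
    # Stand-in for the monolith's existing in-process lookup.
    return {"id": product_id, "source": "monolith"}


def service_get_product(product_id: int) -> dict:
    # Stand-in for a network call to the new data-owning product service.
    # The returned shape matches the legacy one: the data model is unchanged.
    return {"id": product_id, "source": "product-service"}


class EntityRouter:
    """Sends reads for each entity to the monolith or to its extracted service."""

    def __init__(self) -> None:
        self._legacy: Dict[str, Callable[[int], dict]] = {}
        self._service: Dict[str, Callable[[int], dict]] = {}
        self._migrated: Set[str] = set()

    def register(self, entity: str,
                 legacy: Callable[[int], dict],
                 service: Callable[[int], dict]) -> None:
        self._legacy[entity] = legacy
        self._service[entity] = service

    def migrate(self, entity: str) -> None:
        # Flip one entity at a time as its logic moves out of the monolith.
        self._migrated.add(entity)

    def get(self, entity: str, entity_id: int) -> dict:
        handlers = self._service if entity in self._migrated else self._legacy
        return handlers[entity](entity_id)


router = EntityRouter()
router.register("product", legacy_get_product, service_get_product)
before = router.get("product", 42)
router.migrate("product")
after = router.get("product", 42)
print(before, after)
```

Callers keep using router.get throughout, so moving an entity is a one-line switch rather than a rewrite of every call site.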

These services are not classic SOA, but nor are they teeny-tiny microservices. The services that own the data may be fairly sophisticated. You may not want to have too many services because you want to be able to satisfy requests from the user without having to make a ton of network hops, and ideally, without needing to do distributed transactions.

You are probably not making new services every day, and especially if you have a sub-50-person engineering team and a long product roadmap, you may not want to invest extensive engineering time into complex orchestration and tooling that enables people to dynamically add new services at the click of a button. (NB: the products to support this are getting better all the time, and so at some point this will be worth doing even for that smaller team. It is unclear to me whether that time is now or not.)

The equation to apply for determining how much to invest in tooling is pretty straightforward: how much time does it cost devs to have a less automated process for adding a new service, vs how long does it take to implement and maintain the automation for doing it easily, and how many new services do you expect to want to deploy over time? You’re making a guess. Obviously, if you think there is value to enabling people to spin up tiny services fast and frequently, it is better to invest time and tooling into this. As with all engineering process optimization decisions, it’s not a matter of getting it perfectly right, but rather, of deciding for the foreseeable future and periodically re-evaluating.
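
As a rough illustration of that equation, here is a small sketch that computes how many new services you would need to add before automation pays for itself. The function name, the parameters, and every number in the usage note are assumptions made up for the example; your own guesses go in their place.

```python
def break_even_services(
    build_hours: float,
    maintain_hours_per_year: float,
    years: float,
    manual_hours_per_service: float,
    automated_hours_per_service: float,
) -> float:
    """Number of new services at which automation and manual setup cost the same.

    Fixed cost of automation = build time + maintenance over the planning window.
    Saving per service       = manual setup time - automated setup time.
    """
    saved_per_service = manual_hours_per_service - automated_hours_per_service
    if saved_per_service <= 0:
        # Automation never pays off if it does not save time per service.
        return float("inf")
    fixed_cost = build_hours + maintain_hours_per_year * years
    return fixed_cost / saved_per_service


# Illustrative guesses only: 400 hours to build, 80 hours a year to maintain,
# a two-year horizon, 16 hours per manual setup, 1 hour per automated setup.
estimate = break_even_services(400, 80, 2, 16, 1)
```

With those made-up inputs the fixed cost is 560 hours and each service saves 15, so the break-even point is a little over 37 new services in two years. If you expect to add five, the manual process wins; if you expect to add a hundred, it clearly does not.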

There are many microservices “must-haves” in this instance that I have found to be anything but. I mentioned extensive orchestration above. Dynamic service discovery is also not needed if you are not automatically spinning up services or moving services around frequently (load balancers are pretty nice for doing this at a basic level).

Allowing teams to choose their ideal language, framework, and data store per service is also certainly not a must-have and in fact it’s likely to be far more of a headache than a boon to your team.
Having independent data stores for the services is also not a must-have, although it does mean that you will have a high-risk SPOF on the shared database. As I was writing this piece I discovered a section of some writing on microservices from 2015:
Create a Separate Data Store for Each Microservice
Do not use the same back-end data store across microservices. … Moreover, with a single data store it’s too easy for microservices written by different teams to share database structures, perhaps in the name of reducing duplication of work. You end up with the situation where if one team updates a database structure, other services that also use that structure have to be changed too.
This is true, but for smaller teams you can prevent sharing of database structures by convention (process and code review, and automated testing and checking for such access if it is a huge worry). When you carefully define the data-owner services, it’s less likely this will happen. And the alternative is the next paragraph:
Breaking apart the data can make data management more complicated, because the separate storage systems can more easily get out of sync or become inconsistent, and foreign keys can change unexpectedly. You need to add a tool that performs master data management (MDM) by operating in the background to find and fix inconsistencies. For example, it might examine every database that stores subscriber IDs, to verify that the same IDs exist in all of them (there aren’t missing or extra IDs in any one database). You can write your own tool or buy one. Many commercial relational database management systems (RDBMSs) do these kinds of checks, but they usually impose too many requirements for coupling, and so don’t scale. (original)
This paragraph probably leads to sighs of exhaustion from anyone with experience doing data reconciliation. It’s due to this overhead that I encourage those of you in smaller organizations to at least evaluate a convention-based approach before deciding to use entirely independent and individual data stores. This is a decision you can delay as needed.
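
If you do go the convention-based route, the "automated testing and checking" can be quite small. The sketch below is one hypothetical way to flag a service whose SQL touches tables it does not own. The `TABLE_OWNERS` map, the service names, and the `foreign_tables` function are all assumptions for illustration, and the regular expression is deliberately naive: it is not a SQL parser and will miss anything unusual.

```python
import re
from typing import Dict, Set

# Hypothetical ownership map: each table belongs to exactly one service.
TABLE_OWNERS: Dict[str, str] = {
    "products": "product-service",
    "users": "user-service",
    "orders": "order-service",
}

# Naive match for the table name following FROM / JOIN / INTO / UPDATE.
_TABLE_REF = re.compile(
    r"\b(?:from|join|into|update)\s+([a-z_][a-z0-9_]*)", re.IGNORECASE
)


def foreign_tables(service: str, sql: str) -> Set[str]:
    """Return the tables referenced in `sql` that another service owns.

    Tables missing from TABLE_OWNERS are treated as private to the caller
    and are not flagged.
    """
    referenced = {name.lower() for name in _TABLE_REF.findall(sql)}
    return {t for t in referenced if TABLE_OWNERS.get(t, service) != service}
```

Run over the queries in a service's codebase as part of the test suite, a check like this fails the build when, say, the order service joins directly against `users` instead of going through the user service's API.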

This version of the microservices architecture is very compelling for the scaled CRUD world because it lets you do a rewrite piece by piece. You can do the whole system, or you can simply take out pieces that are most sensitive to scaling. You proactively engage with many of the bits of distributed systems complexity by thinking carefully about the data and where transactions on that data will be needed. You probably don’t need a ton of fancy data pipelines floating around. You know where the data will be modified.

Do you have to go to microservices to scale this? Probably not, but that doesn’t mean using microservices to scale such systems is a bad idea. However, going extreme with the microservices model may be a bad idea, because you really don’t want to slice your data up in a way that ends up in distributed transaction land.


Microservices For Data Stream Processing


Now, let’s talk about a very different use case. This use case is not your classic CRUD application, thick with business rules around transactionally-updated objects. Instead, this use case has a large pipeline of data. Small bits of data flow into it from many different sources, adding up to a very large volume. Many different services consume this input data, modify it, and pass it along for further processing.

The major concern of this application is ingesting large quantities of ever-changing data, processing it in various ways, and showing a view of it to customers. CRUD concerns are secondary to the larger concerns of keeping up with the data stream and recalculating information based on what is happening on that stream.

Let’s take a metrics-aggregating SaaS application, for example. This application has customers all over the world with various applications, services, and machines that are reporting out metrics to the aggregator. These customers only need to see their data, although the combined total of data for any one customer may be very large. Our aggregator needs to consume these metrics and send them off to the application that is going to show them to the customer. The customer-facing application may be operating on a combination of incoming metrics in real-time plus historical data that comes from cache or a backing storage system. A large part of the value of the data is in the moving-window of what is happening right now/recently.

This architecture from the start has considerations of volume that even our scaled CRUD world may not care about for a very, very long time. Additionally, the data itself is mostly a stream of updates over time. The notion of the “stateful” data that is transactionally updated is minimal; the most useful data is more like a timeseries or log of events. The transactional data, say, stored user views and user configuration, may be more like the “metadata” of our CRUD application in the first example, infrequently changed compared to the updates coming in from the stream. The majority of developer time is most likely spent not in dealing with these transactional changes but rather in managing the streams of inputs, providing new types of inputs, applying new calculations to the stream of inputs, and changing the calculations.

In this example, you can imagine a service that wants to run an experiment by doing a different calculation across a particular element on the stream. Instead of modifying the existing code, the experimental service listens to the data stream at the same point as the existing calculation, provides a new calculation value, and pushes that calculation value back into the data pipeline on a different channel. At some point an experiment service pulls this data out for the customers who are assigned to the experimental treatment and shows the results of that calculation instead of the standard calculation. In all of these places you need a record of what happened in order to do analysis of experiment success and debugging, but that record does not need to be strongly, transactionally related to the record of other events in the system at this time, even across related users.
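
Here is a toy, in-memory sketch of that experiment flow, just to show the shape of it. Nothing here is a real streaming system: the `Bus` class, the channel names, the `Metric` type, and the "experimental" calculation (clamping negative values to zero) are all invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List, Set


@dataclass(frozen=True)
class Metric:
    customer: str
    value: float


class Bus:
    """Tiny stand-in for a data pipeline: named channels with subscribers."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callable[[Metric], None]]] = defaultdict(list)
        # Record of everything published per channel, kept for analysis and debugging.
        self.log: DefaultDict[str, List[Metric]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[Metric], None]) -> None:
        self._subs[channel].append(handler)

    def publish(self, channel: str, metric: Metric) -> None:
        self.log[channel].append(metric)
        for handler in list(self._subs[channel]):
            handler(metric)


def build_pipeline() -> Bus:
    bus = Bus()
    # Existing calculation: left untouched, publishes to its usual channel.
    bus.subscribe(
        "metrics.raw",
        lambda m: bus.publish("calc.standard", Metric(m.customer, m.value)),
    )
    # Experimental service: listens at the same point on the stream and
    # publishes its own calculation on a different channel.
    bus.subscribe(
        "metrics.raw",
        lambda m: bus.publish("calc.experiment", Metric(m.customer, max(m.value, 0.0))),
    )
    return bus


def view_for(bus: Bus, customer: str, in_experiment: Set[str]) -> List[float]:
    """Show the experimental values only to customers assigned to the treatment."""
    channel = "calc.experiment" if customer in in_experiment else "calc.standard"
    return [m.value for m in bus.log[channel] if m.customer == customer]
```

The point of the sketch is that adding the experiment required only a new subscriber and a new channel. The standard calculation never changed, and the two records do not need to be transactionally tied to each other.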

In this example, it may very well be much more effective to spin up new services as needed, in order to run quick experiments, rather than changing existing services, especially in cases where the service can do this without needing to worry about coordinating the data consumption or production with any existing service. This is the world of what I would like to call “stream-centric microservices.”

If there is enormous value to your business to manage real-time data streams, and you are going to have a lot of developers consuming those streams by creating new services to listen to them and produce results, then you absolutely must be willing to commit to the investment in tooling to make the process of creating services and putting them into production as easy as possible. You will probably use this for all of your services over time, once you have it, but realize that the clear value is that you have dynamic data that can be processed and manipulated and experimented on independently.


Cron Jobs as Microservices


I’d be remiss if I didn’t mention this pattern. When it becomes very easy to make anything a microservice, everything becomes a microservice, including things we would traditionally run as cron jobs.

But cron jobs are a nice concept, and not everything has to be a “service.” You can use CloudWatch Events from AWS for this purpose, or scheduled Lambda functions. You can also use Gearman, a queue and async job runner, to schedule cron jobs. Remember that your cron jobs need to be idempotent (able to run twice on the same input without changing the outcome). If you have an easy way to spin up services and it’s easy to create tiny services that are basically cron jobs, no problem, but cron jobs in and of themselves are not a great reason to create a large, orchestrated services environment.


Conclusion


I hope that this has been a useful breakout across a few axes of the wild world of microservices. Going through the thought experiment was very useful for me, personally. It helped me understand how what seems obvious to people at one extreme, say those who spend most of their time focused on stream processing, doesn’t make as much sense for people who are more focused on the world of CRUD application scaling.

(This was originally published on Medium)

Monday, July 18, 2016

The Virtue of Hubris and The Value of Complaining

In my previous post, I discussed the leadership virtues of Laziness and Impatience. But as you may know, I neglected one of the core virtues in my list, namely, that of hubris. Hubris. Pride. As Larry Wall says,
Excessive pride, the sort of thing Zeus zaps you for. Also the quality that makes you write (and maintain) programs that other people won't want to say bad things about. Hence, the third great virtue of a programmer.
I would translate this as taking pride in one's work, and being willing to not just take pride in it, but show off that work, talk about it, teach others its magic. And hubris is important. One of the challenges of impatience is that it sometimes drives us to cut corners. Cutting corners can make work go faster, but it can also have a price in the long run. So we balance that desire to cut corners with a desire to maintain pride in our work, and use those conflicting values to keep each other in check.

Hubris done well in my opinion has some interesting expressions. You may think of the person who takes pride in their work as someone who loves to learn and share new things. Who loves to brag about the good stuff. This is certainly part of hubris, sharing lessons learned and trying to help others by showing off our wins. Many tech teams encourage this actively, through rituals like "drinks and demos" where teams get up to share what they accomplished during the week. We encourage people to write up cool stuff we've built, to go speak at conferences and talk about cool technology, and this is all a great thing to do.

However, I think there's more to it than just showing off the good stuff. Within a team, hubris also shows in people who are willing to complain about the bad stuff. Yes, that's right, I think that there is value to expressing not just the positive, but also the negative. In fact, I think that you are actively harming your culture and creating a culture of false pride when you only encourage people to speak up to share good things.

Complaining is all about context. The problems we are facing are our context, and the solutions to those problems must be made with an understanding of that context. Context is what makes microservices right for one team and wrong for another. Context is what makes hiring a certain way successful in a high-growth startup but devastating in a big company. Context is so important that when you misunderstand the role that it plays in a solution, you run the risk of misapplying that solution to a place where it will cause you more problems than it solves. Applying someone else's lessons to your context without understanding is how we end up with these cargo cult solutions.

So, the details of the problem are pretty important for putting the solution we're bragging about into context. But here's the thing. If you squash people who want to complain or criticize, you lose the details of your problems. Those complaints contain the details!

Does your company have a practice of telling people to "bring solutions, not complaints?" That is at best hiding problems, not avoiding them. It is unrealistic to expect people to be able to solve every problem they see in front of them. I mean, can you do that, really? It is hard enough to expect your executives to be able to do this, believe me, I know. Your team is going to see problems that they will not know how to solve, and to tell them to keep that to themselves until they figure out the solution is a great way to avoid dealing with real issues.

Instead, I encourage you to ask people to give you details when they have complaints. Help them put their complaints in context. If they complain a system sucks, ask them why. Maybe the answer is that they don't like the formatting standards, in which case an appropriate response might be, unfortunately not everything goes your way. On the other hand, maybe the answer is that it takes them a long time to make changes because the system has no tests and breaks easily, in which case, perhaps you want to think about actually fixing that problem.

If you do this well, you actually teach people how to understand which problems are important, and which problems are not. Letting people complain might seem like it will do nothing but encourage negativity and drama, but if you guide people to learn from their complaints it can instead help your team grow. It's great when people can bring problems AND solutions to you simultaneously, but it's more likely that they will need help to see the best solution. Helping them see the best solution starts by helping them understand how to state the problem.

We are going to have disagreements and conflict in our teams. None of us sees the world in the same way, and that is good. We form teams because as a group, sharing our perspectives, we can create things that are greater than the sum of their parts. Trying to create conflict-free environments is a fool's errand. But you can guide conflict and complaints to result in an increased understanding of context. Instead of discouraging all disagreement, push people to be specific about their thoughts and concerns, and attempt to understand them. As a leader, ask questions to tease out details, and show that you are actually interested in the perspectives on your team, even when you might disagree.

Taking pride sometimes means speaking up when something doesn't seem to be right, when something seems to be less than what it could be. Criticism can help us become even better than we are, if we are willing to listen to its details. Please don't smother this in the name of harmony or positivity, because repressing conflict only leads to a false sense of security and prevents us from achieving true greatness.

Friday, June 10, 2016

The Virtues of Laziness and Impatience

This is an excerpt from my work in progress, a book on engineering management. If you're interested in getting occasional updates you can subscribe to my newsletter!

I love the idea of Laziness, Impatience, and Hubris as virtues of engineers, articulated in “Programming Perl” by Larry Wall. I believe these virtues sustain into leadership, and learning how to channel these traits into advantages is something I encourage all managers to do.

As a manager, when you are dealing with people 1-1 you probably don’t want to be impatient, of course. Impatience can be rude when it is directed at individuals. And you don’t want to seem lazy; there’s nothing worse than working for a manager who seems to be taking it easy while you kill yourself to deliver projects. But impatience, paired with laziness, is wonderful when directed at processes and decisions. Impatience and laziness, applied to process, are the key elements of focus.

As you grow more into leadership positions, people will look to you for behavioral guidance. What you want to teach them is how to focus. To that end, there are two areas I encourage you to practice showing, right now: figuring out what’s important, and going home.

I can’t stand watching people waste their energy approaching problems with brute force and spending time rather than thought, and yet, any culture where you are encouraged to work excessive hours all the time is almost certainly doing just that. What is the value of automation if you don’t use it to make your job easier? We engineers automate so that we can focus on the fun stuff, and the fun stuff is the stuff that uses the most of your brain, and it’s not usually something you can do for hours and hours, day after day.

So be impatient to figure out the nut of what is important. As a leader, any time you see something being done that feels inefficient, start to ask the question, why does this feel inefficient to me? What is the value in the thing we are doing? Can we deliver that value in a way that is faster? Can we strip down this project into something simpler and get it done more quickly?

The problem with this line of questioning is that often when managers ask, can it be done faster, what they explicitly or implicitly want to know is, can the team work harder or longer hours to deliver it in fewer days. This is why I encourage you to develop and show the value of laziness. Because “faster” is not about “same number of hours but fewer total days.” “Faster” is about “the same value to the company in less total time.” If the team works 60 hours in a week to deliver something that otherwise would’ve taken a week and a half, they haven’t worked faster, they’ve just given the company more of their free time.

This is where going home comes in. Go home! And stop emailing people at all hours of the night and all hours of the weekend! Forcing yourself to disengage is essential for your mental health, believe me. Burnout is a real problem in the American workforce these days, and almost everyone I know who has worked sustained excess hours has experienced it to some degree. It’s terrible for individuals, terrible for their families, and terrible for teams. But this isn’t just about preventing your own burnout, it’s about preventing your team’s burnout. When you work later than everyone else, when you send those emails at all hours, even if you don’t expect your team to respond to those emails or work those hours, they see you doing it, and think it’s important. And that overwork makes them less effective, especially at the detailed knowledge work that engineers need to perform.

When you are a newish manager, and you haven’t figured out the tricks to do your job effectively, you might find yourself needing to work more hours to get it all done. That is ok, for a little while. But I encourage you to figure out a way to work those hours without encouraging your team to do so, or making them feel obligated to be on your schedule. Queue up the weekend and overnight emails for the next work day. Put your chat status as “away” in off hours. Take vacation and don’t answer email during that time. And constantly ask yourself the same questions you ask your team: can I do this faster? Do I need to be doing this at all? What is the value that I am providing with this work?

Laziness and impatience. We focus so we can go home, and we encourage going home because it forces us to constantly focus. This is how great teams scale.