Friday, January 6, 2017

How Do Individual Contributors Get Stuck? A Primer

Occasionally, you may be asked to give constructive feedback on your peers, perhaps as part of review season. If you aren’t a naturally critical person but you want to give someone a valuable insight, you may find this task daunting. To that end, I suggest the following:

Pay attention to how they get stuck.

Everyone has at least one area that they tend to get stuck on. An activity that serves as an attractive sidetrack. A task they will do anything to avoid. With a bit of observation, you can start to see the places that your colleagues get stuck. This is a super power for many reasons, but at a baseline, it is great for when you need to write a review and want to provide useful constructive feedback.

How do people get sidetracked? How do people get stuck? Well, my friend, here are two incomplete lists to get you started:

Individual Contributors often get sidetracked by…

  1. Brainstorming/architecture: “I must have thought through all edge cases of all parts of everything before I can begin this project”
  2. Researching possible solutions forever (often accompanied by a desire to do a “bakeoff” where they build prototypes in different platforms/languages/etc.)
  3. Refactoring: “this code could be cleaner and everything would be just so much easier if we cleaned this up… and this up… and…”
  4. Helping other people instead of doing their assigned tasks
  5. Jumping on fires even when not on-call
  6. Working on side projects instead of the main project
  7. Excessive testing (rare)
  8. Excessive automation (rare)

Individual Contributors often get stuck when they need to…

  1. Finish the last 10–20% of a project
  2. Start a project completely from scratch
  3. Do project planning (You need me to write what now? A roadmap?)
  4. Work with unfamiliar code/libraries/systems
  5. Work with other teams (please don’t make me go sit with data engineering!!)
  6. Talk to other people (in engineering, or more commonly, outside of engineering)
  7. Ask for help (far beyond the point they realized they were stuck and needed help)
  8. Deal with surprises or unexpected setbacks
  9. Navigate bureaucracy
  10. Pull the trigger and go into prod
  11. Deal with vendors/external partners
  12. Say no, because they can’t seem to just say no (instead of saying no they just go into avoidance mode, or worse, always say yes)

“AHA! Wait! Camille is missing something! People don’t always get stuck!” This is true. While almost everyone has some areas that they get overly hung up on, some people also get sloppy instead of getting stuck. Sloppy looks like never getting sidetracked from the main project but never finishing anything completely, letting the finishing touches of the last project drop as you rush heedlessly into the next one.

Noticing how people get stuck is a super power, and one that many great tech leads (and yes, managers) rely on to get big things done. When you know how people get stuck, you can plan your projects to rely on people’s strengths and to provide help for, or even completely side-step, their weaknesses. You know who is good to ask for which kinds of help, and who hates that particular challenge just as much as you do.

The secret is that all of us get stuck and sidetracked sometimes. There’s actually nothing particularly “bad” about this. Knowing the ways that you get hung up is good because you can choose to a) get over the fears that get you stuck (lack of knowledge, skills, or confidence), b) avoid such tasks as much as possible, and/or c) be aware of your habits and use extra diligence when faced with these areas.

Wednesday, January 4, 2017

Hey Diddle Diddle, Data to Fiddle

When I worked in finance ages ago, there was a system used by many (but not me!) that was basically a combination of a gigantic distributed database plus a scripting language that allowed you to run calculations over information in that database. One of the things that you could easily do, as far as I understand, was "diddle" a piece of information. The "diddle" would change that piece of data inside of a particular scope, so that you could quickly see different calculations over the graph, without necessarily persisting that data back to the larger system. This was a useful construct for exploring what might happen with changes to different input data and exploring different scenarios. (The first half of this blog post provides some insights into how the system might have worked).

Whether my understanding is exactly right or not is irrelevant except that this concept of "diddling" stuck with me. There are times when what you want to do is take persistent data, make a small change to it on the fly, and use the results of that change without necessarily persisting it back to the original data set.
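To make that concrete, here is a toy sketch of the idea, entirely my own invention (the actual finance system certainly wasn't a few lines of Python): temporarily override one value while you run a calculation, and let the override evaporate when you're done, without persisting anything back.

```python
from contextlib import contextmanager

# A toy "database": persistent inputs to some calculation.
positions = {"AAPL": 100, "MSFT": 250}
prices = {"AAPL": 150.0, "MSFT": 300.0}

def portfolio_value():
    """The 'real' calculation, always reading the shared data."""
    return sum(qty * prices[sym] for sym, qty in positions.items())

@contextmanager
def diddle(table, key, value):
    """Temporarily override table[key] within this scope only."""
    original = table[key]
    table[key] = value
    try:
        yield
    finally:
        table[key] = original  # nothing is persisted back

print(portfolio_value())             # value with the stored price
with diddle(prices, "AAPL", 180.0):
    print(portfolio_value())         # "what if AAPL were 180?"
print(portfolio_value())             # back to the stored price
```

In a real distributed system the override would presumably be scoped per-session or per-query rather than done by mutating shared state, but the shape of the idea is the same.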

I've often thought that this concept is particularly useful in places like personalization. Imagine the situation where you have a complex set of results that you wish to display to a user, like, say, Google search results. We all know that the ranking system for Google results is a complex beast, relying on a huge amount of precomputed data, for example, the links between pages. But now, to compute that graph with personalization taken into account? You're probably not calculating all personalization vectors on the fly, but when I go to search for "java," you're also probably not doing a lookup of pre-computed personalized results for "java + userid:camille." Instead, you're applying a "diddle" function to the top set of the overall graph, and showing me the results in the diddled order that makes sense for me.

There are two parts to the concept that make it powerful for me. The first is the idea that you are changing things temporarily. To serve large sets of results fast (or, in the case of Google, to be able to function at all), you need to pre-calculate a huge amount of data. You're doing a complex piece of work that takes some time, and you don't want to have to redo it for every request. However, you also don't want to force yourself to store all the work for all possible scenarios up-front. The second element, to my mind, is that the diddle is only applied to the set of data within a limited scope. You don't diddle across the entire search index. You diddle the first few results, the ones that matter to the user in question.
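Here's a hedged sketch of what that might look like for personalization; the result tuples, the preference weights, and the diddle_top function are all made up for illustration, not how any real search engine works. The point is that only the top slice gets re-scored for the user, and nothing is written back to the index.

```python
# Precomputed, globally ranked results: (doc_id, global_score, topic).
global_ranking = [
    ("java-language", 0.95, "programming"),
    ("java-island", 0.90, "travel"),
    ("java-coffee", 0.85, "food"),
    # ... imagine thousands more, in descending global_score order
]

# Hypothetical per-user preference weights, looked up cheaply at query time.
user_prefs = {"programming": 1.3, "travel": 0.8, "food": 1.0}

def diddle_top(results, prefs, window=10):
    """Re-rank only the top `window` results for this user; leave the rest alone."""
    head, tail = results[:window], results[window:]
    rescored = sorted(
        head,
        key=lambda r: r[1] * prefs.get(r[2], 1.0),  # temporary, per-request score
        reverse=True,
    )
    return rescored + tail  # nothing is persisted back to the index

for doc_id, score, topic in diddle_top(global_ranking, user_prefs)[:3]:
    print(doc_id, topic)
```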

There can be a ton of technical complexity in implementing such a concept in practice. One immediate challenge is "diddling" in such a way as to drop results from the top set, thus requiring you to re-query for additional results to get enough data to satisfy the user. The purpose of this post is not to go into the technical details of how you might implement such a thing, but to show that you can reframe your thinking on a problem like this by breaking it into phases. Just because you have a list of thousands as your first pass of results doesn't mean you need to personalize across that whole data set to get the best results for the end user. If your goal is to get the most personally relevant of the most generally relevant, you probably want to operate on the top of the generally-ordered list, not necessarily the whole list itself.
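Building on the sketch above (and still purely hypothetical), one way to handle the dropped-result problem is to backfill: if the diddle filters items out of the visible window, pull the next slice from the global ranking until you have enough to show.

```python
def diddle_with_backfill(results, prefs, wanted=10, blocked_topics=()):
    """Personalize the head of a globally ranked list of (doc_id, score, topic)
    tuples, dropping blocked topics and pulling more candidates from further
    down the global ranking ("re-querying") until we have enough to show."""
    chosen, cursor = [], 0
    while len(chosen) < wanted and cursor < len(results):
        batch = results[cursor:cursor + wanted]   # next slice of the global ranking
        cursor += wanted
        chosen.extend(r for r in batch if r[2] not in blocked_topics)
    # Re-score only what we actually plan to show; the index itself is untouched.
    chosen.sort(key=lambda r: r[1] * prefs.get(r[2], 1.0), reverse=True)
    return chosen[:wanted]

# e.g. diddle_with_backfill(global_ranking, user_prefs, wanted=2,
#                           blocked_topics={"travel"})
```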

There are many ways to attack such problems, and I have no idea how companies like Google solve the challenge of personalizing results under the hood. I do know that, to me, the idea of mixing indexed and computed results in personalized querying is a sticky one, and it's an analogy I use frequently. It helps you remember that there is value in the underlying order of the results as provided by the source of truth, and that personalization is often an enhancement on top of an underlying set of computed data, not the fundamental computation itself. Pulling back to the larger picture, remember that when your data gets in front of a human end-user, they are going to operate on only the tiny surface area they, as a human, can process at any one time. So in cases where you're serving humans, you can apply different patterns on the fly to the human-visible surface area that would be too expensive to apply to the entire data set at large.