Scaling a Design Review Process
When I took on a staff engineer role on the OpenShift team at Red Hat last year, one of the first changes I wanted to make was to overhaul our design review process. Lately, we’ve been talking a lot about that process and whether we can update it to “go faster.” I’ve been explaining to different people why the process works the way it does, and I thought that others outside of Red Hat might be interested in the same answers.
Benefits of Design Review
We usually think of a good design document review process as an important tool for ensuring that members of a large team think about problems and work on solutions in a consistent way. For example, reviewing designs together helps ensure that APIs behave consistently across areas of a product, so that users can recognize patterns that provide affordances for learning new features quickly. It’s especially important to address consistency directly when a project’s architecture relies on microservices developed by independent teams.
A meta-benefit of reviewed design documents is that they help spread knowledge about how the system works among the team members. Having a cache of designs for existing work can be especially helpful for answering “why does it work like that?” questions when bringing in new members or “leveling up” junior members of the team. I’ve also been known to refer to an old document to refresh my own memory, even for things I designed and implemented myself.
A less obvious, yet important, benefit of a design review process is that it helps you manage the rate of change in a project. Contributors of all sorts (developers, documentation authors, testers, product managers, etc.) have a limited capacity to absorb and internalize change. When a project changes too quickly, it can lose coherence and stability, and the maintainers experience more stress trying to keep up.
So, how can you tune a process to introduce just the right amount of friction, letting the team manage the rate of change without slowing it down so much that nothing gets accomplished?
Design Review Roles
Based on observing review processes in the Python, OpenStack, and Kubernetes communities, along with the processes used at several companies that I am familiar with, I have come to understand that there are three basic roles involved in a good design or change review process.
- The author of a proposal.
- The process guide.
- The reviewer(s) of proposals.
Authors write design documents for new ideas and pitch them to the maintainers of the project. Authors may be subject-matter experts in their area, but might not be experts on the project as a whole. For many projects, anyone can be an author. There is usually no explicit limit on the number of authors, or on the number of ideas they may propose at any one time. That is especially true for open source projects, but it often holds for internal projects as well.
Process guides assist authors who may not be familiar with the process, the project, or the maintainers. The guide may advise on who should review a specific proposal and point out deadlines or other time-based criteria for completing work. In a smoothly running review process, the guide may be responsible for recognizing when consensus has been reached and a proposal is ready to be approved or formally rejected. When consensus is not emerging on its own, the guide may also step in as a mediator. The process guide need not be a subject-matter expert on the topic of the design, although it helps if they are. For internal projects, the guides may be managers or senior engineers. For open source projects, they are likely to come from contributors focused on community health.
Reviewers typically come from the pool of maintainers for a project. Reviewers are likely to be subject-matter experts, at least for some aspects of the proposals that they read. Most projects want several reviewers to look at each proposal, especially if it is a large or controversial change, to ensure that it is considered from multiple perspectives. The size of the reviewer pool varies between projects, but becoming a maintainer usually requires broad knowledge of the project and a level of trust among the other maintainers. As a result, even for internal projects, there are likely to be fewer reviewers than authors.
What doesn’t work?
It is tempting to put a time limit on reviews to prevent them from “dragging on.” For example, a policy that says all design reviews must be completed in 2 weeks or they will be automatically approved feels like it would naturally lead to quick progress. That progress is weighted towards the authors of proposals, though. It assumes that most proposals are good ideas that should be accepted quickly, and that there are enough maintainers actively reviewing proposals to catch and reject any that are not. It also fails to ensure that any code associated with a design is prioritized for review and approval in a timely way, so the content of the approved design documents can end up not matching the project’s priorities or its implementation. Authors are still likely to be frustrated by delays, just later in the process of implementing their change.
Lazy consensus is another common approach to keeping projects moving. It can be a useful tool for preventing open source contributors with different agendas from blocking each other, intentionally or otherwise. But it works best with an unbounded review period that switches to lazy consensus only after there has been some review feedback and it appears that agreement has been reached. In that model, lazy consensus is the final, but not the only, opportunity for contributors to raise objections, and it takes the place of a formal vote of approval: silence is consent. Lazy consensus is not appropriate at all for internal projects, where all of the participants should have aligned priorities. If an internal project sees surprising priorities or proposals, there has been a failure in planning and communication between the contributors.
The final approach, which is sometimes effective, is to appoint one reviewer or a small number of reviewers (often called “architects”) to approve all designs. While limiting design approval may be necessary to keep everyone aligned in the early phases of a project, it can quickly burn out the reviewers who have to keep up with everything and breed discontent among contributors who feel stuck behind the review bottleneck. Projects that start that way should have a plan in place to change their process quickly.
What does work?
The most effective approach I have observed balances the speed of change that design authors perceive against the churn that maintainers experience, using consensus-based reviews by representatives of all of the affected sub-projects. That is a long way of saying “you need to get everyone who cares to agree to the plan.”
The maintainers of the project are the group that needs to absorb changes as they happen. They can manage the rate of change based on their ability to incorporate design reviews along with the other work that they do. If it isn’t possible to attract the attention of enough of the right maintainers to act as reviewers, that may be a signal that the project’s rate of change is maxed out.
Working based on consensus is not necessarily going to produce the fastest output, but it does produce a rate of progress appropriate for the project and team in their current state, and that is a better goal. Go as fast as possible, but no faster.