Stop Working So Hard: Scaling Open Source Community Practices

Lately, I have been revising some of the OpenStack community’s processes to make them more sustainable. As we grew over the last 7 years to have more than 2,000 individual contributors to the current release, some practices that worked when they were implemented have begun causing trouble for us now that our community is changing in different ways. My goal in reviewing those practices is to find ways to eliminate the challenges.

OpenStack is developed by a collection of project teams, most of which focus on a feature-related area, such as block storage or networking. The areas where we have most needed to change intersect with all of those teams, such as release management and documentation. Although the teams responsible for those tasks have tended to be small, their members have been active and dedicated. At times that dedication has masked the near-heroic level of effort they were making to keep up with the workload.

When someone is overloaded in a corporate environment, where tasks are assigned and the performance and workload of team members are reviewed regularly, the employee can appeal to management for help. The solution may be to hire or assign new contributors, change the project schedule, or make a short-term trade-off that incurs technical debt. However, open source projects are largely driven by volunteers, so assigning people to work on a task isn’t an option. Even in a sponsor-driven community such as OpenStack, where many contributors are being paid to work on the project overall, sponsors typically give a relatively narrow mandate for the way their contributors can spend their time. Changing the project schedule is always an option, but if there are no volunteers for a task today, there is no guarantee volunteers will appear tomorrow, so it may not help.

We must use a different approach to eliminate the need for heroic effort.

OpenStack contributions to other open source projects

As part of preparing for the talk I will be giving with Thierry Carrez at EuroPython 2016 next month, I wanted to put together a list of some of the projects members of the OpenStack community contribute to outside of things we think of as being part of OpenStack itself. I started by brainstorming myself, but I also asked the community to help me out. I limited my query to projects that somehow touched OpenStack, since what I am trying to establish is that OpenStack contributors identify needs we have, and do the work “upstream” in other projects where appropriate.

OpenStack has many facets, and as a result has pulled in contributors from many parts of the industry. A large number of them are also members of other open source communities, so it’s no surprise that even with only a few respondents to my question (most of them privately, off-list) we came up with a reasonably long list of other projects where we’ve made contributions. I did not make a distinction between the types of contributions, so this list includes everything from bug reports and triage to documentation to code patches for bug fixes or new features. In several cases, the projects came into existence entirely driven by OpenStack’s needs but have found wide adoption outside of our immediate community.

Python Packaging

  • packaging
  • pip
  • setuptools
  • wheel

Python Web Tools

  • Pecan
  • requests
  • WebOb
  • Werkzeug
  • wsgi-intercept
  • WSME

Python Database and Data tools

  • alembic
  • python-memcache
  • Ming
  • Pandas
  • redis-py
  • SQLAlchemy

Python Testing

  • fixtures
  • testtools
  • testrepository
  • tox

Other Python libs and tools

  • APScheduler
  • dogpile
  • eventlet
  • iso8601
  • jaraco.itertools
  • ldappool
  • Mako
  • pykerberos
  • pysaml2
  • retrying
  • sphinxcontrib-datatemplates
  • six

Python Interpreters

  • CPython
  • PyPy (in the past)

Messaging

  • kazoo
  • kombu
  • pyngus
  • qpid
  • RabbitMQ

JavaScript

  • AngularJS
  • Registry-static
  • “other JS libraries”

Deployment, Automation, and Orchestration Tools

  • Ansible
  • Ansible modules for OpenStack
  • Puppet & Puppet Modules
  • Chef modules for OpenStack
  • saltstack

Linux

  • cloud-init
  • dpkg
  • libosinfo
  • Linux kernel
  • LUKS disk encryption
  • systemd

Virtualization

  • kvm
  • libguestfs
  • libvirt
  • qemu

Networking

  • Dibbler (DHCP)
  • OVS
  • OpenDaylight

Containers

  • Docker
  • Kubernetes
  • openvz

Testing and Developer Tools

  • gabbi
  • gerrit
  • Zuul
  • Jenkins Job Builder

Cloud Tools

  • fog
  • libcloud
  • nodepool
  • owncloud
  • phpopencloud
  • pkgcloud

Linux Distributions

  • Ubuntu
  • Red Hat
  • Debian
  • Fedora
  • Gentoo
  • SuSE

Other Tools

  • caimito (WebDAV front-end for object storage)
  • Corosync (cluster & HA synchronization)
  • Etherpad-lite
  • greenlet
  • jarco-tools
  • MySQL
  • Zanata (translation tools)

Updated 23 June to add Kubernetes to the list of container projects.

Updated 24 June to add pysaml2 to the list of Python libraries.

Book Review: Citizen Engineer

Disclosure: I received a copy of this book for free from the
publisher as part of the PyATL Book Club.

The goal of Citizen Engineer, from Prentice Hall/Pearson Education,
is to awaken the socially responsible engineer in each of us. The
topics covered range from the environmental impact of product design
to the sociopolitical ramifications of intellectual property
law. Authors David Douglas (Senior VP of Cloud Computing, Sun), Greg
Papadopoulos (CTO and Executive VP of R&D, Sun), and John Boutelle
(freelance writer) share their experience in all of these areas to
create a thought-provoking introductory guide to the issues of modern
engineering practice.

Citizen Engineers are techno-responsible, environmentally
responsible, economically responsible, socially responsible
participants in the engineering community.

The authors begin by covering the background of what they call the
“Citizen Engineer” and why the issues are important. The premise of
the book is that it is no longer sufficient for engineers to work in
isolation in their labs. We must engage with practitioners of other
disciplines to bridge the gap between science and society. This is not
a new responsibility, and the role of “citizen engineer” is not new.
However, it is expanding as engineers have an ever greater need to
understand a broader range of fields to do their job well. Even if the
engineer doesn’t practice intellectual property law or environmental
science, they need to be familiar with the issues involved in order to
collaborate effectively.

They start the discussion by listing several external driving forces
that are changing the economics of the way good engineering is being
done. These include the environment, corporate social responsibility,
fraud and security concerns, privacy, digital goods and intellectual
property issues, and government regulation in all of those areas. The
book touches on each of these areas in turn.

Engineers who were once preoccupied with Moore’s Law are now dealing
with more laws…

Part 2 of the book is devoted to environmental issues and making the
case that the environment is something engineers can, should, and do
think about. Environmental impact analysis is complicated by many
variables and the fact that reducing impact in one area can increase it
in others. The authors approach the problem pragmatically. They start by
prioritizing changes based on the biggest impact, and ensuring that
impact is studied over the full life-cycle of the product. They also
point out that sustainability issues need to be considered “at scale” to
expose the true impact of small changes. For example, billions of people
use consumer products such as light bulbs, meaning even seemingly tiny
incremental improvements in efficiency or sustainability can have
sweeping global impact in energy consumption, materials used, and
natural resources consumed. Even a change to reduce the packaging weight
of a product will reduce the amount of fuel needed for shipping and
distribution, and depending on the scale that impact can be as big as
or bigger than a change made directly to the product itself.

… IT equipment often consumes less than half the power used in a
typical data center.

A little closer to home, the authors refer to research that says less
than half of the power used by a typical data center goes into
computing. The rest is lost as heat waste in the conversion to different
power levels, or is used to keep the computers cool. There is research
being done into more efficient cooling techniques, consolidation of
systems through virtualization, and alternative power setups with higher
voltage. Changing the hardware is not the only path to reducing power
consumption, though. More efficient code, and more efficient execution
of that code, provides opportunities for using less hardware in the
first place. That’s food for thought the next time you put off
optimization by saying, “cycles are cheap”.

Intellectual property laws are crafted to protect inventors and
creators and the companies that market their works.

Part 3 is titled “Intellectual Responsibility” and covers the
fundamentals of intellectual property law, including copyright,
trademark, and patents. I found the definitions of the IP terms such as
patent, copyright, and trademark in this section particularly helpful
for clarifying the roles each of those tools has in IP generally and
software specifically. I also like the way the authors separated the
technical background material from their own opinions on the subject.
Not everyone will agree with all of their conclusions, but the
delineation between fact and opinion avoids clouding the issue.

Chapter 13 digs into the basic types of open source licenses and the
decisions a software developer needs to make when deciding how to
license their creations. They also talk about forms of open licenses
for non-software products, such as Creative Commons. The authors have
copyrighted their book under a Creative Commons license (BY-NC-SA
3.0), and have downloadable copies of chapters available online at
http://citizenengineer.org/.

The final section of the book, “Bringing it to Life”, talks about how
to apply some of the ideas from earlier chapters immediately, as well
as changes that are being made in engineering training so responsible
engineering practices are more central in future work. They talk about
cross-discipline training in law and environmental science and the
curriculum changes being tested in major universities. The book then
wraps up with examples from a selection of success stories from around
the world.

I recommend Citizen Engineer for engineers from all fields who
want a better understanding of some of the issues that are driving
changes in engineering practices. The clear and concise writing style
makes the book an easy read, without glossing over any difficult
topics. It does not attempt to provide an exhaustive reference
manual, but does give plenty of other resources for future research
and reading.

Book Review: The Success of Open Source

For the past few weeks I’ve been wrapped up reading Steven Weber’s
The Success of Open Source. Published in 2004, it is a look at what
the open source movement is and how it works, from the perspective of a
political scientist. This is no trite look at why people would choose to
give away the fruits of their labor. His analysis is serious and well
considered. He stresses several times that his goal is to ask questions
rather than answer them, but he does offer some observations about the
open source movement as a larger social movement and how it might spread
to other parts of the culture.

Weber starts out by explaining his goal for the book, to study the
political and economic foundations of open source communities and
processes. He makes two assertions, around which the rest of the book is
framed:

1. The open source phenomenon is an important “puzzle” for social
scientists who study cooperation.
2. OSS communities have been fundamentally impacted by the internet.

Early History

The second chapter covers the basic facts of the early history of open
source, well before it was called that. From the PACT compiler project
for IBM mainframes, through the failure of Multics, and the unintended
consequence of the AT&T consent decree that led to the original
licensing terms for Unix, he covers some details that aren’t a part of
the usual story that includes DARPA, BSD, the fragmentation of the Unix
market, and FSF and the GNU project. The writing is engaging, and I
could recommend the book on this history section alone.

How Does OSS Work?

Chapter 3 tries to answer the question, “What is Open Source and How
Does It Work?”. It covers some essential software project
characteristics such as the division of labor between “analyst” and
“programmer” and how that historically led to problems because the
designer of software was too far removed from the end-user.

The essence of software design, like the writing of poetry, is a
creative process. The role of technology and organization is to
liberate that creativity to the greatest extent possible and to
facilitate its translation into working code. Neither new technology
nor a “better” division of labor can replace the creative essence
that drives the project.

Weber builds on Brooks’s Law to say that the success of a project isn’t
just about getting more people involved, but also about how they are
organized. He points out that open source is much more about the
process than the resulting product, which is an artifact of the
organization and creative energies of the participants. He identifies
four fundamental organization schemes that repeat in open source
projects:

1. A hierarchy, where patches flow up to a more or less central
maintainer, as with Linux.
2. The concentric circles used by the BSD project, in which
maintainers closer to the center have more rights and privileges, but
within a circle they are essentially equal.
3. The pumpkin holder or token-based system used by the developers
of Perl.
4. A democratic voting system, such as used to approve changes in
Apache.

One assertion Weber makes relates to the different cultures that
evolve around BSD vs. GPL-licensed projects. His claim is that core
developers in BSD-licensed projects do not depend as much on submissions
from the end user as GPL projects do. His evidence for this is the
various BSD operating systems and Linux. I think his sample size is too
small, though. I’m not convinced that the license has much to do with
“dependence” on contributions. I think the attitude of the core
developers, and their willingness to accept patches, is more important.

Evolution of Open Source

Chapter four talks about the “maturation” of three major projects
(Linux, BSD, and Apache) as they evolved in the 1990s, the “golden age”
of open source. He covers several pivotal events during that period and
how the community identities gelled as a result of passing through
critical times like the fracturing of BSD and other Unixes, flame wars
and other crises among the Linux maintainers, and the conflict caused by
the “ideological passion” of Richard Stallman and the FSF. This chapter
was an interesting retrospective and it really pulled together a
cohesive picture of what happened that brought us to where we are today.

Motivation and Organization

Chapter five examines the microfoundations of open source made up of
the motivations of individual contributors. For example, he says that
open source developers self-select as a way to boost their egos by using
acceptance of their code as a “signal” of its quality to developers who
are not necessarily skilled enough to recognize quality on their own.

It is clearly the best programmers who have the strongest incentive
to show others just how good they are. If you are mediocre, the last
thing you want is for people to see your source code.

Ego boosting is one of six motivating factors he discusses, and is not
necessarily the most important for most developers.

Chapter six looks at how individual developers come together to form
groups and focus their creative energies with constructive
contributions. He studies the social and economic pressures for and
against forking a project, and comes to an interesting conclusion: The
leader of a project needs the fellow contributors more than they need
him. When a fork is created, the new leader has to convince potential
followers that the new project will be better or more popular than the
old one. So while forking may give the leader more visibility, that
only works if he is successful at attracting followers, in which case he
is just as likely to be a successful contributor to the original
project.

The Code That Changed The World

Weber begins his final chapter by comparing the impact of OSS to the
Japanese manufacturing innovations described in The Machine That
Changed the World: The Story of Lean Production and re-emphasizing
the importance of process over product.

The Toyota “system” was not a car, and it was not uniquely Japanese.
… Open source is not a piece of software, and it is not unique to
a group of hackers.

This leads into the rest of the conclusion, where he brings together
observations on intellectual property rights law, the limiting factors
for specialization and division of labor and how they impact
organizational structures, and the challenges of relating hierarchical
versus network organizations. He also offers some observations about how
open source techniques and attitudes can be applied directly in other
fields such as family practice medicine and genomics.

Recommendations

Weber covers a lot of material, and his writing is clear, for the most
part (especially for an academic :-). I enjoyed reading the first seven
chapters, but got a little bogged down in chapter eight. I was
disappointed at his reluctance to draw more definite conclusions in a
few cases, but by remaining neutral he was able to focus on framing
several thought-provoking social and economic questions about the open
source movement.

Site outage, should be OK now

I spent a good portion of yesterday swapping the main drive out of my
home server because it was giving me signs that it was starting to fail
(random system halts, CRC errors, and undecipherable messages about DMA
support). After much wailing and gnashing of teeth, it seems to be
back online.

The serendipity of finding this Coding Horror post in my news
reader this morning was surprising. I’d long ago given up on upgrades
for similar reasons. I don’t like using computers, I like
programming them. Upgrading and reconfiguring mail, apache, etc. is no
fun and usually just serves to make me grumpy.

Unexpectedly popular: svnbackup

codehosting now supports feedburner

I just posted a new version of my codehosting project for django
which supports passing the Atom feeds for release updates through
feedburner.com. There isn’t anything tying the implementation to
FeedBurner, of course, but since that’s why I wanted the feature that’s
how I am describing it.

One tricky bit was that I wanted all of the existing subscribers to my
feed(s) to be redirected to the FeedBurner URL. I couldn’t just add a
redirect rule in Apache, since not all of the feeds are set up with
FeedBurner yet. So I opted for letting the django code handle the
redirection. If a project has an external_feed property that is not
null, that value is used as the URL for feeds for the project. So when
someone accesses the old URL for the codehosting release feed
(http://www.doughellmann.com/projects/feed/atom/codehosting/) they are
redirected to http://feeds.feedburner.com/DougHellmann-codehosting
instead. And FeedBurner looks at
http://www.doughellmann.com/projects/local_feed/atom/codehosting/,
which always produces the Atom content locally.

The “local_feed” URL is never included in any templates, so no web
crawlers should ever find it by themselves.
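
The redirect rule described above can be sketched in plain Python, without
any Django machinery. This is only an illustration of the decision, not the
actual codehosting code: the Project class, the function names, and the URL
path helper are hypothetical stand-ins, and only the external_feed property
and the two URL patterns come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Project:
    """Hypothetical stand-in for the codehosting project model."""
    name: str
    # Assumption: None means the project has no externally hosted feed.
    external_feed: Optional[str] = None


def feed_redirect_target(project: Project) -> Optional[str]:
    """Return the URL the public feed view should redirect to.

    None means there is nothing to redirect to, so the view should
    produce the Atom content itself.
    """
    if project.external_feed is not None:
        return project.external_feed
    return None


def local_feed_path(project: Project) -> str:
    """Path that always serves the Atom content locally.

    This is the address the external service reads from, so it must
    never redirect back to the external feed.
    """
    return f"/projects/local_feed/atom/{project.name}/"


if __name__ == "__main__":
    hosted = Project(
        "codehosting",
        "http://feeds.feedburner.com/DougHellmann-codehosting",
    )
    print(feed_redirect_target(hosted))
    print(feed_redirect_target(Project("unmigrated")))
    print(local_feed_path(hosted))
```

Keeping the decision in one small function means projects can be moved to
the external service one at a time, just by filling in external_feed.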

This is one of those cases where I wish I had thought to include this
feature from the beginning, since migrating the existing readers of the
feed(s) required this hackish change. But it looks like it is working. I
would be interested in any feedback anyone else might have on other ways
I could have handled the redirects.

Coder’s Block

Logan Koester posted some tips for overcoming Coder’s Block.

I get blocked, once in a while, too. In those situations, it almost
always comes with the feeling that the problem I am trying to solve is
too big. That, in turn, usually stems from not having thought about the
problem enough, rather than the other way around.

The development staff at my company is pretty small, so we are all
involved in each new feature from “front to back”, as it were. I like to
start by thinking about the user interaction aspect of the problem. It
doesn’t make sense to start with the back-end design until you know what
the front-end is supposed to do, right? So I think about what operations
the user needs to perform, then what inputs are needed to handle them.
From there I can work out how many of those inputs should be stored for
re-use.

I like to draw diagrams, since I find they are easier to re-assimilate
when I come back to a problem after some time. So I may sketch out a few
UI screens, or draw a few boxes and arrows to understand the
relationships between objects (I use a sort of pidgin UML for that). I
also make lists of attributes I might need for classes, since those map
to the database schema.

There are plenty of good tools for making such sketches on the
computer, but I guess I’m Old School. I find that sitting down with a
pen and paper, away from the computer, helps clarify my thoughts. Since
I don’t have my text editor, the temptation to write code is reduced and
I can concentrate on the big picture. And once I have the big picture
worked out, the way forward is usually clear.

Python Cheese Shop

It has been a while since I released a new open source project. The
last time I dealt with the Python project registry it required a highly
manual through-the-web registration process. The Cheese Shop is so
much nicer, and the integration with distutils makes it so easy to
register a project and release that there is no reason in the world not
to do it. There are just a few basic steps to getting started:

  1. Create a user at http://cheeseshop.python.org/pypi by clicking on the
    “Register” link and following the instructions.
  2. Create a setup.py file for your Python project. You’re doing this
    already, aren’t you, so your users can install your app or library
    with distutils?
  3. Type: python setup.py register

The CheeseShopTutorial has more details, but once you’ve registered
it really is just that simple. It turns out they will even host
downloads of the source releases, if you want. I don’t mind hosting my
own releases, and they will only host Python (so none of my AppleScript
projects could go there). But that’s a nice commitment on their part.