Last December I spoke at eNovance’s OpenStack in Action
conference about the relationship between OpenStack and the broader
Python community. This essay is based on that presentation.
Last autumn, some of the statistics associated with the Havana release
started me thinking even more deeply about this topic. Three of those
numbers, in particular, caught my eye.
First, our line count: more than 1.3 million lines.
We all know that “lines” is not a good measure of quality or feature
completeness, but as a sheer measure of volume that number is
impressive. When I saw 1.3 million, my first thought was, “That is a
lot of code.”
My second thought was, “That is a lot of code. Why do we have so
much code?”
That question led me to start looking at our project from another
perspective. I wondered how healthy our relationship is with the
rest of the Python community. How much do we rely on them, and how
much are we giving back? To answer that, I started by looking at what
we use from other developers.
At the time, I found around 120 dependencies listed in the global
requirements list, though that number changes frequently. We rely
on the Python community quite a bit to provide useful libraries, and
this second number led me to wonder how far out of balance we are.
How many projects depend on code produced by OpenStack, aside
from our own client libraries for talking to OpenStack services? How
many people can use our code without using all of OpenStack? I was
able to identify around five libraries, depending on how you count them.
This third number is not as high as I would expect it to be, given the
amount of code we have.
Now there are a lot of ways to reuse code, and obviously we are
releasing all of OpenStack’s source in a way that it can be used
directly under the Apache license. But not everyone wants to run their
own cloud. We also release stand-alone tools, especially from the
infrastructure team.
But 1.3 million lines in the core of OpenStack is a lot of code. How
different would our designs be if we looked beyond our immediate
requirements as application developers and thought about solutions
more generally? How much more code could be released to stand on its
own that way?
So how do we do that?
Releasing Code
A lot of the work is happening on the Oslo team, which was created
specifically to address code reuse in OpenStack. Every program within
OpenStack has a mission statement, and ours is:
To produce a set of python libraries containing code shared by
OpenStack projects. The APIs provided by these libraries should be
high quality, stable, consistent, documented and generally
applicable.
Notice that it does not say the libraries are to be used only by
OpenStack. So, while we focus on code meant to be reused within
OpenStack, we are not limited to that.
When I review code going into Oslo, besides reviewing the
implementation, I ask myself a few meta questions about the code,
starting with looking at how general it is. As with any application,
we will have a lot of code that is only useful to OpenStack
projects. That’s natural. However, where there is a chance of making
something more widely reusable, we should be taking that
approach. Designing for reuse improves our code and our relationships
with other developers. So, I always try to determine whether the new
module is tied to OpenStack, or if it can be reused outside of our
project. Being tied to OpenStack is not necessarily a bad thing, but
it is a factor for deciding how to handle the code as it evolves.
One of the decisions we have to make for Oslo code as it matures is
how to brand the release. Naming things well is difficult, and it’s
important to do it properly. A poorly chosen name can have unintended
consequences. In the past, I have seen other large projects decide to
brand every library they release with the project name, and use that
name as a prefix for the packages when installing them. There are some
technical benefits to working this way, but unfortunately it leaves
the impression that all of those libraries are co-dependent, and must
be used together in order to be useful at all. I want us to avoid
this problem with OpenStack, and especially Oslo.
I had conversations with several groups of developers at the Icehouse
summit in the autumn of 2013 about bringing new libraries into Oslo.
In a few cases I encouraged them to think about managing the code as a
stand-alone package, without dependencies on OpenStack code at all.
And that leads me to the next point to consider: whether the module
requires incubation.
The Oslo incubator consists mostly of modules copied out of another
OpenStack project and then modified to make them more general. We
take this approach to make adopting the changes easier, since any API
changes that break the old version can be merged when the project is
ready, rather than when the code is released. New code can address
reusability concerns from the beginning, though, so when we start a new
library we have also been creating a new git repository and working on
the library there, without going through the incubator.
This is no different, conceptually, than if a developer went off and
created a library on their own, and then used it in OpenStack.
Practically, it does usually mean the library is created using the
same development tools, and we have to pay some more attention to API
design up front.
Upstream, Downstream
Finally, I consider whether the code should exist at all. That may be
surprising, since I have already mentioned pushing to release more of
our code without depending on OpenStack, but I also want us to be
thinking about whether we should be writing some of that code at all.
The long list of dependencies we already have is a small fraction of
the libraries available for Python developers. By its nature, much of
the code that makes it into Oslo is general purpose, and as a result
some of it may replicate libraries that already exist.
The general rule for Oslo is to only incorporate code if it is used in
two or more OpenStack projects. The reason for that guideline is to
increase the likelihood that the code is more generally useful than
something that is project-specific.
It is even better to consider whether we can use or adapt an existing
library or tool, instead of creating a new one ourselves. The more
existing code we can use, the more interesting new problems we can
work on.
So when I say I think about whether the code should exist, what I mean
is that I check if there is some other group of developers upstream we
can work with. Sharing code is a basic tenet of open source. Sharing
is how open source works, and it works better if we collaborate
and share effort as well as code. Every bug fix, feature, and API
improvement we can push upstream benefits the wider Python community,
from whom we are borrowing code. Even something as simple as adding
test coverage to a project helps, since that tool then becomes more
reliable.
Contributing to projects other than OpenStack also spreads the idea
that the OpenStack community is ready to collaborate with developers
on other projects. And using those more general components encourages
us to create designs that are less tightly coupled, because we end up
working on and with components that are not directly tied to
OpenStack.
One important way to contribute is to work on porting our dependencies
to Python 3. There is already work under way to port OpenStack to
Python 3, starting with the client and Oslo libraries, and moving to
the services as our dependencies catch up. We are currently enforcing
a rule that we cannot accept any new dependencies in the requirements
project unless they run under at least Python 3.3. But we do still have
a few existing dependencies that are not yet compatible with Python 3.
Another way to help upstream projects is to talk to them about moving
their code onto our code management and testing infrastructure.
Stackforge is not only a place for pre-incubated OpenStack
projects. We also host a few other projects that we rely on. In
extreme cases, we have even adopted abandoned projects and taken over
development.
For example, we use sqlalchemy-migrate, but the author has stopped
supporting it in favor of alembic. We still needed some bugs fixed
before we could move over to the new tool, so sqlalchemy-migrate was
adopted as a stackforge project.
A less extreme but equally important story is WSME, one of the tools
we are using to slowly replace our home-grown REST API validation
framework. Moving WSME from Bitbucket to stackforge increased the
contributor pool from a handful of developers to around 13. We have
fixed bugs and added features, using the tools we are familiar with
for working on OpenStack.
We also moved the Pecan web framework from GitHub to stackforge, and
are now gating changes to Pecan on OpenStack, so Pecan releases cannot
break the OpenStack services that use it. That means we have an
upstream dependency whose developers have agreed to test their code
changes against our tree.
Imagine if all of Open Source collaborated that closely.
Beyond Code
Code is easy to measure, but it isn’t everything. What sorts of
things would we be doing if we were not so focused on building
OpenStack? Attending events in person? Talking about our work with
other developers? Writing about it online?
These are the usual sorts of open source community interactions, and
we can use them as a way to ensure we are not just talking to each
other in an echo chamber and missing things going on in the community
around us.
Take “writing online” as an example. I’m sure everyone is aware of
Planet OpenStack, the server for aggregating blogs related to
OpenStack. Planet Python is a similar server set up for more general
Python blogs. If you have a blog where you talk about working with Python,
consider adding it to the list of feeds there (the instructions are in
the left sidebar of the site, and I can help if you need it).
Meetups are an easy way to engage with the community in person. The
map above shows some of the many OpenStack meetups around the
world. There is almost certainly a Python meetup in each of these
towns, and a similar map of Python meetups would show many, many more
dots.
Presumably OpenStack developers are also interested in Python, since
it is our primary language. Attending meetups is an excellent way to
learn about new tools and techniques, and generally improve your
skills. Meetups are also an excellent way to share your knowledge
with other developers, and talk about work that you have done on
OpenStack.
It’s difficult to track cross-over participation between OpenStack and
Python meetups, although anecdotally I know of several cities
where the two groups have common members, including Atlanta and Los
Angeles. I had better luck finding references to OpenStack at
conferences, since the schedules are published.
Our summits are exciting and an important aspect of our
community. Conferences are a good way to meet mid-cycle. But how do we
connect with people other than our existing contributors? I found
quite a few OpenStack-themed talks from recent conferences (2011-2013)
that are not organized around OpenStack or cloud development. Some of
the conferences are quite large – PyCon US 2013 had around 3000
participants.
All of these conference talks are good for our exposure. As hard as it
is to believe, especially given the number of contributors we have,
most of the Python community has no idea at all what OpenStack is. I
talked with dozens of the PyCon US attendees last year who did not
know that it existed at all. That is a large untapped pool of talent
for all of us who are looking for developers. But it also represents
the authors of many of those dependencies we are already using. How
can we connect with them?
As with meetups, it is harder to track how many speakers at these
conferences are OpenStack contributors who gave talks not directly
about OpenStack, and those talks are the best way for us to interact
with the community. Talking about what OpenStack is engages listeners
less than talking about problems we have solved while building
OpenStack. So, what sorts of talks should we propose?
OpenStack is a heavily distributed application and relies on
synchronizing and scaling services. That’s not easy, and I’m sure
other developers would be interested to learn about some of the
problems we have encountered distributing parts of a task across
multiple services. The scheduler or the interaction between nova and
neutron, for example, would make good case studies.
We could also talk about working with a data model that is eventually
consistent, especially in a problem domain where strict consistency is
the normal approach. Caching, distributing data around the system,
and managing multi-part transactions are all techniques that can be
used in any large application. We are working on those problems, but
so are many other Python community members.
A lot of people would be interested in the issues we have encountered
with concurrency. We could talk about the way async I/O libraries
perform compared to threads or multiple processes in extremely
large-scale deployments of applications, especially the practical aspects
like how those libraries change interactions with the database.
Then there are some of the social issues we have faced on such a large
project. For example, how we scaled up our release management
processes and code reviews – the infra and qa teams do a good job of
preparing talks on some of those topics.
These are just a few ideas. Based on my past experience on the PyCon
US program committee, I think there are a lot of areas of OpenStack we
could mine for topics of interest to developers building systems other
than OpenStack. Sharing our experiences and knowledge would be a
valuable contribution and a valuable recruiting tool.
Python Software Foundation
Another way we can interact with and give back to other Python
developers is through the Python Software Foundation (PSF).
Just as the OpenStack Foundation is responsible for supporting
OpenStack development, the Python Software Foundation is responsible
for supporting Python development and adoption. The PSF manages the
PyCon event in North America, and drives funding for some other
community events and projects, as well as protecting the intellectual
property of Python.
Before this year, new members were invited to join the PSF by other
existing members. However, the bylaws are being changed right now,
based in part on the open structure of the OpenStack Foundation, to
allow anyone to join. That’s an important demonstration of the sort of
impact we can have on the community through the example we set.
What Did I Miss?
I would love to hear your feedback and ideas about other ways we can
be increasing our interaction with developers outside of OpenStack, to
learn from them, support them, or recruit them.
I would also love to hear about efforts you might already be making to
do this, or other groups you are involved with where you are sharing
the knowledge you picked up working on OpenStack.
Updates
2014-02-09 – Nick Coghlan pointed out that PEP 462 “Core
development workflow automation for CPython” was proposed based on a
discussion he had with some members of the OpenStack infrastructure
team at LCA2014. If adopted, the PEP would result in the CPython
development team adopting some of the automation tools currently used
by OpenStack developers, especially Zuul for managing automated
continuous integration testing and “gating” commits to the master
branch of each project.