The Intersection of the OpenStack and Python Communities

Last December I spoke at eNovance’s OpenStack in Action conference about the relationship between OpenStack and the broader Python community. This essay is based on that presentation.

I have been an open source developer since the early 1990s and a Python developer since the late 1990s. I became an OpenStack contributor a couple of years ago, right before the Folsom summit. One of the first areas where I started contributing was the Oslo team, in part because of my interest in how OpenStack contributors and other Python community members cross over and interact, and more generally in how OpenStack fits into the rest of the Python ecosystem.

Last autumn, some of the statistics associated with the Havana release started me thinking even more deeply about this topic. Three of those numbers, in particular, caught my eye.

First, our line count: more than 1.3 million lines.

We all know that “lines” is not a good measure of quality or feature completeness, but as a sheer measure of volume that number is impressive. When I saw 1.3 million, my first thought was, “That is a lot of code.”

My second thought was, “That is a lot of code. Why do we have so much code?”

That question led me to start looking at our project from another perspective. I wondered how healthy our relationship is with the rest of the Python community. How much do we rely on them, and how much are we giving back? To answer that, I started by looking at what we use from other developers.

The second number is our dependency count. At the time, I found around 120 dependencies listed in the global requirements list, though that number changes frequently. We rely on the Python community quite a bit to provide useful libraries, and this number led me to wonder how far out of balance the relationship is.
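Counting entries in a requirements file by hand is error-prone, because project names are case-insensitive and the same project can appear under slightly different spellings. The sketch below shows one way to count distinct project names in a pip-style requirements file. It assumes simple `name<specifier>` lines and skips option lines such as `-e` or `-r`. The sample entries and version pins are made up for illustration and are not copied from the real global requirements list.

```python
import re

# Hypothetical sample in pip requirements-file style; the entries and
# version pins are illustrative, not the real global requirements list.
SAMPLE = """\
# Comments and blank lines are ignored.
pbr>=0.6,<1.0
SQLAlchemy>=0.7.8,<=0.7.99
eventlet>=0.13.0

six>=1.4.1
"""

# A project name is the leading run of letters, digits, '.', '_' or '-'.
# Lines starting with '-' (options such as -e or -r) never match.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name):
    """Normalize a project name using the PEP 503 rule."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(text):
    """Return the sorted, de-duplicated project names found in *text*."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        match = _NAME.match(line)
        if match:
            names.add(normalize(match.group(1)))
    return sorted(names)


if __name__ == "__main__":
    names = requirement_names(SAMPLE)
    print(len(names), names)
```

Normalizing names before de-duplicating means that `Foo_Bar` and `foo-bar` are counted once, which matters when a list is maintained by many people.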

How many projects depend on code produced by OpenStack, aside from our own client libraries for talking to OpenStack services? How many people can use our code without using all of OpenStack? I was able to identify around five libraries, depending on how you count them. That third number is lower than I would expect, given the amount of code we have.

Of course, there are many ways to reuse code, and we release all of OpenStack’s source under the Apache license, so it can be used directly. But not everyone wants to run their own cloud. We also release stand-alone tools, especially from the infrastructure team.

But 1.3 million lines in the core of OpenStack is a lot of code. I wonder how different our designs would be if we looked beyond our immediate requirements as application developers and thought about solutions more generally. How much more code could be released to stand on its own that way?

So how do we do that?

Releasing Code

Much of this work is happening on the Oslo team, which was created specifically to address code reuse in OpenStack. Every program within OpenStack has a mission statement, and ours is:

To produce a set of python libraries containing code shared by OpenStack projects. The APIs provided by these libraries should be high quality, stable, consistent, documented and generally applicable.

Notice that it does not say the libraries are to be used only by OpenStack. So, while we focus on code meant to be reused within OpenStack, we are not limited to that.

When I review code going into Oslo, besides reviewing the implementation I ask myself a few meta questions about the code, starting with looking at how general it is. As with any application, we will have a lot of code that is only useful to OpenStack projects. That’s natural. However, where there is a chance of making something more widely reusable, we should be taking that approach. Designing for reuse improves our code and our relationships with other developers. So, I always try to determine whether the new module is tied to OpenStack, or if it can be reused outside of our project. Being tied to OpenStack is not necessarily a bad thing, but it is a factor for deciding how to handle the code as it evolves.

One of the decisions we have to make for Oslo code as it matures is how to brand the release. Naming things well is difficult, and it’s important to do it properly. A poorly chosen name can have unintended consequences. In the past, I have seen other large projects decide to brand every library they release with the project name, and use that name as a prefix for the packages when installing them. There are some technical benefits to working this way, but unfortunately it leaves the impression that all of those libraries are co-dependent, and must be used together in order to be useful at all. I want us to avoid this problem with OpenStack, and especially Oslo.

I had conversations with several groups of developers at the Icehouse summit in the autumn of 2013 about bringing new libraries into Oslo. In a few cases I encouraged them to think about managing the code as a stand-alone package, without dependencies on OpenStack code at all. And that leads me to the next point to consider: whether the module requires incubation.

The Oslo incubator contains mostly modules copied out of another OpenStack project and then modified to make them more general. We take this approach to make adopting the changes easier, since any API changes that break the old version can be merged when the project is ready, rather than when the code is released. New code can address reusability concerns from the beginning, though, so for work on a new library we have also been taking an approach of creating a new git repository and working on the library there, without going through the incubator.

This is no different, conceptually, than if a developer went off and created a library on their own, and then used it in OpenStack. In practice, it usually means the library is built with the same development tools we use for OpenStack, and that we have to pay more attention to API design up front.

Upstream, Downstream

Finally, I consider whether the code should exist at all. That may be surprising, since I have just argued for releasing more of our code without depending on OpenStack, but I also want us to think about whether we should be writing some of that code at all.

The long list of dependencies we already have is a small fraction of the libraries available for Python developers. By its nature, much of the code that makes it into Oslo is general purpose, and as a result some of it may replicate libraries that already exist.

The general rule for Oslo is to only incorporate code if it is used in two or more OpenStack projects. The reason for that guideline is to increase the likelihood that the code is more generally useful than something project-specific.
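The two-project guideline is simple enough to state as code. In this hypothetical sketch, `usage` maps each project to the shared helper modules it uses, and `oslo_candidates` keeps only the modules that at least two distinct projects depend on. The project-to-module data, including `nova_only_helper`, is invented for illustration. The sketch illustrates the guideline and is not a tool the Oslo team actually runs.

```python
from collections import defaultdict


def oslo_candidates(usage, minimum=2):
    """Return modules used by at least *minimum* distinct projects.

    *usage* maps a project name to the set of shared-module names it uses.
    """
    users = defaultdict(set)  # module name -> set of projects using it
    for project, modules in usage.items():
        for module in modules:
            users[module].add(project)
    return sorted(m for m, projects in users.items() if len(projects) >= minimum)


# Hypothetical usage data, for illustration only.
usage = {
    "nova": {"timeutils", "jsonutils", "nova_only_helper"},
    "glance": {"timeutils", "jsonutils"},
    "cinder": {"timeutils"},
}

if __name__ == "__main__":
    print(oslo_candidates(usage))  # ['jsonutils', 'timeutils']
```

A module used by only one project, such as `nova_only_helper` here, stays in that project until a second consumer appears.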


It is even better to consider whether we can use or adapt an existing library or tool, instead of creating a new one ourselves. The more existing code we can use, the more interesting new problems we can work on.

So when I say I think about whether the code should exist, what I mean is that I check whether there is some other group of developers upstream we can work with. Sharing code is a basic tenet of open source, and open source works better when we collaborate and share effort as well as code. Every bug fix, feature, and API improvement we can push upstream benefits the wider Python community, from whom we are borrowing code. Even something as simple as adding test coverage to a project helps, since that tool then becomes more reliable.

Contributing to projects other than OpenStack also spreads the idea that the OpenStack community is ready to collaborate with developers on other projects. And using those more general components encourages us to create designs that are less tightly coupled, because we end up working on and with components that are not directly tied to OpenStack.

One important way to contribute is to work on porting our dependencies to Python 3. There is already work under way to port OpenStack to Python 3, starting with the client and Oslo libraries, and moving to the services as our dependencies catch up. We are currently enforcing a rule that the requirements project cannot accept any new dependency unless it runs under at least Python 3.3. But we do have a few existing dependencies that are not yet compatible with Python 3.
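One rough way to screen a candidate dependency for Python 3.3 support is to look at the trove classifiers it declares in its package metadata. The sketch below is a hypothetical heuristic and is not the check the requirements project actually performs. It only reads declared classifiers, which can be missing or out of date, so a real review would also run the library's tests under Python 3.

```python
import re

# Matches classifiers such as "Programming Language :: Python :: 3.3".
# A bare "Python :: 3" has no minor version, so it is treated as inconclusive.
_PY3_MINOR = re.compile(r"^Programming Language :: Python :: 3\.(\d+)$")


def declares_py33_or_later(classifiers):
    """Return True if any classifier declares Python 3.3 or a later 3.x."""
    for classifier in classifiers:
        match = _PY3_MINOR.match(classifier.strip())
        if match and int(match.group(1)) >= 3:
            return True
    return False


if __name__ == "__main__":
    candidate = [
        "Programming Language :: Python :: 2.7",
        "Programming Language :: Python :: 3.3",
    ]
    legacy = ["Programming Language :: Python :: 2.7"]
    print(declares_py33_or_later(candidate), declares_py33_or_later(legacy))
```

Comparing the minor version as an integer, rather than as a string, keeps a later release such as 3.10 from sorting below 3.3.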

Another way to help upstream projects is to talk to them about moving their code onto our code management and testing infrastructure. Stackforge is not only a place for pre-incubated OpenStack projects. We also host a few other projects that we rely on. In extreme cases, we have even adopted abandoned projects and taken over development.

For example, we use sqlalchemy-migrate, but the author has stopped supporting it in favor of alembic. Because we could not move to the new tool right away, we needed to fix some bugs, so sqlalchemy-migrate was adopted as a stackforge project.

A less extreme but equally important story is WSME, one of the tools we are using to slowly replace our home-grown REST API validation framework. Moving WSME from bitbucket to stackforge increased the contributor pool from a handful of developers to around 13. We have fixed bugs and added features, using the tools we are familiar with for working on OpenStack.

We also moved the Pecan web framework from github to stackforge, and are now gating changes to Pecan on OpenStack, so Pecan releases cannot break the OpenStack services that use it. That means we have an upstream project that has agreed to test its code changes against our tree.

Imagine if all of Open Source collaborated that closely.

Beyond Code

Code is easy to measure, but it isn’t everything. What sorts of things would we be doing if we were not so focused on building OpenStack? Attending events in person? Talking about our work with other developers? Writing about it online?

These are the usual sorts of open source community interactions, and we can use them to make sure we are not just talking to each other in an echo chamber and missing things going on in the community around us.

Take “writing online” as an example. I’m sure everyone is aware of the Planet server for aggregating blogs related to OpenStack. There is another similar server set up for more general Python blogs. If you have a blog where you talk about working with Python, consider adding it to the list of feeds there (the instructions are in the left sidebar of the site, and I can help if you need it).

[Map: OpenStack Meetups World-wide]

Meetups are an easy way to engage with the community in person. The map above shows some of the many OpenStack meetups around the world. There is almost certainly a Python meetup in each of these towns, and a similar map of Python meetups would show many more dots.

Presumably OpenStack developers are also interested in Python, since it is our primary language. Attending meetups is an excellent way to learn about new tools and techniques, and generally improve your skills. Meetups are also an excellent way to share your knowledge with other developers, and talk about work that you have done on OpenStack.

It’s difficult to track cross-over participation between OpenStack and Python meetups, although anecdotally I know of several cities where the two groups have common members, including Atlanta and Los Angeles. I had better luck finding references to OpenStack at conferences, since the schedules are published.

Our summits are exciting and an important aspect of our community, and conferences are also a good way to meet mid-cycle. But how do we connect with people other than our existing contributors? I found quite a few OpenStack-themed talks from recent conferences (2011-2013) that are not organized around OpenStack or cloud development. Some of those conferences are quite large: PyCon US 2013 had around 3000 participants.

All of these conference talks are good for our exposure. As hard as it is to believe, especially given the number of contributors we have, most of the Python community has no idea at all what OpenStack is. I talked with dozens of the PyCon US attendees last year who did not know that it existed at all. That is a large untapped pool of talent for all of us who are looking for developers. But it also represents the authors of many of those dependencies we are already using. How can we connect with them?

As with meetups, it is harder to track how many speakers at these conferences are OpenStack contributors who did not talk about OpenStack directly, even though that kind of talk is the best way for us to interact with the community. Talking about what OpenStack is engages people less than talking about problems we have solved while building OpenStack. So, what sorts of talks should we propose?

OpenStack is a heavily distributed application and relies on synchronizing and scaling services. That’s not easy, and I’m sure other developers would be interested to learn about some of the problems we have encountered distributing parts of a task across multiple services. The scheduler or the interaction between nova and neutron, for example, would make good case studies.

We could also talk about working with a data model that is eventually consistent, especially in a problem domain where strict consistency is the normal approach. Caching, distributing data around the system, and managing multi-part transactions are all techniques that can be used in any large application. We are working on those problems, but so are many other Python community members.

A lot of people would be interested in the issues we have encountered with concurrency. We could talk about the way async I/O libraries perform compared to threads or multiple processes in extremely large scale deployments of applications, especially the practical aspects like how those libraries change interactions with the database.

Then there are some of the social issues we have faced on such a large project. For example, we could describe how we scaled up our release management and code review processes; the infra and QA teams do a good job of preparing talks on some of those topics.

These are just a few ideas. Based on my past experience on the PyCon US program committee, I think there are a lot of areas of OpenStack we could mine for topics of interest to developers building systems other than OpenStack. Sharing our experiences and knowledge would be a valuable contribution and a valuable recruiting tool.

Python Software Foundation

Another way we can interact with and give back to other Python developers is through the Python Software Foundation (PSF).

Just as the OpenStack Foundation is responsible for supporting OpenStack development, the Python Software Foundation is responsible for supporting Python development and adoption. The PSF manages the PyCon event in North America, drives funding for some other community events and projects, and protects the intellectual property of Python.

Before this year, new members were invited to join the PSF by other existing members. However, the bylaws are being changed right now, based in part on the open structure of the OpenStack Foundation, to allow anyone to join. That’s an important demonstration of the sort of impact we can have on the community through the example we set.

What Did I Miss?

I would love to hear your feedback and ideas about other ways we can be increasing our interaction with developers outside of OpenStack, to learn from them, support them, or recruit them.

I would also love to hear about efforts you might already be making to do this, or other groups you are involved with where you are sharing the knowledge you picked up working on OpenStack.

Updates

2014-02-09 - Nick Coghlan pointed out that PEP 462 “Core development workflow automation for CPython” was proposed based on a discussion he had with some members of the OpenStack infrastructure team at LCA2014. If adopted, the PEP would result in the CPython development team adopting some of the automation tools currently used by OpenStack developers, especially Zuul for managing automated continuous integration testing and “gating” commits to the master branch of each project.