Glyph’s post about a “kernel python” from the 13th based on Amber’s presentation at PyCon made me start thinking about how minimal standard library could really be. Christian had previously started by nibbling around the edges, considering which modules are not frequently used, and could be removed. I started thinking about a more extreme change, of leaving in only enough code to successfully download and install other packages. The ensurepip module seemed like a necessary component for that, so I looked at its dependencies, with an eye to cutting everything else.
I experimented with a few different ways to find the dependencies between modules. First, I looked at the contents of sys.modules after importing something. That showed that there were a lot of dependencies, but not which modules depended directly on which others.
The output of python -v -c 'import ensurepip' was similarly general, showing the imports as they happened but not in a way that made establishing the relationships between modules obvious.
Next, I looked at a few packages that already exist for doing this sort of analysis. modulegraph seems to assume that you are starting with packages released to PyPI, since it looks for setup.py and other setuptools-related packaging data. pydeps looked promising, but I couldn’t make it produce any output for the standard library. snakefood uses the abstract syntax tree, but does not work under python 3.
Eventually I wrote a little code that uses the abstract syntax tree of each module to find import and from import statements. With that information, I was able to build a digraph with the module relationships.
I found one special case, where the heapq module imports doctest as part of its __main__ block, but does not use it at runtime. I decided that it would be safe to remove that use in heapq, or to at least protect it with a useful error message if the import fails, so I filtered out the use doctest, which also dropped a significant number of its dependencies.
I also treated __main__ as a special case, and ignored it completely.
As you might imagine, the standard library modules rely heavily on one another, so a full dependency graph is quite messy.
To make the output more readable, I added a command line switch to produce a “simplified” graph, where a module is only referenced the first time it is imported by any other module. By traversing the graph breadth-first, I was able to build a readable tree that emphasizes the dependencies of modules “higher up the stack” (closer to ensurepip).
This produced a list of 72 dependencies for ensurepip:
Some are C modules, and others are only used in some cases and may not need to be part of a “kernel” distribution.
This is really just a start, to satisfy my own curiosity about the idea of a minimal standard library. Anyone truly setting out to reduce the size of the standard library would need to take this analysis further, to ensure that all of the imports found with the scanner are relevant (e.g., doctest was not) and to understand whether some of the code being imported could be moved from its current home to break the dependency on a larger module if only a const or small function is being used. And of course, we may not want to go to extreme of removing everything other than the modules used by ensurepip. There may be modules needed by the interactive interpreter, or other use cases, that should similarly be included in any “kernel” distribution.