Determining the Name of a Process from Python

Finding the name of the program from which a Python module is running can be trickier than it would seem at first, and investigating the reasons led to some interesting experiments.

A couple of weeks ago at the OpenStack Folsom Summit, Mark McClain pointed out an interesting code snippet he had discovered in the Nova sources:

nova/utils.py

script_dir = os.path.dirname(inspect.stack()[-1][1])

The code is part of the logic to find a configuration file that lives in a directory relative to where the application startup script is located. It looks at the call stack to find the main program, and picks the filename out of the stack details.

The code seems to be taken from a response to a StackOverflow question, and when I saw it I thought it looked like a case of someone going to more trouble than was needed to get the information. Mark had a similar reaction, and asked if I knew of a simpler way to determine the program name.

Similar examples with inspect.stack() appear in four places in the Nova source code (at least as of today). All of them are either building filenames relative to the location of the original “main” program, or are returning the name of that program to be used to build a path to another file (such as a log file or other program). Those are all good reasons to be careful about the location and name of the main program, but none explain why the obvious solution isn’t good enough. I assumed that if the OpenStack developers were looking at stack frames there must have been a reason. I decided to examine the original code and spend a little time deciphering what it is doing, and especially to see if there were cases where it did not work as desired (so I could justify a patch).

The Stack

The call to inspect.stack() retrieves the Python interpreter stack frames for the current thread. The return value is a list with information about the calling function in position 0 and the “top” of the stack at the end of the list. Each item in the list is a tuple containing:

  • the stack frame data structure
  • the filename for the code being run in that frame
  • the line number within the file
  • the co_name member of the code object from the frame, giving the function or method name being executed
  • the source lines for that bit of code, when available
  • an index into the list of source lines showing the actual source line for the frame
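
As an aside for readers on Python 3.5 or later: there each entry is a named tuple (FrameInfo), so the same fields can be read by name as well as by position. The sketch below is written for Python 3, unlike the Python 2 listings in the rest of this post, and describe_caller() and outer() are names made up for the illustration.

```python
import inspect


def describe_caller():
    """Return (filename, line number, function name) for our caller."""
    # Index 0 is this function's own frame; index 1 is whoever called us.
    record = inspect.stack()[1]
    # Positional access (record[1], record[2], record[3]) still works too.
    return record.filename, record.lineno, record.function


def outer():
    return describe_caller()


filename, lineno, function = outer()
print(function)  # outer
```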

show_stack.py

import inspect


def show_stack():
    stack = inspect.stack()
    for s in stack:
        print 'filename:', s[1]
        print 'line    :', s[2]
        print 'co_name :', s[3]
        print


def inner():
    show_stack()


def outer():
    inner()


if __name__ == '__main__':
    outer()

The information is intended to be used for generating tracebacks or by tools like pdb when debugging an application (although pdb has its own implementation). To answer the question “Which program am I running in?” the filename is the most interesting piece of data.

$ python show_stack.py
filename: show_stack.py
line    : 6
co_name : show_stack

filename: show_stack.py
line    : 15
co_name : inner

filename: show_stack.py
line    : 19
co_name : outer

filename: show_stack.py
line    : 23
co_name : <module>

One obvious issue with these results is that the filename in the stack frame is relative to the startup directory of the application. It could lead to an incorrect path if the process has changed its working directory between startup and checking the stack. But there is another mode where looking at the top of the stack produces completely invalid results.

$ python -m show_stack
filename: /Users/dhellmann/.../show_stack.py
line    : 6
co_name : show_stack

filename: /Users/dhellmann/.../show_stack.py
line    : 15
co_name : inner

filename: /Users/dhellmann/.../show_stack.py
line    : 19
co_name : outer

filename: /Users/dhellmann/.../show_stack.py
line    : 23
co_name : <module>

filename: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py
line    : 72
co_name : _run_code

filename: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py
line    : 162
co_name : _run_module_as_main

The -m option to the interpreter triggers the runpy module, which takes the module name specified and executes it like a main program. As the stack printout above illustrates, runpy is then at the top of the stack, so the “main” part of our local module is several levels down from the top. That means the simple one-liner is not always going to produce the right results.
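
One way to stay with the stack and still cope with -m is to stop assuming that the last entry is the main program and instead look for the outermost frame whose module globals are named __main__. The following is only a sketch under that assumption, written for Python 3; main_script_from_stack() and its records parameter are hypothetical names, not anything found in Nova or the standard library.

```python
import inspect


def main_script_from_stack(records=None):
    """Return the filename of the outermost frame running as __main__.

    `records` defaults to inspect.stack(); it is a parameter only so the
    function can be exercised with hand-built data.
    """
    if records is None:
        records = inspect.stack()
    # Walk from the outermost frame inward, skipping launcher frames such
    # as runpy's, whose module globals are not named '__main__'.
    for record in reversed(records):
        frame, filename = record[0], record[1]
        if frame.f_globals.get('__name__') == '__main__':
            return filename
    # No frame claims to be the main program (for example, embedded use).
    return None
```

The returned filename is still whatever the frame recorded, so it can be relative in the same way as the output above.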

Why the Obvious Solution Fails

Now that I knew there were ways to get the wrong results by looking at the stack, the next question was whether there was another way to find the program name that was simpler, more efficient, and especially more correct. The simplest solution is to look at the command line arguments passed through sys.argv.

argv.py

import sys

print sys.argv[0]

Normally, the first element in sys.argv is the script that was run as the main program. The value always points to the same file, although the method of invoking it may cause the value to fluctuate between a relative and full path.

$ python argv.py
argv.py

$ ./argv.py
./argv.py

$ python -m argv
/Users/dhellmann/.../argv.py

As this example demonstrates, when a script is run directly or passed as an argument to the interpreter, sys.argv contains a relative path to the script file. Using -m we see the full path, so looking at the command line arguments is more robust for that case. However, we cannot depend on -m being used so we aren’t guaranteed to get the extra details.
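
If a full path is needed regardless of how the program was started, one option is to resolve sys.argv[0] against the working directory as it was at startup, before anything calls os.chdir(). This is a minimal Python 3 sketch of that idea; program_path() and _STARTUP_DIR are illustrative names, and it assumes the module containing it is imported early in the life of the process.

```python
import os
import sys

# Captured at import time, before anything has a chance to call os.chdir().
_STARTUP_DIR = os.getcwd()


def program_path(argv0=None, startup_dir=_STARTUP_DIR):
    """Return a normalized full path for the program name in argv0."""
    if argv0 is None:
        argv0 = sys.argv[0]
    # os.path.join() discards startup_dir when argv0 is already absolute,
    # so both the relative and the full-path forms are handled.
    return os.path.normpath(os.path.join(startup_dir, argv0))
```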

Using import

The next alternative I considered was probing the main program module myself. Every module has a special property, __file__, which holds the path to the file from which the module was loaded. To access the main program module from within Python, you import a specially named module __main__. To test this method, I created a main program that loads another module:

import_main_app.py

import import_main_module

import_main_module.main()

And the second module imports __main__ and prints the file it was loaded from.

import_main.py

import __main__

print __main__.__file__

Looking at the __main__ module always pointed to the actual main program module, but it did not always produce a full path. This makes sense, because the filename for a module that goes into the stack frame comes from the module itself.

$ python import_main.py
import_main.py

$ python -m import_main
/Users/dhellmann/.../import_main.py
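
Wrapped up as a helper, the __main__ approach might look like the sketch below. It is written for Python 3 and makes two assumptions worth stating: the name main_program_path() is invented here, and the function is called before the process changes its working directory, since os.path.abspath() resolves a relative name against the current one. The getattr() guard is there because __main__ has no __file__ in an interactive session.

```python
import os
import sys


def main_program_path(main_module=None):
    """Return the full path of the main program, or None if unknown."""
    if main_module is None:
        # Same module object that "import __main__" would return.
        main_module = sys.modules['__main__']
    path = getattr(main_module, '__file__', None)
    if path is None:
        # Interactive interpreter, or some other case with no script file.
        return None
    return os.path.abspath(path)
```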

Wandering Down the Garden Path

After I found such a simple way to reliably retrieve the program name, I spent a while thinking about the motivation of the person who decided that looking at stack frames was the best solution. I came up with two hypotheses. First, it is entirely possible that they did not know about importing __main__. It isn’t the sort of thing one needs to do very often, and I don’t even remember where I learned about doing it (or why, because I’m pretty sure I’ve never used the feature in production code for any reason). That seems like the most plausible reason, but the other idea I had was that for some reason it was very important to have a relatively tamper-proof value – something that could not be overwritten accidentally. This new idea merited further investigation, so I worked back through the methods of accessing the program name to determine which, if any, met the new criteria.

I did not need to experiment with sys.argv to know it was mutable. The arguments are saved in a normal list object, and can be modified quite easily, as demonstrated here.

argv_modify.py

import sys

print 'Type  :', type(sys.argv)
print 'Before:', sys.argv

sys.argv[0] = 'wrong'

print 'After :', sys.argv

All normal list operations are supported, so replacing the program name is a simple assignment statement. Because sys.argv is a list, it is also susceptible to having values removed by pop(), remove(), or a slice assignment gone awry.

$ python argv_modify.py
Type  : <type 'list'>
Before: ['argv_modify.py']
After : ['wrong']
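
A program that only wants protection from accidental changes could take a snapshot of the arguments into a tuple as early as possible. This is a sketch, not a tamper-proof design: ORIGINAL_ARGV and original_program_name() are hypothetical names, the code is Python 3, and the module-level name can still be rebound by anyone determined to do so.

```python
import sys

# Taken once, as early as possible in the program's life. A tuple does
# not support item assignment, pop(), remove(), or slice assignment.
ORIGINAL_ARGV = tuple(sys.argv)


def original_program_name():
    """Return the program name as it was when the snapshot was taken."""
    return ORIGINAL_ARGV[0] if ORIGINAL_ARGV else None
```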

The __file__ attribute of a module is a string, which is not itself mutable, but the contents can be replaced by assigning a new value to the attribute.

import_modify.py

import __main__

print 'Before:', __main__.__file__

__main__.__file__ = 'wrong'

print 'After :', __main__.__file__

This is less likely to happen by accident, so it seems somewhat safer. Nonetheless, changing it is easy.

$ python import_modify.py
Before: import_modify.py
After : wrong

That leaves the stack frame.

Down the Rabbit Hole

As described above, the return value of inspect.stack() is a list of tuples. The list is computed each time the function is called, so it is unlikely that one part of a program would accidentally modify it. The key word there is accidentally, but even a malicious program would have to go to a bit of effort to return fake stack data.

stack_modify1.py

import inspect


def faux_stack():
    full_stack = orig_stack()
    top = full_stack[-1]
    top = (top[0], 'wrong') + top[2:]
    full_stack[-1] = top
    return full_stack
orig_stack = inspect.stack
inspect.stack = faux_stack


stack_data = inspect.stack()
script_name = stack_data[-1][1]
print 'From stack:', script_name
print 'From frame:', stack_data[-1][0].f_code.co_filename

The filename actually appears in two places in the data returned by inspect.stack(). The first location is in the tuple that is part of the list returned as the stack itself. The second is in the code object of the stack frame within that same tuple (frame.f_code.co_filename).

$ python stack_modify1.py
From stack: wrong
From frame: stack_modify1.py
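
The same replacement can be made temporary, which is handy when experimenting because the real function comes back afterwards. The sketch below assumes Python 3 and uses unittest.mock.patch.object() from the standard library; the names real_stack, patched_name, and restored_name exist only for this illustration.

```python
import inspect
from unittest import mock

real_stack = inspect.stack


def faux_stack(context=1):
    """Return the real stack with the filename of the last entry faked."""
    records = list(real_stack(context))
    top = records[-1]
    # Slicing a stack record yields a plain tuple, so the fake entry is
    # an ordinary tuple with the same positional layout.
    records[-1] = (top[0], 'wrong') + tuple(top[2:])
    return records


# inspect.stack is replaced only inside the with block.
with mock.patch.object(inspect, 'stack', faux_stack):
    patched_name = inspect.stack()[-1][1]

restored_name = inspect.stack()[-1][1]
print(patched_name)  # wrong
```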

Replacing the filename in the tuple was relatively easy, and would be sufficient for code that trusted the stack contents returned by inspect.stack(). It turned out to be more challenging to change the code object. For CPython, the code class is implemented in C as part of the set of objects used internally by the interpreter.

Objects/codeobject.c

static PyMemberDef code_memberlist[] = {
    {"co_argcount",     T_INT,          OFF(co_argcount),       READONLY},
    {"co_nlocals",      T_INT,          OFF(co_nlocals),        READONLY},
    {"co_stacksize",    T_INT,          OFF(co_stacksize),      READONLY},
    {"co_flags",        T_INT,          OFF(co_flags),          READONLY},
    {"co_code",         T_OBJECT,       OFF(co_code),           READONLY},
    {"co_consts",       T_OBJECT,       OFF(co_consts),         READONLY},
    {"co_names",        T_OBJECT,       OFF(co_names),          READONLY},
    {"co_varnames",     T_OBJECT,       OFF(co_varnames),       READONLY},
    {"co_freevars",     T_OBJECT,       OFF(co_freevars),       READONLY},
    {"co_cellvars",     T_OBJECT,       OFF(co_cellvars),       READONLY},
    {"co_filename",     T_OBJECT,       OFF(co_filename),       READONLY},
    {"co_name",         T_OBJECT,       OFF(co_name),           READONLY},
    {"co_firstlineno",  T_INT,          OFF(co_firstlineno),    READONLY},
    {"co_lnotab",       T_OBJECT,       OFF(co_lnotab),         READONLY},
    {NULL}      /* Sentinel */
};

The data members of a code object are all defined as READONLY, which means you cannot modify them from within Python code directly.

code_modify_fail.py

import inspect

stack_data = inspect.stack()
frame = stack_data[0][0]
code = frame.f_code

code.co_filename = 'wrong'

Attempting to change a read-only property causes a TypeError.

$ python code_modify_fail.py
Traceback (most recent call last):
  File "code_modify_fail.py", line 8, in <module>
    code.co_filename = 'wrong'
TypeError: readonly attribute
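
For comparison, the read-only behaviour is still present in current interpreters, but Python 3.8 and later add a replace() method on code objects that returns a modified copy. The sketch below assumes Python 3.8 or newer, so it would not run under the Python 2.7 used for the listings in this post; sample() is a throwaway function for the demonstration. Note that the copy is a separate object, and a frame that is already running keeps pointing at the original.

```python
def sample():
    return 1


original = sample.__code__

# Direct assignment is refused. The exception type is caught broadly
# here because it has differed between Python versions.
error_type = None
try:
    original.co_filename = 'wrong'
except (AttributeError, TypeError) as exc:
    error_type = type(exc).__name__

# replace() builds a new code object; the original is left untouched.
modified = original.replace(co_filename='wrong')
print(modified.co_filename)  # wrong
```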

Instead of changing the code object itself, I would have to replace it with another object. The reference to the code object is accessed through the frame object, so in order to insert my code object into the stack frame I would need to modify the frame. A frame's f_code attribute is also read-only, however, so that meant creating a fake frame to replace the original value. Unfortunately, frame objects cannot be instantiated from within Python, and a real code object built with types.CodeType would still need a frame to carry it, so I ended up creating classes to mimic both of the originals.

stack_modify2.py

import collections
import inspect

# Define a namedtuple with the attributes of a stack frame
frame_attrs = ['f_back',
               'f_code',
               'f_builtins',
               'f_globals',
               'f_lasti',
               'f_lineno',
               ]
frame = collections.namedtuple('frame', ' '.join(frame_attrs))

# Define a namedtuple with the attributes of a code object
code_attrs = ['co_argcount',
              'co_nlocals',
              'co_stacksize',
              'co_flags',
              'co_code',
              'co_consts',
              'co_names',
              'co_varnames',
              'co_freevars',
              'co_cellvars',
              'co_filename',
              'co_name',
              'co_firstlineno',
              'co_lnotab',
              ]
code = collections.namedtuple('code', ' '.join(code_attrs))


def _make_fake_frame(original, filename):
    """Return a new fake frame object with the wrong filename."""
    new_c_attrs = dict((a, getattr(original.f_code, a))
                          for a in code_attrs)
    new_c_attrs['co_filename'] = filename
    new_c = code(**new_c_attrs)

    new_f_attrs = dict((a, getattr(original, a))
                           for a in frame_attrs)
    new_f_attrs['f_code'] = new_c
    new_f = frame(**new_f_attrs)
    return new_f


def faux_stack():
    full_stack = orig_stack()
    top = full_stack[-1]

    new_frame = _make_fake_frame(top[0], 'wrong')

    # Replace the top of the stack
    top = (new_frame, 'wrong') + top[2:]
    full_stack[-1] = top

    return full_stack
orig_stack = inspect.stack
inspect.stack = faux_stack


def show_app_name():
    stack_data = inspect.stack()
    script_name = stack_data[-1][1]
    print 'From stack:', script_name
    print 'From frame:', stack_data[-1][0].f_code.co_filename


if __name__ == '__main__':
    show_app_name()

I stole the idea of using namedtuple as a convenient way to have a class with named attributes but no real methods from inspect, which uses it to define a Traceback class.

$ python stack_modify2.py
From stack: wrong
From frame: wrong

Replacing the frame and code objects worked well for accessing the code object directly, but failed when I tried to use inspect.getframeinfo(), because there is an explicit type check that raises a TypeError near the beginning of getframeinfo() (see lines 14 and 15 below).

http://hg.python.org/cpython/file/35ef949e85d7/Lib/inspect.py#l987

 1def getframeinfo(frame, context=1):
 2    """Get information about a frame or traceback object.
 3
 4    A tuple of five things is returned: the filename, the line number of
 5    the current line, the function name, a list of lines of context from
 6    the source code, and the index of the current line within that list.
 7    The optional second argument specifies the number of lines of context
 8    to return, which are centered around the current line."""
 9    if istraceback(frame):
10        lineno = frame.tb_lineno
11        frame = frame.tb_frame
12    else:
13        lineno = frame.f_lineno
14    if not isframe(frame):
15        raise TypeError('{!r} is not a frame or traceback object'.format(frame))
16
17    filename = getsourcefile(frame) or getfile(frame)
18    if context > 0:
19        start = lineno - 1 - context//2
20        try:
21            lines, lnum = findsource(frame)
22        except IOError:
23            lines = index = None
24        else:
25            start = max(start, 1)
26            start = max(0, min(start, len(lines) - context))
27            lines = lines[start:start+context]
28            index = lineno - 1 - start
29    else:
30        lines = index = None
31
32    return Traceback(filename, lineno, frame.f_code.co_name, lines, index)
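
The reason a stand-in object trips that check can be seen in isolation: inspect.isframe() tests the type of its argument, so an object that merely has the right attribute names does not pass. A minimal Python 3 sketch, with FakeFrame as a made-up, cut-down version of the namedtuple used earlier:

```python
import collections
import inspect
import sys

# A stand-in with a few frame-like attribute names (illustrative only).
FakeFrame = collections.namedtuple('FakeFrame', 'f_back f_code f_lineno')

real = sys._getframe()
fake = FakeFrame(f_back=None, f_code=real.f_code, f_lineno=real.f_lineno)

print(inspect.isframe(real))  # True
print(inspect.isframe(fake))  # False
```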

The solution was to replace getframeinfo() with a version that skips the check. Unfortunately, getframeinfo() uses getfile(), which performs a similar check, so that function needed to be replaced, too.

stack_modify3.py

import inspect

from stack_modify2 import show_app_name


def getframeinfo(frame, context=1):
    """Get information about a frame or traceback object.

    A tuple of five things is returned: the filename, the line number of
    the current line, the function name, a list of lines of context from
    the source code, and the index of the current line within that list.
    The optional second argument specifies the number of lines of context
    to return, which are centered around the current line."""
    if inspect.istraceback(frame):
        lineno = frame.tb_lineno
        frame = frame.tb_frame
    else:
        lineno = frame.f_lineno
    # if not isframe(frame):
    #     raise TypeError('{!r} is not a frame or traceback object'.format(frame))

    filename = inspect.getsourcefile(frame) or inspect.getfile(frame)
    if context > 0:
        start = lineno - 1 - context//2
        try:
            lines, lnum = inspect.findsource(frame)
        except IOError:
            lines = index = None
        else:
            start = max(start, 1)
            start = max(0, min(start, len(lines) - context))
            lines = lines[start:start+context]
            index = lineno - 1 - start
    else:
        lines = index = None

    return inspect.Traceback(filename, lineno, frame.f_code.co_name, lines, index)
inspect.getframeinfo = getframeinfo


def getfile(object):
    """Work out which source or compiled file an object was defined in."""
    if hasattr(object, 'f_code'):
        return object.f_code.co_filename
    return orig_getfile(object)
orig_getfile = inspect.getfile
inspect.getfile = getfile


if __name__ == '__main__':
    show_app_name()
    s = inspect.stack()
    print inspect.getframeinfo(s[-1][0])

Now the caller can use inspect.getframeinfo() (really my replacement function) and see the modified filename in the return value.

$ python stack_modify3.py
From stack: wrong
From frame: wrong
Traceback(filename='wrong', lineno=51, function='<module>',
code_context=None, index=None)

After reviewing inspect.py one more time to see if I needed to replace any other functions, I realized that a better solution was possible. The implementation of inspect.stack() is very small, since it calls inspect.getouterframes() to actually build the list of frames. The seed frame passed to getouterframes() comes from sys._getframe().

http://hg.python.org/cpython/file/35ef949e85d7/Lib/inspect.py#l1052

def stack(context=1):
    """Return a list of records for the stack above the caller's frame."""
    return getouterframes(sys._getframe(1), context)

The rest of the stack is derived from the first frame returned by _getframe() using the f_back attribute to link from one frame to the next.

http://hg.python.org/cpython/file/35ef949e85d7/Lib/inspect.py#l1025

def getouterframes(frame, context=1):
    """Get a list of records for a frame and all higher (calling) frames.

    Each record contains a frame object, filename, line number, function
    name, a list of lines of context, and index within the context."""
    framelist = []
    while frame:
        framelist.append((frame,) + getframeinfo(frame, context))
        frame = frame.f_back
    return framelist
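
Stripped of the per-frame details, the traversal is just a linked-list walk over f_back. The sketch below shows that walk on its own, in Python 3; outer_filenames() is an illustrative name, and sys._getframe() is a CPython implementation detail that other interpreters are not required to provide.

```python
import sys


def outer_filenames():
    """Return the filename for this frame and every calling frame."""
    frame = sys._getframe()
    filenames = []
    # f_back is None on the outermost frame, which ends the loop.
    while frame is not None:
        filenames.append(frame.f_code.co_filename)
        frame = frame.f_back
    return filenames
```

The first entry belongs to outer_filenames() itself and the last to the outermost frame, matching the order of the list returned by inspect.stack().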

If I modified getouterframes() instead of inspect.stack(), then I could ensure that my fake frame information was inserted at the beginning of the stack, and all of the rest of the inspect functions would honor it.

stack_modify4.py

import collections
import inspect

# Define a namedtuple with the attributes of a stack frame
frame_attrs = ['f_back',
               'f_code',
               'f_builtins',
               'f_globals',
               'f_lasti',
               'f_lineno',
               ]
frame = collections.namedtuple('frame', ' '.join(frame_attrs))

# Define a namedtuple with the attributes of a code object
code_attrs = ['co_argcount',
              'co_nlocals',
              'co_stacksize',
              'co_flags',
              'co_code',
              'co_consts',
              'co_names',
              'co_varnames',
              'co_freevars',
              'co_cellvars',
              'co_filename',
              'co_name',
              'co_firstlineno',
              'co_lnotab',
              ]
code = collections.namedtuple('code', ' '.join(code_attrs))


def _make_fake_frame(original, filename):
    """Return a new fake frame object with the wrong filename."""
    new_c_attrs = dict((a, getattr(original.f_code, a))
                          for a in code_attrs)
    new_c_attrs['co_filename'] = filename
    new_c = code(**new_c_attrs)

    new_f_attrs = dict((a, getattr(original, a))
                           for a in frame_attrs)
    new_f_attrs['f_code'] = new_c
    new_f = frame(**new_f_attrs)
    return new_f


def getouterframes(frame, context=1):
    """Get a list of records for a frame and all higher (calling) frames.

    Each record contains a frame object, filename, line number, function
    name, a list of lines of context, and index within the context."""
    framelist = []
    while frame:
        if not frame.f_back:
            # Replace the top of the stack with a fake entry
            frame = _make_fake_frame(frame, 'wrong')
        framelist.append((frame,) + getframeinfo(frame, context))
        frame = frame.f_back
    return framelist
inspect.getouterframes = getouterframes


def getframeinfo(frame, context=1):
    """Get information about a frame or traceback object.

    A tuple of five things is returned: the filename, the line number of
    the current line, the function name, a list of lines of context from
    the source code, and the index of the current line within that list.
    The optional second argument specifies the number of lines of context
    to return, which are centered around the current line."""
    if inspect.istraceback(frame):
        lineno = frame.tb_lineno
        frame = frame.tb_frame
    else:
        lineno = frame.f_lineno
    # if not isframe(frame):
    #     raise TypeError('{!r} is not a frame or traceback object'.format(frame))

    filename = inspect.getsourcefile(frame) or inspect.getfile(frame)
    if context > 0:
        start = lineno - 1 - context // 2
        try:
            lines, lnum = inspect.findsource(frame)
        except IOError:
            lines = index = None
        else:
            start = max(start, 1)
            start = max(0, min(start, len(lines) - context))
            lines = lines[start:start + context]
            index = lineno - 1 - start
    else:
        lines = index = None

    return inspect.Traceback(filename, lineno, frame.f_code.co_name, lines, index)
inspect.getframeinfo = getframeinfo


def getfile(object):
    """Work out which source or compiled file an object was defined in."""
    if isinstance(object, frame):
        return object.f_code.co_filename
    return orig_getfile(object)
orig_getfile = inspect.getfile
inspect.getfile = getfile


def show_app_name():
    stack_data = inspect.stack()
    #print [(s[1], s[0].__class__, s[0].f_code.co_name) for s in stack_data]
    print 'From stack        :', stack_data[-1][1]
    print 'From code in frame:', stack_data[-1][0].f_code.co_filename
    print 'From frame info   :', inspect.getframeinfo(stack_data[-1][0]).filename


if __name__ == '__main__':
    show_app_name()

The customized versions of getframeinfo() and getfile() are still required to avoid exceptions caused by the type checking.

$ python stack_modify4.py
From stack        : wrong
From code in frame: wrong
From frame info   : wrong

Enough of That

At this point I have proven to myself that, while it is unlikely that anyone would bother to do it in a real program (and they would certainly not do it by accident), it is possible to intercept the introspection calls and insert bogus information to mislead a program trying to discover information about itself. This implementation does not subvert pdb, because pdb does not use inspect. Probably because it predates inspect, pdb has its own implementation of a stack-building function, which could be replaced using the same technique shown above.

This investigation led me to several conclusions. First, I still don’t know why the original code is looking at the stack to discover the program name. I should ask on the OpenStack mailing list, but in the meantime I had fun experimenting while researching the question. Second, given that looking at __main__.__file__ produces a value at least as correct as looking at the stack in all cases, and more correct when a program is launched using the -m flag, it seems like the solution with the best combination of reliability and simplicity. A patch may be in order. And finally, monkey-patching can drive you to excesses, madness, or both.

Updates

8 May – Updated styles around embedded source files and added a label for each.