Determining the Name of a Process from Python

Finding the name of the program from which a Python module is
running can be trickier than it would seem at first, and
investigating the reasons led to some interesting experiments.

A couple of weeks ago at the OpenStack Folsom Summit, Mark McClain
pointed out an interesting code snippet he had discovered in the Nova
sources:

nova/utils.py: 339

script_dir = os.path.dirname(inspect.stack()[-1][1])

The code is part of the logic to find a configuration file that lives
in a directory relative to where the application startup script is
located. It looks at the call stack to find the main program, and
picks the filename out of the stack details.

The code seems to be taken from a response to a StackOverflow
question, and when I saw it I thought it looked like a case of
someone going to more trouble than was needed to get the
information. Mark had a similar reaction, and asked if I knew of a
simpler way to determine the program name.

Similar examples with inspect.stack() appear in four places in
the Nova source code (at least as of today). All of them are either
building filenames relative to the location of the original “main”
program, or are returning the name of that program to be used to build
a path to another file (such as a log file or other program). Those
are all good reasons to be careful about the location and name of the
main program, but none explain why the obvious solution isn’t good
enough. I assumed that if the OpenStack developers were looking at
stack frames there must have been a reason. I decided to examine the
original code and spend a little time deciphering what it is doing,
and especially to see if there were cases where it did not work as
desired (so I could justify a patch).

The Stack

The call to inspect.stack() retrieves the Python interpreter
stack frames for the current thread. The return value is a list with
information about the calling function in position 0 and the “top”
of the stack at the end of the list. Each item in the list is a tuple
containing:

  • the stack frame data structure
  • the filename for the code being run in that frame
  • the line number within the file
  • the co_name member of the code object from the frame,
    giving the function or method name being executed
  • the source lines for that bit of code, when available
  • an index into the list of source lines showing the actual source
    line for the frame
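
The record layout described above can be sketched as follows. This is an illustration only, not code from Nova: the helper names describe_records and caller are invented for the example, and the code is written so it should run on Python 3 as well as the Python 2.7 used elsewhere in this post.

```python
import inspect


def describe_records():
    """Return (filename, lineno, function) for every record on the stack.

    Hypothetical helper, for illustration only.
    """
    summary = []
    # context=0 skips loading source text, so the last two fields are None.
    for record in inspect.stack(0):
        frame, filename, lineno, function, lines, index = record
        summary.append((filename, lineno, function))
    return summary


def caller():
    return describe_records()


records = caller()
# Position 0 is the function that called inspect.stack();
# position 1 is whoever called that function.
print(records[0][2])
print(records[1][2])
```

The last entry, records[-1], is the "top" of the stack that the Nova one-liner reads.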

show_stack.py

import inspect


def show_stack():
    stack = inspect.stack()
    for s in stack:
        print 'filename:', s[1]
        print 'line    :', s[2]
        print 'co_name :', s[3]
        print


def inner():
    show_stack()


def outer():
    inner()


if __name__ == '__main__':
    outer()

The information is intended to be used for generating tracebacks or by
tools like pdb when debugging an application (although
pdb has its own implementation). To answer the question “Which
program am I running in?” the filename is the most interesting piece
of data.

$ python show_stack.py
filename: show_stack.py
line    : 6
co_name : show_stack

filename: show_stack.py
line    : 15
co_name : inner

filename: show_stack.py
line    : 19
co_name : outer

filename: show_stack.py
line    : 23
co_name : <module>

One obvious issue with these results is that the filename in the stack
frame is relative to the startup directory of the application. It
could lead to an incorrect path if the process has changed its working
directory between startup and checking the stack. But there is
another mode where looking at the top of the stack produces completely
invalid results.

$ python -m show_stack
filename: /Users/dhellmann/.../show_stack.py
line    : 6
co_name : show_stack

filename: /Users/dhellmann/.../show_stack.py
line    : 15
co_name : inner

filename: /Users/dhellmann/.../show_stack.py
line    : 19
co_name : outer

filename: /Users/dhellmann/.../show_stack.py
line    : 23
co_name : <module>

filename: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py
line    : 72
co_name : _run_code

filename: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py
line    : 162
co_name : _run_module_as_main

The -m option to the interpreter triggers the runpy module,
which takes the module name specified and executes it like a main
program. As the stack printout above illustrates, runpy is then at
the top of the stack, so the “main” part of our local module is
several levels down from the top. That means the simple one-liner is
not always going to produce the right results.
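
If the stack must be used anyway, one way around the runpy frames is to look for the outermost frame whose module is running as __main__ instead of blindly taking the last entry. The sketch below is an assumption about how that could be done, not something taken from Nova; main_script_from_stack is a made-up name.

```python
import inspect


def main_script_from_stack(records=None):
    """Return the filename of the outermost frame executing as __main__.

    Hypothetical helper: walks from the top of the stack toward the
    caller and skips frames (such as runpy's) that belong to another
    module. Returns None when no such frame is found.
    """
    if records is None:
        records = inspect.stack(0)
    for record in reversed(records):
        frame, filename = record[0], record[1]
        if frame.f_globals.get('__name__') == '__main__':
            return filename
    return None
```

The value returned is still whatever filename the frame recorded, so it can be a relative path when the script is run directly.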

Why the Obvious Solution Fails

Now that I knew there were ways to get the wrong results by looking at
the stack, the next question was whether there was another way to find
the program name that was simpler, more efficient, and especially more
correct. The simplest solution is to look at the command line
arguments passed through sys.argv.

argv.py

import sys

print sys.argv[0]

Normally, the first element in sys.argv is the script that was run
as the main program. The value always points to the same file,
although the method of invoking it may cause the value to fluctuate
between a relative and full path.

$ python argv.py
argv.py

$ ./argv.py
./argv.py

$ python -m argv
/Users/dhellmann/.../argv.py

As this example demonstrates, when a script is run directly or passed
as an argument to the interpreter, sys.argv contains a relative path
to the script file. Using -m we see the full path, so looking at the
command line arguments is more robust for that case. However, we
cannot depend on -m being used, so we aren’t guaranteed to get the
extra details.
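
A relative value can be made absolute by combining it with the working directory, provided both are captured before anything calls os.chdir(). The following is a minimal sketch under that assumption; resolve_program_path and PROGRAM_PATH are names invented for the example.

```python
import os
import sys


def resolve_program_path(argv0, cwd):
    """Return an absolute, normalized path for a sys.argv[0] value.

    Hypothetical helper. cwd must be the working directory that was
    current when the program started.
    """
    if os.path.isabs(argv0):
        return os.path.normpath(argv0)
    return os.path.normpath(os.path.join(cwd, argv0))


# Capture both values as early as possible, before any os.chdir().
PROGRAM_PATH = resolve_program_path(sys.argv[0], os.getcwd())
```

This only tidies up the path; it does nothing about sys.argv[0] holding something other than a script name.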

Using import

The next alternative I considered was probing the main program module
myself. Every module has a special property, __file__, which holds
the path to the file from which the module was loaded. To access the
main program module from within Python, you import a specially named
module __main__. To test this method, I created a main program
that loads another module:

import_main_app.py

import import_main_module

import_main_module.main()

And the second module imports __main__ and prints the file it was
loaded from.

import_main.py

import __main__

print __main__.__file__

Looking at the __main__ module always pointed to the actual main
program module, but it did not always produce a full path. This makes
sense, because the filename for a module that goes into the stack
frame comes from the module itself.

$ python import_main.py
import_main.py

$ python -m import_main
/Users/dhellmann/.../import_main.py
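
Because the value is sometimes relative, and because __main__ has no __file__ at all in an interactive session, a small wrapper is useful in practice. This is a sketch only; main_program_path is a hypothetical helper, and it assumes it is called before the process changes its working directory.

```python
import os


def main_program_path(main_module=None):
    """Return the absolute path of the main program, or None.

    Hypothetical helper. __main__ has no __file__ in an interactive
    session, so the attribute is read defensively.
    """
    if main_module is None:
        import __main__ as main_module
    filename = getattr(main_module, '__file__', None)
    if filename is None:
        return None
    # abspath() resolves against the *current* working directory.
    return os.path.abspath(filename)
```

A directory-relative lookup like the one in Nova could then be built with os.path.dirname(main_program_path()).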

Wandering Down the Garden Path

After I found such a simple way to reliably retrieve the program name,
I spent a while thinking about the motivation of the person who
decided that looking at stack frames was the best solution. I came up
with two hypotheses. First, it is entirely possible that they did not
know about importing __main__. It isn’t the sort of thing one
needs to do very often, and I don’t even remember where I learned
about doing it (or why, because I’m pretty sure I’ve never used the
feature in production code for any reason). That seems like the most
plausible reason, but the other idea I had was that for some reason it
was very important to have a relatively tamper-proof value –
something that could not be overwritten accidentally. This new idea
merited further investigation, so I worked back through the methods of
accessing the program name to determine which, if any, met the new
criteria.

I did not need to experiment with sys.argv to know it was
mutable. The arguments are saved in a normal list object, and
can be modified quite easily, as demonstrated here.

argv_modify.py

import sys

print 'Type  :', type(sys.argv)
print 'Before:', sys.argv

sys.argv[0] = 'wrong'

print 'After :', sys.argv

All normal list operations are supported, so replacing the program
name is a simple assignment statement. Because sys.argv is a list,
it is also susceptible to having values removed by pop(),
remove(), or a slice assignment gone awry.

$ python argv_modify.py
Type  : <type 'list'>
Before: ['argv_modify.py']
After : ['wrong']

The __file__ attribute of a module is a string, which is not
itself mutable, but the contents can be replaced by assigning a new
value to the attribute.

import_modify.py

import __main__

print 'Before:', __main__.__file__

__main__.__file__ = 'wrong'

print 'After :', __main__.__file__

This is less likely to happen by accident, so it seems somewhat
safer. Nonetheless, changing it is easy.

$ python import_modify.py
Before: import_modify.py
After : wrong

That leaves the stack frame.

Down the Rabbit Hole

As described above, the return value of inspect.stack() is a
list of tuples. The list is computed each time the function is called,
so it was unlikely that one part of a program would accidentally
modify it. The key word there is accidentally, but even a malicious
program would have to go to a bit of effort to return fake stack data.

stack_modify1.py

import inspect


def faux_stack():
    full_stack = orig_stack()
    top = full_stack[-1]
    top = (top[0], 'wrong') + top[2:]
    full_stack[-1] = top
    return full_stack
orig_stack = inspect.stack
inspect.stack = faux_stack


stack_data = inspect.stack()
script_name = stack_data[-1][1]
print 'From stack:', script_name
print 'From frame:', stack_data[-1][0].f_code.co_filename

The filename actually appears in two places in the data returned by
inspect.stack(). The first location is in the tuple that is part
of the list returned as the stack itself. The second is in the code
object of the stack frame within that same tuple
(frame.f_code.co_filename).

$ python stack_modify1.py
From stack: wrong
From frame: stack_modify1.py
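
The same replacement can be wrapped so that the original function is always put back, which makes the experiment safer to repeat. The sketch below is one possible arrangement, assuming nothing beyond the standard library; fake_top_filename is an invented name, and the fake record is rebuilt as a plain tuple.

```python
import contextlib
import inspect


@contextlib.contextmanager
def fake_top_filename(filename):
    """Temporarily make inspect.stack() report a bogus top-level filename.

    Hypothetical helper; the original function is restored on exit.
    """
    orig_stack = inspect.stack

    def faux_stack(context=1):
        full_stack = orig_stack(context)
        top = full_stack[-1]
        # Rebuild the last record with only the filename swapped.
        full_stack[-1] = (top[0], filename) + tuple(top[2:])
        return full_stack

    inspect.stack = faux_stack
    try:
        yield
    finally:
        inspect.stack = orig_stack


with fake_top_filename('wrong'):
    patched = inspect.stack(0)[-1][1]
restored = inspect.stack(0)[-1][1]
```

As in stack_modify1.py, only the tuple is altered, so the code object inside the frame still carries the real filename.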

Replacing the filename in the tuple was relatively easy, and would be
sufficient for code that trusted the stack contents returned by
inspect.stack(). It turned out to be more challenging to change
the code object. For CPython, the code class is implemented
in C as part of the set of objects used internally by the interpreter.

Objects/codeobject.c

static PyMemberDef code_memberlist[] = {
    {"co_argcount",     T_INT,          OFF(co_argcount),       READONLY},
    {"co_nlocals",      T_INT,          OFF(co_nlocals),        READONLY},
    {"co_stacksize",T_INT,              OFF(co_stacksize),      READONLY},
    {"co_flags",        T_INT,          OFF(co_flags),          READONLY},
    {"co_code",         T_OBJECT,       OFF(co_code),           READONLY},
    {"co_consts",       T_OBJECT,       OFF(co_consts),         READONLY},
    {"co_names",        T_OBJECT,       OFF(co_names),          READONLY},
    {"co_varnames",     T_OBJECT,       OFF(co_varnames),       READONLY},
    {"co_freevars",     T_OBJECT,       OFF(co_freevars),       READONLY},
    {"co_cellvars",     T_OBJECT,       OFF(co_cellvars),       READONLY},
    {"co_filename",     T_OBJECT,       OFF(co_filename),       READONLY},
    {"co_name",         T_OBJECT,       OFF(co_name),           READONLY},
    {"co_firstlineno", T_INT,           OFF(co_firstlineno),    READONLY},
    {"co_lnotab",       T_OBJECT,       OFF(co_lnotab),         READONLY},
    {NULL}      /* Sentinel */
};

The data members of a code object are all defined as READONLY,
which means you cannot modify them from within Python code directly.

code_modify_fail.py

import inspect

stack_data = inspect.stack()
frame = stack_data[0][0]
code = frame.f_code

code.co_filename = 'wrong'

Attempting to change a read-only property causes a TypeError.

$ python code_modify_fail.py
Traceback (most recent call last):
  File "code_modify_fail.py", line 8, in <module>
    code.co_filename = 'wrong'
TypeError: readonly attribute
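
The read-only behavior can also be checked without inspect, using the code object of any function. The sketch below is an illustration written for newer interpreters than the one used in this post: Python 3 raises AttributeError where Python 2 raises TypeError, and the replace() method used at the end exists only on Python 3.8 and later.

```python
def sample():
    pass


code = sample.__code__

try:
    code.co_filename = 'wrong'
except (AttributeError, TypeError) as err:
    # Python 2 raises TypeError; Python 3 raises AttributeError.
    print('assignment rejected: %s' % type(err).__name__)

# Python 3.8+ only: replace() builds a modified copy and leaves the
# original code object untouched.
renamed = code.replace(co_filename='wrong')
print(renamed.co_filename)
```

A modified copy still cannot be attached to a live frame, because the frame's f_code attribute is read-only as well.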

Instead of changing the code object itself, I would have to replace it
with another object. The reference to the code object is accessed
through the frame object, so in order to insert my code object into
the stack frame I would need to modify the frame. Frame objects are
also immutable, however, so that meant creating a fake frame to
replace the original value. Unfortunately, frame objects cannot be
instantiated from within Python (code objects can, via types.CodeType,
but that is of little help without a frame to hold the result), so I
ended up creating classes to mimic both originals.

stack_modify2.py

import collections
import inspect

# Define a namedtuple with the attributes of a stack frame
frame_attrs = ['f_back',
               'f_code',
               'f_builtins',
               'f_globals',
               'f_lasti',
               'f_lineno',
               ]
frame = collections.namedtuple('frame', ' '.join(frame_attrs))

# Define a namedtuple with the attributes of a code object
code_attrs = ['co_argcount',
              'co_nlocals',
              'co_stacksize',
              'co_flags',
              'co_code',
              'co_consts',
              'co_names',
              'co_varnames',
              'co_freevars',
              'co_cellvars',
              'co_filename',
              'co_name',
              'co_firstlineno',
              'co_lnotab',
              ]
code = collections.namedtuple('code', ' '.join(code_attrs))


def _make_fake_frame(original, filename):
    """Return a new fake frame object with the wrong filename."""
    new_c_attrs = dict((a, getattr(original.f_code, a))
                          for a in code_attrs)
    new_c_attrs['co_filename'] = filename
    new_c = code(**new_c_attrs)

    new_f_attrs = dict((a, getattr(original, a))
                           for a in frame_attrs)
    new_f_attrs['f_code'] = new_c
    new_f = frame(**new_f_attrs)
    return new_f


def faux_stack():
    full_stack = orig_stack()
    top = full_stack[-1]

    new_frame = _make_fake_frame(top[0], 'wrong')

    # Replace the top of the stack
    top = (new_frame, 'wrong') + top[2:]
    full_stack[-1] = top

    return full_stack
orig_stack = inspect.stack
inspect.stack = faux_stack


def show_app_name():
    stack_data = inspect.stack()
    script_name = stack_data[-1][1]
    print 'From stack:', script_name
    print 'From frame:', stack_data[-1][0].f_code.co_filename


if __name__ == '__main__':
    show_app_name()

I stole the idea of using namedtuple as a convenient way to
have a class with named attributes but no real methods from
inspect, which uses it to define a Traceback class.

$ python stack_modify2.py
From stack: wrong
From frame: wrong

Replacing the frame and code objects worked well for accessing the
“code” object directly, but failed when I tried to use
inspect.getframeinfo() because there is an explicit type check
that raises a TypeError near the beginning of getframeinfo()
(see the isframe() test in the listing below).

http://hg.python.org/cpython/file/35ef949e85d7/Lib/inspect.py#l987

def getframeinfo(frame, context=1):
    """Get information about a frame or traceback object.

    A tuple of five things is returned: the filename, the line number of
    the current line, the function name, a list of lines of context from
    the source code, and the index of the current line within that list.
    The optional second argument specifies the number of lines of context
    to return, which are centered around the current line."""
    if istraceback(frame):
        lineno = frame.tb_lineno
        frame = frame.tb_frame
    else:
        lineno = frame.f_lineno
    if not isframe(frame):
        raise TypeError('{!r} is not a frame or traceback object'.format(frame))

    filename = getsourcefile(frame) or getfile(frame)
    if context > 0:
        start = lineno - 1 - context//2
        try:
            lines, lnum = findsource(frame)
        except IOError:
            lines = index = None
        else:
            start = max(start, 1)
            start = max(0, min(start, len(lines) - context))
            lines = lines[start:start+context]
            index = lineno - 1 - start
    else:
        lines = index = None

    return Traceback(filename, lineno, frame.f_code.co_name, lines, index)

The solution was to replace getframeinfo() with a version that
skips the check. Unfortunately, getframeinfo() uses
getfile(), which performs a similar check, so that function
needed to be replaced, too.

stack_modify3.py

import inspect

from stack_modify2 import show_app_name


def getframeinfo(frame, context=1):
    """Get information about a frame or traceback object.

    A tuple of five things is returned: the filename, the line number of
    the current line, the function name, a list of lines of context from
    the source code, and the index of the current line within that list.
    The optional second argument specifies the number of lines of context
    to return, which are centered around the current line."""
    if inspect.istraceback(frame):
        lineno = frame.tb_lineno
        frame = frame.tb_frame
    else:
        lineno = frame.f_lineno
    # if not isframe(frame):
    #     raise TypeError('{!r} is not a frame or traceback object'.format(frame))

    filename = inspect.getsourcefile(frame) or inspect.getfile(frame)
    if context > 0:
        start = lineno - 1 - context//2
        try:
            lines, lnum = inspect.findsource(frame)
        except IOError:
            lines = index = None
        else:
            start = max(start, 1)
            start = max(0, min(start, len(lines) - context))
            lines = lines[start:start+context]
            index = lineno - 1 - start
    else:
        lines = index = None

    return inspect.Traceback(filename, lineno, frame.f_code.co_name, lines, index)
inspect.getframeinfo = getframeinfo


def getfile(object):
    """Work out which source or compiled file an object was defined in."""
    if hasattr(object, 'f_code'):
        return object.f_code.co_filename
    return orig_getfile(object)
orig_getfile = inspect.getfile
inspect.getfile = getfile


if __name__ == '__main__':
    show_app_name()
    s = inspect.stack()
    print inspect.getframeinfo(s[-1][0])

Now the caller can use inspect.getframeinfo() (really my
replacement function) and see the modified filename in the return
value.

$ python stack_modify3.py
From stack: wrong
From frame: wrong
Traceback(filename='wrong', lineno=51, function='<module>',
code_context=None, index=None)

After reviewing inspect.py one more time to see if I needed to
replace any other functions, I realized that a better solution was
possible. The implementation of inspect.stack() is very small,
since it calls inspect.getouterframes() to actually build the
list of frames. The seed frame passed to getouterframes() comes
from sys._getframe().

http://hg.python.org/cpython/file/35ef949e85d7/Lib/inspect.py#l1052

def stack(context=1):
    """Return a list of records for the stack above the caller's frame."""
    return getouterframes(sys._getframe(1), context)

The rest of the stack is derived from the first frame returned by
_getframe() using the f_back attribute to link from one
frame to the next.

http://hg.python.org/cpython/file/35ef949e85d7/Lib/inspect.py#l1025

def getouterframes(frame, context=1):
    """Get a list of records for a frame and all higher (calling) frames.

    Each record contains a frame object, filename, line number, function
    name, a list of lines of context, and index within the context."""
    framelist = []
    while frame:
        framelist.append((frame,) + getframeinfo(frame, context))
        frame = frame.f_back
    return framelist
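
The loop in getouterframes() can be reproduced with nothing more than sys._getframe() and the f_back links. The sketch below is an illustration with invented function names; sys._getframe() is a CPython implementation detail and may not exist on other interpreters.

```python
import sys


def outer_function_names(frame):
    """Collect co_name for a frame and every frame that called it.

    Hypothetical helper mirroring the loop in getouterframes().
    """
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names


def inner():
    # sys._getframe(0) is the frame of the function that calls it.
    return outer_function_names(sys._getframe(0))


def outer():
    return inner()


print(outer()[:2])  # ['inner', 'outer']
```

The frame whose f_back is None is the last one visited, which is the entry that the one-liner from Nova picks out with [-1].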

If I modified getouterframes() instead of inspect.stack(),
then I could ensure that my fake frame information was inserted at the
beginning of the stack, and all of the rest of the inspect
functions would honor it.

stack_modify4.py

import collections
import inspect

# Define a namedtuple with the attributes of a stack frame
frame_attrs = ['f_back',
               'f_code',
               'f_builtins',
               'f_globals',
               'f_lasti',
               'f_lineno',
               ]
frame = collections.namedtuple('frame', ' '.join(frame_attrs))

# Define a namedtuple with the attributes of a code object
code_attrs = ['co_argcount',
              'co_nlocals',
              'co_stacksize',
              'co_flags',
              'co_code',
              'co_consts',
              'co_names',
              'co_varnames',
              'co_freevars',
              'co_cellvars',
              'co_filename',
              'co_name',
              'co_firstlineno',
              'co_lnotab',
              ]
code = collections.namedtuple('code', ' '.join(code_attrs))


def _make_fake_frame(original, filename):
    """Return a new fake frame object with the wrong filename."""
    new_c_attrs = dict((a, getattr(original.f_code, a))
                          for a in code_attrs)
    new_c_attrs['co_filename'] = filename
    new_c = code(**new_c_attrs)

    new_f_attrs = dict((a, getattr(original, a))
                           for a in frame_attrs)
    new_f_attrs['f_code'] = new_c
    new_f = frame(**new_f_attrs)
    return new_f


def getouterframes(frame, context=1):
    """Get a list of records for a frame and all higher (calling) frames.

    Each record contains a frame object, filename, line number, function
    name, a list of lines of context, and index within the context."""
    framelist = []
    while frame:
        if not frame.f_back:
            # Replace the top of the stack with a fake entry
            frame = _make_fake_frame(frame, 'wrong')
        framelist.append((frame,) + getframeinfo(frame, context))
        frame = frame.f_back
    return framelist
inspect.getouterframes = getouterframes


def getframeinfo(frame, context=1):
    """Get information about a frame or traceback object.

    A tuple of five things is returned: the filename, the line number of
    the current line, the function name, a list of lines of context from
    the source code, and the index of the current line within that list.
    The optional second argument specifies the number of lines of context
    to return, which are centered around the current line."""
    if inspect.istraceback(frame):
        lineno = frame.tb_lineno
        frame = frame.tb_frame
    else:
        lineno = frame.f_lineno
    # if not isframe(frame):
    #     raise TypeError('{!r} is not a frame or traceback object'.format(frame))

    filename = inspect.getsourcefile(frame) or inspect.getfile(frame)
    if context > 0:
        start = lineno - 1 - context // 2
        try:
            lines, lnum = inspect.findsource(frame)
        except IOError:
            lines = index = None
        else:
            start = max(start, 1)
            start = max(0, min(start, len(lines) - context))
            lines = lines[start:start + context]
            index = lineno - 1 - start
    else:
        lines = index = None

    return inspect.Traceback(filename, lineno, frame.f_code.co_name, lines, index)
inspect.getframeinfo = getframeinfo


def getfile(object):
    """Work out which source or compiled file an object was defined in."""
    if isinstance(object, frame):
        return object.f_code.co_filename
    return orig_getfile(object)
orig_getfile = inspect.getfile
inspect.getfile = getfile


def show_app_name():
    stack_data = inspect.stack()
    #print [(s[1], s[0].__class__, s[0].f_code.co_name) for s in stack_data]
    print 'From stack        :', stack_data[-1][1]
    print 'From code in frame:', stack_data[-1][0].f_code.co_filename
    print 'From frame info   :', inspect.getframeinfo(stack_data[-1][0]).filename


if __name__ == '__main__':
    show_app_name()

The customized versions of getframeinfo() and getfile()
are still required to avoid exceptions caused by the type checking.

$ python stack_modify4.py
From stack        : wrong
From code in frame: wrong
From frame info   : wrong

Enough of That

At this point I have proven to myself that while it is unlikely that
anyone would bother to do it in a real program (and they would
certainly not do it by accident) it is possible to intercept the
introspection calls and insert bogus information to mislead a program
trying to discover information about itself. This implementation does
not subvert pdb, because pdb does not use inspect. Probably because
it predates inspect, pdb has its own implementation of a
stack-building function, which could be replaced using the same
technique shown above.

This investigation led me to several conclusions. First, I still don’t
know why the original code is looking at the stack to discover the
program name. I should ask on the OpenStack mailing list, but in the
meantime I had fun experimenting while researching the question.
Second, given that looking at __main__.__file__ produces a value
at least as correct as looking at the stack in all cases, and more
correct when a program is launched using the -m flag, it seems
like the solution with the best combination of reliability and
simplicity. A patch may be in order. And finally, monkey-patching can
drive you to excesses, madness, or both.

Updates

8 May – Updated styles around embedded source files and added a label
for each.