Determining the Name of a Process from Python
Finding the name of the program from which a Python module is running can be trickier than it would seem at first, and investigating the reasons led to some interesting experiments.
A couple of weeks ago at the OpenStack Folsom Summit, Mark McClain pointed out an interesting code snippet he had discovered in the Nova sources:
nova/utils.py
script_dir = os.path.dirname(inspect.stack()[-1][1])
The code is part of the logic to find a configuration file that lives in a directory relative to where the application startup script is located. It looks at the call stack to find the main program, and picks the filename out of the stack details.
The code seems to be taken from a response to a StackOverflow question, and when I saw it I thought it looked like a case of someone going to more trouble than was needed to get the information. Mark had a similar reaction, and asked if I knew of a simpler way to determine the program name.
Similar examples with inspect.stack() appear in four places in the Nova source code (at least as of today). All of them are either building filenames relative to the location of the original “main” program, or are returning the name of that program to be used to build a path to another file (such as a log file or other program). Those are all good reasons to be careful about the location and name of the main program, but none explain why the obvious solution isn’t good enough. I assumed that if the OpenStack developers were looking at stack frames there must have been a reason. I decided to examine the original code and spend a little time deciphering what it is doing, and especially to see if there were cases where it did not work as desired (so I could justify a patch).
The Stack
The call to inspect.stack() retrieves the Python interpreter stack frames for the current thread. The return value is a list with information about the calling function in position 0 and the “top” of the stack at the end of the list. Each item in the list is a tuple containing:
- the stack frame data structure
- the filename for the code being run in that frame
- the line number within the file
- the co_name member of the code object from the frame, giving the function or method name being executed
- the source lines for that bit of code, when available
- an index into the list of source lines showing the actual source line for the frame
show_stack.py
import inspect


def show_stack():
    stack = inspect.stack()
    for s in stack:
        print 'filename:', s[1]
        print 'line :', s[2]
        print 'co_name :', s[3]
        print


def inner():
    show_stack()


def outer():
    inner()


if __name__ == '__main__':
    outer()
The information is intended to be used for generating tracebacks or by tools like pdb when debugging an application (although pdb has its own implementation). To answer the question “Which program am I running in?” the filename is the most interesting piece of data.
$ python show_stack.py
filename: show_stack.py
line : 6
co_name : show_stack
filename: show_stack.py
line : 15
co_name : inner
filename: show_stack.py
line : 19
co_name : outer
filename: show_stack.py
line : 23
co_name : <module>
One obvious issue with these results is that the filename in the stack frame is relative to the start up directory of the application. It could lead to an incorrect path if the process has changed its working directory between startup and checking the stack. But there is another mode where looking at the top of the stack produces completely invalid results.
$ python -m show_stack
filename: /Users/dhellmann/.../show_stack.py
line : 6
co_name : show_stack
filename: /Users/dhellmann/.../show_stack.py
line : 15
co_name : inner
filename: /Users/dhellmann/.../show_stack.py
line : 19
co_name : outer
filename: /Users/dhellmann/.../show_stack.py
line : 23
co_name : <module>
filename: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py
line : 72
co_name : _run_code
filename: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py
line : 162
co_name : _run_module_as_main
The -m option to the interpreter triggers the runpy module, which takes the module name specified and executes it like a main program. As the stack printout above illustrates, runpy is then at the top of the stack, so the “main” part of our local module is several levels down from the top. That means the simple one-liner is not always going to produce the right results.
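One way to sidestep the runpy frames, sketched here on the assumption that the main program's frame can be recognized by its globals having __name__ set to '__main__', is to search the stack from the outermost frame inward instead of trusting the last entry. The helper name main_script_from_stack and its optional stack argument are hypothetical, added only so the idea can be tried in isolation, and the syntax targets Python 3 rather than the Python 2 used in the listings.

```python
import inspect


def main_script_from_stack(stack=None):
    """Return the filename of the outermost frame that runs as __main__.

    stack is a list of records shaped like the ones inspect.stack()
    returns: index 0 is a frame object, index 1 is the filename.
    """
    if stack is None:
        stack = inspect.stack()
    # inspect.stack() lists the caller first and the outermost frame last,
    # so walk it backwards to start from the program entry point.
    for record in reversed(stack):
        frame, filename = record[0], record[1]
        if frame.f_globals.get('__name__') == '__main__':
            return filename
    return None
```

With a stack like the python -m printout above, the two runpy records are skipped because their globals belong to the runpy module, and the first record that matches is the module-level frame of the script itself.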
Why the Obvious Solution Fails
Now that I knew there were ways to get the wrong results by looking at the stack, the next question was whether there was another way to find the program name that was simpler, more efficient, and especially more correct. The simplest solution is to look at the command line arguments passed through sys.argv.
argv.py
import sys

print sys.argv[0]
Normally, the first element in sys.argv is the script that was run as the main program. The value always points to the same file, although the method of invoking it may cause the value to fluctuate between a relative and full path.
$ python argv.py
argv.py
$ ./argv.py
./argv.py
$ python -m argv
/Users/dhellmann/.../argv.py
As this example demonstrates, when a script is run directly or passed as an argument to the interpreter, sys.argv[0] contains a relative path to the script file. Using -m we see the full path, so looking at the command line arguments is more robust for that case. However, we cannot depend on -m being used so we aren’t guaranteed to get the extra details.
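A relative value can still be turned into a stable one if it is anchored to the startup directory before anything changes the working directory. The following is a minimal sketch of that idea in Python 3 syntax; absolute_script_path is a hypothetical helper name, not something taken from Nova or the standard library.

```python
import os


def absolute_script_path(argv0, start_dir):
    """Anchor a possibly relative argv[0] to the directory the process started in."""
    # os.path.join() discards start_dir when argv0 is already absolute,
    # so the full path produced by "python -m" passes through unchanged.
    return os.path.normpath(os.path.join(start_dir, argv0))
```

The intended use is to call it once at startup, for example with sys.argv[0] and os.getcwd(), and keep the result rather than recomputing it later.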
Using import
The next alternative I considered was probing the main program module myself. Every module has a special property, __file__, which holds the path to the file from which the module was loaded. To access the main program module from within Python, you import a specially named module __main__. To test this method, I created a main program that loads another module:
import_main_app.py
# Importing the module runs its top-level print statement.
import import_main
And the second module imports __main__ and prints the file it was loaded from.
import_main.py
import __main__

print __main__.__file__
Looking at the __main__ module always pointed to the actual main program module, but it did not always produce a full path. This makes sense, because the filename for a module that goes into the stack frame comes from the module itself.
$ python import_main.py
import_main.py
$ python -m import_main
/Users/dhellmann/.../import_main.py
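Putting the two lookups together, a helper could prefer __main__.__file__ and fall back to sys.argv[0] when the attribute is missing, which happens in an interactive session. This is only a sketch in Python 3 syntax: program_path and its two optional parameters are hypothetical names introduced for illustration, and it makes no attempt to handle special values of sys.argv[0] such as the one used with python -c.

```python
import os
import sys


def program_path(main_module=None, argv=None):
    """Best-effort absolute path of the main program, or None if unknown."""
    if main_module is None:
        import __main__ as main_module
    if argv is None:
        argv = sys.argv
    # An interactive session has a __main__ module without __file__.
    path = getattr(main_module, '__file__', None)
    if not path and argv and argv[0]:
        path = argv[0]
    # abspath() resolves against the current directory, so call this early.
    return os.path.abspath(path) if path else None
```

The parameters exist so the behavior can be exercised without depending on how the interpreter was started; normal callers would pass nothing.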
Wandering Down the Garden Path
After I found such a simple way to reliably retrieve the program name, I spent a while thinking about the motivation of the person who decided that looking at stack frames was the best solution. I came up with two hypotheses. First, it is entirely possible that they did not know about importing __main__. It isn’t the sort of thing one needs to do very often, and I don’t even remember where I learned about doing it (or why, because I’m pretty sure I’ve never used the feature in production code for any reason). That seems like the most plausible reason, but the other idea I had was that for some reason it was very important to have a relatively tamper-proof value – something that could not be overwritten accidentally. This new idea merited further investigation, so I worked back through the methods of accessing the program name to determine which, if any, met the new criteria.
I did not need to experiment with sys.argv to know it was mutable. The arguments are saved in a normal list object, and can be modified quite easily, as demonstrated here.
argv_modify.py
import sys

print 'Type :', type(sys.argv)
print 'Before:', sys.argv

sys.argv[0] = 'wrong'

print 'After :', sys.argv
All normal list operations are supported, so replacing the program name is a simple assignment statement. Because sys.argv is a list, it is also susceptible to having values removed by pop(), remove(), or a slice assignment gone awry.
$ python argv_modify.py
Type : <type 'list'>
Before: ['argv_modify.py']
After : ['wrong']
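If the only concern is accidental modification, one possible defense is to copy the arguments into a tuple as early as possible. The sketch below (Python 3 syntax) assumes that approach; ORIGINAL_ARGV and original_program_name are hypothetical names, and nothing stops other code from rebinding the module-level name, so this is not tamper-proof either.

```python
import sys

# Taken once, as early as possible; later edits to sys.argv do not reach it.
ORIGINAL_ARGV = tuple(sys.argv)


def original_program_name(snapshot=ORIGINAL_ARGV):
    """Return the first element of the startup snapshot, or None if it is empty."""
    return snapshot[0] if snapshot else None
```

A tuple rejects item assignment, pop(), and remove(), which covers the list operations mentioned above.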
The __file__ attribute of a module is a string, which is not itself mutable, but the contents can be replaced by assigning a new value to the attribute.
import_modify.py
import __main__

print 'Before:', __main__.__file__

__main__.__file__ = 'wrong'

print 'After :', __main__.__file__
This is less likely to happen by accident, so it seems somewhat safer. Nonetheless, changing it is easy.
$ python import_modify.py
Before: import_modify.py
After : wrong
That leaves the stack frame.
Down the Rabbit Hole
As described above, the return value of inspect.stack() is a list of tuples. The list is computed each time the function is called, so it is unlikely that one part of a program would accidentally modify it. The key word there is accidentally, but even a malicious program would have to go to a bit of effort to return fake stack data.
stack_modify1.py
import inspect


def faux_stack():
    full_stack = orig_stack()
    top = full_stack[-1]
    # Rebuild the last record with a bogus filename in position 1
    top = (top[0], 'wrong') + top[2:]
    full_stack[-1] = top
    return full_stack
# Keep a reference to the real function, then install the replacement
orig_stack = inspect.stack
inspect.stack = faux_stack


stack_data = inspect.stack()
script_name = stack_data[-1][1]
print 'From stack:', script_name
print 'From frame:', stack_data[-1][0].f_code.co_filename
The filename actually appears in two places in the data returned by inspect.stack(). The first location is in the tuple that is part of the list returned as the stack itself. The second is in the code object of the stack frame within that same tuple (frame.f_code.co_filename).
$ python stack_modify1.py
From stack: wrong
From frame: stack_modify1.py
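Because the same filename is stored twice, a cautious caller could compare the two copies and notice when only the tuple has been altered. The following sketch (Python 3 syntax) shows that comparison; record_matches_frame is a hypothetical helper, and it assumes the record has the frame in position 0 and the filename in position 1 as described earlier.

```python
import inspect


def record_matches_frame(record):
    """Check that a stack record's filename agrees with its frame's code object."""
    frame, filename = record[0], record[1]
    return filename == frame.f_code.co_filename
```

Applied to the last record returned by the patched inspect.stack() in stack_modify1.py, the check would report a mismatch, since the tuple says “wrong” while the code object still holds the real name.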
Replacing the filename in the tuple was relatively easy, and would be sufficient for code that trusted the stack contents returned by inspect.stack(). It turned out to be more challenging to change the code object. For CPython, the code class is implemented in C as part of the set of objects used internally by the interpreter.
Objects/codeobject.c
static PyMemberDef code_memberlist[] = {
    {"co_argcount", T_INT, OFF(co_argcount), READONLY},
    {"co_nlocals", T_INT, OFF(co_nlocals), READONLY},
    {"co_stacksize",T_INT, OFF(co_stacksize), READONLY},
    {"co_flags", T_INT, OFF(co_flags), READONLY},
    {"co_code", T_OBJECT, OFF(co_code), READONLY},
    {"co_consts", T_OBJECT, OFF(co_consts), READONLY},
    {"co_names", T_OBJECT, OFF(co_names), READONLY},
    {"co_varnames", T_OBJECT, OFF(co_varnames), READONLY},
    {"co_freevars", T_OBJECT, OFF(co_freevars), READONLY},
    {"co_cellvars", T_OBJECT, OFF(co_cellvars), READONLY},
    {"co_filename", T_OBJECT, OFF(co_filename), READONLY},
    {"co_name", T_OBJECT, OFF(co_name), READONLY},
    {"co_firstlineno", T_INT, OFF(co_firstlineno), READONLY},
    {"co_lnotab", T_OBJECT, OFF(co_lnotab), READONLY},
    {NULL} /* Sentinel */
};
The data members of a code object are all defined as READONLY, which means you cannot modify them from within Python code directly.
code_modify_fail.py
import inspect

stack_data = inspect.stack()
frame = stack_data[0][0]
code = frame.f_code

code.co_filename = 'wrong'
Attempting to change a read-only property causes a TypeError.
$ python code_modify_fail.py
Traceback (most recent call last):
File "code_modify_fail.py", line 8, in <module>
code.co_filename = 'wrong'
TypeError: readonly attribute
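The same experiment can be repeated on newer interpreters, with two differences worth noting: Python 3 reports the failure as an AttributeError instead of a TypeError, and Python 3.8 and later provide a replace() method on code objects that returns a modified copy. The sketch below assumes Python 3.8 or newer; both helper names are hypothetical. Even with such a copy in hand, the frame's f_code still refers to the original code object.

```python
def renamed_code(func, filename):
    """Return a copy of func's code object that reports a different filename."""
    # CodeType.replace() was added in Python 3.8; the original is untouched.
    return func.__code__.replace(co_filename=filename)


def try_direct_assignment(func, filename):
    """Attempt the in-place change and report the exception type name, if any."""
    try:
        func.__code__.co_filename = filename
    except (AttributeError, TypeError) as err:
        # Python 2.7 raises TypeError here, Python 3 raises AttributeError.
        return type(err).__name__
    return None
```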
Instead of changing the code object itself, I would have to replace it with another object. The reference to the code object is accessed through the frame object, so in order to insert my code object into the stack frame I would need to modify the frame. Frame objects are also immutable, however, so that meant creating a fake frame to replace the original value. Unfortunately, frame objects cannot be instantiated from within Python, and although a code object can be constructed through types.CodeType, a real code object is of little use without a frame to hold it, so I ended up having to create classes to mimic both originals.
stack_modify2.py
import collections
import inspect

# Define a namedtuple with the attributes of a stack frame
frame_attrs = ['f_back',
               'f_code',
               'f_builtins',
               'f_globals',
               'f_lasti',
               'f_lineno',
               ]
frame = collections.namedtuple('frame', ' '.join(frame_attrs))

# Define a namedtuple with the attributes of a code object
code_attrs = ['co_argcount',
              'co_nlocals',
              'co_stacksize',
              'co_flags',
              'co_code',
              'co_consts',
              'co_names',
              'co_varnames',
              'co_freevars',
              'co_cellvars',
              'co_filename',
              'co_name',
              'co_firstlineno',
              'co_lnotab',
              ]
code = collections.namedtuple('code', ' '.join(code_attrs))


def _make_fake_frame(original, filename):
    """Return a new fake frame object with the wrong filename."""
    new_c_attrs = dict((a, getattr(original.f_code, a))
                       for a in code_attrs)
    new_c_attrs['co_filename'] = filename
    new_c = code(**new_c_attrs)

    new_f_attrs = dict((a, getattr(original, a))
                       for a in frame_attrs)
    new_f_attrs['f_code'] = new_c
    new_f = frame(**new_f_attrs)
    return new_f


def faux_stack():
    full_stack = orig_stack()
    top = full_stack[-1]

    new_frame = _make_fake_frame(top[0], 'wrong')

    # Replace the top of the stack
    top = (new_frame, 'wrong') + top[2:]
    full_stack[-1] = top

    return full_stack
orig_stack = inspect.stack
inspect.stack = faux_stack


def show_app_name():
    stack_data = inspect.stack()
    script_name = stack_data[-1][1]
    print 'From stack:', script_name
    print 'From frame:', stack_data[-1][0].f_code.co_filename


if __name__ == '__main__':
    show_app_name()
I stole the idea of using namedtuple as a convenient way to have a class with named attributes but no real methods from inspect, which uses it to define a Traceback class.
$ python stack_modify2.py
From stack: wrong
From frame: wrong
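The core of the trick is small enough to try on its own. The sketch below, in Python 3 syntax, copies a deliberately trimmed set of attributes from a real code object into a namedtuple and swaps the filename; FakeCode and fake_code are hypothetical names, and the shortened attribute list is an assumption made to keep the example brief. The stand-in satisfies attribute access, but it is not an instance of the real code type, so a check such as inspect.iscode() rejects it.

```python
import collections
import inspect

# Hypothetical, trimmed-down attribute list; stack_modify2.py copies more.
FakeCode = collections.namedtuple('FakeCode', 'co_filename co_name co_firstlineno')


def fake_code(real_code, filename):
    """Copy selected attributes from a real code object, swapping the filename."""
    values = dict((name, getattr(real_code, name)) for name in FakeCode._fields)
    values['co_filename'] = filename
    return FakeCode(**values)
```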
Replacing the frame and code objects worked well for accessing the “code” object directly, but failed when I tried to use inspect.getframeinfo() because there is an explicit type check that raises a TypeError near the beginning of getframeinfo() (see the isframe() test in the listing below).
http://hg.python.org/cpython/file/35ef949e85d7/Lib/inspect.py#l987
def getframeinfo(frame, context=1):
    """Get information about a frame or traceback object.

    A tuple of five things is returned: the filename, the line number of
    the current line, the function name, a list of lines of context from
    the source code, and the index of the current line within that list.
    The optional second argument specifies the number of lines of context
    to return, which are centered around the current line."""
    if istraceback(frame):
        lineno = frame.tb_lineno
        frame = frame.tb_frame
    else:
        lineno = frame.f_lineno
    if not isframe(frame):
        raise TypeError('{!r} is not a frame or traceback object'.format(frame))

    filename = getsourcefile(frame) or getfile(frame)
    if context > 0:
        start = lineno - 1 - context//2
        try:
            lines, lnum = findsource(frame)
        except IOError:
            lines = index = None
        else:
            start = max(start, 1)
            start = max(0, min(start, len(lines) - context))
            lines = lines[start:start+context]
            index = lineno - 1 - start
    else:
        lines = index = None

    return Traceback(filename, lineno, frame.f_code.co_name, lines, index)
The solution was to replace getframeinfo() with a version that skips the check. Unfortunately, getframeinfo() uses getfile(), which performs a similar check, so that function needed to be replaced, too.
stack_modify3.py
import inspect

from stack_modify2 import show_app_name


def getframeinfo(frame, context=1):
    """Get information about a frame or traceback object.

    A tuple of five things is returned: the filename, the line number of
    the current line, the function name, a list of lines of context from
    the source code, and the index of the current line within that list.
    The optional second argument specifies the number of lines of context
    to return, which are centered around the current line."""
    if inspect.istraceback(frame):
        lineno = frame.tb_lineno
        frame = frame.tb_frame
    else:
        lineno = frame.f_lineno
    # if not isframe(frame):
    #     raise TypeError('{!r} is not a frame or traceback object'.format(frame))

    filename = inspect.getsourcefile(frame) or inspect.getfile(frame)
    if context > 0:
        start = lineno - 1 - context//2
        try:
            lines, lnum = inspect.findsource(frame)
        except IOError:
            lines = index = None
        else:
            start = max(start, 1)
            start = max(0, min(start, len(lines) - context))
            lines = lines[start:start+context]
            index = lineno - 1 - start
    else:
        lines = index = None

    return inspect.Traceback(filename, lineno, frame.f_code.co_name, lines, index)
inspect.getframeinfo = getframeinfo


def getfile(object):
    """Work out which source or compiled file an object was defined in."""
    if hasattr(object, 'f_code'):
        return object.f_code.co_filename
    return orig_getfile(object)
orig_getfile = inspect.getfile
inspect.getfile = getfile


if __name__ == '__main__':
    show_app_name()
    s = inspect.stack()
    print inspect.getframeinfo(s[-1][0])
Now the caller can use inspect.getframeinfo() (really my replacement function) and see the modified filename in the return value.
$ python stack_modify3.py
From stack: wrong
From frame: wrong
Traceback(filename='wrong', lineno=51, function='<module>',
code_context=None, index=None)
After reviewing inspect.py one more time to see if I needed to replace any other functions, I realized that a better solution was possible. The implementation of inspect.stack() is very small, since it calls inspect.getouterframes() to actually build the list of frames. The seed frame passed to getouterframes() comes from sys._getframe().
http://hg.python.org/cpython/file/35ef949e85d7/Lib/inspect.py#l1052
def stack(context=1):
    """Return a list of records for the stack above the caller's frame."""
    return getouterframes(sys._getframe(1), context)
The rest of the stack is derived from the first frame returned by _getframe() using the f_back attribute to link from one frame to the next.
http://hg.python.org/cpython/file/35ef949e85d7/Lib/inspect.py#l1025
def getouterframes(frame, context=1):
    """Get a list of records for a frame and all higher (calling) frames.

    Each record contains a frame object, filename, line number, function
    name, a list of lines of context, and index within the context."""
    framelist = []
    while frame:
        framelist.append((frame,) + getframeinfo(frame, context))
        frame = frame.f_back
    return framelist
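The f_back traversal can be tried directly, without going through inspect at all. The sketch below (Python 3 syntax) collects only function names to keep the output small; calling_function_names is a hypothetical helper, and sys._getframe() is a CPython implementation detail that other interpreters are not required to provide.

```python
import sys


def calling_function_names():
    """Return function names from the caller outward by following f_back."""
    frame = sys._getframe(1)  # CPython-specific: the frame of our caller
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back  # None once the outermost frame is reached
    return names
```

Called from inner() in a chain like the one in show_stack.py, the list starts with inner and outer, in the same caller-first order that inspect.stack() uses.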
If I modified getouterframes() instead of inspect.stack(), then I could ensure that my fake frame information was inserted at the beginning of the stack, and all of the rest of the inspect functions would honor it.
stack_modify4.py
import collections
import inspect

# Define a namedtuple with the attributes of a stack frame
frame_attrs = ['f_back',
               'f_code',
               'f_builtins',
               'f_globals',
               'f_lasti',
               'f_lineno',
               ]
frame = collections.namedtuple('frame', ' '.join(frame_attrs))

# Define a namedtuple with the attributes of a code object
code_attrs = ['co_argcount',
              'co_nlocals',
              'co_stacksize',
              'co_flags',
              'co_code',
              'co_consts',
              'co_names',
              'co_varnames',
              'co_freevars',
              'co_cellvars',
              'co_filename',
              'co_name',
              'co_firstlineno',
              'co_lnotab',
              ]
code = collections.namedtuple('code', ' '.join(code_attrs))


def _make_fake_frame(original, filename):
    """Return a new fake frame object with the wrong filename."""
    new_c_attrs = dict((a, getattr(original.f_code, a))
                       for a in code_attrs)
    new_c_attrs['co_filename'] = filename
    new_c = code(**new_c_attrs)

    new_f_attrs = dict((a, getattr(original, a))
                       for a in frame_attrs)
    new_f_attrs['f_code'] = new_c
    new_f = frame(**new_f_attrs)
    return new_f


def getouterframes(frame, context=1):
    """Get a list of records for a frame and all higher (calling) frames.

    Each record contains a frame object, filename, line number, function
    name, a list of lines of context, and index within the context."""
    framelist = []
    while frame:
        if not frame.f_back:
            # Replace the top of the stack with a fake entry
            frame = _make_fake_frame(frame, 'wrong')
        framelist.append((frame,) + getframeinfo(frame, context))
        frame = frame.f_back
    return framelist
inspect.getouterframes = getouterframes


def getframeinfo(frame, context=1):
    """Get information about a frame or traceback object.

    A tuple of five things is returned: the filename, the line number of
    the current line, the function name, a list of lines of context from
    the source code, and the index of the current line within that list.
    The optional second argument specifies the number of lines of context
    to return, which are centered around the current line."""
    if inspect.istraceback(frame):
        lineno = frame.tb_lineno
        frame = frame.tb_frame
    else:
        lineno = frame.f_lineno
    # if not isframe(frame):
    #     raise TypeError('{!r} is not a frame or traceback object'.format(frame))

    filename = inspect.getsourcefile(frame) or inspect.getfile(frame)
    if context > 0:
        start = lineno - 1 - context // 2
        try:
            lines, lnum = inspect.findsource(frame)
        except IOError:
            lines = index = None
        else:
            start = max(start, 1)
            start = max(0, min(start, len(lines) - context))
            lines = lines[start:start + context]
            index = lineno - 1 - start
    else:
        lines = index = None

    return inspect.Traceback(filename, lineno, frame.f_code.co_name, lines, index)
inspect.getframeinfo = getframeinfo


def getfile(object):
    """Work out which source or compiled file an object was defined in."""
    if isinstance(object, frame):
        return object.f_code.co_filename
    return orig_getfile(object)
orig_getfile = inspect.getfile
inspect.getfile = getfile


def show_app_name():
    stack_data = inspect.stack()
    #print [(s[1], s[0].__class__, s[0].f_code.co_name) for s in stack_data]
    print 'From stack :', stack_data[-1][1]
    print 'From code in frame:', stack_data[-1][0].f_code.co_filename
    print 'From frame info :', inspect.getframeinfo(stack_data[-1][0]).filename


if __name__ == '__main__':
    show_app_name()
The customized versions of getframeinfo() and getfile() are still required to avoid exceptions caused by the type checking.
$ python stack_modify4.py
From stack : wrong
From code in frame: wrong
From frame info : wrong
Enough of That
At this point I have proven to myself that while it is unlikely that anyone would bother to do it in a real program (and they would certainly not do it by accident) it is possible to intercept the introspection calls and insert bogus information to mislead a program trying to discover information about itself. This implementation does not work to subvert pdb, because pdb does not use inspect. Probably because it predates inspect, pdb has its own implementation of a stack building function, which could be replaced using the same technique used above.
This investigation led me to several conclusions. First, I still don’t know why the original code is looking at the stack to discover the program name. I should ask on the OpenStack mailing list, but in the meantime I had fun experimenting while researching the question. Second, given that looking at __main__.__file__ produces a value at least as correct as looking at the stack in all cases, and more correct when a program is launched using the -m flag, it seems like the solution with the best combination of reliability and simplicity. A patch may be in order. And finally, monkey-patching can drive you to excesses, madness, or both.
See also
- OpenStack Nova Source
- StackOverflow – In Python, how do I get the path and name of the file that is currently executing?
- sys module
- inspect module
- pdb module
- PyMOTW – sys
- PyMOTW – inspect
- PyMOTW – pdb
Updates
8 May – Updated styles around embedded source files and added a label for each.