Finding the name of the program from which a Python module is
running can be trickier than it would seem at first, and
investigating the reasons led to some interesting experiments.
A couple of weeks ago at the OpenStack Folsom Summit, Mark McClain
pointed out an interesting code snippet he had discovered in the Nova
sources:
nova/utils.py: 339
script_dir = os.path.dirname(inspect.stack()[-1][1])
The code is part of the logic to find a configuration file that lives
in a directory relative to where the application startup script is
located. It looks at the call stack to find the main program, and
picks the filename out of the stack details.
The code seems to be taken from a response to a StackOverflow
question, and when I saw it I thought it looked like a case of
someone going to more trouble than was needed to get the
information. Mark had a similar reaction, and asked if I knew of a
simpler way to determine the program name.
Similar examples with inspect.stack() appear in four places in
the Nova source code (at least as of today). All of them are either
building filenames relative to the location of the original “main”
program, or are returning the name of that program to be used to build
a path to another file (such as a log file or other program). Those
are all good reasons to be careful about the location and name of the
main program, but none explain why the obvious solution isn’t good
enough. I assumed that if the OpenStack developers were looking at
stack frames there must have been a reason. I decided to examine the
original code and spend a little time deciphering what it is doing,
and especially to see if there were cases where it did not work as
desired (so I could justify a patch).
The Stack
The call to inspect.stack() retrieves the Python interpreter
stack frames for the current thread. The return value is a list with
information about the calling function in position 0 and the “top”
of the stack at the end of the list. Each item in the list is a tuple
containing:
- the stack frame data structure
- the filename for the code being run in that frame
- the line number within the file
- the co_name member of the code object from the frame,
giving the function or method name being executed
- the source lines for that bit of code, when available
- an index into the list of source lines showing the actual source
line for the frame
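In Python 3.5 and later, inspect.stack() returns inspect.FrameInfo named tuples. Their fields (frame, filename, lineno, function, code_context, index) follow the order of the list above, so the same data can also be read by name. The sketch below is a minimal Python 3 example; the listings in this post are Python 2. The names describe_stack(), inner() and outer() are hypothetical and chosen to echo show_stack.py.

```python
import inspect


def describe_stack():
    """Return (filename, lineno, function) for each frame, innermost first."""
    stack = inspect.stack()
    records = []
    for info in stack:
        # info.filename is the same value as info[1]; info.function as info[3].
        records.append((info.filename, info.lineno, info.function))
    return records


def inner():
    return describe_stack()


def outer():
    return inner()
```

Calling outer() returns records for describe_stack, inner and outer first, followed by whatever frames called outer().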
show_stack.py
|
#!/usr/bin/env python

import inspect

def show_stack():
    stack = inspect.stack()
    for s in stack:
        print 'filename:', s[1]
        print 'line :', s[2]
        print 'co_name :', s[3]
        print


def inner():
    show_stack()


def outer():
    inner()


if __name__ == '__main__':
    outer()
|
The information is intended to be used for generating tracebacks or by
tools like pdb when debugging an application (although
pdb has its own implementation). To answer the question “Which
program am I running in?” the filename is the most interesting piece
of data.
|
$ python show_stack.py
filename: show_stack.py
line : 6
co_name : show_stack
filename: show_stack.py
line : 15
co_name : inner
filename: show_stack.py
line : 19
co_name : outer
filename: show_stack.py
line : 23
co_name : <module>
|
One obvious issue with these results is that the filename in the stack
frame is relative to the startup directory of the application. That
could lead to an incorrect path if the process changes its working
directory between startup and checking the stack. But there is
another mode where looking at the top of the stack produces completely
invalid results.
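Any relative name shows the working-directory problem. This Python 3 sketch uses a hypothetical helper, resolve_before_and_after_chdir(). It resolves a relative name, changes into a temporary directory, resolves the same name again, and then restores the original directory.

```python
import os
import tempfile


def resolve_before_and_after_chdir(relative_name):
    """Resolve a relative path, change directory, then resolve it again."""
    original_cwd = os.getcwd()
    before = os.path.abspath(relative_name)  # resolved at "startup"
    with tempfile.TemporaryDirectory() as elsewhere:
        os.chdir(elsewhere)
        try:
            after = os.path.abspath(relative_name)  # resolved after chdir
        finally:
            os.chdir(original_cwd)  # put the process back where it was
    return before, after
```

The two results name different files. A relative filename taken from the stack therefore needs to pass through os.path.abspath() before anything calls os.chdir().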
|
$ python -m show_stack
filename: /Users/dhellmann/.../show_stack.py
line : 6
co_name : show_stack
filename: /Users/dhellmann/.../show_stack.py
line : 15
co_name : inner
filename: /Users/dhellmann/.../show_stack.py
line : 19
co_name : outer
filename: /Users/dhellmann/.../show_stack.py
line : 23
co_name : <module>
filename: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py
line : 72
co_name : _run_code
filename: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py
line : 162
co_name : _run_module_as_main
|
The -m option to the interpreter triggers the runpy module,
which takes the module name specified and executes it like a main
program. As the stack printout above illustrates, runpy is then at
the top of the stack, so the “main” part of our local module is
several levels down from the top. That means the simple one-liner is
not always going to produce the right results.
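The effect of -m can be reproduced without the files from this post. The Python 3 sketch below writes a throwaway probe module into a temporary directory; stack_probe.py and outermost_filename() are hypothetical names. It runs the probe with the current interpreter, once as a script and once with -m, and returns the filename of the outermost stack record. Recent Python 3 releases may report runpy as a frozen module rather than a runpy.py path, so the check below only looks for "runpy" in the result.

```python
import os
import subprocess
import sys
import tempfile

# Throwaway module that prints the filename of the outermost stack record.
PROBE_SOURCE = "import inspect\nprint(inspect.stack()[-1][1])\n"


def outermost_filename(use_dash_m):
    """Run the probe as a script or with -m and return what it printed."""
    with tempfile.TemporaryDirectory() as workdir:
        with open(os.path.join(workdir, "stack_probe.py"), "w") as f:
            f.write(PROBE_SOURCE)
        if use_dash_m:
            cmd = [sys.executable, "-m", "stack_probe"]
        else:
            cmd = [sys.executable, "stack_probe.py"]
        # PYTHONPATH keeps the module importable for -m even when the
        # interpreter is configured not to add the current directory.
        env = dict(os.environ, PYTHONPATH=workdir)
        result = subprocess.run(cmd, cwd=workdir, env=env, check=True,
                                capture_output=True, text=True)
    return result.stdout.strip()
```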
Why the Obvious Solution Fails
Now that I knew there were ways to get the wrong results by looking at
the stack, the next question was whether there was another way to find
the program name that was simpler, more efficient, and especially more
correct. The simplest solution is to look at the command line
arguments passed through sys.argv.
argv.py
|
#!/usr/bin/env python
import sys
print sys.argv[0]
|
Normally, the first element in sys.argv is the script that was run
as the main program. The value always points to the same file,
although the method of invoking it may cause the value to fluctuate
between a relative and full path.
|
$ python argv.py
argv.py
$ ./argv.py
./argv.py
$ python -m argv
/Users/dhellmann/.../argv.py
|
As this example demonstrates, when a script is run directly or passed
as an argument to the interpreter, sys.argv contains a relative
path to the script file. Using -m we see the full path, so for
that case the command line arguments are more reliable than the
stack. However, we cannot depend on -m being used, so we aren’t
guaranteed to get the full path.
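Because sys.argv[0] may be relative, one defensive option is to resolve it once at startup, before anything changes the working directory. Here is a minimal Python 3 sketch. program_path_from_argv() is a hypothetical helper, and its argv parameter exists only so it can be exercised without a real command line.

```python
import os
import sys


def program_path_from_argv(argv=None):
    """Return sys.argv[0] as an absolute path, or None if it is empty."""
    if argv is None:
        argv = sys.argv
    if not argv or not argv[0]:
        # The interactive interpreter leaves argv[0] as an empty string.
        return None
    return os.path.abspath(argv[0])
```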
Using import
The next alternative I considered was probing the main program module
myself. Every module has a special property, __file__, which holds
the path to the file from which the module was loaded. To access the
main program module from within Python, you import a specially named
module __main__. To test this method, I created a main program
that loads another module:
import_main_app.py
|
import import_main
import_main.main()
|
The second module imports __main__ and prints the file it was
loaded from.
import_main.py
|
import __main__


def main():
    print __main__.__file__


if __name__ == '__main__':
    main()
|
Looking at the __main__ module always pointed to the actual main
program module, but it did not always produce a full path. This makes
sense, because the filename for a module that goes into the stack
frame comes from the module itself.
|
$ python import_main.py
import_main.py
$ python -m import_main
/Users/dhellmann/.../import_main.py
|
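The next step is a helper that prefers __main__.__file__ and converts the result to an absolute path. The following is a Python 3 sketch of that idea. main_program_path() is a hypothetical name, and the main_module parameter exists only so the function can be tested without a real script.

```python
import os
import sys


def main_program_path(main_module=None):
    """Return the absolute path of the __main__ module's file, or None."""
    if main_module is None:
        main_module = sys.modules["__main__"]
    filename = getattr(main_module, "__file__", None)
    if filename is None:
        # The interactive interpreter and "python -c" leave __file__ unset.
        return None
    return os.path.abspath(filename)
```

Python 3.9 and later already store an absolute path in __main__.__file__ for a script named on the command line. The abspath() call therefore matters mostly on older interpreters.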
Wandering Down the Garden Path
After I found such a simple way to reliably retrieve the program name,
I spent a while thinking about the motivation of the person who
decided that looking at stack frames was the best solution. I came up
with two hypotheses. First, it is entirely possible that they did not
know about importing __main__. It isn’t the sort of thing one
needs to do very often, and I don’t even remember where I learned
about doing it (or why, because I’m pretty sure I’ve never used the
feature in production code for any reason). That seems like the most
plausible reason, but the other idea I had was that for some reason it
was very important to have a relatively tamper-proof value –
something that could not be overwritten accidentally. This new idea
merited further investigation, so I worked back through the methods of
accessing the program name to determine which, if any, met the new
criteria.
I did not need to experiment with sys.argv to know it was
mutable. The arguments are saved in a normal list object, and
can be modified quite easily, as demonstrated here.
argv_modify.py
|
import sys
print 'Type :', type(sys.argv)
print 'Before:', sys.argv
sys.argv[0] = 'wrong'
print 'After :', sys.argv
|
All normal list operations are supported, so replacing the program
name is a simple assignment statement. Because sys.argv is a list,
it is also susceptible to having values removed by pop(),
remove(), or a slice assignment gone awry.
$ python argv_modify.py
Type : <type 'list'>
Before: ['argv_modify.py']
After : ['wrong']
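If the concern is an accidental edit to sys.argv, one option is to copy it into an immutable tuple at startup. This is a hypothetical Python 3 sketch. It only guards against in-place edits: the name that holds the tuple can still be rebound, just as __main__.__file__ can.

```python
import sys


def snapshot_argv(argv=None):
    """Return an immutable copy of the argument list (hypothetical helper)."""
    if argv is None:
        argv = sys.argv
    # tuple() copies the references; the strings themselves are immutable.
    return tuple(argv)
```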
The __file__ attribute of a module is a string, which is not
itself mutable, but the attribute can be rebound to a different
string with an ordinary assignment.
import_modify.py
|
import __main__
print 'Before:', __main__.__file__
__main__.__file__ = 'wrong'
print 'After :', __main__.__file__
|
This is less likely to happen by accident, so it seems somewhat
safer. Nonetheless, changing it is easy.
$ python import_modify.py
Before: import_modify.py
After : wrong
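Rebinding __file__ changes what code sees when it reads that attribute. It does not change the filename the compiler recorded in the module's code objects, and stack frames take their filename from those code objects. The Python 3 sketch below builds a synthetic module with compile() and exec() to show the difference; build_module() and where() are hypothetical names.

```python
import types


def build_module(filename):
    """Compile a tiny module whose code objects record `filename`."""
    module = types.ModuleType("demo_module")
    module.__file__ = filename
    source = "def where():\n    return __file__\n"
    # compile() stores `filename` as co_filename in every code object it makes.
    exec(compile(source, filename, "exec"), module.__dict__)
    return module
```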
That leaves the stack frame.
Down the Rabbit Hole
As described above, the return value of inspect.stack() is a
list of tuples. The list is computed each time the function is called,
so it was unlikely that one part of a program would accidentally
modify it. The key word there is accidentally, but even a malicious
program would have to go to a bit of effort to return fake stack data.
stack_modify1.py
|
import inspect


def faux_stack():
    full_stack = orig_stack()
    top = full_stack[-1]
    # Swap only the filename slot of the outermost record
    top = (top[0], 'wrong') + top[2:]
    full_stack[-1] = top
    return full_stack

orig_stack = inspect.stack
inspect.stack = faux_stack

stack_data = inspect.stack()
script_name = stack_data[-1][1]
print 'From stack:', script_name
print 'From frame:', stack_data[-1][0].f_code.co_filename
|
The filename actually appears in two places in the data returned by
inspect.stack(). The first location is in the tuple that is part
of the list returned as the stack itself. The second is in the code
object of the stack frame within that same tuple
(frame.f_code.co_filename).
$ python stack_modify1.py
From stack: wrong
From frame: stack_modify1.py
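The same split is visible in Python 3, where stack records are inspect.FrameInfo named tuples. This sketch rebuilds only the outermost record as a plain tuple, as stack_modify1.py does, but without patching inspect.stack(). stack_with_fake_top() is a hypothetical name.

```python
import inspect


def stack_with_fake_top(filename="wrong"):
    """Return inspect.stack() with only the outermost record's filename faked."""
    full_stack = inspect.stack()
    top = full_stack[-1]
    # Keep the real frame object but substitute the filename field.
    full_stack[-1] = (top[0], filename) + tuple(top[2:])
    return full_stack
```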
Replacing the filename in the tuple was relatively easy, and would be
sufficient for code that trusted the stack contents returned by
inspect.stack(). It turned out to be more challenging to change
the code object. In CPython, the code class is implemented
in C as part of the set of objects used internally by the interpreter.
Objects/codeobject.c
|
static PyMemberDef code_memberlist[] = {
    {"co_argcount",     T_INT,          OFF(co_argcount),       READONLY},
    {"co_nlocals",      T_INT,          OFF(co_nlocals),        READONLY},
    {"co_stacksize",    T_INT,          OFF(co_stacksize),      READONLY},
    {"co_flags",        T_INT,          OFF(co_flags),          READONLY},
    {"co_code",         T_OBJECT,       OFF(co_code),           READONLY},
    {"co_consts",       T_OBJECT,       OFF(co_consts),         READONLY},
    {"co_names",        T_OBJECT,       OFF(co_names),          READONLY},
    {"co_varnames",     T_OBJECT,       OFF(co_varnames),       READONLY},
    {"co_freevars",     T_OBJECT,       OFF(co_freevars),       READONLY},
    {"co_cellvars",     T_OBJECT,       OFF(co_cellvars),       READONLY},
    {"co_filename",     T_OBJECT,       OFF(co_filename),       READONLY},
    {"co_name",         T_OBJECT,       OFF(co_name),           READONLY},
    {"co_firstlineno",  T_INT,          OFF(co_firstlineno),    READONLY},
    {"co_lnotab",       T_OBJECT,       OFF(co_lnotab),         READONLY},
    {NULL}      /* Sentinel */
};
|
The data members of a code object are all defined as READONLY,
which means you cannot modify them from within Python code directly.
code_modify_fail.py
|
import inspect
stack_data = inspect.stack()
frame = stack_data[0][0]
code = frame.f_code
code.co_filename = 'wrong'
|
Attempting to change a read-only property causes a TypeError.
|
$ python code_modify_fail.py
Traceback (most recent call last):
File "code_modify_fail.py", line 8, in <module>
code.co_filename = 'wrong'
TypeError: readonly attribute
|
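Under Python 3 the same assignment still fails, but the exception is AttributeError instead of the TypeError shown above. The small sketch below uses try_to_rename(), a hypothetical helper that reports which exception the assignment raises.

```python
def try_to_rename(code, new_filename):
    """Attempt the assignment from code_modify_fail.py and name the error."""
    try:
        code.co_filename = new_filename
    except (AttributeError, TypeError) as exc:
        return type(exc).__name__
    return None  # the assignment unexpectedly succeeded
```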
Instead of changing the code object itself, I would have to replace it
with another object. The reference to the code object is accessed
through the frame object, so in order to insert my code object into
the stack frame I would need to modify the frame. The f_code
attribute of a frame is also read-only, however, so that meant
creating a fake frame to replace the original value. A new code object
can be built by calling types.CodeType with every field spelled out,
but there is no way to instantiate a frame object from within Python,
so I ended up having to create classes to mimic the originals.
stack_modify2.py
|
import collections
import inspect

# Define a namedtuple with the attributes of a stack frame
frame_attrs = ['f_back',
               'f_code',
               'f_builtins',
               'f_globals',
               'f_lasti',
               'f_lineno',
               ]
frame = collections.namedtuple('frame', ' '.join(frame_attrs))

# Define a namedtuple with the attributes of a code object
code_attrs = ['co_argcount',
              'co_nlocals',
              'co_stacksize',
              'co_flags',
              'co_code',
              'co_consts',
              'co_names',
              'co_varnames',
              'co_freevars',
              'co_cellvars',
              'co_filename',
              'co_name',
              'co_firstlineno',
              'co_lnotab',
              ]
code = collections.namedtuple('code', ' '.join(code_attrs))


def _make_fake_frame(original, filename):
    """Return a new fake frame object with the wrong filename."""
    new_c_attrs = dict((a, getattr(original.f_code, a))
                       for a in code_attrs)
    new_c_attrs['co_filename'] = filename
    new_c = code(**new_c_attrs)

    new_f_attrs = dict((a, getattr(original, a))
                       for a in frame_attrs)
    new_f_attrs['f_code'] = new_c
    new_f = frame(**new_f_attrs)
    return new_f


def faux_stack():
    full_stack = orig_stack()
    top = full_stack[-1]

    new_frame = _make_fake_frame(top[0], 'wrong')

    # Replace the top of the stack
    top = (new_frame, 'wrong') + top[2:]

    full_stack[-1] = top
    return full_stack

orig_stack = inspect.stack
inspect.stack = faux_stack


def show_app_name():
    stack_data = inspect.stack()
    script_name = stack_data[-1][1]
    print 'From stack:', script_name
    print 'From frame:', stack_data[-1][0].f_code.co_filename


if __name__ == '__main__':
    show_app_name()
|
I stole the idea of using namedtuple from inspect, which uses it
to define a Traceback class. It is a convenient way to have a class
with named attributes but no real methods.
$ python stack_modify2.py
From stack: wrong
From frame: wrong
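On Python 3.8 and later, CodeType.replace() returns a copy of a code object with selected fields changed, which simplifies the code-object side of the problem. The frame side is unchanged: frames still cannot be instantiated and f_code is still read-only, so stand-ins like the namedtuples above are still needed. The Python 3 sketch below checks all three points. The helper names are hypothetical, and sys._getframe() is CPython-specific.

```python
import sys
import types


def copy_code_with_filename(code, filename):
    """Return a copy of `code` that reports a different co_filename."""
    return code.replace(co_filename=filename)  # Python 3.8+


def frames_can_be_created():
    """Report whether Python code can instantiate a frame object."""
    try:
        types.FrameType()
    except TypeError:
        return False
    return True


def frame_code_is_writable():
    """Report whether a live frame's f_code attribute can be reassigned."""
    frame = sys._getframe()
    try:
        frame.f_code = frame.f_code
    except AttributeError:
        return False
    return True
```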
Replacing the frame and code objects worked well for code that reads
the “code” object’s attributes directly, but failed when I tried to
use inspect.getframeinfo(), because getframeinfo() begins with an
explicit isframe() check that raises TypeError for anything that is
not a real frame or traceback.
http://hg.python.org/cpython/file/35ef949e85d7/Lib/inspect.py#l987
|
def getframeinfo(frame, context=1):
    """Get information about a frame or traceback object.

    A tuple of five things is returned: the filename, the line number of
    the current line, the function name, a list of lines of context from
    the source code, and the index of the current line within that list.
    The optional second argument specifies the number of lines of context
    to return, which are centered around the current line."""
    if istraceback(frame):
        lineno = frame.tb_lineno
        frame = frame.tb_frame
    else:
        lineno = frame.f_lineno
    if not isframe(frame):
        raise TypeError('{!r} is not a frame or traceback object'.format(frame))

    filename = getsourcefile(frame) or getfile(frame)
    if context > 0:
        start = lineno - 1 - context//2
        try:
            lines, lnum = findsource(frame)
        except IOError:
            lines = index = None
        else:
            start = max(start, 1)
            start = max(0, min(start, len(lines) - context))
            lines = lines[start:start+context]
            index = lineno - 1 - start
    else:
        lines = index = None

    return Traceback(filename, lineno, frame.f_code.co_name, lines, index)
|
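The isframe() check shown above, and a similar check in getfile(), can be triggered in Python 3 as well. The stand-in in this sketch carries a genuine code object copied with CodeType.replace() (Python 3.8+), so that the type checks, rather than missing attributes, cause the failures. fake_frame() and rejection_reasons() are hypothetical names.

```python
import inspect
import types


def fake_frame(filename="wrong"):
    """Build a frame-like object whose code object reports `filename`."""
    real_code = fake_frame.__code__.replace(co_filename=filename)
    # f_lasti=0 points at the first instruction of the borrowed code object.
    return types.SimpleNamespace(f_code=real_code, f_lineno=1, f_lasti=0,
                                 f_back=None)


def rejection_reasons(obj):
    """Return the names of the inspect functions that raise TypeError for obj."""
    rejected = []
    for func in (inspect.getframeinfo, inspect.getfile):
        try:
            func(obj)
        except TypeError:
            rejected.append(func.__name__)
    return rejected
```

Duck typing is enough for code that only reads attributes: fake_frame().f_code.co_filename is "wrong". Both inspect functions, however, refuse the object outright.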
The solution was to replace getframeinfo() with a version that
skips the check. Unfortunately, getframeinfo() uses
getfile(), which performs a similar check, so that function
needed to be replaced, too.
stack_modify3.py
|
import inspect

# Importing stack_modify2 also installs its replacement inspect.stack().
from stack_modify2 import show_app_name


def getframeinfo(frame, context=1):
    """Get information about a frame or traceback object.

    A tuple of five things is returned: the filename, the line number of
    the current line, the function name, a list of lines of context from
    the source code, and the index of the current line within that list.
    The optional second argument specifies the number of lines of context
    to return, which are centered around the current line."""
    if inspect.istraceback(frame):
        lineno = frame.tb_lineno
        frame = frame.tb_frame
    else:
        lineno = frame.f_lineno
    # if not isframe(frame):
    #     raise TypeError('{!r} is not a frame or traceback object'.format(frame))

    filename = inspect.getsourcefile(frame) or inspect.getfile(frame)
    if context > 0:
        start = lineno - 1 - context//2
        try:
            lines, lnum = inspect.findsource(frame)
        except IOError:
            lines = index = None
        else:
            start = max(start, 1)
            start = max(0, min(start, len(lines) - context))
            lines = lines[start:start+context]
            index = lineno - 1 - start
    else:
        lines = index = None

    return inspect.Traceback(filename, lineno, frame.f_code.co_name, lines, index)

inspect.getframeinfo = getframeinfo


def getfile(object):
    """Work out which source or compiled file an object was defined in."""
    # Anything with f_code (real or fake frame) reports its code's filename
    # instead of going through the original type checks.
    if hasattr(object, 'f_code'):
        return object.f_code.co_filename
    return orig_getfile(object)

orig_getfile = inspect.getfile
inspect.getfile = getfile


if __name__ == '__main__':
    show_app_name()
    s = inspect.stack()
    print inspect.getframeinfo(s[-1][0])
|
Now the caller can use inspect.getframeinfo() (really my
replacement function) and see the modified filename in the return
value.
|
$ python stack_modify3.py
From stack: wrong
From frame: wrong
Traceback(filename='wrong', lineno=51, function='<module>',
code_context=None, index=None)
|
After reviewing inspect.py one more time to see if I needed to
replace any other functions, I realized that a better solution was
possible. The implementation of inspect.stack() is very small,
since it calls inspect.getouterframes() to actually build the
list of frames. The seed frame passed to getouterframes() comes
from sys._getframe().
The rest of the stack is derived from the first frame returned by
_getframe() using the f_back attribute to link from one
frame to the next.
http://hg.python.org/cpython/file/35ef949e85d7/Lib/inspect.py#l1025
|
def getouterframes(frame, context=1):
    """Get a list of records for a frame and all higher (calling) frames.

    Each record contains a frame object, filename, line number, function
    name, a list of lines of context, and index within the context."""
    framelist = []
    while frame:
        framelist.append((frame,) + getframeinfo(frame, context))
        frame = frame.f_back
    return framelist
|
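The f_back chain that getouterframes() follows can also be walked by hand. Here is a minimal Python 3 sketch. sys._getframe() is CPython-specific, and outer_frame_names(), inner() and outer() are hypothetical names.

```python
import sys


def outer_frame_names():
    """Return the co_name of each frame from the caller outward."""
    names = []
    frame = sys._getframe(1)  # the caller's frame (CPython-specific API)
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back  # step toward the outermost frame
    return names


def inner():
    return outer_frame_names()


def outer():
    return inner()
```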
If I modified getouterframes() instead of inspect.stack(),
then I could ensure that my fake frame was substituted for the
outermost frame (the “top” of the stack, at the end of the list) as
the list was built, and any inspect function that relies on
getouterframes() would honor it.
stack_modify4.py
|
import collections
import inspect

# Define a namedtuple with the attributes of a stack frame
frame_attrs = ['f_back',
               'f_code',
               'f_builtins',
               'f_globals',
               'f_lasti',
               'f_lineno',
               ]
frame = collections.namedtuple('frame', ' '.join(frame_attrs))

# Define a namedtuple with the attributes of a code object
code_attrs = ['co_argcount',
              'co_nlocals',
              'co_stacksize',
              'co_flags',
              'co_code',
              'co_consts',
              'co_names',
              'co_varnames',
              'co_freevars',
              'co_cellvars',
              'co_filename',
              'co_name',
              'co_firstlineno',
              'co_lnotab',
              ]
code = collections.namedtuple('code', ' '.join(code_attrs))


def _make_fake_frame(original, filename):
    """Return a new fake frame object with the wrong filename."""
    new_c_attrs = dict((a, getattr(original.f_code, a))
                       for a in code_attrs)
    new_c_attrs['co_filename'] = filename
    new_c = code(**new_c_attrs)

    new_f_attrs = dict((a, getattr(original, a))
                       for a in frame_attrs)
    new_f_attrs['f_code'] = new_c
    new_f = frame(**new_f_attrs)
    return new_f


def getouterframes(frame, context=1):
    """Get a list of records for a frame and all higher (calling) frames.

    Each record contains a frame object, filename, line number, function
    name, a list of lines of context, and index within the context."""
    framelist = []
    while frame:
        if not frame.f_back:
            # Replace the top of the stack with a fake entry
            frame = _make_fake_frame(frame, 'wrong')
        framelist.append((frame,) + getframeinfo(frame, context))
        frame = frame.f_back
    return framelist

inspect.getouterframes = getouterframes


def getframeinfo(frame, context=1):
    """Get information about a frame or traceback object.

    A tuple of five things is returned: the filename, the line number of
    the current line, the function name, a list of lines of context from
    the source code, and the index of the current line within that list.
    The optional second argument specifies the number of lines of context
    to return, which are centered around the current line."""
    if inspect.istraceback(frame):
        lineno = frame.tb_lineno
        frame = frame.tb_frame
    else:
        lineno = frame.f_lineno
    # if not isframe(frame):
    #     raise TypeError('{!r} is not a frame or traceback object'.format(frame))

    filename = inspect.getsourcefile(frame) or inspect.getfile(frame)
    if context > 0:
        start = lineno - 1 - context // 2
        try:
            lines, lnum = inspect.findsource(frame)
        except IOError:
            lines = index = None
        else:
            start = max(start, 1)
            start = max(0, min(start, len(lines) - context))
            lines = lines[start:start + context]
            index = lineno - 1 - start
    else:
        lines = index = None

    return inspect.Traceback(filename, lineno, frame.f_code.co_name, lines, index)

inspect.getframeinfo = getframeinfo


def getfile(object):
    """Work out which source or compiled file an object was defined in."""
    # Only the fake frame namedtuple needs special handling.
    if isinstance(object, frame):
        return object.f_code.co_filename
    return orig_getfile(object)

orig_getfile = inspect.getfile
inspect.getfile = getfile


def show_app_name():
    stack_data = inspect.stack()
    #print [(s[1], s[0].__class__, s[0].f_code.co_name) for s in stack_data]
    print 'From stack :', stack_data[-1][1]
    print 'From code in frame:', stack_data[-1][0].f_code.co_filename
    print 'From frame info :', inspect.getframeinfo(stack_data[-1][0]).filename


if __name__ == '__main__':
    show_app_name()
|
The customized versions of getframeinfo() and getfile()
are still required to avoid exceptions caused by the type checking.
$ python stack_modify4.py
From stack : wrong
From code in frame: wrong
From frame info : wrong
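For comparison, here is the getouterframes() interception reduced to a Python 3 sketch that undoes itself when finished. It relies on inspect.stack() looking up getouterframes() in the inspect module's namespace at call time, which is an implementation detail of CPython's inspect module. Unlike stack_modify4.py, it fakes only the filename slot of the outermost record and leaves the frame object alone, so frame.f_code.co_filename still tells the truth. fake_outermost_filename() is a hypothetical name.

```python
import contextlib
import inspect


@contextlib.contextmanager
def fake_outermost_filename(filename="wrong"):
    """Temporarily make inspect.stack() report `filename` for the outermost frame."""
    original = inspect.getouterframes

    def patched(frame, context=1):
        records = original(frame, context)
        top = records[-1]
        # Swap only the filename slot; the real frame object is kept.
        records[-1] = (top[0], filename) + tuple(top[2:])
        return records

    inspect.getouterframes = patched
    try:
        yield
    finally:
        # Restore the original function even if the body raises.
        inspect.getouterframes = original
```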
Enough of That
At this point I have proven to myself that while it is unlikely that
anyone would bother to do it in a real program (and they would
certainly not do it by accident) it is possible to intercept the
introspection calls and insert bogus information to mislead a program
trying to discover information about itself. This implementation does
not subvert pdb, because pdb does not use inspect. Probably because it
predates inspect, pdb has its own stack-building function (the
get_stack() method it inherits from bdb.Bdb), which could be replaced
using the same technique.
This investigation led me to several conclusions. First, I still don’t
know why the original code is looking at the stack to discover the
program name. I should ask on the OpenStack mailing list, but in the
meantime I had fun experimenting while researching the question.
Second, given that looking at __main__.__file__ produces a value
at least as correct as looking at the stack in all cases, and more
correct when a program is launched using the -m flag, it seems
like the solution with the best combination of reliability and
simplicity. A patch may be in order. And finally, monkey-patching can
drive you to excesses, madness, or both.
Updates
8 May – Updated styles around embedded source files and added a label
for each.