PyMOTW: tempfile

Securely generate temporary files and directories with the tempfile module.

Module: tempfile
Purpose: Create temporary filesystem resources.
Python Version: Since 1.4 with major security revisions in 2.3

Description:

Many programs need to create files to hold intermediate data. Creating files with unique names securely, so that the names cannot be guessed by someone trying to break the application, is challenging. The tempfile module provides several functions for creating filesystem resources securely. TemporaryFile() opens and returns an unnamed file, NamedTemporaryFile() opens and returns a named file, and mkdtemp() creates a temporary directory and returns its name.

TemporaryFile:

If your application needs a temporary file to store data, but does not need to share that file with other programs, the best option for creating the file is the TemporaryFile() function. It creates a file, and on platforms where it is possible, unlinks it immediately. This makes it impossible for another program to find or open the file, since there is no reference to it in the filesystem table. The file created by TemporaryFile() is removed automatically when it is closed.

import os
import tempfile

print 'Building a file name yourself:'
filename = '/tmp/guess_my_name.%s.txt' % os.getpid()
temp = open(filename, 'w+b')
try:
    print 'temp:', temp
    print 'temp.name:', temp.name
finally:
    temp.close()
    # Clean up the temporary file yourself
    os.remove(filename)

print
print 'TemporaryFile:'
temp = tempfile.TemporaryFile()
try:
    print 'temp:', temp
    print 'temp.name:', temp.name
finally:
    # Automatically cleans up the file
    temp.close()


This example illustrates the difference in creating a temporary file using a common pattern for making up a name, versus using the TemporaryFile() function. Notice that the file returned by TemporaryFile has no name.


$ python tempfile_TemporaryFile.py
Building a file name yourself:
temp: <open file '/tmp/guess_my_name.7297.txt', mode 'w+b' at 0x...>
temp.name: /tmp/guess_my_name.7297.txt

TemporaryFile:
temp: <open file '<fdopen>', mode 'w+b' at 0x5c410>
temp.name: <fdopen>


By default, the file handle is created with mode 'w+b' so it behaves consistently on all platforms and your program can write to it and read from it.

import os
import tempfile

temp = tempfile.TemporaryFile()
try:
    temp.write('Some data')
    temp.seek(0)

    print temp.read()
finally:
    temp.close()


After writing, you have to rewind the file handle using seek() in order to read the data back from it.


$ python tempfile_TemporaryFile_binary.py
Some data
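
To see what happens when the seek() is left out, the following sketch reads the file twice: once straight after writing and once after rewinding. It is written for Python 3 (an assumption, since the examples above use Python 2), where print is a function and a binary file takes bytes. The variable names are only illustrative.

```python
import tempfile

temp = tempfile.TemporaryFile()
try:
    temp.write(b'Some data')

    # The file position is now just past the data that was written,
    # so a read starting here finds nothing.
    without_seek = temp.read()

    # Rewinding moves the position back to the start of the file.
    temp.seek(0)
    after_seek = temp.read()
finally:
    temp.close()

print('without seek: %r' % without_seek)
print('after seek:   %r' % after_seek)
```

The first read comes back empty and the second returns the data, which is why the rewind is needed.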


If you want the file to work in text mode, pass mode='w+t' when you create it:

import tempfile

f = tempfile.TemporaryFile(mode='w+t')
try:
    f.writelines(['first\n', 'second\n'])
    f.seek(0)

    for line in f:
        print line.rstrip()
finally:
    f.close()


The file handle treats the data as text:


$ python tempfile_TemporaryFile_text.py
first
second
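
On newer interpreters the try/finally pattern can also be written with a with statement, because the object returned by TemporaryFile() works as a context manager. This is a sketch assuming Python 3, not part of the original example set.

```python
import tempfile

# The with statement closes (and therefore removes) the file when
# the block ends, even if an exception is raised inside it.
with tempfile.TemporaryFile(mode='w+t') as f:
    f.writelines(['first\n', 'second\n'])
    f.seek(0)
    lines = [line.rstrip() for line in f]

print(lines)
print('closed: %s' % f.closed)
```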


NamedTemporaryFile:

There are situations, however, where having a named temporary file is important. If your application spans multiple processes, or even hosts, naming the file is the simplest way to pass it between parts of the application. The NamedTemporaryFile() function creates a file with a name, accessed from the name attribute.

import os
import tempfile

temp = tempfile.NamedTemporaryFile()
try:
    print 'temp:', temp
    print 'temp.name:', temp.name
finally:
    # Automatically cleans up the file
    temp.close()
print 'Exists after close:', os.path.exists(temp.name)


Even though the file is named, it is still removed after the handle is closed.


$ python tempfile_NamedTemporaryFile.py
temp: <open file '<fdopen>', mode 'w+b' at 0x5c338>
temp.name: /var/folders/9R/9R1t+tR02Raxzk+F71Q50U+++Uw/-Tmp-/tmplBKZMv
Exists after close: False
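
As a rough illustration of passing the file by name, the sketch below writes through the original handle and then reads the data back through a second, independent open() call that only knows the name. It assumes Python 3 and a Unix-like system; reopening the file by name while the first handle is still open may fail on Windows. The second open() stands in for another part of the application.

```python
import os
import tempfile

temp = tempfile.NamedTemporaryFile()
try:
    temp.write(b'shared data')
    # Flush so the bytes reach the file before anyone else reads it.
    temp.flush()

    # Stand-in for a consumer that only receives the file name.
    # Assumption: a Unix-like platform, where a second open works
    # while the first handle is still open.
    with open(temp.name, 'rb') as reader:
        received = reader.read()
finally:
    # Closing the original handle removes the file.
    temp.close()

print('received: %r' % received)
print('exists after close: %s' % os.path.exists(temp.name))
```

The flush() matters for the same reason seek() does when reading back through the original handle: until the buffered bytes are written out, a reader using the name may see an empty file.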


mkdtemp:

If you need several temporary files, it may be more convenient to create a single temporary directory and then open all of the files in that directory. To create a temporary directory, use mkdtemp().

import os
import tempfile

directory_name = tempfile.mkdtemp()
print directory_name
# Clean up the directory yourself. os.rmdir() removes only this
# (empty) directory and leaves its parent directories alone.
os.rmdir(directory_name)


Since the directory is not “opened” per se, you have to remove it yourself when you are done with it.


$ python tempfile_mkdtemp.py
/var/folders/9R/9R1t+tR02Raxzk+F71Q50U+++Uw/-Tmp-/tmp0OsHPg
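
A slightly fuller sketch of the "several files in one directory" case follows. It assumes Python 3, and the file names a.txt and b.txt are made up for illustration. Because the directory is no longer empty when the work is done, the cleanup uses shutil.rmtree(), which removes the directory together with its contents.

```python
import os
import shutil
import tempfile

directory_name = tempfile.mkdtemp()
try:
    # File names here are illustrative only.
    for name in ('a.txt', 'b.txt'):
        path = os.path.join(directory_name, name)
        with open(path, 'w') as f:
            f.write('intermediate data\n')
    created = sorted(os.listdir(directory_name))
finally:
    # rmtree removes the directory and everything inside it.
    shutil.rmtree(directory_name)

print(created)
print('exists after cleanup: %s' % os.path.exists(directory_name))
```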



Predicting Names:

For debugging purposes, it is useful to be able to include some indication of the origin of the temporary files. While obviously less secure than strictly anonymous temporary files, including a predictable portion in the name lets you find the file to examine it while your program is using it. All of the functions described so far take three arguments to allow you to control the filenames to some degree. Names are generated using the formula:

dir + prefix + random + suffix


where all of the values except random can be passed as arguments to TemporaryFile(), NamedTemporaryFile(), and mkdtemp(). For example:

import tempfile

temp = tempfile.NamedTemporaryFile(suffix='_suffix',
                                   prefix='prefix_',
                                   dir='/tmp',
                                   )
try:
    print 'temp:', temp
    print 'temp.name:', temp.name
finally:
    temp.close()


The prefix and suffix arguments are combined with a random string of characters to build the file name, and the dir argument is taken as-is and used as the location of the new file.


$ python tempfile_NamedTemporaryFile_args.py
temp: <open file '<fdopen>', mode 'w+b' at 0x5c338>
temp.name: /tmp/prefix_zy-7H3_suffix
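
The same arguments work for mkdtemp(). The sketch below, which assumes Python 3, creates a directory and splits the resulting name back into the pieces of the formula. The prefix 'demo_' and suffix '_work' are arbitrary values chosen for illustration.

```python
import os
import tempfile

# 'demo_' and '_work' are arbitrary values chosen for illustration.
directory_name = tempfile.mkdtemp(suffix='_work', prefix='demo_',
                                  dir=tempfile.gettempdir())
try:
    parent, base = os.path.split(directory_name)
    # Whatever sits between the prefix and the suffix is the
    # random portion generated by the module.
    random_part = base[len('demo_'):-len('_work')]
    print('dir:    %s' % parent)
    print('prefix: %s' % base[:len('demo_')])
    print('random: %s' % random_part)
    print('suffix: %s' % base[-len('_work'):])
finally:
    os.rmdir(directory_name)
```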


Temporary File Location:

If you don’t specify an explicit destination using the dir argument, the actual path used for the temporary files will vary based on your platform and settings. The tempfile module includes two functions for querying the settings being used at runtime:

import tempfile

print 'gettempdir():', tempfile.gettempdir()
print 'gettempprefix():', tempfile.gettempprefix()


gettempdir() returns the default directory that will hold all of the temporary files and gettempprefix() returns the string prefix for new file and directory names.


$ python tempfile_settings.py
gettempdir(): /var/folders/9R/9R1t+tR02Raxzk+F71Q50U+++Uw/-Tmp-
gettempprefix(): tmp


The value returned by gettempdir() is set based on a straightforward algorithm of looking through a list of locations for the first place the current process can create a file. From the library documentation:

Python searches a standard list of directories and sets tempdir to the first one which the calling user can create files in. The list is:

1. The directory named by the TMPDIR environment variable.

2. The directory named by the TEMP environment variable.

3. The directory named by the TMP environment variable.

4. A platform-specific location:

* On RiscOS, the directory named by the Wimp$ScrapDir environment variable.

* On Windows, the directories C:\TEMP, C:\TMP, \TEMP, and \TMP, in that order.

* On all other platforms, the directories /tmp, /var/tmp, and /usr/tmp, in that order.

5. As a last resort, the current working directory.
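
The search described in that list can be sketched roughly as follows. This is a simplified illustration assuming Python 3, not the module's actual implementation: it covers only the environment variables, the "all other platforms" directories, and the current working directory, and the helper names candidate_dirs() and first_usable() are hypothetical.

```python
import os
import tempfile


def candidate_dirs():
    """Return candidate directories in the order the list describes.

    Simplified: only the environment variables and the
    'all other platforms' branch are covered.
    """
    candidates = []
    for var in ('TMPDIR', 'TEMP', 'TMP'):
        value = os.environ.get(var)
        if value:
            candidates.append(value)
    candidates.extend(['/tmp', '/var/tmp', '/usr/tmp'])
    # Last resort: the current working directory.
    candidates.append(os.getcwd())
    return candidates


def first_usable(candidates):
    """Return the first directory in which a file can be created."""
    for directory in candidates:
        try:
            # Creating (and immediately closing) a file is the test.
            tempfile.TemporaryFile(dir=directory).close()
        except (OSError, IOError):
            continue
        return directory
    raise IOError('no usable temporary directory found')


print('sketch:       %s' % first_usable(candidate_dirs()))
print('gettempdir(): %s' % tempfile.gettempdir())
```

The two printed values will often agree on Unix-like systems, but they need not, since the real module handles more platforms and normalizes the path.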



If your program needs a single global location for all of its temporary files, and you do not want to set it through one of these environment variables, you can set tempfile.tempdir directly.

import tempfile

tempfile.tempdir = '/I/changed/this/path'
print 'gettempdir():', tempfile.gettempdir()



$ python tempfile_tempdir.py
gettempdir(): /I/changed/this/path
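
The path in that example does not exist, so it only shows that gettempdir() reports the new value. To watch the setting take effect, the sketch below (assuming Python 3) points tempfile.tempdir at a real directory, creates a named file without a dir argument, and then restores the previous setting. The variable names are illustrative.

```python
import os
import tempfile

# Remember the current setting so it can be restored afterwards.
original_tempdir = tempfile.tempdir

# Create a real directory to act as the global location.
custom_dir = tempfile.mkdtemp()
tempfile.tempdir = custom_dir
try:
    temp = tempfile.NamedTemporaryFile()
    try:
        # No dir argument was given, so the file lands in tempdir.
        location = os.path.dirname(temp.name)
    finally:
        temp.close()
finally:
    tempfile.tempdir = original_tempdir
    os.rmdir(custom_dir)

print('custom_dir: %s' % custom_dir)
print('file dir:   %s' % location)
```

Because tempfile.tempdir is a module-level global, changing it affects every later call in the process, which is why the sketch restores the old value when it is done.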




  • http://www.blogger.com/profile/12229578427522022392 Catherine

    Doug, thanks SO much for including this part:

    “After writing, you have to rewind the file handle using seek() in order to read the data back from it.”

    I was absolutely dying in frustration, Googling for terms like “can’t read from Python tempfile”.

  • http://www.blogger.com/profile/01892352754222143463 Doug Hellmann

    Catherine, forgetting to seek() bit me once, too. It makes sense if you think about how a file handle works, but when you’re in the thick of a problem sometimes those little details don’t bubble up.

  • http://www.blogger.com/profile/03176073782462742187 Tom

    Thanks Doug,
    the way you have explained, helped me in understanding the usage of the module.