Python Optimisation and Memory Management

Repository: GitHub

Description

This tutorial gives a brief introduction to some tools for profiling the CPU and memory usage of Python code, along with some tips to improve performance.

Contents
Set Up
Deterministic Profiling
Memory Profiling
Optimisation
Further Reading

Set Up

The easiest way to run all of the examples provided in this tutorial is to build the Conda environment.

$ conda env create -f environment.yml

Alternatively, you can simply install the requirements.

$ pip install -r requirements.txt

If you choose the latter, you will also need to install Graphviz, which can be done with conda, apt or brew.

Deterministic Profiling

Before attempting to optimise your code (i.e. make it run faster) it is essential to profile it. It can take a lot of time to improve code, and this effort can easily be wasted if you optimise the wrong parts. Profiling, in particular deterministic profiling, enables you to identify potential bottlenecks in your code so that you can better focus any optimisation measures.

Deterministic profiling is meant to reflect the fact that all function call, function return, and exception events are monitored, and precise timings are made for the intervals between these events (during which time the user’s code is executing). - source

cProfile

cProfile is a built-in Python profiler. The commands can be written directly into modules/scripts to track the number of calls to functions and the time spent on each call. Alternatively, cProfile can be passed as an option to the python command when executing a script. For this tutorial, we will use the latter option.
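For completeness, the first option (writing the profiling commands directly into a script) can be sketched as follows. This is a minimal illustration rather than part of the tutorial's examples: the functions short_nap and take_naps are hypothetical stand-ins for your own code, and only the standard library modules cProfile, pstats and io are used.

```python
import cProfile
import io
import pstats
import time


def short_nap():
    """Hypothetical workload: sleep briefly so the profiler has something to time."""
    time.sleep(0.01)


def take_naps(n_naps):
    """Call short_nap a fixed number of times."""
    for _ in range(n_naps):
        short_nap()


# Collect statistics only for the code between enable() and disable().
profiler = cProfile.Profile()
profiler.enable()
take_naps(3)
profiler.disable()

# Format the results into a string, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats()
report = buffer.getvalue()
print(report)
```

Wrapping only the region of interest in enable() and disable() keeps import and set-up costs out of the report, which can be convenient when a script does a lot of unrelated work.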

We will start by looking at the sleeper.py script. This script contains a function (sleep_for_1s) that calls time.sleep from Python’s standard library for one second. This will make it easy for us to assess the time spent on each call to this function. The other two functions in the script (function1, function2) simply make calls to the sleep_for_1s function.

Running the script on its own produces no output; however, if we time the process we can see that it takes six seconds in total.

$ time python examples/sleeper.py
python sleeper.py  0.02s user 0.01s system 0% cpu 6.045 total

cProfile will allow us to see how many calls were made to each function in the script and how long each call took.

$ python -m cProfile examples/sleeper.py

The output should look something like this.

function calls in 6.009 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:1009(_handle_fromlist)
  0.000    0.000    6.009    6.009 sleeper.py:1(<module>)
  0.000    0.000    5.008    5.008 sleeper.py:11(function1)
  0.000    0.000    1.000    1.000 sleeper.py:27(function2)
  0.000    0.000    6.009    6.009 sleeper.py:37(main)
  0.000    0.000    6.009    1.001 sleeper.py:4(sleep_for_1s)
  0.000    0.000    6.009    6.009 {built-in method builtins.exec}
  0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
  6.009    1.001    6.009    1.001 {built-in method time.sleep}
  0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

We can also sort this output by e.g. the number of calls to a given function.

$ python -m cProfile -s calls examples/sleeper.py

Which gives.

function calls in 6.016 seconds

Ordered by: call count

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     6    6.016    1.003    6.016    1.003 {built-in method time.sleep}
     6    0.000    0.000    6.016    1.003 sleeper.py:4(sleep_for_1s)
     1    0.000    0.000    6.016    6.016 {built-in method builtins.exec}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1    0.000    0.000    5.016    5.016 sleeper.py:11(function1)
     1    0.000    0.000    1.000    1.000 sleeper.py:27(function2)
     1    0.000    0.000    6.016    6.016 sleeper.py:37(main)
     1    0.000    0.000    6.016    6.016 sleeper.py:1(<module>)

Here we can see that sleep_for_1s was called a total of six times for a cumulative time of six seconds. function1 was called once for a cumulative time of five seconds. If we look at the code we can indeed see that function1 calls sleep_for_1s five times. Finally, function2 was called once for a cumulative time of one second. If we add the times of these two function calls we get the total runtime of the script, as expected.

gprof2dot

gprof2dot is a Python package for converting cProfile outputs into a dot graph. These graphs will make it easier for us to visualise the profiling information.

First we tell cProfile to output to a file (e.g. sleeper.pstats).

$ python -m cProfile -o sleeper.pstats examples/sleeper.py

Then we can run gprof2dot on this output.

$ gprof2dot -f pstats sleeper.pstats | dot -Tpng -o sleeper.png

Which produces the following image.

Here we can more easily see the hierarchy and the number of calls to each function.

SnakeViz

SnakeViz is a browser-based graphical viewer for cProfile output.

As before we need a cProfile output file.

$ python -m cProfile -o sleeper.pstats examples/sleeper.py

Then we can run SnakeViz.

$ snakeviz sleeper.pstats

This will open a browser window where you can navigate and search for function calls.

This is particularly useful for more complicated scripts such as complicated.py that use a lot of built-in functions behind the scenes.

$ python -m cProfile -o complicated.pstats examples/complicated.py
$ snakeviz complicated.pstats

pyinstrument

Finally, if you prefer an alternative to cProfile, various other tools also exist such as pyinstrument.

To run simply replace calls to python with pyinstrument.

$ pyinstrument examples/sleeper.py

Which will produce something like the following.

Program: examples/sleeper.py

6.011 <module>  sleeper.py:1
└─ 6.011 main  sleeper.py:52
   ├─ 5.009 function1  sleeper.py:26
   │  └─ 5.009 sleep_for_1s  sleeper.py:19
   └─ 1.001 function2  sleeper.py:42
      └─ 1.001 sleep_for_1s  sleeper.py:19

pyinstrument also allows for more interactive profiling by rendering the output as HTML.

$ pyinstrument -r html examples/sleeper.py

Memory Profiling

The runtime is not the only consideration you should have when aiming to optimise your code. Inefficient memory management can be even more problematic as you can run into hardware limitations. As before, you should identify the greedy parts of your code before attempting to reduce the memory usage.

Memory Profiler

Memory Profiler is a package for monitoring the memory consumption of a Python process.

Similarly to cProfile, Memory Profiler can be run at the function level or on a whole script. To profile a single function, simply add a @profile decorator, as done for the memory_eater function in the memory.py script, then execute the script as follows.

$ python -m memory_profiler examples/memory.py

Which will provide something like the following.

Line #    Mem usage    Increment   Line Contents
================================================
    21   11.879 MiB   11.879 MiB   @profile
    22                             def memory_eater():
    23                                 """Memory Eater
    24
    25                                 This function creates a list and increases the memory consumption several
    26                                 times before deleting the object.
    27
    28                                 """
    29
    30   11.879 MiB    0.000 MiB       big_list = [2]
    31   11.879 MiB    0.000 MiB       sleep(1)
    32
    33   88.176 MiB   76.297 MiB       big_list *= (10 ** 7)
    34   88.176 MiB    0.000 MiB       sleep(1)
    35
    36  164.469 MiB   76.293 MiB       big_list *= 2
    37  164.469 MiB    0.000 MiB       sleep(1)
    38
    39  317.055 MiB  152.586 MiB       big_list *= 2
    40  317.055 MiB    0.000 MiB       sleep(1)
    41
    42   11.879 MiB    0.000 MiB       del big_list

Here we can see the total memory used after each line has run and how much memory each line adds.
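As a rough guide to setting this up in your own code, a decorated function might look like the sketch below. The function small_memory_eater is a hypothetical, scaled-down analogue of memory_eater, not the contents of memory.py. The try/except fallback to a no-op decorator is also an assumption, added only so that the sketch still runs when memory_profiler is not installed.

```python
try:
    # memory_profiler provides the @profile decorator used in this tutorial.
    from memory_profiler import profile
except ImportError:
    # Fallback (assumption): a no-op decorator so the sketch runs without the package.
    def profile(func):
        return func


@profile
def small_memory_eater(n_items=10 ** 5):
    """Hypothetical, scaled-down analogue of memory_eater."""
    big_list = [2]
    big_list *= n_items   # grow the list to n_items elements
    big_list *= 2         # double the length of the list
    final_length = len(big_list)
    del big_list          # release the list
    return final_length


print(small_memory_eater())
```

When memory_profiler is available, calling the decorated function prints a line-by-line report of the same kind as the one shown above.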

To profile the whole script use mprof as follows.

$ mprof run examples/memory.py

This will create a temporary file (mprofile_<YYYYMMDDhhmmss>.dat). You can visualise the content of this file by running.

$ mprof plot

Which produces the following type of plot.

Here we can clearly see that the memory_eater function increases the memory consumption over time.

Optimisation

Once you have profiled your code and identified the bottlenecks you can start working on ways to make it run more efficiently. Sadly, there is no magic method to make everything run faster. You will have to deal with things on a case by case basis and find the most appropriate way to optimise.

In the following subsections I will provide some tips and briefly introduce some tools to help you get started.

Efficient Implementation

The first thing to look at, before using any special plugins or tricks, is how efficiently you have implemented your code. Look at your code and ask yourself:

Is each function being called the correct number of times?
Can any quantities be pre-computed to save time?
Are any calculations being performed unnecessarily?

A very simplistic example of this is presented in efficient.py. In this script two different approaches are shown for obtaining the same final quantities. In the inefficient implementation unnecessary calculations are made, while in the efficient implementation only the operations needed at a given time are performed. Try profiling this script to compare the two implementations.
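To illustrate the idea of pre-computing quantities, here is a minimal sketch. It is not the contents of efficient.py: the function names and the normalisation task are hypothetical, chosen only to show a value being recomputed unnecessarily inside a loop.

```python
import timeit


def normalise_inefficient(values):
    """Divide each value by the total, recomputing the total for every element."""
    return [value / sum(values) for value in values]


def normalise_efficient(values):
    """Divide each value by the total, computing the total only once."""
    total = sum(values)
    return [value / total for value in values]


values = list(range(1, 501))

# Both implementations return the same result.
print(normalise_inefficient(values) == normalise_efficient(values))

# Time a few repetitions of each implementation.
slow = timeit.timeit(lambda: normalise_inefficient(values), number=10)
fast = timeit.timeit(lambda: normalise_efficient(values), number=10)
print(f"inefficient: {slow:.4f} s, efficient: {fast:.4f} s")
```

In the first function sum(values) is evaluated once per element, so the work grows quadratically with the length of the list, while the second function does the same job in a single pass.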

Pythonic Coding

One of the simplest ways to optimise your code is to take advantage of native Python constructs such as list comprehensions, generators, etc.

For example, identifying loops that can be replaced by more efficient list comprehensions can shave valuable seconds off your code. Have a look at the script list_comp_vs_loop.py. The objective is to produce a list of cubed values from zero to n. In one function this is accomplished using a standard loop, while the other implements a list comprehension. Try profiling this script to identify which implementation is faster.
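The comparison can be sketched as follows. This is an illustrative version rather than the actual list_comp_vs_loop.py script: the function names are hypothetical, and we assume that the values run from zero up to (but not including) n.

```python
import timeit


def cubes_loop(n):
    """Build the list of cubes with an explicit loop and append."""
    cubes = []
    for value in range(n):
        cubes.append(value ** 3)
    return cubes


def cubes_comprehension(n):
    """Build the same list with a list comprehension."""
    return [value ** 3 for value in range(n)]


# Time a few repetitions of each implementation.
n = 10_000
loop_time = timeit.timeit(lambda: cubes_loop(n), number=20)
comp_time = timeit.timeit(lambda: cubes_comprehension(n), number=20)
print(f"loop: {loop_time:.4f} s, comprehension: {comp_time:.4f} s")
```

The timings will vary with the machine and the Python version, so it is worth running the comparison yourself rather than assuming a particular speed-up.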

See Pythonic Thinking tutorial for more examples.

Numba

There are various packages specifically designed to speed up calculations in Python. Numba is one such package that works particularly well in conjunction with NumPy.

Numba is a compiler for Python array and numerical functions that gives you the power to speed up your applications with high performance functions written directly in Python. - source

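Numba works by compiling a function to machine code the first time it is called. Subsequent calls with the same argument types reuse the compiled version, bypassing the Python interpreter, and are thus executed much faster. The first call is slower, however, because it includes the compilation time.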

In the script numba_vs_numpy.py we compare two functions to calculate tanh of the diagonal elements of a matrix, one implemented with Numba and the other without. Try profiling this script to compare the performance of the two implementations. What happens if you reduce the number of calls to each function?
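A minimal sketch of such a comparison is given below. It is not the numba_vs_numpy.py script itself: the function names are hypothetical, the matrix size is arbitrary, and the fallback to a no-op decorator is an assumption added so that the sketch still runs (without any speed-up) when Numba is not installed.

```python
import time

import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback (assumption): a no-op decorator so the sketch runs without Numba.
    def njit(func):
        return func


def tanh_diag_numpy(matrix):
    """tanh of the diagonal elements using NumPy only."""
    return np.tanh(np.diag(matrix))


@njit
def tanh_diag_numba(matrix):
    """tanh of the diagonal elements using an explicit loop compiled by Numba."""
    n_rows = matrix.shape[0]
    result = np.empty(n_rows)
    for i in range(n_rows):
        result[i] = np.tanh(matrix[i, i])
    return result


matrix = np.random.default_rng(0).random((200, 200))

# The first call includes the compilation time.
start = time.perf_counter()
tanh_diag_numba(matrix)
first_call = time.perf_counter() - start

# The second call reuses the compiled function.
start = time.perf_counter()
tanh_diag_numba(matrix)
second_call = time.perf_counter() - start

print(f"first call: {first_call:.6f} s, second call: {second_call:.6f} s")
```

Comparing the two timings gives a feel for the compilation overhead, which matters most when a function is only called a handful of times.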

Memory Mapping

Reducing the memory consumption of your code can be a challenging problem. There are, however, some tricks such as memory mapping that can have a big impact.

The memory_map.py script demonstrates how loading a NumPy binary file as a memory map can significantly reduce the amount of memory used. Try profiling this script to see how much of an impact this makes.
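The basic mechanism can be sketched as follows, using the mmap_mode argument of np.load. This is an illustration rather than the memory_map.py script: the file name and array size are hypothetical, and a temporary directory is used so that nothing is left on disk.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a hypothetical example array to a NumPy binary file.
    path = os.path.join(tmp_dir, "example_array.npy")
    np.save(path, np.arange(1_000_000, dtype=np.float64))

    # Regular load: the whole array is read into memory.
    in_memory = np.load(path)

    # Memory-mapped load: the data stay on disk and are read when accessed.
    mapped = np.load(path, mmap_mode="r")
    is_memmap = isinstance(mapped, np.memmap)

    # Slices of the memory map behave like slices of a normal array.
    first_values = mapped[:3].tolist()
    same_sum = float(mapped[:1000].sum()) == float(in_memory[:1000].sum())
    print(is_memmap, first_values, same_sum)

    # Drop the mapping before the temporary directory is removed.
    del mapped
```

Only the parts of the file that are actually accessed need to be read, which is why this approach helps most when you work with a small portion of a large array at a time.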