Memory usage#

New in version 0.0.18: execute_notebook

With ploomber-engine you can profile a Jupyter notebook’s memory usage, something papermill isn’t capable of doing.

Install requirements:

%pip install ploomber-engine psutil matplotlib --quiet
Note: you may need to restart the kernel to use updated packages.

Example#

Import the execute_notebook function:

from ploomber_engine import execute_notebook

We’ll now programmatically create a sample notebook and store it in notebook.ipynb. Note that it allocates a 1MB numpy array in cell 3 and a 10MB numpy array in cell 5.

import nbformat

nb = nbformat.v4.new_notebook()
sleep = "time.sleep(0.5)"
cells = [
    # cell 1
    "import numpy as np; import time",
    # cell 2
    sleep,
    # cell 3: allocate a 1MB array (131,072 float64 values, 8 bytes each)
    "x = np.ones(131072, dtype='float64')",
    # cell 4
    sleep,
    # cell 5: allocate a 10MB array
    "y = np.ones(131072*10, dtype='float64')",
    # cell 6
    sleep,
]

nb.cells = [nbformat.v4.new_code_cell(cell) for cell in cells]

nbformat.write(nb, "notebook.ipynb")

Let’s execute the notebook with profile_memory=True:

_ = execute_notebook("notebook.ipynb", "output.ipynb", profile_memory=True)
Executing cell: 6: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 6/6 [00:01<00:00,  3.92it/s]

(Figure: memory usage (MB) after each cell of the executed notebook)

We can see that after running cells 1-2, there is no significant increase in memory usage. However, once cell 3 finishes executing, we see a 1MB bump, since that’s where we allocated the first array. Cell 4 doesn’t increase memory usage, since it only contains a call to time.sleep, but cell 5 shows a 10MB bump, since that’s where we allocated the second (larger) array.
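As a quick sanity check on those numbers: each float64 value takes 8 bytes, so the arrays allocated in cells 3 and 5 come out to exactly 1MB and 10MB (a standalone check, not part of the profiled notebook):

import numpy as np

# each float64 value occupies 8 bytes
x = np.ones(131072, dtype="float64")
y = np.ones(131072 * 10, dtype="float64")

print(x.nbytes / 1024**2)  # 1.0 (MB)
print(y.nbytes / 1024**2)  # 10.0 (MB)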

If you want to look at the executed notebook, it’s available at output.ipynb.
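If you prefer to inspect it programmatically, you can read the executed notebook back with nbformat (already used above to create the input notebook); a minimal sketch:

import nbformat

out = nbformat.read("output.ipynb", as_version=4)

# each executed cell keeps its source, execution count, and outputs
for cell in out.cells:
    print(cell.execution_count, cell.source)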

Customizing the plot#

You can customize the plot by calling the plot_memory_usage function and passing the output notebook; the returned object is a matplotlib.Axes.

%%capture

from ploomber_engine.profiling import plot_memory_usage

nb = execute_notebook("notebook.ipynb", "output.ipynb", profile_memory=True)
ax = plot_memory_usage(nb)
_ = ax.set_title("My custom title")
(Figure: memory usage plot with the custom title β€œMy custom title”)
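Since plot_memory_usage returns a regular matplotlib Axes, any further matplotlib customization works the same way; for example, you could save the figure to disk (the filename here is arbitrary):

# the Axes belongs to a standard matplotlib Figure, so it can be saved like any plot
ax.figure.savefig("memory-usage.png", dpi=150, bbox_inches="tight")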

Saving profiling data#

You can save the profiling data by setting save_profiling_data=True.

%%capture
_ = execute_notebook(
    "notebook.ipynb", "output.ipynb",
    profile_memory=True, save_profiling_data=True
)
import pandas as pd
pd.read_csv("output-profiling-data.csv")
   cell   runtime      memory
0     1  0.002503  113.183594
1     2  0.503049  113.183594
2     3  0.002475  113.183594
3     4  0.503107  113.183594
4     5  0.006068  123.187500
5     6  0.502892  123.191406
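Since the profiling data is a plain CSV (runtime appears to be in seconds and memory in MB), you can also build your own plots from it. A minimal sketch with pandas and matplotlib, using the column names shown above:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("output-profiling-data.csv")

# memory usage and runtime per cell, side by side
fig, (ax_mem, ax_time) = plt.subplots(1, 2, figsize=(10, 4))
df.plot(x="cell", y="memory", marker="o", ax=ax_mem, legend=False)
ax_mem.set_ylabel("Memory (MB)")
df.plot.bar(x="cell", y="runtime", ax=ax_time, legend=False)
ax_time.set_ylabel("Runtime (seconds)")
fig.tight_layout()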