Testing existing outputs
Note
This is an experimental feature; please share your feedback on Slack! It requires ploomber-engine 0.0.16 or higher.
With ploomber-engine, you can re-run your notebooks and check that the new outputs still match the ones stored in the file.
Example (test passes)
Let’s create a simple notebook that prints a few numbers:
import nbformat
nb = nbformat.v4.new_notebook()
cells = [
"print(1)",
"print(2)",
]
nb.cells = [nbformat.v4.new_code_cell(cell) for cell in cells]
nbformat.write(nb, "notebook.ipynb")
Let’s run the notebook and overwrite the original file so that it stores the outputs:
from ploomber_engine.ipython import PloomberClient
client = PloomberClient.from_path("notebook.ipynb")
out = client.execute()
nbformat.write(out, "notebook.ipynb")
0%| | 0/2 [00:00<?, ?it/s]
Executing cell: 1: 0%| | 0/2 [00:00<?, ?it/s]
Executing cell: 2: 0%| | 0/2 [00:00<?, ?it/s]
Executing cell: 2: 100%|█████████████████████████| 2/2 [00:00<00:00, 502.67it/s]
Call test_notebook to test the notebook. It won’t raise any errors, since re-running the notebook produces the same outputs:
from ploomber_engine.testing import test_notebook
test_notebook("notebook.ipynb")
0%| | 0/2 [00:00<?, ?it/s]
Executing cell: 1: 0%| | 0/2 [00:00<?, ?it/s]
Executing cell: 2: 0%| | 0/2 [00:00<?, ?it/s]
Executing cell: 2: 100%|█████████████████████████| 2/2 [00:00<00:00, 534.34it/s]
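To make the idea of "re-run and compare" concrete, here is a minimal standard-library sketch. It is not how ploomber-engine is implemented: the helper names `run_cell` and `outputs_still_match`, and the simplified cell dictionaries, are assumptions made purely for illustration, and only printed text is captured.

```python
import contextlib
import io


def run_cell(source, namespace):
    """Execute a cell's source and return what it printed to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, namespace)
    return buffer.getvalue()


def outputs_still_match(cells):
    """Return True if re-running every cell reproduces its stored stdout.

    `cells` is a hypothetical list of {"source": str, "stored": str} dicts,
    a simplified stand-in for the cells of a real notebook.
    """
    namespace = {}  # shared across cells, like one kernel session
    return all(run_cell(c["source"], namespace) == c["stored"] for c in cells)


cells = [
    {"source": "print(1)", "stored": "1\n"},
    {"source": "print(2)", "stored": "2\n"},
]
print(outputs_still_match(cells))  # prints True
```

A real notebook stores richer output records than plain strings, which is why a dedicated function such as test_notebook is more convenient than a hand-rolled check like this one.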
Test failure: output mismatch
Let’s load the notebook and modify the source code, but keep the outputs the same:
nb = nbformat.read("notebook.ipynb", as_version=nbformat.NO_CONVERT)
# this was previously: print(1)
nb.cells[0].source = "print(100)"
# store the notebook
nbformat.write(nb, "notebook.ipynb")
test_notebook("notebook.ipynb")
0%| | 0/2 [00:00<?, ?it/s]
Executing cell: 1: 0%| | 0/2 [00:00<?, ?it/s]
Executing cell: 2: 0%| | 0/2 [00:00<?, ?it/s]
Executing cell: 2: 100%|█████████████████████████| 2/2 [00:00<00:00, 529.32it/s]
---------------------------------------------------------------------------
NotebookTestException Traceback (most recent call last)
Cell In[6], line 1
----> 1 test_notebook("notebook.ipynb")
File ~/checkouts/readthedocs.org/user_builds/ploomber-engine/checkouts/latest/src/ploomber_engine/testing.py:74, in test_notebook(path_to_nb)
68 if len_expected != len_actual:
69 raise NotebookTestException(
70 f"Error in cell {idx}: Expected number of "
71 f"cell outputs ({len_expected}), actual ({len_actual})"
72 )
---> 74 _compare_outputs(idx, expected, actual)
File ~/checkouts/readthedocs.org/user_builds/ploomber-engine/checkouts/latest/src/ploomber_engine/testing.py:45, in _compare_outputs(idx, out_ref, out_actual)
43 for ref, actual in zip(out_ref, out_actual):
44 if ref != actual:
---> 45 raise NotebookTestException(
46 f"Error in cell {idx}: Expected output ({ref}), actual ({actual})"
47 )
NotebookTestException: Error in cell 1: Expected output (1), actual (100)
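The traceback above shows the comparison loop that raised the error. The following standalone sketch mirrors those visible lines so the check can be tried without a notebook. `OutputMismatch` is a hypothetical stand-in for NotebookTestException, and outputs are simplified to plain strings, which is an assumption about their shape.

```python
class OutputMismatch(Exception):
    """Hypothetical stand-in for ploomber-engine's NotebookTestException."""


def compare_outputs(idx, expected, actual):
    """Raise OutputMismatch on the first stored/fresh pair that differs."""
    for ref, new in zip(expected, actual):
        if ref != new:
            raise OutputMismatch(
                f"Error in cell {idx}: Expected output ({ref}), actual ({new})"
            )


# Cell 1 stored "1", but its edited source now prints "100".
try:
    compare_outputs(1, expected=["1"], actual=["100"])
except OutputMismatch as exc:
    message = str(exc)
    print(message)
```

Because the loop uses `zip`, it only compares pairs that exist on both sides, so a separate length check is needed to catch a cell that gains or loses outputs.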
Test failure: different number of outputs
test_notebook also checks that the number of outputs for each cell matches.
Let’s modify the notebook so the first cell produces two outputs:
nb = nbformat.read("notebook.ipynb", as_version=nbformat.NO_CONVERT)
nb.cells[0].source = "print(100); 200"
# store the notebook
nbformat.write(nb, "notebook.ipynb")
test_notebook("notebook.ipynb")
0%| | 0/2 [00:00<?, ?it/s]
Executing cell: 1: 0%| | 0/2 [00:00<?, ?it/s]
Executing cell: 2: 0%| | 0/2 [00:00<?, ?it/s]
Executing cell: 2: 100%|█████████████████████████| 2/2 [00:00<00:00, 433.32it/s]
---------------------------------------------------------------------------
NotebookTestException Traceback (most recent call last)
Cell In[8], line 1
----> 1 test_notebook("notebook.ipynb")
File ~/checkouts/readthedocs.org/user_builds/ploomber-engine/checkouts/latest/src/ploomber_engine/testing.py:69, in test_notebook(path_to_nb)
66 len_actual = len(actual)
68 if len_expected != len_actual:
---> 69 raise NotebookTestException(
70 f"Error in cell {idx}: Expected number of "
71 f"cell outputs ({len_expected}), actual ({len_actual})"
72 )
74 _compare_outputs(idx, expected, actual)
NotebookTestException: Error in cell 1: Expected number of cell outputs (1), actual (2)
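The cell `print(100); 200` yields two outputs because the printed text is recorded as a stream output, while the value of the trailing expression is recorded as an execute_result. The sketch below writes both out as plain dictionaries in the notebook JSON layout and applies a length check modelled on the one in the traceback. The function name `check_output_count` and the use of `ValueError` are assumptions for illustration.

```python
def check_output_count(idx, expected, actual):
    """Raise ValueError if stored and fresh output counts differ.

    ValueError stands in for NotebookTestException in this sketch.
    """
    if len(expected) != len(actual):
        raise ValueError(
            f"Error in cell {idx}: Expected number of "
            f"cell outputs ({len(expected)}), actual ({len(actual)})"
        )


# Stored in the file: the single stream output from the original print(1).
stored = [{"output_type": "stream", "name": "stdout", "text": "1\n"}]

# Produced by re-running `print(100); 200`: a stream plus an execute_result.
fresh = [
    {"output_type": "stream", "name": "stdout", "text": "100\n"},
    {
        "output_type": "execute_result",
        "execution_count": 1,
        "data": {"text/plain": "200"},
        "metadata": {},
    },
]

try:
    check_output_count(1, stored, fresh)
except ValueError as exc:
    count_message = str(exc)
    print(count_message)
```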
Limitations
Currently, plots are ignored, since re-rendering a plot can produce different image data even when the plots look the same.
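The page does not say how plots are skipped, so the following is only one possible approach, sketched under assumptions: drop every output whose data bundle contains an `image/*` MIME type before comparing. The helper name `drop_image_outputs` is hypothetical, and the image payloads are placeholder strings rather than real encoded images.

```python
def drop_image_outputs(outputs):
    """Return only the outputs that carry no image/* MIME data."""
    return [
        out
        for out in outputs
        if not any(mime.startswith("image/") for mime in out.get("data", {}))
    ]


text_output = {"output_type": "stream", "name": "stdout", "text": "done\n"}

# Two renderings of the same plot; the payloads are placeholders that differ.
stored = [
    text_output,
    {"output_type": "display_data", "data": {"image/png": "AAAA"}, "metadata": {}},
]
fresh = [
    text_output,
    {"output_type": "display_data", "data": {"image/png": "BBBB"}, "metadata": {}},
]

print(stored == fresh)  # prints False
print(drop_image_outputs(stored) == drop_image_outputs(fresh))  # prints True
```

The trade-off of any such filter is that a plot that really did change goes unnoticed, which is worth keeping in mind when a notebook’s main result is a figure.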