Why and how to use a custom Docker environment on DeepNote

Klaus
3 min read · Mar 6, 2021

To ensure that dependencies are installed before we run a notebook, it is common to see a bunch of pip install, wget and apt-get commands at the top of it.

DeepNote offers an elegant way to manage these installation scripts with init.ipynb, so we can keep our main notebook clean and still ensure that the dependencies are installed every time we spin up the virtual machine.

This works well when the number of packages is relatively small and when we are still experimenting with different packages. However, there are some drawbacks.

Problem of dependency versions

It is good practice to pin dependency versions carefully in our installation scripts. However, most of us would rather work on the fun part of the notebook than test dependency versions. When versions are not specified, a different version of a dependency may be installed each time we run the notebook, and the notebook may even break someday when part of a library is deprecated or removed.

Startup speed

Installing dependencies takes time. Since they are reinstalled every time the machine starts, the startup time of the environment grows as the number of required packages grows.

Using custom Docker images

Using a custom Docker image for the environment should be a better choice than installing dependencies every time you run the notebook when

  • the installed packages are unlikely to change (for example, you need the AWS CLI in the environment and are unlikely to remove it), or
  • you would like to share the environment across multiple projects.

This is one of the unique features that DeepNote offers.
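As a rough sketch of that idea, the Dockerfile below bakes a pinned package into the image so that nothing has to be reinstalled from init.ipynb when the machine starts. It assumes the google/cloud-sdk base image used later in this post and that pip is allowed to install into the image's Python; click==7.1.2 is only a placeholder for whatever packages your notebook actually needs.

```dockerfile
# Base image used later in this post; any image that meets
# DeepNote's requirements could be used the same way.
FROM google/cloud-sdk:latest

# Make the python command run Python 3 (explained further below).
RUN ln -s /usr/bin/python3 /usr/local/bin/python

# Install dependencies once, at build time, with exact versions.
# click==7.1.2 is a placeholder; list your notebook's real packages here.
RUN python -m pip install --no-cache-dir click==7.1.2
```

With this approach the packages should be installed only when the image is built, and the pinned versions stay the same until you edit the Dockerfile.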

How to create and use a custom Docker environment?

Constraints

Using a custom Docker environment on DeepNote is easy. According to the documentation, you just have to fulfill these two requirements:

  • The python command is available and runs Python > 3.6
  • The pip command works
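One way to catch a non-compliant image early is to check both requirements while the image is being built. The lines below are a sketch meant to be appended to the end of a custom Dockerfile; they mirror the two requirements as stated above and are not an official DeepNote check.

```dockerfile
# Fail the build if python is missing or not newer than 3.6,
# or if the pip command does not run.
RUN python -c "import sys; assert sys.version_info[:2] > (3, 6), sys.version" \
    && pip --version
```

If either command fails, the build stops with an error instead of producing an image that DeepNote cannot use.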

Building the environment with a Dockerfile

Since I use Google Cloud, I will demonstrate with a custom environment based on the dockerized Google Cloud SDK.

[Screenshot: Click Dockerfile to customize the image.]

First, click the Environment tab on the left. Choose Local ./Dockerfile and click the Dockerfile link to edit your custom image.

Note: Make sure the machine is turned off; otherwise, the Dockerfile cannot be edited.

Then, copy the following lines into the editor and don’t forget to press Build.

In this image, the python command points to Python 2 by default. To fulfill the first requirement above, the RUN ln -s ... line creates a symbolic link so that the python command runs python3.

FROM google/cloud-sdk:latest
RUN ln -s /usr/bin/python3 /usr/local/bin/python

Note: Using the latest tag may not be ideal if you want to keep the same version between builds.
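A minimal sketch of pinning the base image is to replace latest with an explicit version tag. The tag 330.0.0 below is only illustrative; check the tags actually published for google/cloud-sdk and pick the version you want to freeze.

```dockerfile
# Pin an explicit base image version instead of latest so that
# rebuilding does not silently pull a newer SDK.
# 330.0.0 is an illustrative tag; replace it with one that exists.
FROM google/cloud-sdk:330.0.0

# Same symlink as before: make the python command run Python 3.
RUN ln -s /usr/bin/python3 /usr/local/bin/python
```

A tag can in principle be re-pushed to point at a different image; pinning by digest (FROM google/cloud-sdk@sha256:<digest>) is stricter if you need fully reproducible builds.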

[Screenshot: Using the gcloud SDK container in the DeepNote Terminal.]

Once you start the machine, you can run gcloud commands in the DeepNote Terminal. At the time of writing, the Terminal does not seem to handle color codes yet, but I believe this will improve.
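If the color escape codes get in the way, one thing that may help is telling gcloud not to print colors at all. gcloud reads each property from an environment variable named CLOUDSDK_<SECTION>_<PROPERTY>, so setting core/disable_color in the Dockerfile should apply to every Terminal session. This is a sketch that has not been verified against the DeepNote Terminal.

```dockerfile
# Ask gcloud not to use color in its output
# (same effect as: gcloud config set core/disable_color true).
ENV CLOUDSDK_CORE_DISABLE_COLOR=true
```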

Hope this post helps. Thank you for reading.


Klaus

A data engineer from Hong Kong, sharing tools and experience for better productivity. Follow to get my latest stories!