Using Custom Python environments for Jupyter Notebooks in HiPerGator

Python
code
AI
tutorial
Author

RPM

Published

December 30, 2022

Custom Python Environments for Jupyter Notebooks

To simply use the environment go to

For additional information about using Python virtual environments with conda, please see the UFRC page or the Software Carpentries pages from which these procedures were derived.

Background (from UFRC page)

Many projects that use Python code require careful management of the respective Python environments. Rapid changes in package dependencies, package version conflicts, deprecation of APIs (function calls) by individual projects, and obsolescence of system drivers and libraries make it virtually impossible to use an arbitrary set of packages or create one all-encompassing environment that will serve everyone’s needs over long periods of time. The high velocity of changes in the popular ML/DL frameworks and packages and GPU computing exacerbates the problem.

Getting Started: Conda Configuration

The ~/.condarc configuration file

conda's behavior is controlled by a configuration file in your home directory called .condarc. The dot at the start of the name means the file is hidden from the ls file-listing command by default. If you have not run conda before, you won't have this file. Whether or not the file exists, the steps here will help you modify it to work best on HiPerGator. The first load of the conda environment module on HiPerGator will put the current "best practice" .condarc into your home directory.

The file lives in your home directory (i.e., /home/&lt;myname&gt;, usually abbreviated as ~):

~/.condarc
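If you are unsure whether the file exists yet, loading the conda module and listing your dotfiles will tell you (a quick check; nothing is modified beyond the module's first-load behavior described above):

```shell
# Loading the module creates a starter ~/.condarc on first load
module load conda

# Dotfiles are hidden by default; -a shows them
ls -a ~ | grep condarc

# View the current contents
cat ~/.condarc
```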

conda package cache location

conda caches (keeps a copy of) all downloaded packages, by default in the ~/.conda/pkgs directory tree. If you install many packages you may end up filling your home quota. You can change the default package cache path. To do so, add or change the pkgs_dirs setting in your ~/.condarc configuration file, e.g.:

pkgs_dirs:
  - /blue/akeil/share/conda/pkgs

or

pkgs_dirs:
  - /blue/akeil/$USER/conda/pkgs

Replace akeil with your actual group name.

conda environment location

conda puts all packages installed in a particular environment into a single directory. By default, "named" conda environments are created in the ~/.conda/envs directory tree. They can quickly grow in size and, especially if you have many environments, fill the 40GB home directory quota. For example, the environment we will create in this training is 5.3GB in size. As such, it is important to use "path" based (conda create -p PATH) conda environments, which let you use any path for a particular environment, for example keeping a project-specific conda environment close to the project data in /blue/, where your group has terabytes of space.
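As a concrete sketch (the group name akeil and the environment name myproject are placeholders), a path-based environment can be created and activated like this:

```shell
# Create an environment at an explicit path in /blue
module load conda
conda create -p /blue/akeil/$USER/conda/envs/myproject python=3.10

# Path-based environments are activated by path, not by name
conda activate /blue/akeil/$USER/conda/envs/myproject
```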

You can also change the default path for "named" environments (conda create -n NAME) if you prefer to keep all conda environments in the same directory tree. To do so, add or change the envs_dirs setting in the ~/.condarc configuration file, e.g.:

envs_dirs:
  - /blue/akeil/share/conda/envs

or

envs_dirs:
  - /blue/akeil/$USER/conda/envs

Replace akeil with your actual group name.
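Once envs_dirs points at /blue, a named environment created the usual way lands there instead of in your home directory. A quick check (the environment name is a placeholder):

```shell
# Create a named environment; with envs_dirs set, it is stored under /blue
conda create -n myproject python=3.10

# List environments and confirm the new one shows a /blue/... path
conda env list
```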

Editing your ~/.condarc file.

One way to edit your ~/.condarc file is to type:

nano ~/.condarc

If the file is empty, paste in the text below, editing the envs_dirs: and pkgs_dirs: paths for your group and username. If the file has contents, update those lines.

Note

Your ~/.condarc should look something like this when you are done editing (again, replacing group-akeil and USER in the paths with your actual group and username).

channels: 
  - conda-forge 
  - bioconda 
  - defaults 
envs_dirs: 
  - /blue/akeil/USER/conda/envs 
pkgs_dirs: 
  - /blue/akeil/USER/conda/pkgs 
auto_activate_base: false 
auto_update_conda: false 
always_yes: false 
show_channel_urls: false
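After saving the file, you can ask conda to echo back the settings it actually picked up, which is a quick way to catch indentation mistakes in the YAML:

```shell
module load conda

# Print the effective values of the settings edited above
conda config --show pkgs_dirs envs_dirs channels
```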

Use your kernel from command line or scripts

Now that we have our environment ready, we can use it from the command line or a script using something like:

module load conda
conda activate mne

# Run my python script
python amazing_script.py


or, using a path-based setting:

# Set path to environment 
#   pre-pend to PATH variable
env_path=/blue/akeil/share/mne_1_x/conda/bin
export PATH=$env_path:$PATH
 
# Run my python script
python amazing_script.py
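On HiPerGator these commands would normally run inside a SLURM batch job. A minimal job script sketch (the resource values are placeholders to adjust for your workload):

```shell
#!/bin/bash
#SBATCH --job-name=amazing_script
#SBATCH --mem=4gb
#SBATCH --time=01:00:00

# Activate the environment inside the job, then run the script
module load conda
conda activate mne

python amazing_script.py
```

Submit it with sbatch followed by the script name.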

Setup a Jupyter Kernel for our environment

Often, we want to use the environment in a Jupyter notebook. To do that, we can create our own Jupyter Kernel.

Add the jupyterlab package

In order to use an environment in Jupyter, we need to make sure we install the jupyterlab package in the environment:

mamba install jupyterlab
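Since the kernel's run.sh (edited later in this tutorial) launches Python with -m ipykernel, it is worth installing ipykernel alongside jupyterlab. Activating the environment first makes sure the install lands in the right place:

```shell
# Activate the target environment first
module load conda
conda activate mne

# jupyterlab provides the notebook interface; ipykernel is what the
# kernel's run.sh invokes to start the kernel
mamba install jupyterlab ipykernel
```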

Copy the template_kernel folder to your path

On HiPerGator, Jupyter looks in two places for kernels when you launch a notebook:

  • /apps/jupyterhub/kernels/ for the globally available kernels that all users can use. (Also a good place to look for troubleshooting getting your own kernel going)

  • ~/.local/share/jupyter/kernels for each user. (Again, your home directory and the .local folder is hidden since it starts with a dot)

Make the ~/.local/share/jupyter/kernels directory:

mkdir -p ~/.local/share/jupyter/kernels

Copy the /apps/jupyterhub/template_kernel folder into your ~/.local/share/jupyter/kernels directory:

cp -r /apps/jupyterhub/template_kernel/ ~/.local/share/jupyter/kernels/mne_1_x

Note

This also renames the folder in the copy. It is important that the directory names be distinct in both your directory and the global /apps/jupyterhub/kernels/ directory.

Edit the template_kernel files

The template_kernel directory has four files: the run.sh and kernel.json files will need to be edited in a text editor. We will use nano in this tutorial. The logo-64X64.png and logo-32X32.png are icons for your kernel to help visually distinguish it from others. You can upload icons of those dimensions to replace the files, but they need to be named with those names.

Edit the kernel.json file

Let’s start editing the kernel.json file. As an example, we can use:

nano ~/.local/share/jupyter/kernels/mne_1_x/kernel.json

The template has most of the information and notes on what needs to be updated. Edit the file to look like:

{
 "language": "python",
 "display_name": "MNE v1.x",
 "argv": [
  "~/.local/share/jupyter/kernels/mne_1_x/run.sh",
  "-f",
  "{connection_file}"
 ]
}

Edit the run.sh file

The run.sh file needs the path to the python application that is in our environment. The easiest way to get that is to make sure the environment is activated and run the command: which python

The path it outputs should look something like: /blue/group/share/conda/envs/mne_1_x/bin/python

Copy that path.
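Taken together (assuming the environment name mne from earlier), getting the path looks like:

```shell
module load conda
conda activate mne

# Print the full path of the python binary inside the active environment
which python
```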

Edit the run.sh file with nano:

nano ~/.local/share/jupyter/kernels/mne_1_x/run.sh

The file should look like this, but with your path:

#!/usr/bin/bash

exec /blue/akeil/share/conda/envs/mne_1_x/bin/python -m ipykernel "$@"

If you are doing this in a Jupyter session, refresh your page. If not, launch Jupyter.

Your kernel should be there ready for you to use!

Working with yml files

Export your environment to an environment.yml file

Now that you have your environment working, you may want to document its contents and/or share it with others. The environment.yml file defines the environment and can be used to build a new environment with the same setup.

To export an environment file from an existing environment, run:

conda env export > mne_1_x.yml

You can inspect the contents of this file with cat mne_1_x.yml. This file defines the packages and versions that make up the environment as it is at this point in time. Note that it also includes packages that were installed via pip.
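If the exported file needs to be portable to other systems, conda can omit the platform-specific build strings (a sketch; whether you want this depends on how exactly the environment must be reproduced):

```shell
# --no-builds drops build strings like "py310h..." from each package line
conda env export --no-builds > mne_1_x.yml

# Inspect the first lines of the result
head mne_1_x.yml
```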

Create an environment from a yaml file

If you share the environment yaml file created above with another user, they can create a copy of your environment using the command:

conda env create --file mne_1_x.yml

They may need to edit the last line to change the location to match where they want their environment created.
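Alternatively, the location can be given on the command line with a path-based create, avoiding the need to edit the file (the destination path here is a placeholder):

```shell
# Build the environment from the yml at an explicit path in /blue
conda env create --file mne_1_x.yml -p /blue/akeil/$USER/conda/envs/mne_copy
```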

Group environments

It is possible to create a shared environment accessed by a group on HiPerGator, storing the environment in, for example, /blue/akeil/share/conda. In general, this works best if only one user has write access to the environment. All installs should be made by that one user and should be communicated with the other users in the group.
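One way to sketch that access pattern with ordinary POSIX permissions (the path is a placeholder; your group may prefer ACLs instead):

```shell
# Let group members read and traverse the environment, but not modify it;
# the owner remains the only user who can install packages
chmod -R g+rX,g-w /blue/akeil/share/conda
```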