From 0a9269a4b46d5880e7719da981f9f1d5145b9bd2 Mon Sep 17 00:00:00 2001 From: Elias Werner <eliwerner3@googlemail.com> Date: Wed, 16 Feb 2022 09:18:24 +0100 Subject: [PATCH] incorporated MR 446: *add cleaned JupyterHub Teaching example *improved JupyterHub for Teaching *add environments.yml/requirements.txt creation to python_virtual_environments *add git compatibility to jupyterhub --- .../docs/access/jupyter_teaching_example.md | 229 +++++------------- .../docs/access/jupyterhub.md | 16 +- .../docs/access/jupyterhub_for_teaching.md | 17 +- .../software/python_virtual_environments.md | 70 +++++- 4 files changed, 160 insertions(+), 172 deletions(-) diff --git a/doc.zih.tu-dresden.de/docs/access/jupyter_teaching_example.md b/doc.zih.tu-dresden.de/docs/access/jupyter_teaching_example.md index 431c27e8a..efb4da7e0 100644 --- a/doc.zih.tu-dresden.de/docs/access/jupyter_teaching_example.md +++ b/doc.zih.tu-dresden.de/docs/access/jupyter_teaching_example.md @@ -1,7 +1,7 @@ # JupyterLab Teaching Example Setting up a Jupyter Lab Course involves additional steps, beyond JupyterHub, such as creating -course specific environments, and allowing participants to link and activate these environments during +course specific environments and allowing participants to link and activate these environments during the course. This page includes a work through of these additional steps, with best practice examples for each part. @@ -19,196 +19,95 @@ or built in advance. We will focus on the custom environment approach here. ## Prerequisites -- A public git repository with notebook files (`ipynb`) and all other starting files required +- A public git repository with the notebook files (`ipynb`) and all other starting files required by participants. One option to host the repository is the [TU Chemnitz Gitlab](https://gitlab.hrz.tu-chemnitz.de/). -- A tested `environment.yml`, with specific conda dependencies, that is used for creating - the environment. We will summarize steps below. 
- A [HPC project](https://hpcprojekte.zih.tu-dresden.de/managers/) for teaching, with students as registered participants - For the tutor, a shell access to the HPC resources and project folder. -## Preparing the git repository +## Preparation on the Lecturer's Side -Notebooks need to be available in the repository as `.ipynb` files. +The following part describes several steps for the preparation of a course with the JupyterHub at ZIH. -??? "Tracking with Jupytext" - Version tracking of `.ipynb` in git can be improved with - [Jupytext](https://jupytext.readthedocs.io/en/latest/). - Jupytext will provide Markdown (`.md`) and Python (`.py`) - conversions of Notebooks on the fly, next to `.ipynb`. - - Tracking these files will provide a cleaner git history. A - further advantage is that Python notebook versions can be imported, - allowing to split larger notebooks into smaller ones, based on - chained imports. - - However, `ipynb` files need still to be made available - in the repository, since Jupytext is not installed in the base - JupyterHub environment at the ZIH. - -A basic structured git repository could look like this. - -```output -. -├─.git -├─notebooks -├───01_intro.ipynb -├───02_part1.ipynb -├───02_part2.ipynb -├─.gitignore -├─environment.yml -└─Readme.md -``` - -**1. Creating a custom conda environment** - -There are several ways to [create conda environments](../../software/python_virtual_environments/#conda-virtual-environment). - -For preparing a custom environment for a Jupyter Lab course, -all participants will need to have read-access to this environment. -This is best done by storing the conda environment in the project -folder (e.g. `/projects/p_lv_jupyter_course/`). +### 1. Creating a custom Python environment ### -Shown below is the process to prepare this environment from a `environment.yml`. 
+Prepare a Python virtual environment (`virtualenv`) or conda virtual environment as described in [Python virtual environments](../software/python_virtual_environments). +Note, for preparing a custom environment for a Jupyter Lab course, all participants will need to have read-access to this environment. +This is best done by storing the environment in either a [workspace](../data_lifecycle/workspaces.md) with a limited lifetime or +in a projects folder (e.g. `/projects/p_lv_jupyter_course/`) without a limited lifetime. -**2. Clone the repository** +### 2. Clone the repository and store environment setup ### +First prepare the `requirements.txt` or the `environment.yml` to persist the environment as described in [Python virtual environments](../software/python_virtual_environments). -First connect to taurus and clone the repository to a user folder in `/home/`. +Then clone the repository of your course to your home directory or into a directory in the projects folder and add the file to the repository. -```bash -git clone git@gitlab.hrz.tu-chemnitz.de:zih/username/jupyterlab_course.git -cd jupyterlab_course -``` +=== "virtualenv" + ```console + marie@compute$ git clone git@gitlab.hrz.tu-chemnitz.de:zih/projects/p_lv_jupyter_course/clone_marie/jupyterlab_course.git + [...] + marie@compute$ cp requirements.txt /projects/p_lv_jupyter_course/clone_marie/jupyterlab_course + marie@compute$ cd /projects/p_lv_jupyter_course/clone_marie/jupyterlab_course + marie@compute$ git add requirements.txt + marie@compute$ git commit + marie@compute$ git push -**3. Prepare the `environment.yml`** - -First, have a look a the [conda docs](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html?highlight=environment.yml#create-env-file-manually) -that describe the environment.yml syntax. 
- -An example course environment.yml could be: -```yml -name: workshop_env -channels: - - conda-forge - - defaults -dependencies: - - python>=3.7 - - pip - - colorcet - - 'geoviews-core=1.8.1' - - 'ipywidgets=7.6.*' - - geopandas - - hvplot - - pyepsg - - python-dotenv - - 'shapely=1.7.1' - - pip: - - python-hll -``` + ``` +=== "conda" + ```console + marie@compute$ git clone git@gitlab.hrz.tu-chemnitz.de:zih/projects/p_lv_jupyter_course/clone_marie/jupyterlab_course.git + [...] + marie@compute$ cp environment.yml /projects/p_lv_jupyter_course/clone_marie/jupyterlab_course + marie@compute$ cd /projects/p_lv_jupyter_course/clone_marie/jupyterlab_course + marie@compute$ git add environment.yml + marie@compute$ git commit + marie@compute$ git push -After specifying the `name`, the conda -[channel priority](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html) is defined. -In the example above, packages will be first installed from the `conda-forge` channel, and if not found, -from the `default` Anaconda channel. + ``` -Below, dependencies can be specified. Optionally, <abbr title="Pinning is a process that -allows you to remain on a stable release while grabbing packages from a more recent version."> -pinning</abbr> can be used to delimit the packages installed to compatible package versions. +Now, you can re-create the environment and the whole course from the git repository in the future. -Finally, packages not available on conda can be specified (indented) below `- pip:` +To test the activation of the environment use: -**4. Re-create the environment in the user home folder** +=== "virtualenv" + ```console + marie@compute$ source /scratch/ws/1/python_virtual_environment_teaching/env/bin/activate #Activate virtual environment. Example output: (envtest) bash-4.2$ -The `/project` file system can be slow to write. 
It is -therefore advised to create the environment in the user's home -folder first, and then move the environment to the -`/project` partition using the [Data Mover Tools](../../data_transfer/datamover/) + ``` +=== "conda" + ```console + marie@compute$ conda activate /scratch/ws/1/conda_virtual_environment_teaching -!!! note - There is a time limit of `5` Minutes for commands directly - issued on Taurus. It is therefore adviced to create a job - for the conda environment installation. - -Create an interactive job shell. -```bash -srun --account=p_lv_jupyter_course \ - --pty --ntasks=1 --cpus-per-task=4 \ - --time=1:00:00 --mem-per-cpu=1700 bash -l -``` + ``` -```output -> srun: job xxxxxxxx queued and waiting for resources -> srun: job xxxxxxxx has been allocated resources -``` - -Create the environment in your user home folder. This uses the `--prefix` flag to -override the default location of new environments (see [the docs](https://docs.conda.io/projects/conda/en/latest/commands/create.html#Target%20Environment%20Specification)). - -```bash -mkdir /home/$USER/workshop_env -module load Anaconda3 -conda config --set channel_priority strict -conda env create \ - --prefix /home/$USER/workshop_env \ - --file /$USER/jupyterlab_course/environment.yml \ - --quiet -``` +### 3. Prepare an activation file ### +Create a file to install the `ipykernel` to the user-folder, linking the central `workshop_env` to the ZIH JupyterLab. +An `activate_workshop_env.sh` should have the following content: -This may take a while. Once the environment is created, -logout from the job (<kbd>CTRL+D</kbd>) and test activation. 
-```bash -source /sw/installed/Anaconda3/2019.03/bin/activate ~/workshop_env +```console +/projects/jupyterlab_course/workshop_env/bin/python -m ipykernel install --user --name workshop_env --display-name="workshop_env" ``` -If successful, move the environment to the central course -project folder on `/projects` using the [Data Mover Tools](../../data_transfer/datamover/). - -```bash -dtmv /home/$USER/workshop_env /projects/jupyterlab_course/. -``` - -The process can be inspected using `squeue`. -``` -squeue -u $USER -``` - -Test the activation from the central `/projects` filesystem. - -```bash -source /sw/installed/Anaconda3/2019.03/bin/activate /projects/jupyterlab_course/workshop_env -``` +!!! note + The file for installing the kernel should also be added to the git repository. -## Prepare the spawn link +### 4. Prepare the spawn link ### Have a look at the instructions to prepare [a custom spawn link in combination with the git-pull feature](../jupyterhub_for_teaching/#combination-of-quickstart-and-git-pull-feature). -## Preparing activation of the custom environment in notebooks +## Usage on the Student's Side -When students open the notebooks (e.g.) through a Spawn Link that pulls the Git files -and notebooks from our repository, the conda central environment must be linked and activated -first. +### Preparing activation of the custom environment in notebooks +When students open the notebooks (e.g. through a Spawn Link that pulls the Git files +and notebooks from our repository), the Python environment must be activated first by installing a Jupyter kernel. This can be done inside the first notebook using a shell command (`.sh`). -Create a file called `activate_workshop_env.sh` in your repository. - -In the file, instructions are given to install the `ipykernel` to the user-folder, -linking the central `workshop_env` to the ZIH JupyterLab. 
- -```sh -/projects/jupyterlab_course/workshop_env/bin/python \ - -m ipykernel install \ - --user \ - --name workshop_env \ - --display-name="workshop_env" -``` - -The students will need to run this shell file, which can be done +Therefore the students will need to run the `activate_workshop_env.sh` file, which can be done in the first cell of the first notebook (e.g. inside `01_intro.ipynb`). In a code cell in `01_intro.ipynb`, add: -```bash +```console !cd .. && sh activate_workshop_env.sh ``` @@ -217,35 +116,35 @@ When students run this file, the following output signals a successful setup.  {: align="center"} -Afterwards, the `workshop_env` can be selected in the top-right corner of Jupyter Lab. +Afterwards, the `workshop_env` Jupyter kernel can be selected in the top-right corner of Jupyter Lab. !!! note A few seconds may be needed until the environment becomes available in the list. -# Test spawn link and environment activation +## Test spawn link and environment activation -During testing, it may be necessary to reset the workspace -to the initial state. There are two steps involved +During testing, it may be necessary to reset the workspace to the initial state. There are two steps involved. First, remove the cloned git repository in user home folder. !!! warning Check carefully the syntax below, to avoid removing the wrong files. -```bash +```console cd ~ rm -rf ./jupyterlab_course.git ``` Second, the IPython Kernel must be unlinked from the user workshop_env. -```bash +```console jupyter kernelspec uninstall workshop_env ``` -# Summary +## Summary The following video shows an example of the process of opening the spawn link and activating the environment, from the students perspective. +Note that this video shows the case for a conda virtual environment. <div align="center"> <video width="446" height="240" controls muted> @@ -260,4 +159,4 @@ Your browser does not support the video tag. 
- Students must be advised to _not_ click "Start My Server" or edit the form, if the server does not start automatically. - - If the server does not start automatically, click (or copy & paste) the spawn link again. \ No newline at end of file + - If the server does not start automatically, click (or copy & paste) the spawn link again. diff --git a/doc.zih.tu-dresden.de/docs/access/jupyterhub.md b/doc.zih.tu-dresden.de/docs/access/jupyterhub.md index f9a916195..5d548e629 100644 --- a/doc.zih.tu-dresden.de/docs/access/jupyterhub.md +++ b/doc.zih.tu-dresden.de/docs/access/jupyterhub.md @@ -95,10 +95,24 @@ directories or terminals. ## Jupyter Notebooks in General In JupyterHub you can create scripts in notebooks. Notebooks are programs which are split into -multiple logical code blocks. In between those code blocks you can insert text blocks for +multiple logical code blocks. In between those code blocks you can insert text blocks for documentation and each block can be executed individually. Each notebook is paired with a kernel running the code. We currently offer one for Python, C++, MATLAB and R. +### Version Control of Jupyter Notebooks with Git + +Since Jupyter notebooks are files containing multiple blocks for input code, documentation, +output and further information, it is difficult to use them with Git. Version tracking of +the `.ipynb` notebook files can be improved with the [Jupytext plugin](https://jupytext.readthedocs.io/en/latest/). +Jupytext will provide Markdown (`.md`) and Python (`.py`) conversions of notebooks on the fly, +next to `.ipynb`. Tracking these files will then provide a cleaner git history. A further +advantage is that Python notebook versions can be imported, which allows splitting larger notebooks +into smaller ones, based on chained imports. + +!!! note + The Jupytext plugin is not installed on the ZIH system at the moment. Therefore `ipynb` + files need to be made available in a repository for shared usage within the ZIH system. 
+ ## Stop a Session It is good practice to stop your session once your work is done. This releases resources for other diff --git a/doc.zih.tu-dresden.de/docs/access/jupyterhub_for_teaching.md b/doc.zih.tu-dresden.de/docs/access/jupyterhub_for_teaching.md index a261b59a1..e705ec473 100644 --- a/doc.zih.tu-dresden.de/docs/access/jupyterhub_for_teaching.md +++ b/doc.zih.tu-dresden.de/docs/access/jupyterhub_for_teaching.md @@ -140,7 +140,7 @@ and files via `chmod`. Set up your shared Python virtual environment for JupyterHub: === "virtualenv" - ```bash + ```console marie@compute$ module load Python #Load default Python [...] marie@compute$ ws_allocate -F scratch python_virtual_environment_teaching 1 @@ -163,7 +163,7 @@ Set up your shared Python virtual environment for JupyterHub: ``` === "conda" - ```bash + ```console marie@compute$ module load Anaconda3 #Load Anaconda [...] marie@compute$ ws_allocate -F scratch conda_virtual_environment_teaching 1 @@ -186,7 +186,7 @@ Set up your shared Python virtual environment for JupyterHub: Now, users have to install the kernel in order to use the shared Python virtual environment in JupyerHub: === "virtualenv" - ```bash + ```console marie@compute$ module load Python #Load default Python [...] marie@compute$ source /scratch/ws/1/python_virtual_environment_teaching/env/bin/activate #Activate virtual environment. Example output: (envtest) bash-4.2$ @@ -196,7 +196,7 @@ environment in JupyerHub: ``` === "conda" - ```bash + ```console marie@compute$ module load Anaconda3 #Load Anaconda [...] marie@compute$ conda activate /scratch/ws/1/conda_virtual_environment_teaching @@ -207,4 +207,11 @@ environment in JupyerHub: ``` After spawning the Notebook, you can select the kernel with the created Python -virtual environment. +virtual environment. +Note that you can also write the installation of the kernel (above steps) in a script file as +described in [JupyterHub Teaching Example](jupyter_teaching_example.md). + +!!!
hint + You can also execute the commands for installing the kernel from within a Jupyter notebook as described in + [JupyterHub Teaching Example](jupyter_teaching_example.md). Then users do not have to use the + command line interface after the preparation. diff --git a/doc.zih.tu-dresden.de/docs/software/python_virtual_environments.md b/doc.zih.tu-dresden.de/docs/software/python_virtual_environments.md index 67b10817c..8fabbc929 100644 --- a/doc.zih.tu-dresden.de/docs/software/python_virtual_environments.md +++ b/doc.zih.tu-dresden.de/docs/software/python_virtual_environments.md @@ -53,7 +53,7 @@ Info: creating workspace. [...] marie@compute$ virtualenv --system-site-packages /scratch/ws/1/python_virtual_environment/env #Create virtual environment [...] -marie@compute$ source /scratch/ws/1/python_virtual_environment/env/bin/activate #Activate virtual environment. Example output: (envtest) bash-4.2$ +marie@compute$ source /scratch/ws/1/python_virtual_environment/env/bin/activate #Activate virtual environment. Example output: (env) bash-4.2$ ``` Now you can work in this isolated environment, without interfering with other tasks running on the @@ -64,6 +64,27 @@ the virtual environment. You can deactivate the environment as follows: (env) marie@compute$ deactivate #Leave the virtual environment ``` +### Persistency of Python Virtual Environment + +To persist a virtualenv, you can store the names and versions of installed packages in a +file. Then you can restore this virtualenv by installing the packages from this file. +Use the `pip freeze` command for storing: + +```console +(env) marie@compute$ pip freeze > requirements.txt #Store the currently installed packages +``` + +Use the `pip install` command to install the packages from the file: + +```console +marie@compute$ module load Python #Load default Python +[...] +marie@compute$ virtualenv --system-site-packages /scratch/ws/1/python_virtual_environment/env_post #Create virtual environment +[...] 
+marie@compute$ source /scratch/ws/1/python_virtual_environment/env_post/bin/activate #Activate virtual environment. Example output: (env_post) bash-4.2$ +(env_post) marie@compute$ pip install -r requirements.txt #Install packages from the created requirements.txt file +``` + ## Conda Virtual Environment This example shows how to start working with **conda** and virtual environment (with using module @@ -122,3 +143,50 @@ are in the virtual environment. You can deactivate the conda environment as foll 0.10.0+cu102 (my-torch-env) marie@alpha$ deactivate ``` + +### Persistency of Conda Virtual Environment + +To persist a conda virtual environment, you can define an ***environment.yml*** file. +Have a look at the [conda docs](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html?highlight=environment.yml#create-env-file-manually) +for a description of the syntax. See an example for the ***environment.yml*** file below. + +??? example + ```yml + name: workshop_env + channels: + - conda-forge + - defaults + dependencies: + - python>=3.7 + - pip + - colorcet + - 'geoviews-core=1.8.1' + - 'ipywidgets=7.6.*' + - geopandas + - hvplot + - pyepsg + - python-dotenv + - 'shapely=1.7.1' + - pip: + - python-hll + ``` + +After specifying the `name`, the conda +[channel priority](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html) is defined. +In the example above, packages will be first installed from the `conda-forge` channel, and if not found, +from the `defaults` Anaconda channel. + +Below, dependencies can be specified. Optionally, <abbr title="Pinning is a process that +allows you to remain on a stable release while grabbing packages from a more recent version."> +pinning</abbr> can be used to delimit the packages installed to compatible package versions. 
+ +Finally, packages not available on conda can be specified (indented) below `- pip:` + +Recreate the conda virtual environment with the packages from the created **environment.yml** file: + +```console +marie@compute$ mkdir workshop_env #Create directory for environment +marie@compute$ module load Anaconda3 #Load Anaconda +marie@compute$ conda config --set channel_priority strict +marie@compute$ conda env create --prefix workshop_env --file environment.yml #Create conda env in directory with packages from environment.yml file +``` -- GitLab