From 6b296d8701aac46d73cd92d008921a8d37005ab8 Mon Sep 17 00:00:00 2001
From: Elias Werner <eliwerner3@googlemail.com>
Date: Wed, 15 Dec 2021 17:32:51 +0100
Subject: [PATCH] remove separate flink.md file and move jupyter flink content
 to big_data_frameworks.md

---
 .../docs/software/big_data_frameworks.md      |  34 +--
 doc.zih.tu-dresden.de/docs/software/flink.md  | 208 ------------------
 2 files changed, 21 insertions(+), 221 deletions(-)
 delete mode 100644 doc.zih.tu-dresden.de/docs/software/flink.md

diff --git a/doc.zih.tu-dresden.de/docs/software/big_data_frameworks.md b/doc.zih.tu-dresden.de/docs/software/big_data_frameworks.md
index 869e80aaf..d409f33c1 100644
--- a/doc.zih.tu-dresden.de/docs/software/big_data_frameworks.md
+++ b/doc.zih.tu-dresden.de/docs/software/big_data_frameworks.md
@@ -32,7 +32,6 @@ The steps are:
 
 Apache Spark can be used in [interactive](#interactive-jobs) and [batch](#batch-jobs) jobs as well
 as via [Jupyter notebooks](#jupyter-notebook). All three ways are outlined in the following.
-The usage of Flink with Jupyter notebooks is currently under examination.
 
 ## Interactive Jobs
 
@@ -238,27 +237,36 @@ example below:
 
 ## Jupyter Notebook
 
-You can run Jupyter notebooks with Spark on the ZIH systems in a similar way as described on the
-[JupyterHub](../access/jupyterhub.md) page. Interaction of Flink with JupyterHub is currently
-under examination and will be posted here upon availability.
+You can run Jupyter notebooks with Spark and Flink on the ZIH systems in a similar way as described
+on the [JupyterHub](../access/jupyterhub.md) page.
 
 ### Spawning a Notebook
 
 Go to [https://taurus.hrsk.tu-dresden.de/jupyter](https://taurus.hrsk.tu-dresden.de/jupyter).
-In the tab "Advanced", go to the field "Preload modules" and select the following Spark module:
+In the tab "Advanced", go to the field "Preload modules" and select the following Spark or Flink
+module:
+
+=== "Spark"
+    ```
+    Spark/3.0.1-Hadoop-2.7-Java-1.8-Python-3.7.4-GCCcore-8.3.0
+    ```
+=== "Flink"
+    ```
+    Flink/1.12.3-Java-1.8.0_161-OpenJDK-Python-3.7.4-GCCcore-8.3.0
+    ```
 
-```
-Spark/3.0.1-Hadoop-2.7-Java-1.8-Python-3.7.4-GCCcore-8.3.0
-```
 
-When your Jupyter instance is started, you can set up Spark. Since the setup in the notebook
-requires more steps than in an interactive session, we have created an example notebook that you can
-use as a starting point for convenience: [SparkExample.ipynb](misc/SparkExample.ipynb)
+When your Jupyter instance is started, you can set up Spark/Flink. Since the setup in the notebook
+requires more steps than in an interactive session, we have created example notebooks that you can
+use as a starting point for convenience:
+[SparkExample.ipynb](misc/SparkExample.ipynb),
+[FlinkExample.ipynb](misc/FlinkExample.ipynb)
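+
+Once the cluster has been started by the example notebook, a short smoke test can confirm that the
+setup works. The following cells are only sketches, assuming the session and environment objects
+are created as in the example notebooks:
+
+=== "Spark"
+    ```python
+    from pyspark.sql import SparkSession
+
+    # Connect to the Spark cluster started by the notebook setup
+    spark = SparkSession.builder.appName("smoke-test").getOrCreate()
+    print(spark.sparkContext.parallelize(range(100)).sum())  # expected: 4950
+    ```
+=== "Flink"
+    ```python
+    from pyflink.datastream import StreamExecutionEnvironment
+
+    # Run a trivial pipeline on the Flink cluster started by the notebook setup
+    env = StreamExecutionEnvironment.get_execution_environment()
+    env.from_collection([1, 2, 3]).map(lambda x: x * x).print()
+    env.execute("smoke-test")
+    ```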
 
 !!! warning
 
-    This notebook only works with the Spark module mentioned above. When using other Spark modules,
-    it is possible that you have to do additional or other steps in order to make Spark running.
+    The notebooks only work with the Spark or Flink module mentioned above. When using other
+    Spark/Flink modules, you may have to perform additional or different steps to get
+    Spark/Flink running.
 
 !!! note
 
diff --git a/doc.zih.tu-dresden.de/docs/software/flink.md b/doc.zih.tu-dresden.de/docs/software/flink.md
deleted file mode 100644
index bf4dddbae..000000000
--- a/doc.zih.tu-dresden.de/docs/software/flink.md
+++ /dev/null
@@ -1,208 +0,0 @@
-# Apache Flink
-
-[Apache Flink](https://flink.apache.org/) is a framework for processing and integrating Big Data.
-It offers an API similar to that of [Apache Spark](big_data_frameworks.md), but is more appropriate
-for data stream processing. You can check module versions and availability with the command:
-
-```console
-marie@login$ module avail Flink
-```
-
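-A specific version, for example the one used throughout this page, can then be loaded directly:
-
-```console
-marie@login$ module load Flink/1.12.3-Java-1.8.0_161-OpenJDK-Python-3.7.4-GCCcore-8.3.0
-```
-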
-**Prerequisites:** To work with the frameworks, you need [access](../access/ssh_login.md) to ZIH
-systems and basic knowledge about data analysis and the batch system
-[Slurm](../jobs_and_resources/slurm.md).
-
-The usage of Big Data frameworks differs from that of other modules due to their master-worker
-approach: additional steps have to be performed before an application can be started.
-
-The steps are:
-
-1. Load the Flink software module
-1. Configure the Flink cluster
-1. Start a Flink cluster
-1. Start the Flink application
-
-Apache Flink can be used in [interactive](#interactive-jobs) and [batch](#batch-jobs) jobs as
-described below.
-
-## Interactive Jobs
-
-### Default Configuration
-
-Let us assume that two nodes should be used for the computation. Use an `srun` command similar to
-the following to start an interactive session on the partition `haswell`. The code snippet below
-requests two nodes with 60 GB main memory exclusively for one hour:
-
-```console
-marie@login$ srun --partition=haswell --nodes=2 --mem=60g --exclusive --time=01:00:00 --pty bash -l
-```
-
-Once you have the shell, load Flink using the command
-
-```console
-marie@compute$ module load Flink
-```
-
-Before the application can be started, the Flink cluster needs to be set up. To do this, first
-configure Flink using the configuration template at `$FLINK_ROOT_DIR/conf`:
-
-```console
-marie@compute$ source framework-configure.sh flink $FLINK_ROOT_DIR/conf
-```
-
-This places the configuration in a directory called `cluster-conf-<JOB_ID>` in your `home`
-directory, where `<JOB_ID>` stands for the ID of the Slurm job. After that, you can start Flink in
-the usual way:
-
-```console
-marie@compute$ start-cluster.sh
-```
-
-The Flink processes should now be set up and you can start your application, e.g.:
-
-```console
-marie@compute$ flink run $FLINK_ROOT_DIR/examples/batch/KMeans.jar
-```
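-
-When you are done, shut the cluster down again before the allocation ends, using the counterpart
-of `start-cluster.sh` (it also appears in the batch example below):
-
-```console
-marie@compute$ stop-cluster.sh
-```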
-
-!!! warning
-
-    Do not delete the directory `cluster-conf-<JOB_ID>` while the job is still
-    running. This leads to errors.
-
-### Custom Configuration
-
-The script `framework-configure.sh` is used to derive a configuration from a template. It takes two
-parameters:
-
-- The framework to set up (Spark, Flink, Hadoop)
-- A configuration template
-
-Thus, you can modify the configuration by replacing the default configuration template with a
-customized one. This way, your custom configuration template is reusable for different jobs. You
-can start with a copy of the default configuration ahead of your interactive session:
-
-```console
-marie@login$ cp -r $FLINK_ROOT_DIR/conf my-config-template
-```
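-
-For example, to give each TaskManager more task slots, you could adjust the corresponding key in
-the copied `flink-conf.yaml` (a sketch; `taskmanager.numberOfTaskSlots` is a standard Flink
-configuration option, the value 4 is just an illustration):
-
-```console
-marie@login$ echo "taskmanager.numberOfTaskSlots: 4" >> my-config-template/flink-conf.yaml
-```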
-
-After you have changed `my-config-template`, you can use your new template in an interactive job
-with:
-
-```console
-marie@compute$ source framework-configure.sh flink my-config-template
-```
-
-### Using Hadoop Distributed Filesystem (HDFS)
-
-If you want to use Flink and HDFS together (or in general more than one framework), a scheme
-similar to the following can be used:
-
-```console
-marie@compute$ module load Hadoop
-marie@compute$ module load Flink
-marie@compute$ source framework-configure.sh hadoop $HADOOP_ROOT_DIR/etc/hadoop
-marie@compute$ source framework-configure.sh flink $FLINK_ROOT_DIR/conf
-marie@compute$ start-dfs.sh
-marie@compute$ start-cluster.sh
-```
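-
-With both clusters running, Flink jobs can read from and write to HDFS. A minimal sketch using the
-bundled WordCount example (the HDFS paths are hypothetical):
-
-```console
-marie@compute$ hdfs dfs -mkdir -p /user/marie
-marie@compute$ hdfs dfs -put input.txt /user/marie/input.txt
-marie@compute$ flink run $FLINK_ROOT_DIR/examples/batch/WordCount.jar \
-               --input hdfs:///user/marie/input.txt --output hdfs:///user/marie/result
-```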
-
-## Batch Jobs
-
-Using `srun` directly on the shell blocks the shell and launches an interactive job. Apart from
-short test runs, it is **recommended to launch your jobs in the background using batch jobs**. For
-that, you can conveniently put the parameters directly into the job file and submit it via
-`sbatch [options] <job file>`.
-
-Please use a [batch job](../jobs_and_resources/slurm.md) with a configuration similar to the
-example below:
-
-??? example "flink.sbatch"
-    ```bash
-    #!/bin/bash -l
-    #SBATCH --time=00:05:00
-    #SBATCH --partition=haswell
-    #SBATCH --nodes=2
-    #SBATCH --exclusive
-    #SBATCH --mem=60G
-    #SBATCH --job-name="example-flink"
-
-    ml Flink/1.12.3-Java-1.8.0_161-OpenJDK-Python-3.7.4-GCCcore-8.3.0
-
-    function myExitHandler () {
-        stop-cluster.sh
-    }
-
-    # Configuration
-    . framework-configure.sh flink $FLINK_ROOT_DIR/conf
-
-    # Register cleanup hook in case something goes wrong
-    trap myExitHandler EXIT
-
-    # Start the cluster
-    start-cluster.sh
-
-    # Run your application
-    flink run $FLINK_ROOT_DIR/examples/batch/KMeans.jar
-
-    # Stop the cluster
-    stop-cluster.sh
-
-    exit 0
-    ```
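-
-Assuming the job file is saved as `flink.sbatch`, submit it from a login node with:
-
-```console
-marie@login$ sbatch flink.sbatch
-```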
-
-!!! note
-
-    You could work with simple examples in your home directory, but, according to the
-    [storage concept](../data_lifecycle/overview.md), **please use
-    [workspaces](../data_lifecycle/workspaces.md) for your study and work projects**. For this
-    reason, use the advanced options of JupyterHub and put "/" in the "Workspace scope" field.
-
-## Jupyter Notebook
-
-You can run Jupyter notebooks with Flink on the ZIH systems in a similar way as described on the
-[JupyterHub](../access/jupyterhub.md) page.
-
-### Spawning a Notebook
-
-Go to [https://taurus.hrsk.tu-dresden.de/jupyter](https://taurus.hrsk.tu-dresden.de/jupyter).
-In the tab "Advanced", go to the field "Preload modules" and select the following Flink module:
-
-```
-Flink/1.12.3-Java-1.8.0_161-OpenJDK-Python-3.7.4-GCCcore-8.3.0
-```
-
-When your Jupyter instance is started, you can set up Flink. Since the setup in the notebook
-requires more steps than in an interactive session, we have created an example notebook that you can
-use as a starting point for convenience: [FlinkExample.ipynb](misc/FlinkExample.ipynb)
-
-!!! warning
-
-    This notebook only works with the Flink module mentioned above. When using other Flink modules,
-    you may have to perform additional or different steps to get Flink running.
-
-!!! note
-
-    You could work with simple examples in your home directory, but, according to the
-    [storage concept](../data_lifecycle/overview.md), **please use
-    [workspaces](../data_lifecycle/workspaces.md) for your study and work projects**. For this
-    reason, use the advanced options of JupyterHub and put "/" in the "Workspace scope" field.
-
-## FAQ
-
-Q: The command `source framework-configure.sh hadoop $HADOOP_ROOT_DIR/etc/hadoop` fails with
-`bash: framework-configure.sh: No such file or directory`. How can this be resolved?
-
-A: Please try to re-submit or re-run the job. If that does not help, re-login to the ZIH system.
-
-Q: There are a lot of errors and warnings during the setup of the session.
-
-A: Please check that the setup works by running one of the simple examples shown in this
-documentation.
-
-!!! help
-
-    If you have questions or need advice, please use the contact form on
-    [https://scads.ai/contact/](https://scads.ai/contact/) or contact the HPC support.
-- 
GitLab