```console
marie@login$ srun --partition=haswell --nodes=2 --mem=60000M --exclusive --time=01:00:00 --pty bash -l
```

Do not delete the directory `cluster-conf-<JOB_ID>` while the job is still
running. Deleting it leads to errors.
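
One way to guard against deleting the directory too early is to check the Slurm queue first. The
following is a minimal sketch, not part of the provided tooling: the function names
`job_is_active` and `remove_cluster_conf` are hypothetical, and it assumes that `squeue` prints
nothing for a job that has already left the queue.

```bash
#!/bin/bash

# Hypothetical helper: succeeds while Slurm still lists the job (pending or running).
# Assumption: squeue prints no job line once the job has left the queue.
job_is_active() {
    local job_id="$1"
    [ -n "$(squeue --noheader --jobs="$job_id" 2>/dev/null)" ]
}

# Hypothetical helper: remove a cluster-conf directory only after its job has finished.
# $1 = job ID, $2 = path to the matching cluster-conf-<JOB_ID> directory
remove_cluster_conf() {
    local job_id="$1" conf_dir="$2"
    if job_is_active "$job_id"; then
        echo "Job $job_id is still active, keeping $conf_dir" >&2
        return 1
    fi
    rm -rf -- "$conf_dir"
}
```

A call would pass the job ID and the path of the directory created for that job, for example
`remove_cluster_conf <JOB_ID> cluster-conf-<JOB_ID>`. The sketch treats an empty `squeue` answer
as "finished", so it should only be used where `squeue` is known to work.
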
=== "Spark"
    ```bash
    module load Spark/3.0.1-Hadoop-2.7-Java-1.8-Python-3.7.4-GCCcore-8.3.0

    function myExitHandler () {
        stop-all.sh
    }

    #configuration
    . framework-configure.sh spark $SPARK_HOME/conf

    #register cleanup hook in case something goes wrong
    trap myExitHandler EXIT

    #start the cluster
    start-all.sh

    #run your application
    spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.1.jar 1000

    #stop the cluster
    stop-all.sh

    exit 0
    ```
=== "Flink"
    ```bash
    #!/bin/bash -l
    #SBATCH --time=01:00:00
    #SBATCH --partition=haswell
    #SBATCH --nodes=2
    #SBATCH --exclusive
    #SBATCH --mem=60000M
    #SBATCH --job-name="example-flink"

    module load Flink/1.12.3-Java-1.8.0_161-OpenJDK-Python-3.7.4-GCCcore-8.3.0

    function myExitHandler () {
        stop-cluster.sh
    }

    #configuration
    . framework-configure.sh flink $FLINK_ROOT_DIR/conf

    #register cleanup hook in case something goes wrong
    trap myExitHandler EXIT

    #start the cluster
    start-cluster.sh

    #run your application
    flink run $FLINK_ROOT_DIR/examples/batch/KMeans.jar

    #stop the cluster
    stop-cluster.sh

    exit 0
    ```
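
Both scripts rely on the same pattern: a handler registered with `trap ... EXIT` runs when the
shell exits, whether the application succeeded or failed. The sketch below isolates that pattern
with stand-in commands. `run_with_cleanup` and the `echo` placeholders are hypothetical and only
mark where the start script, the application, and the stop script would go.

```bash
#!/bin/bash

# Hypothetical wrapper illustrating the trap-on-EXIT pattern from the scripts above.
# The body runs in a subshell ( ... ), so the EXIT trap fires when the wrapper ends
# and does not affect the calling shell.
run_with_cleanup() (
    cleanup() {
        echo "stopping cluster"      # stand-in for stop-all.sh / stop-cluster.sh
    }
    trap cleanup EXIT

    echo "starting cluster"          # stand-in for start-all.sh / start-cluster.sh
    "$@"                             # the application; its exit status is passed through
)

run_with_cleanup true                # cleanup runs after a successful application
run_with_cleanup false || echo "application failed with status $?"
```

In both calls the cleanup message is printed after the application returns, and the failing
application's exit status is still visible to the caller. Note that in the scripts above, a
normal run calls the stop script explicitly and then once more through the handler when `exit 0`
triggers the trap.
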

```console
marie@login$ srun --pty --ntasks=1 --cpus-per-task=2 --mem-per-cpu=2500 --time=01:00:00 bash -l
marie@compute$ cd $HOME
marie@compute$ mkdir jupyter-kernel
marie@compute$ module load Python
marie@compute$ virtualenv --system-site-packages jupyter-kernel/env  #Create virtual environment
[...]
marie@compute$ source jupyter-kernel/env/bin/activate    #Activate virtual environment.
(env) marie@compute$ pip install ipykernel
[...]
(env) marie@compute$ python -m ipykernel install --user --name haswell-py3.7-spark --display-name="haswell-py3.7-spark"
Installed kernelspec haswell-py3.7-spark in [...]

(env) marie@compute$ pip install findspark
(env) marie@compute$ deactivate
```

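
To confirm that the kernel was registered, you can look for its `kernel.json` in the per-user
Jupyter data directory. This is a rough sketch: `kernel_installed` is a hypothetical helper, and
the lookup path assumes Jupyter's default Linux location (`~/.local/share/jupyter`, or the
directory given by `JUPYTER_DATA_DIR` or `XDG_DATA_HOME`).

```bash
#!/bin/bash

# Hypothetical helper: succeeds if a user-level kernelspec with the given name exists.
# Assumption: kernels installed with --user live under <data dir>/kernels/<name>/.
kernel_installed() {
    local name="$1"
    local data_dir="${JUPYTER_DATA_DIR:-${XDG_DATA_HOME:-$HOME/.local/share}/jupyter}"
    [ -f "$data_dir/kernels/$name/kernel.json" ]
}

if kernel_installed haswell-py3.7-spark; then
    echo "kernel haswell-py3.7-spark is registered"
else
    echo "kernel haswell-py3.7-spark not found"
fi
```

If the `jupyter` command is available in your environment, `jupyter kernelspec list` gives the
same information without relying on the directory layout.
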
You can work with simple examples in your home directory, but, in line with the
[storage concept](../data_lifecycle/overview.md), **please use
[workspaces](../data_lifecycle/workspaces.md) for your study and work projects**. To do so,
use the advanced options of JupyterHub and enter "/" in the "Workspace scope" field.
If you have questions or need advice, please use the contact form on
[https://scads.ai/contact/](https://scads.ai/contact/) or contact the HPC support.