From 786bf032935c3b8dd9262719fb702599cd29e90c Mon Sep 17 00:00:00 2001
From: Christoph Lehmann <christoph.lehmann@tu-dresden.de>
Date: Thu, 5 Aug 2021 17:34:47 +0200
Subject: [PATCH] basic structuring, DA section finished

---
 .../docs/{software => archive}/power_ai.md    |   0
 doc.zih.tu-dresden.de/docs/software/dask.md   | 136 -----------------
 ...ython.md => data_analytics_with_python.md} | 137 ++++++++++++++++++
 ...udio.md => data_analytics_with_rstudio.md} |   0
 doc.zih.tu-dresden.de/mkdocs.yml              |   4 +-
 5 files changed, 139 insertions(+), 138 deletions(-)
 rename doc.zih.tu-dresden.de/docs/{software => archive}/power_ai.md (100%)
 delete mode 100644 doc.zih.tu-dresden.de/docs/software/dask.md
 rename doc.zih.tu-dresden.de/docs/software/{python.md => data_analytics_with_python.md} (77%)
 rename doc.zih.tu-dresden.de/docs/software/{rstudio.md => data_analytics_with_rstudio.md} (100%)

diff --git a/doc.zih.tu-dresden.de/docs/software/power_ai.md b/doc.zih.tu-dresden.de/docs/archive/power_ai.md
similarity index 100%
rename from doc.zih.tu-dresden.de/docs/software/power_ai.md
rename to doc.zih.tu-dresden.de/docs/archive/power_ai.md
diff --git a/doc.zih.tu-dresden.de/docs/software/dask.md b/doc.zih.tu-dresden.de/docs/software/dask.md
deleted file mode 100644
index d6f7d087e..000000000
--- a/doc.zih.tu-dresden.de/docs/software/dask.md
+++ /dev/null
@@ -1,136 +0,0 @@
-# Dask
-
-**Dask** is an open-source library for parallel computing. Dask is a flexible library for parallel
-computing in Python.
-
-Dask natively scales Python. It provides advanced parallelism for analytics, enabling performance at
-scale for some of the popular tools. For instance: Dask arrays scale Numpy workflows, Dask
-dataframes scale Pandas workflows, Dask-ML scales machine learning APIs like Scikit-Learn and
-XGBoost.
-
-Dask is composed of two parts:
-
-- Dynamic task scheduling optimized for computation and interactive
-  computational workloads.
-- Big Data collections like parallel arrays, data frames, and lists
-  that extend common interfaces like NumPy, Pandas, or Python
-  iterators to larger-than-memory or distributed environments. These
-  parallel collections run on top of dynamic task schedulers.
-
-Dask supports several user interfaces:
-
-High-Level:
-
-- Arrays: Parallel NumPy
-- Bags: Parallel lists
-- DataFrames: Parallel Pandas
-- Machine Learning : Parallel Scikit-Learn
-- Others from external projects, like XArray
-
-Low-Level:
-
-- Delayed: Parallel function evaluation
-- Futures: Real-time parallel function evaluation
-
-## Installation
-
-### Installation Using Conda
-
-Dask is installed by default in [Anaconda](https://www.anaconda.com/download/). To install/update
-Dask on a Taurus with using the [conda](https://www.anaconda.com/download/) follow the example:
-
-```Bash
-# Job submission in ml nodes with allocating: 1 node, 1 gpu per node, 4 hours
-srun -p ml -N 1 -n 1 --mem-per-cpu=5772 --gres=gpu:1 --time=04:00:00 --pty bash
-```
-
-Create a conda virtual environment. We would recommend using a workspace. See the example (use
-`--prefix` flag to specify the directory).
-
-**Note:** You could work with simple examples in your home directory (where you are loading by
-default). However, in accordance with the
-[HPC storage concept](../data_lifecycle/hpc_storage_concept2019.md) please use a
-[workspaces](../data_lifecycle/workspaces.md) for your study and work projects.
-
-```Bash
-conda create --prefix /scratch/ws/0/aabc1234-Workproject/conda-virtual-environment/dask-test python=3.6
-```
-
-By default, conda will locate the environment in your home directory:
-
-```Bash
-conda create -n dask-test python=3.6
-```
-
-Activate the virtual environment, install Dask and verify the installation:
-
-```Bash
-ml modenv/ml
-ml PythonAnaconda/3.6
-conda activate /scratch/ws/0/aabc1234-Workproject/conda-virtual-environment/dask-test python=3.6
-which python
-which conda
-conda install dask
-python
-
-from dask.distributed import Client, progress
-client = Client(n_workers=4, threads_per_worker=1)
-client
-```
-
-### Installation Using Pip
-
-You can install everything required for most common uses of Dask (arrays, dataframes, etc)
-
-```Bash
-srun -p ml -N 1 -n 1 --mem-per-cpu=5772 --gres=gpu:1 --time=04:00:00 --pty bash
-
-cd /scratch/ws/0/aabc1234-Workproject/python-virtual-environment/dask-test
-
-ml modenv/ml
-module load PythonAnaconda/3.6
-which python
-
-python3 -m venv --system-site-packages dask-test
-source dask-test/bin/activate
-python -m pip install "dask[complete]"
-
-python
-from dask.distributed import Client, progress
-client = Client(n_workers=4, threads_per_worker=1)
-client
-```
-
-Distributed scheduler
-
-?
-
-## Run Dask on Taurus
-
-The preferred and simplest way to run Dask on HPC systems today both for new, experienced users or
-administrator is to use [dask-jobqueue](https://jobqueue.dask.org/).
-
-You can install dask-jobqueue with `pip` or `conda`
-
-Installation with Pip
-
-```Bash
-srun -p haswell -N 1 -n 1 -c 4 --mem-per-cpu=2583 --time=01:00:00 --pty bash
-cd
-/scratch/ws/0/aabc1234-Workproject/python-virtual-environment/dask-test
-ml modenv/ml module load PythonAnaconda/3.6 which python
-
-source dask-test/bin/activate pip
-install dask-jobqueue --upgrade # Install everything from last released version
-```
-
-Installation with Conda
-
-```Bash
-srun -p haswell -N 1 -n 1 -c 4 --mem-per-cpu=2583 --time=01:00:00 --pty bash
-
-ml modenv/ml module load PythonAnaconda/3.6 source
-dask-test/bin/activate
-
-conda install dask-jobqueue -c conda-forge\</verbatim>
-```
diff --git a/doc.zih.tu-dresden.de/docs/software/python.md b/doc.zih.tu-dresden.de/docs/software/data_analytics_with_python.md
similarity index 77%
rename from doc.zih.tu-dresden.de/docs/software/python.md
rename to doc.zih.tu-dresden.de/docs/software/data_analytics_with_python.md
index 281d1fd99..f5121b33c 100644
--- a/doc.zih.tu-dresden.de/docs/software/python.md
+++ b/doc.zih.tu-dresden.de/docs/software/data_analytics_with_python.md
@@ -130,6 +130,143 @@ the Jupyterhub.
 **Keep in mind that the remote Jupyter server can offer more freedom with settings and
 approaches.**
 
+## Dask
+
+**Dask** is an open-source, flexible library for parallel computing in Python.
+
+Dask natively scales Python: it provides advanced parallelism for analytics, enabling performance
+at scale for many popular tools. For instance, Dask arrays scale NumPy workflows, Dask dataframes
+scale Pandas workflows, and Dask-ML scales machine learning APIs like Scikit-Learn and XGBoost.
+
+Dask is composed of two parts:
+
+- Dynamic task scheduling optimized for computation and interactive
+  computational workloads.
+- Big Data collections like parallel arrays, data frames, and lists
+  that extend common interfaces like NumPy, Pandas, or Python
+  iterators to larger-than-memory or distributed environments. These
+  parallel collections run on top of dynamic task schedulers, as
+  illustrated in the sketch below.
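+
+As a small illustration of these two parts (the array and chunk sizes are arbitrary example
+values), a Dask array offers the familiar NumPy interface while splitting the data into chunks
+that the task scheduler processes in parallel:
+
+```python
+import dask.array as da
+
+# A 10000x10000 random array, split into 1000x1000 chunks
+x = da.random.random((10000, 10000), chunks=(1000, 1000))
+
+# Operations only build a lazy task graph; compute() executes it in parallel
+print(x.mean().compute())
+```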
+
+Dask supports several user interfaces:
+
+High-Level:
+
+- Arrays: Parallel NumPy
+- Bags: Parallel lists
+- DataFrames: Parallel Pandas
+- Machine Learning: Parallel Scikit-Learn
+- Others from external projects, like XArray
+
+Low-Level:
+
+- Delayed: Parallel function evaluation
+- Futures: Real-time parallel function evaluation
+
+### Installation
+
+#### Installation Using Conda
+
+Dask is installed by default in [Anaconda](https://www.anaconda.com/download/). To install or
+update Dask on Taurus using [conda](https://www.anaconda.com/download/), follow this example:
+
+```Bash
+# Job submission on the ml partition, allocating: 1 node, 1 GPU per node, 4 hours
+srun -p ml -N 1 -n 1 --mem-per-cpu=5772 --gres=gpu:1 --time=04:00:00 --pty bash
+```
+
+Create a conda virtual environment. We recommend using a workspace; use the `--prefix` flag to
+specify its directory, as in the example below.
+
+**Note:** You could work with simple examples in your home directory (where you are placed by
+default). However, in accordance with the
+[HPC storage concept](../data_lifecycle/hpc_storage_concept2019.md), please use a
+[workspace](../data_lifecycle/workspaces.md) for your study and work projects.
+
+```Bash
+conda create --prefix /scratch/ws/0/aabc1234-Workproject/conda-virtual-environment/dask-test python=3.6
+```
+
+By default (without the `--prefix` flag), conda will locate the environment in your home
+directory:
+
+```Bash
+conda create -n dask-test python=3.6
+```
+
+Activate the virtual environment, install Dask and verify the installation:
+
+```Bash
+ml modenv/ml
+ml PythonAnaconda/3.6
+conda activate /scratch/ws/0/aabc1234-Workproject/conda-virtual-environment/dask-test
+which python
+which conda
+conda install dask
+python
+
+# inside the Python interpreter:
+from dask.distributed import Client, progress
+client = Client(n_workers=4, threads_per_worker=1)
+client
+```
+
+#### Installation Using Pip
+
+You can install everything required for most common uses of Dask (arrays, dataframes, etc.) with
+pip:
+
+```Bash
+srun -p ml -N 1 -n 1 --mem-per-cpu=5772 --gres=gpu:1 --time=04:00:00 --pty bash
+
+cd /scratch/ws/0/aabc1234-Workproject/python-virtual-environment
+
+ml modenv/ml
+module load PythonAnaconda/3.6
+which python
+
+python3 -m venv --system-site-packages dask-test
+source dask-test/bin/activate
+python -m pip install "dask[complete]"
+
+python
+# inside the Python interpreter:
+from dask.distributed import Client, progress
+client = Client(n_workers=4, threads_per_worker=1)
+client
+```
+
+Note that both examples above already use Dask's distributed scheduler: creating a `Client`
+starts a local scheduler and worker processes, and subsequent Dask computations run on them.
+
+### Run Dask on Taurus
+
+The preferred and simplest way to run Dask on HPC systems today, for new and experienced users as
+well as administrators, is to use [dask-jobqueue](https://jobqueue.dask.org/).
+
+You can install dask-jobqueue with `pip` or `conda`.
+
+Installation with Pip:
+
+```Bash
+srun -p haswell -N 1 -n 1 -c 4 --mem-per-cpu=2583 --time=01:00:00 --pty bash
+
+cd /scratch/ws/0/aabc1234-Workproject/python-virtual-environment
+
+ml modenv/ml
+module load PythonAnaconda/3.6
+which python
+
+source dask-test/bin/activate
+pip install dask-jobqueue --upgrade   # install everything from the last released version
+```
+
+Installation with Conda:
+
+```Bash
+srun -p haswell -N 1 -n 1 -c 4 --mem-per-cpu=2583 --time=01:00:00 --pty bash
+
+ml modenv/ml
+module load PythonAnaconda/3.6
+source dask-test/bin/activate
+
+conda install dask-jobqueue -c conda-forge
+```
+
 ## MPI for Python
 
 Message Passing Interface (MPI) is a standardized and portable
diff --git a/doc.zih.tu-dresden.de/docs/software/rstudio.md b/doc.zih.tu-dresden.de/docs/software/data_analytics_with_rstudio.md
similarity index 100%
rename from doc.zih.tu-dresden.de/docs/software/rstudio.md
rename to doc.zih.tu-dresden.de/docs/software/data_analytics_with_rstudio.md
diff --git a/doc.zih.tu-dresden.de/mkdocs.yml b/doc.zih.tu-dresden.de/mkdocs.yml
index e200c93f7..ca42ea609 100644
--- a/doc.zih.tu-dresden.de/mkdocs.yml
+++ b/doc.zih.tu-dresden.de/mkdocs.yml
@@ -49,8 +49,8 @@ nav:
     - Data Analytics:
       - Overview: software/data_analytics.md
       - Data Analytics with R: software/data_analytics_with_r.md
-      - Data Analytics with RStudio: software/rstudio.md
-      - Data Analytics with Python: software/python.md
+      - Data Analytics with RStudio: software/data_analytics_with_rstudio.md
+      - Data Analytics with Python: software/data_analytics_with_python.md
       - Dask: software/dask.md
       - Power AI: software/power_ai.md
       - Apache Spark, Apache Flink, Apache Hadoop: software/big_data_frameworks.md
--
GitLab