Draft: Restructuring of Data Analytics and Machine Learning sections

Closed Christoph Lehmann requested to merge data_analytics_ML_neustrukturierung into preview
  • create new area for Data Analytics:
    • Overview: software/data_analytics.md
    • Data Analytics with R: software/data_analytics_with_r.md
    • Data Analytics with RStudio: software/data_analytics_with_rstudio.md
    • Data Analytics with Python: software/data_analytics_with_python.md
    • Apache Spark: software/big_data_frameworks_spark.md
  • create new area for Machine Learning:
    • Overview: software/machine_learning.md
    • TensorFlow: software/tensorflow.md
    • PyTorch: software/pytorch.md
    • Tensorboard: software/tensorboard.md
    • Distributed Training: software/distributed_training.md
    • Hyperparameter Optimization (OmniOpt): software/hyperparameter_optimization.md
  • fuse information from the following pages into the structure above and delete these:
    • docs/software/data_analytics_with_r.md
    • docs/software/deep_learning.md
    • docs/software/get_started_with_hpcda.md
    • docs/software/keras.md
    • docs/software/machine_learning.md
    • docs/software/python.md
    • docs/software/tensor_flow_container_on_hpcda.md
    • docs/software/tensor_flow.md
    • docs/software/tensor_flow_on_jupyter_notebook.md

Closes #157 (closed), #156 (closed), #155 (closed), #154 (closed), #153 (closed), #152 (closed), #151 (closed), #120 (closed), #119 (closed), #118 (closed), #112 (closed), #111 (closed), #110 (closed), #105 (closed), #103 (closed), #101 (closed), #99 (closed), #98 (closed)

Edited by Jan Frenzel

Activity

  • 152 Dask supports several user interfaces:
    153
    154 High-Level:
    155
    156 - Arrays: Parallel NumPy
    157 - Bags: Parallel lists
    158 - DataFrames: Parallel Pandas
    159 - Machine Learning: Parallel Scikit-Learn
    160 - Others from external projects, like XArray
    161
    162 Low-Level:
    163
    164 - Delayed: Parallel function evaluation
    165 - Futures: Real-time parallel function evaluation
    166
    167 ### Installation
    168
    169 ### Installation Using Conda
    170
    171 Dask is installed by default in [Anaconda](https://www.anaconda.com/download/). To install or update
    172 Dask on Taurus using [conda](https://www.anaconda.com/download/), follow the example:
    173
    174 ```Bash
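
The two low-level interfaces in the excerpt above differ mainly in when work starts: a Delayed-style task is only recorded and runs when its result is requested, while a Futures-style task is scheduled as soon as it is submitted. The following minimal sketch illustrates that difference using only the Python standard library. It is not Dask's API: the `Lazy` class and the `square` and `add` functions are hypothetical names introduced for illustration, and `ThreadPoolExecutor` stands in for a Dask scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """Toy task standing in for an expensive computation."""
    return x * x


def add(a, b):
    """Toy task that combines two intermediate results."""
    return a + b


class Lazy:
    """Hypothetical helper (not part of Dask): records a call and runs it on compute()."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def compute(self):
        # Evaluate nested Lazy arguments first, then call the recorded function.
        resolved = [a.compute() if isinstance(a, Lazy) else a for a in self.args]
        return self.func(*resolved)


# Delayed style: build the task graph first; nothing has run at this point.
graph = Lazy(add, Lazy(square, 3), Lazy(square, 4))
delayed_result = graph.compute()  # the work happens only here

# Futures style: each task is scheduled as soon as it is submitted.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(square, x) for x in range(5)]
    futures_result = [f.result() for f in futures]

print(delayed_result)
print(futures_result)
```

In Dask itself, the corresponding entry points are `dask.delayed` for the lazy style and `Client.submit` from `dask.distributed` for the futures style; the sketch only mirrors their evaluation order, not their parallel or distributed execution.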
  • 223 module load PythonAnaconda/3.6
    224 which python
    225
    226 python3 -m venv --system-site-packages dask-test
    227 source dask-test/bin/activate
    228 python -m pip install "dask[complete]"
    229
    230 python
    231 from dask.distributed import Client, progress
    232 client = Client(n_workers=4, threads_per_worker=1)
    233 client
    234 ```
    235
    236 Distributed scheduler
    237
    238 ?
    239
    240 ### Run Dask on Taurus
    241
    242 The preferred and simplest way to run Dask on HPC systems today, for new users, experienced users,
    243 and administrators alike, is to use [dask-jobqueue](https://jobqueue.dask.org/).
    244
    245 You can install dask-jobqueue with `pip` or `conda`.
    246
    247 Installation with Pip
    248
    249 ```Bash
  • 3 3 On the machine learning nodes, you can use the tools from [IBM Power
    4 4 AI](power_ai.md).
    5 5
    6 # Get started with HPC-DA
    7
    8 HPC-DA (High-Performance Computing and Data Analytics) is part of the general-purpose HPC cluster
    9 (Taurus) at TU Dresden. HPC-DA is the best **option** for **machine learning and deep learning**
    10 applications and for big data tasks.
    11
    12 **This is an introduction to running machine learning applications on the HPC-DA system.**
    13
    14 The main **aim** of this guide is to help users who have started working with Taurus and are focused on
    15 working with machine learning frameworks such as TensorFlow or PyTorch.
  • 37
    38 The main feature of the Power9 architecture (ppc64le) is its ability to work with the
    39 [NVIDIA Tesla V100](https://www.nvidia.com/en-gb/data-center/tesla-v100/) GPU with **NV-Link**
    40 support. NV-Link technology provides a total bandwidth of 300 gigabytes per second (GB/s)
    41
    42 - 10X the bandwidth of PCIe Gen 3. Bandwidth is a crucial factor for deep learning and machine
    43 learning applications.
    44
    45 **Note:** The Power9 architecture is not as common as the x86 architecture. This means you are less
    46 flexible in choosing applications for your projects. Even so, the main tools and applications are
    47 available. See available modules here.
    48
    49 **Please use the ml partition if you need GPUs!** Otherwise, using the x86 partitions (e.g., Haswell)
    50 will most likely be more beneficial.
    51
    52 ## Start your application
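
The "10X" figure in the excerpt above can be checked with simple arithmetic. The sketch below takes the 300 GB/s NV-Link total from the text and assumes roughly 32 GB/s of bidirectional bandwidth for a PCIe Gen 3 x16 link; that PCIe value and the 16 GB payload size are assumptions made for illustration, not numbers from the reviewed page. Both bandwidths are theoretical peaks, so real transfers will be slower.

```python
# Peak bandwidths in gigabytes per second.
NVLINK_GB_PER_S = 300.0    # total NV-Link bandwidth quoted in the text
PCIE3_X16_GB_PER_S = 32.0  # assumption: approximate bidirectional PCIe Gen 3 x16 peak


def transfer_seconds(size_gb, bandwidth_gb_per_s):
    """Idealized transfer time: payload size divided by peak bandwidth."""
    return size_gb / bandwidth_gb_per_s


# Ratio behind the rounded "10X" claim.
speedup = NVLINK_GB_PER_S / PCIE3_X16_GB_PER_S

# Hypothetical payload, e.g. model weights plus a batch of training data.
payload_gb = 16.0
nvlink_time = transfer_seconds(payload_gb, NVLINK_GB_PER_S)
pcie_time = transfer_seconds(payload_gb, PCIE3_X16_GB_PER_S)

print(f"NV-Link / PCIe Gen 3 ratio: {speedup:.2f}x")
print(f"{payload_gb:.0f} GB over NV-Link: {nvlink_time * 1000:.1f} ms")
print(f"{payload_gb:.0f} GB over PCIe Gen 3: {pcie_time * 1000:.1f} ms")
```

Under these assumptions the ratio is a little below ten, which is consistent with the rounded "10X" in the text. The estimate ignores latency, protocol overhead, and contention.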
  • Martin Schroschk marked this merge request as draft

  • Taras Lazariv added 1 commit

    • f3f11d9e - Move rstudio part to new file and update launcher image
