Commit 54099187 authored by Martin Schroschk's avatar Martin Schroschk

Fix todos, fix spelling

parent d1d39941
In the following, a brief overview on relevant topics w.r.t. data life cycle management is given.
The main concept of working with data on ZIH systems is based on [Workspaces](workspaces.md). Use
them properly:
* use the `/home` directory for the limited amount of personal data, simple examples and the
  results of calculations. The home directory is not a working directory! However, the `/home`
  filesystem is [backed up](#backup) using snapshots;
* use `workspaces` as the place for working data (i.e. data sets); recommendations for choosing
  the correct storage system for a workspace are presented below.
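Workspaces are created and inspected with the `ws_*` tools. A minimal sketch, assuming the
`scratch` filesystem is available; the workspace name and duration are illustrative:

```shell
# Allocate a workspace named "input_data" on the scratch filesystem
# for 30 days (name, filesystem and duration are examples):
ws_allocate -F scratch input_data 30

# List all of your workspaces together with their remaining lifetime:
ws_list
```

`ws_allocate` prints the absolute path of the new workspace, which you can then use in job scripts.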
### Taxonomy of Filesystems
It is important to design your data workflow according to the characteristics of the storage in
order to use the provided storage and filesystems efficiently.
The page [filesystems](file_systems.md) holds comprehensive documentation on the different
filesystems.
!!! hint "Recommendations for choosing a storage system"

    * For data that seldom changes but consumes a lot of space, the
      [warm_archive](file_systems.md#warm_archive) can be used.
      (Note that this is mounted **read-only** on the compute nodes.)
    * For a series of calculations that work on the same data, please use a `scratch` based
      [workspace](workspaces.md).
    * **SSD**, in its turn, is the fastest available filesystem, made only for large parallel
      applications running with millions of small I/O operations (inputs, outputs).
    * If a batch job needs a directory for temporary data, then **SSD** is a good choice as well.
      The data can be deleted afterwards.
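For the temporary-data case, a job script can stage its scratch files on the SSD filesystem. A
minimal sketch, assuming a hypothetical mount point and a placeholder application; check
[filesystems](file_systems.md) for the actual paths:

```shell
#!/bin/bash
#SBATCH --time=01:00:00

# Hypothetical temporary directory on the SSD filesystem:
TEMP_DIR="/lustre/ssd/$USER/my_job_tmp"
mkdir -p "$TEMP_DIR"

# Placeholder application writing its scratch files to the SSD:
my_application --tmpdir "$TEMP_DIR"

# The temporary data can be deleted afterwards:
rm -rf "$TEMP_DIR"
```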
Keep in mind that every workspace has a storage duration. Thus, be careful with the expiry date,
otherwise your data could vanish. The core data of your project should be [backed up](#backup) and
the most important data should be [archived](preservation_research_data.md).
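Before the expiry date is reached, a workspace can be extended (the number of extensions is
limited). A minimal sketch with illustrative names:

```shell
# Extend the workspace "input_data" on scratch by another 30 days:
ws_extend -F scratch input_data 30
```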
### Backup
files can be restored directly by the users. Details can be found
Organizing your living data in the filesystem helps keep the project consistent. We recommend
following these rules in your work:
* Organizing the data: Never change the original data; automate the organization of the data;
  clearly separate intermediate and final output in the filenames; carry identifier and original
  name along in your analysis pipeline; make outputs clearly identifiable; document your analysis
  steps.
* Naming data: Keep names short but meaningful; keep standard file endings; file names don't
  replace documentation and metadata; use the standards of your discipline; make rules for your
  project, document them and keep them (see the [README recommendations](#readme-recommendation)
  below).
The following is an example of a hierarchical organization of the folder structure. Use it as a
visual illustration of the above:
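A minimal sketch of such a hierarchy (all names are illustrative):

```
Project/
├── README.md          # project documentation and conventions
├── data/
│   ├── raw/           # original data, never changed
│   └── processed/     # clearly named intermediate and final outputs
├── scripts/           # analysis pipeline
└── results/           # final, identifiable outputs
```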
Don't forget about data hygiene: Classify your current data into critical (need
its life cycle (from creation, storage and use to sharing, archiving and destruction); erase the
data you don't need throughout its life cycle.
## Access Rights
The concept of **permissions** and **ownership** is crucial in Linux. See the
[slides of HPC introduction](../misc/HPC-Introduction.pdf) for an understanding of the main
concept. The standard Linux command for changing permissions (i.e. `chmod`) is valid on ZIH
systems as well. The **group** access level contains the members of your project group. Be careful
with 'write' permission and never allow the original data to be changed.
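The "never allow the original data to be changed" rule can be enforced with `chmod`. A minimal
sketch with a hypothetical file name:

```shell
# Hypothetical raw data file:
touch results.csv

# Owner may read and write; group and others may only read,
# so project members cannot modify the original data:
chmod u=rw,go=r results.csv

# Verify the permissions:
ls -l results.csv    # shows -rw-r--r--
```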