Commit 6bcccd7b authored by Martin Schroschk

Review page

- This is the(!) entry page for Data Life Cycle Management, not
  filesystems in detail
- Remove desc. of filesystem properties from this page; it is available on the
  filesystem page
parent 8572e36b
2 merge requests: !1138 Automated merge from preview to main, !1134 Review documentation w.r.t. filesystems and hardware
@@ -8,44 +8,30 @@ uniformity of the project can be achieved by taking into account and setting up
 * a defined **data life cycle management** including the same **data storage** or set of them,
 * and **access rights** to project data.
-The used set of software within an HPC project can be management with environments on different
+The used set of software within an HPC project can be managed with environments on different
 levels either defined by [modules](../software/modules.md), [containers](../software/containers.md)
 or by [Python virtual environments](../software/python_virtual_environments.md).
 In the following, a brief overview on relevant topics w.r.t. data life cycle management is provided.
 ## Data Storage and Management
-The main concept of working with data on ZIH systems bases on [Workspaces](workspaces.md). Use it
-properly:
-* use your personal `/home` directory for the limited amount of personal data, simple examples and
-the results of calculations. Your `home` directory is not a working directory! However, `/home`
-filesystem is [backed up](#backup);
-* use `workspaces` as a place for your working data (i.e. data sets). Recommendations of choosing
-the correct filesystem for your workspaces is presented in the following subsection.
-### Taxonomy of Filesystems
-It is important to design your data workflow according to characteristics, like I/O footprint
-(bandwidth/IOPS) of the application, size of the data, (number of files,) and duration of the
-storage to efficiently use the provided storage and filesystems.
-The page [filesystems](file_systems.md) holds a comprehensive documentation on the different
-filesystems.
-!!! hint "Recommendations to choose of storage system"
-* For a series of calculations that works on the same data please use a `scratch` based
-[workspace](workspaces.md).
-* For data that seldom changes but consumes a lot of space, the
-[`walrus` filesystem](working.md) can be used.
-* If your batch job needs a directory for temporary data then node-local storage (`/tmp`) is
-a good choice. The data will be deleted when the job has finished. The subsection
-[Node-Local Storage in Jobs](../jobs_and_resources/slurm.md#node-local-storage-in-jobs) holds
-valuable information on this topic.
-Keep in mind that every workspace has a storage duration. Thus, be careful with the expire date
-otherwise it could vanish. The core data of your project should be [backed up](#backup) and the most
-important data should be [archived (long-term preservation)](longterm_preservation.md).
+In general, you should separate your data and store it on the appropriate storage and filesystem.
+What is the appropriate storage and filesystem depends on the amount/volume of data and its kind,
+and might differ over time. Please note the following rules of thumb:
+* Use your personal `/home` directory for the limited amount of *personal data*, e.g., simple
+examples and the results of calculations. Your `/home` directory is not a working directory!
+However, `/home` filesystem is [backed up](#backup). The section
+[Global `/home` Filesystem](permanent.md#global-home-filesystem) provides additional information.
+* Use your `/project` directory for project-related data. This directory enables collaboration by
+sharing data with colleagues and project members. Please refer to the section
+[Global `/projects` Filesystem](permanent.md#global-projects-filesystem) for further information.
+* Use [`workspaces`](workspaces.md) as a place for your *working data* (i.e. data sets).
+Recommendations of choosing the most suitable filesystem for your workspaces is presented on the
+page [Working Filesystems](working.md).
+* Use the [Intermediate Archive](intermediate_archive.md) and the
+[Long-Term Archive](longterm_preservation.md) to store all kind of data that needs to be kept
+for a long time, e.g. result data.
 ### Backup
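The rules of thumb introduced in the hunk above can be read as a simple lookup from the kind of data to a storage class. The following is only an illustrative sketch of that reading: the names `DataKind`, `RECOMMENDED_STORAGE`, and `recommend_storage` are hypothetical and not part of any ZIH tooling, and the storage labels are plain descriptive strings rather than real mount points.

```python
from enum import Enum


class DataKind(Enum):
    """Hypothetical categories mirroring the four rules of thumb."""
    PERSONAL = "personal"    # limited amount of personal data, simple examples
    PROJECT = "project"      # data shared with colleagues and project members
    WORKING = "working"      # working data, i.e. data sets
    LONG_TERM = "long_term"  # data that needs to be kept for a long time


# Descriptive labels only (assumption): these are not actual paths.
RECOMMENDED_STORAGE = {
    DataKind.PERSONAL: "home directory",
    DataKind.PROJECT: "project directory",
    DataKind.WORKING: "workspace",
    DataKind.LONG_TERM: "intermediate or long-term archive",
}


def recommend_storage(kind: DataKind) -> str:
    """Return the storage class the rules of thumb suggest for `kind`."""
    return RECOMMENDED_STORAGE[kind]


if __name__ == "__main__":
    for kind in DataKind:
        print(f"{kind.value}: {recommend_storage(kind)}")
```

As the new page text notes, the appropriate storage may differ over time, so a static table like this is at best a starting point for classifying data.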
@@ -117,7 +103,7 @@ Don't forget about data hygiene: Classify your current data into critical (need
 its life cycle (from creation, storage and use to sharing, archiving and destruction); Erase the data
 you don’t need throughout its life cycle.
-## Access Rights
+### Access Rights
 The concept of **permissions** and **ownership** is crucial in Linux. See the
 [slides of HPC introduction](../misc/HPC-Introduction.pdf) for understanding of the main concept.
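To make the permissions and ownership concept mentioned in the Access Rights context lines more concrete, here is a minimal sketch using only the Python standard library. The helper names `describe_access` and `restrict_to_owner_and_group` are hypothetical, and the chosen mode (owner read/write, group read, no access for others) is just one example policy, not a recommendation taken from the documentation.

```python
import os
import stat
import tempfile


def describe_access(path: str) -> dict:
    """Return the permission string and numeric owner/group IDs of `path`."""
    info = os.stat(path)
    return {
        "mode": stat.filemode(info.st_mode),  # symbolic form, as shown by `ls -l`
        "uid": info.st_uid,                   # numeric ID of the owning user
        "gid": info.st_gid,                   # numeric ID of the owning group
    }


def restrict_to_owner_and_group(path: str) -> None:
    """Set owner read/write, group read, others nothing (octal 640)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)


if __name__ == "__main__":
    # Demonstrate on a throwaway file so no real data is touched.
    with tempfile.NamedTemporaryFile() as tmp:
        restrict_to_owner_and_group(tmp.name)
        print(describe_access(tmp.name))
```

On a Linux system, the owner and group IDs printed depend on the user running the script; only the permission string is determined by the `chmod` call.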