From 6bcccd7b85b571e30ce8bcbdee23e167a6bf1fc9 Mon Sep 17 00:00:00 2001
From: Martin Schroschk <martin.schroschk@tu-dresden.de>
Date: Wed, 9 Oct 2024 08:03:57 +0200
Subject: [PATCH] Review page

- This is the(!) entry page for Data Life Cycle Managment, not filesystems in details
- Remove desc. of filesystem properties from this page; Is available in filesystem page
---
 .../docs/data_lifecycle/overview.md           | 52 +++++++------------
 1 file changed, 19 insertions(+), 33 deletions(-)

diff --git a/doc.zih.tu-dresden.de/docs/data_lifecycle/overview.md b/doc.zih.tu-dresden.de/docs/data_lifecycle/overview.md
index 2985a05cb..8aae31c91 100644
--- a/doc.zih.tu-dresden.de/docs/data_lifecycle/overview.md
+++ b/doc.zih.tu-dresden.de/docs/data_lifecycle/overview.md
@@ -8,44 +8,30 @@ uniformity of the project can be achieved by taking into account and setting up
 * a defined **data life cycle management** including the same **data storage** or set of them,
 * and **access rights** to project data.
 
-The used set of software within an HPC project can be management with environments on different
+The used set of software within an HPC project can be managed with environments on different
 levels either defined by [modules](../software/modules.md), [containers](../software/containers.md)
 or by [Python virtual environments](../software/python_virtual_environments.md).
 In the following, a brief overview on relevant topics w.r.t. data life cycle management is provided.
 
 ## Data Storage and Management
 
-The main concept of working with data on ZIH systems bases on [Workspaces](workspaces.md). Use it
-properly:
-
-* use your personal `/home` directory for the limited amount of personal data, simple examples and
-  the results of calculations. Your `home` directory is not a working directory! However, `/home`
-  filesystem is [backed up](#backup);
-* use `workspaces` as a place for your working data (i.e. data sets). Recommendations of choosing
-  the correct filesystem for your workspaces is presented in the following subsection.
-
-### Taxonomy of Filesystems
-
-It is important to design your data workflow according to characteristics, like I/O footprint
-(bandwidth/IOPS) of the application, size of the data, (number of files,) and duration of the
-storage to efficiently use the provided storage and filesystems.
-The page [filesystems](file_systems.md) holds a comprehensive documentation on the different
-filesystems.
-
-!!! hint "Recommendations to choose of storage system"
-
-    * For a series of calculations that works on the same data please use a `scratch` based
-      [workspace](workspaces.md).
-    * For data that seldom changes but consumes a lot of space, the
-      [`walrus` filesystem](working.md) can be used.
-    * If your batch job needs a directory for temporary data then node-local storage (`/tmp`) is
-      a good choice. The data will be deleted when the job has finished. The subsection
-      [Node-Local Storage in Jobs](../jobs_and_resources/slurm.md#node-local-storage-in-jobs) holds
-      valuable information on this topic.
-
-Keep in mind that every workspace has a storage duration. Thus, be careful with the expire date
-otherwise it could vanish. The core data of your project should be [backed up](#backup) and the most
-important data should be [archived (long-term preservation)](longterm_preservation.md).
+In general, you should separate your data and store it on the appropriate storage and filesystem.
+What is the appropriate storage and filesystem depends on the amount/volume of data and its kind,
+and might differ over time. Please note the following rules of thumb:
+
+* Use your personal `/home` directory for the limited amount of *personal data*, e.g., simple
+  examples and the results of calculations. Your `/home` directory is not a working directory!
+  However, `/home` filesystem is [backed up](#backup). The section
+  [Global `/home` Filesystem](permanent.md#global-home-filesystem) provides additional information.
+* Use your `/project` directory for project-related data. This directory enables collaboration by
+  sharing data with colleagues and project members. Please refer to the section
+  [Global `/projects` Filesystem](permanent.md#global-projects-filesystem) for further information.
+* Use [`workspaces`](workspaces.md) as a place for your *working data* (i.e. data sets).
+  Recommendations of choosing the most suitable filesystem for your workspaces is presented on the
+  page [Working Filesystems](working.md).
+* Use the [Intermediate Archive](intermediate_archive.md) and the
+  [Long-Term Archive](longterm_preservation.md) to store all kind of data that needs to be kept
+  for a long time, e.g. result data.
 
 ### Backup
 
@@ -117,7 +103,7 @@ Don't forget about data hygiene: Classify your current data into critical (need
 its life cycle (from creation, storage and use to sharing, archiving and destruction); Erase the data
 you don’t need throughout its life cycle.
 
-## Access Rights
+### Access Rights
 
 The concept of **permissions** and **ownership** is crucial in Linux. See the
 [slides of HPC introduction](../misc/HPC-Introduction.pdf) for understanding of the main concept.
-- 
GitLab