diff --git a/doc.zih.tu-dresden.de/docs/data_lifecycle/overview.md b/doc.zih.tu-dresden.de/docs/data_lifecycle/overview.md index bdbaa5a1523ec2fc06150195e18764cf14b618ef..6ca492527e390a3d2c04bf181c2074767ac4404f 100644 --- a/doc.zih.tu-dresden.de/docs/data_lifecycle/overview.md +++ b/doc.zih.tu-dresden.de/docs/data_lifecycle/overview.md @@ -18,11 +18,11 @@ In the following, a brief overview on relevant topics w.r.t. data life cycle man The main concept of working with data on ZIH systems bases on [Workspaces](workspaces.md). Use it properly: - * use a `/home` directory for the limited amount of personal data, simple examples and the results - of calculations. The home directory is not a working directory! However, `/home` filesystem is - [backed up](#backup) using snapshots; - * use `workspaces` as a place for working data (i.e. data sets); Recommendations of choosing the - correct storage system for workspace presented below. +* use a `/home` directory for the limited amount of personal data, simple examples and the results + of calculations. The home directory is not a working directory! However, the `/home` filesystem is + [backed up](#backup) using snapshots; +* use `workspaces` as a place for working data (i.e. data sets). Recommendations for choosing the + correct storage system for your workspace are presented below. ### Taxonomy of Filesystems It is important to design your data workflow according to characteristics, like storage to efficiently use the provided storage and filesystems. The page [filesystems](file_systems.md) holds a comprehensive documentation on the different filesystems.
-<!--In general, the mechanisms of -so-called--> <!--[Workspaces](workspaces.md) are compulsory for all HPC users to store data for a -defined duration ---> <!--depending on the requirements and the storage system this time span might range from days to a few--> <!--years.--> -<!--- [HPC filesystems](file_systems.md)--> -<!--- [Intermediate Archive](intermediate_archive.md)--> -<!--- [Special data containers] **todo** Special data containers (was no valid link in old compendium)--> -<!--- [Move data between filesystems](../data_transfer/data_mover.md)--> -<!--- [Move data to/from ZIH's filesystems](../data_transfer/export_nodes.md)--> -<!--- [Longterm Preservation for ResearchData](preservation_research_data.md)--> !!! hint "Recommendations to choose of storage system" * For data that seldom changes but consumes a lot of space, the [warm_archive](file_systems.md#warm_archive) can be used. (Note that this is mounted **read-only** on the compute nodes). - * For a series of calculations that works on the same data please use a `scratch` based [workspace](workspaces.md). + * For a series of calculations that work on the same data, please use a `scratch` based + [workspace](workspaces.md). * **SSD**, in its turn, is the fastest available filesystem made only for large parallel applications running with millions of small I/O (input, output operations). * If the batch job needs a directory for temporary data then **SSD** is a good choice as well. The data can be deleted afterwards. Keep in mind that every workspace has a storage duration. Thus, be careful with the expire date -otherwise it could vanish. The core data of your project should be [backed up](#backup) and -[archived]**todo link** (for the most [important]**todo link** data). +otherwise it could vanish. The core data of your project should be [backed up](#backup) and the most +important data should be [archived](preservation_research_data.md).
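The hint above about using **SSD** for temporary job data can be sketched in a short batch-script fragment. This is a sketch only: the base path is an assumption (on ZIH systems you would point it at an SSD-based workspace as described in [workspaces](workspaces.md) rather than at `/tmp`):

```shell
#!/bin/bash
# Sketch: keep temporary job data in its own scratch directory and clean
# it up when the job ends. TMP_BASE is a placeholder -- replace it with
# the path of your SSD-based workspace.
TMP_BASE="${TMP_BASE:-/tmp}"
JOB_TMP=$(mktemp -d "$TMP_BASE/job.XXXXXX")
trap 'rm -rf "$JOB_TMP"' EXIT   # the temporary data is deleted afterwards

echo "intermediate result" > "$JOB_TMP/step1.out"
cat "$JOB_TMP/step1.out"
```

The `trap ... EXIT` line implements the "data can be deleted afterwards" advice: the temporary directory disappears even if the job script fails midway.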
### Backup @@ -73,13 +64,13 @@ files can be restored directly by the users. Details can be found Organizing of living data using the filesystem helps for consistency of the project. We recommend following the rules for your work regarding: - * Organizing the data: Never change the original data; Automatize the organizing the data; Clearly - separate intermediate and final output in the filenames; Carry identifier and original name - along in your analysis pipeline; Make outputs clearly identifiable; Document your analysis - steps. - * Naming Data: Keep short, but meaningful names; Keep standard file endings; File names - don’t replace documentation and metadata; Use standards of your discipline; Make rules for your - project, document and keep them (See the [README recommendations]**todo link** below) +* Organizing the data: Never change the original data; Automate the organization of the data; + Clearly separate intermediate and final output in the filenames; Carry the identifier and original + name along in your analysis pipeline; Make outputs clearly identifiable; Document your analysis + steps. +* Naming Data: Keep short, but meaningful names; Keep standard file endings; File names + don’t replace documentation and metadata; Use standards of your discipline; Make rules for your + project, document and keep them (See the [README recommendations](#readme-recommendation) below). This is the example of an organization (hierarchical) for the folder structure. Use it as a visual illustration of the above: @@ -128,49 +119,10 @@ Don't forget about data hygiene: Classify your current data into critical (need its life cycle (from creation, storage and use to sharing, archiving and destruction); Erase the data you don’t need throughout its life cycle.
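The organizing and naming rules above can be illustrated with a short shell sketch. Everything here is hypothetical: the identifier `S042`, the directory names and the processing steps are invented purely to show the pattern:

```shell
#!/bin/bash
# Hypothetical pipeline layout illustrating the naming rules.
set -e
id="S042"
mkdir -p input intermediate final

# Original data is created once and never changed afterwards:
touch "input/${id}_reads.fastq"

# Intermediate and final outputs carry the identifier and the original
# name along the pipeline and are clearly separated by directory and
# step suffix:
cp "input/${id}_reads.fastq" "intermediate/${id}_reads.trimmed.fastq"
cp "intermediate/${id}_reads.trimmed.fastq" "final/${id}_reads.trimmed.filtered.fastq"
ls final
```

Because every file name carries both the identifier and the step suffixes, any output can be traced back to its original input without consulting extra notes, though the names still do not replace proper documentation and metadata.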
-<!--## Software Packages--> - -<!--As was written before the module concept is the basic concept for using software on ZIH systems.--> -<!--Uniformity of the project has to be achieved by using the same set of software on different levels.--> -<!--It could be done by using environments. There are two types of environments should be distinguished:--> -<!--runtime environment (the project level, use scripts to load [modules]**todo link**), Python virtual--> -<!--environment. The concept of the environment will give an opportunity to use the same version of the--> -<!--software on every level of the project for every project member.--> - -<!--### Private Individual and Project Modules Files--> - -<!--[Private individual and project module files]**todo link** will be discussed in [chapter 7]**todo--> -<!--link**. Project modules list is a powerful instrument for effective teamwork.--> - -<!--### Python Virtual Environment--> - -<!--If you are working with the Python then it is crucial to use the virtual environment on ZIH systems. The--> -<!--main purpose of Python virtual environments (don't mess with the software environment for modules)--> -<!--is to create an isolated environment for Python projects (self-contained directory tree that--> -<!--contains a Python installation for a particular version of Python, plus a number of additional--> -<!--packages).--> - -<!--**Vitualenv (venv)** is a standard Python tool to create isolated Python environments. We--> -<!--recommend using venv to work with Tensorflow and Pytorch on ZIH systems. It has been integrated into the--> -<!--standard library under the [venv module]**todo link**. **Conda** is the second way to use a virtual--> -<!--environment on the ZIH systems. Conda is an open-source package management system and environment--> -<!--management system from the Anaconda.--> - -<!--[Detailed information]**todo link** about using the virtual environment.--> - -<!--## Application Software Availability--> - -<!--Software created for the purpose of the project should be available for all members of the group.--> -<!--The instruction of how to use the software: installation of packages, compilation etc should be--> -<!--documented and gives the opportunity to comfort efficient and safe work.--> - -## Access rights +## Access Rights The concept of **permissions** and **ownership** is crucial in Linux. See the -[HPC-introduction]**todo link** slides for the understanding of the main concept. Standard Linux -changing permission command (i.e `chmod`) valid for ZIH systems as well. The **group** access level -contains members of your project group. Be careful with 'write' permission and never allow to change -the original data. - -Useful links: [Data Management]**todo link**, [Filesystems]**todo link**, -[Project Management]**todo link**, [Preservation research data[**todo link** +[slides of HPC introduction](../misc/HPC-Introduction.pdf) for an understanding of the main concept. +The standard Linux command for changing permissions (i.e. `chmod`) is valid on ZIH systems as well. +The **group** access level contains the members of your project group. Be careful with 'write' +permission and never allow the original data to be changed. diff --git a/doc.zih.tu-dresden.de/wordlist.aspell b/doc.zih.tu-dresden.de/wordlist.aspell index 51dae5ace01b47f301eff7d29a1c7fd6078eaa42..60e8bac5c32d5c5ad80027488dd577a05ffb9568 100644 --- a/doc.zih.tu-dresden.de/wordlist.aspell +++ b/doc.zih.tu-dresden.de/wordlist.aspell @@ -84,6 +84,7 @@ FMA foreach Fortran Galilei +IOPS Gauss Gaussian GBit
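The `chmod` advice in the Access Rights section above can be sketched with standard symbolic modes; the directory name `original_data` is hypothetical:

```shell
#!/bin/bash
# Protect original data: group members may read, nobody may write,
# all other users get no access at all. Directory name is hypothetical.
mkdir -p original_data
touch original_data/sample.csv

chmod -R a-w original_data         # nobody (not even the owner) may change the data
chmod -R g+rX,o-rwx original_data  # group: read (and traverse directories); others: nothing

ls -l original_data
```

The capital `X` grants the execute bit only on directories (and on files that are already executable), so group members can descend into the tree without data files becoming executable.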