Commit bfd3f227 authored by Michael Müller

Merge branch 'filesystems.md' into 'preview'

Filesystems.md

See merge request !223
parents 966daab0 70ad11f4
# File Systems
# Overview
As soon as you have access to ZIH systems, you have to manage your data. Several file systems are
available. Each file system serves a special purpose according to its respective capacity,
performance, and permanence.
## Permanent File Systems
### Global /home File System
Each user has 50 GB in a `/home` directory independent of the granted capacity for the project.
Hints for the usage of the global home directory:
- If you need distinct `.bashrc` files for each machine, you should
create separate files for them, named `.bashrc_<machine_name>`.
- If you use various machines frequently, it might be useful to set
the environment variable `HISTFILE` in `.bashrc_deimos` and
`.bashrc_mars` to `$HOME/.bash_history_<machine_name>`. Setting
`HISTSIZE` and `HISTFILESIZE` to 10000 helps as well (a minimal sketch
follows this list).
- Further, you may use private module files to simplify the process of
loading the right installation directories, see
**todo link: private modules - AnchorPrivateModule**.
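A minimal sketch of how these hints can be combined, assuming the file suffix matches the short
hostname (the machine names `deimos` and `mars` above are only examples and have to be adapted):

```Bash
# In ~/.bashrc: source the machine-specific file if it exists (sketch, suffix = short hostname)
machine="$(hostname -s)"
[ -f "$HOME/.bashrc_${machine}" ] && . "$HOME/.bashrc_${machine}"

# In ~/.bashrc_deimos (for example): per-machine history settings as suggested above
export HISTFILE="$HOME/.bash_history_deimos"
export HISTSIZE=10000
export HISTFILESIZE=10000
```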
### Global /projects File System
For project data, we have a global project directory that allows better collaboration between the
members of an HPC project. However, on compute nodes `/projects` is mounted read-only, because it
is not a file system for parallel I/O. See below and also check the
**todo link: HPC introduction - %PUBURL%/Compendium/WebHome/HPC-Introduction.pdf** for more details.
### Backup and Snapshots of the File System
- Backup is **only** available in the `/home` and the `/projects` file systems!
- Files are backed up using snapshots of the NFS server and can be restored by the user
- A changed file can always be recovered as it was at the time of the snapshot
- Snapshots are taken:
  - From Monday through Saturday between 06:00 and 18:00 every two hours and kept for one day
    (7 snapshots)
  - From Monday through Saturday at 23:30 and kept for two weeks (12 snapshots)
  - Every Sunday at 23:45 and kept for 26 weeks
- To restore a previous version of a file (see the sketch after this list):
  - Go into the directory of the file you want to restore
  - Run `cd .snapshot` (this subdirectory exists in every directory on the `/home` file system
    although it is not visible with `ls -a`)
  - All available snapshots are listed in the `.snapshot` directory
  - Just `cd` into the directory of the point in time you wish to restore and copy the file you
    wish to restore to where you want it
  - **Attention** The `.snapshot` directory is not only hidden from normal view (`ls -a`), it is
    also embedded in a different directory structure. An `ls ../..` will not list the directory
    where you came from. Thus, we recommend copying the file from the location where it
    originally resided, e.g. if `pwd` returns `/home/username/directory_a`, run
    `cp .snapshot/timestamp/lostfile lostfile.backup`
- `/home` and `/projects` are definitely NOT meant to be used as work directories:
  since all files are kept in the snapshots and on the backup tapes for a long time, they
  - needlessly fill up the disks and
  - prevent the backup process from working efficiently due to their sheer number and volume.
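A minimal sketch of this restore procedure, assuming the lost file resided in
`/home/username/directory_a` (the path, the snapshot name `timestamp`, and the file names are
placeholders):

```Bash
# Restore a previous version of a file from a snapshot (all names are placeholders)
cd /home/username/directory_a
ls .snapshot/                                   # list the available snapshot timestamps
cp .snapshot/timestamp/lostfile lostfile.backup
```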
### Group Quotas for the File System
The quotas of the home file system are meant to help the users keep track of their data.
Especially in HPC, it happens that millions of temporary files are created within hours. This is the
main reason for performance degradation of the file system. If a project exceeds its quota (total
size OR total number of files) it cannot submit jobs into the batch system. The following commands
can be used for monitoring:
- `showquota` shows your projects' usage of the file system.
- `quota -s -f /home` shows the user's usage of the file system.
In case a project is above its limits, please ...
- Remove core dumps and temporary data,
- Talk with your colleagues to identify the hotspots,
- Check your workflow and use `/tmp` or the scratch file systems for temporary files,
- *Systematically* handle your important data:
  - For later use (weeks...months) at the HPC systems, build tar
    archives with meaningful names or IDs and store them, e.g., in an
    [archive](intermediate_archive.md) (see the sketch after this list).
- Refer to the hints for [long term preservation for research data](preservation_research_data.md).
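A small sketch of such a cleanup, assuming the results live in a directory `results/` (the archive
name is only an example):

```Bash
# Check your own usage and limits on /home
quota -s -f /home

# Pack results that are only needed again in a few weeks or months into a single
# tar archive with a meaningful name instead of keeping many small files around
tar -czf results_run42.tar.gz results/
```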
## Work Directories
| File system | Usable directory | Capacity | Availability | Backup | Remarks |
@@ -84,25 +11,7 @@ In case a project is above it's limits please ...
| `Lustre` | `/scratch/` | 4 PB | global | No | Only accessible via **todo link: workspaces - WorkSpaces**. Not made for billions of files! |
| `Lustre` | `/lustre/ssd` | 40 TB | global | No | Only accessible via **todo link: workspaces - WorkSpaces**. For small I/O operations |
| `BeeGFS` | `/beegfs/global0` | 232 TB | global | No | Only accessible via **todo link: workspaces - WorkSpaces**. Fastest available file system, only for large parallel applications running with millions of small I/O operations |
| `ext4` | `/tmp` | 95.0 GB | local | No | is cleaned up after the job automatically |
### Large Files in /scratch
The data containers in Lustre are called object storage targets (OST). The capacity of one OST is
about 21 TB. All files are striped over a certain number of these OSTs. For small and medium files,
the default number is 2. As soon as a file grows above \~1 TB it makes sense to spread it over a
higher number of OSTs, e.g. 16. Once the file system is used \> 75%, the average free space per OST
is only about 5 TB. So, it is essential to stripe your larger files over more OSTs so that the
chunks can be stored!
Let's assume you have a directory where you tar your results, e.g. `/scratch/ws/mark-stripe20/tar`.
Now, simply set the stripe count to a higher number in this directory with:
```Bash
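# Set the default stripe count for new files created in this directory to 20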
lfs setstripe -c 20 /scratch/ws/mark-stripe20/tar
```
**Note:** This does not affect existing files. But all files that **will be created** in this
directory will be distributed over 20 OSTs.
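To check that the new default striping is actually in place, you can query the directory (a quick
sanity check with the `lfs getstripe` command described further below; the path is the example from
above):

```Bash
# Show the default striping of the directory itself
lfs getstripe -d /scratch/ws/mark-stripe20/tar
```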
## Warm Archive
@@ -167,45 +76,6 @@ output.
We do **not recommend** using the `du` command for this purpose, as it can cause issues
for other users while reading data from the file system.
### Lustre File System
These commands work for `/scratch` and `/ssd`.
#### Listing Disk Usages per OST and MDT
```Bash
lfs quota -h -u username /path/to/my/data
```
It is possible to display the usage on each OST by adding the `-v` parameter.
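For example, with the placeholder user and path from above, the per-target breakdown would be
queried as follows (a sketch):

```Bash
# Verbose quota output, broken down per OST and MDT
lfs quota -v -h -u username /path/to/my/data
```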
#### Listing space usage per OST and MDT
```Bash
lfs df -h /path/to/my/data
```
#### Listing inode usage for a specific path
```Bash
lfs df -i /path/to/my/data
```
#### Listing OSTs
```Bash
lfs osts /path/to/my/data
```
#### View striping information
```Bash
lfs getstripe myfile
lfs getstripe -d mydirectory
```
The `-d` parameter restricts the output to the directory itself (its default stripe settings)
instead of listing all files within it.
### BeeGFS
Commands to work with the BeeGFS file system.
......
# Lustre File System(s)
## Large Files in /scratch
The data containers in Lustre are called object storage targets (OST). The capacity of one OST is
about 21 TB. All files are striped over a certain number of these OSTs. For small and medium files,
the default number is 2. As soon as a file grows above \~1 TB it makes sense to spread it over a
higher number of OSTs, e.g. 16. Once the file system is used \> 75%, the average free space per OST
is only about 5 TB. So, it is essential to stripe your larger files over more OSTs so that the
chunks can be stored!
Let's assume you have a directory where you tar your results, e.g. `/scratch/ws/mark-stripe20/tar`.
Now, simply set the stripe count to a higher number in this directory with:
```Bash
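# Set the default stripe count for new files created in this directory to 20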
lfs setstripe -c 20 /scratch/ws/mark-stripe20/tar
```
**Note:** This does not affect existing files. But all files that **will be created** in this
directory will be distributed over 20 OSTs.
## Useful Commands for Lustre
These commands work for `/scratch` and `/ssd`.
### Listing Disk Usages per OST and MDT
```Bash
lfs quota -h -u username /path/to/my/data
```
It is possible to display the usage on each OST by adding the `-v` parameter.
### Listing space usage per OST and MDT
```Bash
lfs df -h /path/to/my/data
```
### Listing inode usage for a specific path
```Bash
lfs df -i /path/to/my/data
```
### Listing OSTs
```Bash
lfs osts /path/to/my/data
```
### View striping information
```Bash
lfs getstripe myfile
lfs getstripe -d mydirectory
```
The `-d` parameter restricts the output to the directory itself (its default stripe settings)
instead of listing all files within it.
# Permanent File Systems
## Global /home File System
Each user has 50 GB in a `/home` directory independent of the granted capacity for the project.
Hints for the usage of the global home directory:
- Do not use your `/home` as work directory: Frequent changes (like temporary output from a
running job) would fill snapshots and backups (see below).
- If you need distinct `.bashrc` files for each machine, you should
create separate files for them, named `.bashrc_<machine_name>`
- Further, you may use private module files to simplify the process of
loading the right installation directories, see
**todo link: private modules - AnchorPrivateModule**.
## Global /projects File System
For project data, we have a global project directory that allows better collaboration between the
members of an HPC project. However, on compute nodes `/projects` is mounted read-only, because it
is not a file system for parallel I/O.
## Backup and Snapshots of the File System
- Backup is **only** available in the `/home` and the `/projects` file systems!
- Files are backed up using snapshots of the NFS server and can be restored by the user
- A changed file can always be recovered as it was at the time of the snapshot
- Snapshots are taken:
  - From Monday through Saturday between 06:00 and 18:00 every two hours and kept for one day
    (7 snapshots)
  - From Monday through Saturday at 23:30 and kept for two weeks (12 snapshots)
  - Every Sunday at 23:45 and kept for 26 weeks
- To restore a previous version of a file:
  - Go into the directory of the file you want to restore
  - Run `cd .snapshot` (this subdirectory exists in every directory on the `/home` file system
    although it is not visible with `ls -a`)
  - All available snapshots are listed in the `.snapshot` directory
  - Just `cd` into the directory of the point in time you wish to restore and copy the file you
    wish to restore to where you want it
  - **Attention** The `.snapshot` directory is not only hidden from normal view (`ls -a`), it is
    also embedded in a different directory structure. An `ls ../..` will not list the directory
    where you came from. Thus, we recommend copying the file from the location where it
    originally resided, e.g. if `pwd` returns `/home/username/directory_a`, run
    `cp .snapshot/timestamp/lostfile lostfile.backup`
- `/home` and `/projects` are definitely NOT meant to be used as work directories:
  since all files are kept in the snapshots and on the backup tapes for a long time, they
  - needlessly fill up the disks and
  - prevent the backup process from working efficiently due to their sheer number and volume.
## Group Quotas for the File System
The quotas of the home file system are meant to help the users keep track of their data.
Especially in HPC, it happens that millions of temporary files are created within hours. This is the
main reason for performance degradation of the file system. If a project exceeds its quota (total
size OR total number of files) it cannot submit jobs into the batch system. The following commands
can be used for monitoring:
- `showquota` shows your projects' usage of the file system.
- `quota -s -f /home` shows the user's usage of the file system.
In case a project is above its limits, please ...
- Remove core dumps and temporary data,
- Talk with your colleagues to identify the hotspots,
- Check your workflow and use `/tmp` or the scratch file systems for temporary files,
- *Systematically* handle your important data:
  - For later use (weeks...months) at the HPC systems, build tar
    archives with meaningful names or IDs and store them, e.g., in an
    [archive](intermediate_archive.md).
- Refer to the hints for [long term preservation for research data](preservation_research_data.md).
@@ -74,7 +74,9 @@ nav:
- Data Life Cycle Management:
- Overview: data_lifecycle/overview.md
- Filesystems:
- Filesystems: data_lifecycle/file_systems.md
- Overview: data_lifecycle/file_systems.md
- Permanent File Systems: data_lifecycle/permanent.md
- Lustre: data_lifecycle/lustre.md
- BeeGFS: data_lifecycle/bee_gfs.md
- Intermediate Archive: data_lifecycle/intermediate_archive.md
- Quotas: data_lifecycle/quotas.md
......