diff --git a/doc.zih.tu-dresden.de/docs/data_lifecycle/permanent.md b/doc.zih.tu-dresden.de/docs/data_lifecycle/permanent.md
index 98e64e7f56c81b811e5455d785239a40d340ced5..807f5f6eb41a72988a9f51184e72aaae9a91b2d7 100644
--- a/doc.zih.tu-dresden.de/docs/data_lifecycle/permanent.md
+++ b/doc.zih.tu-dresden.de/docs/data_lifecycle/permanent.md
@@ -1,69 +1,92 @@
 # Permanent File Systems
 
+Do not use permanent file systems as work directories:
+
+- Even temporary files are kept in the snapshots and in the backup tapes over a long time,
+senselessly filling the disks,
+- By the sheer number and volume of work files they may keep the backup from working efficiently.
+
 ## Global /home File System
 
 Each user has 50 GB in a `/home` directory independent of the granted capacity for the project.
 
+This file system is mounted with read-write permissions on all HPC systems.
+
 Hints for the usage of the global home directory:
 
-- Do not use your `/home` as work directory: Frequent changes (like temporary output from a
-  running job) would fill snapshots and backups (see below).
 - If you need distinct `.bashrc` files for each machine, you should create separate files for them, named `.bashrc_<machine_name>`
-- Further, you may use private module files to simplify the process of
-  loading the right installation directories, see
-  **todo link: private modules - AnchorPrivateModule**.
+
+If a user exceeds her/his quota (total size OR total number of files) she/he cannot
+submit jobs into the batch system. Running jobs are not affected.
+
+!!! note
+
+    We have no feasible way to get the contribution of
+    a single user to a project's disk usage.
 
 ## Global /projects File System
 
 For project data, we have a global project directory, that allows better collaboration between the
-members of an HPC project. However, for compute nodes /projects is mounted as read-only, because it
-is not a filesystem for parallel I/O.
-
-## Backup and Snapshots of the File System
-
-- Backup is **only** available in the `/home` and the `/projects` file systems!
-- Files are backed up using snapshots of the NFS server and can be restored by the user
-- A changed file can always be recovered as it was at the time of the snapshot
-- Snapshots are taken:
-    - From Monday through Saturday between 06:00 and 18:00 every two hours and kept for one day
-      (7 snapshots)
-    - From Monday through Saturday at 23:30 and kept for two weeks (12 snapshots)
-    - Every Sunday st 23:45 and kept for 26 weeks
-- To restore a previous version of a file:
-    - Go into the directory of the file you want to restore
-    - Run `cd .snapshot` (this subdirectory exists in every directory on the `/home` file system
-      although it is not visible with `ls -a`)
-    - In the .snapshot-directory are all available snapshots listed
-    - Just `cd` into the directory of the point in time you wish to restore and copy the file you
-      wish to restore to where you want it
-    - **Attention** The `.snapshot` directory is not only hidden from normal view (`ls -a`), it is
-      also embedded in a different directory structure. An `ls ../..` will not list the directory
-      where you came from. Thus, we recommend to copy the file from the location where it
-      originally resided:
-      `pwd /home/username/directory_a % cp .snapshot/timestamp/lostfile lostfile.backup`
-- `/home` and `/projects/` are definitely NOT made as a work directory:
-  since all files are kept in the snapshots and in the backup tapes over a long time, they
-    - Senseless fill the disks and
-    - Prevent the backup process by their sheer number and volume from working efficiently.
-
-## Group Quotas for the File System
-
-The quotas of the home file system are meant to help the users to keep in touch with their data.
+members of an HPC project.
+Typically, all members of the project have read/write access to that directory.
+It can only be written to on the login and export nodes.
+
+!!! note
+    On compute nodes, /projects is mounted as read-only, because it must not be used as
+    a work directory or for heavy I/O.
+
+## Snapshots
+
+A changed file can always be recovered as it was at the time of the snapshot.
+These snapshots are taken (subject to changes):
+
+- from Monday through Saturday between 06:00 and 18:00 every two hours and kept for one day
+  (7 snapshots)
+- from Monday through Saturday at 23:30 and kept for two weeks (12 snapshots)
+- every Sunday at 23:45 and kept for 26 weeks.
+
+To restore a previous version of a file:
+
+- Go into the directory of the file you want to restore
+- Run `cd .snapshot` (this subdirectory exists in every directory on the `/home` file system
+  although it is not visible with `ls -a`).
+- All available snapshots are listed in the `.snapshot` directory.
+- Just `cd` into the directory of the point in time you wish to restore and copy the file you
+  wish to restore to where you want it.
+
+!!! note
+
+    The `.snapshot` directory is embedded in a different directory structure. An `ls ../..` will not show the directory
+    where you came from. Thus, for your `cp`, you should *use an absolute path* as destination.
+
+## Backup
+
+Just for the eventuality of a major file system crash we keep tape-based backups of our
+permanent file systems for 180 days.
+
+## Quotas
+
+The quotas of the permanent file system are meant to help the users to keep in touch with their data.
 Especially in HPC, it happens that millions of temporary files are created within hours. This is the
-main reason for performance degradation of the file system. If a project exceeds its quota (total
-size OR total number of files) it cannot submit jobs into the batch system. The following commands
-can be used for monitoring:
+main reason for performance degradation of the file system.
+
+!!! note
+
+    If a quota is exceeded - project or home - (total size OR total number of files)
+    job submission is forbidden. Running jobs are not affected.
+
+The following commands can be used for monitoring:
 
 - `showquota` shows your projects' usage of the file system.
 - `quota -s -f /home` shows the user's usage of the file system.
 
-In case a project is above it's limits please ...
+In case a quota is exceeded:
 
-- Remove core dumps, temporary data
-- Talk with your colleagues to identify the hotspots,
-- Check your workflow and use /tmp or the scratch file systems for temporary files
-- *Systematically* handle your important data:
-    - For later use (weeks...months) at the HPC systems, build tar
-      archives with meaningful names or IDs and store e.g. them in an
-      [archive](intermediate_archive.md).
-    - Refer to the hints for [long term preservation for research data](preservation_research_data.md)
+    - Remove core dumps and temporary data
+    - Talk with your colleagues to identify the hotspots
+    - Check your workflow and use /tmp or the scratch file systems for temporary files
+    - *Systematically* handle your important data:
+        - For later use (weeks...months) at the HPC systems, build and zip tar
+          archives with meaningful names or IDs and store them, e.g., in a workspace in the
+          [warm archive](warm_archive.md) or an [archive](intermediate_archive.md).
+        - Refer to the hints for [long term preservation for research data](preservation_research_data.md)
diff --git a/doc.zih.tu-dresden.de/docs/data_lifecycle/quotas.md b/doc.zih.tu-dresden.de/docs/data_lifecycle/quotas.md
deleted file mode 100644
index 24665aa573549b6290fae90523450c98fc9d9240..0000000000000000000000000000000000000000
--- a/doc.zih.tu-dresden.de/docs/data_lifecycle/quotas.md
+++ /dev/null
@@ -1,56 +0,0 @@
-# Quotas for the home file system
-
-The quotas of the home file system are meant to help the users to keep in touch with their data.
-Especially in HPC, millions of temporary files can be created within hours. We have identified this
-as a main reason for performance degradation of the HOME file system. To stay in operation with out
-HPC systems we regrettably have to fall back to this unpopular technique.
-
-Based on a balance between the allotted disk space and the usage over the time, reasonable quotas
-(mostly above current used space) for the projects have been defined. The will be activated by the
-end of April 2012.
-
-If a project exceeds its quota (total size OR total number of files) it cannot submit jobs into the
-batch system. Running jobs are not affected. The following commands can be used for monitoring:
-
-- `quota -s -g` shows the file system usage of all groups the user is
-  a member of.
-- `showquota` displays a more convenient output. Use `showquota -h` to
-  read about its usage. It is not yet available on all machines but we
-  are working on it.
-
-**Please mark:** We have no quotas for the single accounts, but for the
-project as a whole. There is no feasible way to get the contribution of
-a single user to a project's disk usage.
-
-## Alternatives
-
-In case a project is above its limits, please
-
-- remove core dumps, temporary data,
-- talk with your colleagues to identify the hotspots,
-- check your workflow and use /fastfs for temporary files,
-- *systematically* handle your important data:
-    - for later use (weeks...months) at the HPC systems, build tar
-      archives with meaningful names or IDs and store them in the
-      [DMF system](#AnchorDataMigration). Avoid using this system
-      (`/hpc_fastfs`) for files < 1 MB!
-    - refer to the hints for
-      [long term preservation for research data](../data_lifecycle/preservation_research_data.md).
-
-## No Alternatives
-
-The current situation is this:
-
-- `/home` provides about 50 TB of disk space for all systems. Rapidly
-  changing files (temporary data) decrease the size of usable disk
-  space since we keep all files in multiple snapshots for 26 weeks. If
-  the *number* of files comes into the range of a million the backup
-  has problems handling them.
-- The work file system for the clusters is `/fastfs`. Here, we have 60
-  TB disk space (without backup). This is the file system of choice
-  for temporary data.
-- About 180 projects have to share our resources, so it makes no sense
-  at all to simply move the data from `/home` to `/fastfs` or to
-  `/hpc_fastfs`.
-
-In case of problems don't hesitate to ask for support.
diff --git a/doc.zih.tu-dresden.de/mkdocs.yml b/doc.zih.tu-dresden.de/mkdocs.yml
index ffbb644460a9c415674a3c71acf46fb40468842e..32f4a2a2dabe2595cddca0bb898251837c3ccbc0 100644
--- a/doc.zih.tu-dresden.de/mkdocs.yml
+++ b/doc.zih.tu-dresden.de/mkdocs.yml
@@ -80,7 +80,6 @@ nav:
     - BeeGFS 2: data_lifecycle/bee_gfs.md
     - WarmArchive: data_lifecycle/warm_archive.md
     - Intermediate Archive: data_lifecycle/intermediate_archive.md
-    - Quotas: data_lifecycle/quotas.md
     - Workspaces: data_lifecycle/workspaces.md
     - Preservation of Research Data: data_lifecycle/preservation_research_data.md
     - Structuring Experiments: data_lifecycle/experiments.md