Commit 60e02331, authored 2 years ago by Maximilian Knespel
Add sections about ratarmount and pragzip
Parent: f9101e13
2 merge requests: !652 (Automated merge from preview to main), !649 (Add sections about ratarmount and pragzip)
Changes: 1 changed file, doc.zih.tu-dresden.de/docs/software/utilities.md (71 additions, 0 deletions)
@@ -123,3 +123,74 @@ failed to connect to server
marie@login3$ ssh login4 tmux ls
marie_is_testing: 1 windows (created Tue Mar 29 19:06:26 2022) [105x32]
```
## Working with Large Archives and Compressed Files
### Parallel Gzip Decompression
There is a plethora of gzip tools, but none of them can fully utilize multiple cores for decompression.
The fastest single-core decoder is igzip from the
[Intelligent Storage Acceleration Library](https://github.com/intel/isa-l.git).
In tests, it can reach ~500 MB/s compared to ~200 MB/s for the system-default gzip.
If you have very large files and need to decompress them even faster, you can use
[pragzip](https://github.com/mxmlnkn/pragzip).
Currently, it can reach ~1.5 GB/s using a 12-core processor in the above-mentioned tests.
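
To check how these numbers translate to your own data, a rough comparison can be scripted.
The following is a minimal sketch, not a rigorous benchmark: the function names
`benchmark_decoder` and `benchmark_all` are made up for illustration, it assumes GNU `date`
and that each tool accepts the gzip-style `-d -c` options, and it skips any tool that is not
installed.

```bash
#!/bin/bash
# Sketch: measure decompression throughput of one gzip file per decoder.

# benchmark_decoder TOOL FILE
# Prints "TOOL: N MB/s" (based on the decompressed size) or a skip notice.
benchmark_decoder() {
    local tool=$1 file=$2
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not installed, skipped"
        return 0
    fi
    local start end bytes
    start=$(date +%s.%N)                      # GNU date with nanoseconds
    bytes=$("$tool" -d -c "$file" | wc -c)    # decompress to stdout, count bytes
    end=$(date +%s.%N)
    awk -v t="$tool" -v b="$bytes" -v s="$start" -v e="$end" \
        'BEGIN { d = e - s; if (d <= 0) d = 1e-9
                 printf "%s: %.0f MB/s\n", t, b / 1e6 / d }'
}

# benchmark_all FILE
# Runs the three decoders mentioned in this section on the same file.
benchmark_all() {
    local tool
    for tool in gzip igzip pragzip; do
        benchmark_decoder "$tool" "$1"
    done
}
```

Calling `benchmark_all` with the path to a large `.gz` file prints one line per tool.
Run it twice and use the second result, because the first run also includes reading the file
from the filesystem.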
[Pragzip](https://github.com/mxmlnkn/pragzip) is available on PyPI and can be installed via pip.
It is recommended to install it inside a
[Python virtual environment](python_virtual_environments.md).
```bash
pip install pragzip
```
It can also be installed from its C++ source code.
If you prefer that over the version on PyPI, then you can build it like this:
```bash
git clone https://github.com/mxmlnkn/pragzip.git
cd pragzip
mkdir build
cd build
# Configure the build, then compile only the pragzip command line tool
cmake ..
cmake --build . --target pragzip
src/tools/pragzip --help
```
The built binary can then be used directly or copied into a folder that is listed in your `PATH`
environment variable.
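
Once a decoder is on your `PATH`, a small wrapper can pick the fastest one that is available.
This is a hedged sketch: `fast_gunzip` is a hypothetical helper name, and the `-P` option is
assumed to set the number of decoder threads, with `0` selecting it automatically; please verify
this with `pragzip --help` for your installed version.

```bash
#!/bin/bash
# fast_gunzip FILE [THREADS]
# Writes the decompressed FILE to stdout using the fastest decoder found.
fast_gunzip() {
    local file=$1 threads=${2:-0}
    if command -v pragzip >/dev/null 2>&1; then
        # Assumption: -P sets the decoder parallelism, 0 means automatic
        pragzip -d -c -P "$threads" "$file"
    elif command -v igzip >/dev/null 2>&1; then
        igzip -d -c "$file"
    else
        gzip -d -c "$file"
    fi
}
```

For example, `fast_gunzip data.gz | wc -l` counts the lines of a hypothetical `data.gz` without
writing the decompressed file to disk.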
Pragzip is still in development, so if it crashes or if it is slower than the system gzip,
please [open an issue](https://github.com/mxmlnkn/pragzip/issues) on GitHub.
### Direct Archive Access Without Extraction
For archives with millions of small files, it might not be feasible to extract the
whole archive to a filesystem.
The well-known `archivemount` tool has performance problems with such archives even if they are simply
uncompressed TAR files.
Furthermore, with `archivemount` the archive would have to be reanalyzed whenever a new job is started.
`Ratarmount` is an alternative that solves these performance issues.
The archive is analyzed once and can then be accessed via a FUSE mountpoint showing the internal
folder hierarchy.
Access to files is consistently fast no matter the archive size, while `archivemount` might take
minutes per file access.
Furthermore, the analysis results of the archive are stored in a sidecar file alongside the
archive or in your home directory if the archive is in a non-writable location.
Subsequent mounts instantly load that sidecar file instead of reanalyzing the archive.
[Ratarmount](https://github.com/mxmlnkn/ratarmount) is available on PyPI and can be installed via pip.
It is recommended to install it inside a
[Python virtual environment](python_virtual_environments.md).
```bash
pip install ratarmount
```
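
After installation, mounting and unmounting an archive inside a job could look roughly like
the sketch below. The helper names `mount_archive` and `unmount_archive` are made up for
illustration. It assumes the basic call form `ratarmount <archive> <mountpoint>` and that
`fusermount` is available on the node for unmounting.

```bash
#!/bin/bash
# mount_archive ARCHIVE MOUNTPOINT
# Creates MOUNTPOINT if needed and mounts ARCHIVE there via ratarmount.
mount_archive() {
    local archive=$1 mountpoint=$2
    if ! command -v ratarmount >/dev/null 2>&1; then
        echo "ratarmount not found; install it with pip first" >&2
        return 127
    fi
    mkdir -p "$mountpoint" || return 1
    ratarmount "$archive" "$mountpoint"
}

# unmount_archive MOUNTPOINT
# Releases the FUSE mount again, e.g. at the end of a job script.
unmount_archive() {
    fusermount -u "$1"
}
```

A job script might call `mount_archive dataset.tar "$TMPDIR/dataset"` (with `dataset.tar` as a
placeholder name), read files below the mountpoint like ordinary files, and finish with
`unmount_archive "$TMPDIR/dataset"`.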
Ratarmount is still in development, so if there are problems or if it is unexpectedly slow,
please [open an issue](https://github.com/mxmlnkn/ratarmount/issues) on GitHub.
There is also a library interface called
[ratarmountcore](https://github.com/mxmlnkn/ratarmount/tree/master/core#example) that works
fully without FUSE, which might make access to files from Python even faster.