diff --git a/doc.zih.tu-dresden.de/docs/software/mathematics.md b/doc.zih.tu-dresden.de/docs/software/mathematics.md index 909970debf879b3b3c227fd80dbd26e06463309e..cfac5938d5848236abbdb26f7735023ae30a2a02 100644 --- a/doc.zih.tu-dresden.de/docs/software/mathematics.md +++ b/doc.zih.tu-dresden.de/docs/software/mathematics.md @@ -105,7 +105,7 @@ marie@compute$ matlab ``` With following command you can see a list of installed software - also -the different versions of matlab. +the different versions of MATLAB. ```console marie@login$ module avail @@ -123,7 +123,7 @@ Or use: marie@login$ module load MATLAB ``` -(then you will get the most recent Matlab version. +(then you will get the most recent MATLAB version. [Refer to the modules section for details.](../software/modules.md#modules)) ### Interactive @@ -135,7 +135,7 @@ with command marie@login$ srun --pty --x11=first bash ``` -- now you can call "matlab" (you have 8h time to work with the matlab-GUI) +- now you can call "matlab" (you have 8h time to work with the MATLAB-GUI) ### Non-interactive @@ -218,9 +218,354 @@ marie@login$ srun ./run_compiled_executable.sh $EBROOTMATLAB Please refer to the documentation `help parfor` for further information. +### MATLAB Parallel Computing Toolbox + +In the following, the steps to configure MATLAB to submit jobs to a cluster, retrieve results, and +debug errors are outlined. + +#### Configuration – MATLAB client on the cluster + +After logging into the HPC system, you configure MATLAB to run parallel jobs on the HPC system by +calling the shell script `configCluster.sh`. This only needs to be called once per version of +MATLAB. + +```console +marie@login$ module load MATLAB +marie@login$ configCluster.sh +``` + +Jobs will now default to the HPC system rather than submit to the local machine. + +#### Installation and Configuration – MATLAB client off the cluster + +The MATLAB support package for ZIH Systems can be found as follows: + +* Windows: + * [tud.nonshared.R2021b.zip](misc/tud.nonshared.R2021b.zip) + * [tud.nonshared.R2022a.zip](misc/tud.nonshared.R2022a.zip) +* Linux/macOS: + * [tud.nonshared.R2021b.tar.gz](misc/tud.nonshared.R2021b.tar.gz) + * [tud.nonshared.R2022a.tar.gz](misc/tud.nonshared.R2022a.tar.gz) + +Download the appropriate archive file and start MATLAB. The archive file should be extracted +in the location returned by calling + +```matlabsession +>> userpath +``` + +Configure MATLAB to run parallel jobs on ZIH Systems by calling `configCluster`. `configCluster` +only needs to be called once per version of MATLAB. + +```matlabsession +>> configCluster +``` + +Submission to the remote cluster requires SSH credentials. You will be prompted for your SSH +username and password or identity file (private key). The username and location of the private key +will be stored in MATLAB for future sessions. Jobs will now default to the cluster rather than +submit to the local machine. + +!!! note + + If you would like to submit to the local machine then run the following command: + + ```matlab + >> % Get a handle to the local resources + >> c = parcluster('local'); + ``` + +#### Configuring Jobs + +Prior to submitting the job, you can specify various parameters to pass to your jobs, such as queue, +e-mail, walltime, etc. *Only `MemPerCpu` and `QueueName` are required*. + +```matlabsession +>> % Get a handle to the cluster +>> c = parcluster; + +[REQUIRED] + +>> % Specify memory to use, per core (default: 2gb) +>> c.AdditionalProperties.MemPerCpu = '4gb'; + +>> % Specify the walltime (e.g., 5 hours) +>> c.AdditionalProperties.WallTime = '05:00:00'; + +[OPTIONAL] + +>> % Specify the account to use +>> c.AdditionalProperties.Account = 'account-name'; + +>> % Request constraint +>> c.AdditionalProperties.Constraint = 'a-constraint'; + +>> % Request job to run on exclusive node(s) (default: false) +>> c.AdditionalProperties.EnableExclusive = true; + +>> % Request email notification of job status +>> c.AdditionalProperties.EmailAddress = 'user-id@tu-dresden.de'; + +>> % Specify number of GPUs to use (GpuType is optional) +>> c.AdditionalProperties.GpusPerNode = 1; +>> c.AdditionalProperties.GpuType = 'gpu-card'; + +>> % Specify the queue to use +>> c.AdditionalProperties.Partition = 'queue-name'; + +>> % Specify a reservation to use +>> c.AdditionalProperties.Reservation = 'a-reservation'; +``` + +Save changes after modifying `AdditionalProperties` for the above changes to persist between MATLAB +sessions. + +```matlabsession +>> c.saveProfile +``` + +To see the values of the current configuration options, display `AdditionalProperties`. + +```matlabsession +>> % To view current properties +>> c.AdditionalProperties +``` + +You can unset a value when no longer needed. + +```matlabsession +>> % Turn off email notifications +>> c.AdditionalProperties.EmailAddress = ''; +>> c.saveProfile +``` + +#### Interactive Jobs - MATLAB Client on the Cluster + +To run an interactive pool job on the ZIH systems, continue to use `parpool` as you’ve done before. + +```matlabsession +>> % Get a handle to the cluster +>> c = parcluster; + +>> % Open a pool of 64 workers on the cluster +>> pool = c.parpool(64); +``` + +Rather than running local on your machine, the pool can now run across multiple nodes on the +cluster. + +```matlabsession +>> % Run a parfor over 1000 iterations +>> parfor idx = 1:1000 + a(idx) = … + end +``` + +Once you are done with the pool, delete it. + +```matlabsession +>> % Delete the pool +>> pool.delete +``` + +#### Independent Batch Job + +Use the batch command to submit asynchronous jobs to the HPC system. The `batch` command will return +a job object which is used to access the output of the submitted job. See the MATLAB documentation +for more help on `batch`. + +```matlabsession +>> % Get a handle to the cluster +>> c = parcluster; + +>> % Submit job to query where MATLAB is running on the cluster +>> job = c.batch(@pwd, 1, {}, ... + 'CurrentFolder','.', 'AutoAddClientPath',false); + +>> % Query job for state +>> job.State + +>> % If state is finished, fetch the results +>> job.fetchOutputs{:} + +>> % Delete the job after results are no longer needed +>> job.delete +``` + +To retrieve a list of currently running or completed jobs, call `parcluster` to retrieve the cluster +object. The cluster object stores an array of jobs that were run, are running, or are queued to +run. This allows us to fetch the results of completed jobs. Retrieve and view the list of jobs as +shown below. + +```matlabsession +>> c = parcluster; +>> jobs = c.Jobs; +``` + +Once you have identified the job you want, you can retrieve the results as done previously. + +`fetchOutputs` is used to retrieve function output arguments; if calling `batch` with a script, use +`load` instead. Data that has been written to files on the cluster needs be retrieved directly +from the filesystem (e.g. via ftp). To view results of a previously completed job: + +```matlabsession +>> % Get a handle to the job with ID 2 +>> job2 = c.Jobs(2); +``` + +!!! note + + You can view a list of your jobs, as well as their IDs, using the above `c.Jobs` command. + + ```matlabsession + >> % Fetch results for job with ID 2 + >> job2.fetchOutputs{:} + ``` + +#### Parallel Batch Job + +You can also submit parallel workflows with the `batch` command. Let’s use the following example +for a parallel job, which is saved as `parallel_example.m`. + +```matlab +function [t, A] = parallel_example(iter) + +if nargin==0 + iter = 8; +end + +disp('Start sim') + +t0 = tic; +parfor idx = 1:iter + A(idx) = idx; + pause(2) + idx +end +t = toc(t0); + +disp('Sim completed') + +save RESULTS A + +end +``` + +This time when you use the `batch` command, to run a parallel job, you will also specify a MATLAB +Pool. + +```matlabsession +>> % Get a handle to the cluster +>> c = parcluster; + +>> % Submit a batch pool job using 4 workers for 16 simulations +>> job = c.batch(@parallel_example, 1, {16}, 'Pool',4, ... + 'CurrentFolder','.', 'AutoAddClientPath',false); + +>> % View current job status +>> job.State + +>> % Fetch the results after a finished state is retrieved +>> job.fetchOutputs{:} +ans = + 8.8872 +``` + +The job ran in 8.89 seconds using four workers. Note that these jobs will always request N+1 CPU +cores, since one worker is required to manage the batch job and pool of workers. For example, a +job that needs eight workers will consume nine CPU cores. + +You might run the same simulation but increase the Pool size. This time, to retrieve the results later, +you will keep track of the job ID. + +!!! note + + For some applications, there will be a diminishing return when allocating too many workers, as + the overhead may exceed computation time. + + ```matlabsession + >> % Get a handle to the cluster + >> c = parcluster; + + >> % Submit a batch pool job using 8 workers for 16 simulations + >> job = c.batch(@parallel_example, 1, {16}, 'Pool', 8, ... + 'CurrentFolder','.', 'AutoAddClientPath',false); + + >> % Get the job ID + >> id = job.ID + id = + 4 + >> % Clear job from workspace (as though you quit MATLAB) + >> clear job + ``` + +Once you have a handle to the cluster, you can call the `findJob` method to search for the job with +the specified job ID. + +```matlabsession +>> % Get a handle to the cluster +>> c = parcluster; + +>> % Find the old job +>> job = c.findJob('ID', 4); + +>> % Retrieve the state of the job +>> job.State +ans = + finished +>> % Fetch the results +>> job.fetchOutputs{:}; +ans = + 4.7270 +``` + +The job now runs in 4.73 seconds using eight workers. Run code with different number of workers to +determine the ideal number to use. Alternatively, to retrieve job results via a graphical user +interface, use the Job Monitor (Parallel > Monitor Jobs). + + +{: summary="Retrieve job results via GUI using the Job Monitor." align="center"} + +#### Debugging + +If a serial job produces an error, call the `getDebugLog` method to view the error log file. When +submitting independent jobs, with multiple tasks, specify the task number. + +```matlabsession +>> c.getDebugLog(job.Tasks(3)) +``` + +For Pool jobs, only specify the job object. + +```matlabsession +>> c.getDebugLog(job) +``` + +When troubleshooting a job, the cluster admin may request the scheduler ID of the job. This can be +derived by calling `schedID`. + +```matlabsession +>> schedID(job) +ans = + 25539 +``` + +#### Further Reading + +To learn more about the MATLAB Parallel Computing Toolbox, check out these resources: + +* [Parallel Computing Coding + Examples](https://www.mathworks.com/help/parallel-computing/examples.html) +* [Parallel Computing Documentation](http://www.mathworks.com/help/distcomp/index.html) +* [Parallel Computing Overview](http://www.mathworks.com/products/parallel-computing/index.html) +* [Parallel Computing + Tutorials](http://www.mathworks.com/products/parallel-computing/tutorials.html) +* [Parallel Computing Videos](http://www.mathworks.com/products/parallel-computing/videos.html) +* [Parallel Computing Webinars](http://www.mathworks.com/products/parallel-computing/webinars.html) + ## Octave GNU [Octave](https://www.gnu.org/software/octave/index) is a high-level language, primarily intended for numerical computations. It provides a convenient command line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments using a language that -is mostly compatible with Matlab. It may also be used as a batch-oriented language. +is mostly compatible with MATLAB. It may also be used as a batch-oriented language. diff --git a/doc.zih.tu-dresden.de/docs/software/misc/matlab_monitor_jobs.png b/doc.zih.tu-dresden.de/docs/software/misc/matlab_monitor_jobs.png new file mode 100644 index 0000000000000000000000000000000000000000..c91906e819495e345da69f80192ea3b8fee0a248 Binary files /dev/null and b/doc.zih.tu-dresden.de/docs/software/misc/matlab_monitor_jobs.png differ diff --git a/doc.zih.tu-dresden.de/docs/software/misc/tud.nonshared.R2021b.tar.gz b/doc.zih.tu-dresden.de/docs/software/misc/tud.nonshared.R2021b.tar.gz new file mode 100644 index 0000000000000000000000000000000000000000..a4a30943f36ee4ebb5ad94c635be49a016f1eadd Binary files /dev/null and b/doc.zih.tu-dresden.de/docs/software/misc/tud.nonshared.R2021b.tar.gz differ diff --git a/doc.zih.tu-dresden.de/docs/software/misc/tud.nonshared.R2021b.zip b/doc.zih.tu-dresden.de/docs/software/misc/tud.nonshared.R2021b.zip new file mode 100644 index 0000000000000000000000000000000000000000..02118ef5354a1f972321bde558a3e2bb08a5b6af Binary files /dev/null and b/doc.zih.tu-dresden.de/docs/software/misc/tud.nonshared.R2021b.zip differ diff --git a/doc.zih.tu-dresden.de/docs/software/misc/tud.nonshared.R2022a.tar.gz b/doc.zih.tu-dresden.de/docs/software/misc/tud.nonshared.R2022a.tar.gz new file mode 100644 index 0000000000000000000000000000000000000000..a4160a5f33f094f340eb20c3e140687170864609 Binary files /dev/null and b/doc.zih.tu-dresden.de/docs/software/misc/tud.nonshared.R2022a.tar.gz differ diff --git a/doc.zih.tu-dresden.de/docs/software/misc/tud.nonshared.R2022a.zip b/doc.zih.tu-dresden.de/docs/software/misc/tud.nonshared.R2022a.zip new file mode 100644 index 0000000000000000000000000000000000000000..481ab9a1d1a18515abcda82fbdb7d1ab3b580a5e Binary files /dev/null and b/doc.zih.tu-dresden.de/docs/software/misc/tud.nonshared.R2022a.zip differ diff --git a/doc.zih.tu-dresden.de/wordlist.aspell b/doc.zih.tu-dresden.de/wordlist.aspell index a808318d64a38981956ed1ac5fa5a7d1c05e703d..758cae07744f907d37996a366e07f0ec01ec35a6 100644 --- a/doc.zih.tu-dresden.de/wordlist.aspell +++ b/doc.zih.tu-dresden.de/wordlist.aspell @@ -183,11 +183,12 @@ LoadLeveler localhost lsf lustre +macOS markdownlint Mathematica MathKernel MathWorks -matlab +MATLAB Matplotlib MEGWARE mem @@ -411,6 +412,7 @@ VMSize VNC VPN VRs +walltime WebVNC WinSCP WML