From c036763e7dfabfe9f1443f5e7b23352d8c51e03f Mon Sep 17 00:00:00 2001
From: Moe Jette <jette1@llnl.gov>
Date: Sat, 29 Jan 2011 18:02:49 +0000
Subject: [PATCH] salloc: add support for Cray

This adds support for execution of salloc on a local Cray system,
disabling node sharing (still not supported on XT/XE).

It further disables running salloc within salloc, as it leads to errors:
since Cray uses process group / PAGG IDs for tracking its reservations,
running salloc from within salloc invariably leads to an ALPS resource
allocation error.

Thirdly, it disables Cray node allocation on non-Cray systems, since this
requires that the host on which salloc spawns the shell process is capable
of Cray task launch. If it is not, then the remote slurmctld will reserve
the requested nodes, but the local host running salloc will neither be
able to confirm the ALPS reservation (due to the absence of a local
apbasil command), nor will it be able to run jobs on the compute nodes.

To distinguish this case from general task launch (we use a frontend host
where salloc could end up running jobs on different clusters, depending on
the value exported via $SLURM_CONF), the following combination of
conditions is tested:
 * Cray build support has been enabled (HAVE_CRAY);
 * the loaded slurm.conf uses select/cray (required on Cray hosts);
 * the local host does not have support for apbasil
   (HAVE_NATIVE_CRAY undefined).

Since the 'apbasil' command is only available on native Cray systems, this
combination of conditions seems sufficient to prevent accidentally using
salloc on a host which does not support it. (For sbatch the case is
different, since the job script runs on the remote host.)
11_salloc.diff done with minor change for Cray emulation
---
 NEWS             |  1 +
 src/salloc/opt.c | 19 ++++++++++++++++++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 7f83a88aab2..03241875500 100644
--- a/NEWS
+++ b/NEWS
@@ -29,6 +29,7 @@ documents those changes that are of interest to users and admins.
     rpm and expat XML parser library/headers. 04_Cray-autoconf-rules.diff
  -- On Cray/ALPS systems do node inventory before scheduling jobs.
     09_Cray-INVENTORY-directly-before-schedule.diff
+ -- Disable some salloc options on Cray systems. 11_salloc.diff
 
 * Changes in SLURM 2.3.0.pre1
 =============================
diff --git a/src/salloc/opt.c b/src/salloc/opt.c
index 1ba81d0f6fd..aa55c0e3bca 100644
--- a/src/salloc/opt.c
+++ b/src/salloc/opt.c
@@ -1266,7 +1266,24 @@ static bool _opt_verify(void)
 		verified = false;
 	}
 
-#ifdef HAVE_BGL
+#if defined(HAVE_CRAY)
+	if (getenv("BASIL_RESERVATION_ID") != NULL) {
+		error("BASIL_RESERVATION_ID already set - running salloc "
+		      "within salloc?");
+		return false;
+	}
+	if (opt.shared && opt.shared != (uint16_t)NO_VAL) {
+		info("Node sharing is not (yet) supported on Cray.");
+		opt.shared = false;
+	}
+	if (opt.overcommit) {
+		info("Oversubscribing is not supported on Cray.");
+		opt.overcommit = false;
+	}
+	if (!opt.wait_all_nodes)
+		info("Cray needs --wait-all-nodes to wait on ALPS reservation");
+	opt.wait_all_nodes = true;
+#elif defined(HAVE_BGL)
 	if (opt.blrtsimage && strchr(opt.blrtsimage, ' ')) {
 		error("invalid BlrtsImage given '%s'", opt.blrtsimage);
 		verified = false;
-- 
GitLab
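
Note on the three-part condition: the commit message describes a test
combining HAVE_CRAY, select/cray in the loaded slurm.conf, and the absence
of HAVE_NATIVE_CRAY, but the opt.c hunk in this patch does not show that
check. The following is only a minimal sketch of the described logic as a
standalone predicate. The function name cray_salloc_unsupported and its
parameters are hypothetical and do not appear in the patch; in SLURM itself
the first and third inputs would presumably be compile-time macros rather
than runtime flags.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): returns true when salloc
 * should refuse to allocate Cray nodes from this host, i.e. when all three
 * conditions listed in the commit message hold.
 */
static bool cray_salloc_unsupported(bool have_cray, const char *select_type,
				    bool have_native_cray)
{
	/* Condition 1: Cray build support must be enabled (HAVE_CRAY). */
	if (!have_cray)
		return false;

	/* Condition 2: the loaded slurm.conf must use select/cray. */
	if (select_type == NULL || strcmp(select_type, "select/cray") != 0)
		return false;

	/* Condition 3: no local apbasil support (HAVE_NATIVE_CRAY undefined). */
	return !have_native_cray;
}
```

Under these assumptions, a frontend host built with Cray support but
without apbasil, pointed at a select/cray cluster via $SLURM_CONF, would
make the predicate true; a native Cray host or a non-Cray cluster would not.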