From c036763e7dfabfe9f1443f5e7b23352d8c51e03f Mon Sep 17 00:00:00 2001
From: Moe Jette <jette1@llnl.gov>
Date: Sat, 29 Jan 2011 18:02:49 +0000
Subject: [PATCH] salloc: add support for Cray

This adds support for execution of salloc on a local Cray system,
disabling node sharing (still not supported on XT/XE).

It further disables running salloc within salloc: since Cray uses process
group / PAGG IDs for tracking its reservations, a nested salloc invariably
leads to an ALPS resource allocation error.

Thirdly, it disables Cray node allocation on non-Cray systems, since this
requires that the host on which salloc spawns the shell process is capable
of Cray task launch.

If it is not, then the remote slurmctld will reserve the requested nodes, but
the local host running salloc will neither be able to confirm the ALPS
reservation (due to the absence of a local apbasil command), nor be able to
run jobs on the compute nodes.

To distinguish this case from general task launch (we use a frontend host where
salloc could end up running jobs on different clusters, depending on the value
exported via $SLURM_CONF), the following conditions are tested:

 * Cray build support has been enabled (HAVE_CRAY);
 * the loaded slurm.conf uses select/cray (required on Cray hosts);
 * the local host does not have support for apbasil (HAVE_NATIVE_CRAY undefined).

Since the 'apbasil' command is only available on native Cray systems, this
combination of conditions seems sufficient to prevent accidentally using
salloc on a host which does not support it.
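
The three conditions above can be sketched as a single predicate. This is an
illustration only, not the code in this patch: the function name is
hypothetical, and the build-time macros HAVE_CRAY and HAVE_NATIVE_CRAY are
modelled as boolean parameters so that the logic can be exercised on any host.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true when salloc should refuse to allocate
 * Cray nodes because the local host cannot perform Cray task launch.
 *
 * have_cray        stands in for the HAVE_CRAY build macro
 * have_native_cray stands in for the HAVE_NATIVE_CRAY build macro
 * select_type      the select plugin named in the loaded slurm.conf
 */
static bool cray_alloc_unsupported(bool have_cray, bool have_native_cray,
				   const char *select_type)
{
	/* The caller must pass the plugin name from the loaded config. */
	assert(select_type != NULL);

	return have_cray &&
	       strcmp(select_type, "select/cray") == 0 &&
	       !have_native_cray;
}
```

In a real build the two booleans would presumably be resolved by the
preprocessor rather than at run time; only the select plugin name depends on
the slurm.conf that $SLURM_CONF points to.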

(For sbatch the case is different, since the job script runs on the remote host.)

11_salloc.diff
done with minor change for Cray emulation
---
 NEWS             |  1 +
 src/salloc/opt.c | 19 ++++++++++++++++++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 7f83a88aab2..03241875500 100644
--- a/NEWS
+++ b/NEWS
@@ -29,6 +29,7 @@ documents those changes that are of interest to users and admins.
     rpm and expat XML parser library/headers. 04_Cray-autoconf-rules.diff
  -- On Cray/ALPS systems  do node inventory before scheduling jobs.
     09_Cray-INVENTORY-directly-before-schedule.diff
+ -- Disable some salloc options on Cray systems. 11_salloc.diff
 
 * Changes in SLURM 2.3.0.pre1
 =============================
diff --git a/src/salloc/opt.c b/src/salloc/opt.c
index 1ba81d0f6fd..aa55c0e3bca 100644
--- a/src/salloc/opt.c
+++ b/src/salloc/opt.c
@@ -1266,7 +1266,24 @@ static bool _opt_verify(void)
 		verified = false;
 	}
 
-#ifdef HAVE_BGL
+#if defined(HAVE_CRAY)
+	if (getenv("BASIL_RESERVATION_ID") != NULL) {
+		error("BASIL_RESERVATION_ID already set - running salloc "
+		      "within salloc?");
+		return false;
+	}
+	if (opt.shared && opt.shared != (uint16_t)NO_VAL) {
+		info("Node sharing is not (yet) supported on Cray.");
+		opt.shared = false;
+	}
+	if (opt.overcommit) {
+		info("Oversubscribing is not supported on Cray.");
+		opt.overcommit = false;
+	}
+	if (!opt.wait_all_nodes)
+		info("Cray needs --wait-all-nodes to wait on ALPS reservation");
+	opt.wait_all_nodes = true;
+#elif defined(HAVE_BGL)
 	if (opt.blrtsimage && strchr(opt.blrtsimage, ' ')) {
 		error("invalid BlrtsImage given '%s'", opt.blrtsimage);
 		verified = false;
-- 
GitLab