Commit 6c927b3f authored by Moe Jette

select/cray: fix error in 'is_gemini' logic

The is_gemini logic is too simple: as just observed on a SeaStar system, it can
be fooled into the wrong result if more than 1 row has NULL coordinates. 

This case happens if a blade has been powered down completely, so that the SeaStar
network chip is also powered off. The routing system recognizes this case and 
routes around the powered-down node in the torus. It is plausible that in such a
case the torus coordinates are NULL, since the node(s) are no longer part of the
torus. 

(It is also possible to set all nodes on a blade down but leave the power switched
 on. The SeaStar chip, which is independent of the motherboard, then continues to
 provide routing connectivity, so the torus coordinates all remain non-NULL. The
 node can do no computing, however, and its ALPS state is "ROUTING".)

Here is the example which revealed this behaviour: one blade, nodes 804-807,
had been powered down after system failure.

mysql> select COUNT(*), COUNT(DISTINCT x_coord,y_coord,z_coord) FROM processor;
+----------+-----------------------------------------+
| COUNT(*) | COUNT(DISTINCT x_coord,y_coord,z_coord) |
+----------+-----------------------------------------+
|     1882 |                                    1878 | 
+----------+-----------------------------------------+

==> There are 4 more node IDs than there are distinct coordinates.

mysql> select processor_id,x_coord,y_coord,z_coord from processor\
       WHERE x_coord IS NULL OR y_coord IS NULL OR z_coord IS NULL;
+--------------+---------+---------+---------+
| processor_id | x_coord | y_coord | z_coord |
+--------------+---------+---------+---------+
|          804 |    NULL |    NULL |    NULL | 
|          805 |    NULL |    NULL |    NULL | 
|          806 |    NULL |    NULL |    NULL | 
|          807 |    NULL |    NULL |    NULL | 
+--------------+---------+---------+---------+

==> The corrected query now also gives the correct result (equality):
mysql> select COUNT(*), COUNT(DISTINCT x_coord,y_coord,z_coord) FROM processor\
       WHERE x_coord IS NOT NULL AND y_coord IS NOT NULL AND z_coord IS NOT NULL;
+----------+-----------------------------------------+
| COUNT(*) | COUNT(DISTINCT x_coord,y_coord,z_coord) |
+----------+-----------------------------------------+
|     1878 |                                    1878 | 
+----------+-----------------------------------------+
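The effect of the two queries can be reproduced without a database. The following is only a minimal sketch, not the SLURM code: `struct node_row`, `looks_like_gemini_old()` and `looks_like_gemini_fixed()` are hypothetical names, and a single `has_coord` flag stands in for NULL in all three coordinate columns (as in the rows shown above). It relies on the documented MySQL behaviour that COUNT(DISTINCT ...) only counts rows whose expressions are non-NULL, whereas COUNT(*) counts every row.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for one row of the `processor` table. */
struct node_row {
	int  id;
	bool has_coord;	/* false models NULL x/y/z coordinates */
	int  x, y, z;
};

static bool same_coord(const struct node_row *a, const struct node_row *b)
{
	return a->x == b->x && a->y == b->y && a->z == b->z;
}

/* Models COUNT(DISTINCT x_coord, y_coord, z_coord): NULL rows are ignored. */
static size_t count_distinct_coords(const struct node_row *rows, size_t n)
{
	size_t distinct = 0;

	for (size_t i = 0; i < n; i++) {
		bool seen = false;

		if (!rows[i].has_coord)
			continue;
		/* Quadratic scan over earlier rows; fine for a sketch. */
		for (size_t j = 0; j < i && !seen; j++)
			seen = rows[j].has_coord && same_coord(&rows[i], &rows[j]);
		if (!seen)
			distinct++;
	}
	return distinct;
}

/* Models COUNT(*) restricted by the new "IS NOT NULL" WHERE clause. */
static size_t count_rows_with_coords(const struct node_row *rows, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (rows[i].has_coord)
			count++;
	return count;
}

/* Old check: distinct coordinates compared against ALL rows. */
bool looks_like_gemini_old(const struct node_row *rows, size_t n)
{
	return count_distinct_coords(rows, n) < n;
}

/* Fixed check: rows without coordinates are excluded on both sides. */
bool looks_like_gemini_fixed(const struct node_row *rows, size_t n)
{
	return count_distinct_coords(rows, n) < count_rows_with_coords(rows, n);
}
```

Given a SeaStar-style table in which every powered-on node has its own coordinate and a few rows have none, the old predicate reports "Gemini" while the fixed one does not; a table in which pairs of nodes share a coordinate is reported as Gemini by both.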
parent 3d96a32e
@@ -75,19 +75,24 @@ int cray_is_gemini_system(MYSQL *handle)
 {
 	/*
 	 * Rationale:
-	 * - XT SeaStar systems have one SeaStar ASIC per node
-	 *   There are 4 nodes on a blade, 4 SeaStar ASICS, giving
-	 *   4 distinct (X,Y,Z) coordinates.
-	 * - XE Gemini systems connect pairs of nodes to a Gemini chip
-	 *   There are 4 nodes on a blade and 2 Gemini chips, nodes 0/1
+	 * - XT SeaStar systems have one SeaStar ASIC per node.
+	 *   There are 4 nodes and 4 SeaStar ASICS on each blade, giving
+	 *   4 distinct (X,Y,Z) coordinates per blade, so that the total
+	 *   node count equals the total count of torus coordinates.
+	 * - XE Gemini systems connect pairs of nodes to a Gemini chip.
+	 *   There are 4 nodes on a blade and 2 Gemini chips. Nodes 0/1
 	 *   are connected to Gemini chip 0, nodes 2/3 are connected to
 	 *   Gemini chip 1. This configuration acts as if the nodes were
-	 *   already connected in Y dimension, hence there are half as
+	 *   internally joined in Y dimension; hence there are half as
 	 *   many (X,Y,Z) coordinates than there are nodes in the system.
+	 * - Coordinates may be NULL if a network chip is deactivated.
 	 */
 	const char query[] =
 		"SELECT COUNT(DISTINCT x_coord, y_coord, z_coord) < COUNT(*) "
-		"FROM processor";
+		"FROM processor "
+		"WHERE x_coord IS NOT NULL "
+		"AND y_coord IS NOT NULL "
+		"AND z_coord IS NOT NULL";
 	MYSQL_BIND	result[1];
 	signed char	answer;
 	my_bool		is_null;