Moe Jette authored
The is_gemini logic is too simple: as just observed on a SeaStar system, it can
be fooled into the wrong result if more than one row has NULL coordinates.

This case happens if a blade has been powered down completely, so that the
SeaStar network chip is also powered off. The routing system recognizes this
case and routes around the powered-down nodes in the torus. It is plausible
that in such a case the torus coordinates are NULL, since the node(s) are no
longer part of the torus.

(It is also possible to set all nodes on a blade down but leave power switched
on. The SeaStar chip, which is independent of the motherboard, will continue to
provide routing connectivity, i.e. the torus coordinates would all be non-NULL,
but the node can do no computing; its ALPS state is "ROUTING".)

Here is the example which revealed this behaviour: one blade, nodes 804-807,
had been powered down after a system failure.

mysql> select COUNT(*), COUNT(DISTINCT x_coord,y_coord,z_coord) FROM processor;
+----------+-----------------------------------------+
| COUNT(*) | COUNT(DISTINCT x_coord,y_coord,z_coord) |
+----------+-----------------------------------------+
|     1882 |                                    1878 |
+----------+-----------------------------------------+

==> There are 4 more node IDs than there are distinct coordinates.

mysql> select processor_id,x_coord,y_coord,z_coord from processor\
       WHERE x_coord IS NULL OR y_coord IS NULL OR z_coord IS NULL;
+--------------+---------+---------+---------+
| processor_id | x_coord | y_coord | z_coord |
+--------------+---------+---------+---------+
|          804 |    NULL |    NULL |    NULL |
|          805 |    NULL |    NULL |    NULL |
|          806 |    NULL |    NULL |    NULL |
|          807 |    NULL |    NULL |    NULL |
+--------------+---------+---------+---------+

==> These are exactly the four nodes of the powered-down blade.

The corrected query, which skips rows with NULL coordinates, gives the correct
result (equality):

mysql> select COUNT(*), COUNT(DISTINCT x_coord,y_coord,z_coord) FROM processor\
       WHERE x_coord IS NOT NULL AND y_coord IS NOT NULL AND z_coord IS NOT NULL;
+----------+-----------------------------------------+
| COUNT(*) | COUNT(DISTINCT x_coord,y_coord,z_coord) |
+----------+-----------------------------------------+
|     1878 |                                    1878 |
+----------+-----------------------------------------+
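
For illustration only, the effect of the extra WHERE clause can be reproduced
outside the database. The sketch below is not the is_gemini code itself; the
function name coordinate_counts, the skip_null flag and the eight-node inventory
are hypothetical. It relies on one documented MySQL behaviour: COUNT(DISTINCT
a,b,c) ignores any row in which one of the expressions is NULL, while COUNT(*)
counts every row.

```python
from typing import Iterable, Optional, Tuple

# (processor_id, x_coord, y_coord, z_coord); None stands in for SQL NULL.
Row = Tuple[int, Optional[int], Optional[int], Optional[int]]


def coordinate_counts(rows: Iterable[Row], skip_null: bool) -> Tuple[int, int]:
    """Return (COUNT(*), COUNT(DISTINCT x,y,z)) as MySQL would compute them.

    skip_null=True mimics the corrected query's
    'WHERE x_coord IS NOT NULL AND y_coord IS NOT NULL AND z_coord IS NOT NULL'.
    """
    rows = list(rows)
    if skip_null:
        rows = [r for r in rows if None not in r[1:]]
    total = len(rows)
    # COUNT(DISTINCT ...) never counts a tuple containing NULL.
    distinct = len({r[1:] for r in rows if None not in r[1:]})
    return total, distinct


# Hypothetical miniature inventory: four live nodes with unique coordinates,
# plus one powered-down blade (nodes 804-807) whose coordinates are NULL.
rows = [(800 + i, i, 0, 0) for i in range(4)]
rows += [(804 + i, None, None, None) for i in range(4)]

print(coordinate_counts(rows, skip_null=False))  # (8, 4): counts differ
print(coordinate_counts(rows, skip_null=True))   # (4, 4): equality restored
```

Without the filter the two counts differ by the number of powered-down nodes,
which is the mismatch seen above (1882 vs. 1878); with it, only nodes that are
still part of the torus are compared.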