    Cray PMI refinements · eeb97050
    Morris Jette authored
    Refine commit 5f89223f based upon
    feedback from David Gloe:
    * It's not only MPI jobs, but anything that uses PMI. That includes MPI,
    shmem, etc., so you may want to reword the error message.
    * I added the terminated flag because if multiple tasks on a node exit,
    you would get an error message from each of them. That reduces it to one
    error message per node. Cray bug 810310 prompted that change.
    * Since we're now relying on --kill-on-bad-exit, I think we should update
    the Cray slurm.conf template to default to 1 (set KillOnBadExit=1 in
    contribs/cray/slurm.conf.template).
    bug 1171
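The second feedback point describes a "terminated" flag that limits error output to one message per node, even when several tasks on that node exit. Below is a minimal sketch of that report-once pattern. It is not the actual Slurm code: the names `terminated`, `reports` and `report_task_exit` and the message text are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-node state: set once the first failing task is reported. */
static bool terminated = false;
/* Hypothetical counter of emitted error messages (for illustration only). */
static int reports = 0;

/* Hypothetical hook called when a local task exits.
 * Returns true only if this call emitted the error message. */
static bool report_task_exit(int task_id, int exit_code)
{
    if (exit_code == 0)
        return false;       /* clean exit: nothing to report */
    if (terminated)
        return false;       /* this node already reported a failure */
    terminated = true;
    reports++;
    /* Wording refers to PMI applications generally, not only MPI. */
    fprintf(stderr, "task %d exited with code %d; terminating PMI application\n",
            task_id, exit_code);
    return true;
}
```

With this pattern, a second or third failing task on the same node returns early, so the node logs a single error. Job-wide termination is then left to `--kill-on-bad-exit` (`KillOnBadExit=1`).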