On backend failure, the Slurm job exits and the allocation is lost
It often takes hours to get a job scheduled on a supercomputer, especially a job of reasonable size using thousands of cores. If the ParaView server crashes, the Slurm job exits and the allocation is lost; resubmitting to the queue restarts the multi-hour wait. Can we make the server restart inside the same allocation so the client can reconnect?
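As a possible stopgap until something like this exists in ParaView itself, the batch script could run the server under a small supervisor that relaunches it after a crash, so the Slurm job (and its allocation) stays alive. The following is only a sketch under stated assumptions: `supervise`, `max_restarts`, and `delay_s` are hypothetical names of my own, not part of ParaView or Slurm, and the server command is passed in rather than assumed.

```python
import subprocess
import time


def supervise(cmd, max_restarts=5, delay_s=2.0):
    """Run cmd, relaunching it after a non-zero exit, up to max_restarts times.

    Hypothetical helper (not a ParaView or Slurm API).
    Returns the number of launches performed.
    """
    launches = 0
    while True:
        launches += 1
        # Blocks until the child process exits; a negative code means it
        # was killed by a signal, which also counts as a failure here.
        returncode = subprocess.call(cmd)
        if returncode == 0:
            # Clean shutdown: treat it as "user is done" and let the job end.
            return launches
        if launches > max_restarts:
            # Stop retrying so a persistently crashing server does not
            # burn through the whole allocation.
            raise RuntimeError(
                f"{cmd[0]} failed {launches} times, last exit code {returncode}"
            )
        time.sleep(delay_s)
```

In a batch script this might be invoked as `supervise(["srun", "pvserver"])`, where the exact launch command is an assumption and depends on the site. This only keeps the allocation; the client would still have to reconnect by hand, and any state held by the crashed server is gone.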