
Status: Closed

Opened May 16, 2018 by John Patchett (@patchett2002)

Ghost Cell Generator Causes Core dump in certain circumstances

I have been unable to run a ParaView Catalyst script with a LANL code on tens of thousands of nodes when a Ghost Cells Generator is in the pipeline with ParaView 5.5.0. I have replicated the problem with a 21-million-cell partitioned dataset (a .pvtu with 64 .vtus), which tars up to 274 MB with zlib (.tar.gz).

The problem exposes itself on 4 nodes with 8 procs per node (total = 32) on the trinitite supercomputer, using the ParaView 5.5.0 client from pkg and a pvserver built with: intel/17.0.4, PrgEnv-intel/6.0.4, cray-mpich/7.7.0, python/2.7-anaconda-4.1.1, paraview/5.5.0-osmesa. The problem does not expose itself on the same computer using 2 nodes with 16 procs per node, so this is possibly a system problem. @wascott @boonth

The attached file testme4.py is used to test on haswell: 1 node, 64 cores works; 2 nodes, 64 cores doesn't.
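The attached script isn't reproduced in the issue body, but a minimal pipeline of this shape would look roughly like the sketch below. This is an assumption based on the description, not the actual contents of testme4.py; the reader/filter names follow ParaView 5.5's `paraview.simple` API, and the `one.pvtu` path refers to the sample dataset attached below.

```python
def build_pipeline(pvtu_path):
    # Imported inside the function so the sketch can be read without a
    # ParaView installation; in practice this runs under pvbatch/pvpython.
    from paraview.simple import (XMLPartitionedUnstructuredGridReader,
                                 GhostCellsGenerator)
    reader = XMLPartitionedUnstructuredGridReader(FileName=[pvtu_path])
    ghost = GhostCellsGenerator(Input=reader)
    # Updating the pipeline triggers the parallel ghost-cell exchange,
    # which is where the MPI truncation error below shows up.
    ghost.UpdatePipeline()
    return ghost
```

A script calling `build_pipeline("one.pvtu")` would then be launched the same way as the attached one, e.g. `srun -N 2 -n 64 pvbatch script.py`.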

```
projects/gcg_test> srun -N 2 -n 64 pvbatch ./testme4.py
Generic Warning: In /tmp/pv-5.5.0-again/SRC/paraview/VTK/Parallel/MPI/vtkMPICommunicator.cxx, line 71
MPI had an error
Message truncated, error stack:
PMPI_Test(178)...................: MPI_Test(request=0x22d0f30, flag=0x7fffffff282c, status=0x7fffffff2818) failed
MPIR_Test_impl(67)...............:
MPID_nem_gni_lmt_start_recv(1667): Message from rank 30 and tag 9001 truncated; 1611360 bytes received but buffer size is 185760

Rank 32 [Wed May 16 19:07:00 2018] [c0-0c2s4n3] application called MPI_Abort(comm=0x84000007, 942310158) - process 32
Generic Warning: In /tmp/pv-5.5.0-again/SRC/paraview/VTK/Parallel/MPI/vtkMPICommunicator.cxx, line 71
MPI had an error
Message truncated, error stack:
PMPI_Test(178)...................: MPI_Test(request=0x27ba9a0, flag=0x7fffffff282c, status=0x7fffffff2818) failed
MPIR_Test_impl(67)...............:
MPID_nem_gni_lmt_start_recv(1667): Message from rank 32 and tag 9001 truncated; 1006776 bytes received but buffer size is 1006152

Rank 30 [Wed May 16 19:07:00 2018] [c0-0c2s4n2] application called MPI_Abort(comm=0x84000004, 203064078) - process 30
srun: error: nid00147: task 32: Aborted
srun: Terminating job step 333504.13
slurmstepd: error: *** STEP 333504.13 ON nid00146 CANCELLED AT 2018-05-16T19:07:01 ***
srun: error: nid00146: task 30: Aborted
srun: error: nid00146: tasks 0,4-6,8-9,11-16,18,20,22-23,25,27,31: Terminated
srun: error: nid00147: tasks 34-54,56-58,60-63: Terminated
srun: error: nid00147: tasks 33,55: Terminated
srun: error: nid00146: tasks 1-3,7,10,17,19,21,24,26,28-29: Terminated
srun: error: nid00147: task 59: Terminated
srun: Force Terminated job step 333504.13

projects/gcg_test> srun -N 1 -n 64 pvbatch ./testme4.py
SWR detected AVX2 instruction support (using: libswrAVX2.so).
SWR detected AVX2 instruction support (using: libswrAVX2.so).
projects/gcg_test>
```
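For context, the "Message truncated" errors above mean a receiving rank posted a receive buffer smaller than the message the sending rank actually transmitted. A plain-Python toy of that mismatch (no MPI involved; `recv_into` is a made-up helper that just mimics the wording of the log, not an MPI call):

```python
# Toy model of MPI's "message truncated" error: the receiver sized its
# buffer from an expected byte count smaller than what actually arrived.
def recv_into(buffer_size, message):
    # Real MPI raises MPI_ERR_TRUNCATE here; we only mimic the log text.
    if len(message) > buffer_size:
        raise RuntimeError("Message truncated; %d bytes received but "
                           "buffer size is %d" % (len(message), buffer_size))
    return message

# Mirrors rank 32's numbers from the log above.
try:
    recv_into(1006152, b"x" * 1006776)
except RuntimeError as e:
    print(e)  # Message truncated; 1006776 bytes received but buffer size is 1006152
```

This is consistent with the ghost-cell exchange somewhere negotiating a stale or mismatched size before the actual transfer.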

Sample dataset here: one.tgz. It will break with four nodes and 8 or more ranks: `srun -N4 -n8 pvbatch testme4.py` (linked above).

Edited May 17, 2018 by John Patchett
Assignee: None
Milestone: 5.5.1 (Summer 2018)
Reference: paraview/paraview#18175