Fides does not always initialize HIP before using it
When I compile Fides in ParaView with HIP support, I find that sometimes when I load a file, ParaView crashes with an error that HIP was not initialized. This seems to happen while allocating data.
There is a good chance this is a problem with VTK-m not properly initializing Kokkos before attempting to allocate something on the device.
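To illustrate the suspected failure mode, here is a minimal, self-contained sketch. It does not use the VTK-m or Kokkos APIs: `FakeDeviceRuntime` and `GuardedAllocator` are hypothetical stand-ins invented for this example. The fake runtime refuses to allocate until it has been initialized, which is the behavior described above. The guarded allocator shows one possible remedy, initializing lazily on the first allocation. This is an assumption about what a fix could look like, not a description of what VTK-m does.

```cpp
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a device runtime (NOT the Kokkos API).
// Like the HIP backend in the report, it rejects allocations made
// before initialization.
class FakeDeviceRuntime {
public:
  void initialize() { initialized_ = true; }
  bool is_initialized() const { return initialized_; }

  // Throws if called before initialize(), mimicking the
  // "device not initialized" error.
  std::vector<char> allocate(std::size_t bytes) const {
    if (!initialized_) {
      throw std::runtime_error("device not initialized");
    }
    return std::vector<char>(bytes);
  }

private:
  bool initialized_ = false;
};

// Hypothetical wrapper that makes sure the runtime is initialized
// before the first allocation reaches it.
class GuardedAllocator {
public:
  explicit GuardedAllocator(FakeDeviceRuntime& runtime) : runtime_(runtime) {}

  std::vector<char> allocate(std::size_t bytes) {
    {
      // Serialize the check so concurrent callers initialize only once.
      std::lock_guard<std::mutex> lock(mutex_);
      if (!runtime_.is_initialized()) {
        runtime_.initialize();
      }
    }
    return runtime_.allocate(bytes);
  }

private:
  FakeDeviceRuntime& runtime_;
  std::mutex mutex_;
};
```

Calling `allocate` directly on a fresh `FakeDeviceRuntime` throws, while the same request routed through `GuardedAllocator` succeeds because the wrapper initializes the runtime first.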
Activity
- Author Developer
Here is the backtrace (heavily edited because I got simultaneous output from 8 concurrent nodes):
Backtrace:
Kokkos::Experimental::HIP::HIP instance constructor : ERROR device not initialized
Kokkos::Impl::save_stacktrace() [0x7fffd09fb082]
Kokkos::Impl::traceback_callstack(std::ostream&) [0x7fffd09f295a]
Kokkos::Impl::host_abort(char const*) [0x7fffd09f29db]
[0x7fffe9ad6a96]
Kokkos::Experimental::Impl::HIPInternal::verify_is_initialized(char const*) const [0x7fffd09fe255]
Kokkos::Experimental::HIP::HIP() [0x7fffd0a0532e]
Kokkos::Experimental::HIPSpace::HIPSpace() [0x7fffd0a01ea5]
vtkm::cont::kokkos::internal::Allocate(unsigned long) [0x7fffd31606da]
vtkm::cont::internal::DeviceAdapterMemoryManager<vtkm::cont::DeviceAdapterTagKokkos>::Allocate(long long) const [0x7fffd315d412]
vtkm::cont::internal::DeviceAdapterMemoryManager<vtkm::cont::DeviceAdapterTagKokkos>::CopyHostToDevice(vtkm::cont::internal::BufferInfo const&) const [0x7fffd315d4dd]
[0x7fffd2d1a3cb]
vtkm::cont::internal::Buffer::ReadPointerDevice(vtkm::cont::DeviceAdapterId, vtkm::cont::Token&) const [0x7fffd2d14ec4]
vtkm::cont::internal::Storage<vtkm::internal::RecombineVec<vtkm::internal::ArrayPortalMultiplexer<vtkm::internal::ArrayPortalStrideRead<int>, vtkm::internal::ArrayPortalStrideWrite<int> > >, vtkm::cont::internal::StorageTagRecombineVec>::CreateReadPortal(std::vector<vtkm::cont::internal::Buffer, std::allocator<vtkm::cont::internal::Buffer> > const&, vtkm::cont::DeviceAdapterId, vtkm::cont::Token&) [0x7fffe9e0fff3]
[0x7fffd2cdb10c]
[0x7fffd2cd8461]
[0x7fffd2cc116d]
[0x7fffd2cc02fb]
fides::io::DataSetReader::DataSetReaderImpl::ReadMetaData(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) [0x7fffc1b978db]
fides::io::DataSetReader::ReadMetaData(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) [0x7fffc1b95c4c]
vtkFidesReader::RequestInformation(vtkInformation*, vtkInformationVector**, vtkInformationVector*) [0x7fffc66b9e9f]
vtkExecutive::CallAlgorithm(vtkInformation*, int, vtkInformationVector**, vtkInformationVector*) [0x7fffc7d6eaa3]
vtkStreamingDemandDrivenPipeline::ExecuteInformation(vtkInformation*, vtkInformationVector**, vtkInformationVector*) [0x7fffc7db0ae0]
There's more, but it just chases down the VTK pipeline execution to the RMI.
- Author Developer
The problem might be here, where fides::io::DataSetReader::DataSetReaderImpl calls CopyShallowIfPossible on a vtkm::cont::UnknownArrayHandle for a time array. I suspect VTK-m has to make a copy, tries to do that copy on the device, and therefore has to transfer the data to the device first.
- Kenneth Moreland mentioned in merge request vtk-m!3286 (merged)
- Kitware Robot closed with merge request vtk-m!3286 (merged)
- Kenneth Moreland mentioned in commit kmorel/vtk@2dbfbef4
- Kenneth Moreland mentioned in merge request vtk!11821 (merged)
- Kenneth Moreland mentioned in commit kmorel/vtk@57ac6758