Exodus reader is crazy slow reading time data
Exodus reader is crazy slow reading many-timestep, many-file, many-variable datasets. The reason is that we are reading redundant data. Within the Exodus reader, we are basically doing the following:
for every file, for every timestep, read the floating-point time for that timestep as a double.
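Here is a minimal sketch of that access pattern, expressed with the standard Exodus II C API. This is illustrative only, not the actual reader code:

```c
/* Illustrative sketch only, not the actual reader code: the pattern
 * described above. Every file is opened and every timestep's time
 * value is read with its own small call. */
#include <exodusII.h>

void read_times_the_slow_way(const char *paths[], int num_files)
{
  for (int f = 0; f < num_files; f++) {
    int comp_ws = 8, io_ws = 0; /* request double precision */
    float version;
    int exoid = ex_open(paths[f], EX_READ, &comp_ws, &io_ws, &version);
    if (exoid < 0)
      continue;

    int num_steps = (int)ex_inquire_int(exoid, EX_INQ_TIME);
    for (int ts = 1; ts <= num_steps; ts++) { /* timesteps are 1-based */
      double t;
      ex_get_time(exoid, ts, &t); /* one tiny, widely scattered read */
    }
    ex_close(exoid);
  }
}
```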
Now, if we have a small number of cells (for instance, a million) and a lot of variables, the time values are spread far apart within each file. This forces the disk to read a multi-million-byte disk block for each individual time value. This is crazy expensive.
ParaView reads an example dataset in 11 minutes 10 seconds. EnSight reads the same dataset (reading in the last time step) in 14 seconds.
From one of the architects of the Exodus II spec, Greg Sja...:
If there is a spatially decomposed (file-per-processor) set of files, then all files should have the same number of time steps and the same times for those time steps. If the set of files is written by a Sierra application or one that uses the IOSS library, then in addition, there will be an attribute in the file called “last_written_time” which should be the time of the timestep which was written and flushed to disk. If, for example, the code crashed while writing a timestep, then the last_written_time would be less than the maximum time on the database.
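Assuming that attribute is stored as a global attribute in the underlying NetCDF file (an assumption on my part; the quote does not say exactly how it is stored), it could be checked with the NetCDF C API along these lines:

```c
/* Hedged sketch: read the IOSS "last_written_time" attribute, assuming
 * it is a global double attribute in the NetCDF file underlying the
 * Exodus database. Returns 0 on success, -1 otherwise. */
#include <netcdf.h>

int get_last_written_time(const char *path, double *last_time)
{
  int ncid;
  if (nc_open(path, NC_NOWRITE, &ncid) != NC_NOERR)
    return -1;

  /* NC_GLOBAL selects file-level attributes. */
  int status = nc_get_att_double(ncid, NC_GLOBAL, "last_written_time", last_time);
  nc_close(ncid);
  return (status == NC_NOERR) ? 0 : -1;
}
```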
So, what I would propose is that we read the time data from one file and pass it to the data structures for the other files. Reads are amazingly expensive compared to just passing data between objects/processes.
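A minimal sketch of that proposal, assuming (per the quote above) that all files in the decomposed set share identical timestep times; the function name here is hypothetical:

```c
/* Hypothetical sketch of the proposed fix: read the time array once,
 * from one file, with a single bulk call, then share that array with
 * the data structures for every other file instead of re-reading it. */
#include <stdlib.h>
#include <exodusII.h>

double *read_shared_times(const char *first_file, int *num_steps_out)
{
  int comp_ws = 8, io_ws = 0; /* request double precision */
  float version;
  int exoid = ex_open(first_file, EX_READ, &comp_ws, &io_ws, &version);
  if (exoid < 0)
    return NULL;

  int num_steps = (int)ex_inquire_int(exoid, EX_INQ_TIME);
  double *times = malloc(num_steps * sizeof(double));

  /* One contiguous read replaces one read per (file, timestep) pair. */
  ex_get_all_times(exoid, times);
  ex_close(exoid);

  *num_steps_out = num_steps;
  return times; /* pass this same array to all per-file structures */
}
```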
I am giving the latest Exodus II spec to Utkarsh. It may be passed into the public domain.