Clarifications and modifications to Global Ids

From an e-mail thread with Utkarsh:

We surely have some confusion between the name of an array and its "attribute" type. To clarify, recall that VTK has a notion of an array that can be flagged as "GLOBAL_IDS". This means something very specific to all filters: two elements with the same "Global ID" are indeed the same element, and filters (such as D3) may use this fact to speed things up. When a filter splits an element or creates new elements, the GLOBAL_IDS in the input cannot be preserved (without regenerating them), and hence they are often dropped. This is the oft-seen issue where global ids suddenly disappear. At the same time, an array may be named "GlobalIds" or "GlobalElementId" or whatever. That name really doesn't mean anything to the VTK pipeline. Oftentimes the array named GlobalIds is indeed the GLOBAL_IDS, but not always. For example, if you open disk_out_ref.ex2 and tetrahedralize it, the array named "GlobalElementId" still exists in the output, but it is no longer the GLOBAL_IDS as far as VTK/ParaView is concerned.
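The distinction between an array's name and its attribute flag can be modeled in a few lines. The sketch below is plain Python, not the VTK API: `Attributes`, `add_array`, and `tetrahedralize_like` are hypothetical names chosen for illustration, and the toy filter only mimics the current behavior described above (the array survives under its name, the flag is lost).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Attributes:
    """Toy stand-in for a dataset's cell data (hypothetical, not a VTK class)."""
    arrays: Dict[str, List[int]] = field(default_factory=dict)
    # Name of the array currently flagged GLOBAL_IDS, if any.
    global_ids_name: Optional[str] = None

    def add_array(self, name: str, values: List[int], as_global_ids: bool = False) -> None:
        self.arrays[name] = list(values)
        if as_global_ids:
            self.global_ids_name = name


def tetrahedralize_like(cell_data: Attributes, pieces_per_cell: int = 2) -> Attributes:
    """Mimic a filter that splits every cell into several pieces.

    Values are copied to each piece, so they are no longer unique and the
    output carries no GLOBAL_IDS flag, even though the array name is kept.
    """
    out = Attributes()
    for name, values in cell_data.arrays.items():
        out.add_array(name, [v for v in values for _ in range(pieces_per_cell)])
    return out  # global_ids_name stays None


source = Attributes()
source.add_array("GlobalElementId", [101, 102, 103], as_global_ids=True)
result = tetrahedralize_like(source)

print(source.global_ids_name)              # GlobalElementId
print("GlobalElementId" in result.arrays)  # True
print(result.global_ids_name)              # None
```

In VTK itself, the flagged array is retrieved with `vtkDataSetAttributes::GetGlobalIds()`, independent of whatever the array happens to be named.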

Following a discussion with Berk, here's a thought to make this more streamlined:

readers like Exodus which have unique ids for elements should indeed add those arrays to the output and flag them as GLOBAL_IDS, but name the arrays following the format's conventions, e.g. ExodusNodeId, ExodusElementId, exo_id or whatever makes sense.
panels like spreadsheet view and information tab should explicitly indicate which arrays are GLOBAL_IDS. That helps users identify them irrespective of how they are named. Maybe different color for the column in the Spreadsheet view, a different icon and tooltip in Information tab, or something like that.
filters that cannot preserve GLOBAL_IDS, such as contour, should not drop them entirely but flag them as PEDIGREE_IDS instead. Thus, after such a filter, the ExodusElementId array will not disappear (as it does now); it will continue to exist, flagged as PEDIGREE_IDS rather than GLOBAL_IDS. While users don't really need to know what PEDIGREE_IDS means, the flag indicates that these are the original GLOBAL_IDS of the elements that the output elements came from. For all intents and purposes, the PEDIGREE_IDS notion can stay internal to VTK/ParaView. Currently, the PedigreeNodeId and PedigreeElementId arrays are added by the Exodus reader. It should stop doing that.
the Point ID, Cell ID columns in the spreadsheet view are simply the point's (or cell's) index in the local dataset. Maybe we rename them to "Point Index" or "Cell Index" to avoid confusion with other IDs.
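The proposed re-flagging rule for filters can be sketched as follows. This is a plain-Python model under stated assumptions, not VTK code: `CellData` and `split_cells_proposed` are hypothetical names, and the choice to keep an existing PEDIGREE_IDS flag when the input has no GLOBAL_IDS is an assumption the proposal does not spell out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CellData:
    """Toy cell data with at most one GLOBAL_IDS and one PEDIGREE_IDS array."""
    arrays: Dict[str, List[int]] = field(default_factory=dict)
    global_ids: Optional[str] = None    # name of the array flagged GLOBAL_IDS
    pedigree_ids: Optional[str] = None  # name of the array flagged PEDIGREE_IDS


def split_cells_proposed(inp: CellData, pieces: int = 2) -> CellData:
    """Hypothetical filter that creates new cells and therefore cannot
    keep GLOBAL_IDS: it demotes the flag to PEDIGREE_IDS instead of dropping it."""
    out = CellData()
    for name, values in inp.arrays.items():
        # Each output piece inherits the value of the cell it came from.
        out.arrays[name] = [v for v in values for _ in range(pieces)]
    if inp.global_ids is not None:
        out.pedigree_ids = inp.global_ids    # demote: GLOBAL_IDS -> PEDIGREE_IDS
    elif inp.pedigree_ids is not None:
        out.pedigree_ids = inp.pedigree_ids  # assumption: keep an existing pedigree flag
    return out


reader_output = CellData(arrays={"ExodusElementId": [7, 8]}, global_ids="ExodusElementId")
contoured = split_cells_proposed(reader_output)
contoured_twice = split_cells_proposed(contoured)

print(contoured.global_ids)          # None
print(contoured.pedigree_ids)        # ExodusElementId
print(contoured_twice.pedigree_ids)  # ExodusElementId
```

The array keeps its name through both filter passes; only the flag changes, which is what would let a view mark it differently from a true GLOBAL_IDS array.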

Thoughts? Utkarsh

Then, from Watney:

From a user's perspective, I don't mind that some variables "internal" to paraview are exposed; in fact it is necessary to understand what paraview is doing.
It should be clear which variables are internal vs. which are read in. I'm fine with prefixing internal variables with vtk or something similar. For the processID example, I don't think it is far-fetched that a simulation code running in parallel would write the processID to the output, and it should be easy to distinguish that from the processID for the parallel paraview session.
Variable names should be consistent throughout.
I think it would be helpful to have the exodus reader display variables ExodusGlobalElementID and ExodusGlobalNodeID, etc., but I also recognize that for exodus this would probably be disruptive for backwards compatibility and that whoever writes a reader might have good reasons for not modifying that format's variable names. Maybe a checkbox to allow name munging?
Where it makes sense, the mapping between internal variables and those read in should be accessible. (Of course, by variables here I mean mesh topology fields like block, cell, and point identifiers.) I guess this would have to be provided by the reader, since e.g., the exodus reader knows that exodus format defines a global element id and a global node id, and since it is a reader for paraview it should also know that the GlobalElementID is a VTK GLOBAL_IDS. On the other hand, structured cgns has no concept of a global cell id; it has blocks, and within each block there is i, j, k; if the cgns reader provided the (block, i, j, k) to GLOBAL_IDS mapping that would be great. One reason it would be great is that, in FindData, the user could then query for cells of (block, i, j, k) to get cells or lines or slices (in i, j, k space, not x, y, z space); without the reader providing this mapping, as a user I have to know how to get the internal GLOBAL_IDS and then find the one that matches the (block, i, j, k) coordinate that I may be more familiar with. Of course, this extends beyond FindData to any place the user wants to identify a cell or point by the ID system of the mesh (mesh format) they ran their simulation on.
Thank you for the explanation of GLOBAL_IDS and PEDIGREE_IDS; I never understood the distinction. I agree that the behavior should be: initially, an array is marked GLOBAL_IDS; on modification it is re-marked as PEDIGREE_IDS; and the new (internal) array marked GLOBAL_IDS is then also exposed. I also agree with the Exodus reader behavior being modified.
Perhaps the local Point ID and Cell ID should be consistently treated throughout paraview. If I can see those indices in the spreadsheet, I should be able to display them in the selection inspector, query on them in FindData, etc. Also, please be clear if they are indexes or IDs. (This applies to all the other variables we have been discussing; I think they have all been IDs.) I recognize that exposing these variables everywhere, as well as global ids and other reader-specific variables, might be excessive.
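The (block, i, j, k) to global id mapping requested above for structured multi-block data could look roughly like this. It is a minimal sketch, not what any existing CGNS reader does: the function names are hypothetical, indices are assumed to be zero-based with i varying fastest, and blocks are assumed to be numbered consecutively in file order.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

Dims = Tuple[int, int, int]  # number of cells along i, j, k in one block


def block_offsets(block_dims: Sequence[Dims]) -> List[int]:
    """Global id of the first cell of each block (blocks are laid out back to back)."""
    sizes = [ni * nj * nk for ni, nj, nk in block_dims]
    return [0] + list(accumulate(sizes))[:-1]


def to_global_id(block_dims: Sequence[Dims], block: int, i: int, j: int, k: int) -> int:
    """Map a (block, i, j, k) cell coordinate to a single global id."""
    ni, nj, nk = block_dims[block]
    if not (0 <= i < ni and 0 <= j < nj and 0 <= k < nk):
        raise IndexError(f"({i}, {j}, {k}) is outside block {block}")
    return block_offsets(block_dims)[block] + i + ni * (j + nj * k)


def from_global_id(block_dims: Sequence[Dims], gid: int) -> Tuple[int, int, int, int]:
    """Inverse mapping: recover (block, i, j, k) from a global id."""
    start = 0
    for block, (ni, nj, nk) in enumerate(block_dims):
        size = ni * nj * nk
        if start <= gid < start + size:
            local = gid - start
            return block, local % ni, (local // ni) % nj, local // (ni * nj)
        start += size
    raise IndexError(f"global id {gid} is out of range")


dims = [(2, 3, 4), (5, 1, 2)]  # two blocks with 24 and 10 cells
gid = to_global_id(dims, block=1, i=3, j=0, k=1)
print(gid)                        # 32
print(from_global_id(dims, gid))  # (1, 3, 0, 1)
```

With such a mapping exposed by the reader, a query on (block, i, j, k) could be translated to global ids behind the scenes. A real reader would also have to settle details such as one-based indexing and point versus cell ids.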

Thanks for digging into all this.

Watney

Edited Jan 27, 2021 by Vicente Bolea
