GetGlobalRuntimeDeviceTracker should use thread_local
Now that OSX supports thread_local, we should simplify the implementation of GetGlobalRuntimeDeviceTracker to use thread_local instead of keeping a map with an entry for every thread.
Not only will this make the code easier to read, it will also reduce the cost of retrieving a thread's runtime device tracker, and it will lower the runtime overhead when users construct many short-lived threads that use VTK-m.