Enable shared CUDA builds when not compiling virtuals
Shared libraries were not supported when CUDA compilation was enabled because virtual methods require a special linking step to pull together all virtual methods that might be called. In other words, you cannot call a virtual CUDA method defined inside a library. This requirement goes away when virtuals are removed.
Also removed the need for separable compilation with CUDA. Again, this is only needed when a CUDA function is defined in one translation unit and used in another. Now we can enforce that every translation unit defines the CUDA functions it uses.
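
As a rough illustration of the pattern this relies on, here is a minimal sketch. It assumes functions are defined inline in headers and dispatched through non-virtual functors. The names EXAMPLE_EXEC_CONT, Scale, ScaleFunctor, and Apply are hypothetical and do not come from this repository.

```cpp
// Hypothetical sketch; none of these names exist in the code base.
// Under nvcc the macro marks functions as callable from host and device;
// under a plain C++ compiler it expands to nothing.
#ifdef __CUDACC__
#define EXAMPLE_EXEC_CONT __host__ __device__
#else
#define EXAMPLE_EXEC_CONT
#endif

// Defined inline, as it would be in a header, so every translation unit
// that calls it compiles its own copy. No device code has to be linked
// across translation units, so separable compilation is not needed.
EXAMPLE_EXEC_CONT inline float Scale(float x, float factor)
{
  return x * factor;
}

// Non-virtual functor: the call to operator() is resolved at compile time,
// so no virtual table has to be assembled at link time.
struct ScaleFunctor
{
  float Factor;

  EXAMPLE_EXEC_CONT float operator()(float x) const
  {
    return Scale(x, this->Factor);
  }
};

// The functor type is a template parameter rather than a base-class pointer.
template <typename Functor>
EXAMPLE_EXEC_CONT float Apply(const Functor& functor, float x)
{
  return functor(x);
}
```

With this layout, a call such as Apply(ScaleFunctor{2.0f}, 3.0f) is fully defined within whichever translation unit makes it.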