Change CUDA calls to use the per-thread stream.

Address #168 (closed)
