Copying CPU memory to Pascal managed memory now works consistently.
When copying small arrays from CPU memory to Pascal managed memory, we would see subsequent kernels fail because the memory transfer had not finished. This is a bug, as each stream should behave like a FIFO queue: work enqueued later on a stream should not start until earlier work on that stream has completed. So for now, when we encounter this use case, we explicitly synchronize after the memcpy.
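A minimal sketch of the workaround pattern, assuming the copy is issued with `cudaMemcpyAsync` on an explicit stream into a buffer allocated with `cudaMallocManaged`. The patch itself is not shown here, so the helper names (`copy_host_to_managed`, `scale_kernel`, `run_small_copy_example`) are hypothetical and only illustrate where the explicit synchronization goes.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel, used only to read the copied data after the transfer.
__global__ void scale_kernel(float* data, int n, float factor) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    data[i] *= factor;
  }
}

// Hypothetical helper: copy `n` floats from host memory into a managed
// buffer on `stream`, then block until the copy has completed.
// The cudaStreamSynchronize call is the workaround; with correct FIFO
// stream ordering, later work on `stream` would not need it.
cudaError_t copy_host_to_managed(float* managed_dst, const float* host_src,
                                 int n, cudaStream_t stream) {
  cudaError_t err = cudaMemcpyAsync(managed_dst, host_src, n * sizeof(float),
                                    cudaMemcpyHostToDevice, stream);
  if (err != cudaSuccess) {
    return err;
  }
  // Workaround: wait for the transfer before any later kernel is launched.
  return cudaStreamSynchronize(stream);
}

// Hypothetical driver: copy a small host array into managed memory, run a
// kernel on the same stream, and check the result on the host.
// Returns true if every element was scaled as expected.
bool run_small_copy_example() {
  const int n = 16;  // a small array, the case that triggered the failures
  float host[n];
  for (int i = 0; i < n; ++i) {
    host[i] = static_cast<float>(i);
  }

  float* managed = nullptr;
  if (cudaMallocManaged(&managed, n * sizeof(float)) != cudaSuccess) {
    return false;
  }

  cudaStream_t stream;
  if (cudaStreamCreate(&stream) != cudaSuccess) {
    cudaFree(managed);
    return false;
  }

  bool ok = copy_host_to_managed(managed, host, n, stream) == cudaSuccess;
  if (ok) {
    scale_kernel<<<1, 32, 0, stream>>>(managed, n, 2.0f);
    // The host reads managed memory below, so wait for the kernel first.
    ok = cudaStreamSynchronize(stream) == cudaSuccess;
  }
  for (int i = 0; ok && i < n; ++i) {
    ok = managed[i] == 2.0f * static_cast<float>(i);
  }

  cudaStreamDestroy(stream);
  cudaFree(managed);
  return ok;
}
```

The synchronization is placed inside the copy helper so every caller gets it, at the cost of serializing the host with the stream after each such copy. Once stream ordering behaves as a FIFO queue for this case, the `cudaStreamSynchronize` inside `copy_host_to_managed` could be removed without changing callers.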