Change StorageBasic to use an aligned allocator.
The storage is now aligned to 64 bytes (a typical cache-line size), which should give slightly better cache usage and load/store performance.
The allocator should (I think) also be STL compatible, so users who want an aligned vector can use the StorageBasic allocator. This also helps when users provide their own memory to the storage: we don't do anything special if that memory is unaligned (things will just be slower), so the allocator gives them an easy way to hand us aligned memory.
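For reviewers who want a picture of what "STL compatible" requires, here is a minimal sketch of a 64-byte aligned allocator. It is not the code in this change: the name `AlignedAllocator`, the helper `is_aligned`, and the use of C++17 aligned `operator new` are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <new>
#include <vector>

// Hypothetical allocator, for illustration only; the allocator in
// StorageBasic may be named and implemented differently.
template <typename T, std::size_t Alignment = 64>
struct AlignedAllocator {
    static_assert((Alignment & (Alignment - 1)) == 0,
                  "Alignment must be a power of two");

    using value_type = T;

    // An explicit rebind is needed because of the non-type template
    // parameter; std::allocator_traits cannot deduce it on its own.
    template <typename U>
    struct rebind {
        using other = AlignedAllocator<U, Alignment>;
    };

    AlignedAllocator() noexcept = default;

    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        // Guard against overflow in n * sizeof(T).
        if (n > std::numeric_limits<std::size_t>::max() / sizeof(T)) {
            throw std::bad_array_new_length();
        }
        // C++17 aligned operator new; throws std::bad_alloc on failure.
        void* p = ::operator new(n * sizeof(T), std::align_val_t(Alignment));
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept {
        // Must be released with the matching aligned operator delete.
        ::operator delete(p, std::align_val_t(Alignment));
    }
};

// The allocator is stateless, so any two instances compare equal.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&,
                const AlignedAllocator<U, A>&) noexcept {
    return true;
}

template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&,
                const AlignedAllocator<U, A>&) noexcept {
    return false;
}

// Hypothetical helper: true if p is a multiple of `alignment`.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

With something like this, `std::vector<float, AlignedAllocator<float>> v(1000);` yields a buffer whose `v.data()` is 64-byte aligned and can be passed to the storage as user memory.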
I also switched the benchmarks that previously passed unaligned user memory from a std::vector to allocate in an aligned StorageBasic instead, since the vectors weren't really needed anyway.
I'm a bit unsure about the portability of the aligned allocation functions used, but it sounds like they should be available in most places. Let me know if there's a better alternative or another platform we need to support.
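To make the portability question concrete, here is a rough sketch of one common way to wrap platform-specific aligned allocation. It is not necessarily what this change does: the helper names `aligned_alloc_portable` and `aligned_free_portable` are hypothetical, and the choice of `_aligned_malloc` on Windows and `posix_memalign` elsewhere is an assumption about which platforms matter.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free
#if defined(_WIN32)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#endif

// Hypothetical helper: returns `size` bytes aligned to `alignment`,
// or nullptr on failure. `alignment` must be a power of two and, for
// posix_memalign, a multiple of sizeof(void*).
inline void* aligned_alloc_portable(std::size_t alignment, std::size_t size) {
#if defined(_WIN32)
    return _aligned_malloc(size, alignment);
#else
    void* p = nullptr;
    // posix_memalign returns 0 on success and an error code otherwise.
    if (posix_memalign(&p, alignment, size) != 0) {
        return nullptr;
    }
    return p;
#endif
}

// Hypothetical helper: frees memory from aligned_alloc_portable.
inline void aligned_free_portable(void* p) noexcept {
#if defined(_WIN32)
    // Memory from _aligned_malloc must not be passed to free().
    _aligned_free(p);
#else
    free(p);
#endif
}
```

If C++17 can be assumed, the aligned overloads of `operator new` and `operator delete` (taking `std::align_val_t`) would avoid the platform switch entirely.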