UTF-8 filename support for Windows via vtksys
In !6122 (merged) the default KWSys encoding for VTK on Windows was changed to UTF-8 in Utilities/KWSys/CMakeLists.txt:
set(KWSYS_ENCODING_DEFAULT_CODEPAGE CP_UTF8)
This causes vtksys::Encoding::ToWide() and vtksys::Encoding::ToWindowsExtendedPath() to decode their arguments as UTF-8 instead of using the Windows system locale. The benefit is that it allows VTK users to access filenames that contain Unicode characters that aren't within their locale's character set. The goal is, for example, to allow a VTK user in the USA to easily access a filename that contains Chinese characters.
Of course, in order for this to work, VTK must access the filesystem via vtksys, or if this isn't possible, then VTK must at least use vtksys::Encoding::ToWindowsExtendedPath() to convert UTF-8 strings to the wide strings that Windows uses to natively support Unicode on its filesystems.
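To illustrate the kind of conversion described above, here is a minimal sketch that uses only the Win32 API and the C runtime. The helper names Widen() and OpenUtf8() are hypothetical and are not part of VTK or vtksys. Inside VTK, the vtksys functions named above should be used instead, since this sketch makes no attempt to produce an extended-length path.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: decode a UTF-8 string into a UTF-16 wide string.
// Returns an empty string if the input is empty or cannot be converted.
static std::wstring Widen(const std::string& utf8)
{
  if (utf8.empty())
  {
    return std::wstring();
  }
  const int len = static_cast<int>(utf8.size());
  // First call computes the required number of wide characters.
  const int n = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, nullptr, 0);
  if (n <= 0)
  {
    return std::wstring();
  }
  std::wstring wide(static_cast<std::size_t>(n), L'\0');
  // Second call performs the conversion into the allocated buffer.
  MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, &wide[0], n);
  return wide;
}
#endif

// Hypothetical helper: open a file whose name is given as UTF-8.
// On Windows the name and mode are widened and passed to _wfopen();
// elsewhere the bytes are handed to fopen() unchanged.
FILE* OpenUtf8(const std::string& utf8Path, const char* mode)
{
  assert(mode != nullptr);
#ifdef _WIN32
  const std::wstring wpath = Widen(utf8Path);
  const std::wstring wmode = Widen(mode);
  return _wfopen(wpath.c_str(), wmode.c_str());
#else
  return std::fopen(utf8Path.c_str(), mode);
#endif
}
```

With a helper like this, a call such as OpenUtf8(name, "rb") behaves the same regardless of the user's system locale, which is the property the vtksys-based changes are meant to give VTK.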
Stuff that has been done: !6122 (merged) !6291 (merged) !6301 (merged) !6422 (merged) !6426 (merged)
- change fopen() to vtksys::SystemTools::Fopen()
- change ifstream/ofstream to vtksys::ifstream/ofstream
- use vtksys::Encoding::ToWindowsExtendedPath() where the above is impossible/inconvenient
Stuff that hasn't been done:
The changes above covered VTK classes that access files via fopen() and C++ streams. However, they didn't touch VTK classes that use third-party libraries such as hdf5, netcdf, or libz to access files.
VTK's third-party IO libraries fit into three categories:
1. libraries that require the files to already be open: jpeg, png, ...
2. libraries that provide 'wide string' APIs for use on Windows: hdf5, zlib, tiff, ...
3. libraries that only allow narrow strings encoded in the current locale: netcdf, ...
For (1), nothing need be done, as these are covered by the previous batch of changes.
For (2), it will be necessary to change the VTK classes that use these libraries so that they apply vtksys::Encoding::ToWindowsExtendedPath() and call the 'wide' variant of the library APIs. Examples are the vtkNIFTIImageReader/Writer (for zlib), and all VTK classes that use hdf5.
For (3), there is no good solution at this time. If a library accepts neither UTF-8 nor wide strings, the options are:
- accept only narrow strings in the locale encoding, instead of accepting UTF-8, or
- accept UTF-8 and attempt to convert it to a narrow string in the locale encoding (generate an error on failure)
The first option really isn't acceptable, because VTK users shouldn't be expected to consult the documentation for individual VTK classes to see whether they expect UTF-8 vs. the locale encoding. And this is assuming that the documentation actually provides this information and is kept up-to-date.
The second option is better, because it at least allows users to work with UTF-8 filenames that only use characters from their local language. Also, if illegal characters are encountered, they can be informed via an error message that this is the reason they are unable to open the file.
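A rough sketch of the second option follows, assuming the conversion is done with the Win32 API. The function name Utf8ToLocaleFilename() is hypothetical and is not an existing VTK or vtksys function. On non-Windows platforms the sketch assumes that narrow file APIs take the bytes as-is, so it returns the input unchanged.

```cpp
#include <cassert>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: convert a UTF-8 filename to the narrow encoding of
// the active Windows code page. Returns false if the input is malformed or
// contains characters that the code page cannot represent.
bool Utf8ToLocaleFilename(const std::string& utf8, std::string& local)
{
#ifdef _WIN32
  if (utf8.empty() || GetACP() == CP_UTF8)
  {
    // Nothing to convert, or the active code page is already UTF-8.
    local = utf8;
    return true;
  }
  const int len = static_cast<int>(utf8.size());
  // Step 1: UTF-8 to UTF-16, rejecting malformed input.
  const int wlen = MultiByteToWideChar(
    CP_UTF8, MB_ERR_INVALID_CHARS, utf8.data(), len, nullptr, 0);
  if (wlen <= 0)
  {
    return false;
  }
  std::wstring wide(static_cast<std::size_t>(wlen), L'\0');
  MultiByteToWideChar(
    CP_UTF8, MB_ERR_INVALID_CHARS, utf8.data(), len, &wide[0], wlen);
  // Step 2: UTF-16 to the active code page. WC_NO_BEST_FIT_CHARS disables
  // silent "best fit" substitutions; usedDefault reports any character
  // that had to be replaced by the default character.
  BOOL usedDefault = FALSE;
  const int nlen = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS,
    wide.data(), wlen, nullptr, 0, nullptr, &usedDefault);
  if (nlen <= 0 || usedDefault)
  {
    return false;
  }
  local.assign(static_cast<std::size_t>(nlen), '\0');
  const int written = WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS,
    wide.data(), wlen, &local[0], nlen, nullptr, &usedDefault);
  assert(written == nlen);
  return written == nlen && !usedDefault;
#else
  // Assumption: narrow file APIs on this platform accept the bytes as-is.
  local = utf8;
  return true;
#endif
}
```

A reader class wrapping a narrow-only library could call such a helper before handing the name to the library, and report an error when it returns false, so the user learns that unrepresentable characters are the reason the file cannot be opened.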
Addendum: vtkDirectory
The vtkDirectory class currently uses the Windows C library functions _findfirst() and _findnext() with narrow strings.