[cmake-developers] [PATCH v4 4/4] For Windows encode process output to internally used encoding
Dāvis Mosāns
davispuh at gmail.com
Thu Jul 21 20:43:05 EDT 2016
2016-07-21 20:46 GMT+03:00 Brad King <brad.king at kitware.com>:
> On 07/21/2016 01:36 PM, Dāvis Mosāns wrote:
>> Anyway I improved this in places where it was easy, but in some places it's
>> more complicated...
>>
>> For example
>>
>>   while ((p = cmsysProcess_WaitForData(cp, &data, &length, CM_NULLPTR), p)) {
>>     // Put the output in the right place.
>>     if (p == cmsysProcess_Pipe_STDOUT && !output_quiet) {
>>       if (output_variable.empty()) {
>>         cmSystemTools::Stdout(data, length);
>>
>> Here we output buffer immediately.
>>
>>   while ((out || err) &&
>>          (p = cmsysProcess_WaitForData(cp, &data, &length, CM_NULLPTR), p)) {
>>     if (out && p == cmsysProcess_Pipe_STDOUT) {
>>       if (!out->Process(data, length)) {
>
> In such cases the data need to be piped through a buffered decoder
> that can keep partial fragments around between updates.
>
> Does MultiByteToWideChar or some other API have a way to detect
> such boundaries?
As far as I know, WinAPI has no such function.
With MultiByteToWideChar such a partial character would be replaced with ? (U+003F)
or � (U+FFFD).
We would need to use some library or implement this ourselves.
WinAPI does have CharPrevExA and IsDBCSLeadByteEx (or GetCPInfo), which we can
use to implement this easily for code pages with 1-2 byte characters, but that
doesn't work for code pages where a character can be longer than 2 bytes,
e.g. UTF-8. Those would need to be handled separately.
We could also check whether the last converted character is ? and, if so, try
again with one byte less.
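
A minimal sketch of the lead-byte approach for 1-2 byte code pages, to make the
idea concrete. It is not part of the patch: IsLeadByte and DanglingLeadBytes are
hypothetical names, and the ranges array is assumed to use the same layout as
CPINFO::LeadByte (what GetCPInfo fills in), so the sketch stays portable instead
of calling IsDBCSLeadByteEx directly. It scans forward from the start of the
chunk because a trail byte can fall inside a lead-byte range, so looking only at
the last byte is not reliable.

```cpp
#include <cassert>
#include <cstddef>

// Same layout as CPINFO::LeadByte: inclusive [first, last] pairs,
// terminated by a 0,0 pair (assumption: MAX_LEADBYTES is 12).
const std::size_t kMaxLeadBytes = 12;

// Hypothetical helper: true if c falls inside one of the lead-byte ranges.
bool IsLeadByte(const unsigned char (&ranges)[kMaxLeadBytes], unsigned char c)
{
  for (std::size_t i = 0; i + 1 < kMaxLeadBytes; i += 2) {
    if (ranges[i] == 0 && ranges[i + 1] == 0) {
      break; // terminator pair
    }
    if (c >= ranges[i] && c <= ranges[i + 1]) {
      return true;
    }
  }
  return false;
}

// Hypothetical helper: returns 1 if the chunk ends with a lead byte whose
// trail byte has not arrived yet, 0 otherwise. Assumes the chunk starts on
// a character boundary.
std::size_t DanglingLeadBytes(const unsigned char (&ranges)[kMaxLeadBytes],
                              const char* data, std::size_t length)
{
  assert(data != nullptr || length == 0);
  std::size_t i = 0;
  while (i < length) {
    if (IsLeadByte(ranges, static_cast<unsigned char>(data[i]))) {
      if (i + 1 == length) {
        return 1; // lead byte is the last byte of the chunk
      }
      i += 2; // skip lead byte and its trail byte
    } else {
      ++i; // single-byte character
    }
  }
  return 0;
}
```

The caller would hold back the reported number of bytes and prepend them to the
next chunk before converting.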
Using EnumSystemCodePages and GetCPInfoEx I collected info about all code pages
supported on my Windows 10:
https://paste.kde.org/pthwqdbxv/rjwgwd/raw
Code pages where MaxCharSize is more than 1 and UseLeadByte is No need special
handling that depends on the particular encoding.
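
For UTF-8, one of the encodings that needs its own handling, the boundary can be
found from the bit patterns of the last few bytes. Below is a rough sketch of
such a buffered splitter that keeps the partial fragment around between
updates. IncompleteUtf8Tail and Utf8ChunkBuffer are hypothetical names, not
existing CMake or KWSys code, and malformed input is simply passed through for
the converter to deal with.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns how many bytes at the end of the buffer belong to a UTF-8
// sequence that is not complete yet (0 if it ends on a boundary).
std::size_t IncompleteUtf8Tail(const char* data, std::size_t length)
{
  std::size_t i = length;
  std::size_t back = 0;
  // A UTF-8 sequence is at most 4 bytes, so look back at most 4 bytes.
  while (i > 0 && back < 4) {
    --i;
    ++back;
    const unsigned char c = static_cast<unsigned char>(data[i]);
    if ((c & 0xC0) == 0x80) {
      continue; // continuation byte, keep looking for the lead byte
    }
    std::size_t need = 0;
    if (c < 0x80) {
      need = 1;
    } else if ((c & 0xE0) == 0xC0) {
      need = 2;
    } else if ((c & 0xF0) == 0xE0) {
      need = 3;
    } else if ((c & 0xF8) == 0xF0) {
      need = 4;
    } else {
      return 0; // invalid lead byte, pass through unchanged
    }
    return back < need ? back : 0;
  }
  return 0; // empty, or only continuation bytes: nothing to hold back
}

// Keeps the incomplete tail between calls so that every returned string
// ends on a character boundary.
class Utf8ChunkBuffer
{
public:
  std::string Feed(const char* data, std::size_t length)
  {
    pending_.append(data, length);
    const std::size_t tail =
      IncompleteUtf8Tail(pending_.data(), pending_.size());
    assert(tail <= pending_.size());
    const std::size_t complete = pending_.size() - tail;
    std::string result = pending_.substr(0, complete);
    pending_.erase(0, complete); // only the partial character remains
    return result;
  }

private:
  std::string pending_;
};
```

Each string returned by Feed could then be converted in one call, since it no
longer ends in the middle of a character.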
>> +  bool DecodeText(std::string raw, std::string& decoded)
>> +  {
>> +    bool success = true;
>> +    decoded = raw;
>> +#if defined(_WIN32)
>> +    if (raw.size() > 0 && codepage != defaultCodepage) {
>> +      success = false;
>> +      const int wlength = MultiByteToWideChar(codepage, 0, raw.c_str(), int(raw.size()), NULL, 0);
>
> Why do we need new calls to MultiByteToWideChar instead of
> having clients just directly use kwsysEncoding_mbstowcs?
>
Because WaitForData gives us data and length, and I assume that the data
might not be null-terminated, whereas kwsysEncoding_mbstowcs expects a
null-terminated source and doesn't accept a length.