MantisBT - CMake
View Issue Details
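ID: 0015377
Project: CMake
Category: CMake
View Status: public
Date Submitted: 2015-01-27 18:34
Last Update: 2015-07-08 08:57
Reporter: Ongun Kanat
Assigned To: Brad King
Priority: high
Severity: major
Reproducibility: always
Status: closed
Resolution: fixed
Platform: Linux x86_64
OS: Arch Linux
OS Version: Rolling
Product Version: CMake 3.1.1
Target Version: CMake 3.1.3
Fixed in Version: CMake 3.1.3
Summary: 0015377: CMake cannot test compiler features in Turkish locale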
When using the Turkish UTF-8 locale (tr_TR.UTF-8), CMake exits with the error below:

   CMake Error at /usr/share/cmake-3.1/Modules/CMakeTestCCompiler.cmake:78 (CMAKE_DETERMINE_COMPILE_FEATURES):
  Unknown CMake command "CMAKE_DETERMINE_COMPILE_FEATURES".

Exporting the LANG and LC_ALL variables as en_US.UTF-8 works around the problem temporarily.
- Download any source tree with CMake build support.
- Run:
  $ export LANG=tr_TR.UTF-8
  $ export LC_ALL=tr_TR.UTF-8
  $ cmake
- CMake exits with the error above.
I suspect there may be a Turkish 'I' problem in the source code. If it does an uppercase/lowercase conversion, there is a risk that the result of the conversion is wrong, i.e. not what it would be in English.

For detailed info check:
http://www.moserware.com/2008/02/does-your-code-pass-turkey-test.html
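
A minimal way to check this suspicion, assuming a system where the tr_TR.UTF-8 locale may or may not be installed. The helper name lower_in_locale is hypothetical and not part of CMake; it only probes what the C library's tolower() returns while a given LC_CTYPE locale is active:

```cpp
#include <cassert>
#include <cctype>
#include <clocale>
#include <string>

// Hypothetical helper, not part of CMake.
// Lowercases `c` with the C library's tolower() while LC_CTYPE is temporarily
// switched to `locale_name`. Returns the result as an int, or -1 if the
// requested locale cannot be selected (for example, it is not installed).
int lower_in_locale(const char* locale_name, char c) {
  assert(locale_name != nullptr);
  // Copy the current setting; the returned pointer may be invalidated by
  // the next setlocale() call.
  const char* current = std::setlocale(LC_CTYPE, nullptr);
  const std::string saved = current ? current : "C";
  if (std::setlocale(LC_CTYPE, locale_name) == nullptr) {
    return -1;
  }
  const int lowered = std::tolower(static_cast<unsigned char>(c));
  // Restore the previous LC_CTYPE setting before returning.
  std::setlocale(LC_CTYPE, saved.c_str());
  return lowered;
}
```

Comparing lower_in_locale("tr_TR.UTF-8", 'I') with 'i' on an affected machine would show whether the conversion differs from the ASCII one there. The result depends on the C library and the installed locale data, so no output is claimed here.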

I'm also attaching the trace output of cmake.
Tags: linux, locale, make
Attached Files:
txt cmakeout.txt (336,038) 2015-01-27 18:34
https://public.kitware.com/Bug/file/5366/cmakeout.txt
patch 0001-Encoding-Only-call-setlocale-where-required.patch (3,876) 2015-02-06 12:19
https://public.kitware.com/Bug/file/5377/0001-Encoding-Only-call-setlocale-where-required.patch
Issue History
2015-01-27 18:34  Ongun Kanat  New Issue
2015-01-27 18:34  Ongun Kanat  File Added: cmakeout.txt
2015-01-27 18:36  Ongun Kanat  Tag Attached: linux
2015-01-27 18:36  Ongun Kanat  Tag Attached: make
2015-01-27 18:36  Ongun Kanat  Tag Attached: locale
2015-02-02 13:54  Ben Boeckel  Note Added: 0037879
2015-02-02 14:27  Clinton Stimpson  Note Added: 0037880
2015-02-04 17:51  Ongun Kanat  Note Added: 0037918
2015-02-05 14:15  Stephen Kelly  Note Added: 0037927
2015-02-05 15:05  Ben Boeckel  Note Added: 0037928
2015-02-05 16:28  Clinton Stimpson  Note Added: 0037929
2015-02-06 11:40  Brad King  Note Added: 0037934
2015-02-06 12:00  Clinton Stimpson  Note Added: 0037935
2015-02-06 12:19  Clinton Stimpson  File Added: 0001-Encoding-Only-call-setlocale-where-required.patch
2015-02-06 12:20  Clinton Stimpson  Note Added: 0037936
2015-02-06 13:30  Brad King  Assigned To => Brad King
2015-02-06 13:30  Brad King  Status: new => assigned
2015-02-06 13:30  Brad King  Target Version => CMake 3.1.3
2015-02-06 13:40  Brad King  Note Added: 0037937
2015-02-06 14:34  Clinton Stimpson  Note Added: 0037941
2015-02-10 09:50  Brad King  Note Added: 0037951
2015-02-10 09:50  Brad King  Status: assigned => resolved
2015-02-10 09:50  Brad King  Resolution: open => fixed
2015-02-10 09:50  Brad King  Fixed in Version => CMake 3.1.3
2015-07-08 08:57  Robert Maynard  Note Added: 0039046
2015-07-08 08:57  Robert Maynard  Status: resolved => closed

Notes
(0037879)
Ben Boeckel   
2015-02-02 13:54   
Should we just force the locale to be either en_US.UTF-8 or C in main()? Or maybe just for try_* functions?
(0037880)
Clinton Stimpson   
2015-02-02 14:27   
Probably by setting the locale to C in the try_* functions.
See FindSubversion.cmake as an example.
(0037918)
Ongun Kanat   
2015-02-04 17:51   
Doesn't changing the locale to "C" affect files with UTF-8 names? There may be files with non-ASCII names.
(0037927)
Stephen Kelly   
2015-02-05 14:15   
To reproduce on Ubuntu, install the language-pack-tr package. Then:


  $ cat turkish.cmake

  macro(MACRO1_)
  endmacro()

  MACRO1_()

  macro(MACRO2_)
  endmacro()

  macro2_()

  macro(MACRO3_I)
  endmacro()

  macro3_i()

  $ LC_ALL=tr_TR.UTF-8 cmake -P turkish.cmake
  CMake Error at turkish.cmake:15 (macro3_i):
    Unknown CMake command "macro3_i".


  $ cat turkish_if.cmake

  IF(TRUE)
  ENDIF()

  $ LC_ALL=tr_TR.UTF-8 cmake -P turkish_if.cmake
  CMake Error at turkish_if.cmake:2 (IF):
    Unknown CMake command "IF".


This bisects to commit v3.1.0-rc1~406^2~1 (Encoding: Add setlocale() to applications., 2014-05-30).

I haven't followed what has been going on regarding encodings, but it seems that if the locale is going to come from the environment, we'd have to use ICU or similar to do case-insensitive comparisons for things like that. ToLower won't cut it.

Also, questions come up about whether TOUPPER should be locale aware so that

 string(TOUPPER "Straße" OUT)

results in "STRASSE" etc. Currently it outputs STRAßE, which is 'wrong'. Or should a new command be added for locale-aware uppercasing etc.?

Also, whether new commands like

 if(A LOCALE_AWARE_STREQUAL B)

are needed, whether list(SORT) should be locale aware etc. All that is stuff that ICU gives.
(0037928)
Ben Boeckel   
2015-02-05 15:05   
Well, LOCALE_STREQUAL makes no sense because that is closer to Unicode normalization rules than anything else (something I don't want to touch). As for sorting and string(TOUPPER) and string(TOLOWER), having LOCALE_ versions of those makes sense. Outside of LOCALE_* bits, we should probably just force en_US.UTF-8 while saving LC_ALL in main() for use at those places. "Just" need to put icu into CMake's build tree with support for an external one.

Also, seems your commit name is off? I see it as 730e386291cb7aad8f532125216b2ec71d710748 while v3.1.0-rc1~406^2~1 is b70295760c22414ca80f51704ee1ab63872e0a7a.
(0037929)
Clinton Stimpson   
2015-02-05 16:28   
Thanks, Stephen, for narrowing that down, and Ben, for your comments.

Since this is a regression, I see a few possible ways to get the old behavior back:

1. Change SystemTools.cxx to use:
   std::toupper(..., std::locale("C"));
   std::tolower(..., std::locale("C"));

2. Don't assume cmCommand::GetName() returns a lowercase string; always lowercase it before comparing it with another lowercased string.

3. Use SystemTools::Strucmp() for all case-independent comparisons.

4. Remove the setlocale() call in the commit identified by Stephen (this will cause other regressions).
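
A minimal sketch of what option 1 could look like, assuming the two-argument std::tolower from <locale>. The helper names lower_in_c_locale and same_command_name are hypothetical and are not CMake's SystemTools API; std::locale::classic() is used here as the standard accessor for the "C" locale instead of constructing std::locale("C") on every call:

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper, not CMake's SystemTools API.
// Lowercases using the classic "C" locale, so the process-wide locale
// selected through setlocale() or the environment has no influence.
std::string lower_in_c_locale(std::string s) {
  const std::locale& c_locale = std::locale::classic();
  for (char& ch : s) {
    ch = std::tolower(ch, c_locale);
  }
  return s;
}

// Case-insensitive comparison of two command names, lowercasing both sides.
bool same_command_name(const std::string& a, const std::string& b) {
  return lower_in_c_locale(a) == lower_in_c_locale(b);
}

// Usage sketch with the names from the reproduction scripts in this report.
void check_examples() {
  assert(same_command_name("IF", "if"));
  assert(same_command_name("MACRO3_I", "macro3_i"));
  assert(!same_command_name("macro2_", "macro3_i"));
}
```

Lowercasing both operands in same_command_name also covers the concern behind option 2, since neither side is assumed to be lowercase already.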


And yes, it's a good question whether string(TOUPPER ...) should be locale-aware.
But I think introducing new string(LOCALE_*) options is separate from fixing the regression.


Any preference for the regression fix, or other ideas?
(0037934)
Brad King   
2015-02-06 11:40   
Re 0015377:0037929: Will removing setlocale cause 3.1 to regress from 3.0 capabilities?

Currently the setlocale() call uses only LC_CTYPE. Why is that necessary/sufficient to address 0014934?

Why do we need a locale to handle UTF-8 strings and file names if our implementation is 8-bit clean? I can't imagine every tool in the world needs to link/distribute libicu and a huge amount of locale data to deal with non-ASCII file names.

We don't provide any functionality for conversion or normalization of strings beyond TOUPPER, TOLOWER, and case-insensitive command names. All of these are defined by CMake only for ASCII characters right now.
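
To illustrate what "defined only for ASCII" can mean for an 8-bit clean implementation, here is a minimal sketch. The function names ascii_upper and ascii_lower are hypothetical and this is not CMake's actual code; it assumes an ASCII-compatible execution character set and UTF-8 encoded input:

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers, not CMake's implementation.
// Only the ASCII letters are converted; every other byte, including the
// bytes of multi-byte UTF-8 sequences, is copied through unchanged.
std::string ascii_upper(std::string s) {
  for (char& ch : s) {
    if (ch >= 'a' && ch <= 'z') {
      ch = static_cast<char>(ch - 'a' + 'A');
    }
  }
  return s;
}

std::string ascii_lower(std::string s) {
  for (char& ch : s) {
    if (ch >= 'A' && ch <= 'Z') {
      ch = static_cast<char>(ch - 'A' + 'a');
    }
  }
  return s;
}

// Usage sketch. "\xC3\x9F" is the UTF-8 encoding of the letter sharp s, so
// the first input is "Strasse" spelled with that letter; its two bytes pass
// through untouched, which matches the uppercase result quoted earlier.
void check_examples() {
  assert(ascii_upper("Stra\xC3\x9F" "e") == "STRA\xC3\x9F" "E");
  assert(ascii_lower("MACRO3_I") == "macro3_i");
}
```

Because no locale API is consulted at all, the result cannot change with LANG or LC_ALL.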
(0037935)
Clinton Stimpson   
2015-02-06 12:00   
I also don't think we need ICU.

libarchive uses nl_langinfo(CODESET) for iconv, which requires setlocale(LC_CTYPE) to work with non-ascii filenames.

Perhaps we can just move the setlocale() to go around libarchive calls. This is probably a better way to go.
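
A minimal sketch of scoping setlocale() around a group of calls, assuming a RAII guard. The class name ScopedCtypeLocale is hypothetical; this is neither the attached patch nor part of libarchive, and the archive calls themselves are only indicated by a comment:

```cpp
#include <cassert>
#include <clocale>
#include <string>

// Hypothetical guard, not taken from the CMake sources.
// On construction it switches LC_CTYPE to the locale named by the
// environment; on destruction it restores the previous setting.
// setlocale() affects the whole process, so this is not thread-safe.
class ScopedCtypeLocale {
public:
  ScopedCtypeLocale() {
    // Copy the current value; the returned pointer may be invalidated by
    // the next setlocale() call.
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    saved_ = current ? current : "C";
    std::setlocale(LC_CTYPE, "");  // "" selects the environment's locale
  }
  ~ScopedCtypeLocale() { std::setlocale(LC_CTYPE, saved_.c_str()); }
  ScopedCtypeLocale(const ScopedCtypeLocale&) = delete;
  ScopedCtypeLocale& operator=(const ScopedCtypeLocale&) = delete;

private:
  std::string saved_;
};

// Usage sketch: the locale seen by the rest of the program is unchanged
// once the guarded scope ends.
void check_locale_restored() {
  const char* raw_before = std::setlocale(LC_CTYPE, nullptr);
  const std::string before = raw_before ? raw_before : "";
  {
    ScopedCtypeLocale guard;
    // ... calls that need nl_langinfo(CODESET) to reflect the user's
    // locale would go here ...
  }
  const char* raw_after = std::setlocale(LC_CTYPE, nullptr);
  const std::string after = raw_after ? raw_after : "";
  assert(before == after);
}
```

With a guard like this, command-name lookups elsewhere in the program keep running under the default "C" locale.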
(0037936)
Clinton Stimpson   
2015-02-06 12:20   
I've attached a patch to remove setlocale() which fixes this Turkish issue, plus it adds setlocale() calls for libarchive to keep the fix for bug 0014934.
(0037937)
Brad King   
2015-02-06 13:40   
Re 0015377:0037936: Thanks. Based on that I constructed these commits on top of 3.1.2:

 Do not call setlocale() globally in CMake applications
 http://cmake.org/gitweb?p=cmake.git;a=commitdiff;h=87be2e14

 Add setlocale() calls around use of libarchive APIs
 http://cmake.org/gitweb?p=cmake.git;a=commitdiff;h=cd408d93

Please test.
(0037941)
Clinton Stimpson   
2015-02-06 14:34   
I tested stage/no-global-setlocale on examples provided in this bug report and also in bug 0014934, and it works fine for me.
(0037951)
Brad King   
2015-02-10 09:50   
The changes linked in 0015377:0037937 have been merged to the 'release' branch for 3.2.0 and also to 'release-3.1' for 3.1.3.
(0039046)
Robert Maynard   
2015-07-08 08:57   
Closing resolved issues that have not been updated in more than 4 months.