MantisBT - CMake
View Issue Details
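ID: 0013760
Project: CMake
Category: CMake
View Status: public
Date Submitted: 2012-11-29 16:13
Last Update: 2016-06-10 14:31
Reporter: Andreas Mohr
Assigned To: Kitware Robot
Priority: normal
Severity: major
Reproducibility: always
Status: closed
Resolution: moved
Platform: PC 32bit
OS: Linux
OS Version: Debian stable
Product Version: CMake 2.8.9
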
0013760: file(STRINGS): very questionable (sufficiently certainly buggy?) behaviour for square brackets
I just tried parsing a section that resembles an IDL file part (the format spec of which has sections enclosed in '[',']').
I was quite astonished by the result of file(STRINGS) on this
(after already having spent a sizeable chunk of the day on various other file(STRINGS) specifics, to add insult to injury).

Why in h*ll would file(STRINGS) take such specific care about the format of the text file?
Don't tell me that it's because of (quoting docs) "Intel Hex and Motorola S-record files", which could possibly happen to have certain '['-enclosed sections. That would be a sad result for an otherwise (in the case of non-Intel/Motorola files) supposedly(?) sufficiently generic file(STRINGS) functionality.

Needless to say, having any []-enclosed yet originally *multi-line* content
end up delivered as a *single* line within foreach() processing is very problematic when contrasted with my expectations.
If this actually is correct handling (for certain values of "correct") and there is no standard CMake mechanism explaining it that I managed to miss, then the docs should definitely be updated to mention this possibly '['-specific handling.

Any ideas or comments about this?

Severity major since it's data corrupting (e.g. going line by line with a regex that uses start-of-line/end-of-line anchors, ^ and $, *will* cause headaches or worse).

Thank you!
cmake_minimum_required(VERSION 2.8)

project(file_strings_bug_test NONE)

macro(write_file _file _content)
  file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/${_file}" "${_content}")
endmacro(write_file _file _content)

macro(read_file _file)
  file(STRINGS "${CMAKE_CURRENT_BINARY_DIR}/${_file}" _content_list)
  foreach(line_ ${_content_list})
    message("line ${_file}: ${line_}")
  endforeach(line_ ${_content_list})
endmacro(read_file _file)

set("content_ok" "Hello
World
My Worrying
Test")

set(content_ko "[${content_ok}]")
set(content_ko2 "Hi
There
[${content_ok}] ")

write_file(file_ok "${content_ok}")
write_file(file_ko "${content_ko}")
write_file(file_ko2 "${content_ko2}")

read_file(file_ok)
read_file(file_ko)
read_file(file_ko2)
$ cmake ..
line file_ok: Hello
line file_ok: World
line file_ok: My Worrying
line file_ok: Test
line file_ko: [Hello;World;My Worrying;Test]
line file_ko2: Hi
line file_ko2: There
line file_ko2: [Hello;World;My Worrying;Test]
-- Configuring done
-- Generating done
-- Build files have been written to: /home/andi/prg/cmake_tests/file_strings_bug_test/build
No tags attached.
Issue History
2012-11-29 16:13  Andreas Mohr   New Issue
2012-11-29 16:52  David Cole     Note Added: 0031770
2012-11-29 17:27  Andreas Mohr   Note Added: 0031771
2012-11-29 17:37  Andreas Mohr   Note Added: 0031772
2012-11-30 06:51  David Cole     Note Added: 0031778
2012-11-30 06:55  David Cole     Note Added: 0031779
2012-11-30 08:26  Brad King      Note Added: 0031781
2016-06-10 14:28  Kitware Robot  Note Added: 0042162
2016-06-10 14:28  Kitware Robot  Status: new => resolved
2016-06-10 14:28  Kitware Robot  Resolution: open => moved
2016-06-10 14:28  Kitware Robot  Assigned To: => Kitware Robot
2016-06-10 14:31  Kitware Robot  Status: resolved => closed

Notes
(0031770)
David Cole   
2012-11-29 16:52   
The results are as I would expect them...

It is equivalent to running the "strings" command line utility on the file.

Why do you think file(STRINGS) should return "lines" of text? It returns strings.

If you want to have newlines contained in the returned strings, try using the NEWLINE_CONSUME argument (meaning consume newlines into the returned strings...):

  file(STRINGS "${CMAKE_CURRENT_BINARY_DIR}/${_file}" _content_list NEWLINE_CONSUME)
(0031771)
Andreas Mohr   
2012-11-29 17:27   
Hi,

first, thank you for your fast response! (hmm, I sense a recurring pattern...)

Darn, right, of course UNIX "strings" has a purpose which is rather different from line splitting. Don't know how I managed to get that wrong.
I think adding a docs phrase like "similar to the strings UNIX utility" would be useful.

To achieve the very same output with "strings" on my IDL file, I had to use strings -n 1, though.


However, that being said, I'm still unconvinced that all is fine in la-la land.

I realized that having one '[' added into the previously alpha-only file (i.e., even with the closing ']' omitted) will drastically change the splitting behaviour.

I'm currently talking of this content:

hello
[Hello
World
My Worrying
Test

And in this case even "strings -n 1 file_ko" output is *different* from what CMake produces:
$ strings -n 1 file_ko
hello
[Hello
World
My Worrying
Test


line file_ko: hello
line file_ko: [Hello;World;My Worrying;Test


Note that prepending '*', '<' or '{' instead of '[' does not produce this effect (nor do '%', '!', ':', '\"', '-'); it's *only* '[' which does that.

A parser "feature" (parser state machine paying close attention to its somehow "special" '[' char and then starting to do weird things) seems more and more likely.
(0031772)
Andreas Mohr   
2012-11-29 17:37   
Running file(STRINGS) over a large binary, e.g. /bin/touch, shows that the foreach() output keeps alternating between single-element and concatenated-element dumping (switching right after it encounters one of the '['/']' chars each time...), whereas "strings -n 1" does nothing of the sort.
(0031778)
David Cole   
2012-11-30 06:51   
I can't quite figure out why file(STRINGS) cares about "[" characters...

The loop in the code starting here:
  https://github.com/Kitware/CMake/blob/e0af55a5f4cd84db1cc5a3517e730ea8c6332f45/Source/cmFileCommand.cxx#L582

...should only ever treat the file input character by character, and pull ASCII strings out of it. The '[' and ']' are not treated specially at all. They should fall squarely in the middle of the "(c >= 0x20 && c < 0x7F)" character range.

This is a very weird issue...
(0031779)
David Cole   
2012-11-30 06:55   
FYI... Here is some test code in the CMake source tree that reliably does a foreach over lines of text in a text file:

  https://github.com/Kitware/CMake/blob/e0af55a5f4cd84db1cc5a3517e730ea8c6332f45/Tests/CMakeTests/CheckSourceTreeTest.cmake.in#L249

...although you do have to add and remove a special end-of-line character on each line to account for semicolons in the output ('E' in this case).
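
The end-of-line marker technique mentioned here might look roughly like the following. This is a minimal sketch, not a copy of the linked test code: the sample text and the variable names are made up for illustration, and a real script would more likely obtain the text with file(READ).

```cmake
# marker_demo.cmake: run with `cmake -P marker_demo.cmake`
# Hypothetical sample text containing a semicolon and an empty line.
set(text "first;with semicolon\n\nthird\n")

# Escape literal semicolons so they are not taken as list separators.
string(REPLACE ";" "\\;" text "${text}")

# Turn every newline into the marker 'E' followed by a list separator.
# The marker keeps empty lines from vanishing as empty list elements.
string(REPLACE "\n" "E;" text "${text}")

foreach(line ${text})
  # Strip the trailing marker again to recover the original line.
  string(REGEX REPLACE "^(.*)E$" "\\1" line "${line}")
  message("line: '${line}'")
endforeach()
```

Square brackets in the text would still influence how the unquoted ${text} is split, so this sketch only covers semicolons and empty lines.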
(0031781)
Brad King   
2012-11-30 08:26   
The [] behavior is probably in the

 foreach(line_ ${_content_list})

line where variable expansion does not separate on ';' inside square brackets. Therefore if '[' appears on one line and ']' appears on a later line, the foreach() will not necessarily see them as separate lines.
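
A minimal sketch of the behaviour described above, independent of file(STRINGS). The variable names and the placeholder tokens are made up for illustration. The second half shows only one possible workaround, and it assumes the tokens never occur in the data.

```cmake
# bracket_demo.cmake: run with `cmake -P bracket_demo.cmake`
# The same four words, once as a plain list and once enclosed in brackets.
set(plain "Hello;World;My Worrying;Test")
set(bracketed "[Hello;World;My Worrying;Test]")

# Unquoted expansion splits the plain value on every ';' ...
foreach(item ${plain})
  message("plain: ${item}")
endforeach()

# ... but does not split on ';' between '[' and ']', so the bracketed
# value arrives as a single item, as in the file_ko output of this report.
foreach(item ${bracketed})
  message("bracketed: ${item}")
endforeach()

# Hypothetical workaround: mask the brackets with placeholder tokens before
# the list is split, then restore them on each item.
string(REPLACE "[" "__LBRACKET__" masked "${bracketed}")
string(REPLACE "]" "__RBRACKET__" masked "${masked}")
foreach(item ${masked})
  string(REPLACE "__LBRACKET__" "[" item "${item}")
  string(REPLACE "__RBRACKET__" "]" item "${item}")
  message("restored: ${item}")
endforeach()
```

The masking has to happen on the quoted "${...}" form; once the value has been expanded unquoted, the items are already joined.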
(0042162)
Kitware Robot   
2016-06-10 14:28   
Resolving issue as `moved`.

This issue tracker is no longer used. Further discussion of this issue may take place in the current CMake Issues page linked in the banner at the top of this page.