[cmake-developers] Making your regular expression engine more reliable

Sebastian Holtermann seblist at xwmw.org
Fri May 19 07:18:42 EDT 2017



On 18.05.2017 at 23:07, Domen Vrankar wrote:
> 2017-05-18 21:44 GMT+02:00 Alan W. Irwin <irwin at beluga.phys.uvic.ca>:
> 
>     I have just discovered a long-standing regular expression bug (see
>     <https://gitlab.kitware.com/cmake/cmake/issues/16899>) that has been
>     around since at least 3.0.2.
> 
>     So your unit tests for regular expressions obviously missed at least
>     this issue. I have no idea what those unit tests are (or even if they
>     exist), but one possibility for attempting to wring most of the bugs out
>     of your regular expression processor is to adapt some other project's
>     regexp test suite. See
>     <http://stackoverflow.com/questions/15819919/where-can-i-find-unit-tests-for-regular-expressions-in-multiple-languages>
>     for a rather large list of such test suites.
> 
>     Another possibility is simply to forget supporting your own regexp
>     engine and adopt someone else's very well regarded regexp engine (such
>     as libprng).  I vaguely recall that has been suggested before, but
>     since that hasn't happened I presume inertia or NIH syndrome won or
>     else there was some strong reason why you didn't go that route.
> 
> 
> There's a third option that comes to mind: I remember that a while back
> there was talk about TR1 becoming a requirement for building CMake so
> the TR1 regex library could be exposed (probably just the ECMAScript version).

+1

There are more limitations in the current regexp implementation.

1) It uses global variables that store only the result of
   the latest evaluation. This makes it impossible to access the
   matches of two or more cmsys::RegularExpression instances
   at the same time.

2) Because of the global variables, cmsys::RegularExpression
   is not thread safe.
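
To illustrate point 1, here is a minimal sketch of the alternative design,
where every evaluation writes its matches into an object owned by the caller.
It uses std::regex from C++11, not cmsys::RegularExpression, and the function
name and the two patterns are made up for this example.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper (not CMake code): evaluates two independent
// expressions and returns the first capture group of each.
std::pair<std::string, std::string> FirstCaptures(const std::string& a,
                                                  const std::string& b)
{
  const std::regex includeRe("#include \"([^\"]+)\"");
  const std::regex mocRe("moc_([A-Za-z0-9_]+)\\.cpp");

  // Each evaluation gets its own match object, so the second
  // search does not overwrite the results of the first one.
  std::smatch includeMatch;
  std::smatch mocMatch;
  const bool foundInclude = std::regex_search(a, includeMatch, includeRe);
  const bool foundMoc = std::regex_search(b, mocMatch, mocRe);

  // Both result sets are still accessible here.
  return std::make_pair(
    foundInclude ? includeMatch[1].str() : std::string(),
    foundMoc ? mocMatch[1].str() : std::string());
}
```

With match results stored in shared global state, the second search would
replace the results of the first before they could be read together.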

As far as I can tell from a quick code search, CMake does not
use threads anywhere.
But there are some places in the AUTOGEN parts that could be
parallelized if regular expressions were thread safe (and threads
were available in CMake).
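
A rough sketch of what such a parallel scan could look like once match state
is per evaluation. The assumptions are mine: std::regex and std::thread
(C++11) stand in for facilities CMake does not use today, the function name,
the pattern and the fixed worker count are hypothetical, and this is not
taken from the AUTOGEN code.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper (not AUTOGEN code): extracts the included file name
// from each line, or an empty string if the line is not an #include.
std::vector<std::string> ExtractIncludes(const std::vector<std::string>& lines)
{
  // The compiled expression is only read by the workers.
  const std::regex includeRe(
    "^[ \t]*#[ \t]*include[ \t]+[\"<]([^\">]+)[\">]");
  std::vector<std::string> results(lines.size());
  const std::size_t numWorkers = 2; // arbitrary choice for the example

  std::vector<std::thread> workers;
  for (std::size_t w = 0; w < numWorkers; ++w) {
    workers.emplace_back([&, w]() {
      // Worker w handles lines w, w + numWorkers, w + 2 * numWorkers, ...
      for (std::size_t i = w; i < lines.size(); i += numWorkers) {
        std::smatch match; // match state is local to this evaluation
        if (std::regex_search(lines[i], match, includeRe)) {
          results[i] = match[1].str(); // each index is written by one thread
        }
      }
    });
  }
  for (std::thread& t : workers) {
    t.join();
  }
  return results;
}
```

The workers share nothing writable except disjoint slots of the result
vector, which is only possible because no match data lives in globals.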

-Sebastian

