[cmake-developers] slow regex implementation in RegularExpression

Brad King brad.king at kitware.com
Wed Nov 23 18:34:13 EST 2011


On 11/23/2011 5:43 PM, Brad King wrote:
> On 11/23/2011 12:44 PM, Brad King wrote:
>> However, the above does not need to stand in the way of solving the
>> problem you're addressing.  We can simply set that goal aside for
>> now by not exposing TRE in the CMake language anywhere.  Use it
>> just for cmCTestBuildHandler.
> 
> but people kept going on "the above" part of the debate ;)

After some more thought, I've realized that no approach currently
proposed is practical:

- cmCTestBuildHandler can use a list of custom regular expressions
  so we cannot assume all of them will be compatible with TRE

- As David Cole pointed out there are many places, like CTest's
  "-R" and "-E" options, that use regular expressions in contexts
  where we cannot possibly use a policy.  Any attempt to do so in
  such places would just turn into a second API to set the policy
  in the local context of the regex.

- If we add a second API like MATCHES => MATCHES_TRE then we would
  eventually need to do that in *every* place that offers regex
  matching.  That would mean alternatives to the above -R and -E
  options and a lot more.

- People could write code that passes a regex around in a variable.
  This would hide from the author of the regex the context in which
  it will be used, so the author cannot know whether it will be
  interpreted as TRE or traditional.

I propose we go back to an approach discussed the first time PCRE
was proposed.  The indication of the type of regex must be in the
regex itself.  IIRC the proposal was something like

  REGEX:...    # old
  PCRE:...     # PCRE

Of course that is ambiguous too because the prefixes are valid
expressions.  Instead we can use a prefix that is not otherwise
a valid expression.  We can use an idea from Python:

  http://docs.python.org/library/re.html

that defines expressions of the form (?...) which are not otherwise
valid.  In order to avoid conflict with future use of the constructs
they define, we can use the comment form Python defines:

  (?#OLD)...   # old
  (?#TRE)...   # TRE

This is quite easy to implement.  Just take the currently proposed
patch that replaces use of cmsys::RegularExpression with the new
cmFastRegularExpression wrapper (perhaps renamed cmRegularExpression).
Inside the wrapper look for a leading comment of the above form to
decide which regex impl to use internally.  Then strip off the prefix
and pass the rest of the regex to the underlying implementation.
Once this is done update all the default warning and error regular
expressions that CTest uses.  Add the (?#TRE) prefix to them.
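
A minimal sketch of the prefix handling inside the wrapper, under some
assumptions: the names RegexEngine and SelectRegexEngine are hypothetical
and not taken from the proposed patch, and a pattern with no recognized
leading comment is assumed to fall back to the old implementation so
existing expressions keep their current meaning.

```cpp
#include <string>
#include <utility>

// Hypothetical names for illustration only; not from the proposed patch.
enum class RegexEngine { Old, TRE };

// Inspect a leading (?#OLD) or (?#TRE) comment, pick the engine, and
// return the expression with that prefix stripped.
// Assumption: a pattern without a recognized prefix uses the old engine
// and is passed through unchanged.
std::pair<RegexEngine, std::string> SelectRegexEngine(const std::string& pattern)
{
  static const std::string oldTag = "(?#OLD)";
  static const std::string treTag = "(?#TRE)";

  if (pattern.compare(0, treTag.size(), treTag) == 0) {
    return std::make_pair(RegexEngine::TRE, pattern.substr(treTag.size()));
  }
  if (pattern.compare(0, oldTag.size(), oldTag) == 0) {
    return std::make_pair(RegexEngine::Old, pattern.substr(oldTag.size()));
  }
  return std::make_pair(RegexEngine::Old, pattern);
}
```

The wrapper would then compile the returned string with whichever
implementation was selected, so callers such as the "-R" and "-E" options
need no new interface of their own.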

This approach solves the speed problem, gives people access to the
TRE extended features wherever CMake already uses a regex, has no
compatibility problems, is a very narrow second interface, and is
extensible for future optional regex behavior.

-Brad


