[cmake-developers] slow regex implementation in RegularExpression

Michael Wild themiwi at gmail.com
Thu Nov 24 01:03:13 EST 2011


On 11/24/2011 12:34 AM, Brad King wrote:
> On 11/23/2011 5:43 PM, Brad King wrote:
>> On 11/23/2011 12:44 PM, Brad King wrote:
>>> However, the above does not need to stand in the way of solving the
>>> problem you're addressing.  We can simply set that goal aside for
>>> now by not exposing TRE in the CMake language anywhere.  Use it
>>> just for cmCTestBuildHandler.
>>
>> but people kept going on "the above" part of the debate ;)
> 
> After some more thought, I've realized that no approach currently
> proposed is practical:
> 
> - cmCTestBuildHandler can use a list of custom regular expressions
>   so we cannot assume all of them will be compatible with TRE
> 
> - As David Cole pointed out there are many places, like CTest's
>   "-R" and "-E" options, that use regular expressions in contexts
>   where we cannot possibly use a policy.  Any attempt to do so in
>   such places would just turn into a second API to set the policy
>   in the local context of the regex.
> 
> - If we add a second API like MATCHES => MATCHES_TRE then we would
>   eventually need to do that in *every* place that offers regex
>   matching.  That would mean alternatives to the above -R and -E
>   options and a lot more.
> 
> - People could write code that passes a regex around in a variable.
>   This would hide from the author of the regex the context in which
>   it will be used, so it is unknown whether it is TRE or traditional.
> 
> I propose we go back to an approach discussed the first time PCRE
> was proposed.  The indication of the type of regex must be in the
> regex itself.  IIRC the proposal was something like
> 
>   REGEX:...    # old
>   PCRE:...     # PCRE
> 
> Of course that is ambiguous too because the prefixes are valid
> expressions.  Instead we can use a prefix that is not otherwise
> a valid expression.  We can use an idea from Python:
> 
>   http://docs.python.org/library/re.html
> 
> that defines expressions of the form (?...) which are not otherwise
> valid.  In order to avoid conflict with future use of the constructs
> they define, we can use the comment form Python defines:
> 
>  (?#OLD)...   # old
>  (?#TRE)...   # TRE
> 
> This is quite easy to implement.  Just take the currently proposed
> patch that replaces use of cmsys::RegularExpression with the new
> cmFastRegularExpression wrapper (perhaps renamed cmRegularExpression).
> Inside the wrapper look for a leading comment of the above form to
> decide which regex impl to use internally.  Then strip off the prefix
> and pass the rest of the regex to the underlying implementation.
> Once this is done update all the default warning and error regular
> expressions that CTest uses.  Add the (?#TRE) prefix to them.
> 
> This approach will solve the speed problem, give people access to the
> TRE extended features when they want it anywhere CMake already uses
> a regex, has no compatibility problems, is a very narrow second
> interface, and is extensible for future optional regex behavior.
> 
> -Brad

I like that proposal a lot, although I'm afraid it is a bit verbose.
Some of my regexes are already pretty lengthy, pushing the 80-column limit.
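
For reference, the prefix handling described above (look for a leading
comment, pick the implementation, strip the prefix) might look roughly
like the sketch below. The names RegexEngine and SelectRegexEngine are
made up for illustration and are not from the proposed patch. Treating
an unprefixed expression as "old" is my assumption, made to keep
existing expressions working.

```cpp
#include <string>
#include <utility>

// Hypothetical names, for illustration only; not part of the patch.
enum class RegexEngine { Old, TRE };

// Look for a leading "(?#OLD)" or "(?#TRE)" comment, choose the engine,
// and return the expression with that prefix stripped off.
// Assumption: an expression without a recognized prefix is passed to
// the old implementation unchanged.
inline std::pair<RegexEngine, std::string>
SelectRegexEngine(const std::string& pattern)
{
  static const std::string oldPrefix = "(?#OLD)";
  static const std::string trePrefix = "(?#TRE)";

  // compare() returns 0 only if the pattern starts with the prefix.
  if (pattern.compare(0, trePrefix.size(), trePrefix) == 0) {
    return std::make_pair(RegexEngine::TRE,
                          pattern.substr(trePrefix.size()));
  }
  if (pattern.compare(0, oldPrefix.size(), oldPrefix) == 0) {
    return std::make_pair(RegexEngine::Old,
                          pattern.substr(oldPrefix.size()));
  }
  return std::make_pair(RegexEngine::Old, pattern);
}
```

The wrapper would then hand the second element of the pair to whichever
implementation the first element names. Only a prefix at the very start
of the expression is recognized, so a "(?#TRE)" appearing later is left
for the underlying implementation to deal with.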

Michael

