[cmake-developers] slow regex implementation in RegularExpression
Michael Wild
themiwi at gmail.com
Thu Nov 24 01:03:13 EST 2011
On 11/24/2011 12:34 AM, Brad King wrote:
> On 11/23/2011 5:43 PM, Brad King wrote:
>> On 11/23/2011 12:44 PM, Brad King wrote:
>>> However, the above does not need to stand in the way of solving the
>>> problem you're addressing. We can simply set that goal aside for
>>> now by not exposing TRE in the CMake language anywhere. Use it
>>> just for cmCTestBuildHandler.
>>
>> but people kept going on "the above" part of the debate ;)
>
> After some more thought, I've realized that no approach currently
> proposed is practical:
>
> - cmCTestBuildHandler can use a list of custom regular expressions
> so we cannot assume all of them will be compatible with TRE
>
> - As David Cole pointed out there are many places, like CTest's
> "-R" and "-E" options, that use regular expressions in contexts
> where we cannot possibly use a policy. Any attempt to do so in
> such places would just turn into a second API to set the policy
> in the local context of the regex.
>
> - If we add a second API like MATCHES => MATCHES_TRE then we would
> eventually need to do that in *every* place that offers regex
> matching. That would mean alternatives to the above -R and -E
> options and a lot more.
>
> - People could write code that passes a regex around in a variable.
> This would hide from the author of the regex the context in which
> it will be used, so it is unknown whether it is TRE or traditional.
>
> I propose we go back to an approach discussed the first time PCRE
> was proposed. The indication of the type of regex must be in the
> regex itself. IIRC the proposal was something like
>
> REGEX:... # old
> PCRE:... # PCRE
>
> Of course that is ambiguous too because the prefixes are valid
> expressions. Instead we can use a prefix that is not otherwise
> a valid expression. We can use an idea from Python:
>
> http://docs.python.org/library/re.html
>
> that defines expressions of the form (?...) which are not otherwise
> valid. In order to avoid conflict with future use of the constructs
> they define, we can use the comment form Python defines:
>
> (?#OLD)... # old
> (?#TRE)... # TRE
>
> This is quite easy to implement. Just take the currently proposed
> patch that replaces use of cmsys::RegularExpression with the new
> cmFastRegularExpression wrapper (perhaps renamed cmRegularExpression).
> Inside the wrapper look for a leading comment of the above form to
> decide which regex impl to use internally. Then strip off the prefix
> and pass the rest of the regex to the underlying implementation.
> Once this is done update all the default warning and error regular
> expressions that CTest uses. Add the (?#TRE) prefix to them.
>
> This approach will solve the speed problem and give people access to
> the TRE extended features wherever CMake already uses a regex. It has
> no compatibility problems, is a very narrow second interface, and is
> extensible for future optional regex behavior.
>
> -Brad
I like that proposal a lot, although I'm afraid it is a bit verbose.
Some of my regexes are already pretty lengthy, pushing the 80-column limit.
Michael
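
For illustration, the prefix dispatch Brad describes (look for a leading
(?#OLD) or (?#TRE) comment, pick the implementation, strip the prefix) could
be sketched roughly as below. This is only a sketch under assumptions: the
names RegexEngine and SplitRegexPrefix are hypothetical and do not come from
the proposed patch, and treating an unprefixed expression as "old" is an
assumption made here to keep existing expressions working unchanged.

```cpp
#include <string>
#include <utility>

// Hypothetical enum: which regex implementation the wrapper should use.
enum class RegexEngine
{
  Old, // the traditional cmsys::RegularExpression behavior
  Tre  // the TRE-based implementation
};

// Hypothetical helper: inspect a leading "(?#OLD)" or "(?#TRE)" comment,
// and return the selected engine plus the expression with the prefix removed.
// Assumption: an expression without a recognized prefix uses the old engine.
std::pair<RegexEngine, std::string> SplitRegexPrefix(const std::string& pattern)
{
  static const std::string oldTag = "(?#OLD)";
  static const std::string treTag = "(?#TRE)";

  // compare(0, n, tag) checks whether the pattern starts with the tag;
  // it is safe even when the pattern is shorter than the tag.
  if (pattern.compare(0, treTag.size(), treTag) == 0) {
    return { RegexEngine::Tre, pattern.substr(treTag.size()) };
  }
  if (pattern.compare(0, oldTag.size(), oldTag) == 0) {
    return { RegexEngine::Old, pattern.substr(oldTag.size()) };
  }
  return { RegexEngine::Old, pattern };
}
```

A wrapper class would call such a helper once when compiling an expression
and then hand the stripped string to whichever implementation was selected.
The prefix costs seven characters per expression, which is the verbosity
concern raised above.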