[cmake-developers] slow regex implementation in RegularExpression
Alexander Neundorf
neundorf at kde.org
Wed Nov 16 12:47:27 EST 2011
On Wednesday 16 November 2011, Alexandru Ciobanu wrote:
> Hi,
>
> I was successful in making CMake work with PCRE. As expected, it was
> straightforward.
>
> The problem is that PCRE is also slow. So, I tested the same string and
> regex with multiple different libraries in order to assess performance.
>
> The regular expression in question is:
> ([^:]+): warning[ \t]*[0-9]+[ \t]*:
>
> The input is a 6k-character string, a typical compiler command line. (See
> my first message for sample code.)
>
> For each library the steps are:
> - regcomp() the regular expression
> - regexec() the expression on the string
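For readers without the sample code from the first message, the two steps above can be sketched roughly as follows with the POSIX <regex.h> API. This is only an illustrative sketch, not the benchmark code that was actually timed: the helper name find_warning and the choice of REG_EXTENDED are assumptions, and the pattern is the one quoted above.

```c
#include <regex.h>
#include <stddef.h>

/* The pattern quoted in the message. The "\t" escapes are expanded by the C
 * compiler into literal tab characters, which is what a POSIX bracket
 * expression needs (it does not interpret backslash escapes itself). */
static const char *WARNING_PATTERN = "([^:]+): warning[ \t]*[0-9]+[ \t]*:";

/* Hypothetical helper (not part of CMake): compile the pattern, run it once
 * on `text`, and fill groups[0] (whole match) and groups[1] (first group).
 * Returns 1 on a match, 0 on no match, -1 if the pattern fails to compile. */
int find_warning(const char *text, regmatch_t groups[2])
{
    regex_t re;
    int rc;

    /* Step 1: regcomp() the regular expression (extended syntax assumed). */
    if (regcomp(&re, WARNING_PATTERN, REG_EXTENDED) != 0)
        return -1;

    /* Step 2: regexec() the expression on the string. */
    rc = regexec(&re, text, 2, groups, 0);

    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Calling find_warning() on a line such as "foo.c: warning 123 : unused variable" should report a match with the first group covering "foo.c". Timing a call like this on the 6k-character command line is the kind of measurement listed below.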
>
> Here is how much time it takes to process the string *one* time:
> current CMake -- 860ms
> TRex -- 680ms
> PCRE -- 610ms ( with pcre_exec() )
> PCRE -- 990ms ( with pcre_dfa_exec() )
> re2 -- 0.085ms
> /usr/include/regex.h -- 0.075ms
I wouldn't have expected this.
> As can be seen, re2 and the standard regex.h are orders of magnitude
> faster at executing this particular regular expression.
>
> The difference between PCRE and re2 is also confirmed by this study:
> http://swtch.com/~rsc/regexp/regexp3.html
>
> CONCLUSION:
> - PCRE is not fast enough
>
> QUESTION:
> - is there a reason we shouldn't use the standard regex.h?
Does it exist everywhere, e.g. on Windows with MSVC 6?
Alex