[cmake-developers] slow regex implementation in RegularExpression
Alexandru Ciobanu
alex at rogue-research.com
Wed Nov 16 14:12:39 EST 2011
Hi Brad,
[1]
> On 11/16/2011 12:44 PM, Alexandru Ciobanu wrote:
>> For each library the steps are:
>> - regcomp() the regular expression
>> - regexec() the expression on the string
>
> Can you time each of these steps separately for each library? I would not
> be surprised if the compilation time is the bottleneck. The evaluation and
> matching of a given string just followed a DFA which should be pretty fast.
> If it turns out that compilation is the bottleneck then we should refactor
> things to make sure CTest compiles each regex at most once so we can re-use
> the same DFA every time.
This is how I run the tests (pseudocode):
    regcomp()
    repeat 1000 times:
        regexec()
The times I reported are the total run times divided by 1000.
For the slow ones (TRex, PCRE, CMake regexp) I repeat only 10 times; otherwise the runs take too long. Since regcomp() runs only once, outside the loop, and the per-iteration time is still large, regcomp() does not seem to be the problem in this case.
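A minimal C sketch of the loop above, timing regcomp() and regexec() separately as Brad asked. It assumes the system POSIX <regex.h> and measures processor time with clock(); time_regex() and its parameters are hypothetical names for illustration, not the code in the attached tre.test.c.

```c
#include <regex.h>
#include <stddef.h>
#include <time.h>

/* Elapsed processor time between two clock() readings, in milliseconds. */
static double elapsed_ms(clock_t start, clock_t end)
{
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}

/* Hypothetical benchmark helper: compile `pattern` once, then run regexec()
 * on `subject` `reps` times. The one-time compile cost and the average
 * per-call match cost are reported separately. Returns the number of
 * successful matches, or -1 if reps <= 0 or the pattern fails to compile. */
static int time_regex(const char *pattern, const char *subject, int reps,
                      double *compile_ms, double *avg_exec_ms)
{
    regex_t re;
    int matches = 0;

    if (reps <= 0)
        return -1;

    clock_t t0 = clock();
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    clock_t t1 = clock();

    /* The same compiled regex is reused for every iteration. */
    for (int i = 0; i < reps; ++i) {
        if (regexec(&re, subject, 0, NULL, 0) == 0)
            ++matches;
    }
    clock_t t2 = clock();

    regfree(&re);
    *compile_ms = elapsed_ms(t0, t1);
    *avg_exec_ms = elapsed_ms(t1, t2) / reps;
    return matches;
}
```

Called as time_regex(pattern, subject, 1000, &c, &e), it puts the one-time compile cost in c and the average per-match cost in e, which corresponds to the per-iteration figure described above.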
[2]
I have just tested another library, TRE.
It performs well; here it is in context:
current CMake          -- 860 ms
TRex                   -- 680 ms
PCRE (pcre_exec)       -- 610 ms
PCRE (pcre_dfa_exec)   -- 990 ms
re2                    -- 0.085 ms
/usr/include/regex.h   -- 0.075 ms
TRE                    -- 0.3 ms   <<<<<< NEW
Advantages of TRE:
- API very similar to standard regex.h (i.e. easy to integrate with CMake)
- supports wide characters
- compiles on many platforms: Windows, AIX, HP-UX, you name it.
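To illustrate the "API very similar to regex.h" point, here is a minimal sketch of a compile-time backend switch. It assumes TRE is installed with a <tre/tre.h> header that declares tre_regcomp(), tre_regexec() and tre_regfree() with the same signatures as their POSIX counterparts; rx_match() and USE_TRE are hypothetical names. Without USE_TRE defined, it builds against the system <regex.h>.

```c
#include <stddef.h>

#ifdef USE_TRE
/* Assumption: TRE's header provides regex_t, REG_* flags and the
 * tre_reg*() functions with POSIX-compatible signatures. */
#include <tre/tre.h>
#define RX_COMP tre_regcomp
#define RX_EXEC tre_regexec
#define RX_FREE tre_regfree
#else
#include <regex.h>
#define RX_COMP regcomp
#define RX_EXEC regexec
#define RX_FREE regfree
#endif

/* Hypothetical helper: returns 1 if `subject` matches the extended regex
 * `pattern`, 0 if it does not, and -1 if the pattern fails to compile.
 * Only the macros above change between backends. */
static int rx_match(const char *pattern, const char *subject)
{
    regex_t re;
    if (RX_COMP(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = RX_EXEC(&re, subject, 0, NULL, 0);
    RX_FREE(&re);
    return rc == 0 ? 1 : 0;
}
```

Because the calling code only uses the three macros, switching backends would be a build-flag change rather than a rewrite of the matching code.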
What do you think about TRE?
sincerely,
Alex Ciobanu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tre.test.c
Type: application/octet-stream
Size: 13340 bytes
Desc: not available
URL: <http://public.kitware.com/pipermail/cmake-developers/attachments/20111116/3d0912de/attachment-0002.obj>