[cmake-developers] slow regex implementation in RegularExpression
Alexandru Ciobanu
alex at rogue-research.com
Mon Nov 14 12:57:57 EST 2011
Hi,
Our team is affected by issue 0012381, which causes extremely poor performance in CTest. Details here:
http://public.kitware.com/Bug/view.php?id=12381
I've created a small test case that demonstrates the problem. Please find the .cpp file attached.
From what I can see, the RegularExpression class uses Henry Spencer's regex implementation, which is known to be slow in some cases.
On my machine, the attached example runs in 0.8 sec. Just to process one string!
$ time ./repr
real 0m0.865s
user 0m0.862s
sys 0m0.002s
Grep can process 100k such strings in 0.5 sec (which includes reading a 570MB file from disk):
$ wc -l big.str.txt
100000 big.str.txt
$ ls -lh big.str.txt
-rw-r--r-- 1 alex staff 572M 14 Nov 12:30 big.str.txt
$ time grep "([^:]+): warning[ \t]*[0-9]+[ \t]*:" big.str.txt
real 0m0.525s
user 0m0.255s
sys 0m0.269s
I see three ways to fix this problem:
A) use a trusted 3rd party regex library, like re2 or pcre
B) find another self-contained regex implementation
C) try to use the standard POSIX regex available in regex.h on most systems
I tried to find another self-contained regex implementation that we could use. I found Tiny REX, but in this case it is as slow as Henry Spencer's implementation.
So what do you think is the best way to proceed with this problem?
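To make option C more concrete, here is a minimal sketch of matching one line with the POSIX API from regex.h. The helper name posix_matches is hypothetical and not part of CMake or kwsys. The pattern in the usage note is the one from the grep command above, written as a POSIX extended regex with real tab characters. I have not measured this against RegularExpression, and how fast it is will depend on the platform's libc.

```cpp
#include <regex.h>  // POSIX regcomp / regexec / regfree
#include <string>

// Hypothetical helper (not part of CMake or kwsys): returns true if
// `line` contains a match for the POSIX extended regex `pattern`.
bool posix_matches(const std::string& pattern, const std::string& line)
{
  regex_t re;
  // REG_EXTENDED: treat ( ) and + as operators (ERE syntax).
  // REG_NOSUB: we only need a yes/no answer, not capture offsets.
  if (regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0) {
    return false;  // assumption: a pattern that fails to compile counts as "no match"
  }
  const bool matched = (regexec(&re, line.c_str(), 0, nullptr, 0) == 0);
  regfree(&re);  // release the compiled pattern
  return matched;
}
```

A call would look like posix_matches("([^:]+): warning[ \t]*[0-9]+[ \t]*:", line). In real use the pattern should be compiled once and reused for every line, rather than recompiled per call as in this sketch.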
Sincerely,
Alex Ciobanu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: repr.cpp
Type: application/octet-stream
Size: 6460 bytes
Desc: not available
URL: <http://public.kitware.com/pipermail/cmake-developers/attachments/20111114/958ec299/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Makefile
Type: application/octet-stream
Size: 153 bytes
Desc: not available
URL: <http://public.kitware.com/pipermail/cmake-developers/attachments/20111114/958ec299/attachment-0005.obj>