[cmake-developers] slow regex implementation in RegularExpression
Bill Hoffman
bill.hoffman at kitware.com
Mon Nov 14 13:30:27 EST 2011
Sorry for the top post... However, if the issue with ctest being slow
can be fixed by using PCRE in CMake, that is good news. We can just
link in the library, and replace that small part of CMake internal code
that has the performance problem. This should not break backwards
compatibility. It also gives us a way to slowly bring PCRE into CMake.
Alex, is there a way you can try PCRE in CMake to see if it fixes the
problem?
-Bill
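P.S. For a quick sanity check of option C from the quoted message (POSIX
regex.h) before anyone wires PCRE into CMake, something along these lines
might be a starting point. It is only a sketch under assumptions: the
pattern is copied from the grep command quoted below and is treated here as
POSIX extended syntax, the helper names (PosixMatches, MakeLongLine,
TimeOneMatch) are made up for illustration, and the long line without a ':'
is a hypothetical stand-in for the attached test case, which is not
reproduced here. It does not touch CMake's RegularExpression class or PCRE.

```cpp
// Sketch only: times one match attempt of the warning pattern using
// POSIX <regex.h>. Helper names are hypothetical, not CMake API.
#include <regex.h>  // POSIX regcomp/regexec/regfree

#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Pattern copied from the grep command quoted in this thread.
// In a C++ string literal, "\t" is a real tab inside the bracket expression.
static const char* const kWarningPattern =
    "([^:]+): warning[ \t]*[0-9]+[ \t]*:";

// Returns true if `line` matches `pattern` (POSIX extended syntax).
// Returns false on no match or if the pattern fails to compile.
bool PosixMatches(const char* pattern, const std::string& line)
{
  assert(pattern != nullptr);
  regex_t re;
  if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
    return false;
  }
  const int rc = regexec(&re, line.c_str(), 0, nullptr, 0);
  regfree(&re);
  return rc == 0;
}

// Assumed input shape: one long line that contains no ':' at all.
std::string MakeLongLine(std::size_t length)
{
  return std::string(length, 'a');
}

// Wall-clock seconds spent on a single match attempt against `line`.
double TimeOneMatch(const std::string& line)
{
  const auto start = std::chrono::steady_clock::now();
  PosixMatches(kWarningPattern, line);
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

Calling TimeOneMatch(MakeLongLine(n)) for a few values of n and comparing
the results with the same input run through RegularExpression would give a
rough idea of whether the system regex behaves any better on a given
platform. It says nothing about PCRE itself.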
On 11/14/2011 1:13 PM, Pau Garcia i Quiles wrote:
> Hi,
>
> Check this:
>
> A wish a day 11: Perl Compatible Regular Expressions in CMake
> http://www.elpauer.org/?p=684
>
> Unfortunately the student turned out to be a total fraud: he knew
> nothing about CMake, regular expressions (much less PCRE!), git, and
> could barely manage with C/C++. After months of explaining *really*
> basic stuff (such as the difference between a static and a shared
> library), he silently gave up.
>
> I do have an initial implementation and extensive information on how to
> implement PCRE in CMake. It's just I don't have enough spare time to do
> that, and at work I cannot justify investing so much time in CMake for
> free (for now, we don't need advanced regular expressions)
>
>
> On Mon, Nov 14, 2011 at 6:57 PM, Alexandru Ciobanu
> <alex at rogue-research.com> wrote:
>
> Hi,
>
> Our team is affected by issue 0012381, which causes extremely poor
> CTest performance. Details here:
> http://public.kitware.com/Bug/view.php?id=12381
>
> I've created a small test case that demonstrates the problem. Please
> find the .cpp file attached.
>
> From what I see, the RegularExpression class uses Henry Spencer's
> regex implementation, which is known to be slow in some cases.
>
> On my machine, the attached example runs in 0.8 sec. Just to process
> one string!
> $ time ./repr
> real 0m0.865s
> user 0m0.862s
> sys 0m0.002s
>
> Grep can process 100k such strings in 0.5 sec (which includes
> reading a 570MB file from disk):
> $ wc -l big.str.txt
> 100000 big.str.txt
> $ ls -lh big.str.txt
> -rw-r--r-- 1 alex staff 572M 14 Nov 12:30 big.str.txt
> $ time grep "([^:]+): warning[ \t]*[0-9]+[ \t]*:" big.str.txt
> real 0m0.525s
> user 0m0.255s
> sys 0m0.269s
>
> I see three ways to fix this problem:
> A) use a trusted 3rd party regex library, like re2 or pcre
> B) find another self-contained regex implementation
> C) try to use the standard POSIX regex available in regex.h on
> most systems
>
> I tried to find another self-contained regex implementation that we
> could use. I found Tiny REX, but it is as slow, in this case, as
> Henry Spencer's implementation.
>
> So what do you think is the best way to proceed with this problem?
>
> sincerely,
> Alex Ciobanu
>
>
>
>
> --
>
> Powered by www.kitware.com
>
> Visit other Kitware open-source projects at
> http://www.kitware.com/opensource/opensource.html
>
> Please keep messages on-topic and check the CMake FAQ at:
> http://www.cmake.org/Wiki/CMake_FAQ
>
> Follow this link to subscribe/unsubscribe:
> http://public.kitware.com/cgi-bin/mailman/listinfo/cmake-developers
>
>
>
>
> --
> Pau Garcia i Quiles
> http://www.elpauer.org
> (Due to my workload, I may need 10 days to answer)
--
Bill Hoffman
Kitware, Inc.
28 Corporate Drive
Clifton Park, NY 12065
bill.hoffman at kitware.com
http://www.kitware.com
518 881-4905 (Direct)
518 371-3971 x105
Fax (518) 371-4573