[cmake-developers] CMake daemon for user tools
Milian Wolff
mail at milianw.de
Sat Jan 23 10:40:47 EST 2016
On Freitag, 22. Januar 2016 17:55:46 CET James Johnston wrote:
> > -----Original Message-----
> > From: cmake-developers [mailto:cmake-developers-bounces at cmake.org]
> > On Behalf Of Milian Wolff
> > Sent: Thursday, January 21, 2016 22:31
> > To: Daniel Pfeifer
> > Cc: CMake Developers; Stephen Kelly
> > Subject: Re: [cmake-developers] CMake daemon for user tools
> >
> > > What do you think about string interning? I started a cmString class
> > > [2] that stores short strings inplace and puts long strings into a
> > > pool. Copies never allocate extra memory and equality comparison is
> > > always O(1).
> > > In addition to `operator<` (which is lexicographical) there is a O(1)
> > > comparison that can be used where lexicographical ordering is not
> > > required (ie. for lookup tables).
> > >
> > > [1] https://en.wikipedia.org/wiki/String_interning
> > > [2] https://github.com/purpleKarrot/CMake/commits/string-pool
> >
> > Imo, you should replace this custom code by a boost::flyweight of
> > std::string.
> >
> > That said, I think this can have a significant impact on the memory
> > footprint of CMake, considering how heavily it relies on strings
> > internally. But it also seems to mutate strings a lot. I've seen places
> > e.g. where a list of compile-time known identifiers is prepended with
> > "CMAKE_" at runtime. This is slow with normal strings (reallocations),
> > but will also be slow with a flyweight or interning, possibly even
> > leading to the pollution of the internal pool with temporary strings.
> >
> > Did you measure any impact on both, runtime speed and memory footprint
> > yet?
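(For reference, a minimal sketch of what a boost::flyweight<std::string> replacement could look like; the InternedString alias is just for illustration, and whether the shared pool holds up under CMake's string churn is exactly the concern quoted above.)

#include <boost/flyweight.hpp>
#include <iostream>
#include <string>

using InternedString = boost::flyweight<std::string>;

int main()
{
  InternedString a("CMAKE_CXX_FLAGS");
  InternedString b("CMAKE_CXX_FLAGS");
  // Equal values share a single pooled std::string, so operator== is a
  // handle comparison instead of a character-by-character compare.
  std::cout << std::boolalpha << (a == b) << '\n'; // prints: true
  std::cout << a.get() << '\n';                    // access the underlying string
}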
>
> I was wondering the same. I would guess maybe the biggest impact would be
> the inplace storage of strings for small sized strings. But to know the
> inplace buffer size would probably require some profiling and measurement of
> string sizes... otherwise it is just a wild guess...
You are aware that modern std::string implementations use the small string
optimization (SSO)? I'm running on such a system. That's another reason not to
reinvent the wheel and to keep relying on the STL wherever possible.
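For a quick standalone check (nothing CMake-specific; the exact numbers depend on the standard library, typically 15 chars inline with libstdc++ and 22 with libc++):

#include <iostream>
#include <string>

int main()
{
  std::string s;
  // With SSO, a default-constructed string already owns a small inline
  // buffer, so short strings never touch the heap at all.
  std::cout << "sizeof(std::string): " << sizeof(std::string) << '\n'
            << "inline capacity:     " << s.capacity() << '\n';
}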
> Maybe for testing, you can swap out the string header file on your system
> with one that logs allocations/string sizes, and perhaps also profiles the
> time it takes to make each allocation?
The data recorded by heaptrack would allow such an analysis post-mortem,
without modifying any header. Someone just needs to write that analysis
step...
> The interesting question is: could inplace storage be used for 95% of the
> cases such that fussing with string interning becomes unnecessary
> complexity? If so, then you mentioned equality comparison as another issue:
> the interesting question there is how much time is spent on allocations vs
> comparisons...
Just run cmake (or the daemon) through a profiler and check the results. Doing
so for the daemon (built with RelWithDebInfo) on the LLVM build dir, recording
with `perf record --call-graph lbr`, I get these hotspots when looking at the
results with `perf report -g graph --no-children`:
+ 8.67%  cmake  cmake         [.] cmGlobalGenerator::FindGeneratorTargetImpl
+ 4.21%  cmake  libc-2.22.so  [.] _int_malloc
+ 2.67%  cmake  cmake         [.] cmCommandArgument_yylex
+ 2.09%  cmake  libc-2.22.so  [.] _int_free
+ 2.06%  cmake  libc-2.22.so  [.] __memcmp_sse4_1
+ 1.84%  cmake  libc-2.22.so  [.] malloc
This already shows that you can gain a lot by reducing the number of
allocations; heaptrack is a good tool for finding them. Similarly, someone
should investigate cmGlobalGenerator::FindGeneratorTargetImpl: from a quick
glance it does a lot of string comparisons to find targets, so it could indeed
be sped up with a smarter string class.
But potentially you could get a much quicker lookup by storing a hash map of
target name to cmGeneratorTarget.
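A minimal sketch of that idea (TargetIndex is just an illustrative name; a real patch would have to respect cmGeneratorTarget ownership and the existing generator data structures):

#include <string>
#include <unordered_map>

class cmGeneratorTarget; // the actual class lives in CMake

class TargetIndex
{
public:
  void Add(std::string const& name, cmGeneratorTarget* target)
  {
    this->Targets[name] = target;
  }

  // Average O(1) hash lookup instead of a linear scan over all targets
  // with a string comparison per candidate.
  cmGeneratorTarget* Find(std::string const& name) const
  {
    auto it = this->Targets.find(name);
    return it != this->Targets.end() ? it->second : nullptr;
  }

private:
  std::unordered_map<std::string, cmGeneratorTarget*> Targets;
};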
> In another application I worked on, I was able to get a big improvement in
> performance by replacing usage of std::vector in one place with a custom
> vector that stack-allocated the first 10 items (i.e. fixed-size C array as a
> member variable of the class), and then reverted to a regular vector after
> that. But to pick the number "10" required some profiling/measurement.
> The remaining use of the heap was so negligible as to not be worth
> improving.
Qt has such a class, it's called QVarLengthArray, and I've also been able to
apply it on multiple occasions to good effect. That said, when you look at
where time is spent in CMake on allocations (either using perf or heaptrack),
you'll come across the following hotspots:
+ 0.83% cmMakefile::ExecuteCommand
+ 0.28% cmFunctionHelperCommand::InvokeInitialPass
+ 0.21% cmMakefile::ExpandArguments
+ 0.18% cmMakefile::ExpandVariablesInString
+ 0.12% cmMakefile::ExpandVariablesInStringOld
+ 0.11% cmCommandArgumentParserHelper::ParseString
Note that these numbers are the global fraction of time spent in _int_malloc;
you get another chunk of similar size (a bit less) for _int_free. Also note
that removing allocations usually improves cache utilization, which often
improves performance far beyond what the cycle samples reported by perf would
indicate.
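To illustrate the QVarLengthArray-style idea mentioned above, here is a hand-rolled sketch (not QVarLengthArray's actual API; real implementations keep the storage contiguous and move everything to the heap on overflow, and the inline capacity N is exactly the number you would have to pick by profiling):

#include <array>
#include <cstddef>
#include <vector>

// Tiny small-buffer container: the first N elements live inline in the
// object itself, anything beyond that spills into a heap-allocated vector.
template <typename T, std::size_t N>
class SmallBuffer
{
public:
  void push_back(T const& value)
  {
    if (this->Size < N) {
      this->Inline[this->Size] = value;
    } else {
      this->Overflow.push_back(value);
    }
    ++this->Size;
  }

  T& operator[](std::size_t i)
  {
    return i < N ? this->Inline[i] : this->Overflow[i - N];
  }

  std::size_t size() const { return this->Size; }

private:
  std::array<T, N> Inline{};
  std::vector<T> Overflow;
  std::size_t Size = 0;
};

In the common case this removes the allocation entirely, which is where most of the wins in the numbers above would come from.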
It seems there are more than enough areas where one could (and should)
optimize CMake.
Cheers
--
Milian Wolff
mail at milianw.de
http://milianw.de