[cmake-developers] CMake daemon for user tools

James Johnston JamesJ at motionview3d.com
Fri Jan 22 12:55:46 EST 2016


> -----Original Message-----
> From: cmake-developers [mailto:cmake-developers-bounces at cmake.org]
> On Behalf Of Milian Wolff
> Sent: Thursday, January 21, 2016 22:31
> To: Daniel Pfeifer
> Cc: CMake Developers; Stephen Kelly
> Subject: Re: [cmake-developers] CMake daemon for user tools
> 
> > What do you think about string interning [1]? I started a cmString class
> > [2] that stores short strings in place and puts long strings into a
> > pool. Copies never allocate extra memory and equality comparison is
> > always O(1).
> > In addition to `operator<` (which is lexicographical) there is an O(1)
> > comparison that can be used where lexicographical ordering is not
> > required (e.g. for lookup tables).
> >
> > [1] https://en.wikipedia.org/wiki/String_interning
> > [2] https://github.com/purpleKarrot/CMake/commits/string-pool
> 
> IMO, you should replace this custom code with a boost::flyweight of
> std::string. That said, I think this can have a significant impact on the
> memory footprint of CMake, considering how heavily it relies on strings
> internally. But it also seems to mutate strings a lot. I've seen places,
> e.g., where a list of compile-time known identifiers is prepended with
> "CMAKE_" at runtime. This is slow with normal strings (reallocations), but
> will also be slow with a flyweight or interning, possibly even leading to
> the pollution of the internal pool with temporary strings.
> 
> Have you measured the impact on both runtime speed and memory footprint
> yet?
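
For readers unfamiliar with the idea, here is a minimal sketch of string
interning in C++. The names `InternPool` and `InternedString` are
hypothetical, and this is not the code from the cmString branch [2]; in
particular it omits the in-place storage of short strings. Each distinct
string is stored once in a pool and a handle is a single pointer, so copying
never allocates, `==` is a pointer comparison, and ordering by address gives
an O(1) but non-lexicographical comparison suitable for lookup tables.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical pool: every distinct string is stored exactly once.
// Pointers to elements of a std::unordered_set stay valid across rehashes,
// so handles can safely point into it.
class InternPool {
public:
    const std::string* intern(const std::string& s) {
        return &*pool_.insert(s).first;  // existing entry if already present
    }
    std::size_t size() const { return pool_.size(); }
private:
    std::unordered_set<std::string> pool_;
};

// Hypothetical handle: one pointer, so copies never allocate.
class InternedString {
public:
    InternedString(InternPool& pool, const std::string& s) : p_(pool.intern(s)) {}

    const std::string& str() const { return *p_; }

    // O(1): equal contents always map to the same pooled object.
    friend bool operator==(const InternedString& a, const InternedString& b) {
        return a.p_ == b.p_;
    }
    friend bool operator!=(const InternedString& a, const InternedString& b) {
        return a.p_ != b.p_;
    }

    // Lexicographical ordering: O(length), same as std::string.
    friend bool operator<(const InternedString& a, const InternedString& b) {
        return *a.p_ < *b.p_;
    }

    // O(1) ordering by address: a valid strict weak ordering within one run,
    // but not lexicographical, so only use it where order does not matter.
    struct FastLess {
        bool operator()(const InternedString& a, const InternedString& b) const {
            return std::less<const std::string*>()(a.p_, b.p_);
        }
    };

private:
    const std::string* p_;
};
```

Because `FastLess` compares addresses, the iteration order of, say, a
`std::map` keyed with it may differ between runs; that is the price of the
O(1) comparison.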

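Regarding the "CMAKE_" prefix concern, one way to keep temporaries out of
such a pool is sketched below, assuming interning only happens when it is
explicitly requested (the helper names `intern`, `internPrefixed` and
`internAllPrefixed` are hypothetical): build the full name in an ordinary
`std::string`, reserving its final size so the append does not reallocate,
and intern only the finished result.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical minimal pool; pointers into an unordered_set stay valid.
using Pool = std::unordered_set<std::string>;

const std::string* intern(Pool& pool, const std::string& s) {
    return &*pool.insert(s).first;
}

// Concatenate in a plain std::string with a single up-front reservation,
// then intern only the final value; the prefix and any partial results
// never enter the pool.
const std::string* internPrefixed(Pool& pool, const std::string& prefix,
                                  const std::string& name) {
    std::string full;
    full.reserve(prefix.size() + name.size());
    full += prefix;
    full += name;
    return intern(pool, full);
}

std::vector<const std::string*> internAllPrefixed(
        Pool& pool, const std::string& prefix,
        const std::vector<std::string>& names) {
    std::vector<const std::string*> out;
    out.reserve(names.size());
    for (const std::string& n : names)
        out.push_back(internPrefixed(pool, prefix, n));
    return out;
}
```

With this approach the pool grows only by the number of distinct final
names, regardless of how many intermediate strings the concatenation needed.
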
I was wondering the same. My guess is that the biggest impact would come from
the in-place storage of small strings. But choosing the in-place buffer size
would require profiling and measuring actual string sizes; otherwise it is
just a wild guess...
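
Once string lengths have been sampled, choosing the buffer size amounts to a
percentile computation. A minimal sketch, with the hypothetical function name
`inlineCapacityFor`: sort the observed lengths and take the nearest-rank
percentile, i.e. the smallest capacity that at least the given percentage of
strings fit into.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: given sampled string lengths, return the smallest
// inline capacity N such that at least `percent` percent of the samples have
// length <= N (nearest-rank percentile). If the buffer also stores a
// terminating NUL, it needs one byte more than the value returned.
std::size_t inlineCapacityFor(std::vector<std::size_t> lengths, unsigned percent) {
    assert(!lengths.empty() && percent >= 1 && percent <= 100);
    std::sort(lengths.begin(), lengths.end());
    // Integer ceil(percent * n / 100): how many samples must fit.
    std::size_t mustFit = (percent * lengths.size() + 99) / 100;
    return lengths[mustFit - 1];
}
```

For example, with lengths 1 through 20, asking for 95% yields 19, since 19
of the 20 samples are at most 19 characters long.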

Maybe, for testing, you could swap out the string header file on your system
for one that logs allocations and string sizes, and perhaps also measures how
long each allocation takes?
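
A less invasive variant of the same idea (a swapped-in technique, not
something proposed in the thread) is to replace the global `operator new`,
which the C++ standard allows a program to do. The sketch below counts
allocation requests in 16-byte buckets; `kBucketWidth`, `kBuckets`,
`g_allocCount` and `dumpAllocHistogram` are hypothetical names. It is not
thread-safe, and it counts every heap allocation, not only those made by
`std::string`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <new>

// Bucket i counts requests of kBucketWidth*i .. kBucketWidth*i + 15 bytes;
// the last bucket collects everything larger. Not thread-safe.
constexpr std::size_t kBucketWidth = 16;
constexpr std::size_t kBuckets = 17;
std::size_t g_allocCount[kBuckets] = {};

// Replacement for the global allocation function. Simplified: on failure it
// throws directly instead of looping over the installed new_handler.
void* operator new(std::size_t size) {
    std::size_t bucket = size / kBucketWidth;
    if (bucket >= kBuckets) bucket = kBuckets - 1;
    ++g_allocCount[bucket];
    if (size == 0) size = 1;  // malloc(0) may return a null pointer
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();
}

// Matching deallocation functions (unsized and sized).
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Print the histogram, e.g. from a function registered with std::atexit.
void dumpAllocHistogram() {
    for (std::size_t i = 0; i + 1 < kBuckets; ++i)
        std::fprintf(stderr, "%4zu-%4zu bytes: %zu\n", i * kBucketWidth,
                     i * kBucketWidth + kBucketWidth - 1, g_allocCount[i]);
    std::fprintf(stderr, "%4zu+      bytes: %zu\n",
                 (kBuckets - 1) * kBucketWidth, g_allocCount[kBuckets - 1]);
}
```

The recorded sizes are allocation requests, which for `std::string` include
the terminating NUL and any growth slack, so they approximate string lengths
rather than equal them. Also, most standard libraries already store short
strings inline, so those never reach `operator new` at all.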

The interesting question is: could in-place storage cover 95% of the cases,
so that string interning becomes unnecessary complexity? If so, that leaves
equality comparison, the other benefit you mentioned; the interesting question
there is how much time is spent on allocations vs. comparisons...

In another application I worked on, I was able to get a big improvement in
performance by replacing usage of std::vector in one place with a custom
vector that stack-allocated the first 10 items (i.e. a fixed-size C array as a
member variable of the class), and then reverted to a regular vector after
that.  But to pick the number "10" required some profiling/measurement.  The
remaining use of the heap was so negligible as to not be worth improving.
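
A minimal sketch of that kind of container (the class name `SmallVector` is
hypothetical, and this is not the code from that application): the first N
elements live in a fixed-size array member, and the container moves
everything to a `std::vector` once element N+1 arrives. For brevity it
requires `T` to be default-constructible and only provides `push_back`,
`size` and `operator[]`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

template <typename T, std::size_t N>
class SmallVector {
public:
    void push_back(T value) {
        if (onHeap_) {
            heap_.push_back(std::move(value));
        } else if (size_ < N) {
            inline_[size_++] = std::move(value);  // no heap allocation
        } else {
            // Overflow: move the inline elements to the heap once,
            // then keep using the std::vector from here on.
            heap_.reserve(2 * N);
            for (std::size_t i = 0; i < size_; ++i)
                heap_.push_back(std::move(inline_[i]));
            heap_.push_back(std::move(value));
            onHeap_ = true;
        }
    }

    std::size_t size() const { return onHeap_ ? heap_.size() : size_; }
    bool usesHeap() const { return onHeap_; }

    T& operator[](std::size_t i) { return onHeap_ ? heap_[i] : inline_[i]; }
    const T& operator[](std::size_t i) const { return onHeap_ ? heap_[i] : inline_[i]; }

private:
    std::array<T, N> inline_{};  // fixed-size storage inside the object
    std::size_t size_ = 0;       // number of inline elements in use
    bool onHeap_ = false;
    std::vector<T> heap_;
};
```

Production-quality versions of this idea exist, e.g. LLVM's
`llvm::SmallVector` and Boost's `boost::container::small_vector`; unlike this
sketch, they do not require `T` to be default-constructible.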

Best regards,

James Johnston
