[cmake-developers] daemon-mode: Project structure

Sat Jun 11 16:32:54 EDT 2016

Hi Stephen,

Am 11.06.2016 15:37 schrieb "Stephen Kelly" <steveire at gmail.com>:
>
> On 06/10/2016 11:35 PM, Tobias Hunger wrote:
> >
> > > Part of the design of the daemon is that messages that it sends can be
> > > 'spontaneous' - it watches for filesystem changes and can tell
> > clients to
> > > re-read the buildsystem information.
> >
> > That is currently not done or needed, but can be added.
> >
>
> I think this is essential to the first version of the protocol.

Let's Start small and build from there. At this time this is not needed and
it is trivial to add when it becomes necessary.

Can we get something that is useful and working before going overboard?

> The central goal of the daemon is to make it easier to implement good
> support in existing tools and in new tools. In your branch you have a
> reset command in the protocol. I think that should be dropped because
> the daemon protocol should be designed such that it is never needed.

The Reset command is extremely handy as it allows to go back to a clean
slate to e.g. switch to a different project.

Without it you have to restart the daemon. I do not want to run one CMake
daemon per CMake project that is open in Creator. That can be a lot, and
once you store snapshots memory usage will increase quite a bit I assume:)

> > > Additionally, it appears to be
> > > redundant if you have a 'cookie'?
> >
> > No, cookies are to attach user data to requests and that is entirely
> > orthogonal to what inReplyTo does.
>
> Ok. Then I don't understand why clients need inReplyTo. I don't
> understand why it is not redundant. Clients know what cookie they
> specified in a request and they get the same cookie back, so they can
> check their cookie jar and get the same information as inReplyTo, right?

A cookie is used to attach user data to a request.

Discovering how messages relate to each other has to be possible without a
cookie.

> > > It is important that the daemon
> > > does not create a new claim of truth in this regard - the structure
> > from the
> > > daemon should be the same as the structure generated for those ide
> > projects.
> >
> > Why?
> >
> > I want the structure that *CMake* sees. What some unrelated program
> > generates out of that structure is irrelevant.
> >
>
> What CMake sees is what it generates for IDE buildsystems.
>
> Try taking a buildsystem of multiple directories and targets and putting
> project() commands at various positions - before/after add_subdirectory
> calls, in between targets, top of the file etc, and see what CMake does
> there.
>
> You might have an argument for *changing* what CMake does there, but you
> would need to state such a change explicitly. (and to do that you need
> to be aware of it in the first place)

I do not want to change anything here!

This mechanism works well, it is what the Codeblocks generator uses and it
also happens to be very close to how CMake operates. At least the concepts
map down to CMake object very naturally.

> This is part of the design work for the daemon because the behavior of
> the daemon should not be accidental in any way. While protocol
> versioning will make it possible to introduce new versions of the
> protocol, old versions of the protocol will need to remain, and that can
> affect the C++ code of CMake in bad ways, particularly if what is
> exposed in the protocol is 'wrong' in some way. Just look at CMake
> policies - the first ones were introduced a decade ago and there is no
> plan for removing them.

I see no problem there: You can always cut out old protocols if their
implementation hurts too much. That sucks for users that need the old
version, but it can be cleanly detected.

This is an fact better than working with the command line client right now,
which regularly introduces niggling small changes all over the place which
are much harder to detect.

> I emphasise the above just to make the points that:
>
> * Designing a protocol like this is hard (and not fast)

We have been discussing this for about two years now.

Due to the protocol versioning the implementation can be changed again
later. If the old version is not too different we can do that even without
breaking clients!

> * We should introduce something minimal, avoiding redundancy, because
> removing things is hard

I agree. That is why the current daemon is very much limited to dumping the
project structure. It is a strong foundation to go further than that.

> * The protocol should possibly expose a way to set policies so that they
> can be used to deprecate behaviors in a compatible way, as is done
> elsewhere in CMake

The policies applied elsewhere in CMake should already be expressed in the
project structure managed by CMake. If that internal project structure
changes in a significant way inside CMake, then we will need to adapt the
daemon mode anyway.

I see no need to to apply policies on the daemon output at this time. For
now protocol versions should suffice I think.

> * The version of the protocol should correspond to the version of CMake
> it appears in (Just like how QDataStream works). Currently you're
> versioning the protocol as '0.1' instead of the CMake version. Look at
> the mess of QML versions, and let's not reproduce that here :)
>
>  http://thread.gmane.org/gmane.comp.lib.qt.devel/23246

The protocol version is meant to get minor version bumps on feature
additions and major version increases for incompatible changes anywhere in
the protocol.

This makes it really simple to work with for clients. If I develop
something for protocol version 1.3, then I can probably work with protocol
1.5 and 1.6. Protocol version 2.3 will be more risky:)

Using the CMake version as protocol version is really painful on the other
hand: With each CMake release I have to test compatibility and hardcode a
support matrix into Creator.

This is assuming that CMake will not change its versioning scheme to follow
the protocol version -- which would be silly IMHO.

The protocol version is also entirely invisible to users. That is very
different from QML version numbers:-)

> > <snip>
> >
> > > > Is this the information you need for IDE integration?
> > >
> > > You might be packing too much into one protocol verb. Try this on
> > LLVM and
> > > VTK for example to see how much data it is. Perhaps then compare
> > with the
> > > gradual approach in my branch.
> >
> > CMake itself is way below 100KiB.
> >
> > A typical developer machine should be able to handle several 10MiB of
> > JSON data just fine. So I do not expect a problem there.
> >
>
> Then we can easily verify that and the impact (latency difference etc)
> with larger projects like the ones I mentioned.

I can provide numbers for my implementation. No problem there.

In fact I should probably add debug features to the daemon to make it easy
to collect processing times and data throughput:-)

> I don't know what's better but we should be able to quantify the impact
> of the choices made about what to return from particular protocol queries.
>
> > > In my branch I have
> > >
> > > * `buildsystem` - Get the targets (name, type, project) and backtraces
> > > * `target_info` - Get buildsystem properties for a target
> > > * `file_info` - Get buildsystem properties for a file in a target
> >
> > I know. I did not like that, so I did it differently.
> >
> > Basically I do want the really important information in one go instead
> > of requesting a little bit, iterate over the data, request some more,
...
> >
>
> Clients don't necessarily need all of it just to start up and show
> something on start. Maybe QtCreator does, I don't know.

... which is why you can already limit what is returned in the project
structure ...

I even documented that already:-)

> > In the end that is a lots of needless round trips and you send more
> > data over the wire, too, with all the protocol overhead.
> >
>
> That's something that should be quantifyable.
>
> If it makes no difference to compute and send it all, even on large
> projects then maybe that's fine.
>
> I think figuring this stuff out is part of the work of designing the
> protocol.
>
> > > Getting the backtraces (instead of the location) for targets is
> > important -
> >
> > I am aware of the importance of backtraces, but I do not see any use
> > case where this information needs to be available at the point the
> > project structure is requested.
> >
>
> It should also be possible to quantify the impact of that. How much
> extra data is it? How much latency is imposed if every time I click on a
> target in a treeview there is a round-trip to the server before the
> application even knows what should be shown in the editor next as a
> result of the click? How much simpler is the client code if they already
> have that information associated with the treeview node?

I would set up the tree as fast as possible and then fill in none critical
information as it arrives *after the tree is displayed*. This is what
Creator does anyway for nodes... add various stuff after the tree is
initially displayed.

> > An IDE can for example request backtraces for a target when/if needed
> > and does not need it right away to set up a project tree and code model.
> >
>
> What about a clickable project tree?

See above.

> There are several different concerns here. You proposal might be the
> right answer, but my central point is that work needs to be done to
> analyze the needs and be explicit about the choices and trade-offs made.
>
> > A tool to run a static analyzer over all C++ files in a project will
> > not need backtraces ever.
>
> A tool which *only* runs a windowless static analyzer on C++ files of a
> buildsystem without showing the buildsystem will probably not use the
> daemon. Depending on how the static analyzer works it would be likely
> that a different CMake feature would be more suitable.

Daemon-mode makes it dead easy to extract the necessary information with a
script. Admittedly harder than just using the JSON dump of compile
commands, but it still is a use case I have in the back of my head:)

Ideally you should be able to write a generator in Python using the
information from daemon mode. I doubt the information returned right now
suffices for that though:-)

Or a UI like cmake-gui... that should actually be completely doable with
the code in my branch, even though I did not discuss the necessary command
on the mailing list yet.

> > The biggest part of your mail is about the history of the patch set
> > and how to collaborate further.
> >
> > I did that to make review simpler for *other* reviewers.
> >
>
> It makes sense to be able to compare your work with what came before it.
> There is a large design space here and it should be understood before
> merging a protocol to master.
>
> Renaming the class and renaming member variables (m_Foo is not
> conventional in CMake code) makes that harder.

Yeap, it does. Check the cmake-daemon branch, I am pretty sure the renames
are a separate commit there.

I found it highly annoying that file name and class name did not match:-) I
doubt that would have made it through review, too.

> > I had not expected you to chime in at all. Don't get me wrong: I
> > appreciate you experience, your patience and your reviews! I just had
> > not expected it. The impression I got talking to you before is that
> > you do not have the resources to push this functionality forward at
> > this time. In fact that is the *only* reason I stepped up for this task.
> >
>
> Yes, I will provide guidance on this topic where I can.

Great:)

> > This is why I wonder how working through your branch is supposed to
> > function within the resource limits on your side?
> >
>
> I still have knowledge to share, even if I'm not putting time into CMake
> development now :).

That is very much appreciated, but without you being able to commit to
timely reviews of merge requests I will not use your branch as upstream.

This daemon mode has a huge potential, and having that in a branch on
github somewhere does not help anybody.

> > I do not want for this work to vanish into some obscure github branch
> > somewhere for weeks and month -- as it did so far. The CMake daemon is
> > too cool a feature for that!
> >
>
> It didn't really vanish. It's a fairly well-known repo, blog and video.

Yes it is.

And basically nobody will play with the code till it lands in CMake itself.
In my experience nobody tests experimental branches ever!

> No one worked on the topic for weeks or months. That is changing now,
> but the work to be done is still the same as it was in January. It's
> lots of design and analysis, and refactoring of CMake itself. The
> patches extracted from your branch in the last weeks are great progress
> for that, as is the wide-ranging refactoring from Daniel as he has more
> experience with the core of CMake.

I agree that refactoring of cmake is crucial to land all the functionality
of your cmake-daemon branch.

But let's merge the stuff that can go in as soon as possible. I am not
saying that what I have now is perfect, I fully expect that to change. That
is why I added the protocol version support: So that we have more freedom
to change things again as feedback comes in.

> I don't think 'time to master' is an appropriate concern for a feature
> which has not received enough attention yet. This stuff is complex and I
> expect that doing it right would take more 'weeks and months'.

The project stuff is not overly complex.

I know what Creator needs. I have a very good idea what other IDEs need,
too. Milian already agreed that what I have now is a step forward for
KDevelop.

Creator already supports several build systems, so I do know what I can
reasonably expect buildsystems to provide and which information is rather
stable and which is not.

The code you work on *is* complex, at least looking at it with the
experience I have with CMake. But that is currently not discussed at all
and totally outside my focus at this time.

Please do not drag that complexity into this discussion.

Right now the questions are:

* Is the daemon infrastructure extensible and robust enough to handle more
complexity as that becomes necessary.

* Is there enough benefit in the project structure to warrant adding the
daemon code?

My hope is to get the changes I need in the CMake core merged first. Brad
already merged some of these changes.

Once these changes are in, the daemon-mode branch is very self-contained
and only touches one file that currently exists in the CMake sources. That
is the file where the command line arguments are parsed and the daemon gets
started.

So adding this is pretty low-risk, but of course it comes with a promise to
keep the daemon working *once a CMake version that contains this code is
officially released*. So we can still back out for a while longer... it
will not be in the next version anyway. I did volunteer to maintain the
daemon-mode code going forward.

IMHO the risk the CMake project is taking with this patch set is rather
limited.

> My central point is that the design space for the protocol needs to be
> explored and understood so that:
>
> * What it exposes is correct from the CMake POV (what is 'correct' might
> have to be changed *first*, as you suggest might be necessary with the
> project parts).
> * It can be used by QtCreator
> * It can be used by other tools
> * It can be extended with some of the kinds of features I showcased

Sorry, we have been discussing this for *years* now.

Kdevelop, Clion and Creator agreed that we wanted a JSON file that is
unconditionally written by CMake, with pretty much what is in the proposed
project structure. IIRC that was in response to a patch by kdevelop
developers in 2014.

I think the requirements for a way to retrieve project structure data are
pretty clear by now.

How to get much of the desired information out of cmake was not. I covered
almost everything now in my patch set.

> I don't think the protocol should be merged before that kind of design
> work is done, but ultimately it's indeed not up to me.

Are you referring to the daemon infrastructure code here or the command to
query the project structure?

I do agree that the daemon mode infrastructure should not get merged before
there is some real-world use case covered by it.

> Maybe we should talk about these issues over a beer instead :).

I am all for it! We both are located in Berlin, Germany after all. How
about Tuesday night? Maybe somebody from KDevelop, kate or other IDEs can
join in?

By the way: I ask for a BoF session on CMake and IDEs at QtCon (Qtcon.org,
Berlin, Germany, Sept. 1-4 IIRC). Let's hope the session will make it into
the conference program. That would be a great place to discuss in a wider
round. I expect you will be around for QtCon?

I hope to see some parts merged by then, so that it is easy for other
developers to play with the code before joining the session:-)

Best Regards,
Tobias
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://public.kitware.com/pipermail/cmake-developers/attachments/20160611/b5581429/attachment-0001.html>