Author Topic: find references and irrelevant results  (Read 8267 times)

IkerAriz

  • Senior Community Member
  • Posts: 159
  • Hero Points: 6
find references and irrelevant results
« on: September 09, 2018, 02:09:52 PM »
A long standing issue recently touched on by rowbearto (and myself back in 2014) is that the find references tool often adds irrelevant entries in the output window while it searches only to remove them at the end of the search. This causes a fair bit of distracting "noise" in the output window that defeats the point of incremental search... incremental but irrelevant results aren't useful so I ignore them and just wait for the search to complete.

IIUC, the current implementation tries to get results to the screen as quickly as possible by first populating the window with all potentially matching files and then expanding/removing each file node as needed. Ie:

Code: [Select]
term = "foobar"
matching_files = get_matching_files(index, term)
for each filename in matching_files:
    add filename to output_window
for each filename in output_window:
    refs = get_references(index, filename, term, ...)
    if num refs is zero:
        remove filename from output_window
        continue
    for each ref in refs:
        add reference to output_window   

But I'd happily trade a longer time-to-first-result for fewer false positives if such an option were available. Ie, only add fully vetted entries to the output window:

Code: [Select]
term = "foobar"
matching_files = get_matching_files(index, term)
for each filename in matching_files:
    refs = get_references(index, filename, term, ...)
    if num refs is zero:
        continue
    for each ref in refs:
        add reference to output_window   

Thanks,
Iker

Dennis

  • Senior Community Member
  • Posts: 3960
  • Hero Points: 517
Re: find references and irrelevant results
« Reply #1 on: September 10, 2018, 02:55:51 PM »
Do you have "Find References Incrementally" enabled or disabled in  Tools > Options > Editor > Context Tagging(R)...

Whether you have it enabled or disabled, it is possible to interrupt a References search before it is finished.  Having all the files in the list, even the ones that are false-positives, makes it possible to expand the references in a specific file after the search was interrupted.

Showing the results file-by-file as they are computed takes a very small portion of time relative to the amount of time it takes to calculate whether there are references in a file or not.

It would be great if we could narrow down the set of files to search for references in better.  I have experimented with several algorithms, but they seem to boil down to this -- the algorithm to trim down the file list takes longer than it takes to do what we are doing now.  Here's an example, suppose you have a method "size()" in a class named "Turtle".  You might think we could use the fact that size() needs to be referenced through an instance of "Turtle" to narrow down the list of files.  So we can look for files that reference both the identifier "size" and the identifier "Turtle".  But, it is not that simple and gets complicated really fast.  We also have to add all the parents (with size() method) and descendants of "Turtle".  There are over 300 species of turtles, so the cross-product search space explodes quickly.  Then there are namespace imports, aliases, typedefs, and #include paths.  And I haven't even mentioned the worst of the worst, the scourge of C/C++ --- #define macro expansion.  The algorithm starts to get very language specific very fast.

Using a database built by a compiler front end, clang has been suggested here before, is a solution that can pin down accurate results faster.  But here's the problem.  It is a solution for one language that will well work with one compiler on a few platforms with a lot more configuration.  Plus, it will only work until it has to be redone entirely when the compiler front-end changes their API or file formats.  We did this with Visual Studio BSC files a long time back, very soon after, Microsoft changed their debugger metadata file format to PDB and obsoleted the BSC library (I doubt that was due to us, they just happened to be moving on).  It then becomes a full time job keeping up with various compiler front-end implementations and APIs for multiple languages.  Trying to build them with their obscure and ever-changing build strategies and dependencies.  Plus an additional full time job for the lawyers making sure that we credit all the open source licenses, so as not to be put out of business by the very people whose software we are trying to integrate and support.  So, yeah, its ugly.  I'm not saying we are never going to do it, but again want to emphasize that this is no simple problem.

IkerAriz

  • Senior Community Member
  • Posts: 159
  • Hero Points: 6
Re: find references and irrelevant results
« Reply #2 on: September 10, 2018, 03:53:32 PM »
Quote
it is possible to interrupt a References search before it is finished
Quote
Using a database built by a compiler front end...

Note that the issue is not that that the search is uninterruptible, or that the SE reference calculation is deficient. The request is only to apply the existing SE reference calculation at a different time in the loop that populates the output window. Ie, run the SE ref calculation before adding each file and its refs to the window, instead of a adding lots of potential files that will only get trimmed later (and which serve no purpose in the mean time).

Quote
Having all the files in the list, even the ones that are false-positives, makes it possible to expand the references in a specific file after the search was interrupted

In my experience this hasn't proven useful. I typically interrupt a search only because I can see its picking up a lot of noise so the displayed files are no better than raffle tickets (some winners, but most just show "NO REFERENCES" when expanded).

rowbearto

  • Senior Community Member
  • Posts: 2335
  • Hero Points: 132
Re: find references and irrelevant results
« Reply #3 on: September 10, 2018, 04:39:58 PM »
Suggestion for using clang, its been mentioned before and I've actually been working on it in my spare time to actually get something working with SlickEdit. I'm motivated to do this because our code base has many overloads of common function names such as "set" or "initialize" and SE takes a very long time or freezes on references searches for these.

There is the "language server protocol" standard - https://microsoft.github.io/language-server-protocol/specification

It was talked about in another thread in the SE community: https://community.slickedit.com/index.php?topic=15799.0

It has a standard set of JSON request/responses for things like "go to definition", "find references" and "autocomplete".

There are already 2 open source projects that support it for C++:

cquery - https://github.com/cquery-project/cquery
ccls - https://github.com/MaskRay/ccls

There is also an official one in the works from the LLVM team (doesn't support completion yet):

https://reviews.llvm.org/diffusion/L/browse/clang-tools-extra/trunk/clangd/

Language Server Protocol is also supported for many other languages:

https://langserver.org/

I tried out the VIM plugin that communicates with language servers and by invoking cquery with some special options, I was able to log the JSON that went back and forth. I think SE team may also want to experiment with this, I had it setup pretty quickly.

Now I have been working on a python script that will take via the command line "goto def" or "find references" with a filename, line number and column number, translate it to the JSON, send to the server, get back the JSON, and print out the results so that I can put it into the build window (process buffer) of SlickEdit and use "next_error", "prev_error" to open the file at the line. This is still a work in progress. But it would be much nicer if SlickEdit had native support. I see that eclipse already has some support for Java Lsp from searches I've done online regarding my tool. And I've already used the VIM plugin for it.

So I think it could be worthwhile for SE team to look into Language Server Protocol and I think the development effort can be much smaller than working with clang itself. You can then let the users bring their own language servers to the table.
« Last Edit: September 10, 2018, 04:44:55 PM by rowbearto »

Dennis

  • Senior Community Member
  • Posts: 3960
  • Hero Points: 517
Re: find references and irrelevant results
« Reply #4 on: September 10, 2018, 05:14:10 PM »
@IkerIraz

Do you have "Find References Incrementally" enabled or disabled in  Tools > Options > Editor > Context Tagging(R)...

IkerAriz

  • Senior Community Member
  • Posts: 159
  • Hero Points: 6
Re: find references and irrelevant results
« Reply #5 on: September 10, 2018, 05:44:53 PM »
I've tried both enabling and disabling "find refs incrementally" and tested with a symbol name that is common in my source tree. When the option is enabled the search concludes very quickly with a long list of matching files in the output window most of which expand to "NO REFERENCES" when clicked. When disabled I instead get frenetic activity in the output window (files added, brief flashes of "NO REFERENCES", files removed) that ultimately settle down to SE's final set of vetted references.

rowbearto

  • Senior Community Member
  • Posts: 2335
  • Hero Points: 132
Re: find references and irrelevant results
« Reply #6 on: September 10, 2018, 05:51:31 PM »
Dennis:

Would it be possible to speed up the search by using a database key containing both "Turtle" and "size()" - "Turtle::size()"? Right now it seems you look up all database entries for "size()" and all database entries of "Turtle" and then search for the intersection (this is what takes up CPU time).

If your database key was "Turtle::size()" itself, wouldn't that speed things up and add more accuracy?

You could also have separate database keys for different function overloads, etc.?

The problem is probably way more complex than this and I admit I have not spent the amount of time and energy you have on this problem, so I'm probably underestimating this.

Rob

Dennis

  • Senior Community Member
  • Posts: 3960
  • Hero Points: 517
Re: find references and irrelevant results
« Reply #7 on: September 10, 2018, 06:09:46 PM »
// turtle.h
Code: [Select]
class Turtle {
public:
   int size();
};

// boxturtle.h
#include "turtle.h"
Code: [Select]
class BoxTurtle : public Turtle {
   public:
};

// evil.h
Code: [Select]
   typedef class BoxTurtle TypicalCppProgrammerDoesSomethingLikeThisForNoGoodReason;

// main.c
Code: [Select]
#include "evil.h"
void main(int args, char *argv[]) {
   TypicalCppProgrammerDoesSomethingLikeThisForNoGoodReason bt;
   printf("bt.size=%d\n", bt.size());
}

Unless you do full preprocessing with symbol analysis, you are not going to know that the reference to "size" in main.c has anything to do with "Turtle".  The identifier "Turtle" appears nowhere in main.c, not even as a substring, so even guesses are no good.

You actually have to calculate the set of all files that contain "Turtle" along with any file that contains a symbol somehow related to "Turtle", (and any file that contains symbols related to those symbols, etc.).  Then intersect that with the set of files that contain the identifier "size".  In terms of graph theory, it is any file containing "size" where there is a path of symbol pairs (X,Y) leading to "Turtle".  Problem is that symbols like "string" and "std" and "vector" can gang up to create this kind of path pretty easily.  It's like how almost every famous actor has some degree of separation from Kevin Bacon.  Or, no matter where you are, there's a turtle somewhere.

rowbearto

  • Senior Community Member
  • Posts: 2335
  • Hero Points: 132
Re: find references and irrelevant results
« Reply #8 on: September 10, 2018, 06:23:49 PM »
I see. That is a very complex problem indeed. It is a tradeoff you made to speed up tagging time significantly by removing the overhead of preprocessing at the expense of the search for common names like "size()", etc. It is why in the code that I write I strive to have unique names to speed up my searches in SE. But I can't control the code that other people write which vastly outnumbers my code. It is a good reason to have both the SE tagging for less common names and some clang based thing as a backup (at tradeoff of longer indexing time/less flexibility/more configuration) for the common name searching. cquery/ccls look promising for faster indexing, but even with tools like that they will not be as flexible as what SE has, for example if you forgot to include a .h file for a class while editing. Having both taggers could be real useful, and if SE supported the language server protocol it seems like a good way to minimize SE development time to gain this feature to complement SE's tagging.

IkerAriz

  • Senior Community Member
  • Posts: 159
  • Hero Points: 6
Re: find references and irrelevant results
« Reply #9 on: September 30, 2018, 01:17:23 PM »
Any word on whether the "calculate refs up front" option might make it into the next update? (*)

Thanks,
Iker

(*) As opposed to showing all potentially matching files up front and then calculating refs.

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 6862
  • Hero Points: 528
Re: find references and irrelevant results
« Reply #10 on: September 30, 2018, 05:36:42 PM »
Seems like a "Calculate All References Now" context menu item would be a better way to go. This would be safer than a global option for always calculating all references immediately. Although, both could be done.

Dennis

  • Senior Community Member
  • Posts: 3960
  • Hero Points: 517
Re: find references and irrelevant results
« Reply #11 on: October 02, 2018, 11:07:11 PM »
@IkerIraz

I am not sure what you want out of a "calculate all references up front" option.  Seems that this would be the same as turning off "find refs incrementally".  Is your issue merely with the fact that we up date the references window as results are calculated file-by-file?  As I said before, the expense of updating the window as we go along is negligible compared to the cost of calculating references.

IkerAriz

  • Senior Community Member
  • Posts: 159
  • Hero Points: 6
Re: find references and irrelevant results
« Reply #12 on: October 03, 2018, 01:44:27 PM »
Quote
Is your issue merely with the fact that we up date the references window as results are calculated file-by-file?

Yes :)

What I'd like out of a "calculate refs up front" option: add items to the references window only if they have references. This might take a little longer to produce hits but the results would be free of false positives and incur far less distracting visual noise.

Iker

Attached are two images that show the results of a search for a common symbol in my project with "find refs incrementally" enabled and disabled. In both cases the References window is completely swamped by entries that are of no use.
  • Enabled. References window is immediately populated with 640 files most of which ultimately have NO REFERENCES. Searching stops after window is populated. I don't find this mode useful.
  • Disabled. References window is immediately populated with the same 640 files (I took screenshot at this moment). SE continues searching and "churns" the items in the window as it works its way down the list. Ie, items without refs are deleted and the remaining entries shift "up", items with refs are expanded and the remaining entries shift "down". This process is visually very noisy and distracting as every time my eye catches activity in the window it's due to a false positive. Also, if I stop the search early the window remains littered with useless entries.

NB: I clicked on a few entries before taking screen shots to show that they had no references.

Dennis

  • Senior Community Member
  • Posts: 3960
  • Hero Points: 517
Re: find references and irrelevant results
« Reply #13 on: October 03, 2018, 04:08:12 PM »
I see where you are coming from.  Let's break things down a bit.

1) Find References Incrementally.   Everything you have said indicates to me that you want this option OFF.  So, let's not talk about it.  You know what, I don't like this setting either, but it exists because we have users who prefer it, because they want to search for references and get back to the editor as soon as possible.  Their philosophy is that they are going to be fed references one at a time when they hit next-ref (Ctrl+G), and they probably do not even look at the References tool window, so a bit of clutter means nothing.

2) ...NO REFERENCES.  In either mode, if you manually click on a file to expand it, and the file has no references, instead of deleting the file from the list, we tag it with "...NO REFERENCES", and yes, it doesn't go away, and that is a bit annoying.  The problem is we can't just delete it, it would be really, really confusing.  Think through the use-case, you click on "UtilityFunction.h" to expand it, there are no references, and the node just disappears.  You are left wondering what happened and unless you are really paying attention, you probably have already forgotten what file you clicked on, or maybe you didn't really click on anything at all.  The evidence is gone.  Maybe you wanted to specifically check if that file referenced your symbol.  So, we leave the breadcrumb behind "...NO REFERENCES" so that you can clearly see what happened.  We only do this when you manually click on a node to expand it.  In other cases, we delete file nodes after we try to expand them and they have no contents.  I am making a tweak to how it cleans itself up when you do a next-ref, so you'll see less "...NO REFERENCES" nodes in the case that you are searching incrementally, or you have interrupted the references search and you are effectively searching incrementally.

3) The long list of files.  So, you have one of two options here, you just wait for references to finish vetting the list of files, or you interrupt it early.  Note that when you interrupt references, you are effectively putting it in Incremental mode.  The same is true when References hits its max-found limit (Tools > Options > Editing > Context Tagging(R) > Maximums > Maximum items found in references search).  So what do we do then?  We have this list of remaining files that might have references, or might not.  You might know exactly where you want to look for references, so you can manually drill down on those files.  That is very convenient.  You say it is useless, but I think you are confusing what you know about your code with what SlickEdit has not had the opportunity to discover yet.  It can't know a file has no references until it does the calculation, you might know it because you know your code.

4) Calculate refs up front.  I can and will add this option.  This discussion has given me an idea of how this can be done without taking a step backwards.  It will require displaying a progress dialog, because it is really bad form to just have the editor sit there grinding for what could take minutes without a clear indication of a finish line and clear way to stop it.  Also, if you cancel, you will be left with files that are unvetted and unexpanded, just like before.  This option will be off by default, the existing system of calculating references in the background on a timer is superior, since you can keep working while the references get flushed out.

rowbearto

  • Senior Community Member
  • Posts: 2335
  • Hero Points: 132
Re: find references and irrelevant results
« Reply #14 on: October 03, 2018, 04:24:07 PM »
FYI:

I am now able to use an alternative cross reference tool (cquery) integrated with Slickedit using some private SlickC macros + python scripts that I developed over the past few weeks in my spare time. The macro/scripts provide the glue between SlickEdit and cquery and implement a language server protocol client.

It works really really well for these common symbol names! Getting the references back is super fast, practically instantaneous. It is also able to find the right version of a function overload (go to definition) when Slick can't! There is some upfront work to use it, one needs to update their build scripts to generate a "compile_commands.json" file that provides all the -I/-D options for building every source file (using an open source tool called compiledb). One also needs to build cquery (or find a pre-built binary). Building the "tag files" for cquery for about 5896 c++ source files (don't know how many .h files, but cquery does preprocessing) takes about 5 minutes on a very powerful machine I use, and cquery makes heavy use of multithreading on this powerful machine with 72 threads. I don't have numbers to compare with SlickEdit tagging on this same number of source files at the moment.

When I have more time later I will benchmark the performance of building the tags using cquery vs slickedit on my powerful machine with 72 threads and my development VM with 4 threads. cquery is not perfect and has a few bugs, but it is very very good. Once the tags are built, getting results of find references, goto definition, autocomplete is practically instantaneous! Additionally, every time I save a source file I get the very helpful clang diagnostics back which has improved my productivity significantly!

If I was able to develop this capability in about 3 weeks in spare time (perhaps I did 5 full time days of work), I think SlickTeam could do implement a language server protocol client with minimal development time and allow users to "bring their own" LSP servers like "cquery" as a complement to SlickEdit's tagging engine to use for these common symbols for next year's release.
« Last Edit: October 03, 2018, 05:24:14 PM by rowbearto »