Author Topic: Scaling the user experience for large projects  (Read 14057 times)

chrisant

  • Senior Community Member
  • Posts: 1410
  • Hero Points: 131
Scaling the user experience for large projects
« on: March 25, 2009, 08:24:41 pm »
This is primarily @SlickTeam, but I'm posting publicly for an open discussion.

A recent post mentioned collecting data on the sizes of projects that customers have.  That got me thinking:

Background - Size of my primary workspace
I develop on Windows.
My primary workspace has over 37,000 source files, in 10 projects.
The tag file is 317MB (with references, assuming SE14 fixed "Generate references" being interpreted backwards).
The .vpj files total 1.58MB.
(By the way, 317MB is impressively small -- by contrast exuberant ctags generates ~500MB without references, and MSVC BSC support generates ~750MB with references).

Four Three Two One Primary Scale Problem
In the workspace mentioned above, I run into four three two one key scalability issue:
1.  Starting SE14 is faster than starting SE13, but it's still rather slow (usually under 15 seconds) if the disk cache isn't primed with the tag file. -- SE14.0.2 solves this
2.  The smart_open dialog is great, but in a workspace with 37,000 files it has noticeable delays (up to 2 seconds, but usually about .5 seconds) while filtering. -- SE14.0.1 solves this
3.  Finding references is painful:
  • When "Find references incrementally" is FALSE then the editor locks up for a while on more involved scans.
  • When "Find references incrementally" is TRUE then there is frequently a big set of potential matches.  While Ctrl+L (my button to find next) finds the next reference successfully, it suffers the same lock up issue as above (on a smaller scale, naturally), and it forces me to examine references in sequential order to avoid false positives.
4.  Building the auto-complete list (and other similar collection activities) can take excessive amounts of time. -- SE14 solves this.

Thoughts on Tackling the Scale Issues
Re: #1 -- [snipped]  Another topic here about the startup speed.  SlickTeam has made the tag cache populate in the background during startup, so the editor returns control to the user almost immediately on startup.
Re: #2 -- [snipped] SlickTeam did a lot of work on the smart-open toolbar in 14.0.1 and the performance is vastly improved; this is solved.
Re: #3 -- I really want the freedom to (a) examine references in any order, and (b) get a sense of where the symbol is used and how broadly it is used.  So really, "Find references incrementally" doesn't work very well for my needs at all.  In fact I'm about to turn it back off and live with the performance issues -- but there is an important selling point here:  I try to talk up SlickEdit, but invariably when a co-worker sees SlickEdit's clunky (incremental) or slow (non-incremental) method for finding references, they lose interest.  To be clear, I'm not knocking SlickEdit -- this is a tough challenge to crack -- I'm just saying this is something that turns off my co-workers and loses potential sales; it may or may not be a selling point to other audiences.  But I love challenges, and I have three recommendations for how the user experience might be improved when finding references.
Re: #4 -- [snipped] SlickTeam implemented my suggestion in SE14; this is solved.

Improvements for Finding References
To my mind, there are three goals:  show progress, allow early termination, and return control to the user quickly.
I think all three are within reach:
1.  Show some kind of progress.  A cheap and simple solution would be to show "scanning potential reference X of Y" in the status bar.  A more elegant solution would be to show a dialog with a progress bar.
2.  Make the scan interruptible.  Right now the scan is done by the main process.  I shared some thoughts on high performance interrupting in CRM case number 08792789-7B85-48B2-9E35-F1DF8A95C3E6.  I can elaborate further in private mail if that would be helpful, just let me know.
3.  Find references in the background.  I think the "Find references incrementally" option can be removed completely:  in a nutshell, model it after the background find-in-files feature.  Find references in the current file and add them to the list, then return control to the user, continue to find references in the background and add them to the list only once they are confirmed.  Have a toolbar button to cancel the background scan.  Show progress in the status bar.
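A rough sketch of that flow (Python standing in for Slick-C here, since it's easier to sketch; `confirm` and `publish` are made-up stand-ins for the real match-verification code and the References toolbar, not actual SlickEdit APIs).  The caller would first confirm the current file's hits synchronously, then hand the rest to something like this:

```python
import queue
import threading

class BackgroundReferenceScan:
    """Hypothetical sketch: confirm remaining candidate references off the
    UI thread, modeled after background find-in-files."""

    def __init__(self, candidates, confirm, publish):
        self._pending = queue.Queue()
        for hit in candidates:
            self._pending.put(hit)
        self._confirm = confirm        # candidate -> bool: is it a real reference?
        self._publish = publish        # called with each confirmed reference
        self._cancel = threading.Event()
        self._total = self._pending.qsize()
        self._done = 0

    def start(self):
        # Run the confirmation pass off the UI thread, like background
        # find-in-files; the UI stays responsive while hits trickle in.
        threading.Thread(target=self.drain, daemon=True).start()

    def cancel(self):
        # Wired to the proposed toolbar button for cancelling the scan.
        self._cancel.set()

    def progress(self):
        # For the status bar: (candidates examined so far, total candidates).
        return (self._done, self._total)

    def drain(self):
        while not self._cancel.is_set():
            try:
                hit = self._pending.get_nowait()
            except queue.Empty:
                break
            if self._confirm(hit):
                self._publish(hit)
            self._done += 1
```

Only confirmed references ever reach the list, so the user never sees false positives, and `cancel()`/`progress()` cover the toolbar button and status bar pieces.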
« Last Edit: July 06, 2009, 07:00:46 pm by chrisant »

chrisant

  • Senior Community Member
  • Posts: 1410
  • Hero Points: 131
Re: Scaling the user experience for large projects
« Reply #1 on: March 25, 2009, 08:43:45 pm »
I realized I hadn't checked to see if SE14 included the auto-complete list timeout.
And it does!  (See attached image, or look in Tools|Options|Editing|Context Tagging).
Sweet, thanks SlickTeam!  ;D

ScottW, VP of Dev

  • Senior Community Member
  • Posts: 1471
  • Hero Points: 64
Re: Scaling the user experience for large projects
« Reply #2 on: March 25, 2009, 08:58:13 pm »
Thanks for the feedback. I've flagged this to be fully digested later. In the meantime, you should take a look at the User Guide. We added a new section on Performance Tuning in the Introduction. In particular, with such a large workspace, you will likely need to adjust the "Tag file cache size".

At first glance, some of the performance items are due to the time needed to read things from disk. Our current implementation doesn't allow us to do some of these as background items, but that is an area we are working to address.

I always appreciate suggestions on how we can make SlickEdit better. If you don't want to post them here, you can reach me at swestfall@slickedit.com.   

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 6468
  • Hero Points: 504
Re: Scaling the user experience for large projects
« Reply #3 on: March 25, 2009, 09:11:02 pm »
Thanks Chris, if you ever think you need a new job, send your resume to me ;D You obviously know what you are doing, and we appreciate your thoroughness.

No one is more frustrated than me that the performance of SlickEdit's tagging features doesn't scale enough!!!! It's a long haul for us to get to where we need to be. The first step is to thread basic tagging. Once we have this we can rebuild tag files (like the workspace tag file) on a thread and never make the users wait for that. There's a lot more though but this is the most significant start. After that, we can extend the threading to things like references, context tagging lists/info, and symbol coloring. Code changes for threading tagging are in my plan!

I tested the smart open dialog for scalability.  My machine is slow AND I always run a debug editor, so for me it really doesn't scale.  We already did some performance enhancements to it (it was a lot slower) but it could use some more.  I suggest turning off the "Sync current directory" option so you don't have to wait when the current directory changes. Right now, we have a pretty long list of small enhancements for smart open.  After that, hopefully we can figure out a way to make it handle larger workspaces.

chrisant

  • Senior Community Member
  • Posts: 1410
  • Hero Points: 131
Re: Scaling the user experience for large projects
« Reply #4 on: March 25, 2009, 10:46:19 pm »
Why thanks.  :)

Very interesting to hear you're moving towards threading.  I've been withholding suggestions that involved threading, out of respect for the challenges of doing multi-platform threading work.  In the future, I'll be less reserved about sharing them.

Re: scaling in general -- Ye cannae change the laws of physics, captain.  (Though one can sometimes trick them into submission.)  And while I may be pushing the envelope of SE's scalability, SE's envelope is bigger than the competition's.  I can push the envelope, and while some things may be a little on the slow side, SE does not slip into madness.  Let's compare with Source Insight for a moment:  co-workers who use Source Insight use 3-5 workspaces to achieve decent performance (imagine trying to find references split across 4 workspaces...).  I use SlickEdit and use 1 workspace.  Source Insight has an annoying proclivity to freeze while painting windows, due to performing tag lookups while painting.  SE14 impressed me by making symbol coloring an interruptible operation that effectively sets attributes which are efficiently tested at render time, rather than performing potentially-costly lookups during render.  Kudos.  And I suspect SE has some interesting data structures to keep track of where coloring is applied (I thought this even before symbol coloring), because rendering seems to scale very favorably even when very many highlights/colors are being applied (a la _StreamMarkerAdd).

Re: threaded tagging -- FWIW, I don't retag my workspace very often (er, actually this week I've been retagging it several times a day as I beef up my usercpp.h file to help resolve unknown symbols that symbol coloring has brought to my attention, but that's not typical behavior for me).  The 3 issues I listed really are the biggest scalability issues that I face when using SE with a ginormous workspace.  The 4th scale/responsiveness issue on my list would be threaded tagging -- generating tag files isn't a common operation (for me at least), so I'd prioritize the other 3 higher (for my usage patterns at least).

Re: finding references -- It may be possible to find references during idle cycles (in Windows terms, when the message queue is empty) by using an interruptible/resumable state machine to perform tiny slices of work at a time.  That would keep the editor responsive not only to user input, but also system events (window messages, etc).  Such a state machine might be feasible cheaply here because of the highly constrained nature of looking up references (or it might be unnecessary if the work for confirming one match is cheap to begin with).  Well, it might take a hit when opening a file, too.  Although, hm, depending on how isolated the "read file into memory" code is, it might be possible to cheaply/safely thread that and signal the main thread when the data structures are ready to link into the real list of open files.  At that point even opening a file wouldn't interfere with responsiveness.  8)
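To make the state-machine idea concrete, here's a toy sketch (Python rather than Slick-C, and `confirm`/`publish` are invented placeholders for the real confirmation and toolbar code): a resumable scan that does a tiny slice of work per idle callback and hands control back once a time budget is spent.

```python
import time

def reference_scan(candidates, confirm, publish):
    # Resumable state machine: a generator that confirms one candidate
    # per step, so the idle handler can pause and resume it at will.
    for i, hit in enumerate(candidates, 1):
        if confirm(hit):
            publish(hit)
        yield i  # hand control back after each unit of work

def on_idle(scan, budget_ms=5):
    # Idle handler: run slices of the scan until the time budget is
    # spent, then return to the message loop; re-posted until done.
    deadline = time.monotonic() + budget_ms / 1000.0
    for _ in scan:
        if time.monotonic() >= deadline:
            return False  # work remains; run again on the next idle
    return True           # scan complete
```

Because the generator keeps its own position, the editor can interleave arbitrary user input and system events between slices without any bookkeeping in the caller.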

Re: smart open -- I really like the "Sync current directory" option, and at least in my enhOpen() I experience a negligible delay from it.  Smart open is an interesting example, since at least on Windows the user community can provide our own scalability solution(s) for that.  That's the aim of my EnhProj macro.  For contrast, I can't see how I could implement background finding references without complex changes to stock macros, which wouldn't be very sharable.

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 6468
  • Hero Points: 504
Re: Scaling the user experience for large projects
« Reply #5 on: March 26, 2009, 02:54:35 pm »
Interesting story here. A long while ago we had talks with Oracle because they were interested in OEMing SlickEdit. We told them we didn't have Unicode support.  At the time, it didn't seem like a big deal. They didn't ask us if we could add it. They ended up not OEMing SlickEdit, and they posted somewhere that they couldn't use SlickEdit because it would never have Unicode support. Hah!

In terms of staging threading, we don't really have much choice. It has to be done with the inner pieces first. The innermost pieces are roughly file I/O, regex, misc lib routines, parsing, and tagging. If we want to ship any features, we'll need to do something with threaded tagging. Automatically retagging the workspace on a thread is an obviously nice feature to have, especially if you have a wildcard project. I hear you though, this is a nice-to-have, since you can work around this one with a small amount of manual intervention. Threading references will be quite the challenge but would be very valuable. Features like references and symbol coloring require analyzing the symbol at its actual file location to get all the context information.  A lot of this is currently done in Slick-C. My thoughts have been to move that code out of Slick-C, but Dennis thinks we might be able to thread Slick-C. Because Slick-C is a copy-by-value language and not a reference language (like Java, Python, or C#), this is doable. In any case, we have options. None are trivial but it's doable.

As for Smart Open, your EnhProj macro proves that our current implementation of Smart Open isn't optimized enough. Thanks to you, I can hand this to a developer and say, "make it this fast!". Hopefully, we'll have something by 14.0.1  ;D

mako

  • Community Member
  • Posts: 28
  • Hero Points: 1
Re: Scaling the user experience for large projects
« Reply #6 on: March 26, 2009, 03:09:37 pm »
I'm not sure if this is relevant to the performance of finding references, but it seems like it might be, given that a significant part of the process seems to be filtering the list of possible matches down to the ones that really fit the desired referenced symbol's fully qualified description.

I have noticed that when I do a definition look up for a symbol in, for example, C++ code, I get results of matching symbol names from other C++ namespaces and even other languages (Java, XML, and Perl in my case). In my case, identical symbol names can occur because 1) the symbol name is very common, e.g. a setState() method, or 2) similar code for multiple languages (C++, Java, Perl) is generated from XML metadata.

While occasionally it might be useful to find matching symbols in other languages or namespaces, for me (and I think in general) this is going to be the rare exception, not the preferred behavior. I know very little about what is in the tag database, or how it is built, but it seems like including the language that the tag is found in should be part of the tag identification, and that when searching for a tag reference or definition, the search should be confined to the language of the desired tag.

A similar argument can be made for identical symbols in different namespaces. If I'm searching for std::string, I don't want to find foo::string, or references to it, but I hit that problem all the time. Our project makes extensive use of C++ namespaces, as we mostly follow the convention that Java enforces, where a class is in a namespace defined by its location in the directory tree. We therefore feel free to reuse common names for classes and functions knowing that the namespace qualification will keep them distinct. However, it doesn't seem like SE's tagging uses the fully qualified name of a symbol when doing its searches, at least not consistently.

It seems to me that if the language and fully qualified name of a symbol were used for the indexing of the tag database, then when looking up definitions and references the search would be much more restricted, and the need for filtering of the results would be reduced.
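A toy sketch of the kind of composite key I mean (a hypothetical structure, not how SE's tag database actually works): index tags by (language, fully qualified name), so a scoped lookup never even sees same-named symbols from other languages or namespaces.

```python
from collections import defaultdict

class TagIndex:
    # Hypothetical tag index keyed by (language, fully qualified name);
    # purely illustrative, not SlickEdit's actual tag database layout.
    def __init__(self):
        self._index = defaultdict(list)  # (lang, qualified name) -> locations

    def add(self, language, qualified_name, location):
        self._index[(language, qualified_name)].append(location)

    def find(self, language, qualified_name):
        # Scoped lookup: same-named symbols in other languages or
        # namespaces are never candidates, so no filtering is needed.
        return self._index.get((language, qualified_name), [])
```

With entries for both std::string and foo::string, a lookup keyed on ("cpp", "std::string") returns only the former, which is exactly the restriction I'm after.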

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 6468
  • Hero Points: 504
Re: Scaling the user experience for large projects
« Reply #7 on: March 26, 2009, 03:48:51 pm »
Filtering based on the language type wouldn't be hard to do.  Your second issue, only listing matches which really matter, requires a very different implementation. Since SlickEdit does not compile your source when it builds the tag file (i.e. no includes are processed), it has no idea what the context of a symbol is.  All it knows is what files contain instances of what words.  When you request references, it knows what files have the word specified, and it checks what the symbol actually is.  The best solution for solving the references performance problem will be threading the results in the background. Parsing like a full compiler so that all symbols are known at parse time isn't practical, especially for C++.  Keep in mind that when you write A::B, A might not be fully qualified, and it might be inside yet another namespace.

mako

  • Community Member
  • Posts: 28
  • Hero Points: 1
Re: Scaling the user experience for large projects
« Reply #8 on: March 26, 2009, 04:05:19 pm »
Quote
Your second issue with only listing matches which really matter requires a very different implementation. Since SlickEdit does not compile your source when it builds the tag file (i.e. no includes are processed), it has no idea what the context of a symbol is.  All it knows is what files contain instances of what words.  When you request references, then it knows what files have the word specified and it checks what the symbol actually is.

I figured that getting the full context for C++ symbols was part of the problem. However, it does seem to me that when I use the push-tag function (Ctrl-. for me) to look up a symbol definition, the range of choices presented can vary quite widely. For example, if the cursor is on a reference to a class method name, sometimes I get only the prototype and definition of that particular method, sometimes I get that method and its base and/or subclass definitions, and sometimes I get its definition and a bunch of unrelated symbols that happen to have the same name. Can you give us some insight as to what factors are involved in filtering the list of choices presented, and how we can control them to give more usefully filtered results?


chrisant

  • Senior Community Member
  • Posts: 1410
  • Hero Points: 131
Re: Scaling the user experience for large projects
« Reply #9 on: March 26, 2009, 07:09:13 pm »
In terms of staging threading, we don't really have much choice. It has to be done with the inner pieces first. The inner most pieces are roughly file I/O, Regex, misc lib routines, parsing, and tagging. If we want to ship any features, we'll need to do something with threaded tagging. Automatically retagging the workspace on a thread is an obvious nice feature to have especially if you have a wildcard project. I hear you though, this is nice to have since you can work around this one with a small amount of manual intervention.
Understood, makes perfect sense.
Re: wildcard projects, that is another scalability challenge, and mikesart and I have attempted to address it with the sync_project_with_scm macro (forum link).  Threading wildcard scans might address that; I haven't thought about it in much depth yet.

Threading references will be quite the challange but would be very valuable. Features like references and symbol coloring require analyzing the symbol at its actual file location to get all the context information.  A lot of this is currently done in Slick-C.
Threading seems an appropriate long term direction.  In the meantime, finding references can still show progress and avoid blocking the user, without using threading.  I implemented background find in files in another editor without using other threads or processes, and here is an attempt to translate that work into the context of finding references:

1.  User presses Ctrl+/ on a symbol.
2.  Collect potential hits from the tagfile, keep them in an array or whatever for processing.
3.  Process hits from the current file.
4.  Use a timer or _post_call to defer the rest of the processing.
>>  User regains control, and can interact with the editor and with the References toolbar and the references that it lists so far.
5.  On idle, the deferred call gets performed.
6.  If the editor has been idle for less than 100ms, return immediately and wait for the next timer event or use _post_call again, etc -- there are details here to work out, but that's the gist.
7.  In the deferred call:
8.  Show progress on the status line:  "Processing potential hits, on X of Y".
9.  Process a small number (1? 5? 10? configurable?) of potential hits (but no more than 1 file's worth per call, because opening a file can be relatively expensive).
10.  When a hit is confirmed, add it to the list in the References toolbar.

If I find the time, I may work on coding this up myself and send in a patch for the SlickTeam to polish up (especially places where I may have made overly narrow assumptions :)).
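In rough code form, steps 2-10 boil down to something like this (Python standing in for Slick-C; `post_call` mimics _post_call/a timer, and `ui` is a made-up stand-in for the References toolbar and status line, not real APIs):

```python
from collections import deque

CHUNK = 5  # candidates confirmed per deferred call (step 9)

def start_find_references(candidates, confirm, ui, post_call):
    # Hypothetical driver: process a chunk of potential hits per deferred
    # call, report progress, and hand control back between chunks.
    pending = deque(candidates)          # step 2: collected potential hits
    total = len(pending)

    def step():
        for _ in range(min(CHUNK, len(pending))):
            hit = pending.popleft()
            if confirm(hit):
                ui.add_reference(hit)    # step 10: confirmed hits only
        done = total - len(pending)
        ui.status(f"Processing potential hits, on {done} of {total}")  # step 8
        if pending:
            post_call(step)              # steps 4-7: defer the rest

    post_call(step)                      # user regains control immediately
```

The chunk size (and the "no more than one file's worth per call" rule) is where the tuning knobs from step 9 would live; I've hard-coded 5 just to show the shape.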

My thoughts have been to move that code out of Slick-C but Dennis thinks we might be able to thread Slick-C. Because Slick-C is a copy by value language and not a reference language like (java, python, C#), this is doable. In any case, we have options. None are trivial but it's doable.
Oh very interesting indeed.  Should be fun to see where the editor stands in another 5 years.

As for Smart Open, your EnhProj macro proves that our current implementation of Smart Open isn't optimized enough. Thanks to you, I can hand this to a developer and say, "make it this fast!". Hopefully, we'll have something by 14.0.1  ;D
Cool.  8)  In case it wasn't noticed, EnhProj.zip includes full native code source for the DLL.  If a developer has any questions about the macro or code, feel free to contact me at my private email (if you can't get it from the forum member info, Shannon/Mike/Ashley in support know it).

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 6468
  • Hero Points: 504
Re: Scaling the user experience for large projects
« Reply #10 on: March 27, 2009, 02:23:17 pm »
Quote

Can you give us some insight as to what factors are involved in filtering the list of choices presented, and how we can control them to give more usefully filtered results?


Dennis knows the details for this.  I mainly use it and complain to Dennis if something doesn't work the way I like  ;D

Chris, your EnhProj code is totally native Win32. That surprised me. I was expecting at least the dialog to be Slick-C.  In any case, our goal is to match its performance. After a lot of performance analysis yesterday, I came up with some algorithms that are a mix of C++ and Slick-C that will give us the performance we need. In the process, we sped up inserting items into the tree by about 60%. At this point, I expect to have some rockin' performance for the Open tool window of 14.0.1.  Eventually, we need to use this same code for the Files tool window, which also needs to be faster.

Chris, your idea for references on a timer/post call is very worthwhile.  If you get something that works well enough for you, I'll make sure we polish it up here. Thanks for any help here!

chrisant

  • Senior Community Member
  • Posts: 1410
  • Hero Points: 131
Re: Scaling the user experience for large projects
« Reply #11 on: March 27, 2009, 10:36:23 pm »
Chris, your EnhProj code is totally native Win32. That surprised me. I was expecting at least the dialog to be Slick-C.
Yeah, Slick-C is much more convenient for rapid UI development than native code is; I really wanted the dialog to be Slick-C.  But I couldn't figure out how to get the list to populate quickly enough, nor how to filter quickly enough without resorting to native code (nor how to get the file system icons to show up via a background thread like my dialog does).  Performance was my main goal, and I was willing to sacrifice some UI consistency/elegance to get it.  :P
By the way, I like how the icons in the Smart Open toolbar differentiate between curdir, project, workspace, history, and open state.  The reason my dialog shows the real file system icons is because, well, I already had threaded code lying around to do that, and it was quick to connect, so why not add it -- i.e. I'm not suggesting that SE should show the native file type icons.

In any case, our goal is to match its performance.
I just posted a new version early this morning, which gains a big speed boost by taking advantage of special knowledge about the style of relative paths for files in a project.  Fully qualifying the path names was taking a huge amount of time.  I stepped through the assembly for vsAbsoluteTo() and was a little surprised at how much work it was doing (though if I stepped through the assembly for my custom path-qualifying code, maybe I'd be surprised how much work it's doing as well ;)).  Anyway, the point being that if you haven't done work to speed that part up yet, it might be some low-hanging fruit that could get another chunk of speed boost.
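To illustrate the kind of shortcut I mean (a toy Python sketch under my assumptions, not the vsAbsoluteTo() implementation): when project entries are plain relative paths under one root, a cached join-and-normalize avoids re-deriving the same absolute path over and over.

```python
import os
from functools import lru_cache

@lru_cache(maxsize=None)
def absolutize(root, rel):
    # Fast path for the common "simple relative path under the project
    # root" case; a general-purpose routine like vsAbsoluteTo() must
    # also handle '..' chains, drive letters, UNC paths, and so on,
    # which is presumably where the extra work goes.
    return os.path.normpath(os.path.join(root, rel))
```

Repeated lookups for the same (root, relative) pair become dictionary hits instead of path computations, which is the trade my version makes.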


After a lot of performance analysis yesterday, I came up with some algorithms that are a mix of C++ and Slick-C that will give us the performance we need. In the process, we sped inserting items into the tree by about 60%. At this point, I expect to have some rockin performance for the Open tool window of 14.0.1.  Eventually, we need to use this same code for the Files tool window which also needs to be faster.
Wow, hot!  :o  I'm eager to alpha-test and give feedback, when you've got it into a state where you're ready for feedback.

Chris, your idea for references on a timer/post call is very worthwhile.  If you get something that works well enough for you, I'll make sure we polish it up here. Thanks for any help here!
Cool, I'll see what I can whip up.

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 6468
  • Hero Points: 504
Re: Scaling the user experience for large projects
« Reply #12 on: March 28, 2009, 12:05:51 am »
Good thing you put all your code in Win32, because there's no way with the 14.0.0 tree to get the speed you need. I'll have to check out your latest code to see how fast it is.  I won't need to worry much about the absolute function.  V14 has a project file list cache which returns absolute filenames.  This should be good enough.  When the list is in the cache, it's so fast I can't time it for 40,000 files without coding up some millisecond timings.

chrisant

  • Senior Community Member
  • Posts: 1410
  • Hero Points: 131
Re: Scaling the user experience for large projects
« Reply #13 on: May 07, 2009, 07:45:19 am »
Chris, your idea for references on a timer/post call is very worthwhile.  If you get something that works well enough for you, I'll make sure we polish it up here. Thanks for any help here!
Cool, I'll see what I can whip up.

Update:  I haven't forgotten this.  It turns out to be more work than I'd hoped, because the code for expanding the hits for a file actually lives in the "expand node" handler for the References tree view (one node per file).  At least for prototyping purposes I want to keep my changes largely separate from the stock macro code, so given the current implementation details it's best if I have a completely separate implementation for populating an array with the files to be scanned, instead of populating the tree view.  The pseudo-asynchronous population and scanning have to be completely separate from the current implementation, but the UI code should need hardly any changes.  So it's taking longer than I'd hoped, but I'm still at it (sporadically, as my elusive free time permits).

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 6468
  • Hero Points: 504
Re: Scaling the user experience for large projects
« Reply #14 on: May 07, 2009, 08:33:38 pm »
We reimplemented the open tool window for 14.0.1 to improve the performance. It's way faster than 14.0. It will be available very soon. It's not quite as fast as what Chris has done but it is very polished and the speed is impressive. I did all my testing on 40,000 files with a slow machine.