Author Topic: Performance analysis of launching VS.EXE  (Read 13299 times)

chrisant

  • Senior Community Member
  • Posts: 1410
  • Hero Points: 131
Performance analysis of launching VS.EXE
« on: April 10, 2009, 08:50:27 am »
Machine info
x64 Windows 2008
2.4GHz Intel Core2 Quad
4GB RAM
500GB striped RAID array, 2x250GB 7200RPM disks
SE info
x86 SE 14.0.0.7 hotfix revision 4
Tag cache size = 400,000
Workspace info
322MB workspace vtg file
180MB compiler vtg file
10 projects in the workspace
36,440 total files in the workspace
9 open files in the workspace

Launch timings
I used procmon and filtered to include only events for the vs.exe process:
  • time 11:52:26.373 -- launched vs.exe, miscellaneous stuff happens
  • time 11:52:26.807 -- started reading from vslick.sta (total size is ~9,300,000) in chunks of 15,000 bytes
  • time 11:52:26.851 -- finished reading vslick.sta (50ms, nice), goes back to miscellaneous stuff including instantiating some fonts and reading formatschemes etc
  • time 11:52:27.075 -- started restoring the open files and reading them from disk
  • time 11:52:27.350 -- started reading from $slk.0, jumping around a lot, highest offset was ~4,330,000
  • time 11:52:27.366 -- miscellaneous
  • time 11:52:27.371 -- writes 8k to the vpwhist file at offset 0, then loops:  reads 8k at offset 8k, writes 8k at offset 8k, looping like that for the rest of the file -- seems odd to both read and write all 1,089,797 bytes of the vpwhist file (but not read the first 8k) on launch, but it only took 15ms
  • time 11:52:27.386 -- miscellaneous
  • time 11:52:27.446 -- reads ~200MB of the workspace vtg file, in sequential order (total size is 322MB)
  • time 11:52:35.074 -- miscellaneous stuff with vrestore.slk, vslick.sta
  • time 11:52:35.098 -- read all of c:\slick\syscpp.h (a little over 1MB)
  • time 11:52:35.220 -- miscellaneous
  • time 11:52:35.326 -- reads ~50MB from the compiler vtg file, in sequential order (total size is 180MB)
  • time 11:52:38.960 -- oops :-[ I accidentally ended the profile in the middle of that, and also I didn't capture the total clock time it took to launch and become responsive.  I'm embarassed, but I didn't realize until I got to this point in writing up my findings, and I don't feel like redoing it all -- there is some good data here despite being truncated.
Analysis
Of the portion of the launch time that I captured:
  • 11.262 seconds was spent reading tag files
  • 1.325 seconds were spent on other startup stuff
  • 15ms were spent in an odd looking loop that reads and writes the vpwhist file on launch
  • the vtg btree files are read in sequential order, which I suppose is as good an order as any for seeding part of a btree into a cache
  • some of the file IO in the profile wasn't apparent why it was happening; for example reading c:\slick\syscpp.h on launch, various other files

The vast majority of the slow startup time for me is spent reading the tag files.  It's true that the user is going to have to pay an initial performance penalty on any chunk of tagfile data that is read from the disk.  And after forcibly dumping the disk cache to get accurate physical disk IO measurements, I timed a tight loop reading 180MB from disk and it was around 10sec.  So the 11.252 seconds seems like a good IO speed, i.e. the editor doesn't seem to be losing appreciable time to overhead.  But the time the user has to wait before getting control is not good.

Interesting story, might be relevant
In the editor I used prior to SE, I wrote the tagfile support and project support myself as external addon macros.  I used native DLL code for both, some of which I reused in the EnhProj stuff that's been discussed recently.  The tagfile stuff used a "tags" file generated by Exuberant C-Tags, and the tagfile was around 400MB.  Loading it into memory on the first tag lookup was a painful wait, as you'd expect (it was a sorted array, not a btree, so I had to pre-load the whole thing, I couldn't demand-load pages as they got visited).
So on launch I spun up a background thread to load the 400MB tagfile.  The editor was instantly responsive, and by the time I got around to doing any tag lookups the whole array had been loaded into memory in the background.

A request for the SlickTeam
For the 14.0.1 update, can the SlickTeam please consider deferring the initial cache population so it happens during idle cycles on the main thread or perhaps put it on a background thread?

SlickEdit's tagfiles are btrees, which makes them ideally suited to on-demand caching.  Spin up an idle task that does the initial cache population during idle cycles after the editor gives input control back to the user.  Threading isn't necessary:  the read chunk size is very small so the editor would be highly responsive if in its message loop (or equivalent on non-Windows OS) the editor read one chunk and used PeekMessage (or MsgWaitForMultipleObjects) to check if there are any messages to process, else read another chunk, etc and loop.

Or if you're looking for a first place to add threading then the tagfile cache might be a great candidate, especially if you plan to thread the tag APIs soon.  It should be quick and simple to make the cache threadsafe (assuming it is effectively just an app-managed disk cache containing the raw file bytes) and it would make it easy to populate the cache in the background on launch.

It "feels" like there might be a significant launch time improvement within easy reach.  Rough figures suggest my launch time (with a huge workspace) might be able to drop from 15+ seconds to possibly as little as 1-1.5 seconds.

Can the SlickTeam look into this and see if an improvement here is feasible in the 14.0.1 timeframe?  I realize I'm being slightly pushy, but I hope you can see it from my shoes and see it as a compliment:  you guys are whittling away at the scale issues and you're very close to turning a corner where your scalability curve is going to be very favorable.  I'm just excited about the improvements that are happening, and eager to see them as soon as your schedule allows.  :)

Thanks for reading the wall of text!
C
« Last Edit: May 13, 2009, 05:07:15 pm by chrisant »

_fragment_

  • Community Member
  • Posts: 49
  • Hero Points: 1
Re: Performance analysis of launching VS.EXE
« Reply #1 on: April 10, 2009, 08:32:34 pm »
  • 150ms were spent in an odd looking loop that reads and writes the vpwhist file on launch

Isn't .386 - .371 = .015 or 15 ms, not 150 ms?

Dave

chrisant

  • Senior Community Member
  • Posts: 1410
  • Hero Points: 131
Re: Performance analysis of launching VS.EXE
« Reply #2 on: April 10, 2009, 09:53:01 pm »
Isn't .386 - .371 = .015 or 15 ms, not 150 ms?
I should mention, I'm bad at math...  :'(

chrisant

  • Senior Community Member
  • Posts: 1410
  • Hero Points: 131
Re: Performance analysis of launching VS.EXE
« Reply #3 on: April 10, 2009, 10:09:59 pm »
Side note:  the 180MB for the compiler tag file seemed large to me, so I dug into that.  It turned out I had more include paths than necessary in there.  I scoped the paths more tightly and reduced it to 37MB.  The workspace tag file is still correctly 322MB.  (And since I'd accidentally cut off the measurement at 50MB of the compiler tag file, the actual startup time is now very close to the incomplete measured time above).

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 6468
  • Hero Points: 504
Re: Performance analysis of launching VS.EXE
« Reply #4 on: April 13, 2009, 03:55:11 pm »
It turns out that on demand loading of the BTree gives worse problems. What happens is that as you type, a sychronous context tagging operation suddenly reads boat loads of disk data.  Almost any operation requires reading way more data than you would think.  Worse yet, the reads are random and not sequential so it takes even longer.  The only solution is to put more tagging tasks in the background. Unfortunately, this is not a quick fix.

chrisant

  • Senior Community Member
  • Posts: 1410
  • Hero Points: 131
Re: Performance analysis of launching VS.EXE
« Reply #5 on: April 13, 2009, 10:46:58 pm »
It turns out that on demand loading of the BTree gives worse problems. What happens is that as you type, a sychronous context tagging operation suddenly reads boat loads of disk data.  Almost any operation requires reading way more data than you would think.  Worse yet, the reads are random and not sequential so it takes even longer.  The only solution is to put more tagging tasks in the background. Unfortunately, this is not a quick fix.
Thanks, Clark.

I wasn't suggesting to let the tag cache populate purely on demand.  I agree that would exhibit poor performance while editing.
Just in case I explained my proposal too muddily, I'll try to clarify:

Current behavior:
  • SE reads most of the tag file on startup, and does not give control to the user until the tag file has been read.  This takes 15-30 seconds for me.

Proposal:
  • When SE starts, return control to the user before reading the tag file.
  • Start a state machine that will read the appropriate number of chunks into the tag file cache, and read one chunk per idle in the message loop (idle = PeekMessage said there was no message available).


The proposal gives control to the user quickly, so that SE starts quickly.

It is true that if the user types stuff before the tag cache has finished being populated, then tag lookups during that time will be slower than if the cache was populated.  But, the cost is guaranteed to be less than waiting for the cache to be fully populated.
It is also true that syntax coloring might do a bunch of tag lookups before the tag cache has finished being populated.  Again, the cost of those lookups is guaranteed to be less than the cost of waiting for cache to be fully populated.


As you said, I am probably underestimating how heavily SE uses the tag file on launch.  I'm doing some tests with the tag cache size set very small and the OS disk cache initially not having the tag file cached, to better understand the impact.

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 6468
  • Hero Points: 504
Re: Performance analysis of launching VS.EXE
« Reply #6 on: April 14, 2009, 07:56:52 pm »
The reason we made the change to pre-fill the tag cache is due to user complaints about lags which occurred while typing. Even I had problems where suddenly the editor would hang up for a while. Whether it takes longer or not really depends on a lot of things but I'm certain there are cases where it does take longer.  I suspect with the new symbol coloring, the problem would occur on screen refresh.  pre-filling the tag cache on a timer might help if symbol coloring were off. 

chrisant

  • Senior Community Member
  • Posts: 1410
  • Hero Points: 131
Re: Performance analysis of launching VS.EXE
« Reply #7 on: April 14, 2009, 08:50:48 pm »
Yesterday I did the following test:

Test performance with tiny cache
This test avoids the pre-fill cost, and explores responsiveness in a worst case scenario of performing only on-demand caching, to get a feel for its actual impact.
1.  Set tag cache size to 8000 kb (my tag file is >320000 kb), closed SlickEdit.
2.  Dumped the OS disk cache by doing a full clean build of my product (takes ~10 hours and reads ~15GB of unique disk files and writes ~120GB of unique disk files -- i.e. "unique" meaning I count multiple reads/write of the same file as only 1 read/write).
3.  Started SlickEdit.
>>>>  SlickEdit started in about 1 second.
4.  Move insertion point up/down/left/right using arrow keys.
>>>>  During the first 5-10 seconds there were some brief pauses of up to 1 sec.
5.  Page up/down a few times.
>>>>  Syntax coloring caused pauses of up to ~2.5 sec.
6.  Ran "type MyWorkspace.vtg > nul" to prime the tag file into the OS disk cache.
7.  Moved cursor, page up/down, do a bunch of tag lookups.
>>>>  SlickEdit responded instantly to everything.
Overall feel
The responsiveness was a little slow, but was a lot better than sitting there for 30 seconds staring at a shaded window whose title bar says "(Not responding)".



Here is my reasoning for why it makes sense to use an incremental/async pre-fill:
-  Making tag-related background operations asynchronous (such as syntax coloring, filling the Preview toolbar, etc) will solve performance for the background operations.
-  It's not clear to me how to make tag-related user actions (e.g. press Ctrl+.) asynchronous, so I don't think that threading/async work will solve those -- they'll still be best served by having a tag cache.
-  If we're going to have a tag cache for the foreseeable future, we're going to need pre-fill in order for the cache to add value.
-  If we're going to need pre-fill, we're back to the problem of synchronous pre-fill causing editor startup to take a long time.
-  Even with threading, to solve startup time we still need to make the pre-fill be asynchronous.
-  The pre-fill can be made async without any threading, and all tag-related operations gain from that.  In fact if the whole tag file is cached then the need for threading goes down.

So it feels like pre-fill and threading are only loosely related, and pre-fill reduces the need for threading.  Combine async (idle) pre-fill with a dynamically sized tag cache that is big enough for the whole tag file(s), and tag lookups become super fast without any threading.  Seems like a less complicated solution path than hooking up threading.



I've said my piece(s ;)), so I'll let this lie.  I can work around the problem by cranking the SE tag cache to its minimum limit, and writing a daemon to frequently read the entire tag file to force it to stay resident in the OS disk cache.  It just seems like users shouldn't have to go to this degree of manual tuning and creating external processes to get decent performance in the editor.

Dennis

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 3817
  • Hero Points: 501
Re: Performance analysis of launching VS.EXE
« Reply #8 on: April 14, 2009, 09:42:27 pm »
I think you have convinced me that the immediate pre-fill is too aggresive.  We were optimizing to avoid (only avoid, not completely prevent) bad cache misses when the editor is just started and the tag file cache was not populated.

The strategy of taking out the immediate pre-fill and letting the tag file cache populate itself though a combination of on-demand reads and asynchronous sequential pre-fill reads would be a much better balanced strategy.  I will look into implementing this for 14.0.1.  This strategy could result in delays due to cache misses shortly after the editor first comes up, but once the cache is filled, the performance would be as good as it is now.

With that change, we would be able to increase the default tag file cache size and/or configure it using a formula based on the amount of available memory on your machine.

chrisant

  • Senior Community Member
  • Posts: 1410
  • Hero Points: 131
Re: Performance analysis of launching VS.EXE
« Reply #9 on: May 13, 2009, 05:04:36 pm »
Did this make it into the 14.0.1.1 RC?  Well, I'll know later today after I do a build...  ;)

chrisant

  • Senior Community Member
  • Posts: 1410
  • Hero Points: 131
Re: Performance analysis of launching VS.EXE
« Reply #10 on: May 13, 2009, 08:58:03 pm »
It did not make it into 14.0.1.1.

ScottW, VP of Dev

  • Senior Community Member
  • Posts: 1471
  • Hero Points: 64
Re: Performance analysis of launching VS.EXE
« Reply #11 on: May 14, 2009, 06:22:14 pm »
That change introduced a pernicious crash bug, so we had to pull it. We'll try again during our normal Summer release. Though it has grown in scope, the initial intent of v14.0.1 was to include bug fixes that were already published as hot fixes plus the ones that couldn't be shipped as hot fixes.

chrisant

  • Senior Community Member
  • Posts: 1410
  • Hero Points: 131
Re: Performance analysis of launching VS.EXE
« Reply #12 on: May 14, 2009, 06:56:27 pm »
Understood, thanks, Scott.

chrisant

  • Senior Community Member
  • Posts: 1410
  • Hero Points: 131
Re: Performance analysis of launching VS.EXE
« Reply #13 on: July 06, 2009, 06:57:05 pm »
This appears to have made it into the 14.0.2.1 RC, yay!  Startup is nearly instantaneous (~1 second), and editor responsiveness is slightly choppy for the first several seconds.  (That's as expected and is a good thing:  the idea being it is better to be slightly choppy for a 15 seconds than to be completely unresponsive for 15 seconds).

Thanks, SlickTeam!  8)

ScottW, VP of Dev

  • Senior Community Member
  • Posts: 1471
  • Hero Points: 64
Re: Performance analysis of launching VS.EXE
« Reply #14 on: July 06, 2009, 07:02:26 pm »
And our thanks to you. You made a convincing argument for the change.