Author Topic: Speed on Linux again  (Read 13968 times)

chrisant

  • Senior Community Member
  • Posts: 1410
  • Hero Points: 131
Re: Speed on Linux again
« Reply #15 on: June 16, 2011, 08:27:46 PM »
Quote
SE locks the tag file when in use; git wouldn't be able to replace the tag file while it's in use, so that shouldn't be a problem.
Not a problem for SE; it would be a problem for git, however. :)
I mean not a problem for the user.  Git will fail to sync that file, and the user can simply close SE and retry.


Quote
Yes, checking in the tag file would make submission slower in git, but only when the tag file is submitted (not during sync/repoint, since apparently git doesn't transfer content, it merely changes hard links).  It's a pragmatic trade off:  take a perf hit during sync, or take a perf hit after sync -- which one is less impactful?  It sounds like there's little control over the frequency of sync; but the frequency of submission of the tag file can be fully controlled by the user, for example 2x per week, so the perf hit after sync is never more than 1/2 a week's worth of changes.
I suppose the pain would be as you say, on check-in - I'd need to try it out to see how much pain. :) One thing to note: one can't do a re-point with changed files not checked in, so you'd almost always need to check in your tag file before doing a re-point.
Or revert the tag file before checkin/repoint.


Quote
That assumes that the perf cost is coming from detecting the changed files.  Greggman already stated that only the changed files are being retagged, so that could avoid doing a dir scan of 40k files, but the tagging cost is the same.  Intuitively I'd expect the cost of retagging 3000 files to be much higher than the cost of doing a directory scan over 40,000 files, but I could be wrong... (especially since I use solid state drives these days so I don't experience seek time delays or degradation from disk cache misses.  ;))
I think that if the file is already in the buffer cache as a result of a git operation on it, then a quick second scan of it shouldn't hit the disk at all. Also, intuitively I think that if you're walking through all of the 40k+ files looking for changed ones, that's re-doing a lot of work that git just did.
I was already assuming the files are cached.  The parsing, computation, and I/O (including writes) on the tag file are the primary costs.  When I did a sync that updated 150k files out of 400k total, SE took several seconds to scan and find the 150,000 files that needed to be tagged, and then up to 15 minutes to actually retag them (those numbers are from SE14 on 2-year-old hardware with rotational drives, not SSD).  That's why I don't think getting the list of changed files from git would make any noticeable improvement -- it would micro-optimize the "several seconds" part without addressing the "15 minutes" part.
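
For reference, the list itself is cheap to get from git - a sketch, assuming a pull that set ORIG_HEAD as usual (the branch-switch variant uses the reflog):

  # Files changed by the last pull/merge:
  git diff --name-only ORIG_HEAD HEAD
  # Files changed by the last checkout/branch switch:
  git diff --name-only HEAD@{1} HEAD

Either way, that only replaces the directory scan, not the retagging.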


Quote
Also to consider: depending on the git operation, you're not guaranteed that the files you're pulling in haven't changed since you last looked at them, and so they would need to be re-indexed anyway.  Of course, git would know this - I think it generates a unique hash per file based on something - contents, likely.  If you tracked those hashes you could be even smarter and use them to whittle down the list that needs re-indexing.
The SE tag file already includes timestamps of the files.  Git only updates the timestamps on files that changed.  So if the tag file is swapped out by git, SE will quickly see which files differ from what the tag file knows about, and it retags only the files that aren't already up-to-date in the newly-swapped-in tag file.  That's how & why storing the tag file in git can potentially make a big improvement to the retagging cost.
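
As a rough illustration of the detection step - an approximation only, since it keys off the tag file's own mtime rather than the per-file timestamps SE actually stores inside it, and the path and .vtg name below are hypothetical:

  # List source files modified more recently than the (just swapped-in)
  # tag file, i.e. roughly the set SE would need to retag:
  find /sandbox -name '*.c' -newer /sandbox/project.vtg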
« Last Edit: June 16, 2011, 08:46:28 PM by chrisant »

greggman

  • Senior Community Member
  • Posts: 280
  • Hero Points: 14
Re: Speed on Linux again
« Reply #16 on: June 17, 2011, 06:03:46 PM »
Follow up:

Since moving my SlickEdit config files off the network, things seem much better. I'll keep my fingers crossed that that was it.

My SlickEdit config folder was ~/.slickedit, which at our company is on the network. I changed it to a local folder.

Note that it took me about an hour to get that working, because the launcher icon SlickEdit installs on Linux has '-sc ~/.slickedit' appended to its command line. Not knowing that, I tried various ways of setting the SLICKEDITCONFIG environment variable and couldn't understand why it wasn't working.

To SlickEdit: you might want to get rid of the '-sc ~/.slickedit' in the icon, or at least update the docs on SLICKEDITCONFIG to point out that the icon overrides it.
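
For anyone else who hits this, a sketch of the two ways to point SlickEdit at a local config dir (the install path and config dir below are just examples; 'vs' is the Linux binary):

  # What the installed icon effectively runs -- the -sc flag wins,
  # so SLICKEDITCONFIG is silently ignored:
  /opt/slickedit/bin/vs -sc ~/.slickedit

  # Either edit the icon to pass a local dir instead:
  /opt/slickedit/bin/vs -sc ~/.slickedit-local

  # ...or remove -sc from the icon and use the environment variable:
  export SLICKEDITCONFIG=~/.slickedit-local
  /opt/slickedit/bin/vs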

atrens

  • Community Member
  • Posts: 6
  • Hero Points: 0
Re: Speed on Linux again
« Reply #17 on: June 20, 2011, 04:47:53 PM »

Quote
SE locks the tag file when in use; git wouldn't be able to replace the tag file while it's in use, so that shouldn't be a problem.
Not a problem for SE; it would be a problem for git, however. :)
I mean not a problem for the user.  Git will fail to sync that file, and the user can simply close SE and retry.

Sure.


Quote
Yes, checking in the tag file would make submission slower in git, but only when the tag file is submitted (not during sync/repoint, since apparently git doesn't transfer content, it merely changes hard links).  It's a pragmatic trade off:  take a perf hit during sync, or take a perf hit after sync -- which one is less impactful?  It sounds like there's little control over the frequency of sync; but the frequency of submission of the tag file can be fully controlled by the user, for example 2x per week, so the perf hit after sync is never more than 1/2 a week's worth of changes.
I suppose the pain would be as you say, on check-in - I'd need to try it out to see how much pain. :) One thing to note: one can't do a re-point with changed files not checked in, so you'd almost always need to check in your tag file before doing a re-point.
Or revert the tag file before checkin/repoint.

Hmm... reverting the tag file. I guess you could, but then you'd need to 'catch up' your tag file the next time you switched contexts.
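
Something like this, I suppose (the tag file name is made up; 'git checkout --' is the usual way to throw away local changes to a file):

  git checkout -- project.vtg   # revert the tag file to the committed version
  git checkout other-branch     # now the re-point is clean
  # SE then catches up by retagging only the files whose timestamps
  # disagree with the entries in the checked-out tag file.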


Quote
That assumes that the perf cost is coming from detecting the changed files.  Greggman already stated that only the changed files are being retagged, so that could avoid doing a dir scan of 40k files, but the tagging cost is the same.  Intuitively I'd expect the cost of retagging 3000 files to be much higher than the cost of doing a directory scan over 40,000 files, but I could be wrong... (especially since I use solid state drives these days so I don't experience seek time delays or degradation from disk cache misses.  ;))
I think that if the file is already in the buffer cache as a result of a git operation on it, then a quick second scan of it shouldn't hit the disk at all. Also, intuitively I think that if you're walking through all of the 40k+ files looking for changed ones, that's re-doing a lot of work that git just did.
I was already assuming the files are cached.  The parsing, computation, and I/O (including writes) on the tag file are the primary costs.  When I did a sync that updated 150k files out of 400k total, SE took several seconds to scan and find the 150,000 files that needed to be tagged, and then up to 15 minutes to actually retag them (those numbers are from SE14 on 2-year-old hardware with rotational drives, not SSD).  That's why I don't think getting the list of changed files from git would make any noticeable improvement -- it would micro-optimize the "several seconds" part without addressing the "15 minutes" part.

Watching SE16 running here, I see that it's using about 15% of my CPU; given that the machine is roughly 80% idle, it must be I/O bound.

Initially I had my tag file on the same device as my sandbox, which I thought might be a mistake, so I've started again, this time with my home dir, tag/project file dir, and sandbox each on its own disk. For the first 20k or so files I saw no writes to the tag file - it must have all been in memory. After that I see regular bursty 'flushes' of the tag file to disk. On the sandbox disk I'm seeing about 2MB/s of reads and a small trickle of writes (not sure what that's about - maybe filesystem metadata updates to record last access time). This is more or less steady state and looks to be the bottleneck.
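
For the curious, I'm watching the disks with roughly the following (iostat comes with sysstat; device names will differ):

  # Per-device utilization and throughput, one-second samples:
  iostat -x 1
  # If the write trickle really is atime updates, remounting the
  # sandbox with noatime should make it disappear:
  #   mount -o remount,noatime /sandbox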


Quote
Also to consider: depending on the git operation, you're not guaranteed that the files you're pulling in haven't changed since you last looked at them, and so they would need to be re-indexed anyway.  Of course, git would know this - I think it generates a unique hash per file based on something - contents, likely.  If you tracked those hashes you could be even smarter and use them to whittle down the list that needs re-indexing.
The SE tag file already includes timestamps of the files.  Git only updates the timestamps on files that changed.  So if the tag file is swapped out by git, SE will quickly see which files differ from what the tag file knows about, and it retags only the files that aren't already up-to-date in the newly-swapped-in tag file.  That's how & why storing the tag file in git can potentially make a big improvement to the retagging cost.

I was initially skeptical, but after timing the 'find' command it seems I can scan the metadata of all the files in the sandbox in about 54 seconds. That said, the bottleneck I'm seeing is in reading the files' contents, not on the tag-output (write) side or in CPU.
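
The timing came from a metadata-only walk along these lines (the sandbox path is an example; GNU find):

  time find /sandbox -type f -printf '%T@ %p\n' > /dev/null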



Clark

  • SlickEdit Team Member
  • Senior Community Member
  • Posts: 6823
  • Hero Points: 526
Re: Speed on Linux again
« Reply #18 on: June 21, 2011, 02:07:30 PM »
I haven't tested how well checking tag files into git helps.

I'm more concerned about how well background tagging works. You should be able to continue working. I simulated this situation by touching the dates of 40,000 files and then switching back to SlickEdit to force a massive background tag. I have quad-core Windows and Linux machines and was able to continue working with OK performance. I suspect a really slow graphics card on Linux would be a big problem. A notebook might be a problem as well (I've got an old notebook which I will test).
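
The simulation was along these lines (the path and extensions are examples):

  # Touch every source file so SlickEdit sees ~40,000 "changed" files
  # and kicks off a massive background retag:
  find /sandbox \( -name '*.c' -o -name '*.cpp' -o -name '*.h' \) -exec touch {} +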

One thing I didn't like: if I pressed Ctrl+Period to look for a tag that did not exist, it switched to synchronous tagging. This is a good default, but maybe we need an option. Note that if you press Ctrl+Period to look for a tag and SlickEdit starts synchronously tagging, you can press the Cancel button and SlickEdit will continue tagging in the background.

One thing I was wondering: when git switches to a different branch, how does your build still work? It seems to me you would need to check in the binaries, delete the binaries, or touch all the source files. If git is just changing links, I would expect no dates to change.
« Last Edit: June 21, 2011, 02:51:37 PM by Clark »

atrens

  • Community Member
  • Posts: 6
  • Hero Points: 0
Re: Speed on Linux again
« Reply #19 on: June 21, 2011, 04:48:04 PM »

Quote
I haven't tested how well checking tag files into git helps.

I haven't tried this either.

Quote
I'm more concerned about how well background tagging works. You should be able to continue working. I simulated this situation by touching the dates of 40,000 files and then switching back to SlickEdit to force a massive background tag. I have quad-core Windows and Linux machines and was able to continue working with OK performance.

First, I should say that my desktop is a few years old but still respectable: a quad-core 2GHz Xeon, a FireGL graphics card (I think), two relatively quick 7200rpm drives, and a third 10000rpm drive.

I can do some work with background tagging running, but symbol def/ref lookups are pretty slow.

Quote
I suspect a really slow graphics card on Linux would be a big problem. A notebook might be a problem as well (I've got an old notebook which I will test).

I haven't noticed any graphics-related speed issues.

Quote
One thing I didn't like: if I pressed Ctrl+Period to look for a tag that did not exist, it switched to synchronous tagging. This is a good default, but maybe we need an option. Note that if you press Ctrl+Period to look for a tag and SlickEdit starts synchronously tagging, you can press the Cancel button and SlickEdit will continue tagging in the background.

I'm kind of agnostic about this. Not finding a symbol disrupts my thought process - especially if I'm looking for a bug - so if synchronous tagging would expedite the search, that would be helpful. On the other hand, it's critical to see all the references for a symbol - or, failing that, an indication that the list is potentially incomplete - because when you're wading through 50,000+ files looking for a function you plan on changing, missing an instance can lead to great unhappiness.

Quote
One thing I was wondering: when git switches to a different branch, how does your build still work? It seems to me you would need to check in the binaries, delete the binaries, or touch all the source files. If git is just changing links, I would expect no dates to change.

We use a custom BSD-style build where the objects are placed in a separate output directory tree. This helps because one build often contains object code for multiple CPU targets; each set of output objects is stored in its own subtree.

Of course none of this gets checked into git.

I've found that the only safe thing to do when significantly shifting the baseline of one's sandbox is to blow away the whole output tree and start afresh. For me, build correctness is the most important thing - the extra time it takes to rebuild in these situations is insignificant compared with the time it takes to track down subtle dependency-related build problems.
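
Sketched out, the layout and the "start afresh" step look something like this (paths are illustrative):

  #   /sandbox/src/...           sources, tracked by git
  #   /sandbox/obj/<target>/...  build output per CPU target, never checked in
  rm -rf /sandbox/obj            # blow away the whole output tree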

Now, if I'm advancing my sandbox while staying on the same branch - say, rebasing to a new head on the current branch - then I'll leave the object tree in place and trust the dependency engine to figure out what needs to be rebuilt.

Git doesn't just change links per se; it resets the file dates to reflect the file dates at that particular point on the branch. If you're moving backwards on the branch, some file dates go back in time to reflect that changes have been unwound; if you're advancing on the branch, some file dates move forward to reflect changes made to those files. I think git sort of treats changes like patches, in the sense that it has a list of patches to apply as you go forward and to revert as you move back, with each patch applied with a specific timestamp.

Cheers,

--Andrew


Clark

  • SlickEdit Team Member
  • Senior Community Member
  • Posts: 6823
  • Hero Points: 526
Re: Speed on Linux again
« Reply #20 on: June 21, 2011, 05:45:13 PM »
Thanks Andrew. Git does what I expected. By "not changing the dates" I meant not changing the dates to the current date (i.e. touching all files). Maybe find-ref should prompt whether to complete background tagging first? Or maybe there needs to be an option for this?

We've been doing more work on optimizing background tagging. We are trying to reduce the time things are exclusively locked. Hopefully some of our ideas will pan out.

Clark

atrens

  • Community Member
  • Posts: 6
  • Hero Points: 0
Re: Speed on Linux again
« Reply #21 on: July 25, 2011, 08:15:01 PM »
Quote
Thanks Andrew. Git does what I expected. By "not changing the dates" I meant not changing the dates to the current date (i.e. touching all files). Maybe find-ref should prompt whether to complete background tagging first? Or maybe there needs to be an option for this?

We've been doing more work on optimizing background tagging. We are trying to reduce the time things are exclusively locked. Hopefully some of our ideas will pan out.

Clark
Hi Clark,

I just installed 16.0.1 and it seems much more responsive while background tagging is running, and starting a Symbol Reference Lookup now asks me to wait for background tagging to complete, which I'm okay with. The only thing (maybe this is intentional) is that the Background Tagging progress bar and gkrellm indicate that the background tagging - effectively foreground now, since I've opted to wait for it to complete before starting my symbol reference lookup - is a bit bursty: I see short bursts of disk activity, followed by short bursts of CPU usage, followed by progress bar updates; lather, rinse, repeat. So neither the CPU cores nor the disk I/O are pegged. I'm thinking this is deliberate for background tagging, and probably quite nice for keeping the UI responsive, but if it were possible to tweak it to run flat-out when flipped into this pseudo-foreground mode, it might run a bit faster.

Thanks again!

--Andrew