SlickEdit Community

SlickEdit Product Discussion => SlickEdit® => Topic started by: greggman on June 01, 2011, 11:43:54 pm

Title: Speed on Linux again
Post by: greggman on June 01, 2011, 11:43:54 pm
I'm sorry to start off this way, but I'm really frustrated. SlickEdit has become all but unusable for me lately, and support isn't answering, at least here on the forums.

I recently rebooted my machine. I started SlickEdit, and it said my <prog>.vtg file was corrupt. It took *35 minutes!!!!* before I could start using SlickEdit: 8-10 minutes until it finished reporting the <prog>.vtg file corrupt, and 25 minutes to tag, during which perf is so slow it's unusable.

Seriously guys, what am I supposed to do? I've been a loyal customer for 18 years, since V1.0. I've got 18 years of SlickEdit muscle memory. I don't want to have to give up SlickEdit, but some word from you guys seems in order. Are you going to address this? Is it a priority? Will I see a fix in a reasonable amount of time, or should I be looking for other solutions?
Title: Re: Speed on Linux again
Post by: chrisant on June 02, 2011, 02:10:14 am
35 min is painful, I feel your pain.

Using the forum search, I can see posts from you about speed issues dating back as far as 2006.  They were taken seriously and some improvements and/or hotfixes came out of them.  I've also posted about speed issues a number of times, and I've always felt my feedback was taken seriously and most often led to direct improvements with reasonably short turnaround time.

I only see one thread from you recently about speed, and it was less than a week ago, but maybe I missed some other posts in the forums.

I've always found that the more specific details I can provide, the better the SlickTeam is able to respond in a timely manner.  I haven't seen much detail in the recent speed reports, but I also recognize it's hard to capture "details" when we're talking half a million files (I work in some very large projects as well, anywhere from 200,000 files to 800,000 files).

For me on Windows (I know you're on Linux), SlickEdit 16 is faster than ever at tagging. I recall some kind of problem that was discovered wrt Linux (I think?) during the 16 beta (I think?) that was slowing down performance in an unexpected way. Maybe you can find it in the beta forum and check if it looks like it could be related.

Two ideas that might help in the short term:
- Turning off symbol coloring might alleviate some of the speed problems.
- Deleting the .vtg file might help speed up the tagging.

HTH
Title: Re: Speed on Linux again
Post by: ScottW, VP of Dev on June 02, 2011, 02:24:48 pm
I'm very sorry for your problem. We take all problem reports seriously, and performance issues are one of our top priorities. Much of what we did in v16 was to improve performance. Despite your experience, v16 is faster at tagging for most people. I don't know why it has been so bad for you.

Please remember that the forums are not an official support channel. See: http://community.slickedit.com/index.php?topic=28.0.
Quote
Although our staff will monitor the forums and answer selected questions, it is not our intent to address support requests from within the forum. Support requests should be submitted via www.slickedit.com/supportcase

The forums are for users to help users. We answer items when we can, particularly if they are helpful to other users. But all support issues should be routed through official support channels. I probably confuse things by posting to so many of these. If so, I apologize for creating this confusion.

Can you post a case number so I can look into this one?

Some questions (ignore any questions already covered in the support case):
1) It sounds like you're using background tagging. If so, turn it off. Tools > Options then Editing > Context Tagging. Set "Number of tagging threads" to 0 and the two "Use background thread to..." items to False. Of course, this means you won't be able to do anything until tagging has finished, but it sounds like that's where you are now. However, foreground tagging may avoid the problem that is causing this to be so slow, so it may shorten the time before you can start working. What were your tag times like using v15? Please let us know the difference in time to tag with background tagging on and off.
2) Are your source files stored locally?
3) Are your project and workspace files stored locally?
4) How many files in your workspace/projects? What language?
5) Was this a problem using v15? If not, can you use v15 while we are trying to get to the bottom of this? Are there specific changes in v16 that you need? Obviously the multithreading work wasn't a boon for you.
6) Can you post your Help > About SlickEdit info (redact anything you don't wish to share)?

That's all I can think of for now. We may not be able to fix this until the v16.0.1 release, which is planned for late July. But we'll do our best to get to the bottom of this and send you a fix if possible.
Title: Re: Speed on Linux again
Post by: timur on June 02, 2011, 02:51:17 pm
I have found that the background tagging (which didn't exist in v11, the previous version I used) is very slow and bogs down the system.  I use the command-line tool now for all tagging, although it has its own drawbacks.
Title: Re: Speed on Linux again
Post by: ScottW, VP of Dev on June 02, 2011, 06:07:41 pm
Background tagging is easily disabled, as described previously. After that, it should be very similar to previous versions.
Title: Re: Speed on Linux again
Post by: greggman on June 03, 2011, 06:59:47 pm
Thank you for responding.

Quote
I'm very sorry for your problem. We take all problem reports seriously, and performance issues are one of our top priorities. Much of what we did in v16 was to improve performance. Despite your experience, v16 is faster at tagging for most people. I don't know why it has been so bad for you.

Quote
Please remember that the forums are not an official support channel. See: http://community.slickedit.com/index.php?topic=28.0.
Quote
Although our staff will monitor the forums and answer selected questions, it is not our intent to address support requests from within the forum. Support requests should be submitted via www.slickedit.com/supportcase

The forums are for users to help users. We answer items when we can, particularly if they are helpful to other users. But all support issues should be routed through official support channels. I probably confuse things by posting to so many of these. If so, I apologize for creating this confusion.

Right. I guess I just assume you'd be better off if solutions are public. Stuff done through the support channel means you guys have to spend time with each person even if their solution could be public.

Quote
Can you post a case number so I can look into this one?

I'll open a case if these don't solve the issue.

Quote
Some questions (ignore any questions already covered in the support case):
1) It sounds like you're using background tagging. If so, turn it off. Tools > Options then Editing > Context Tagging. Set "Number of tagging threads" to 0 and the two "Use background thread to..." items to False. Of course, this means you won't be able to do anything until tagging has finished, but it sounds like that's where you are now. However, foreground tagging may avoid the problem that is causing this to be so slow, so it may shorten the time before you can start working. What were your tag times like using v15? Please let us know the difference in time to tag with background tagging on and off.

I'll try that and tell you how it goes

Quote
2) Are your source files stored locally?

Yes

Quote
3) Are your project and workspace files stored locally?

Yes

Quote
4) How many files in your workspace/projects? What language?

40k-50k files, mostly C++. I'm only adding .c, .cc, .cpp, .h, and .py files to my project. I've sent links to the source before, as the project is open source.

Quote
5) Was this a problem using v15? If not, can you use v15 while we are trying to get to the bottom of this? Are there specific changes in v16 that you need? Obviously the multithreading work wasn't a boon for you.

Speed has been a problem on all platforms for this project. Wait times of 7 minutes or more were common on Windows each time I synced (which rebuilds the .vcproj files in my project).

A big difference is a new workflow using git. A workflow that is gaining massive traction.

If you are not aware of how git works: git supports branching in a way that no other version control system does, so in git it is most common to switch branches often (several times an hour). When you switch branches in git it changes a bunch of hardlinks on your files (similar to how OSX Time Machine works, if you are familiar with that). In other words, 'git checkout feature1' will nearly instantly switch all the files in your project to the versions you were at when you started working on feature1. 'git checkout feature2' will switch them all to the state needed for feature2. It generally takes git 0-2 seconds, even for 50k files.
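To make that concrete, a typical session looks something like this:

Code: [Select]
$ git checkout feature1   # work tree flips to feature1's state almost instantly
# ...edit, build, test...
$ git checkout feature2   # all ~50k files flip to feature2's state in a second or two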

What this means for slickedit is that several thousand files in a project might change underneath it several times an hour.

git usage is taking off. Tons of projects are moving to GitHub. Chrome is moving to git. WebKit is moving to git. Processing is on git, to name a few.

The point I'm trying to make is that SlickEdit is going to have to work well with this new and increasingly popular workflow, and I'm guessing it wasn't really designed with that in mind. When I was using svn or p4 or cvs, files didn't change often. Generally, if I worked on multiple things at once, I had separate copies of the repo and switched that way. In that old-style workflow I'd switch projects to work on a different feature: c:\work\repo_feature1\project.vsj vs c:\work\repo_feature2\project.vsj. The files in each were relatively stable. In git, though, I'm always in c:\work\repo. When I want to work on feature 1, I type 'git checkout feature1' and the files in c:\work\repo magically change to the state needed for feature 1. When I want to work on feature 2, I type 'git checkout feature2' and the files all magically change back to the state needed for feature 2.

That means SlickEdit now notices all these files changed and starts tagging. That 7-minute wait on Windows, which only happened each time I synced a particular repo, now happens several times an hour, and tagging being slower on Linux makes it even worse. I can switch tagging off, but I rely on tagging. Even if I make it not tag in the background, the git workflow means the long wait comes up really often.

If you can find the source of the poor tagging performance on Linux, and find out why the editor's response is so slow while it is tagging, that would probably be the biggest help.

Quote
6) Can you post your Help > About SlickEdit info (redact anything you don't wish to share)?

Code: [Select]
SlickEdit 2011 (v16.0.0.6 64-bit)

Serial number: xxxxxxx
Licensed number of users: Single user
License file: /usr/local/google/gman/slickedit/16.0.0.6/bin/slickedit.lic

Build Date: May 05, 2011
Emulation: Brief

OS: Linux
OS Version: Ubuntu 10.04.1 LTS
Kernel Level: 2.6.32-gg465-generic
Build Version: #gg465-Ubuntu SMP Mon Apr 11 05:52:28 PDT 2011
Processor Architecture: x86_64

X Server Vendor: The X.Org Foundation
Memory: 47% Load, 4880MB/10178MB Virtual
Shell Info: /usr/local/google/gman/slickedit/16.0.0.6/bin/secsh -i
Screen Size: 2240 x 1600, 2240 x 1600

Project Type: Gnuc
Language: .cc (C/C++)

Installation Directory: /usr/local/google/gman/slickedit/16.0.0.6/
Configuration Directory: /usr/local/google/gman/.slickedit/16.0.0/

Hotfixes:
/usr/local/google/gman/.slickedit/16.0.0/hotfixes/hotfix_se1600_1_cumulative.zip (Revision: 1)


Quote
That's all I can think of for now. We may not be able to fix this until the v16.0.1 release, which is planned for late July. But we'll do our best to get to the bottom of this and send you a fix if possible.


One thing I did notice: the configuration path was on the network (~/.slickedit). I've moved it to be local. I'll see how much that helps.

Thank you, and I'm sorry for being frustrated. I'm one of your biggest fans. I've gotten others here to use SlickEdit, and I'm also one who realizes the value in paying for something even when inferior but free alternatives exist.
Title: Re: Speed on Linux again
Post by: ScottW, VP of Dev on June 03, 2011, 07:23:48 pm
You are right to be frustrated. We have no tolerance for bad performance and you shouldn't either!

The git workflow poses a major challenge for us. Tagging only works if it knows where the symbols are defined. If the location of the file changes when you switch branches, I can't think how we could handle that except to retag. I'm amazed that git can update that many files in a couple seconds. I'm curious what's going on there. We'll have to look into this.

Having your config on a local drive is very important, too. Good catch! I didn't think to ask about that. I would think that would affect library tagging more than the speed of tagging your workspace, though. The tag file for the workspace is stored in the same directory as the workspace file.

Is this a wildcard project? That could be a problem. Maybe the wildcard lookup and the tagging are fighting for time. Also, you might try closing the Projects tool window. That has some performance problems associated with it.

We do like users to be able to see other answers, which is why I try to post to as many of these as I can. The problem is that the forum software has no tracking in it. I can't tell which items are open or resolved. I can't see which items have someone working on them. So, things are very likely to fall through the cracks on the forums as we get pulled into other activities. The Product Support team has real tracking software with workflows built in for license verification and case histories.

If you are on maintenance and support, please contact support and open a case for this. Reference this thread on the forums so they can see what we've already covered. If you aren't on maintenance and support, contact them anyway and tell them that I wanted them to handle this case. I'll tell them to watch for it. We need to get to the bottom of this performance issue.
Title: Re: Speed on Linux again
Post by: chrisant on June 03, 2011, 07:50:11 pm
@greggman, I recall that you uncovered some performance problems in Python tagging, specifically.  It might be interesting to try excluding the .py files temporarily, and see how that affects SE's tagging speed.  For example, if the problem is actually in the Python tagging parser and not in the outer loop(s) of SE's tagging engine, this could help focus SlickTeam's investigation.
Title: Re: Speed on Linux again
Post by: greggman on June 03, 2011, 08:37:50 pm
Quote
The git workflow poses a major challenge for us. Tagging only works if it knows where the symbols are defined. If the location of the file changes when you switch branches, I can't think how we could handle that except to retag. I'm amazed that git can update that many files in a couple seconds. I'm curious what's going on there. We'll have to look into this.

AFAIK git works using hardlinks, which means when it's switching versions all it's doing is changing links to files. No copies. In Unix lingo: 'ln repo_data/versions/file-version1.cc workspace/file.cc'.
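To illustrate (paths from the example above; 'ls -i' prints inode numbers, so you can see both names point at the same file):

Code: [Select]
$ ln repo_data/versions/file-version1.cc workspace/file.cc
$ ls -i repo_data/versions/file-version1.cc workspace/file.cc
# both names print the same inode number -- one file, two links, no copying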

If you want to learn git, I suggest these two links:

"The git parable" explains what's really happening in plain and simple terms. It's an easy read
http://tom.preston-werner.com/2009/05/19/the-git-parable.html


The ProGit book
http://progit.org/book/

The first chapter explains the basic difference between old systems (p4, cvs, svn, etc.) and new ones (hg, git, etc.).

Title: Re: Speed on Linux again
Post by: Clark on June 06, 2011, 01:18:08 pm
Have you tried putting the SlickEdit workspace and tag file in the repository? This hopefully would allow the dates in the tag file to match those on disk when you do a checkout which switches all files.
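For example, something like this (hypothetical file names - .vpw for the workspace, .vtg for its tag file):

Code: [Select]
$ git add myproj.vpw myproj.vtg
$ git commit -m "check in SlickEdit workspace and tag file"
# now each branch carries a tag file whose entries match that branch's
# sources, so after a checkout most files already look up to date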
Title: Re: Speed on Linux again
Post by: HaveF on June 09, 2011, 01:45:02 am
Quote
The git workflow poses a major challenge for us. Tagging only works if it knows where the symbols are defined. If the location of the file changes when you switch branches, I can't think how we could handle that except to retag. I'm amazed that git can update that many files in a couple seconds. I'm curious what's going on there. We'll have to look into this.

It's an interesting problem.
I have an idea about retagging just the changed files instead of __all__ the files in the git-related project; that is:

1. SlickEdit should remember the current snapshot id (SHA-1 value), via a git command or by parsing the related file.
$ cat .git/HEAD
ref: refs/heads/master
$ cat .git/refs/heads/master
cac0cab538b970a37ea1e769cbbde608743bc96d

2. after users commit their code, or check out another branch, SlickEdit should remember the newer snapshot id,
such as:
1a410efbd13591db07496601ebc7a059dd55cfe9

3. then, SlickEdit should use a git command (or find out how this command works in git's source code) to find the changed files.
$ git diff --name-status cac0cab538b970a37ea1e769cbbde608743bc96d 1a410efbd13591db07496601ebc7a059dd55cfe9
M       .description
D       .gitattributes

4. retag the changed files (a rough sketch of the whole flow follows below).
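A sketch of steps 1-4 in shell ('retag' is a hypothetical stand-in for however SlickEdit retags a single file; --name-only is --name-status without the status letter):

Code: [Select]
prev=$(cat ~/.se-last-head)             # snapshot id remembered in step 1
cur=$(git rev-parse HEAD)               # the newer snapshot id from step 2
git diff --name-only "$prev" "$cur" |   # changed files between them, step 3
while read -r f; do
  retag "$f"                            # step 4: retag only what changed
done
echo "$cur" > ~/.se-last-head           # remember for next time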
Title: Re: Speed on Linux again
Post by: chrisant on June 09, 2011, 09:53:38 am
If git is touching files that it knows haven't actually changed, that would be a performance flaw in git that should be addressed in git itself (it would be such a silly oversight that it's hard for me to believe it exists).

But greggman said his project has 40k to 50k files, and "several thousand files in a project might change underneath it several times an hour".  Which suggests that git is only touching the files it knows changed.  If that's so, then it would only further hurt performance for SE to confirm with git that the changed files were changed.

The issue seems to be that it takes an uncomfortably long time to tag the thousands of files that really changed.  Clark's suggestion seems good:  It seems like having the tag db checked into git would allow git to automatically solve the very problem it created.

Alternatively either SE could introduce revision history directly into the tag db (seems complex and impractical, and has potential for unbounded growth), or SE could make tagging faster -- but that's not a scalable solution (if 2000 files takes too long, and performance is improved by 10x, then it just means 20,000 files takes too long).  So having the tag db in git seems like a better solution to the git problem.
Title: Re: Speed on Linux again
Post by: atrens on June 16, 2011, 03:48:35 pm
Quote
If git is touching files that it knows haven't actually changed, that would be a performance flaw in git that should be addressed in git itself (it would be such a silly oversight that it's hard for me to believe it exists).

But greggman said his project has 40k to 50k files, and "several thousand files in a project might change underneath it several times an hour".  Which suggests that git is only touching the files it knows changed.  If that's so, then it would only further hurt performance for SE to confirm with git that the changed files were changed.

The issue seems to be that it takes an uncomfortably long time to tag the thousands of files that really changed.  Clark's suggestion seems good:  It seems like having the tag db checked into git would allow git to automatically solve the very problem it created.

Alternatively either SE could introduce revision history directly into the tag db (seems complex and impractical, and has potential for unbounded growth), or SE could make tagging faster -- but that's not a scalable solution (if 2000 files takes too long, and performance is improved by 10x, then it just means 20,000 files takes too long).  So having the tag db in git seems like a better solution to the git problem.

Not sure if having a huge tag file checked into git would actually help. I think it would just slow down git.  Also SlickEdit would need to worry about the file changing underneath it, which I think would be a hard problem to solve.


In my case I normally deal with around 65k files, and my tag file is over 400MB -

-rw-r--r--  1 atrens  wheel  463478784 Jun 16 10:33 trunk.git.vtg

that's only about 1% of the sandbox size, but still getting a bit unwieldy ..

Not sure if splitting up the tag file helps either, unless you could predict which files were most likely to change and group them together.

I think that the only tractable option would be for SlickEdit to get a list of files changed by the last git operation - perhaps git could dump this someplace convenient as it's actually performing the operation, and then SlickEdit could re-tag only those files. Even better, if a SlickEdit thread were fed the file names as git completed processing them, it could follow along and take advantage of the files still being in the buffer cache.

I guess this means changes to git, but pretty simple changes, I'd think. For robustness, git could even be taught to spew XML-ized output when passed a special flag.
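Actually, git's existing post-checkout hook already gets handed the old and new HEADs, so a crude version of that dump might need no changes to git at all - a sketch, with an arbitrary dump path:

Code: [Select]
#!/bin/sh
# .git/hooks/post-checkout receives: $1 = previous HEAD, $2 = new HEAD,
# $3 = 1 for a branch checkout (0 for a file checkout)
if [ "$3" = "1" ]; then
    git diff --name-only "$1" "$2" > "$HOME/.se-changed-files"
fi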

--Andrew

Title: Re: Speed on Linux again
Post by: chrisant on June 16, 2011, 05:37:01 pm
Quote
Not sure if having a huge tag file checked into git would actually help. I think it would just slow down git.  Also SlickEdit would need to worry about the file changing underneath it, which I think would be a hard problem to solve.
SE locks the tag file when in use; git wouldn't be able to replace the tag file while it's in use, so that shouldn't be a problem.

Yes, checking in the tag file would make submission slower in git, but only when the tag file is submitted (not during sync/repoint, since apparently git doesn't transfer content, it merely changes hard links).  It's a pragmatic trade off:  take a perf hit during sync, or take a perf hit after sync -- which one is less impactful?  It sounds like there's little control over the frequency of sync; but the frequency of submission of the tag file can be fully controlled by the user, for example 2x per week, so the perf hit after sync is never more than 1/2 a week's worth of changes.

Quote
I think that the only tractable option would be for SlickEdit to get a list of files changed by the last git operation - perhaps git could dump this someplace convenient as it's actually performing the operation, and then SlickEdit could re-tag only those files. Even better, if a SlickEdit thread were fed the file names as git completed processing them, it could follow along and take advantage of the files still being in the buffer cache.
That assumes that the perf cost is coming from detecting the changed files.  Greggman already stated that only the changed files are being retagged, so that could avoid doing a dir scan of 40k files, but the tagging cost is the same.  Intuitively I'd expect the cost of retagging 3000 files to be much higher than the cost of doing a directory scan over 40,000 files, but I could be wrong... (especially since I use solid state drives these days so I don't experience seek time delays or degradation from disk cache misses.  ;))
Title: Re: Speed on Linux again
Post by: atrens on June 16, 2011, 06:35:15 pm
Quote
Not sure if having a huge tag file checked into git would actually help. I think it would just slow down git.  Also SlickEdit would need to worry about the file changing underneath it, which I think would be a hard problem to solve.
SE locks the tag file when in use; git wouldn't be able to replace the tag file while it's in use, so that shouldn't be a problem.

Not a problem for SE; it would be a problem for git, however. :)

Yes, checking in the tag file would make submission slower in git, but only when the tag file is submitted (not during sync/repoint, since apparently git doesn't transfer content, it merely changes hard links).  It's a pragmatic trade off:  take a perf hit during sync, or take a perf hit after sync -- which one is less impactful?  It sounds like there's little control over the frequency of sync; but the frequency of submission of the tag file can be fully controlled by the user, for example 2x per week, so the perf hit after sync is never more than 1/2 a week's worth of changes.

I suppose the pain would be, as you say, on check-in - would need to try it out to see how much pain. :) One thing to note: one can't do a re-point with changed files not checked in, so you'd almost always need to check in your tag file before doing a re-point.

Quote
I think that the only tractable option would be for SlickEdit to get a list of files changed by the last git operation - perhaps git could dump this someplace convenient as it's actually performing the operation, and then SlickEdit could re-tag only those files. Even better, if a SlickEdit thread were fed the file names as git completed processing them, it could follow along and take advantage of the files still being in the buffer cache.
That assumes that the perf cost is coming from detecting the changed files.  Greggman already stated that only the changed files are being retagged, so that could avoid doing a dir scan of 40k files, but the tagging cost is the same.  Intuitively I'd expect the cost of retagging 3000 files to be much higher than the cost of doing a directory scan over 40,000 files, but I could be wrong... (especially since I use solid state drives these days so I don't experience seek time delays or degradation from disk cache misses.  ;))

I think that if the file is already in the buffer cache as a result of a git operation on it, then a quick second scan of it shouldn't hit the disk at all. Also, intuitively, I think that if you're walking through all of the 40k+ files looking for changed ones, that's re-doing a lot of work that git just did.

Also to consider: depending on the git operation, you're not guaranteed that those files that you're pulling in haven't changed since you last looked at them, and so they would need to be re-indexed anyway.  Of course, git would know this - I think it generates a unique hash per file, based on contents most likely.  If you tracked those hashes you could be even smarter and use that to whittle down the list that would need re-indexing.

--Andrew
Title: Re: Speed on Linux again
Post by: chrisant on June 16, 2011, 08:27:46 pm
Quote
SE locks the tag file when in use; git wouldn't be able to replace the tag file while it's in use, so that shouldn't be a problem.
Not a problem for SE, it would be a problem for git however. :)
I mean not a problem for the user.  Git will fail to sync that file, and the user can simply close SE and retry.


Quote
Yes, checking in the tag file would make submission slower in git, but only when the tag file is submitted (not during sync/repoint, since apparently git doesn't transfer content, it merely changes hard links).  It's a pragmatic trade off:  take a perf hit during sync, or take a perf hit after sync -- which one is less impactful?  It sounds like there's little control over the frequency of sync; but the frequency of submission of the tag file can be fully controlled by the user, for example 2x per week, so the perf hit after sync is never more than 1/2 a week's worth of changes.
I suppose the pain would be as you say on check in - would need to try it out to see how much pain. :) One thing to note, one can't do a re-point with changed files not checked in, so  you'd almost always need to check in your tag file before doing a re-point.
Or revert the tag file before checkin/repoint.


Quote
That assumes that the perf cost is coming from detecting the changed files.  Greggman already stated that only the changed files are being retagged, so that could avoid doing a dir scan of 40k files, but the tagging cost is the same.  Intuitively I'd expect the cost of retagging 3000 files to be much higher than the cost of doing a directory scan over 40,000 files, but I could be wrong... (especially since I use solid state drives these days so I don't experience seek time delays or degradation from disk cache misses.  ;))
I think that if the file is already in the buffer cache as a result of a git operation on it, then a quick second scan of it shouldn't hit the disk at all. Also, intuitively, I think that if you're walking through all of the 40k+ files looking for changed ones, that's re-doing a lot of work that git just did.
I was already assuming the files are cached.  The parsing, computation, and IO (incl writes) on the tag file are the primary costs.  When I do a sync that updates 150k files out of 400k total, SE took several seconds to scan and find the 150,000 files that need to be tagged, and then SE took up to 15 minutes to actually retag those files (those numbers are from SE14 on 2yr old hardware with rotational drives, not ssd).  That's why I don't think it would make any noticeable improvement to get the list of changed files from git -- it would micro-optimize the "several seconds" part, without addressing the "15 minutes" part.


Quote
Also to consider: depending on the git operation, you're not guaranteed that those files that you're pulling in haven't changed since you last looked at them, and so they would need to be re-indexed anyway.  Of course, git would know this - I think it generates a unique hash per file, based on contents most likely.  If you tracked those hashes you could be even smarter and use that to whittle down the list that would need re-indexing.
The SE tag file already includes timestamps of the files.  Git only updates the timestamps on files that changed.  So if the tag file is swapped out by git, then SE will already quickly see which files are different relative to what the tag file knows about, and SE retags only the files that aren't already up-to-date in the newly-swapped-in tag file.  That's how & why storing the tag file in git can potentially make a big improvement to the retagging cost.
Title: Re: Speed on Linux again
Post by: greggman on June 17, 2011, 06:03:46 pm
Follow up:

So, since moving my SlickEdit config files off the network, things seem to be much better. I'll keep my fingers crossed that that was it.

My home folder, where ~/.slickedit lives, is on the net at our company. I changed it to a local folder.

Note that it took me about an hour to do that, because the launcher icon that SlickEdit installs on Linux has '-sc ~/.slickedit' appended to its command line. Since I didn't know that, I spent about an hour trying various ways of setting a SLICKEDITCONFIG environment variable and not understanding why it wasn't working.

To SlickEdit: you might want to get rid of the '-sc ~/.slickedit' in the icon, or at least update the docs on SLICKEDITCONFIG to point out that the icon is overriding it.
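For anyone else who hits this, the two launch paths as I understand them (assuming the editor binary is 'vs'; local path illustrative):

Code: [Select]
# what the desktop icon effectively runs -- the explicit -sc flag wins:
$ vs -sc ~/.slickedit

# what I wanted -- drop the flag so the environment variable is honored:
$ export SLICKEDITCONFIG=/local/disk/gman/.slickedit
$ vs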
Title: Re: Speed on Linux again
Post by: atrens on June 20, 2011, 04:47:53 pm

Quote
SE locks the tag file when in use; git wouldn't be able to replace the tag file while it's in use, so that shouldn't be a problem.
Not a problem for SE, it would be a problem for git however. :)
I mean not a problem for the user.  Git will fail to sync that file, and the user can simply close SE and retry.

Sure.


Quote
Yes, checking in the tag file would make submission slower in git, but only when the tag file is submitted (not during sync/repoint, since apparently git doesn't transfer content, it merely changes hard links).  It's a pragmatic trade off:  take a perf hit during sync, or take a perf hit after sync -- which one is less impactful?  It sounds like there's little control over the frequency of sync; but the frequency of submission of the tag file can be fully controlled by the user, for example 2x per week, so the perf hit after sync is never more than 1/2 a week's worth of changes.
I suppose the pain would be as you say on check in - would need to try it out to see how much pain. :) One thing to note, one can't do a re-point with changed files not checked in, so  you'd almost always need to check in your tag file before doing a re-point.
Or revert the tag file before checkin/repoint.

Hmm.. reverting the tag file.. I guess you could, but subsequently you'd need to 'catch up' your tag file the next time you switched contexts.


Quote
That assumes that the perf cost is coming from detecting the changed files.  Greggman already stated that only the changed files are being retagged, so that could avoid doing a dir scan of 40k files, but the tagging cost is the same.  Intuitively I'd expect the cost of retagging 3000 files to be much higher than the cost of doing a directory scan over 40,000 files, but I could be wrong... (especially since I use solid state drives these days so I don't experience seek time delays or degradation from disk cache misses.  ;))
I think that if the file is already in the buffer cache as a result of a git operation on it, then a quick second scan of it shouldn't hit the disk at all. Also, intuitively, I think that if you're walking through all of the 40k+ files looking for changed ones, that's re-doing a lot of work that git just did.
I was already assuming the files are cached.  The parsing, computation, and IO (incl writes) on the tag file are the primary costs.  When I do a sync that updates 150k files out of 400k total, SE took several seconds to scan and find the 150,000 files that need to be tagged, and then SE took up to 15 minutes to actually retag those files (those numbers are from SE14 on 2yr old hardware with rotational drives, not ssd).  That's why I don't think it would make any noticeable improvement to get the list of changed files from git -- it would micro-optimize the "several seconds" part, without addressing the "15 minutes" part.

Watching SE16 running here, I see that it's using about 15% of my CPU; given that I'm roughly 80% idle, it must be I/O bound.

Initially I had my tag file on the same device as my sandbox, which I thought was maybe a mistake, so I've started again, this time with my home dir, tag/proj file dir, and sandbox each on their own separate disk. For the first 20k or so files I saw no writes to the tag file - it must have all been in memory; following that, I see regular bursty 'flushes' of the tag file to disk. On the sandbox disk I'm seeing about 2MB/s of reads and a small trickle of writes (not sure what that's about - maybe filesystem metadata updates to record last access time); this is more or less steady state and looks to be the bottleneck.
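(For anyone following along, sysstat's iostat shows these per-device rates from a terminal:)

Code: [Select]
$ iostat -xm 1   # extended per-device stats in MB/s, refreshed every second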


Quote
Also to consider: depending on the git operation, you're not guaranteed that those files that you're pulling in haven't changed since you last looked at them, and so they would need to be re-indexed anyway.  Of course, git would know this - I think it generates a unique hash per file, based on contents most likely.  If you tracked those hashes you could be even smarter and use that to whittle down the list that would need re-indexing.
The SE tag file already includes timestamps of the files.  Git only updates the timestamps on files that changed.  So if the tag file is swapped out by git, then SE will already quickly see which files are different relative to what the tag file knows about, and SE retags only the files that aren't already up-to-date in the newly-swapped-in tag file.  That's how & why storing the tag file in git can potentially make a big improvement to the retagging cost.

I was initially skeptical, but after timing the 'find' command it seems that I can scan the metadata of all the files in the sandbox in about 54 seconds. Now, having said that, the bottleneck I'm seeing is in the reading of the files' contents, not in the tag output (write) side or CPU.
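The timing was of a plain metadata walk, something along these lines (paths illustrative):

Code: [Select]
$ time find /sandbox -type f -newer /tmp/last-tag-marker > /dev/null
# stats every file's metadata without reading any contents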


Title: Re: Speed on Linux again
Post by: Clark on June 21, 2011, 02:07:30 pm
I haven't tested how well checking tag files into git helps.

I'm more concerned about how well background tagging works. You should be able to continue working. I simulated this situation by touching the date of 40,000 files and then switching back to SlickEdit to force a massive background tag. I have quad-core Windows and Linux machines. I was able to continue working with OK performance. I suspect if you have a really slow graphics card for Linux, that would be a big problem. A notebook computer might be a problem as well (I've got an old notebook which I will test).

One thing I didn't like is that if I pressed Ctrl+Period to look for a tag that did not exist, it switched to synchronous tagging. This is a good default for this, but maybe we need an option. Note that if you press Ctrl+Period to look for a tag and SlickEdit starts synchronously tagging, you can press the Cancel button and SlickEdit will continue tagging in the background.

One thing I was wondering is, when git switches to a different branch, how does your build still work? It seems to me you would need to check in the binaries, delete the binaries, or touch all the source files. If git is just changing links, I would expect no dates to change.
Title: Re: Speed on Linux again
Post by: atrens on June 21, 2011, 04:48:04 pm

Quote
I haven't tested how well checking tag files into git helps.

I haven't tried this either.

Quote
I'm more concerned about how well background tagging works. You should be able to continue working. I simulated this situation by touching the date of 40,000 files and then switching back to SlickEdit to force a massive background tag. I have quad-core Windows and Linux machines. I was able to continue working with OK performance.

First I should start by saying that my desktop is a few years old, but still respectable.  Quad-core Xeon 2GHz clock. FireGL graphics card I think. 2 relatively quick 7200 rpm drives and a third 10000 rpm drive.

I can do some work with background tagging running, but symbol def/ref lookups are pretty slow.

Quote
I suspect if you have a really slow graphics card for Linux, that would be a big problem. A notebook computer might be a problem as well (I've got an old notebook which I will test).

Haven't noticed any graphics speed related issues.

Quote
One thing I didn't like is that if I pressed Ctrl+Period to look for a tag that did not exist, it switched to synchronous tagging. This is a good default for this, but maybe we need an option. Note that if you press Ctrl+Period to look for a tag and SlickEdit starts synchronously tagging, you can press the Cancel button and SlickEdit will continue tagging in the background.

I'm kind of agnostic about this. If I can't find a symbol, that is kind of disrupting my thought process - especially if I'm looking for a bug - so if synchronous tagging would expedite this, then that would be helpful. On the other hand, it's critical to see all the references for a symbol... or, if not, then an indication that the list is potentially incomplete - because when you're wading through 50,000+ files looking for a function that you plan on changing, missing an instance can lead to great unhappiness.

Quote
One thing I was wondering is, when git switches to a different branch, how does your build still work? It seems to me you would need to check in the binaries, delete the binaries, or touch all the source files. If git is just changing links, I would expect no dates to change.

We use a custom BSD-style build where the objects get placed in a separate output directory tree - this helps because one build often contains object code for multiple CPU targets - each set of output objects gets stored in its own subtree.

Of course none of this gets checked into git.

I've found that the only safe thing to do when significantly shifting the baseline of one's sandbox is to blow away the whole output tree and start afresh. For me, build correctness is the most important thing - the extra time it takes to rebuild in these situations is insignificant when compared with the time taken to track down subtle dependency-related build problems.

Now, if I'm advancing my sandbox whilst staying on the same branch, say rebasing to a new head on the current branch, then I'll leave the object tree in place and trust the dependency engine to figure out what needs to be rebuilt.

Git doesn't just change links per se; it will reset the file dates to reflect the file dates at that particular point on the branch. If you're moving backwards on the branch, some file dates will go back in time to reflect that changes have been unwound; if you're advancing on the branch, some file dates will move forward to reflect changes made to those files. I think that git sort of treats changes like patches, in the sense that it has a list of patches to apply as you're going forward and to revert as you're moving back. Each patch gets applied with a specific timestamp.

Cheers,

--Andrew

Title: Re: Speed on Linux again
Post by: Clark on June 21, 2011, 05:45:13 pm
Thanks Andrew. Git does what I expected. By "not changing the dates" I meant not changing the dates to be the current date (i.e., touching all files). Maybe find-ref should prompt whether to complete background tagging first? Or maybe there needs to be an option for this?

We've been doing more work on optimizing background tagging. We are trying to reduce the time things are exclusively locked. Hopefully some of our ideas will pan out.

Clark
Title: Re: Speed on Linux again
Post by: atrens on July 25, 2011, 08:15:01 pm
Quote
Thanks Andrew. Git does what I expected. By "not changing the dates" I meant not changing the dates to be the current date (i.e., touching all files). Maybe find-ref should prompt whether to complete background tagging first? Or maybe there needs to be an option for this?

We've been doing more work on optimizing background tagging. We are trying to reduce the time things are exclusively locked. Hopefully some of our ideas will pan out.

Clark
Hi Clark,

I just installed 16.0.1 and it seems to be much more responsive whilst background tagging is running, and starting a Symbol Reference Lookup now asks me to wait for background tagging to complete, which I'm OK with. The only thing (maybe this is intentional) is that the Background Tagging progress bar and gkrellm seem to indicate that progress on the background tagging (effectively foreground now, since I've opted to wait for it to complete prior to starting my symbol reference lookup) is a bit bursty - I see short bursts of disk activity, followed by short bursts of CPU usage, followed by progress bar updates - lather, rinse, repeat. So the CPU cores are not pegged and neither is the disk I/O. I'm thinking this is deliberate for background tagging, and probably quite nice wrt keeping the UI responsive, but if it were possible to tweak it to run flat-out when flipped into this pseudo-foreground mode, it might run a bit faster.

Thanks again!

--Andrew