SlickEdit Community

SlickEdit Product Discussion => SlickEdit® => Topic started by: jporkkahtc on November 17, 2021, 08:28:53 pm

Title: FindInFiles: Performance
Post by: jporkkahtc on November 17, 2021, 08:28:53 pm
FiF seems a bit slow - not sure if it is slower than it used to be though.

Seems like this has something to do with Slick language support, though I've not tried hard to confirm this.
For example, when I did a FiF<WorkSpace>/*, at some point Slick complained about:

Code: [Select]
DTD 'http://schemas.microsoft.com/developer/msbuild/2003' for file 'C:\src\plugins2blahblah.csproj' not found.
This processing is needed for better color coding and auto completions but can cause delays.
So, I know that Slickedit is doing some processing on each file it searches.

Is there a way to dumb-down the searching?

Last week I was doing a batch of Search & Replaces. I started by using a SlickEdit macro with several instances of:
Code: [Select]
_mfreplace2(search_string, replace_string,"LI*",'+t C:\src\plugins\blah\blah',"*.md","<Default Excludes>");
And found it was running pretty slowly.
I rewrote the macro in Python - doing the same regex F&R operations, and it was dramatically faster.
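For comparison, here is a minimal sketch of the kind of external script being described, assuming a recursive regex find & replace over a directory tree. The function name and layout are illustrative only, not the actual script that was uploaded to support:

```python
import re
from pathlib import Path

def replace_in_files(root, glob_pattern, search_re, replacement):
    """Apply a regex replacement to every matching file under root.
    Returns the number of files actually modified."""
    changed = 0
    pattern = re.compile(search_re)
    for path in Path(root).rglob(glob_pattern):
        text = path.read_text(encoding="utf-8")
        new_text, count = pattern.subn(replacement, text)
        if count:  # only rewrite files that actually changed
            path.write_text(new_text, encoding="utf-8")
            changed += 1
    return changed
```

Reading each whole file into memory is exactly the shortcut an editor-integrated replace can't always take, which is part of why a standalone script can be so much faster.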

Title: Re: FindInFiles: Performance
Post by: Clark on November 17, 2021, 09:11:39 pm
The DTD message sounds like SlickEdit is fetching a DTD for an XML file. That could make things crazy slow. I wouldn't think a multi-file find would cause that to happen. However, a multi-file replace could definitely cause that to happen. If you're getting this with a multi-file find, please provide me with steps to reproduce this. We plan to add support for disabling the fetching of DTDs in XML files.

As for the _mfreplace2 call, I would need a test case to look into it. v26 usually does the multi-file find threaded, but the replaces are all done on the main thread, which slows things down a little.
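The split described here (threaded find, main-thread replace) can be sketched in Python with a thread pool for the scan phase and a plain loop for the write phase. All names are illustrative; this is not SlickEdit's implementation:

```python
import re
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def file_matches(path, pattern):
    """Worker: scan one file, return the path if it contains the pattern."""
    text = Path(path).read_text(encoding="utf-8")
    return path if pattern.search(text) else None

def threaded_find_serial_replace(paths, search_re, replacement):
    """Search files in parallel, then apply replacements one file at a
    time on the calling thread (mirroring the find-threaded /
    replace-serial split). Returns the list of modified paths."""
    pattern = re.compile(search_re)
    with ThreadPoolExecutor() as pool:
        hits = [p for p in pool.map(lambda p: file_matches(p, pattern), paths) if p]
    for path in hits:  # serial phase: all writes happen here
        text = Path(path).read_text(encoding="utf-8")
        Path(path).write_text(pattern.sub(replacement, text), encoding="utf-8")
    return hits
```

The find phase scales with threads, but total time stays dominated by the serial write phase when most files match, which is consistent with the slowdown being discussed.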
Title: Re: FindInFiles: Performance
Post by: jporkkahtc on November 23, 2021, 08:48:46 pm
OK, so I've done some testing.
With a clean configuration doing a Search & replace over 447 files takes 18 seconds.
With my configuration it takes 236 seconds.

With a simple Python script it takes 4 seconds.

I will upload my sample data, macros and python script to Support - under "JPorkkaPerformance"

Title: Re: FindInFiles: Performance
Post by: Clark on November 24, 2021, 04:01:48 am
I would like to figure out why your configuration is much slower.

Can you post your user.cfg.xml?

In my testing, most of the time is being spent saving the file.
Here's where the save time is spent:

   * Backup history is very expensive. If you turn that off, this gets about twice as fast.
   * The "save" command displays a message on the message line. This seems to eat a lot more time than you would think.
   * Multi-file undo takes a small amount of extra time. This means that SlickEdit has to make a copy of the original file so you can undo the entire multi-file replace operation. I don't think it's worth it to turn off because it wouldn't speed things up much.

We could add an option to the function you are using to disable backup history for the replace. For a typical user, I wouldn't recommend this. As for the save message, I'll have to look into this further to find the best way to improve it.
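One way to see where save time actually goes is to time each phase of the save separately. A rough sketch of such a harness, where the phase names are placeholders and the `time.sleep` calls stand in for real work like snapshotting and writing:

```python
import time
from contextlib import contextmanager

@contextmanager
def phase(name, totals):
    """Accumulate wall-clock time per named phase of a save."""
    start = time.perf_counter()
    yield
    totals[name] = totals.get(name, 0.0) + time.perf_counter() - start

totals = {}
with phase("backup_history", totals):
    time.sleep(0.01)  # stand-in for snapshotting the file
with phase("write_file", totals):
    time.sleep(0.01)  # stand-in for the actual disk write
```

Summing the totals over a whole multi-file replace gives the kind of per-feature breakdown reported later in this thread.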
Title: Re: FindInFiles: Performance
Post by: Clark on November 24, 2021, 09:32:20 am
Do you have these *.md files in a project which is currently open? If so, that adds a lot of time due to your files needing to be retagged. For me, this makes things about 3 times slower. Not sure if this explains why you're getting 236 seconds. It might. My machine is MUCH faster than yours.

Try closing your workspace and testing the multi-file replace speed to see how much faster this is.

Title: Re: FindInFiles: Performance
Post by: jporkkahtc on November 24, 2021, 05:49:41 pm
Please, send me a faster machine for testing this. :-)

These files are not part of a workspace -- for this test I had them in C:\Temp\Slick\...

For today's testing, I've renamed all the files from *.md to *.PlainText to avoid Slick trying to recognize anything.

Closing my workspace and turning off backup history reduces the time to 18 seconds.
Turning history back on: 37 seconds.
With my usual workspace open, History On: 237 seconds.
With my usual workspace open, History Off: 223 seconds.
Title: Re: FindInFiles: Performance
Post by: jporkkahtc on November 24, 2021, 06:08:18 pm
I uploaded my project files and user.cfg.xml to support.
Title: Re: FindInFiles: Performance
Post by: Clark on November 25, 2021, 01:43:30 pm
When I open your workspace, it has no effect on performance. However, when I open your workspace and use your config, then I see performance degrade by more than 3X. If you turn off def_git_show_file_status, then it doesn't happen.

The worst test case is to create a project which has these files in it and then run your test case. That's because Markdown files use a proc search function which has to be run on the main thread.

In any case, we've got plenty to go on. Thanks for helping us figure this out.

Here's the save overhead in order of most overhead first.

1. Retags file. This is especially slow for a language like Markdown, where a proc_search function is used. Slows multi-file replace time by about 8 times when proc search is used.
2. Version control file status updating. This causes too much file I/O (done on a thread). Slows multi-file replace time by about 3 times. Sometimes the overhead is more like 40%. I don't know why this can vary so much.
3. Backup History. This roughly doubles multi-file replace time.
4. Displaying the save message is pretty expensive. Adds about 20% on Windows.
5. Multi-file undo data. This doesn't slow things down much. Maybe 10% slower.

I think I can automate some optimizations for 26.0.1. I've got good ideas for items 1 and 2. Item 3 (backup history) may require an option. Item 4 can be fixed by not displaying a message for every file saved. Best to leave item 5 alone. With these optimizations, the replace takes about 4 seconds on my machine. This would be more like 14-18 seconds on your machine.

Even with these optimizations, it won't be as fast as your Python script for this type of test case, where most files being searched need to be saved. There is some overhead we can't get rid of without a difficult rewrite. SlickEdit's multi-file replace supports files larger than memory, so we can't just load the whole file into memory and replace the instances found. cmgrep, which is used for a multi-file find, has a very primitive editor-like file cache, but currently it isn't writable at all. Maybe we could optimize small files by just loading the whole file into memory and replacing the instances found (files less than, say, 1 gig). Although, backup history will be a performance issue unless it's disabled for files not already loaded in the editor.
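To illustrate the larger-than-memory constraint mentioned above: a replace that can't load the whole file has to stream through it. A hedged sketch of one line-at-a-time approach in Python (it assumes matches never span a line boundary, an assumption a real editor engine can't make):

```python
import os
import re
import tempfile

def stream_replace(path, search_re, replacement):
    """Replace line by line, writing to a temp file in the same directory,
    then atomically swap it in. Memory use is bounded by the longest line."""
    pattern = re.compile(search_re)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out, \
             open(path, "r", encoding="utf-8") as src:
            for line in src:
                out.write(pattern.sub(replacement, line))
        os.replace(tmp, path)  # atomic when on the same filesystem
    except BaseException:
        os.remove(tmp)
        raise
```

Note that this doubles the write traffic (every byte is rewritten, matched or not), which hints at why a writable file cache in cmgrep would be the cleaner long-term fix.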
Title: Re: FindInFiles: Performance
Post by: Clark on November 26, 2021, 01:21:52 am
I did a quick analysis for doing many replaces in a continuous string. This is waaaaaaay too slow when replacing many occurrences. In other words, the ultimate solution requires us to enhance our read-only mini-editor in cmgrep with read-write capabilities. Hopefully we can do this for v27.
Title: Re: FindInFiles: Performance
Post by: Clark on November 26, 2021, 05:54:28 pm
After optimizing this, I'm getting about 2.3 seconds with a release build on Windows. The old code was about 2.75 seconds if backup history and def_git_show_file_status are off.

On Linux, with the new optimizations I get about .4 seconds for the same test case. If you want your slow machine to run faster, I suggest using Linux instead of Windows. That's what I use for my primary work machine. Once in a while I need to use the Visual Studio debugger on Windows. Although, I found that even the Visual Studio debugger sometimes gives me problems.

A while ago, I studied file system performance and found that the Linux and macOS file systems are on average 30% faster than Windows depending on the operation. The Windows file system has a lot of overhead that most people don't benefit from. I prefer Linux to macOS because it's easier to make it work more like the Windows window manager. I wouldn't recommend Linux to a non-savvy computer user.
Title: Re: FindInFiles: Performance
Post by: Clark on November 29, 2021, 10:57:22 pm
The multi-file undo data is more expensive than I thought. Now when I turn that off (manually, no option yet), it cuts the time in half. At a minimum, we can't turn off multi-file undo by default. Too bad Windows is so much slower than Linux at all this file I/O.
Title: Re: FindInFiles: Performance
Post by: jporkkahtc on December 01, 2021, 07:55:44 pm
WRT MFUndo: it's interesting that Slick keeps a before and after copy of the file.
So before saving a modified file, it makes 2 copies?
It doesn't do any "move" operations - I suppose you cannot for several reasons.

Also, it adds entries in VSDelta as well as MFUndo.

A: Couldn't MFUndo make use of the VSDelta instead of creating 2 more copies?

B: Instead of creating individual files in MFUndo, create a ZIP with all the files. You could even disable compression to avoid that overhead. This would reduce the file-operations overhead on Windows quite a bit. I think Windows I/O isn't so bad in general, but creating files has a lot more overhead than on Linux.
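The idea in B can be sketched with Python's zipfile module: appending uncompressed (ZIP_STORED) entries to a single archive replaces many small file creations with one archive append. This is only an illustration of the suggestion, not how MFUndo actually works:

```python
import zipfile

def archive_undo_copies(archive_path, entries):
    """Bundle per-file undo snapshots into one uncompressed ZIP archive.
    entries is a list of (member_name, bytes) pairs; ZIP_STORED skips
    compression so only the file-creation overhead is saved."""
    with zipfile.ZipFile(archive_path, "a", compression=zipfile.ZIP_STORED) as zf:
        for name, data in entries:
            zf.writestr(name, data)
```

One archive per multi-file operation would also make discarding the undo data a single delete instead of a directory sweep.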

C: When copying files to MFUndo, Slick uses a buffer size of "32000" instead of a nice power of two.
Hmm... this is weird.
1st: Writing to MFUndo files, Slick goes 32000 bytes at a time (alternating reads from the original, writes to MFUndo).
2nd: After creating the 1st file in MFUndo, it writes the whole original file using odd sizes: 32765, 32738, 32692, 32745 and finally 8680. (Maybe WriteFileGather() would help out if the buffer is segmented?)
3rd, the original file is written again, this time 8192 bytes at a time. (The file is also open as a buffer).
4th: A VSDelta for this file happens.
5th: The 2nd file in MFUndo is written, 32000 bytes at a time.
6th: The entire original file is read, 8192 bytes at a time.

Seems like there are some unneeded operations there.
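On the buffer-size point in C: in a user-level sketch the chunk size for a buffered copy is just a parameter, so a power of two is trivial to use. This is illustrative only; whether it measurably helps for small files is a separate question:

```python
import shutil

def copy_with_buffer(src, dst, bufsize=32768):
    """Buffered whole-file copy. bufsize here is 2**15, versus the
    observed 32000-byte chunks; shutil exposes the chunk size directly."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, length=bufsize)
```

For files of a few hundred KB the dominant cost is usually the open/create syscalls rather than the read/write chunking, which matches the later reply in this thread.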

Title: Re: FindInFiles: Performance
Post by: Clark on December 01, 2021, 08:51:55 pm
Multi-file undo supports redo. That's why the before and after copies are made.

A. Backup history is a totally separate feature from multi-file undo. Backup history really was not built to be a version control system. Backup history delta files are automatically deleted after a number of saves occur to save disk space.

B. Too bad Windows file I/O is so much slower than Linux. Streaming all files to one file would probably help Windows performance because there would be fewer file opens which are more expensive. Maybe a future release. I think compression would hurt performance assuming you have a local SSD.

C. One change I already made was to try to preload the entire file. That seems to speed up the save. The reads of ~8192 are more of a limitation of the editor engine. It is possible to indirectly increase this with the -sm invocation option (try -sm 1000). I doubt changing read/write sizes will change performance much for these small files.