Author Topic: search speed vs grep  (Read 8295 times)

greggman

  • Senior Community Member
  • Posts: 280
  • Hero Points: 14
search speed vs grep
« on: May 23, 2012, 11:56:41 PM »
I never had reason to think that SlickEdit is slow to search across files, that is, until I started using Linux and grep.

My project has a lot of files (http://dev.chromium.org). In SlickEdit I pick "Find in Files", set it to the root of my project, set Exclude to ".git/" and Include to *.cpp;*.cc;*.h, with regular expressions off, and type in something like "Canvas2DRenderingContext".  (PS: there are no matches for Canvas2DRenderingContext)

Compared to

   grep -r --include="*.cc" --include="*.h" --include="*.cpp" CanvasRenderingContext2D .

grep is at least an order of magnitude faster. If the file cache is hot (meaning it's a second search, but for a different word), grep is probably 2 to 3 orders of magnitude faster than SlickEdit. Sometimes it's so fast I'm amazed it actually was able to check all the files. At this point I've mostly stopped using SlickEdit search for a recursive search :-(

A couple of ideas if you guys feel like looking into it.

1) getting a recursive listing of files like this

   ls -1R | wc -l

takes < 2 seconds on a second run. I don't know what ls does that makes it so fast.

2) Googling 'why is grep so fast' brought up this

http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html
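The short version of that post: GNU grep wins mainly by not looking at every input byte, using a Boyer-Moore skip table so a mismatch lets it jump ahead by up to the whole pattern length. A rough illustration of the skip-table idea (a Boyer-Moore-Horspool sketch written for this post, not grep's actual code):

```python
def horspool_search(text, pattern):
    """Boyer-Moore-Horspool: on a mismatch, shift ahead based on the
    last byte of the current window instead of advancing one byte."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return -1
    # Bad-character table: allowed shift when the window ends in byte c.
    # The last pattern byte is excluded so shifts are always >= 1.
    skip = {c: m - i - 1 for i, c in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Bytes absent from the pattern let us skip a full pattern length.
        pos += skip.get(text[pos + m - 1], m)
    return -1
```

The other tricks the post describes (raw reads into large buffers, not splitting input into lines, mmap) attack per-line and per-byte overhead rather than the matching itself, which lines up with the Slick-C per-file overhead mentioned below.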

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 6882
  • Hero Points: 530
Re: search speed vs grep
« Reply #1 on: June 15, 2012, 10:15:17 PM »
sgrep, which ships with SlickEdit, implements all of the optimizations listed there except for memory mapping. SlickEdit breaks files into lines and has a ton of Slick-C overhead per file.

As for recursive listing, there are two code paths. I'm not sure if SlickEdit or sgrep uses the fastest listing code. It depends on the options passed to our library routines.

I use sgrep in the build window instead of SlickEdit's Find mainly because I'm old fashioned, but it is way faster and it supports SlickEdit's next-error. The SlickEdit search supports color coding, and sometimes I need that.

I'm on vacation for a while; give sgrep a try.

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 6882
  • Hero Points: 530
Re: search speed vs grep
« Reply #2 on: June 16, 2012, 12:29:27 AM »
Here's a sample recursive sgrep command:

sgrep main -t "*.cpp" "*.h"

chrisant

  • Senior Community Member
  • Posts: 1410
  • Hero Points: 131
Re: search speed vs grep
« Reply #3 on: June 22, 2012, 06:24:10 PM »
I hadn't realized the amount of overhead in SE find in files.
Findstr smoked SE for a simple search across ~50,000 files.

It would be very (very) useful to have a macro that invokes an external search (e.g. sgrep, especially since it's included with SE) and then massages the results into a format that works in the Search Results buffer.  To get the best of both worlds:  performance and useful UI.

(Or to have a way for SE to optimize and skip overhead when doing searches only on context, e.g. not involving syntax coloring options.  Paying the overhead cost only when it's necessary could speed up searches by a lot.)

EDIT:  I mean run an external search tool asynchronously...  :)

nicktook

  • Community Member
  • Posts: 14
  • Hero Points: 0
Re: search speed vs grep
« Reply #4 on: June 22, 2012, 07:51:44 PM »
I edit files on my Windows PC over a Samba share to a Linux box.  Find in Files in this environment was very slow.  Instead I implemented a macro to ssh into the Linux box.  There it runs a program I wrote to grep the files and reformat the result in a format similar to the Find in Files result.  With a hot cache I can search a 200MB set of files in less than 2 seconds.  If interested I can post the macro, but the remote program is tangled up with other projects.  I might be able to post a non-functional version of it.
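For anyone curious what the moving parts look like, a minimal sketch of that kind of remote search (the host name, include globs, share prefix, and output massaging here are placeholders made up for illustration; the real macro and remote program will differ):

```python
import subprocess

def parse_grep_line(line, share_prefix):
    """Turn one 'path:lineno:text' line from grep -n into a result whose
    path points at the Samba share instead of the remote filesystem."""
    path, lineno, text = line.split(":", 2)
    return share_prefix + path.replace("/", "\\"), int(lineno), text

def remote_grep(host, remote_root, pattern, share_prefix="\\\\linuxbox"):
    """Run grep on the Linux box over ssh and map the hits back to
    share-relative Windows paths. Host and prefix are hypothetical."""
    cmd = ["ssh", host, "grep", "-rn", "--include=*.cc", "--include=*.h",
           pattern, remote_root]
    out = subprocess.run(cmd, capture_output=True, text=True)
    return [parse_grep_line(l, share_prefix) for l in out.stdout.splitlines()]
```

The path rewriting is the important half: once the results use share paths, the editor can open them directly.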

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 6882
  • Hero Points: 530
Re: search speed vs grep
« Reply #5 on: June 25, 2012, 08:16:54 PM »
Quote from: chrisant on June 22, 2012, 06:24:10 PM

    I hadn't realized the amount of overhead in SE find in files.
    Findstr smoked SE for a simple search across ~50,000 files.

    It would be very (very) useful to have a macro that invokes an external search (e.g. sgrep, especially since it's included with SE) and then massages the results into a format that works in the Search Results buffer.  To get the best of both worlds:  performance and useful UI.

    (Or to have a way for SE to optimize and skip overhead when doing searches only on context, e.g. not involving syntax coloring options.  Paying the overhead cost only when it's necessary could speed up searches by a lot.)

    EDIT:  I mean run an external search tool asynchronously...  :)

This is possible but easier said than done. Off the top of my head, sgrep would need a bunch of modifications (a socket to communicate with SlickEdit to handle resident buffers and output, not redirection; auto-delete of a list file). We would also need to change the column output to be a physical offset/column as if tabs in the line were not expanded. We already do this for compilation error processing.

By far the biggest problem is handling encoding. Like Unix grep, sgrep has no support for file encodings (or, in SlickEdit's case, ridiculously configurable encoding support). sgrep would need SlickEdit to provide a precalculated list of files with the encoding information, and it would need to be enhanced to support active code page and UTF-8 searches like SlickEdit. Writing out the list of files and passing them to sgrep would cost something. The alternative is to have SlickEdit pass a ton of config information to sgrep and let it do the encoding determination work, but I don't like this idea.

At some point, we should give this a go.

greggman

  • Senior Community Member
  • Posts: 280
  • Hero Points: 14
Re: search speed vs grep
« Reply #6 on: June 26, 2012, 04:15:51 PM »
Another idea would be to default to the fast 'sgrep' search and give an option for handling all those slower, harder cases.

As a programmer, most of the time I'm searching code for function names or variable names that are in the 0x20 to 0x7F range. I'd be happy to have to pick a different option to search out of that range if it meant search was faster for the more common case.
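Gating on that common case would be cheap to check; a tiny sketch of the test (the range is the 0x20 to 0x7F one suggested above):

```python
def is_plain_ascii(needle):
    """True if every character of the search string falls in the
    printable-ASCII range 0x20-0x7F, so a byte-oriented fast search
    could run without any encoding work."""
    return all(0x20 <= ord(ch) <= 0x7F for ch in needle)
```

A search front end could run the fast path when this returns True and fall back to the encoding-aware path otherwise.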

chrisant

  • Senior Community Member
  • Posts: 1410
  • Hero Points: 131
Re: search speed vs grep
« Reply #7 on: June 26, 2012, 05:53:05 PM »
Quote from: greggman on June 26, 2012, 04:15:51 PM

    Another idea would be to default to the fast 'sgrep' search and give an option for handling all those slower, harder cases.

    As a programmer, most of the time I'm searching code for function names or variable names that are in the 0x20 to 0x7F range. I'd be happy to have to pick a different option to search out of that range if it meant search was faster for the more common case.
Yes.

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 6882
  • Hero Points: 530
Re: search speed vs grep
« Reply #8 on: June 26, 2012, 07:34:39 PM »
We will have to see how fast we can calculate encodings. If it takes too long, a different interface will be necessary. I hope it's possible with one option; it's so much more user friendly.