I have to say, I really am impressed with how effectively the tagging engine works. I've been running some stress tests to see just what VSE and the tagging engine are capable of, but I do have a couple of questions about what it is actually doing.
My goal in these efforts is to troubleshoot a problem in a project. The project has around 2,300 source files and takes up to three hours to finish tagging. I'm in the dark about what is in those files, but I imagine it can't be worse than what I'm putting the tagging engine through. I just completed a run of 10,000 files with about 10 taggable elements in each file, and it took less than a minute. Now I'm in the middle of a run of 100,000 files with 20 elements each, which comes to 16,000,000 lines of source code. Admittedly, it is taking a bit longer, but at its current rate it will still finish in under 20 minutes.
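To put the gap in numbers, here is a quick back-of-the-envelope comparison using only the figures above. It assumes the quoted times are the elapsed times ("less than a minute" and "under 20 minutes" are treated as 60 s and 1,200 s, so the stress-test rates are lower bounds), and it says nothing about file size, since the contents of the real project's files are unknown.

```python
# Figures quoted in the post. The stress-test times are upper bounds,
# so the throughputs computed for them are lower bounds.
runs = {
    "stress test, 10,000 files": (10_000, 60),
    "stress test, 100,000 files (projected)": (100_000, 20 * 60),
    "real project, 2,300 files": (2_300, 3 * 60 * 60),
}


def files_per_second(files: int, seconds: float) -> float:
    """Average throughput implied by a file count and an elapsed time."""
    return files / seconds


def lines_per_file(total_lines: int, files: int) -> float:
    """Average file length implied by a total line count."""
    return total_lines / files


if __name__ == "__main__":
    for name, (files, seconds) in runs.items():
        print(f"{name}: {files_per_second(files, seconds):.2f} files/s")
    # The 100,000-file run totals 16,000,000 lines.
    print(f"average lines per stress-test file: {lines_per_file(16_000_000, 100_000):.0f}")
```

This prints roughly 166.67 and 83.33 files/s for the two stress runs, 0.21 files/s for the real project, and 160 lines per stress-test file, so the project is tagging several hundred times more slowly per file than the synthetic tests.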
What I'm interested in knowing is what exactly the Tagging Cache does. My theory was that a larger cache size would vastly increase performance on large tagging runs, but I've been running the stress tests with a cache size of 512KB (rather than the default 32MB) with little to no noticeable difference. I have not tried the 100,000-file run with the larger cache, but I don't expect a large change in time. What is the purpose of the Tagging Cache? Is it merely a paging file for when the tagging engine exceeds available memory?
Also, is there any way to display the contents of a VTG file other than the listvtg tool? If not, does listvtg accept any parameters for writing its output to a file?