Topic: BackupHistory performance

jporkkahtc

  • Senior Community Member
  • Posts: 2620
  • Hero Points: 210
BackupHistory performance
« on: February 24, 2017, 10:13:55 PM »
When set to use BackupHistory on save...

#1: I set MaxBackupSize to 75MB
This makes saving large files *really* slow - mostly because Slick does a DIFF operation on two really big files ... diff is pretty slow.

In the case of large files, Slick could keep history as N full-copies of the files and not do any diffing.
    VSDelta\c\temp\LargeFile.txt.vsdiff.1
    VSDelta\c\temp\LargeFile.txt.vsdiff.2
    VSDelta\c\temp\LargeFile.txt.vsdiff.3
    VSDelta\c\temp\LargeFile.txt.vsdiff.4

I have plenty of disk space to burn - it's cheaper than my time - so please, use the extra space!

#2: In Tools > Options > File Options > Backup, there should be two sets of options: how to back up large files and how to back up small files.

Then, I could choose History for small files, and CreateBackupFileOnFirstSave for large files.

Also, for small files I could choose "Use VSDelta" versions, and for large files I could choose "Number of full-copies to keep for backups".


#3:
If the file is too big (over the max size to back up), then Slick doesn't do a safe-save or keep any extra copies of the large file around: https://community.slickedit.com/index.php/topic,14136.msg55516.html#msg55516

When not keeping history on a file, Slick ought to do a safe-save when possible to avoid data loss on an error during saving. (In the extremely unlikely case of running out of disk space, prompt the user to delete the backup and continue saving.)
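To illustrate what I mean by safe-save, here is a minimal Python sketch. It is not how SlickEdit implements saving; `safe_save` is a hypothetical name, and the sketch assumes the temporary file can be created in the same directory as the target. The new contents go to a temporary file first, and the original is only replaced once that write has fully succeeded.

```python
import os
import tempfile


def safe_save(path, data):
    """Write data (bytes) to path without clobbering the old file on failure.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final
    # replace does not have to cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".safe-save-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before swapping
        # Swap the fully written file into place over the original.
        os.replace(tmp_path, path)
    except BaseException:
        # The original file is still intact; discard the partial copy.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the write fails partway (for example, the disk fills up), the old file is untouched and only the temporary copy is lost.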

#4: Maybe if BackupOnSave had the option to keep multiple full copies, then integrating File history and Multifile Undo would get a little easier :-)


Graeme

  • Senior Community Member
  • Posts: 2796
  • Hero Points: 347
Re: BackupHistory performance
« Reply #1 on: February 26, 2017, 08:15:46 AM »
If the long save time gets too annoying, you could try a custom version of save_file. The code below is untested. It adds the +O switch to the call to _save_file (as does the "build_save_options" function in saveload.e) to prevent any backup. After the file has been saved, it copies the file into a child folder and cascades the copies as _1, _2, _3. Unfortunately I can't find a rename function in Slick, so it has to do multiple file copy commands. You could write a macro that diffs the current buffer against one of the cascaded files. In my SlickEdit backup options I have the backup path set to ./SlickBackups/, and it puts all my delta files in the SlickBackups child folder of the current buffer's folder.

Instead of cascading the files you could save the file to some other folder and use git auto commit to backup the file in the "background", though if slick has problems with the diff, git probably will too.  At work I use a custom version of save_file to make a backup to another drive.

Code:
/**
 * Writes current buffer to filename.  This function is a hook function
 * that the user may replace.  Options allowed by <b>_save_file</b>
 * built-in may be specified.
 * @param filename parameter should not contain options.
 *
 * @appliesTo Edit_Window
 *
 * @categories File_Functions
 *
 */
_str save_file(_str filename,_str options)
{
#if 0
   int renumber_flags=numbering_options();
   if (renumber_flags&VSRENUMBER_AUTO) {
      if (renumber_flags&VSRENUMBER_COBOL) {
         renumber_lines(1,6,'0',false,true);
      }
      if (renumber_flags&VSRENUMBER_STD) {
         renumber_lines(73,80,'0',false,true);
      }
   }
#endif

   typeless status;
   boolean big_file = false;
   if ( p_buf_size > 10000 * 1000 ) {
      big_file = true;
      // +O means overwrite with no backup
      status = _save_file(options :+ ' +O ' maybe_quote_filename(filename));
   }
   else {
      status = _save_file(options " "maybe_quote_filename(filename));
   }

   if (!status && file_eq(strip(filename,'B','"'),p_buf_name)) {
      if (p_modified_temp_name!='') {
         _as_removefilename(p_modified_temp_name,true);
         p_modified_temp_name='';
      }
      //_cbsave_filewatch();
#if 1
      call_list('_cbsave_');
      //10:51am 7/3/1997
      //Dan modified for auto-tagging
      if (def_autotag_flags2&AUTOTAG_ON_SAVE) {
         //messageNwait(nls('got here'));
         TagFileOnSave();
      }
#endif
   }

   if ( big_file ) {
      _str bpath = strip_filename(filename, 'N') :+ FILESEP :+ 'SlickBackups' :+ FILESEP;
      _str fname = strip_filename(filename, 'PDE');  // name without extension
      _str ext = get_extension(filename, true);  // ext with dot
      int res = 0;
      if (!path_exists(bpath)) {
         res = make_path(bpath);
      }
      if ( res == 0 ) {
         _str ff1 = bpath :+ fname :+ '_1' :+ ext;
         _str ff2 = bpath :+ fname :+ '_2' :+ ext;
         _str ff3 = bpath :+ fname :+ '_3' :+ ext;
         if ( file_exists(ff1) ) {
            if ( file_exists(ff2) ) {
               if ( file_exists(ff3) ) {
                  delete_file(ff3);
               }
               // prefer to rename ff2 as ff3 but there's no rename function available
               copy_file(ff2,ff3);
               delete_file(ff2);
            }
            // prefer to rename f1 as f2 but there's no rename function available
            copy_file(ff1, ff2);
            delete_file(ff1);
         }
         copy_file(filename, ff1);
      }
   }
   return(status);
}

jporkkahtc

  • Senior Community Member
  • Posts: 2620
  • Hero Points: 210
Re: BackupHistory performance
« Reply #2 on: February 26, 2017, 09:06:16 PM »
What about _file_move() ?
It seems to work, except for the case of moving filename to ff1 (presumably because that file is being held open by SlickEdit).
« Last Edit: February 26, 2017, 09:16:14 PM by jporkkahtc »

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • Posts: 6862
  • Hero Points: 528
Re: BackupHistory performance
« Reply #3 on: February 26, 2017, 11:44:20 PM »
Different options for large backup files would be nice. I'd still like SlickEdit to attempt a poor man's diff (just skip leading and trailing matching lines). I like having more backups.
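As a rough Python sketch of the "skip leading and trailing matching lines" idea (illustrative only, not SlickEdit code; `trim_common_lines` is a made-up name): only the middle region that actually differs needs to be stored or handed to a real diff.

```python
def trim_common_lines(old, new):
    """Return (prefix, suffix, changed_old, changed_new) for two line lists.

    prefix and suffix are the counts of identical leading and trailing
    lines; the changed_* lists are whatever remains in between.
    """
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Do not let the suffix overlap lines already counted in the prefix.
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return (prefix, suffix,
            old[prefix:len(old) - suffix],
            new[prefix:len(new) - suffix])
```

The new version can be rebuilt from the old one as `old[:prefix] + changed_new + old[len(old) - suffix:]`, so a backup only has to keep the two counts and the changed lines. This costs one linear pass, but it gives no savings when edits are scattered across the whole file.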

I'm pretty sure the save corruption with the beta Joe ran into was an optimization bug that we had to pull. Sorry about that one.

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • Posts: 6862
  • Hero Points: 528
Re: BackupHistory performance
« Reply #4 on: February 26, 2017, 11:52:37 PM »
I thought there was a timeout in backup history to prevent the diff from taking a long time. Maybe it needs to be optimized more or tweaked.

jporkkahtc

  • Senior Community Member
  • Posts: 2620
  • Hero Points: 210
Re: BackupHistory performance
« Reply #5 on: February 27, 2017, 06:47:20 PM »
The VSDELTA format is also pretty inefficient.

It stores
MostRecentVersion
Version0
Version1
...
VersionN-1

Since the most recent version of the file is stored at the top, the entire file must be read into memory and rewritten each time a new version is created.

The MostRecentVersion should be stored last - then when writing VersionN to the file, only VersionN-1 and VersionN need to be rewritten, and the entire file does not need to be read into memory.

Also, XML isn't great for this.

Storing an index at the end of the file would allow Slick to quickly get what it needs for displaying backup history, BackupHistoryBrowser and "Rebuild Save Log" without having to read the entire contents of each of the VSDelta files.

The very end of the file would look like:
/// Index
Version0=535, Comment="Comment Text"
Version1=800, Comment="Comment Text"
Version2=1000, Comment="Comment Text"

IndexAt=1500...................


Where the last 32 bytes of the file would always be "IndexAt...".
Slick could read the last 32 bytes; from that it knows exactly where the index is and can read it.
Except for actually recreating an old version of the file, no other data would need to be read.
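Here is a small Python sketch of that trailer scheme, just to make the idea concrete. It is not the VSDelta format: the function names, the use of raw bytes for each version, and the dot padding are my assumptions, and comments containing quotes or newlines are not handled.

```python
import io

TRAILER_SIZE = 32  # the fixed-size "IndexAt=..." record at the very end


def write_history(stream, versions):
    """Write (comment, payload_bytes) pairs, then the index and trailer."""
    index_lines = []
    for number, (comment, payload) in enumerate(versions):
        # Record where this version starts before writing it.
        index_lines.append(f'Version{number}={stream.tell()}, Comment="{comment}"')
        stream.write(payload)
    index_at = stream.tell()
    stream.write(("/// Index\n" + "\n".join(index_lines) + "\n").encode("utf-8"))
    # Pad the trailer with dots so it is always exactly TRAILER_SIZE bytes.
    stream.write(f"IndexAt={index_at}".encode("ascii").ljust(TRAILER_SIZE, b"."))


def read_index(stream):
    """Read only the trailer and the index, never the version payloads."""
    stream.seek(-TRAILER_SIZE, io.SEEK_END)
    trailer_start = stream.tell()
    trailer = stream.read(TRAILER_SIZE)
    index_at = int(trailer.rstrip(b".").split(b"=", 1)[1])
    stream.seek(index_at)
    text = stream.read(trailer_start - index_at).decode("utf-8")
    return [line for line in text.splitlines() if line.startswith("Version")]
```

Usage with an in-memory buffer: `buf = io.BytesIO(); write_history(buf, [("first", b"aaaa")]); read_index(buf)`. Listing the history then costs two small reads regardless of how large the stored versions are.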




For handling large files, binary files and XML files, the VSDelta could include the filename of the file with the text, instead of including the text in the VSDelta file itself. That is, instead of tags like "<Insert>", have a tag "<FullText>{DeltaRoot}\C\temp\BigFile.txt_1</FullText>". (Placing XML files separately would avoid processing the text to escape things that would confuse the XML parser).

This would allow integration with MultifileReplaceUndo work as well.

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • Posts: 6862
  • Hero Points: 528
Re: BackupHistory performance
« Reply #6 on: February 27, 2017, 07:20:46 PM »
Thanks Joe. We already realized this could be optimized more by putting the MostRecentVersion at the bottom. For handling many entries for large files, there would need to be an index at the bottom with seek positions (good idea!).

I don't think it's necessary to store the file data externally. The file IO is much more expensive than the required XML translation.

Not sure if we will have time to change this for v22 but I hope we do. Backup History is really important and it would be great for it to support much larger files.

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • Posts: 6862
  • Hero Points: 528
Re: BackupHistory performance
« Reply #7 on: February 27, 2017, 11:44:36 PM »
The above idea has a bad flaw. When an item gets deleted because you've reached the maximum (default is 400 backups/deltas), the entire file (which could be large) gets moved around.

A better solution that is simpler to implement is to store the Delta/MostRecent externally if the data is above a user-configurable size (maybe 20k by default). While you could always store the Delta/MostRecent data externally, you will get quite a bit of fragmentation due to the number of small files. Simply store the small Delta/MostRecent data inline. This scheme has many advantages. No index is needed because the .vsdelta file won't get large. It's very backwards compatible. Existing .vsdelta files don't have to be changed at all. This scheme could be extended to back up files that should not be loaded into memory. Right now, the code does all file differencing in memory.
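A rough Python sketch of the inline-versus-external decision, purely to illustrate the scheme. This is not the planned implementation: the entry layout, the file naming, and the function names are all hypothetical.

```python
import os

DEFAULT_THRESHOLD = 20 * 1024  # the "maybe 20k by default" cutoff from above


def store_delta(delta, version, external_dir, threshold=DEFAULT_THRESHOLD):
    """Return a history entry holding the delta inline or a path to it."""
    if len(delta) <= threshold:
        # Small data stays inside the history file itself.
        return {"version": version, "inline": delta}
    # Large data goes to its own file; the entry only records where.
    os.makedirs(external_dir, exist_ok=True)
    path = os.path.join(external_dir, f"delta_{version}.bin")
    with open(path, "wb") as out:
        out.write(delta)
    return {"version": version, "external": path}


def load_delta(entry):
    """Fetch the delta bytes for an entry, wherever they were stored."""
    if "inline" in entry:
        return entry["inline"]
    with open(entry["external"], "rb") as src:
        return src.read()
```

Because old entries are all inline, a reader written this way handles existing history unchanged, and dropping the oldest entry only means deleting one small record plus possibly one external file.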