Topic: Prevent insertion of UTF8 BOM (signature) when creating new file?

Brandon

  • Community Member
  • Posts: 36
  • Hero Points: 5
I frequently work with UTF-8 encoded files, so I have found it easier for my workflow to set VSE to treat all loaded files as UTF-8. I did this by setting the encoding on the Tools->Options->File options->Load tab to UTF8.

I need this because many of the files I work with have no BOM (byte order mark, a.k.a. signature), and remembering to set the encoding manually every time I loaded one was too aggravating.

The problem is that when I create new files, I use the Edit command and type a new name. This is effectively the same as loading a blank file, and VSE automatically inserts a BOM at the beginning of the file. I do not want this, because my codebase is shared with other developers who use different platforms, editors, and utilities that do not know what a BOM is; frankly, some people think the BOM is random junk and delete it anyway. I have also seen source control systems strip it out, or be fooled by it into treating the file as binary instead of text.
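
For reference, the UTF-8 signature is just the three bytes EF BB BF at the very start of a file. A minimal Python sketch (not SlickEdit code; the helper names has_utf8_bom and strip_utf8_bom are hypothetical) shows how a tool might detect it, and what "deleting the junk" amounts to:

```python
import codecs

# codecs.BOM_UTF8 is the 3-byte UTF-8 signature: b'\xef\xbb\xbf'
UTF8_BOM = codecs.BOM_UTF8

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 signature."""
    return data.startswith(UTF8_BOM)

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 signature (unchanged otherwise)."""
    return data[len(UTF8_BOM):] if has_utf8_bom(data) else data

with_bom = b"\xef\xbb\xbfint main() {}\n"
print(has_utf8_bom(with_bom))            # True
print(strip_utf8_bom(with_bom))          # b'int main() {}\n'
print(has_utf8_bom(b"int main() {}\n"))  # False
```

Stripping only touches those three leading bytes, which is why a file can silently lose its signature when it passes through a BOM-unaware tool.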

This behavior seems to be tied to the Load/Encoding option I just mentioned. I would like to find the macro source responsible so I can remove this behavior, but I have not been able to locate it.

If you look at the macros/setupext.e file, there is a large commented table showing the various encoding options. Interestingly, it includes an option for UTF-8, no signature, but that option does not show up in the available list. I cannot find the "real" list; perhaps it is coded into vsapi.dll.
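
The only difference between the two UTF-8 variants is those leading bytes. Python's standard codecs happen to model the same distinction ('utf-8' vs 'utf-8-sig'), so a short sketch can show what each choice writes and how a BOM-unaware reader sees it. Python is used here purely for illustration; this says nothing about how SlickEdit implements its own encoding table.

```python
# 'utf-8'     ~ "UTF-8, no signature": writes only the encoded text
# 'utf-8-sig' ~ "UTF-8 with signature": prepends EF BB BF on encode
text = "hello"

no_sig = text.encode("utf-8")
with_sig = text.encode("utf-8-sig")

print(no_sig)    # b'hello'
print(with_sig)  # b'\xef\xbb\xbfhello'

# A reader that assumes plain UTF-8 keeps the BOM as an extra U+FEFF character,
# which is why BOM-unaware tools may treat it as junk.
print(with_sig.decode("utf-8") == "\ufeffhello")  # True
# The 'utf-8-sig' decoder strips a leading BOM if one is present.
print(with_sig.decode("utf-8-sig") == "hello")    # True
```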




hs2

  • Senior Community Member
  • Posts: 2734
  • Hero Points: 284
Re: Prevent insertion of UTF8 BOM (signature) when creating new file?
« Reply #1 on: November 08, 2007, 07:51:39 pm »
This should work to create UTF-8 files without a BOM: e +FUTF8 <new file>
Another possibility is to set VSLICKSAVE in your <VSLICKCONFIG>/vslick.ini file like this:
VSLICKSAVE=\ +FUTF8
(with that, it should apply to all drives)
Or add '+FUTF8' to 'def_save_options' (set-var def_save_options).

HS2

Brandon

« Reply #2 on: November 08, 2007, 11:23:48 pm »
Yeah, I am familiar with the +futf8 switch, but unfortunately these suggestions do not fit my workflow.

Changing def_save_options works (the BOM seems to be inserted during the save operation), but it overrides any encoding I may need to choose in the "Save As" dialog.

This problem seems specific to one way of creating new files (through the Open dialog), because the New command does let you choose UTF8 with no signature.

What I need is a configuration option (or a hack) that sets the default encoding for new files only. It would appear as the default in the encoding combo box whenever the New dialog is used, and would also apply whenever a new file is created from code or through the Open dialog. It would be nice to have this somewhere in the File options configuration. Obviously, offering "UTF8 with or without signature" on the Load tab makes no sense, since that setting governs a read operation.



hs2

« Reply #3 on: November 09, 2007, 12:44:40 am »
OK, it seems a patch is required to solve your problem:
files.e - _InitNewFileContents() [line 2817]: (v12.0.3)
Code:
   // HS2-CHG: patch p_encoding(_set_by_user) so new UTF8+BOM files become UTF8 files without a BOM
   // needed when creating new files, e.g. via the GUI Open dialog, because there is no 'UTF8, no signature' encoding setting
   defpenc     := p_encoding;
   defpenc_usr := p_encoding_set_by_user;
   // both props should be patched consistently
   if ( p_encoding == VSENCODING_UTF8_WITH_SIGNATURE ) p_encoding = VSENCODING_UTF8;
   if ( p_encoding_set_by_user == VSENCODING_UTF8_WITH_SIGNATURE ) p_encoding_set_by_user = VSENCODING_UTF8;
   // say ("_InitNewFileContents: defpenc = " defpenc " defpenc_usr = " defpenc_usr " -> p_encoding = " p_encoding " -> p_encoding_set_by_user = " p_encoding_set_by_user);

Alternatively, you could add a '_buffer_new_' callback (similar to '_buffer_add_') so the patch code can live in your own macro toolbox:
Code:
   call_list('_buffer_new_',p_buf_id,p_buf_name,p_encoding,p_encoding_set_by_user);
Code:
// @see files.e - _InitNewFileContents()
void _buffer_new_utf8_bom ( int bid, _str bname, int penc, int penc_usr )
{
   // both props should be patched consistently
   if ( penc == VSENCODING_UTF8_WITH_SIGNATURE ) p_encoding = VSENCODING_UTF8;
   if ( penc_usr == VSENCODING_UTF8_WITH_SIGNATURE ) p_encoding_set_by_user = VSENCODING_UTF8;
   // say ("_buffer_new_utf8_bom: penc = " penc " penc_usr = " penc_usr " -> p_encoding = " p_encoding " -> p_encoding_set_by_user = " p_encoding_set_by_user);
}

HS2

Edit: Removed the 'bflags' arg from the callback, since 'call_list' supports only 4 args.
« Last Edit: November 09, 2007, 01:20:35 am by hs2 »

hs2

« Reply #4 on: November 09, 2007, 09:11:54 am »
With a bit of guesswork in a custom '_buffer_add_' callback, you could avoid patching the product sources.
For example, if a buffer contains only the initial line ending (1-2 bytes), it is guessed to be a new one:
Code:
void _buffer_add_utf8_bom( int bid, _str bname, int bflags )
{
   if ( p_RBufSize <= 2 )  // just initial line ending -> seems to be a new buffer...
   {
      // both props should be patched consistently
      if ( p_encoding == VSENCODING_UTF8_WITH_SIGNATURE ) p_encoding = VSENCODING_UTF8;
      if ( p_encoding_set_by_user == VSENCODING_UTF8_WITH_SIGNATURE ) p_encoding_set_by_user = VSENCODING_UTF8;
   }
}
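
To make the heuristic's decision logic explicit outside Slick-C, here is a hedged Python model of what the callback above does. The Buffer class, the function names and the numeric encoding constants are all hypothetical stand-ins (the real VSENCODING_* values are defined by SlickEdit); only the logic mirrors the macro: a buffer of at most 2 bytes ("\n" or "\r\n") is treated as new, and both encoding properties are remapped together.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for SlickEdit's VSENCODING_* constants (values are made up).
VSENCODING_UTF8 = 1
VSENCODING_UTF8_WITH_SIGNATURE = 2

@dataclass
class Buffer:
    """Hypothetical model of the buffer properties the macro touches."""
    raw_size: int              # models p_RBufSize
    encoding: int              # models p_encoding
    encoding_set_by_user: int  # models p_encoding_set_by_user

def drop_signature(enc: int) -> int:
    """Map 'UTF-8 with signature' to plain UTF-8; leave other encodings alone."""
    return VSENCODING_UTF8 if enc == VSENCODING_UTF8_WITH_SIGNATURE else enc

def on_buffer_add(buf: Buffer) -> None:
    # A buffer holding only the initial line ending ("\n" = 1 byte, "\r\n" = 2 bytes)
    # is guessed to be new; larger buffers are left untouched.
    if buf.raw_size <= 2:
        # Patch both properties consistently, as the macro does.
        buf.encoding = drop_signature(buf.encoding)
        buf.encoding_set_by_user = drop_signature(buf.encoding_set_by_user)

new_buf = Buffer(len(b"\r\n"), VSENCODING_UTF8_WITH_SIGNATURE, VSENCODING_UTF8_WITH_SIGNATURE)
on_buffer_add(new_buf)
print(new_buf.encoding == VSENCODING_UTF8)  # True
```

Because the test is purely size-based, any existing buffer of 2 bytes or less would be caught by it as well; that is the price of avoiding a patch to files.e.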

Good luck,
HS2