Author Topic: Change character encoding after a file has been opened  (Read 4094 times)

Zytoblast

  • Community Member
  • Posts: 6
  • Hero Points: 1
Change character encoding after a file has been opened
« on: January 23, 2010, 08:32:05 pm »
I would like to be able to change the character encoding after a file has been loaded or a new buffer has been created. (I have to work with a lot of text files that have no Unicode signature.)
« Last Edit: January 23, 2010, 08:35:05 pm by Zytoblast »

dunkers

  • Senior Community Member
  • Posts: 607
  • Hero Points: 28
Re: Change character encoding after a file has been opened
« Reply #1 on: January 26, 2010, 10:04:38 pm »
Quote
Being able to change character encoding after

Not quite what you want, but as a workaround I use 'Save As...' and save over the same filename with the correct encoding option set.

Gary

  • Community Member
  • Posts: 43
  • Hero Points: 5
Re: Change character encoding after a file has been opened
« Reply #2 on: May 13, 2011, 07:16:45 am »
Show encoding used for the current buffer in the status bar.

Glajan

  • Community Member
  • Posts: 44
  • Hero Points: 7
Re: Change character encoding after a file has been opened
« Reply #3 on: May 13, 2011, 11:10:57 am »
Quote
Show encoding used for the current buffer in the status bar.

I like this one. I vote for that.

rh

  • Senior Community Member
  • Posts: 126
  • Hero Points: 6
Re: Change character encoding after a file has been opened
« Reply #4 on: May 13, 2011, 03:38:49 pm »
Quote
Show encoding used for the current buffer in the status bar.

Agreed.

Please add Language mode too.

hs2

  • Senior Community Member
  • Posts: 2737
  • Hero Points: 288
Re: Change character encoding after a file has been opened
« Reply #5 on: May 13, 2011, 04:23:12 pm »
Language mode could be determined by Document>Select mode in the rare (?) cases when this is interesting.
Encoding might be different. At least I don't need to check encoding at all and lang.mode very seldom.
I just don't like cluttered status bars and the message area should be long enough.
I hope the SE team finds a way to handle both requirements :)
HS2

rh

  • Senior Community Member
  • Posts: 126
  • Hero Points: 6
Re: Change character encoding after a file has been opened
« Reply #6 on: May 13, 2011, 07:02:20 pm »
I discovered recently that the Language Mode has an effect on how Context Tagging works - or in my case wasn't working. 

Either way, a related feature would be to have the ability to decide what is displayed on the status bar. 

Like this...

hs2

  • Senior Community Member
  • Posts: 2737
  • Hero Points: 288
Re: Change character encoding after a file has been opened
« Reply #7 on: May 13, 2011, 11:27:18 pm »
@rh: I understand your request - I've read your related problem report.
Your proposal sounds good as it provides a fast but less intrusive way to display this kind of (non-permanent) information.
Could be the way to go.
HS2

RaffoPazzo

  • Community Member
  • Posts: 65
  • Hero Points: 2
Re: Change character encoding after a file has been opened
« Reply #8 on: February 21, 2012, 04:02:34 pm »
Hot change of the charset encoding of the opened buffer.

mikael.lyngvig

  • Community Member
  • Posts: 17
  • Hero Points: 0
Re: Change character encoding after a file has been opened
« Reply #9 on: February 13, 2016, 07:58:13 am »
Quote
We would be very interested in what features and/or improvements you would like to see in SlickEdit?

I'm a long-time user of SlickEdit (6 years?  8 years? can't remember) and I am almost at the point of giving up SlickEdit. 

My main issue is that I feel that the SlickEdit support of BOM-less UTF-8 files is rather poor.  Personally, I prefer using UTF-8 BOMs but languages like PHP choke on these (indirectly, by outputting them to the website client, which confuses some clients) so I have to avoid using BOMs.  Unfortunately, my experience is that SlickEdit doesn't really try to figure out the encoding of input files.  It should be so simple to figure out that a given file is a UTF-8 encoded file, but SlickEdit chokes and displays garbage characters if there's no BOM.

I know that SlickEdit has a "remember encoding" feature, but this is worthless to me.  I use WebDrive all the time (a tool that maps an FTP site as a drive letter on your Windows box) and therefore SlickEdit cannot remember the encoding as all sorts of different FTP sites are being mapped to the same drive letter on my Windows box.  Besides, I find the idea of an editor remembering the encoding of a file to be a hack compared to the sane solution that the editor recognizes the encoding automatically.  My life's too short to tell SlickEdit that this and that file is UTF-8, this and that file is ASCII, and so on.  I know that it is non-trivial to automatically identify all of the plethora of encodings that exist, but UTF-8 is a must and the editor really should be able to handle this case easily by looking at 8-bit values and see if they decode properly.

BOM-less UTF-8 is becoming more frequently used all the time so SlickEdit really should and must support this very well!

Also, I'd love the buffer window to display the encoding of the files in the buffer list.  My previous proposal to have SlickEdit display the encoding on the status line seems to have ended in /dev/null so now I'd like to propose that the buffer list shows the used encoding.

Cheers,
Mikael

Graeme

  • Senior Community Member
  • Posts: 2336
  • Hero Points: 307
Re: Change character encoding after a file has been opened
« Reply #10 on: February 13, 2016, 12:18:47 pm »
Quote
It should be so simple to figure out that a given file is a UTF-8 encoded file, but SlickEdit chokes and displays garbage characters if there's no BOM.

What version of slickedit are you using and what operating system?
Can you post an example of a small UTF-8 file that slickedit displays as garbage?

mikael.lyngvig

  • Community Member
  • Posts: 17
  • Hero Points: 0
Re: Change character encoding after a file has been opened
« Reply #11 on: February 13, 2016, 12:34:15 pm »
You can try with a file like this and see what I mean:

   (wget) http://www.columbia.edu/~fdc/utf8/

Obviously, many of the examples in this text won't work due to font issues, but even Danish characters are displayed incorrectly:

    Dansk / Danish: Jeg kan spise glas, det gør ikke ondt pÃ¥ mig.

Which should be:

    Dansk / Danish: Jeg kan spise glas, det gør ikke ondt på mig.
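
The "pÃ¥" garbling is what you would expect when the two UTF-8 bytes of "å" are each shown as a separate character from a single-byte Windows code page. A minimal Python sketch of that effect, assuming Windows-1252 as the wrong encoding (the thread does not confirm which code page the editor actually used):

```python
# Assumption: the UTF-8 bytes are being interpreted as Windows-1252 (cp1252).
text = "på"
utf8_bytes = text.encode("utf-8")       # "å" becomes the two bytes 0xC3 0xA5

# Decoding with a single-byte code page turns each byte into its own character.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# Decoding the same bytes as UTF-8 restores the original text.
restored = utf8_bytes.decode("utf-8")
print(restored)
```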

I'm using SlickEdit Pro 2015 (v20.0.1.3 64-bit) on Windows 10 x64.

What I am asking for is that SlickEdit does something like this (obviously only works for small source files, but most source files are small):

    Read source file into memory buffer.

    Scan memory buffer for 8-bit characters.
    If no 8-bit characters are found:
        return ENCODING_ASCII

    try:
        Decode memory buffer as UTF-8.
        return ENCODING_UTF8
    catch DecoderError:
        return ENCODING_8BIT_UNKNOWN

Basically, I just want SlickEdit to automatically handle UTF-8 files as UTF-8 whether or not a BOM is present.  This should be really simple to implement as it is simply a matter of trying to decode the file as UTF-8.  If this succeeds, it is probably UTF-8.  Otherwise it is some arcane code page that you guys know better than me how to handle.
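
For illustration, the proposed check could be sketched in Python roughly as follows. This is only a sketch of the idea in the pseudocode, not how SlickEdit detects encodings; the function name `guess_encoding` and the returned strings are made up for the example.

```python
def guess_encoding(data: bytes) -> str:
    """Classify a byte buffer as 'ascii', 'utf-8', or '8bit-unknown'.

    Hypothetical helper: the name and return values are assumptions.
    """
    # No byte has its high bit set, so the buffer is plain 7-bit ASCII.
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Some other 8-bit encoding; choosing a code page is left to the caller.
        return "8bit-unknown"


print(guess_encoding(b"plain text"))
print(guess_encoding("på".encode("utf-8")))
print(guess_encoding("på".encode("cp1252")))
```

A successful decode only means the bytes are probably UTF-8: a short file in another 8-bit encoding can happen to form valid UTF-8 sequences, although that is uncommon for real text.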

P.S. I just tried manually adding a UTF-8 BOM to the index.html file from above and then everything works.  So the problem is simply that SlickEdit does not handle BOM-less UTF-8 files well.

Cheers,
Mikael
« Last Edit: February 13, 2016, 12:41:59 pm by mikael.lyngvig »

Graeme

  • Senior Community Member
  • Posts: 2336
  • Hero Points: 307
Re: Change character encoding after a file has been opened
« Reply #12 on: February 13, 2016, 09:41:44 pm »
Quote
Obviously, many of the examples in this text won't work due to font issues, but even Danish characters are displayed incorrectly:

    Dansk / Danish: Jeg kan spise glas, det gør ikke ondt pÃ¥ mig.

Which should be:

    Dansk / Danish: Jeg kan spise glas, det gør ikke ondt på mig.

I'm using SlickEdit Pro 2015 (v20.0.1.3 64-bit) on Windows 10 x64.

That line displays correctly for me in SlickEdit V20.0.1.3 on Windows 7 (and in Notepad++).  When I click Save As, SlickEdit shows the encoding as "UTF-8 no signature".  What setting do you have for "Encoding" in Options -> File Options -> Load?  I have "auto unicode 2".

mikael.lyngvig

  • Community Member
  • Posts: 17
  • Hero Points: 0
Re: Change character encoding after a file has been opened
« Reply #13 on: February 13, 2016, 10:01:16 pm »
Thanks for your quick reply :-)

I had it set at "auto unicode", changed it to "auto unicode2", closed all open windows, exited the editor, renamed the original input file to some other name (to bypass the remembered encoding, if any), and the problem still persists.

Just to be sure a hotfix hadn't changed something, I downloaded the most recent hotfix as of today, applied it, and did the above procedure one more time (also renaming the file). 

The problem persists.  I am perplexed...  I generally have the philosophy that I touch as few SlickEdit settings as possible, because it is a nightmare to get things right once they've gone wrong, so I don't think I've messed with some setting that breaks my installation.  My setup is quite old, though, as I have a few custom languages and have exported and imported the same setup since SlickEdit 17 or so.

I just tried changing the encoding to UTF-8, in Options -> File Options -> Load.  I guess I could live with forced UTF-8 encoding (without BOM) as this should be backwards compatible with my thousands of ASCII files.  But, alas, this doesn't work either.

I don't think it is a Windows 10 issue as I had the same issue under Windows 8.1.  Could it be a codepage issue?  I live in Denmark so my active code page is 850, according to the "chcp" command.
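
One way to narrow down the code page question is to decode the UTF-8 bytes of "på" with each candidate encoding and compare the result with what the editor shows. A small Python sketch, where the candidate list is an assumption chosen for illustration:

```python
utf8_bytes = "på".encode("utf-8")

# Candidates: cp850 is the console code page reported by chcp; cp1252 is the
# usual Windows ANSI code page for Western European locales; latin-1 is
# included for comparison.
candidates = ["cp850", "cp1252", "latin-1"]

# Decode the same bytes with every candidate and collect what would be shown.
results = {name: utf8_bytes.decode(name) for name in candidates}
for name, shown in results.items():
    print(f"{name}: {shown}")
```

If the garbled text in the editor matches the cp1252 result rather than the cp850 one, the console code page reported by chcp is probably not the one being applied to the file.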

The only other thing I can think of is that I always open SlickEdit using a DOSKEY macro, never from the Start Menu.  I don't know if this impacts some environment settings or SlickEdit's way of detecting the current code page, which again confuses SlickEdit somehow.

Is there some way I can reset all options, just to try it out?  I only seem capable of finding a way of importing already saved options.

P.S. I just installed Notepad++ and tried with it and it works as expected: The BOM-less UTF-8 file is recognized as UTF-8.


Cheers,
Mikael
« Last Edit: February 13, 2016, 10:04:31 pm by mikael.lyngvig »

Graeme

  • Senior Community Member
  • Posts: 2336
  • Hero Points: 307
Re: Change character encoding after a file has been opened
« Reply #14 on: February 13, 2016, 10:23:00 pm »
To test with a default config, start slickedit using vs.exe +new -sc some-empty-folder.
chcp gives me 850 too.
You could use the following macro to show the encoding.  If it helps, some time I could probably make you a toolbar that shows the encoding all the time.

Code:
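_command void show_encoding() name_info(',')
{
   // Show the encoding of the current buffer on the message line.
   _str xline = '';
   bufId := _mdi.p_child;   // current MDI child editor window
   if (bufId && !_no_child_windows()) {
      // Encoding name followed by the raw p_encoding value.
      xline = _EncodingToName(bufId.p_encoding) :+ " " bufId.p_encoding;
   } else {
      xline = 'No file open';
   }
   message(xline);
}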