SlickEdit Community

Topic started by: Zytoblast on January 23, 2010, 08:32:05 pm

Title: Change character encoding after a file has been opened
Post by: Zytoblast on January 23, 2010, 08:32:05 pm
Being able to change character encoding after a file has been loaded or a new buffer created. (Have to work with a lot of text files with no unicode signature.)
Title: Re: Change character encoding after a file has been opened
Post by: dunkers on January 26, 2010, 10:04:38 pm
Quote
Being able to change character encoding after

Not quite what you want, but as a workaround I use 'Save As...' and save over the same filename with the correct encoding option set.
Title: Re: Change character encoding after a file has been opened
Post by: Gary on May 13, 2011, 07:16:45 am
Show encoding used for the current buffer in the status bar.
Title: Re: Change character encoding after a file has been opened
Post by: Glajan on May 13, 2011, 11:10:57 am
Quote
Show encoding used for the current buffer in the status bar.

I like this one. I vote for that.
Title: Re: Change character encoding after a file has been opened
Post by: rh on May 13, 2011, 03:38:49 pm
Quote
Show encoding used for the current buffer in the status bar.

Agreed.

Please add Language mode too.
Title: Re: Change character encoding after a file has been opened
Post by: hs2 on May 13, 2011, 04:23:12 pm
Language mode could be determined by Document>Select mode in the rare (?) cases when this is interesting.
Encoding might be different. At least I don't need to check encoding at all and lang.mode very seldom.
I just don't like cluttered status bars and the message area should be long enough.
I hope the SE team finds a way to handle both requirements :)
HS2
Title: Re: Change character encoding after a file has been opened
Post by: rh on May 13, 2011, 07:02:20 pm
I discovered recently that the Language Mode has an effect on how Context Tagging works - or in my case wasn't working. 

Either way, a related feature would be to have the ability to decide what is displayed on the status bar. 

Like this...
Title: Re: Change character encoding after a file has been opened
Post by: hs2 on May 13, 2011, 11:27:18 pm
@rh: I understand your request - I've read your related problem report.
Your proposal sounds good as it provides a fast but less intrusive way to display this kind of (non-permanent) information.
Could be the way to go.
HS2
Title: Re: Change character encoding after a file has been opened
Post by: RaffoPazzo on February 21, 2012, 04:02:34 pm
Hot change of the charset encoding of the opened buffer.
Title: Re: Change character encoding after a file has been opened
Post by: mikael.lyngvig on February 13, 2016, 07:58:13 am
Quote
We would be very interested in what features and/or improvements you would like to see in SlickEdit?

I'm a long-time user of SlickEdit (6 years?  8 years? can't remember) and I am almost at the point of giving up SlickEdit. 

My main issue is that I feel that the SlickEdit support of BOM-less UTF-8 files is rather poor.  Personally, I prefer using UTF-8 BOMs but languages like PHP choke on these (indirectly, by outputting them to the website client, which confuses some clients) so I have to avoid using BOMs.  Unfortunately, my experience is that SlickEdit doesn't really try to figure out the encoding of input files.  It should be so simple to figure out that a given file is a UTF-8 encoded file, but SlickEdit chokes and displays garbage characters if there's no BOM.

I know that SlickEdit has a "remember encoding" feature, but this is worthless to me.  I use WebDrive all the time (a tool that maps an FTP site as a drive letter on your Windows box) and therefore SlickEdit cannot remember the encoding as all sorts of different FTP sites are being mapped to the same drive letter on my Windows box.  Besides, I find the idea of an editor remembering the encoding of a file to be a hack compared to the sane solution that the editor recognizes the encoding automatically.  My life's too short to tell SlickEdit that this and that file is UTF-8, this and that file is ASCII, and so on.  I know that it is non-trivial to automatically identify all of the plethora of encodings that exist, but UTF-8 is a must and the editor really should be able to handle this case easily by looking at 8-bit values and seeing if they decode properly.

BOM-less UTF-8 is becoming more frequently used all the time so SlickEdit really should and must support this very well!

Also, I'd love the buffer window to display the encoding of the files in the buffer list.  My previous proposal to have SlickEdit display the encoding on the status line seems to have ended in /dev/null so now I'd like to propose that the buffer list shows the used encoding.

Cheers,
Mikael
Title: Re: Change character encoding after a file has been opened
Post by: Graeme on February 13, 2016, 12:18:47 pm
Quote
It should be so simple to figure out that a given file is a UTF-8 encoded file, but SlickEdit chokes and displays garbage characters if there's no BOM.

What version of slickedit are you using and what operating system?
Can you post an example of a small UTF-8 file that slickedit displays as garbage?
Title: Re: Change character encoding after a file has been opened
Post by: mikael.lyngvig on February 13, 2016, 12:34:15 pm
You can try with a file like this and see what I mean:

   (wget) http://www.columbia.edu/~fdc/utf8/

Obviously, many of the examples in this text won't work due to font issues, but even Danish characters are displayed incorrectly:

    Dansk / Danish: Jeg kan spise glas, det gør ikke ondt pÃ¥ mig.

Which should be:

    Dansk / Danish: Jeg kan spise glas, det gør ikke ondt på mig.

I'm using SlickEdit Pro 2015 (v20.0.1.3 64-bit) on Windows 10 x64.

What I am asking for is that SlickEdit does something like this (obviously only works for small source files, but most source files are small):

    Read source file into memory buffer.

    Scan memory buffer for 8-bit characters.
    If no 8-bit characters are found:
        return ENCODING_ASCII

    try:
         Decode memory buffer as UTF-8.
         return ENCODING_UTF8
    catch DecoderError:
         return ENCODING_8BIT_UNKNOWN

Basically, I just want SlickEdit to automatically handle UTF-8 files as UTF-8 whether or not a BOM is present.  This should be really simple to implement as it is simply a matter of trying to decode the file as UTF-8.  If this succeeds, it is probably UTF-8.  Otherwise it is some arcane code page that you guys know better than me how to handle.
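
A minimal sketch of the check described above, written in Python rather than as SlickEdit code. The function name detect_encoding and the returned labels are made up for illustration, and the whole file is assumed to fit in memory:

```python
def detect_encoding(data: bytes) -> str:
    """Classify raw file bytes as ASCII, UTF-8, or unknown 8-bit."""
    # No byte has the high bit set: plain 7-bit ASCII.
    if data.isascii():
        return "ascii"
    try:
        # Strict decoding raises on any invalid UTF-8 sequence.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Some legacy code page; which one cannot be told from the bytes alone.
        return "8bit-unknown"


if __name__ == "__main__":
    print(detect_encoding(b"plain text"))
    print(detect_encoding("ondt på mig".encode("utf-8")))
    print(detect_encoding("ondt på mig".encode("cp1252")))
```

A successful decode only makes UTF-8 very likely, not certain, but text in a legacy 8-bit code page rarely happens to form valid UTF-8 sequences, so the heuristic tends to hold up in practice.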

P.S. I just tried manually adding a UTF-8 BOM to the index.html file from above and then everything works.  So the problem is simply that SlickEdit does not handle BOM-less UTF-8 files well.

Cheers,
Mikael
Title: Re: Change character encoding after a file has been opened
Post by: Graeme on February 13, 2016, 09:41:44 pm
Quote
Obviously, many of the examples in this text won't work due to font issues, but even Danish characters are displayed incorrectly:

    Dansk / Danish: Jeg kan spise glas, det gør ikke ondt pÃ¥ mig.

Which should be:

    Dansk / Danish: Jeg kan spise glas, det gør ikke ondt på mig.

I'm using SlickEdit Pro 2015 (v20.0.1.3 64-bit) on Windows 10 x64.

That line displays correctly for me in slick V20.0.1.3 on Windows 7 (and in notepad++).   When I click save as, slick shows the encoding as "UTF-8 no signature".  What setting do you have for "Encoding" in Options -> File Options -> Load?  I have "auto unicode 2".
Title: Re: Change character encoding after a file has been opened
Post by: mikael.lyngvig on February 13, 2016, 10:01:16 pm
Thanks for your quick reply :-)

I had it set at "auto unicode", changed it to "auto unicode2", closed all open windows, exited the editor, renamed the original input file to some other name (to bypass the remembered encoding, if any), and the problem still persists.

Just to be sure a hotfix hadn't changed something, I downloaded the most recent hotfix as of today, applied it, and did the above procedure one more time (also renaming the file). 

The problem persists.  I am perplexed...  I generally have the philosophy that I touch as few SlickEdit settings as possible, because it is a nightmare to get things right once they've gone wrong, so I don't think I've messed with some setting that breaks my installation.  My setup is quite old, though, as I have a few custom languages and have exported and imported the same setup since SlickEdit 17 or so.

I just tried changing the encoding to UTF-8, in Options -> File Options -> Load.  I guess I could live with forced UTF-8 encoding (without BOM) as this should be backwards compatible with my thousands of ASCII files.  But, alas, this doesn't work either.

I don't think it is a Windows 10 issue as I had the same issue under Windows 8.1.  Could it be a codepage issue?  I live in Denmark so my active code page is 850, according to the "chcp" command.

The only other thing I can think of is that I always open SlickEdit using a DOSKEY macro, never from the Start Menu.  I don't know if this impacts some environment settings or SlickEdit's way of detecting the current code page, which again confuses SlickEdit somehow.

Is there some way I can reset all options, just to try it out?  I only seem capable of finding a way of importing already saved options.

P.S. I just installed Notepad++ and tried with it and it works as expected: The BOM-less UTF-8 file is recognized as UTF-8.


Cheers,
Mikael
Title: Re: Change character encoding after a file has been opened
Post by: Graeme on February 13, 2016, 10:23:00 pm
To test with a default config, start slickedit using vs.exe +new -sc some-empty-folder.
chcp gives me 850 too.
You could use the following macro to show the encoding.  If it helps, some time I could probably make you a toolbar that shows the encoding all the time.

Code:
_command void show_encoding() name_info(',')
{
   _str xline = '';
   bufId := _mdi.p_child;
   if (bufId && !_no_child_windows()) {
      xline = _EncodingToName(bufId.p_encoding) :+ " " bufId.p_encoding;
   } else {
      xline = 'No file open';
   }
   message(xline );
}

Title: Re: Change character encoding after a file has been opened
Post by: mikael.lyngvig on February 13, 2016, 10:36:48 pm
I resorted to uninstalling SlickEdit, erasing the SlickEdit files in Documents, reinstalling, and re-applying the latest hotfix.

Still no change...  UTF-8 Files without a BOM are garbled.

The only change that came of reinstalling SlickEdit is that encoding "auto unicode" was set to "auto unicode2" without me doing anything.  So I guess the old value was imported from an ancient configuration and never upgraded to the new unicode2 method.

I think I should stop spamming this forum with what appears to be a support issue; SlickEdit UTF-8 with no BOM works for you and so it should for me.  The mere fact that it works for you makes me a very happy person because I've for years thought that SlickEdit couldn't figure out how to handle BOM-less UTF-8 files and instead it turns out that it is some bizarre support issue I'm facing.

Thanks for your help so far!
Title: Re: Change character encoding after a file has been opened
Post by: Graeme on February 13, 2016, 11:19:57 pm
Try searching on google for pÃ¥  - a few of the results show på  - maybe that gives a clue as to what slickedit is doing.  What font are you using?

If I turn on "line hex" in the view menu I get the incorrect version that you're getting.
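
The pÃ¥ pattern is what you get when the UTF-8 bytes of på are read one byte per character. A small Python sketch of that effect follows; the helper name misdecode is made up for illustration, and the use of Windows-1252 is an assumption about which single-byte code page is involved, not something confirmed about SlickEdit's internals:

```python
def misdecode(text: str, wrong_codec: str) -> str:
    """Encode text as UTF-8, then decode the bytes with a single-byte codec."""
    return text.encode("utf-8").decode(wrong_codec)


if __name__ == "__main__":
    raw = "på".encode("utf-8")
    print(raw)                        # b'p\xc3\xa5'
    garbled = misdecode("på", "cp1252")
    print(garbled)                    # pÃ¥
    # As long as every byte survived, the damage can be reversed.
    print(garbled.encode("cp1252").decode("utf-8"))  # på
```

If that assumption holds, the garbage matches the Windows ANSI code page rather than the console code page 850 reported by chcp, since code page 850 maps those two bytes to different characters.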
Title: Re: Change character encoding after a file has been opened
Post by: mikael.lyngvig on February 13, 2016, 11:22:53 pm
I'm using Default Unicode font as I just reinstalled everything.

I've filed a support request on this issue and hope that the support crew can help me (they usually can).
Title: Re: Change character encoding after a file has been opened
Post by: Clark on February 14, 2016, 12:24:26 am
There's a bug in the load options encoding list. It's supposed to list UTF8, no signature. There is a workaround.

Go ahead and choose UTF8 for File>load options>encoding. Now use Macro>Set macro variable... and change +utf8s in the variable def_load_options to +futf8. The only downside to doing this is the GUI load options will be lost and can't display this setting.
Title: Re: Change character encoding after a file has been opened
Post by: mikael.lyngvig on February 14, 2016, 01:12:18 am
My def_load_options value is (even though I've changed the encoding to UTF-8 in File>Load Options>Encoding):

    +L -LF +LE -S -E +U:32000 -N +BP

If I add "+FUTF8" to this line, exit the editor, and open the document under a new name, it still doesn't work, whether or not I use "+new -sc TempDir" on the SlickEdit command line.  The manually added "+FUTF8" value does appear if I inspect def_load_options again.

The only things that work are if I use File>Reload with Encoding>UTF-8 or manually add a BOM.  Then the text displays as it should.

Argh! :-)  I really don't get this...   We're talking about a problem that persists across operating systems, editor versions, and configurations.
Title: Re: Change character encoding after a file has been opened
Post by: Clark on February 14, 2016, 01:35:56 am
I don't know if the +futf8 must be lower case. Try lower case.
Title: Re: Change character encoding after a file has been opened
Post by: mikael.lyngvig on February 14, 2016, 01:37:35 am
Actually, I tried both.  The online help lists it in uppercase, so I ended up using that :-)
Title: Re: Change character encoding after a file has been opened
Post by: Clark on February 14, 2016, 01:45:02 am

I have goofed a bit here though. It turns out this workaround is only needed when creating new files. Choosing Utf-8 in File>load options>encoding should recognize existing Utf-8 files with or without a signature.
Title: Re: Change character encoding after a file has been opened
Post by: mikael.lyngvig on February 14, 2016, 01:58:52 am
Yes, but it doesn't on my end, unfortunately.  I am grateful for all your help, but so far it hasn't changed anything.

I have absolutely no idea of what's going on here.  I have submitted a support case - do you want me to continue in this thread or to work with Support to resolve the issue?

I rename the file every time, before I open it, to a new unique name so as to not trigger SlickEdit's built-in memory of file encodings.  I have tried both with and without customized settings, even a clean install.  Nothing works.  And it hasn't for years, which is why I have been looking for another option recently because it is getting to be really annoying that I can't get SlickEdit to work for me.

Obviously, I'd prefer to get SlickEdit to work but I am ready to switch to another editor, if this issue cannot be resolved.  I work almost full-time in UTF-8 encoded PHP files containing Danish national characters so this is a major hassle for me.  Especially because the UTF-8 BOM breaks some web clients and makes the XHTML/1.0 validator fail (the whole point of XHTML/1.0 Strict is that it can be validated sensibly).

Also, when you switch a lot between ASCII and UTF-8 files, it is a tedious hassle to have to use File>Save As to determine the encoding.  This should be shown prominently on the status line or in the buffer list (I use the BRIEF emulation).  I have suggested this to support long ago but nothing has happened.

Graeme reported this:

    "If I turn on "line hex" in the view menu I get the incorrect version that you're getting."

So there must be some issue somewhere in there, which I am running into all the time.

A final note: I never use SlickEdit workspaces, I always open and close my files on the fly from a console window.
Title: Re: Change character encoding after a file has been opened
Post by: mikael.lyngvig on February 14, 2016, 02:08:35 am
Quote
I have goofed a bit here though. It turns out this workaround is only needed when creating new files. Choosing Utf-8 in File>load options>encoding should recognize existing Utf-8 files with or without a signature.

I just tried this:

    vs.exe +FUTF8 index708.html

And it WORKED.  I suppose that amounts to manually specifying the encoding.  I'll change my DOSKEY alias for the time being to explicitly specify the +FUTF8 option and hopefully we can resolve this sometime.
Title: Re: Change character encoding after a file has been opened
Post by: Clark on February 14, 2016, 02:22:17 am
I didn't read into the fact that you're working with .html files. Now that I think about it, the solution I gave you shouldn't work for .html files because the default extension specific encoding is "Auto HTML" which overrides the global encoding setting. Go to  Tools>Options>Languages>File Extension Manager and select the "html" extension. Change the encoding from "Auto HTML" to "Utf-8".

I was about to give up and have you use support (they're definitely better than I am due to their customer-specific experience).
Title: Re: Change character encoding after a file has been opened
Post by: mikael.lyngvig on February 14, 2016, 02:40:28 am
You have solved a big issue for me, thanks!  Even though I only used an HTML file because it illustrated my issue.  I don't regularly edit HTML files so I'll leave the HTML setting as it is, but you definitely helped me get SlickEdit to work for me. 

In combination with the now default +FUTF8 option (in my DOSKEY macro) on my end, I suspect we're getting to the point that it actually works for all file types: I often edit UTF-8 encoded file types that are unknown to SlickEdit and they also appear to work now (without a BOM).  As far as I know, I should never run into any issues forcing the encoding to UTF-8 as I only use UTF-8 and ASCII encoded files.

Perhaps you can pull some strings and get the encoding to be displayed somewhere on the status line?  Just kidding, but it would be nice :-)
Title: Re: Change character encoding after a file has been opened
Post by: Mike on December 04, 2018, 02:32:37 pm
I noticed this older post and wanted to note that SlickEdit does now show the encoding in the status bar, and clicking that item opens the "Save As" dialog so that you can change the encoding.
Thanks for the suggestion!