Topic: Viewing / Changing the Encoding

greggman

  • Senior Community Member
  • Posts: 268
  • Hero Points: 12
Viewing / Changing the Encoding
« on: September 27, 2006, 06:09:28 am »
This might be in the manual but I didn't see it.

1) Is there a way to display the current encoding of a file?  For example, I need to know whether the file is utf-8, shift-jis, euc-jp or Text SBCS/DBCS.

2) Is there a way to change the encoding AFTER the file has been loaded?

The problem I have is that I use utf-8, shift-jis and euc-jp all the time and SlickEdit often guesses the encoding wrong.  The best solution would be for it to guess right, but short of that, if I could set it after the fact (like I can in a browser), that would be great.

I know I can set it at load time, but only if I use the GUI file load dialog. That doesn't help when I'm using something like push-tag and it takes me to a file and guesses the encoding wrong.

srouleau

  • Community Member
  • Posts: 68
  • Hero Points: 4
Re: Viewing / Changing the Encoding
« Reply #1 on: September 27, 2006, 01:01:36 pm »
Same problem here -- I'll open up a file with Slick and have it guess wrong.  When I asked tech support about this, they gave me the command-line switch to tell Slick what encoding to use, but that's not really what I was looking for.

FWIW, Notepad, of all things, guesses the encoding properly on the same files that throw off Slick (e.g. .reg files on Windows).

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • Posts: 4035
  • Hero Points: 267
Re: Viewing / Changing the Encoding
« Reply #2 on: September 27, 2006, 01:47:03 pm »
(greggman) At the moment, there isn't a "reopen_as_encoding" command.  Is there anything about these files that SlickEdit can key off (extension maybe)?  Also, once you open a specific file in a specific encoding, SlickEdit will reuse the encoding you opened the file with.

(srouleau) I'm pretty sure the problem you are seeing is with Utf-16 files that do not have a signature.  Go to Tools>Options>File Options>Load Tab and change the encoding from "Auto Unicode" to "Auto Unicode 2".  This allows SlickEdit to look at the contents of the file to determine if it is Utf-16.  This is not the default because SlickEdit is a VERY multi-purpose tool which is often used to edit binary files.  We didn't want flaky behavior by default.
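
For illustration only, here is a rough Python sketch of the kind of content check involved. It is not SlickEdit's actual code: the function name, the 4096-byte sample and the thresholds are assumptions made up for this example. Mostly-ASCII text saved as Utf-16 has a zero byte in every other position, so counting zero bytes at even and odd offsets hints at the byte order.

```python
def guess_utf16_without_bom(data, sample_size=4096, threshold=0.4):
    """Return 'utf-16-le', 'utf-16-be', or None for bytes with no signature.

    Heuristic only. sample_size and threshold are arbitrary assumptions,
    not values taken from any editor.
    """
    sample = data[:sample_size]
    if len(sample) % 2:
        sample = sample[:-1]  # keep whole 16-bit units only
    if len(sample) < 4:
        return None
    pairs = len(sample) // 2
    even_nuls = sample[0::2].count(0)  # high bytes when big-endian
    odd_nuls = sample[1::2].count(0)   # high bytes when little-endian
    if odd_nuls / pairs >= threshold and even_nuls / pairs < 0.05:
        return "utf-16-le"
    if even_nuls / pairs >= threshold and odd_nuls / pairs < 0.05:
        return "utf-16-be"
    return None
```

Binary files are also full of zero bytes, which is why a check like this can misfire on them.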


srouleau

Re: Viewing / Changing the Encoding
« Reply #3 on: September 27, 2006, 02:51:47 pm »
"Auto Unicode 2"?   Wow, this is right up there with the "More" tab in the Options dialog.  ;D

Thanks, I'd have never guessed it!  I'll try not to complain when it mistakenly picks the wrong encoding hehe.

(I rarely use Slick to view binary files, because I never could figure out how to search properly in there)

greggman

Re: Viewing / Changing the Encoding
« Reply #4 on: September 27, 2006, 08:19:40 pm »
Quote from: Clark
(greggman) At the moment, there isn't a "reopen_as_encoding" command.  Is there anything about these files that SlickEdit can key off (extension maybe)?  Also, once you open a specific file in a specific encoding, SlickEdit will reuse the encoding you opened the file with.

Unfortunately I can't do it by extension. :-(

My personal files are usually HTML in utf-8, and SlickEdit guesses them wrong. It could look inside the file for the encoding, which appears in the form of

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

which is part of the HTML standard

For my work files, there is nothing to go on except the bytes in the files. To be specific, they are all PlayStation 1, PSP, PS2 and PS3 libraries and examples. Depending on who wrote them, they could be in euc-jp or shift-jis, so even within a single project they can be mixed (library #1 uses shift-jis, lib #2 uses euc-jp, etc.).

For euc-jp, shift-jis, according to this page, it looks like it would be relatively easy to scan the first nnn bytes of the file and make an educated guess as to which of those 3 encodings it is. Of course, that only covers Japanese.  A quick search brought up this

http://search.cpan.org/~dankogai/Encode-2.18/lib/Encode/Guess.pm

and this

http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
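
To make the "scan the first nnn bytes" idea concrete, here is a minimal Python sketch. It is my own illustration, not taken from either link: the function name, the 4096-byte sample size and the candidate list (the encodings I deal with day to day) are assumptions. It simply tries a strict decode with each candidate and returns the first one that accepts the bytes.

```python
import codecs

# Order matters: stricter encodings go first, because some byte sequences
# that are valid euc-jp also happen to decode as shift-jis.
CANDIDATES = ("ascii", "utf-8", "euc_jp", "shift_jis")

def guess_encoding(data, sample_size=4096):
    """Return the first candidate that strictly decodes the sample, or None."""
    sample = data[:sample_size]
    for name in CANDIDATES:
        # An incremental decoder with final=False tolerates a multi-byte
        # character that was cut in half at the end of the sample.
        decoder = codecs.getincrementaldecoder(name)(errors="strict")
        try:
            decoder.decode(sample, final=False)
        except UnicodeDecodeError:
            continue
        return name
    return None
```

A strict decode only proves the bytes are legal in that encoding, not that the text is meaningful, so short or ambiguous files can still be guessed wrong. The statistical approaches in the links above go further than this.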

For the C/C++ files I'd be happy if there was a hook I could use to figure it out myself, at a low enough level that push-tag would work. Of course I'd rather just have you guys fix it, but if it's an unsolvable problem or too low priority, then a hook would at least let me fix it. Maybe you can point me to the place to hook it in?

Clark

Re: Viewing / Changing the Encoding
« Reply #5 on: September 28, 2006, 03:13:20 pm »
(greggman) SlickEdit is supposed to recognize the:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

in HTML files.  Go to Tools>Options>File Extension Setup>General Tab and make sure the encoding is set to "Auto HTML".   If there is still a problem, post another forum message and attach the HTML file.  SlickEdit does have a couple of limitations with this feature.  SlickEdit uses only the first HTML encoding definition.  Also, SlickEdit only looks at the first 4 KB of the file.
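
As a rough picture of those two limitations, here is a short Python sketch. It is only an illustration and not SlickEdit's implementation: the function name and the regular expression are assumptions made up for this example. It looks for a charset inside a meta tag, returns the first one it finds, and never looks past the first 4096 bytes.

```python
import re

# Matches charset=... inside a <meta ...> tag; quotes around the value are optional.
_META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)

def sniff_html_charset(data, limit=4096):
    """Return the first declared charset in the first `limit` bytes, or None."""
    match = _META_CHARSET.search(data[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

A declaration that starts after the first 4096 bytes, or a file with no declaration at all, gives None here.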

As for detecting code page files like euc-jp and shift-jis, SlickEdit does not have this yet.  Microsoft Word seems to do this.

There isn't a hook to detect encoding.  However, you can modify the "build_load_options" macro.  You could make a copy of this function, modify it, and load your own version.  That way you would never lose your changes.  The last macro loaded with this function wins :-)

greggman

Re: Viewing / Changing the Encoding
« Reply #6 on: September 28, 2006, 05:36:29 pm »
Thank you.

I checked and it's set to Auto HTML.  Is that new?  It appears to work for files that have the tag but not for files missing the tag. Most of the files I'm editing are templates for a website. The top level files have the tags in them, but the smaller pieces (left column, right column, etc.) are not complete HTML files and are therefore missing the tag.

So, unfortunately, even for HTML files I need it either to try to figure out the encoding when it can't find the character set or, in my case, simply to default to utf-8 when it can't find the tag. Currently it's defaulting to p_encoding=0.  Is there a setting for this as well, i.e. Auto HTML then fall back to utf-8?
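
The behavior I'm after can be sketched in a few lines of Python. This is purely illustrative and not a SlickEdit setting or macro: the function name and the cp1252 last resort are assumptions made up for this example. Use the declared charset if there is one, otherwise use utf-8 when the bytes are valid utf-8, otherwise give up and use a code page.

```python
import re

def choose_html_encoding(data, fallback="utf-8", last_resort="cp1252"):
    """Pick an encoding for HTML bytes: declared charset, then fallback."""
    # Step 1: honor an explicit charset declaration near the top of the file.
    declared = re.search(
        rb"charset\s*=\s*[\"']?([\w.:-]+)", data[:4096], re.IGNORECASE
    )
    if declared:
        return declared.group(1).decode("ascii").lower()
    # Step 2: no declaration (e.g. a template fragment); accept the
    # fallback only if the whole file decodes cleanly with it.
    try:
        data.decode(fallback)
    except UnicodeDecodeError:
        return last_resort
    return fallback
```

With this, a left-column fragment that has no meta tag but contains valid utf-8 comes back as utf-8 instead of a single-byte default.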

Clark

Re: Viewing / Changing the Encoding
« Reply #7 on: September 28, 2006, 06:55:52 pm »
SlickEdit does not have this.  Auto XML defaults to Utf-8 but this code path does not check for html encoding stuff.  Your best bet is to customize build_load_options.

greggman

Re: Viewing / Changing the Encoding
« Reply #8 on: September 29, 2006, 07:28:01 am »
Thank you.  I'll look into that.

Any chance you could add a hook to version 12 so I don't have to hack build_load_options in the future?