Author Topic: small bug with column indicator in utf-8 coded files  (Read 190 times)

vladie

  • Community Member
  • Posts: 18
  • Hero Points: 0
small bug with column indicator in utf-8 coded files
« on: February 24, 2021, 12:01:02 pm »
hi

The column indicator should show the cursor position in characters, but instead it shows it in bytes.
For example, at the end of the string "привет" SlickEdit shows a column value of 12 (but it should be 6).
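
To illustrate the gap between the two numbers, here is a minimal C sketch that counts code points in a UTF-8 string and compares that with the byte length from `strlen`. The function name `utf8_codepoint_count` is made up for this example, and the code assumes the input is already valid UTF-8; it is not how SlickEdit computes its column.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count Unicode code points in a NUL-terminated UTF-8 string.
   Assumption: the input is valid UTF-8, so every byte that is not a
   continuation byte (bit pattern 10xxxxxx) starts a new code point. */
size_t utf8_codepoint_count(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if ((*p & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For "привет" encoded as UTF-8, `strlen` returns 12 while `utf8_codepoint_count` returns 6, because each Cyrillic letter occupies two bytes.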

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • Posts: 5702
  • Hero Points: 457
Re: small bug with column indicator in utf-8 coded files
« Reply #1 on: February 24, 2021, 03:26:52 pm »
Right now, the column indicator is in bytes (UTF-8 for Unicode files) with tabs expanded. What the column indicator should be is not obvious. Some compilers still report error-message columns exactly as SlickEdit does (especially Unix tools and SlickEdit tools). However, Microsoft tools seem to output UTF-16 columns (NOT UTF-32). I don't see UTF-16 as an improvement. I like UTF-32, but there are no tools that use it (at least I haven't seen any).
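
To make the three candidate units concrete, the sketch below measures the same UTF-8 text in UTF-8 bytes, UTF-16 code units, and UTF-32 code points. The type `column_widths` and the function `measure_utf8` are hypothetical names for this illustration. It assumes valid UTF-8 input and does not expand tabs, so it is only an approximation of what any real tool reports.

```c
#include <assert.h>
#include <stddef.h>

/* Width of the same text in three different units. */
typedef struct {
    size_t utf8;   /* bytes */
    size_t utf16;  /* 16-bit code units */
    size_t utf32;  /* code points */
} column_widths;

/* Measure a NUL-terminated string, assumed to be valid UTF-8.
   Tabs are counted as one unit each (no tab expansion). */
column_widths measure_utf8(const char *s)
{
    assert(s != NULL);
    column_widths w = {0, 0, 0};
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        ++w.utf8;
        if ((*p & 0xC0) == 0x80)
            continue;                     /* continuation byte: same code point */
        ++w.utf32;
        w.utf16 += (*p >= 0xF0) ? 2 : 1;  /* 4-byte sequences need a surrogate pair */
    }
    return w;
}
```

For "привет" this gives 12 / 6 / 6, so UTF-16 and UTF-32 agree. They only diverge for code points outside the Basic Multilingual Plane, such as most emoji, where one code point takes 4 UTF-8 bytes and 2 UTF-16 code units.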

So here's my question: what are you using the column indicator for? Are you trying to get a character count? If so, why do you need this count (e.g. source code string length, or matching another tool's column)? Do you want UTF-16, UTF-32, or UTF-8? See what I mean?

vladie

  • Community Member
  • Posts: 18
  • Hero Points: 0
Re: small bug with column indicator in utf-8 coded files
« Reply #2 on: February 24, 2021, 05:17:34 pm »
I tried to measure the width of a UTF-8 line in characters using the column values at the beginning and end of a string, like:
const char *test = "приветприветприветприветприветприветприветприветприветпривет";

I think it would be good to have a right-click popup menu on the "Col" indicator that lets you select what the column means: an offset in characters or in bytes.
« Last Edit: February 24, 2021, 06:03:32 pm by vladie »

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • Posts: 5702
  • Hero Points: 457
Re: small bug with column indicator in utf-8 coded files
« Reply #3 on: February 24, 2021, 06:04:57 pm »
By “chars” do you mean UTF-32? That would count each UTF-8 character sequence, which is anywhere from 1 to 4 bytes (up to 6 in the original UTF-8 design), as one character.

Your source code example doesn’t indicate the need for a particular length since the string is null terminated. To determine the amount of memory required by your sample string, you would need a byte count and not a UTF-32 count.
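
As a small illustration of that point, the sketch below sizes a buffer from the byte count rather than the character count. The helper names `storage_bytes` and `duplicate_string` are invented for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Bytes needed to store a NUL-terminated string, including the terminator.
   This depends on the byte length, not on the number of characters. */
size_t storage_bytes(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}

/* Copy a string into freshly allocated memory; returns NULL if allocation
   fails. The caller is responsible for calling free() on the result. */
char *duplicate_string(const char *s)
{
    size_t n = storage_bytes(s);
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```

For the 6-character string "привет" in UTF-8, `storage_bytes` returns 13: twelve bytes of text plus the terminator.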

Note that on Windows, SlickEdit uses the active code page by default for C++. I’m assuming you are on macOS or Linux.

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • Posts: 5702
  • Hero Points: 457
Re: small bug with column indicator in utf-8 coded files
« Reply #4 on: February 24, 2021, 09:38:32 pm »
If you want a character count rather than a byte count for text within a single line, make a selection and SlickEdit will show a UTF-32 (not UTF-16) character count in the status area. In your case, since your sample text has no spaces, just double-click the word to select it and get that count.

There's no need to do math on the start and end columns.

vladie

  • Community Member
  • Posts: 18
  • Hero Points: 0
Re: small bug with column indicator in utf-8 coded files
« Reply #5 on: February 25, 2021, 09:24:54 am »
By “chars” do you mean UTF-32? That would count each UTF-8 character sequence, which is anywhere from 1 to 4 bytes (up to 6 in the original UTF-8 design), as one character.

My source code is in UTF-8, and the OS is macOS.