Author Topic: changing Word-chars and the definition of IsXMLNameChar  (Read 8685 times)

rod_gomz

  • Community Member
  • Posts: 80
  • Hero Points: 1
changing Word-chars and the definition of IsXMLNameChar
« on: April 13, 2009, 09:52:15 PM »
I have some XML like this:

<key-group id="*the cursor is here |*part.extract">

If I want to use ctrl+shift to select just "part" before the dot, how do I do that? When I issue the word-chars command, I see this:

\p{isXMLNameChar}?!

So a word is defined as any character not an xml class? I did a search (macros directory) and couldn't find the place where this is defined. I wanted to copy that and put a dot in there. Also what is the \p?

When I select words I keep thinking about camelhumps, and I keep seeing the dot as a word separator.

Rodrigo Gomez

jbhurst

  • Senior Community Member
  • Posts: 405
  • Hero Points: 33
Re: changing Word-chars and the definition of IsXMLNameChar
« Reply #1 on: April 13, 2009, 10:04:14 PM »
I also find it annoying that '.' and ':' are considered word characters in XML. It makes it harder to navigate, as the usefulness of the prev/next-word commands is diminished.

For example, in an Ant build script I might have the following:
Code: [Select]
  <taskdef name="groovyc" classname="org.codehaus.groovy.ant.Groovyc" classpathref="classpath.groovy"/>

Normally, the prev/next-word commands would jump to the individual words "org", "codehaus", "groovy", etc. But not in XML mode.

Is there a way to change that?

One approach, which I sometimes use, is to switch modes:

Code: [Select]
fundamental-mode

John Hurst
Wellington, New Zealand

chrisant

  • Senior Community Member
  • Posts: 1410
  • Hero Points: 131
Re: changing Word-chars and the definition of IsXMLNameChar
« Reply #2 on: April 14, 2009, 12:23:34 AM »
Quote
Normally, the prev/next-word commands would jump to the individual words "org", "codehaus", "groovy", etc. But not in XML mode.

Is there a way to change that?

In the Tools|Options dialog you can change the word characters for a language in the language's "General" node, right above the "Language-Specific Project..." button.

rod_gomz

  • Community Member
  • Posts: 80
  • Hero Points: 1
Re: changing Word-chars and the definition of IsXMLNameChar
« Reply #3 on: April 14, 2009, 02:35:50 PM »
Changing to fundamental mode is a good idea.  The word-chars over there are: A-Za-z0-9_$. So all I have to do is copy that under xml mode.
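
As a rough illustration of what those fundamental-mode word chars do, here is a small Python sketch. It is not SlickEdit code; it only assumes that the list A-Za-z0-9_$ behaves like a regex character class, and the `words` helper is a made-up name. With that set, the class name from the Ant example splits at every dot:

```python
import re

# Fundamental-mode word chars quoted above, written as a regex character class.
# This only models the idea in Python; it is not how SlickEdit implements it.
FUNDAMENTAL_WORD = re.compile(r"[A-Za-z0-9_$]+")

def words(text):
    """Return the maximal runs of word characters found in text."""
    return FUNDAMENTAL_WORD.findall(text)

print(words("org.codehaus.groovy.ant.Groovyc"))
# ['org', 'codehaus', 'groovy', 'ant', 'Groovyc']
```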

Lee

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 1299
  • Hero Points: 130
Re: changing Word-chars and the definition of IsXMLNameChar
« Reply #4 on: April 14, 2009, 02:50:40 PM »
\p{} and \P{} are used for matching specific named Unicode Character blocks.  There is a section in the Help on Regular Expressions which details the named Unicode blocks that are available to match.

You are free to change the defined Word Chars for XML mode if you like, but you may encounter problems if there are any valid UTF-8 characters > 0x07F in the word you're trying to select.
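
To make that caveat concrete, here is a small Python sketch. It is an illustration only: the sample name is made up, and Python's `\w` is used merely as a convenient Unicode-aware stand-in, not as an equivalent of \p{isXMLNameChar}. An ASCII-only word set breaks a name at the first character above 0x7F, while a Unicode-aware set keeps it whole:

```python
import re

# ASCII-only word set, like the A-Za-z0-9_$ list from fundamental mode.
ASCII_WORD = re.compile(r"[A-Za-z0-9_$]+")

# Unicode-aware stand-in: in Python 3, \w on str patterns matches
# Unicode letters and digits plus the underscore.
UNICODE_WORD = re.compile(r"[\w$]+")

# Hypothetical name containing U+00E9 (e with acute), which is above 0x7F.
name = "caf\u00e9_menu"

ascii_words = ASCII_WORD.findall(name)
unicode_words = UNICODE_WORD.findall(name)

print(ascii_words)    # ['caf', '_menu']  (the word is cut in two)
print(unicode_words)  # the whole name as a single word
```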

rod_gomz

  • Community Member
  • Posts: 80
  • Hero Points: 1
Re: changing Word-chars and the definition of IsXMLNameChar
« Reply #5 on: April 14, 2009, 04:31:35 PM »
Thanks. That helps explain what this means:


    XMLNameChar
      All characters that are valid in an XML tag name.
 
I'm guessing this expression means:

\p{isXMLNameChar}?!

A word is a valid XML tag name. Am I reading this right? And what is the bang "!" at the end?

Maybe I don't understand the command "word-chars" well.

Rodrigo Gomez

Lee

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 1299
  • Hero Points: 130
Re: changing Word-chars and the definition of IsXMLNameChar
« Reply #6 on: April 14, 2009, 05:04:37 PM »
The Word Chars setting in SlickEdit defines the set of characters that consecutively make up a word for that language.  It is used for word-based operations in searches, generating selections, and navigation.  In this case the ? and ! are literal characters, so if you placed your cursor somewhere on this snippet:

abc?ijk!xyz

the whole phrase would be selected if you did a Select Word or some other word based op.
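
A minimal Python sketch of that behavior follows. It assumes a simplified ASCII stand-in for \p{isXMLNameChar} (the real class covers far more of Unicode), and `select_word` is a hypothetical helper, not a SlickEdit function:

```python
# Simplified ASCII stand-in for \p{isXMLNameChar}: letters, digits, and . - _ :
# (an assumption for illustration; the real class is much larger).
XML_NAME_ASCII = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789.-_:"
)

# The XML-mode setting \p{isXMLNameChar}?! adds '?' and '!' as literal members.
WORD_CHARS = XML_NAME_ASCII | {"?", "!"}

def select_word(text, pos, word_chars=WORD_CHARS):
    """Hypothetical helper: return the run of word chars around index pos."""
    if pos < 0 or pos >= len(text) or text[pos] not in word_chars:
        return ""
    start = pos
    while start > 0 and text[start - 1] in word_chars:
        start -= 1
    end = pos
    while end < len(text) and text[end] in word_chars:
        end += 1
    return text[start:end]

print(select_word("abc?ijk!xyz", 5))      # abc?ijk!xyz
print(select_word("id=part.extract", 4))  # part.extract
```

Because '.' is also in the set, the second call returns "part.extract" as one word, which is the behavior the original question was about.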

rod_gomz

  • Community Member
  • Posts: 80
  • Hero Points: 1
Re: changing Word-chars and the definition of IsXMLNameChar
« Reply #7 on: April 14, 2009, 06:18:35 PM »
Thanks for the information.

Cool, it seems as if the word-chars is inside a "[]" character class.

I was trying to specify a not:

[^.]

and I was doing this:

^.

So crafting a regular expression is a standard and self-documenting way of describing a word.

However, would it be better to defer this to a user defined function that returns true for each character? So I can write a function that returns true on a valid \p{isXMLNameChar} (XML tag name), but false if that character is a dot "."?

Rodrigo Gomez

Lee

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 1299
  • Hero Points: 130
Re: changing Word-chars and the definition of IsXMLNameChar
« Reply #8 on: April 14, 2009, 07:22:13 PM »
Yes, the word char definition is a character class. \p{isXMLNameChar} is a predefined expression that maps to the XML NameChar definition (see http://www.w3.org/TR/xml/#NT-NameChar).

Quote
However, would it be better to defer this to a user defined function that returns true for each character? So I can write a function that returns true on a valid \p{isXMLNameChar} (XML tag name), but false if that character is a dot "."?

There is no need for a user-defined function; the word char option already lets you specify which characters are valid. If you want to exclude just "." from the word char setting \p{isXMLNameChar}?!, SlickEdit regex supports character class subtraction. I'm pretty sure this would work:
\p{isXMLNameChar}?!-[.]
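
Conceptually, the subtraction removes the listed characters from the set built by everything before the "-[...]". The Python sketch below models that with plain set difference, since Python's standard re module does not offer class subtraction. The ASCII stand-in for \p{isXMLNameChar} and the `split_words` helper are assumptions made for illustration:

```python
import itertools

# Simplified ASCII stand-in for \p{isXMLNameChar} (assumption; the real
# class covers far more of Unicode).
XML_NAME_ASCII = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789.-_:"
)

# \p{isXMLNameChar}?!      -> the base set plus literal '?' and '!'
base = XML_NAME_ASCII | {"?", "!"}

# \p{isXMLNameChar}?!-[.]  -> the same set with '.' subtracted
without_dot = base - {"."}

def split_words(text, word_chars):
    """Return the maximal runs of characters that are all in word_chars."""
    groups = itertools.groupby(text, key=lambda c: c in word_chars)
    return ["".join(run) for is_word, run in groups if is_word]

print(split_words("part.extract", base))         # ['part.extract']
print(split_words("part.extract", without_dot))  # ['part', 'extract']
```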


rod_gomz

  • Community Member
  • Posts: 80
  • Hero Points: 1
Re: changing Word-chars and the definition of IsXMLNameChar
« Reply #9 on: April 14, 2009, 07:38:59 PM »
Wow.

Thanks a lot. This is great functionality.  :)

I've changed my word-chars to:

\p{isXMLNameChar}?!\*-[.<>_]

Now if I have a word like this:

<xmltag>*cursor here |*AAAA_INFOIQ.P14</xmltag>

I can hit ctrl+shift three times to select each piece in turn, and it stops before the less-than sign "<".

Thank you so much!

Rodrigo Gomez

jbhurst

  • Senior Community Member
  • Posts: 405
  • Hero Points: 33
Re: changing Word-chars and the definition of IsXMLNameChar
« Reply #10 on: April 14, 2009, 08:20:10 PM »
This worked a treat for me too. Thanks!

Interestingly, Ant mode has its own settings, and does not define word chars using the isXMLNameChar character class. However, the word chars are easily fixed.

The hyphen works as both a range operator and a subtraction operator in a character class ... I don't think I consciously realized that before! Freaked me out.

Kicking myself now for not fixing this years ago ...

John Hurst
Wellington, New Zealand

hs2

  • Senior Community Member
  • Posts: 2761
  • Hero Points: 292
Re: changing Word-chars and the definition of IsXMLNameChar
« Reply #11 on: April 26, 2009, 03:16:18 PM »
Hey - I didn't know about that nifty subtraction feature either! Another little SE gem I really like :)
Thanks Lee !
HS2