Word-wrap for Mozilla (take 2)

This blog entry is a follow-up to my last blog entry, Emulating CSS word-wrap for Mozilla/Firefox. In this entry, I work under the assumption that the reader has read the original post. If you haven't, I highly encourage you to do so before reading this post, as the contents of this one will be much more relevant if you do.

In my last entry, I discussed how one might achieve something similar to the CSS style rule: "word-wrap: break-word;" in Mozilla-based browsers.

The technique I used to accomplish this centered around using a Mozilla binding to automatically insert an invisible Unicode character (character code 8203, the zero-width space, which I loosely call a "hyphen" throughout) between the characters of a string contained within any element whose class was set to "wordwrap." With this invisible character in place, Mozilla will happily wrap a long string whenever the element's content overflows its container.

There was a catch with my approach though. I wanted to modify the inner text of a "wordwrap" element, but I also needed the logic of my binding to leave intact any HTML that may have existed inside the given element.

The logic I ended up using was to insert the unicode character between each and every character in the element's .innerHTML (including any HTML tags). Obviously though, this would "mangle" the HTML tags thereby making them useless. So, I then added a bit more logic which utilized regular expressions plus a callback function in a .replace(..) call to "unmangle" the HTML I screwed up in the first place.

This worked, but as alert reader, Rich Birkby, kindly pointed out, there is an easier way. A way that, as it turns out, is quite simple to implement and doesn't require "mangling" any of the HTML tags within the "wordwrap" element.

Read on to learn more.

This superior technique (IMO) that Rich informed me of was the TreeWalker. A TreeWalker is an object that you can create in Mozilla-based browsers that allows you to navigate the document tree.

The document.createTreeWalker(..) function takes four parameters that let you specify exactly how the walker should "do its walking" (if you will). Each parameter is explained in more detail below:

  1. rootNode: The node in the document that is to serve as the root node for this tree walker.
  2. whatToShow: An integer constant that selects one of several built-in filters for the nodes to be included in the tree (the constants are bit flags, so several can be combined with a bitwise OR). Listed below are the acceptable values for this parameter:
    • NodeFilter.SHOW_ALL
    • NodeFilter.SHOW_CDATA_SECTION
    • NodeFilter.SHOW_DOCUMENT
    • NodeFilter.SHOW_DOCUMENT_TYPE
    • NodeFilter.SHOW_ENTITY
    • NodeFilter.SHOW_NOTATION
    • NodeFilter.SHOW_TEXT
    • NodeFilter.SHOW_ATTRIBUTE
    • NodeFilter.SHOW_COMMENT
    • NodeFilter.SHOW_DOCUMENT_FRAGMENT
    • NodeFilter.SHOW_ELEMENT
    • NodeFilter.SHOW_ENTITY_REFERENCE
    • NodeFilter.SHOW_PROCESSING_INSTRUCTION
  3. filterFunction: This parameter can be a reference to a filter function. Its purpose is to allow you to filter nodes even further than what the whatToShow parameter can provide. If you have no need for this parameter, simply pass in null. However, if you specify a function, it must accept a single node and return an integer value based on one of the following constants:
    • NodeFilter.FILTER_ACCEPT
    • NodeFilter.FILTER_REJECT
    • NodeFilter.FILTER_SKIP
  4. entityRefExpansion: A boolean value that determines whether or not the content of entity reference nodes should be treated as hierarchical nodes. In most cases, this value will probably be false.
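
To make the third parameter a little more concrete, here is a minimal sketch of a filter function that skips text nodes containing nothing but whitespace. The names skipBlankText and Filter are made up for this example and are not part of the binding; the fallback object simply mirrors the numeric values of the real NodeFilter constants so the filter can be tried outside a browser.

```javascript
// Use the real NodeFilter constants when running in a browser; otherwise
// fall back to an object with the same numeric values (assumption: this
// fallback exists only so the sketch can run outside a browser).
var Filter = (typeof NodeFilter !== 'undefined')
   ? NodeFilter
   : { FILTER_ACCEPT: 1, FILTER_REJECT: 2, FILTER_SKIP: 3, SHOW_TEXT: 4 };

// Hypothetical filter: accept text nodes that contain at least one
// non-whitespace character, skip the rest.
function skipBlankText(node)
{
   return /\S/.test(node.nodeValue) ? Filter.FILTER_ACCEPT : Filter.FILTER_SKIP;
}

// Browser-only usage: collect the non-blank text nodes under <body>.
if (typeof document !== 'undefined' && document.body)
{
   var walker = document.createTreeWalker(document.body, Filter.SHOW_TEXT, skipBlankText, false);
   var found = [];
   while (walker.nextNode())
   {
      found.push(walker.currentNode);
   }
}
```

For text nodes, FILTER_SKIP and FILTER_REJECT behave the same way, since a text node has no children. The difference matters for elements: with a TreeWalker, rejecting a node also skips everything beneath it, while skipping it still lets the walker visit its descendants.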

Updating the original code to use the TreeWalker:

For our purposes (emulating the word-wrap CSS property) we won't need to get very fancy with the TreeWalker at all. All we really need to do is traverse the tree of the element that's been bound by our Mozilla Binding. We'll make use of the SHOW_TEXT filter to ensure that we visit only text nodes (and not bother with any HTML elements). This will allow us to insert the unicode hyphen between the characters of text nodes, while leaving intact all other HTML inside the bound element.

The code would look something like this:


var walker = document.createTreeWalker(_elem, NodeFilter.SHOW_TEXT, null, false);

while (walker.nextNode())
{
   var node = walker.currentNode;
   // 8203 is the character code of the invisible character; pass it as a number
   node.nodeValue = node.nodeValue.split('').join(String.fromCharCode(8203));
}
 

The above code is really a tremendous simplification from what I had originally written (thanks Rich!). This technique allows us to walk only the text nodes of the bound element (skipping over any HTML tags). That way, we can insert the unicode symbol only amongst the characters of raw text and not have to worry about mangling (and subsequently, unmangling) any HTML tags.

The "Gotcha:"

While the TreeWalker technique greatly simplifies the word-wrap binding code, there is still a caveat that you should be aware of if you plan to use this binding -- or a similar one you've written yourself -- in one of your projects (and this goes for the original version too).

Inserting the unicode symbol between every character in a string is not necessarily ideal. Take a perfectly normal string like "How are you today?" This string already has spaces, and so it will naturally wrap where the browser sees those "soft" spaces. Inserting the unicode hyphen into that aforementioned string doesn't do anything particularly useful, and in fact, it might even be considered harmful.

It's quite possible that you would end up breaking the word "today" into two pieces: "tod" and "ay" -- clearly not the desired effect. What can be done about this? Well... one option is careful and thoughtful use of the word-wrap binding. Use it only for long unbroken strings of text (like a long URL for instance). Another option might be to modify the code to abort the word wrap if a space is found within the string (far from a perfect approach though).
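
One way the "long unbroken strings only" idea might be sketched is to insert the invisible character only into runs of non-whitespace that exceed some character count. The function name breakLongWords and the default threshold of 20 are assumptions made up for this example, and counting characters is only a crude stand-in for knowing how wide the text really is.

```javascript
var ZWSP = String.fromCharCode(8203);   // the invisible character used by the binding

// Hypothetical helper: leave short words alone and make long ones breakable.
function breakLongWords(text, maxLen)
{
   maxLen = maxLen || 20;   // assumed threshold, measured in characters
   return text.replace(/\S+/g, function (word)
   {
      // Only words longer than the threshold get the invisible character
      // inserted between each of their characters.
      return word.length > maxLen ? word.split('').join(ZWSP) : word;
   });
}
```

Inside the walker loop, the assignment would then become something like node.nodeValue = breakLongWords(node.nodeValue, 20); so that a string like "How are you today?" passes through untouched while a long URL still becomes breakable.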

Another idea I experimented with (but decided against blogging about because of its complexity) was measuring each "word" in the bound element. If any given word were greater in width than the bound element's container, then that word would get unicode hyphens inserted into it. All "short" words were left alone.

This technique did actually work, but it involved recursively adding invisible, cloned elements wrapped in <span> tags to the DOM on the fly in order to properly measure the text. Also, it did not account for a scenario where text should wrap because of a floated element, and this (IMO) is one of the more desirable ways to utilize word-wrap. Still, I think the concept has potential, and I may decide to revisit it at some point in the future.

For now though, we have something that gets us close, is relatively simple to implement, and brain-dead easy to use.

If you'd like, you can see a working demo of the updated word-wrap code.

If you're feeling extra adventurous, you can take a look at my word-wrap experiment (take 3) where I try to measure text elements and only insert unicode hyphens where necessary (use this version at your own risk!)

That'll do it for this entry. Special thanks to Rich Birkby for enlightening me about the TreeWalker. Be sure to check out Rich's ASPAdvice blog: http://aspadvice.com/blogs/rbirkby and his Microsoft Downloads RSS Feed.

Comments welcome.

21 Responses

  1. Fabrizio Calderan Says:

    I really appreciate this experiment and I found it very useful.

    Using this workaround with the 'word-wrap' property in Internet Explorer you have a nearly crossbrowser solution for the word break problem.

    I said 'nearly' because in Opera the issue is not resolved anyway… do you know if it's possible to use some workaround in Opera (like -moz-binding)?

    Thank you again for your interesting talks.

    Fab

  2. sstchur Says:

    Fab,

    That's a good question; I'm honestly not sure. Shame on me for neglecting Opera though!

    One thing that comes to mind would be to use some sort of function that you call from JavaScript, with logic similar to what we've got in the Mozilla binding. This should work in all browsers, but of course, it's not quite as elegant as binding the logic with CSS.

  3. Mohammed Irfan Says:

    It is really good, but when you try to copy-paste from it, the Unicode characters come up. This will render the content as good as worthless. Hope you will find something that also lets us copy-paste the content.

  4. sstchur Says:

    Mohammed:

    Are you talking about the code in the post? You're having problems copying and pasting it?

  5. David Says:

    Please forgive me, but I don't understand. Your demo for this doesn't seem to work with Firefox or IE. I've been looking for a solution for this for days. I'm sorry yours hasn't panned out. Also, it's nice that you give the details about what you're doing, but without giving the code for the CSS and the HTML and stating which goes where your instructions fall short. In this Treewalker technique, do I need anything specific to Treewalker (files, code, etc.). Do I still need the XML file, and what, if anything new, goes there. And if the XML is still used do I use the same code as before. And should I reference it the same way.
    There are people out here that look at this wordwrap problem as severe and puzzled why it hasn't been fixed. But even worse, that it's so hard to find a fix. If you don't mind me saying so, I would like to be treated like a baby and shown everything that's required to make your solution work. I want to know what and where to place all code to make my seemingly simple problem simply solved. Sometimes it perplexes me to see people out here behaving like linux people. Always belittling the people that want to point, click and have it done and thinking that people should suffer and think in pain for their code solutions. Some people just want to be spoon fed. For example, Word for Windows – People expect there will be some learning curve with the program, but they don't expect they'll have to code every individual page just to make a document. That's probably why Linux is still behind Windows in user popularity. The developers are either lazy or expect the users to feel their "code figuring out" pain. I'm sorry if I'm off task or on a tyrade, but the point is I've come to your site for a solution to my problem, and am leaving with a bunch of stuff that still doesn't work. I'm tired of hearing people say, "try this and see if it works. Now try this. Now try this." No, you try this. Make it work, and don't blog about it till it does work. Don't blog about it till you've come up with a solution for CSS text boxes and textarea form fields. Have you looked at your demo? The text doesn't wordwrap. It just cuts off at the end of the line. Why, this very comment field doesn't have your solutions in place. This is why I'm here. I need a constant text wordwrap solution for a textarea form field so my long uninterrupted text doesn't fly off the field box. Gobishe?

    David

  6. sstchur Says:

    David,

    I understand you're upset having not yet found a solution, and my blog is here to try to help people, so I will try to help you.

    You asked if I have looked at my demo in Firefox. I have. Here is a screenshot of what I see: http://blog.stchur.com/blogcode/pics/mozilla_wordwrap.gif

    What about that screenshot isn't doing what you want/expect?

    Note that my demo is not designed to work with IE. IE supports word-wrap natively and using it is trivial, so that was not included in the blog post or demo; I did not feel that the average user needs help with that.

    Why don't you try emailing me directly with some specific questions about the word-wrap binding, and explain to me what isn't working for you?

    And in order to increase your chances of a reply, I would recommend using a more neutral tone than what you used in your comments above.

  7. Hypa Says:

    The links to your demos are not working, I would like to see this in effect.

  8. sstchur Says:

    Ah, thanks for pointing this out Hypa.

    I updated my blogging software and forgot to re-upload the blogcode demos after the upgrade was complete. The demos should be restored now.

  9. Geuis Says:

    I had a particular problem on a forum migration I'm doing. FF kept blowing out a column that had post names that were longer than a certain number of characters. My solution ended up not being to wrap the text, though that was the original goal. I kept applying overflow:hidden; and trying to set a width, both of which were failing when applied to the elements containing the text. However, by applying the styling to the links inside the elements fixes it nicely.

    somelongtextualstuffgoeshere
    .whateverclass a{
    overflow:hidden;
    width:auto;
    display:block;
    }

  10. Geoff Says:

    BODY {
    font-family:normal arial,helvetica,verdana,sans-serif;
    } – breaks your demo – for me every character is rendered w i t h a s p a c e l i k e t h i s .

  11. Stephen Stchur Says:

    @Geoff:

    I'm not able to repro this. I just used Firebug to add the styles you specified to the <body> element on the demo page, and it appeared to work for me.

  12. HP Says:

    Hi,

    Sorry for bumping an old article. I have been trying to emulate your xml solution for word wrap in Firefox. My knowledge of this is limited. I have followed your steps in both version 1 and version 2 of this solution and can get neither to work.

    I would appreciate some assistance and would be happy to provide my code. Are you able to provide me with a direct email?

    Thanks

  13. sstchur Says:

    HP,

    My email is sstchur(at)yahoo(dot)com. Let me know what you've got so far, and we'll see if we can figure out what the missing piece is to get it working.

  14. personallo Says:

    wow ))
    its very interesting point of view.
    Good post.
    realy gj

    thx 🙂

  15. haider Says:

    hix

  16. Chris Says:

    Good news, apparently the word-wrap CSS property will soon be available in Firefox 3.1:
    http://developer.mozilla.org/web-tech/2008/08/20/word-wrap-break-word/

  17. RKB Says:

    Hello
    It's a very good solution for wrapping in Mozilla, but I have faced one problem: when I try to copy a report name and paste it, the Unicode characters come up. I want normal text when copying from the Mozilla web page.

    Please give me an option for the last line:
    node.nodeValue = node.nodeValue.split('').join(String.fromCharCode('8203'));

  18. rajiv Says:

    It's very useful, but I am facing one problem: the words are not breaking properly, meaning a line breaks before the word is complete.
    ex – please note : constructive criticis
    m is welcome. Rude or vulgar comments ho
    wever ,are not

    it happen like this

    please help me

  19. sbeam Says:

    Hi, this was a very good article and an interesting technique. I didn't know about -moz-binding. If one needed to support Opera, you could do something similar with regular JS/DOM manipulation – but it would have to be without the convenience of TreeWalker.

    I can confirm this does NOT work on table cells, because the 'overflow' event does not trigger on them. My kludge was to wrap a around the contents of certain table cells that might have really long words in them, and assign max-width: and overflow: property to that.

    Also – since FF 3.1 will have word-wrap:, I added some version detection to the binding. I know this is generally frowned upon but in this case it should be OK since it is future-proof:

    var ver = parseFloat(navigator.userAgent.substring(navigator.userAgent.lastIndexOf('/') + 1));
    if (3.1 > ver) {
    var elem = this;
    doWrap();
    elem.addEventListener('overflow', doWrap, false);
    }

  20. Raja Says:

    I really appreciated your code for solving word-wrap in FireFox using TreeWalker.

    But it fails for Chrome. For IE & Chrome, we can use styles as,

    table-layout:fixed;
    word-break: break-all;

    Thanks a lot .

  21. sstchur Says:

    Thanks Raja:
    Yeah, this was written before Chrome was released. And Chrome not supporting XBL isn't too surprising, right? But as you pointed out, it supports some alt CSS which can be used to achieve the desired effect.
