.split(..) broken in IE

Last weekend I discovered something about IE that I had never known before (probably because I use Firefox/Firebug to write and test most of my JavaScript). It turns out that IE's implementation of .split(..) is broken for the more complex cases. If you stick to the simpler cases when using this function, you have probably never noticed. For example, the following works just fine:

var s = 'This-is-a-hyphen-delimited-string';
var split = s.split('-');   // returns ['This', 'is', 'a', 'hyphen', 'delimited', 'string']

The above code will work in just about every browser under the sun, including IE, but of course, it's the simplest case scenario for the .split(..) function. Many programmers will probably be aware that you're not limited to passing in a string; you can also pass in a regular expression, which IE (mostly) supports, but this is where things can start to fall apart on you (in IE anyway).

Read on to learn more.

The .split(..) function is pretty robust. You can pass in a simple string, which indicates what character (or characters) you want to use as a delimiter, or you can pass in a regular expression instead of a string. Take the following example:

var s = 'My~Silly.Test-String';
var split = s.split(/~|\.|-/);   // returns [ 'My', 'Silly', 'Test', 'String' ]

In the above example, the string s will be split on any of the characters "~", ".", or "-". The regular expression makes this really easy to do. But one thing you'll notice is that the delimiters themselves are not captured. That might be exactly what you want, in which case there's no problem. But suppose you do want them. Suppose you want the result of your split to be:

[ 'My', '~', 'Silly', '.', 'Test', '-', 'String' ]

In that case, you need to modify your regular expression, and unfortunately, the needed modification isn't going to work in IE. Here's how you'd accomplish it in most other browsers (it's remarkably easy):

var s = 'My~Silly.Test-String';
var split = s.split(/(~|\.|-)/);   // returns ['My', '~', 'Silly', '.', 'Test', '-', 'String']

You'll notice (or maybe not, since the change was so minor) that I didn't have to do much to get the desired result. All that was needed was wrapping the contents of my regular expression in parentheses, which turns it into a capturing group.
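To make the effect of the parentheses concrete, here is a small sketch comparing a capturing group with a non-capturing one. It assumes an engine whose .split(..) follows the spec for capturing groups; the variable names are just for illustration.

```javascript
var s = 'My~Silly.Test-String';

// Capturing group: the matched delimiters are included in the result.
var withDelimiters = s.split(/(~|\.|-)/);

// Non-capturing group (?:...): the alternatives are grouped but not
// captured, so the delimiters are dropped, as in the earlier example.
var withoutDelimiters = s.split(/(?:~|\.|-)/);

console.log(withDelimiters);    // ['My', '~', 'Silly', '.', 'Test', '-', 'String']
console.log(withoutDelimiters); // ['My', 'Silly', 'Test', 'String']
```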

Unfortunately, in IE's implementation of .split(..), it completely ignores the grouping request. So while your result will still be an array split by the requested delimiters, those delimiters won't be included in the array!

So, what do we do when IE won't cooperate with us? Why, we beat it into submission of course, and here's how.

The .xSplit(..) function

I call it xSplit(..) because it's a sort of "eXtended" version of the native .split(..) function. Anytime I'm writing/enhancing a version of a function that already exists, I like to use the "x" to help my eyes quickly distinguish between the preferred native function and my own custom implementation (actually, I don't really do that; I just made that up now, but it seems like a reasonable idea, so maybe I'll start!).

The xSplit(..) function will work more or less like the native function, with just one small difference: it will only accept regular expressions. Strings won't be accepted; use the native .split(..) function for that.

Here is the full code, with an explanation to follow.

String.prototype.xSplit = function(_regEx)
{
   // Most browsers can do this properly, so let them -- they'll do it faster
   if ('a~b'.split(/(~)/).length === 3) { return this.split(_regEx); }

   if (!_regEx.global)
      { _regEx = new RegExp(_regEx.source, 'g' + (_regEx.ignoreCase ? 'i' : '')); }

   // IE (and any other browser that can't capture the delimiter)
   // will, unfortunately, have to be slowed down
   var m, str = '', arr = [];
   var i, len = this.length;
   for (i = 0; i < len; i++)
   {
      str += this.charAt(i);
      m = str.match(_regEx);
      if (m)
      {
         arr.push(str.replace(m[0], ''));
         arr.push(m[0]);
         str = '';
      }
   }

   if (str != '') arr.push(str);

   return arr;
};

Explanation:

The code isn't too hard, but it's worth pointing out a few specific points.

First, I'm extending the native String object using .prototype. I know some people aren't comfortable with this, and if you're one of them, feel free to write a stand-alone function where you pass in a string variable, as well as the regex.
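For anyone who prefers not to touch String.prototype, a stand-alone version might look roughly like the sketch below. The name xSplitOf and its parameter names are made up for this example; the body simply mirrors the logic of the prototype version.

```javascript
// Hypothetical stand-alone variant of xSplit; the name is an assumption.
function xSplitOf(str, regEx) {
   // Let the native split do the work when it captures delimiters
   if ('a~b'.split(/(~)/).length === 3) { return str.split(regEx); }

   // Same flag handling as the prototype version
   if (!regEx.global) {
      regEx = new RegExp(regEx.source, 'g' + (regEx.ignoreCase ? 'i' : ''));
   }

   // Manual fallback: build each piece character by character
   var m, piece = '', arr = [];
   for (var i = 0; i < str.length; i++) {
      piece += str.charAt(i);
      m = piece.match(regEx);
      if (m) {
         arr.push(piece.replace(m[0], '')); // text before the delimiter
         arr.push(m[0]);                    // the delimiter itself
         piece = '';
      }
   }
   if (piece !== '') { arr.push(piece); }
   return arr;
}

var parts = xSplitOf('My~Silly.Test-String', /(~|\.|-)/);
console.log(parts); // ['My', '~', 'Silly', '.', 'Test', '-', 'String']
```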

Second, most browsers actually will capture the delimiter when you use regular expression grouping, so if the client happens to be using a browser that can do it, we'll want to use the native implementation, as it will be much faster. We do it like this:

if ('a~b'.split(/(~)/).length === 3) { return this.split(_regEx); }

The above code does a very simple test in which we check to see if the split of "a~b" on the character "~" returns an array of length 3. If so, we assume support for regular expression grouping with the .split(..) function. Otherwise, we're forced to do this manually.
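If you'd rather not repeat that test on every call, one option is to run it once and remember the answer. The sketch below is only an illustration: the helper name splitCapturesDelimiters and the extra check on the middle element are my assumptions, not part of the code above.

```javascript
// Hypothetical helper; the name and the caching are assumptions of this sketch.
var splitCapturesDelimiters = (function () {
   var parts = 'a~b'.split(/(~)/);
   // An engine that captures the delimiter returns ['a', '~', 'b'];
   // one that drops it returns ['a', 'b'].
   var supported = parts.length === 3 && parts[1] === '~';
   return function () { return supported; };
}());

console.log(splitCapturesDelimiters()); // true where the native split captures delimiters
```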

Now, if we do need to do this manually, the regular expression had better have its global (/g) flag set. Why? Because we're executing a .match(..) on the _regEx later on in the code, and in order for that to match more than just the first delimiter it finds, that global flag needs to be set.

if (!_regEx.global)
   { _regEx = new RegExp(_regEx.source, 'g' + (_regEx.ignoreCase ? 'i' : '')); }

This code first checks to see if the global flag is already set. If so, we're in good shape; no need to make any alterations. If not, we'll need to set it. Problem is, the .global property of any regular expression object is read-only, so we must construct a new RegExp using the .source of the original _regEx and manually specify the global flag (by passing in a 'g'). Additionally, we need to make sure we maintain any request in the original regular expression to ignore case, which is accomplished with: (_regEx.ignoreCase ? 'i' : '').
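Note that the snippet above carries over only the ignore-case flag; a regular expression created with the multiline flag would lose it. A minimal sketch that keeps it as well, assuming a helper I'm calling toGlobal (a made-up name), could look like this:

```javascript
// Hypothetical helper: returns a global copy of a regular expression,
// keeping the ignoreCase and multiline flags of the original.
function toGlobal(regEx) {
   if (regEx.global) { return regEx; }   // already global, nothing to do
   var flags = 'g' +
      (regEx.ignoreCase ? 'i' : '') +
      (regEx.multiline ? 'm' : '');
   return new RegExp(regEx.source, flags);
}

var g = toGlobal(/(~|\.|-)/i);
console.log(g.global, g.ignoreCase, g.multiline); // true true false
```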

The rest of the logic is pretty straightforward. We loop through the given string, building, character by character, the strings that will make up the individual elements of the array to be returned. On each iteration of the loop, we test (using .match(..)) whether the string we've built so far contains any of the delimiters specified by the passed-in _regEx. If so, we need to push that string into the array, but we need to strip the delimiter out of the string before we do (hence the code: arr.push(str.replace(m[0], ''));).

We also need to push the delimiter itself into the array and then reset the str variable back to an empty string.

It should be safe to access m[0] (and not bother with other indices). Since we're testing the match on each iteration, we can expect to receive, at most, one match returned in the m array (even if that match is more than one character).
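To see what the m array actually holds, here is a small sketch of .match(..) with and without the global flag on the kind of partial string the loop builds up. The variable names are only for illustration.

```javascript
var piece = 'My~';

// Without the global flag, match returns the full match followed by
// the captured groups.
var single = piece.match(/(~|\.|-)/);
console.log(single[0], single.length); // ~ 2

// With the global flag, match returns only the full matches, one per hit.
var all = piece.match(/(~|\.|-)/g);
console.log(all[0], all.length);       // ~ 1

// No delimiter yet: match returns null, so the loop keeps accumulating.
var none = 'My'.match(/(~|\.|-)/g);
console.log(none);                     // null
```

Either way, m[0] is the delimiter that was just found, which is why the loop only ever needs that one index.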

Final thoughts:

It's worth pointing out that compared to the native .split(..), my implementation is slow! So, whenever you don't need to capture regular expression groupings, stick with the native function. Of course, for browsers that support the feature natively, we'll be using the native .split(..) under the hood anyway, but regardless... if you know you won't need groupings, don't bother with .xSplit(..).

Comments welcome.

6 Responses

  1. Steven Levithan Says:

    Here's an alternative approach to fixing this problem that you might find useful or interesting: Fix JavaScript Split.

  2. sstchur Says:

    Very nice! Thanks Steven… I'll tuck that away for future reference. BTW, "Flagrant Badassery" is the coolest name for a blog ;-)

  3. Brett Knights Says:

    Thanks for the example.

    I reworked your code some and the following was a little better than twice as fast for my test cases:

    String.prototype.xSplit = function(_regEx){
    // Most browsers can do this properly, so let them -- they'll do it faster
    if ('a~b'.split(/(~)/).length === 3) { return this.split(_regEx); }

    if (!_regEx.global)
    { _regEx = new RegExp(_regEx.source, 'g' + (_regEx.ignoreCase ? 'i' : '')); }

    // IE (and any other browser that can't capture the delimiter)
    // will, unfortunately, have to be slowed down
    var start = 0, arr=[];
    var result;
    while((result = _regEx.exec(this)) != null){
    arr.push(this.slice(start, result.index));
    if(result.length > 1) arr.push(result[1]);
    start = _regEx.lastIndex;
    }
    if(start < this.length) arr.push(this.slice(start));
    if(start == this.length) arr.push(""); //delim at the end
    return arr;
    };

  4. Guillaume Says:

    Good job and many thanks !!!
    I tried to make a Javascript to transform last and first name (eg : john-peter ladybird to John-Peter Ladybird) with the split function.
    Everything was Ok under Firefox but old IE 6 transformed it to JohnPeterLadybird, because of this f…ing bug

    So congratulations and many many thanks !
    Merci ;)

  5. sstchur Says:

    @Guillaume: Glad it helped!

  6. www.tonitech.com的站长 Says:

    Your method is a bit slow, because I have much data! I need to seek other method.
