Split string into array without deleting delimiter?

I have a string like

 "asdf a  b c2 "

And I want to split it into an array like this:

["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]

Using string.split(" ") removes the spaces, resulting in this:

["asdf", "a", "", "b", "c2"]

I thought of inserting extra delimiters, e.g.

string.replace(/ /g, "| |").replace(/||/g, "|").split("|");

But this gives an unexpected result.

Answers:

Answer

Instead of splitting, it might be easier to think of this as extracting strings comprising either the delimiter or consecutive characters that are not the delimiter:

'asdf a  b c2 '.match(/\S+|\s/g)
// result: ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]
'asdf a  b. . c2% * '.match(/\S+|\s/g)
// result: ["asdf", " ", "a", " ", " ", "b.", " ", ".", " ", "c2%", " ", "*", " "]

A more Shakespearean definition of the matches would be:

'asdf a  b c2 '.match(/ |[^ ]+/g)

To or (not to )+.

Answer

Use positive lookahead:

"asdf a  b c2 ".split(/(?= )/)
// => ["asdf", " a", " ", " b", " c2", " "]

Post-edit EDIT: As I said in comments, the lack of lookbehind makes this a bit trickier. If all the words only consist of letters, you can fake lookbehind using \b word boundary matcher:

"asdf a  b c2 ".split(/(?= )|\b/)
// => ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]

but as soon as you get some punctuation in, it breaks down, since it does not only break on spaces:

"asdf-eif.b".split(/(?= )|\b/)
// => ["asdf", "-", "eif", ".", "b"]

If you do have non-letters you don't want to break on, then I will also suggest a postprocessing method.

Post-think EDIT: This is based on JamesA's original idea, but refined to not use jQuery, and to correctly split:

function chop(str) {
  var result = [];
  var pastFirst = false;
  str.split(' ').forEach(function(x) {
    if (pastFirst) result.push(' ');
    if (x.length) result.push(x);
    pastFirst = true;
  });
  return result;
}
chop("asdf a  b c2 ")
// => ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]
Answer

I'm surprised no one has mentioned this yet, but I'll post this here for the sake of completeness. If you have capturing groups in your expression, then .split will include the captured substring as a separate entry in the result array:

"asdf a  b c2 ".split(/( )/)  // or /(\s)/
// ["asdf", " ", "a", " ", "", " ", "b", " ", "c2", " ", ""]

Note, this is not exactly the same as the desired output you specified, as it includes an empty string between the two contiguous spaces and after the last space.

If necessary, you can filter out all empty strings from the result array like this:

"asdf a  b c2 ".split(/( )/).filter(String)
// ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]

However, if this is what you're looking for, I'd probably recommend you go with @Jack's solution.

Answer

You could use a little jQuery

var toSplit = "asdf a  b c2 ".split(" ");
$.each(toSplit, 
    function(index, value) { 
        if (toSplit[index] == '') { toSplit[index] = ' '} 
    }
);

This will create the output you are looking for without the leading spaces on the other elements.

Answer
"asdf a  b c2 ".split(' ').join(' ,');
Answer

Try clean-split:

const cleanSplit = require("clean-split");

cleanSplit("a-b-c", "-");
//=> ["a", "-", "b", "-", "c"]

cleanSplit("a-b-c", "-", { anchor: "before" });
//=> ["a-", "b-", "c"]

cleanSplit("a-b-c", "-", { anchor: "after" });
//=> ["a", "-b", "-c"]

Under the hood, it uses logic adapted from:

In your case, you can do something like this:

const cleanSplit = require("clean-split");

cleanSplit("asdf a  b c2 ", " ");
//=> ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]

Tags

Recent Questions

Top Questions

Home Tags Terms of Service Privacy Policy DMCA Contact Us

©2020 All rights reserved.