Javascript and RegEx: Split and keep delimiter

I have a regex that splits my string into an array.

Everything works fine except that I would like to keep part of the delimiter.

Here is my regex, as I am using it in JavaScript:

var test = paragraph.split(/(&#?[a-zA-Z0-9]+;)[\s]/g);

My paragraph is as follows:

Current addresses:  &dagger;    Biopharmaceutical Research and Development<br />
&Dagger;    Clovis Oncology<br />
&sect;  Pisces Molecular <br />
||  School of Biological Sciences    
&para;  Department of Chemistry<br />

The problem is that I am getting 10 elements in my array instead of the 5 I expect: the delimiter also comes back as an element of its own. My goal is to keep the delimiter attached to the text that follows it, not to create a new element for it.

Thank you very much for your help.


I would like to get this as a result:

1. &dagger; Biopharmaceutical Research and Development<br />
2. &Dagger; Clovis Oncology<br />
3. &sect;  Pisces Molecular <br />
||  School of Biological Sciences  
4. &para;  Department of Chemistry<br />
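For illustration, here is a minimal sketch (Node.js or a browser console) that reproduces the behaviour with the sample text above. String.prototype.split adds whatever each capturing group matched to the output array, which is why the entities show up as separate elements. With the text as reproduced here the capturing split returns 9 elements rather than the 10 reported, so the real input presumably differs slightly. The alternative shown, a zero-width lookahead that splits just before each entity without consuming it, is a swapped-in technique rather than the asker's pattern; the variable names are illustrative.

```javascript
// Sample text from the question, one array entry per line.
const paragraph = [
  "Current addresses:  &dagger;    Biopharmaceutical Research and Development<br />",
  "&Dagger;    Clovis Oncology<br />",
  "&sect;  Pisces Molecular <br />",
  "||  School of Biological Sciences    ",
  "&para;  Department of Chemistry<br />",
].join("\n");

// Original approach: the capturing group (...) makes split() insert each
// matched entity into the result as its own element.
const withCapture = paragraph.split(/(&#?[a-zA-Z0-9]+;)[\s]/g);
console.log(withCapture.length); // 9: five text segments plus four captured entities

// Alternative: the lookahead (?=...) matches the empty position just before
// each entity, so nothing is removed and each entity stays with its text.
const parts = paragraph.split(/(?=&#?[a-zA-Z0-9]+;\s)/);
console.log(parts.length); // 5: "Current addresses:  " followed by four entries
parts.forEach((part, i) => console.log(i, JSON.stringify(part)));
```

The first element is the text before the first entity; use parts.slice(1) if only the entries are wanted. Because || is not an entity, the || line stays in the same element as the &sect; line, which matches the grouping in the expected output above.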



Try to use match instead:

var test = paragraph.match(/&#?[a-zA-Z0-9]+;\s[^&]*/g);

Updated: Added a required white-space \s match.


  • &#? matches & and an optional # (the question mark makes the preceding token match zero or one time).

  • [a-zA-Z0-9] is a character class containing all uppercase and lowercase ASCII letters and the digits. If you also want to accept an underscore, you could replace it with \w.

  • The + sign means the previous pattern must match one or more times, so this matches one or more letters (a-z, A-Z) or digits (0-9).

  • The ; matches the character ;.

  • \s matches a single white-space character. That includes space, tab, newline and other white-space characters.

  • [^&]* is again a character class, but because ^ is its first character the class is negated: instead of matching &, it matches any character except &, including newlines. The star repeats it zero or more times.

  • g, after the closing /, is the global flag. It makes match() keep searching after the first match and return an array of all matches.

So: match & and an optional #, followed by one or more letters or digits, followed by ;, followed by a white-space character, followed by zero or more characters that aren't &.
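Applied to the sample paragraph from the question, the match() call above returns one string per entity-led entry. This is a minimal sketch; the trim() step and the || [] fallback (match() returns null when nothing matches) are conveniences added here, not part of the original answer.

```javascript
// Sample text from the question, one array entry per line.
const paragraph = [
  "Current addresses:  &dagger;    Biopharmaceutical Research and Development<br />",
  "&Dagger;    Clovis Oncology<br />",
  "&sect;  Pisces Molecular <br />",
  "||  School of Biological Sciences    ",
  "&para;  Department of Chemistry<br />",
].join("\n");

// Entity (&name; or &#123;), one white-space character, then everything up to
// the next "&". Text before the first entity is not part of any match.
const raw = paragraph.match(/&#?[a-zA-Z0-9]+;\s[^&]*/g) || [];

// Each match keeps its trailing spaces/newline; trim them for display.
const entries = raw.map((s) => s.trim());
entries.forEach((entry, i) => console.log(`${i + 1}. ${entry}`));
```

This yields the four entries from the expected output, with the || line kept inside the &sect; entry because [^&]* runs straight across the newline.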


As I said in the comment, this solution (untested, by the way) will only work if you're just managing <br /> elements. Here:

var text = paragraph.split("<br />"); // now text contains just the text on each line

for (var i = 0; i < text.length - 1; i++) { // don't add a line break to the last line
    text[i] += "<br />"; // re-append the <br /> that split() removed
}

The variable text is now an array in which each element is a line of the original paragraph, with the line break (<br />) added back to the end of every line except the last. You mentioned that you want to split on the special characters, but from what I can see almost every line ends in a line break (the School of Biological Sciences line is the exception), so this should hopefully have a similar effect. Unfortunately I don't have time to write up a more complete answer at the moment.
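If your target engines support lookbehind assertions (ES2018; current Node.js and evergreen browsers, though older Safari releases lacked them, so check your targets), the split-and-re-append loop can be collapsed into a single split on the empty position right after each <br />. This is a swapped-in alternative, sketched on the sample paragraph from the question:

```javascript
// Sample text from the question, one array entry per line.
const paragraph = [
  "Current addresses:  &dagger;    Biopharmaceutical Research and Development<br />",
  "&Dagger;    Clovis Oncology<br />",
  "&sect;  Pisces Molecular <br />",
  "||  School of Biological Sciences    ",
  "&para;  Department of Chemistry<br />",
].join("\n");

// (?<=<br \/>) matches the zero-width position immediately after "<br />",
// so split() cuts there without removing the tag.
const lines = paragraph.split(/(?<=<br \/>)/);

lines.forEach((line, i) => console.log(i, JSON.stringify(line)));
```

Nothing is consumed, so lines.join("") gives back the original text, and unlike split("<br />") there is no trailing empty string when the text ends in <br />. As with the loop, this splits per line break rather than per entity, so the || line, which has no <br /> of its own, ends up in the same element as the &para; line.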


Using regex it is pretty simple:

var result = input.match(/&#?[^\W_]+;\s[^&]*/g);
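[^\W_] is a compact spelling of [a-zA-Z0-9]: \W is any non-word character, so a class that excludes both \W and the underscore leaves only the ASCII letters and digits (in these patterns, with no u or i flag, \w is exactly [A-Za-z0-9_]). A quick check using only the built-in RegExp, with a short made-up input:

```javascript
// Single-character tests; no g flag, so .test() carries no lastIndex state.
const explicit = /^[a-zA-Z0-9]$/;
const compact = /^[^\W_]$/;

// Compare both classes on every UTF-16 code unit.
const mismatches = [];
for (let code = 0; code <= 0xffff; code++) {
  const ch = String.fromCharCode(code);
  if (explicit.test(ch) !== compact.test(ch)) mismatches.push(code);
}
console.log(mismatches.length); // 0

// The two full patterns therefore give identical results on the same input.
const input = "&dagger; Biopharmaceutical<br />\n&#167; Pisces Molecular <br />";
const a = input.match(/&#?[a-zA-Z0-9]+;\s[^&]*/g);
const b = input.match(/&#?[^\W_]+;\s[^&]*/g);
console.log(JSON.stringify(a) === JSON.stringify(b)); // true
```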




©2020 All rights reserved.