Match text not inside span tags

Using JavaScript, I'm trying to wrap span tags around certain text on the page, but I don't want to wrap text that is already inside a set of span tags.

Currently I'm using:

html = $('#container').html();
var regex = /([\s| ]*)(apple)([\s| ]*)/g;
html = html.replace(regex, '$1<span class="highlight">$2</span>$3');

It works, but if it's run on the same string twice, or if the string appears inside another string that gets matched later (for example 'a bunch of apples' and then 'apples'), I end up with this:

<span class="highlight">a bunch of <span class="highlight">apples</span></span>

I don't want it to replace 'apples' the second time because it's already inside span tags.

It should match 'apples' here:

Red apples are my <span class="highlight">favourite fruit.</span>

But not here:

<span class="highlight">Red apples are my favourite fruit.</span>

I've tried using this but it doesn't work:

([\s|&nbsp;]*)(apples).*(?!</span)

Any help would be appreciated. Thank you.

Answers:

Answer

First off, you should know that parsing HTML with regex is generally considered a bad idea; a DOM parser is usually recommended. With that disclaimer, I will show you a simple regex solution.

This problem is a classic case of the technique explained in this question to "regex-match a pattern, excluding..."

We can solve it with a beautifully simple regex:

<span.*?<\/span>|(\bapples\b)

The left side of the alternation | matches complete <span>...</span> elements. We will ignore these matches. The right side matches and captures apples to Group 1, and we know they are the right ones because they were not matched by the expression on the left: the engine consumes each span as a whole, so any apples inside it never reach the right side.
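To make the two sides of the alternation concrete, here is a minimal sketch that lists every match of the pattern and records whether Group 1 took part. The `matches` array and its `keep` flag are names made up for this illustration; they are not part of the original answer.

```javascript
// Same pattern as above: whole spans on the left, the target word captured on the right.
var regex = /<span.*?<\/span>|(\bapples\b)/g;
var subject = 'Red apples are my <span class="highlight">favourite apples.</span>';

// Collect every match and note whether Group 1 participated in it.
var matches = [];
var m;
while ((m = regex.exec(subject)) !== null) {
    matches.push({
        text: m[0],
        // Group 1 is set only when the right-hand side matched.
        keep: m[1] !== undefined
    });
}

matches.forEach(function (entry) {
    console.log((entry.keep ? 'wrap:   ' : 'ignore: ') + entry.text);
});
```

Only the first "apples" is flagged for wrapping. The second one is swallowed as part of the existing span, which the replacement step can then return unchanged.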

This program shows how to use the regex. Please note that in the online demo I replaced with [span] instead of <span> so that the result would show in the browser, which would otherwise interpret the HTML:

var subject = 'Red apples are my <span class="highlight">favourite apples.</span>';
var regex = /<span.*?<\/span>|(\bapples\b)/g;
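var replaced = subject.replace(regex, function(m, group1) {
    // Group 1 is unset (undefined in current engines) when an existing span matched: return it unchanged.
    if (!group1) return m;
    // Otherwise the bare word matched: wrap it.
    else return "<span class=\"highlight\">" + group1 + "</span>";
});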
document.write("<br>*** Replacements ***<br>");
document.write(replaced);
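If the word to highlight is not fixed, the same idea can be wrapped in a function. This is only a sketch under a few assumptions: `highlightOutsideSpans` is a hypothetical helper name, the word is escaped so regex metacharacters are treated literally, and `[\s\S]*?` is used instead of `.*?` so that a span containing line breaks is still skipped. It does not handle nested spans, because the lazy match stops at the first closing tag, and the `\b` boundaries assume the word starts and ends with a word character.

```javascript
// Hypothetical helper (not from the original answer): wraps each whole-word
// occurrence of `word` in a highlight span, skipping text inside existing spans.
function highlightOutsideSpans(html, word) {
    // Escape regex metacharacters so the word is matched literally.
    var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    // Left side: an existing span (ignored). Right side: the word, captured in Group 1.
    var regex = new RegExp('<span[\\s\\S]*?<\\/span>|(\\b' + escaped + '\\b)', 'g');
    return html.replace(regex, function (m, group1) {
        // Group 1 is unset when the left-hand side matched: keep the span as it is.
        if (!group1) return m;
        return '<span class="highlight">' + group1 + '</span>';
    });
}

var input = 'Red apples are my <span class="highlight">favourite apples.</span>';
var once = highlightOutsideSpans(input, 'apples');
// Running it again should change nothing, since every "apples" is now inside a span.
var twice = highlightOutsideSpans(once, 'apples');
console.log(once);
console.log(twice === once);
```

Because the second pass returns its input unchanged, running the highlighter repeatedly on the same container no longer produces the doubly wrapped spans described in the question.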
