JavaScript Regex to match a URL in a field of text

How can I setup my regex to test to see if a URL is contained in a block of text in javascript. I cant quite figure out the pattern to use to accomplish this

 var urlpattern = new RegExp( "(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?"

 var txtfield = $('#msg').val() /*this is a textarea*/

 if ( urlpattern.test(txtfield) ){
        //do something about it
 }

EDIT:

So the Pattern I have now works in regex testers for what I need it to do but chrome throws an error

  "Invalid regular expression: /(http|ftp|https)://[w-_]+(.[w-_]+)+([w-.,@?^=%&:/~+#]*[[email protected]?^=%&/~+#])?/: Range out of order in character class"

for the following code:

var urlexp = new RegExp( '(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?' );

Answers:

Answer

Though escaping the dash characters (which can have a special meaning as character range specifiers when inside a character class) should work, one other method for taking away their special meaning is putting them at the beginning or the end of the class definition.

In addition, \+ and \@ in a character class are indeed interpreted as + and @ respectively by the JavaScript engine; however, the escapes are not necessary and may confuse someone trying to interpret the regex visually.

I would recommend the following regex for your purposes:

(http|ftp|https)://[\w-]+(\.[\w-]+)+([\w.,@?^=%&:/~+#-]*[\[email protected]?^=%&/~+#-])?

this can be specified in JavaScript either by passing it into the RegExp constructor (like you did in your example):

var urlPattern = new RegExp("(http|ftp|https)://[\w-]+(\.[\w-]+)+([\w.,@?^=%&:/~+#-]*[\[email protected]?^=%&/~+#-])?")

or by directly specifying a regex literal, using the // quoting method:

var urlPattern = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/~+#-]*[\[email protected]?^=%&\/~+#-])?/

The RegExp constructor is necessary if you accept a regex as a string (from user input or an AJAX call, for instance), and might be more readable (as it is in this case). I am fairly certain that the // quoting method is more efficient, and is at certain times more readable. Both work.

I tested your original and this modification using Chrome both on <JSFiddle> and on <RegexLib.com>, using the Client-Side regex engine (browser) and specifically selecting JavaScript. While the first one fails with the error you stated, my suggested modification succeeds. If I remove the h from the http in the source, it fails to match, as it should!

Edit

As noted by @noa in the comments, the expression above will not match local network (non-internet) servers or any other servers accessed with a single word (e.g. http://localhost/... or https://sharepoint-test-server/...). If matching this type of url is desired (which it may or may not be), the following might be more appropriate:

(http|ftp|https)://[\w-]+(\.[\w-]+)*([\w.,@?^=%&amp;:/~+#-]*[\[email protected]?^=%&amp;/~+#-])?

#------changed----here-------------^

<End Edit>

Finally, an excellent resource that taught me 90% of what I know about regex is Regular-Expressions.info - I highly recommend it if you want to learn regex (both what it can do and what it can't)!

Answer

You have to escape the backslash when you are using new RegExp.

Also you can put the dash - at the end of character class to avoid escaping it.

&amp; inside a character class means & or a or m or p or ; , you just need to put & and ; , a, m and p are already match by \w.

So, your regex becomes:

var urlexp = new RegExp( '(http|ftp|https)://[\\w-]+(\\.[\\w-]+)+([\\w-.,@?^=%&:/~+#-]*[\\[email protected]?^=%&;/~+#-])?' );
Answer

Here's the most complete single URL parsing pattern.

It works with ANY URI/URL in ANY substring!

https://regex101.com/r/jO8bC4/5

Example JS code with output - every URL is turned into a 5-part array of its 'parts':

var re = /([a-z]+\:\/+)([^\/\s]*)([a-z0-9\[email protected]\^=%&;\/~\+]*)[\?]?([^ \#]*)#?([^ \#]*)/ig; 
var str = 'Bob: Hey there, have you checked https://www.facebook.com ?\n(ignore) https://github.com/justsml?tab=activity#top (ignore this too)';
var m;

while ((m = re.exec(str)) !== null) {
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    console.log(m);
}

Will give you the following:

["https://www.facebook.com",
  "https://",
  "www.facebook.com",
  "",
  "",
  ""
]

["https://github.com/justsml?tab=activity#top",
  "https://",
  "github.com",
  "/justsml",
  "tab=activity",
  "top"
]

BAM! RegEx FTW!

Answer

try (http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&amp;:/~\+#]*[\w\-\@?^=%&amp;/~\+#])?

Answer

I've cleaned up your regex:

var urlexp = new RegExp('(http|ftp|https)://[a-z0-9\-_]+(\.[a-z0-9\-_]+)+([a-z0-9\-\.,@\?^=%&;:/~\+#]*[a-z0-9\[email protected]\?^=%&;/~\+#])?', 'i');

Tested and works just fine ;)

Answer

Try this general regex for many URL format

/(([A-Za-z]{3,9})://)?([-;:&=\+\$,\w][email protected]{1})?(([-A-Za-z0-9]+\.)+[A-Za-z]{2,3})(:\d+)?((/[-\+~%/\.\w]+)?/?([&?][-\+=&;%@\.\w]+)?(#[\w]+)?)?/g
Answer

The trouble is that the "-" in the character class (the brackets) is being parsed as a range: [a-z] means "any character between a and z." As Vini-T suggested, you need to escape the "-" characters in the character classes, using a backslash.

Answer

try this worked for me

/^((ftp|http[s]?):\/\/)?(www\.)([a-z0-9]+)\.[a-z]{2,5}(\.[a-z]{2})?$/

that is so simple and understandable

Tags

Recent Questions

Top Questions

Home Tags Terms of Service Privacy Policy DMCA Contact Us

©2020 All rights reserved.