Better regular expression for URLs

by Guillermo 31. October 2008 16:00

Via Jeff Atwood's post. In summary (because his posts tend to be very, hmm, thorough):

  • The primary improvement here is that we're only accepting a whitelist of known good URL characters. Allowing arbitrary random characters in URLs is setting yourself up for XSS exploits, and I can tell you that from personal experience. Don't do it!
  • We only allow certain characters to "end" the URL. Common punctuation marks at the end of a URL (period, exclamation point, semicolon, etc.) are treated as end-of-hyperlink characters and are not included in the URL.
  • Parens, if present, are allowed in the URL -- and we absorb the leading paren, if it is there, too.
  • The regular expression is:

    \(?\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]
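To see how the pieces fit together, here is a minimal Python sketch that applies the expression above to free text. The pattern is copied unchanged from the post. The helper name `find_urls` and the paren cleanup step are assumptions for illustration, not something the original post specifies: because the pattern absorbs a leading paren, the match needs a little post-processing before it can be used as a link.

```python
import re

# The pattern from the post, unchanged. Note that it only matches http://.
URL_PATTERN = re.compile(
    r"\(?\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]"
)


def find_urls(text):
    """Return URL-like matches from text, with wrapping parens removed.

    Hypothetical helper: the cleanup below is one possible way to handle
    the absorbed leading paren, not the post's prescribed method.
    """
    urls = []
    for match in URL_PATTERN.finditer(text):
        url = match.group(0)
        if url.startswith("("):
            # The leading "(" is never part of the URL itself.
            url = url[1:]
            # Drop one trailing ")" only if it is unbalanced, so that
            # parens inside the URL (Wikipedia-style) survive.
            if url.endswith(")") and url.count(")") > url.count("("):
                url = url[:-1]
        urls.append(url)
    return urls


if __name__ == "__main__":
    sample = (
        "Read http://example.com/page. Then see "
        "(http://en.wikipedia.org/wiki/PC_Tools_(Central_Point_Software)) "
        "and http://example.org/a?b=1!"
    )
    for url in find_urls(sample):
        print(url)
```

The trailing period and exclamation point are left out of the matches because neither appears in the final character class, while the paren inside the Wikipedia-style URL is kept.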
