Oct 31, 2008

Better regular expression for URLs

Published by guillermo at 4:00 PM under Development | Process & Methodology

Via Jeff Atwood's post, in summary (because his posts tend to be very, hmm, thorough):

  • The primary improvement here is that we're only accepting a whitelist of known good URL characters. Allowing arbitrary random characters in URLs is setting yourself up for XSS exploits, and I can tell you that from personal experience. Don't do it!
  • We only allow certain characters to "end" the URL. Common punctuation marks at the end of a URL, such as a period, exclamation point, or semicolon, are treated as end-of-hyperlink characters and are not included in the URL.
  • Parens, if present, are allowed in the URL -- and we absorb the leading paren, if it is there, too.
  • The regular expression is:

    \(?\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]
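
To see how the points above play out, here is a minimal Python sketch that applies the expression to plain text. The pattern is copied as-is, so it only covers http:// links. The helper name `find_urls` and the paren-stripping step are assumptions made for illustration: the pattern only absorbs a leading paren, so something afterwards has to decide what to do with it, and this is one plausible way to do that.

```python
import re

# Pattern copied verbatim from the post; it only covers the http:// scheme.
URL_RE = re.compile(
    r"\(?\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]"
)


def find_urls(text):
    """Return the URLs found in text (hypothetical helper, not from the post)."""
    urls = []
    for match in URL_RE.finditer(text):
        url = match.group(0)
        if url.startswith("("):
            # Assumption: an absorbed leading paren belongs to the surrounding
            # prose, and so does a trailing paren that pairs with it.
            url = url[1:-1] if url.endswith(")") else url[1:]
        urls.append(url)
    return urls


if __name__ == "__main__":
    sample = (
        "Read http://example.com/foo. Also "
        "(http://en.wikipedia.org/wiki/PC_Tools_(Central_Point_Software))"
    )
    for url in find_urls(sample):
        print(url)
```

Trailing sentence punctuation is left out of the match because the final character class omits it, while parens inside the URL survive. One limitation of this sketch: a URL that closes a parenthetical without opening it, as in "(see http://example.com)", still picks up the closing paren, since ")" is an allowed ending character.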
