Friday, September 15, 2006

email address validation

Recently this post talks about Google’s feature called plus-addressing.
Gmail has an interesting quirk where you can add a plus sign (+) after your Gmail address, and it’ll still get to your inbox. It’s called plus-addressing, and it essentially gives you an unlimited number of e-mail addresses to play with.

My immediate thought for me is: Does most sites allow ‘+’ as a valid character to appear in email address? Most sites will reject this email address. If this is a standard then this breaks our application, not able to accept valid email address.

Other mail server like “Fastmail” also claims to have this feature.

Somebody opposed that this not a feature at all, its part of standard RFC 2822 for email. This is a way to send comments in-line with the email address.

So what would be regular expression to validate email address as per RFC 2822 standards? Not sure, but though according to this page, a regular expression for validation according a previous RFC (822) for email addressing is
http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

The grammar described in RFC 822 is suprisingly complex. Implementing validation with regular expressions somewhat pushes the limits of what it is sensible to do with regular expressions, although Perl copes well:

(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t] )+|Z|(?=[["()
<>@,;:\".[]]))|"(?:[^\"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*
(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^\"r]|.|
(?:(?:rn)?[t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:
(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?
:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))
|[([^[]r]|.)*](?:(?:rn)?[ t])*))*|(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z
|(?=[["()<>@,;:\".[]]))|"(?:[^\"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)
?[ t])*(?:@(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".
[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]
+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*
(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>
@,;:\".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[
] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r]|.)*](?:(?:rn)?
[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[
t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^\"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])
*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["(
)<>@,;:\".[]]))|"(?:[^\"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[
t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([
^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?
:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*>(?:(?:r
n)?[ t])*)|(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]
]))|"(?:[^\"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*:(?:(?:rn)?[ t])*(?:(?:(?:
[^()<>@,;:\".[]00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^\"r]|.|
(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 0
0-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^\"r]|.|(?:(?:rn)?[ t
]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)
?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn
)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[
]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*|(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn
)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^\"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?
[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=
[["()<>@,;:\".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<
>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r]|.)*](
?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)
?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)
?[ t])*(?:[^()<>@,;:\".[]00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))
|[([^[]r]|.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:\".[]00-31
]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^\"r]|.|(?:(?:rn)?[
t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?
:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^\"r]|.|(?:(?:rn)?[ t]))*"(?:(
?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])
+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])
*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]
r]|.)*](?:(?:rn)?[ t])*))*>(?:(?:rn)?[ t])*)(?:,s*(?:(?:[^()<>@,;:\".[] 00-31]+
(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^\"r]|.|(?:(?:rn)?[ t]))*"(?:(?:
rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(
?=[["()<>@,;:\".[]]))|"(?:[^\"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn
)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|
[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:
(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*|(?:[^(
)<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^\"r]|.|(
?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:\".[] 00-
31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])
*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["(
)<>@,;:\".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<
>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r]|.)*]
?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?
[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)
?[ t])*)?(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]
]))|"(?:[^\"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[
^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^
"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,
;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r]|.
)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(
?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*
))*>(?:(?:rn)?[ t])*))*)?;s*)


This regular expression will only validate addresses that have had any comments stripped and replaced with whitespace (this is done by the module).

There could be ways of breaking this single expression into smaller modules, but still this is the one :-o

Recommended Blog Posts