The firstLineMatch in the Mail bundle is '^From: (?=\w+@[\w-]+.\w+)', which doesn't match addresses with names. For example, the first line of my emails is 'From: Grant Hollingworth grant@antiflux.org'.
I changed the match to '^From: .*(?=\w+@[\w-]+.\w+)' (i.e., check for an email address somewhere on the line).
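The difference between the two patterns can be checked with a minimal Python sketch. It assumes Python's `re` module handles these lookaheads the same way TextMate's regex engine does for this case. The `matches` helper and the `bare` sample line are illustrative names, not part of the Mail bundle.

```python
import re

# Patterns copied as posted above (the dot before the last \w+ is unescaped there).
OLD = re.compile(r'^From: (?=\w+@[\w-]+.\w+)')
NEW = re.compile(r'^From: .*(?=\w+@[\w-]+.\w+)')

# Hypothetical sample lines: one bare address, one preceded by a name.
bare = 'From: grant@antiflux.org'
named = 'From: Grant Hollingworth grant@antiflux.org'


def matches(pattern, line):
    """Return True if the compiled pattern matches the given first line."""
    return pattern.search(line) is not None


for label, pattern in (('old', OLD), ('new', NEW)):
    for line in (bare, named):
        print(label, repr(line), matches(pattern, line))
```

The old pattern requires the address to start immediately after `From: `, so the lookahead fails as soon as it reaches the space after the first name. The added `.*` lets the lookahead succeed at any later position on the line.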
On 14. Nov 2006, at 00:16, Grant Hollingworth wrote:
> The firstLineMatch in the Mail bundle is '^From: (?=\w+@[\w-]+.\w+)', which doesn't match addresses with names. For example, the first line of my emails is 'From: Grant Hollingworth grant@antiflux.org'.
> I changed the match to '^From: .*(?=\w+@[\w-]+.\w+)' (i.e., check for an email address somewhere on the line).
I changed the default pattern to this as well. As such, we should probably change \w to [-a-zA-Z0-9_.], and adjust the domain-name part accordingly…
Allan Odgaard wrote:
> On 14. Nov 2006, at 00:16, Grant Hollingworth wrote:
>> The firstLineMatch in the Mail bundle is '^From: (?=\w+@[\w-]+.\w+)', which doesn't match addresses with names. For example, the first line of my emails is 'From: Grant Hollingworth grant@antiflux.org'.
>> I changed the match to '^From: .*(?=\w+@[\w-]+.\w+)' (i.e., check for an email address somewhere on the line).
> I changed the default pattern to this as well. As such, we should probably change \w to [-a-zA-Z0-9_.], and adjust the domain-name part accordingly…
From Wikipedia:
According to [RFC 2822][], the local-part of the e-mail may use any of these ASCII characters:
* Uppercase and lowercase letters (case insensitive)
* The digits 0 through 9
* The characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
* The character "." provided that it is not the first or last character in the local-part.
Additionally, RFC 2821 and RFC 2822 allow the local-part to be a quoted-string, as in "John Doe"@example.com, thus allowing characters in the local-part that would otherwise be prohibited. However, RFC 2821 warns: "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".
The domain name is much more restricted. The dot separated domain labels are limited to "letters, digits, and hyphens drawn from the ASCII character set ... Mailbox domains are not case sensitive."
[RFC 2822]: http://tools.ietf.org/html/rfc2822
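The rules quoted above can be turned into a rough Python sketch for unquoted addresses. This is an illustration under stated assumptions, not a full RFC parser: it ignores the quoted-string form and length limits, and it does not restrict where a hyphen may appear in a domain label. The names `LOCAL_CHARS`, `LOCAL_PART`, `DOMAIN`, `ADDRESS`, and `is_plain_address` are hypothetical.

```python
import re

# Characters allowed in an unquoted local-part, per the list quoted above.
LOCAL_CHARS = r"A-Za-z0-9!#$%&'*+\-/=?^_`{|}~"

# Dots may separate runs of those characters but may not start or end the
# local-part. This also rejects consecutive dots, as RFC 2822's dot-atom does.
LOCAL_PART = rf"[{LOCAL_CHARS}]+(?:\.[{LOCAL_CHARS}]+)*"

# Domain labels: letters, digits, and hyphens, separated by dots.
# Simplification: hyphen placement inside a label is not checked.
DOMAIN = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*"

ADDRESS = re.compile(rf"{LOCAL_PART}@{DOMAIN}")


def is_plain_address(text):
    """Return True if text is an unquoted address under the rules above."""
    return ADDRESS.fullmatch(text) is not None


for candidate in ("grant@antiflux.org", ".grant@antiflux.org",
                  '"John Doe"@example.com'):
    print(repr(candidate), is_plain_address(candidate))
```

For a firstLineMatch this level of strictness is probably unnecessary, since the pattern only has to recognise that a line looks like a From header.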
* Jacob Rus jrus@hcs.harvard.edu [2006-11-14 14:12]:
Yeah, email addresses are pretty free-form. Trying to write an exact regex is pointless. How about: ^From: .*(?=\S+@(?=[\w-]+.)+\w+)
Good enough?
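A quick way to try a pattern of this shape is the Python sketch below. Note the assumptions: it reads the inner `(?=` as a non-capturing group `(?:` and escapes the dot, so it is one interpretation of the pattern posted above, not the pattern itself, and Python's `re` stands in for TextMate's regex engine. `LOOSE` and `looks_like_mail` are hypothetical names.

```python
import re

# Assumed reading of the proposal: anything non-blank before the "@",
# then one or more dot-terminated labels, then a final label.
LOOSE = re.compile(r'^From: .*(?=\S+@(?:[\w-]+\.)+\w+)')


def looks_like_mail(first_line):
    """Return True if the first line looks like a From header with an address."""
    return LOOSE.search(first_line) is not None


samples = [
    'From: Grant Hollingworth grant@antiflux.org',
    'From: "John Doe"@example.com',
    'From: user@localhost',          # no dot in the domain
    'Subject: hello@example.com',    # not a From line
]
for line in samples:
    print(repr(line), looks_like_mail(line))
```

Using `\S+` for the local-part accepts far more than the RFC allows, which fits the "good enough" goal: the pattern only needs to recognise mail files, not validate addresses.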