Cheers everyone!
I do not know whether regular expressions are involved in the way TextMate detects URLs in text, but I'd gather they are. In that case, John Gruber has just compiled a regexp that seems to do an even better job of finding URLs embedded in plain text (even when surrounded by parentheses or LaTeX code). The first link contains the description, the second a test-case page:
http://daringfireball.net/2010/07/improved_regex_for_matching_urls
http://daringfireball.net/misc/2010/07/url-matching-regex-test-data.text
The only problem I find with it is that the references to LaTeX parts, sections, chapters, etc., built from the LaTeX templates would be matched as well. So, taking as inspiration the last expression he offers (only for http/https links), I have generalised it to also include ftp, sftp, smb, afp, and telnet:
(?xi)
\b
(                               # Capture 1: entire matched URL
  (?:
    https?://                   # http or https protocol
    |                           #   or
    s?ftps?://                  # sftp or ftp or ftps protocol
    |                           #   or
    smb://                      # smb protocol
    |                           #   or
    afp://                      # Apple file sharing protocol
    |                           #   or
    telnet://                   # telnet protocol
    |                           #   or
    www\d{0,3}[.]               # "www.", "www1.", "www2." … "www999."
    |                           #   or
    [a-z0-9.-]+[.][a-z]{2,4}/   # looks like domain name followed by a slash
  )
  (?:                           # One or more:
    [^\s()<>]+                  # Run of non-space, non-()<>
    |                           #   or
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)   # balanced parens, up to 2 levels
  )+
  (?:                           # End with:
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)   # balanced parens, up to 2 levels
    |                           #   or
    [^\s`!()\[\]{};:'".,<>?«»“”‘’]       # not a space or one of these punct chars
  )
)
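A minimal sketch for trying the expression outside TextMate, assuming Python's re module as a stand-in for TextMate's regex engine (the two differ in places, so results may not carry over exactly). The name find_urls and the sample strings are made up for illustration, and the flags are passed to re.compile instead of using the inline (?xi) prefix:

```python
import re

# Generalised pattern from the message, with the verbose and
# case-insensitive flags passed to re.compile instead of inline (?xi).
URL_PATTERN = re.compile(
    r"""
    \b
    (                                 # Capture 1: entire matched URL
      (?:
        https?://                     # http or https
        | s?ftps?://                  # ftp, sftp or ftps
        | smb://                      # smb
        | afp://                      # Apple file sharing
        | telnet://                   # telnet
        | www\d{0,3}[.]               # www., www1. up to www999.
        | [a-z0-9.-]+[.][a-z]{2,4}/   # looks like a domain name plus slash
      )
      (?:                             # one or more of:
        [^\s()<>]+                    # run of non-space, non-bracket characters
        | \(([^\s()<>]+|(\([^\s()<>]+\)))*\)   # balanced parens, up to 2 levels
      )+
      (?:                             # end with:
        \(([^\s()<>]+|(\([^\s()<>]+\)))*\)     # balanced parens, up to 2 levels
        | [^\s`!()\[\]{};:'".,<>?«»“”‘’]       # not a space or trailing punctuation
      )
    )
    """,
    re.VERBOSE | re.IGNORECASE,
)


def find_urls(text):
    """Return every URL-like substring found in text (hypothetical helper)."""
    return [m.group(1) for m in URL_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = "See http://en.wikipedia.org/wiki/2001:_A_Space_Odyssey_(film), ok"
    print(find_urls(sample))
```

The closing parenthesis of "(film)" is kept because the balanced-parens branch consumes it, while the trailing comma is rejected by the final character class.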
Can this be built into TextMate, or where should I change it if I wanted it just for personal use?
Thanks!
-- Juande Santander Vela Applied Scientist, Archive Management Group Archive Department, Data Management & Operations Division European Southern Observatory (Germany)
Felix Klein: Everyone knows what a curve is, until he has studied enough mathematics to become confused through the countless number of possible exceptions.
On 28 Jul 2010, at 10:58, Juande Santander Vela wrote:
I do not know whether regular expressions are involved in the way TextMate detects URLs in text, but I'd gather they are [...]
Indeed they are: http://manual.macromates.com/en/language_grammars
[...] The only problem I find with it is that the references to LaTeX parts, sections, chapters, etc., built from the LaTeX templates would be matched as well.
I had a look at his earlier pattern(s), and I think my conclusion was likewise that we would just end up underlining more false positives. What we presently use (in the Text grammar) is this rule:
{   name = 'markup.underline.link.text';
    match = '(?x)
        ( (https?|s?ftp|ftps|file|smb|afp|nfs|(x-)?man|gopher|txmt)://|mailto:)
        [-:@a-zA-Z0-9_.,~%+/?=&#]+(?<![.,?:])
    ';
},
It’s fairly simple compared to Gruber’s pattern, but it has worked quite well for me.
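To see what this rule does and does not pick up, here is a small sketch, assuming Python's re module behaves like TextMate's engine for this particular pattern (an assumption, not something checked against TextMate). The helper find_links and the sample strings are hypothetical:

```python
import re

# The match expression from the Text grammar rule, joined onto one line
# so that the (?x) flag is not needed.
TEXT_LINK = re.compile(
    r"((https?|s?ftp|ftps|file|smb|afp|nfs|(x-)?man|gopher|txmt)://|mailto:)"
    r"[-:@a-zA-Z0-9_.,~%+/?=&#]+(?<![.,?:])"
)


def find_links(text):
    """Return the substrings the rule would underline (hypothetical helper)."""
    return [m.group(0) for m in TEXT_LINK.finditer(text)]


if __name__ == "__main__":
    print(find_links("Visit http://example.com/page?x=1, or write to mailto:me@example.com."))
    print(find_links("http://en.wikipedia.org/wiki/2001:_A_Space_Odyssey_(film)"))
```

The negative lookbehind (?<![.,?:]) is what drops a trailing comma or full stop from the match. Parentheses are not in the allowed character set, so a URL such as the Wikipedia one stops just before "(film)".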
Can this be built into TextMate, or where should I change it if I wanted it just for personal use?
Do you want this for text files? If so, you should edit the Text grammar in the Text bundle. See the above link to language grammars.
Of course if it works well we can put it in the default bundle — we prefer receiving pull requests for the bundle in question via GitHub (for the text bundle that is http://github.com/textmate/text.tmbundle) — don’t know if you are familiar with Git?
Thanks for the answers, Allan!
What I wanted was to know where to modify it myself, and you have already answered that.
I am not directly familiar with git, but I am used to svn and cvs, so it should not be a problem.
I will have a look at how to merge both expressions, so that complex URLs with recognized protocols are matched (meaning URLs with parentheses or Unicode, like http://en.wikipedia.org/wiki/2001:_A_Space_Odyssey_(film) or http://%E2%9E%A1.ws/%E4%A8%B9). Protocols for operating system or application URLs, like itms, message, or skype, might also be interesting to have.
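One possible shape for such a merge, as a rough sketch only: the explicit scheme list from the Text grammar rule, followed by the balanced-parentheses body and tail from Gruber's expression. It assumes Python's re module in place of TextMate's engine. The scheme list (with telnet, itms and message added in the "://" form) is an illustrative guess, and the names SCHEMES, BODY, TAIL and find_merged are hypothetical:

```python
import re

# Scheme list: the Text grammar rule plus telnet, itms and message.
# Which schemes belong here, and in what form, is an assumption.
SCHEMES = (
    r"(?:(?:https?|s?ftp|ftps|file|smb|afp|nfs|(?:x-)?man|gopher|txmt"
    r"|telnet|itms|message)://|mailto:)"
)

# One or more runs of ordinary characters or balanced parens (2 levels).
BODY = r"(?:[^\s()<>]+|\((?:[^\s()<>]+|\([^\s()<>]+\))*\))+"

# Final element: balanced parens, or a character that is not
# whitespace or trailing punctuation.
TAIL = r"""(?:\((?:[^\s()<>]+|\([^\s()<>]+\))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’])"""

MERGED = re.compile(r"\b" + SCHEMES + BODY + TAIL, re.IGNORECASE)


def find_merged(text):
    """Return URLs that start with one of the listed schemes."""
    return [m.group(0) for m in MERGED.finditer(text)]


if __name__ == "__main__":
    print(find_merged("Film: http://en.wikipedia.org/wiki/2001:_A_Space_Odyssey_(film), 1968."))
    print(find_merged("(see http://%E2%9E%A1.ws/%E4%A8%B9)"))
    print(find_merged("see sec:intro and www.example.com"))
```

Requiring a known scheme means bare "www." hosts and label-like strings such as "sec:intro" are no longer matched, which trades some recall for fewer false positives.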
Thanks again!
On 28 Jul 2010, at 11:25, Allan Odgaard wrote:
[...]
-- Juande Santander Vela Applied Scientist, Archive Management Group Archive Department, Data Management & Operations Division European Southern Observatory (Germany)
Niels Bohr: An expert is a person who has made all the mistakes that can be made in a given field.