Hello all,
I'm developing a bundle for handling biomolecular structure files (PDB files). I'm now working on getting the language description sorted out. In doing this I ran into a regex problem.
The situation: A PDB file is essentially a table. One line from such a table can be as follows:
ATOM 14 CA GLN A 2 -27.648 -9.581 30.325 1.00 10.00
In the language description I would like to use a regular expression construct that matches 'GLN' only if the line starts with either 'ATOM' or 'HETATM'. This seems like a conditional regular expression, but my attempts to implement it as such have failed so far.
Can anyone help?
Thanks, Marc
Dijk van Marc wrote:
A PDB file is essentially a table. One line from such a table can be as follows:
ATOM 14 CA GLN A 2 -27.648 -9.581 30.325 1.00 10.00
In the language description I would like to use a regular expression construct that matches 'GLN' only if the line starts with either 'ATOM' or 'HETATM'. This seems like a conditional regular expression, but my attempts to implement it as such have failed so far.
Can anyone help?
Is this what you mean?
^(ATOM|HETATM).*GLN
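This will match on any line which starts with either 'ATOM' or 'HETATM', followed by any string of arbitrary characters, followed by the string 'GLN'. Note that the match runs from the start of the line up to and including 'GLN', not just 'GLN' itself.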
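To see what that pattern actually captures, here is a minimal sketch in Python. It uses Python's re module rather than the regex engine TextMate uses, so treat it as an illustration only. The second capture group around GLN is an addition for this example and is not part of the pattern as posted.

```python
import re

# Sample record from the thread (whitespace as posted).
line = "ATOM 14 CA GLN A 2 -27.648 -9.581 30.325 1.00 10.00"

# Same pattern as above, plus a capture group around GLN
# (the extra group is an assumption added for illustration).
pattern = re.compile(r"^(ATOM|HETATM).*(GLN)")

m = pattern.search(line)
whole = m.group(0)         # everything from the line start through GLN
residue = m.group(2)       # just the residue name
residue_span = m.span(2)   # where GLN sits inside the line

# A line with a different record type does not match at all.
no_match = pattern.search("REMARK 14 CA GLN A 2")
```

The whole match includes the record type and everything in between, so in a language grammar you would presumably attach the scope to the capture group rather than to the whole match.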
Dijk van Marc wrote:
In the language description I would like to use a regular expression construct that matches 'GLN' only if the line starts with either 'ATOM' or 'HETATM'. This seems like a conditional regular expression, but my attempts to implement it as such have failed so far.
See § 20.3 of TextMate's help.
On May 22, 2009, at 12:56 PM, Dijk van Marc wrote:
The situation: A PDB file is essentially a table. One line from such a table can be as follows:
ATOM 14 CA GLN A 2 -27.648 -9.581 30.325 1.00 10.00
In the language description I would like to use a regular expression construct that matches 'GLN' only if the line starts with either 'ATOM' or 'HETATM'. This seems like a conditional regular expression, but my attempts to implement it as such have failed so far.
Start a begin/end rule whose begin pattern is a look-ahead on ATOM|HETATM at the start of the line:
^(?=ATOM|HETATM)
And whose end pattern is $ (end of line). Inside the rule's patterns, have a match for the GLN.
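You could use a conditional regex to do this if the GLN came before the ATOM, because a look-ahead would cover it. Since it comes after, the condition would need a look-behind, and you can't do look-behinds of an unspecified length.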
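A rough sketch of the same two-step idea in Python, for anyone who wants to try it outside the editor. This only emulates the begin/end approach with Python's re module and is not a TextMate grammar. The names BEGIN, INNER, find_residue and lookbehind_compiles are made up for the example. Python's re happens to reject variable-length look-behinds too, which makes the limitation easy to reproduce, but other engines may word the restriction differently.

```python
import re

# "begin": a look-ahead at the start of the line. It asserts the record
# type without consuming any text.
BEGIN = re.compile(r"^(?=ATOM|HETATM)")

# Inner rule: the residue name to pick out once the line has qualified.
INNER = re.compile(r"\bGLN\b")

def find_residue(line):
    """Return the (start, end) span of GLN on an ATOM/HETATM line, else None."""
    if not BEGIN.match(line):
        return None
    m = INNER.search(line)
    return m.span() if m else None

def lookbehind_compiles():
    """Try the single-regex alternative with a variable-length look-behind."""
    try:
        re.compile(r"(?<=^(?:ATOM|HETATM).*)GLN")
        return True
    except re.error:
        # Python's re requires look-behinds to have a fixed width.
        return False
```

Calling find_residue on the sample ATOM line returns the span of GLN, while a line with another record type returns None. lookbehind_compiles() returns False in Python, which mirrors why the single-regex route does not work here.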