[TxMt] regex question

Dr. Drang drdrang at gmail.com
Thu Feb 23 05:18:51 UTC 2006


On 2/22/06, Oliver Taylor <oliver at ollieman.net> wrote:

> I ended up with the following:
>
> #!/usr/bin/perl
>         local $/;             # put Perl in "slurp" mode
>         $text = <>;           # read in the whole file
>         $text =~ s/(\s+)$//mg;
>         $text =~ s/((INT.|EXT.|I\/E.|int.|ext.|i\/e.)\s.*)$/\*\*\*$1/mg;
>         $text =~ s/^([A-Z].*[A-Z \)])\n^(\(.*\))$\n(.+)$/\n\t\t\t\t$1\n\t\t\t
> $2\n\t\t$3\n/mg;
>         $text =~ s/^[A-Z]{2,}.*[A-Z\)\d]$\n^(.*)$/\n\t\t\t\t$1\n\t\t$2\n/;
>         $text =~ s/^\*\*\*//mg;
>         $text =~ s/^(.*(IN:|UP:|TO:))$/\n\t\t\t\t\t\t\t\t\t\t$1\n/;
>         $text =~ s/^(\w+.*(\.|\?|\!|\"|\-))$\n^\w+.*(\.|\?|\!|\"|\-)$/\n$1/mg;
>         print $text;
>
> ...but it doesn't work at all.

Try this:

#!/usr/bin/perl
local $/;             # put Perl in "slurp" mode
$text = <>;           # read in the whole file
$text =~ s/(\s+)$//mg;
$text =~ s{((INT\.|EXT\.|I/E\.|int\.|ext\.|i/e\.)\s.*)$}{***$1}mg;
$text =~ s/^([A-Z].*[A-Z )])\n^(\(.*\))\n(.+)$/\n\t\t\t\t$1\n\t\t\t$2\n\t\t$3\n/mg;
$text =~ s/^([A-Z]{2,}.*[A-Z)0-9])\n(.*)$/\n\t\t\t\t$1\n\t\t$2\n/mg;
$text =~ s/^[*]{3}(.+)$/\n$1\n/mg;
$text =~ s/^(.*(IN:|UP:|TO:))$/\n\t\t\t\t\t\t\t\t\t\t$1\n/mg;
$text =~ s/^(\w+.*(\.|\?|\!|\"|\-))\n\w+.*(\.|\?|\!|\"|\-)$/\n$1/mg;
print $text;

The errors I fixed were:

1. No 'mg' options on some of the substitutions. Without 'm', ^ is the
beginning of the entire string and $ is the end of the entire string.
Without 'g', only the first substitution is made.
2. The use of '$\n' and '\n^' in some of your match strings when just
'\n' was needed. The newline defines the beginning and ending of
lines; adding a $ or ^ is doubling up.
3. Unnecessary backslash escapes in character classes. Things like
asterisks and parentheses are not special in a character class and
don't need to be escaped.
4. Unnecessary backslash escapes in the substitution string. The
substitution string is not a regex and doesn't follow the same rules
as the match string.

The first two can make the matches fail; the last two just make the
regex longer and more difficult to read.
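
To see what the 'm' and 'g' options from point 1 do in isolation, here
is a small sketch. The sample string and variable names are made up for
illustration and are not part of the script above; it strips trailing
spaces with no options, with 'm' alone, and with 'mg'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text: three lines, each ending in two spaces.
my $sample = "one  \ntwo  \nthree  ";

# No options: $ anchors only at the end of the whole string,
# so only the last line loses its trailing spaces.
(my $none = $sample) =~ s/ +$//;

# 'm' only: $ can match at the end of any line, but without 'g'
# the substitution stops after the first match.
(my $m_only = $sample) =~ s/ +$//m;

# 'mg': $ matches at every line end and every match is replaced.
(my $mg = $sample) =~ s/ +$//mg;

# Print each result with newlines made visible as \n.
for my $r (["none", $none], ["m", $m_only], ["mg", $mg]) {
    (my $shown = $r->[1]) =~ s/\n/\\n/g;
    printf "%-4s => \"%s\"\n", $r->[0], $shown;
}
```

Only the 'mg' version trims all three lines, which is why a
line-by-line cleanup like the one in the script needs both options on
every substitution.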

--
Dr. Drang


