On 2/22/06, Oliver Taylor <oliver@ollieman.net> wrote:
I ended up with the following:
#!/usr/bin/perl
local $/;      # put Perl in "slurp" mode
$text = <>;    # read in the whole file
$text =~ s/(\s+)$//mg;
$text =~ s/((INT.|EXT.|I/E.|int.|ext.|i/e.)\s.*)$/***$1/mg;
$text =~ s/^([A-Z].*[A-Z )])\n^((.*))$\n(.+)$/\n\t\t\t\t$1\n\t\t\t $2\n\t\t$3\n/mg;
$text =~ s/^[A-Z]{2,}.*[A-Z)\d]$\n^(.*)$/\n\t\t\t\t$1\n\t\t$2\n/;
$text =~ s/^***//mg;
$text =~ s/^(.*(IN:|UP:|TO:))$/\n\t\t\t\t\t\t\t\t\t\t$1\n/;
$text =~ s/^(\w+.*(.|?|!|"|-))$\n^\w+.*(.|?|!|"|-)$/\n$1/mg;
print $text;
...but it doesn't work at all.
Try this:
#!/usr/bin/perl
local $/;      # put Perl in "slurp" mode
$text = <>;    # read in the whole file
$text =~ s/(\s+)$//mg;
$text =~ s{((INT\.|EXT\.|I/E\.|int\.|ext\.|i/e\.)\s.*)$}{***$1}mg;
$text =~ s/^([A-Z].*[A-Z )])\n^(\(.*\))\n(.+)$/\n\t\t\t\t$1\n\t\t\t$2\n\t\t$3\n/mg;
$text =~ s/^([A-Z]{2,}.*[A-Z)0-9])\n(.*)$/\n\t\t\t\t$1\n\t\t$2\n/mg;
$text =~ s/^[*]{3}(.+)$/\n$1\n/mg;
$text =~ s/^(.*(IN:|UP:|TO:))$/\n\t\t\t\t\t\t\t\t\t\t$1\n/mg;
$text =~ s/^(\w+.*(\.|\?|!|"|-))\n\w+.*(\.|\?|!|"|-)$/\n$1/mg;
print $text;
The errors I fixed were:
1. No 'mg' options on some of the substitutions. Without 'm', ^ is the beginning of the entire string and $ is the end of the entire string. Without 'g', only the first substitution is made.
2. The use of '$\n' and '\n^' in some of your match strings when just '\n' was needed. The newline defines the beginning and ending of lines; adding a $ or ^ is doubling up.
3. Unnecessary backslash escapes in character classes. Things like asterisks and parentheses are not special in a character class and don't need to be escaped.
4. Unnecessary backslash escapes in the substitution string. The substitution string is not a regex and doesn't follow the same rules as the match string.
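To see what the 'm' and 'g' options change, here is a small sketch you can run on its own. The three-line sample string and the variable names are made up for illustration; only the substitution itself comes from the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: three lines, each ending in two trailing spaces.
my $sample = "one  \ntwo  \nthree  ";

# No options: $ matches only at the end of the whole string,
# so only the spaces after "three" are removed.
(my $none = $sample) =~ s/\s+$//;

# 'g' alone: still only one place where $ can match, so same result.
(my $g_only = $sample) =~ s/\s+$//g;

# 'm' and 'g' together: $ matches at the end of every line,
# and every match is replaced.
(my $both = $sample) =~ s/\s+$//mg;

print "none:   [", join("|", split /\n/, $none),   "]\n";
print "g only: [", join("|", split /\n/, $g_only), "]\n";
print "mg:     [", join("|", split /\n/, $both),   "]\n";
```

Only the 'mg' version strips the trailing spaces from all three lines, which is what a slurped file needs.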
The first two will make the matches fail; the last two just make the regex longer and more difficult to read.
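If you want to convince yourself that the extra backslashes are harmless but unnecessary, a quick check along these lines should do it. The sample strings and variable names are invented for the test; they are not from your screenplay.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical character-name line ending in a parenthesis.
my $line = "JOHN (V.O.)";

# Escaped and unescaped ")" inside a character class match the same thing.
my $escaped   = ($line =~ /^[A-Z].*[A-Z \)]$/) ? 1 : 0;
my $unescaped = ($line =~ /^[A-Z].*[A-Z )]$/)  ? 1 : 0;

# In the substitution string, "\*" and "*" both produce a literal asterisk.
(my $with    = "INT. HOUSE") =~ s/^/\*\*\*/;
(my $without = "INT. HOUSE") =~ s/^/***/;

print "character class: escaped=$escaped unescaped=$unescaped\n";
print "replacement:     [$with] [$without]\n";
```

Both pairs behave identically, so the shorter form is the one to keep.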
-- Dr. Drang