Re: [TxMt] regex question


Oliver Taylor

22 Feb 2006

9:08 a.m.

So, to restate my question more clearly... (thanks for your patience, Haris) I know nothing about programming...

I'm trying to run a perl search/replace command via a 'bundle command'. I've entered the following into the 'Edit Command' box:

perl -pe ' s/^([A-Z]+.*[A-Z]*\s*)\n(\(.+\))\n(.+)$/\n\n\t\t\t\t$1\n\t\t\t$2\n\n$3/g; '

I've set the input to 'Entire Document' and the output to 'Create New Document'.

But when I run the command none of the tabs or newlines I've specified in the replace section of the command are applied to the text in question. I've also tried the command with the 'e' option at the end which, as far as I can tell, is supposed to evaluate the right side of the search/replace command as regex... but it's not working.

Any help is appreciated.

Just so you know, I'm trying to write a set of commands that will transform the plain-text output from Final Draft when exporting a screenplay. If I can get this working then I can bring all my scripts over to TextMate.

As an example, this:
-----------------
OLIVER
(I want to tell you)
I've got things to say.
Dr. Robert
-----------------
should look like this:
---------------------------

				OLIVER
			(I want to tell you)
I've got things to say.

Dr. Robert
---------------------------


Charilaos Skiadas

22 Feb 2006

9:21 a.m.


On Feb 22, 2006, at 2:08 AM, Oliver Taylor wrote:

...

So, to restate my question more clearly... (thanks for your patience, Haris) I know nothing about programming...

You've done amazingly well with the bundle so far. I'm very impressed.

...

I'm trying to run a perl search/replace command via a 'bundle command'. I've

First of all, why don't you use the find&replace feature of TextMate? Then you could record the whole thing as a macro.

...

entered the following into the 'Edit Command' box:

perl -pe ' s/^([A-Z]+.*[A-Z]*\s*)\n(\(.+\))\n(.+)$/\n\n\t\t\t\t$1\n\t\t\t$2\n\n$3/g; '

I've set the input to 'Entire Document' and the output to 'Create New Document'.

Create New Document? So, you don't want it to replace the text in the current document?

...

But when I run the command none of the tabs or newlines I've specified in the replace section of the command are applied to the text in question. I've also tried the command with the 'e' option at the end which, as far as I can tell, is supposed to evaluate the right side of the search/replace command as regex... but it's not working.

I don't know much about Perl's regexp handling, but if you are seeing neither tabs nor literal \t's in the resulting document, that can only mean that the regexp is not matching. Not knowing how the perl engine works, I see two possible reasons:

1) It searches each line separately, in which case it won't match the multiline search you are performing.
2) (more likely) It searches the whole input at once, in which case I think ^ and $ might actually stand for the beginning and end of the input, instead of the ends of lines.

So maybe a search without ^ and with ([^\n]+) instead of (.+)$ might do the trick.

Or maybe I've gotten the whole thing wrong.

Haris

Allan Odgaard

10:35 a.m.


On 22/2/2006, at 9:21, Charilaos Skiadas wrote:

...

...
But when I run the command none of the tabs or newlines I've specified in the replace section of the command are applied to the text in question [...]

[...]

It searches each line separately, in which case it won't match the multiline search you are performing

That’s exactly it! :)

The -p switch puts a loop around the perl code, which executes that code for each line in the input (stdin).

So: perl -pe 'foo' translates to:

for each line in the input
    print result of running “foo” on line

Loops and such are generally easier in Ruby, so instead try:

ruby -e ' print STDIN.read.gsub( /^([A-Z]+.*[A-Z]*\s*)\n(\(.+\))\n(.+)$/, "\n\n\t\t\t\t\\1\n\t\t\t\\2\n\n\\3" ) '

So what’s s/«regexp»/«replacement»/ in Perl is sub(/«pattern»/, «replacement») in Ruby. And instead of adding ‘g’ as option to make it “global”, we use gsub.

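In addition, in Ruby we have to use \1 instead of $1 etc. (written as \\1 inside a double-quoted string, where a bare \1 would be read as an octal character escape).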

In Perl the s/«regexp»/«replacement»/ runs on the $_ variable (which is the current line, when used with the -p switch). Ruby has a similar feature (both -p and $_ as “accumulator” register), but in the above we explicitly call gsub on everything we read from STDIN, instead of doing it line-by-line.
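To make the difference concrete, here is a minimal Ruby sketch that runs the same substitution both ways: once per line, as -p would, and once over the whole input. The sample text, the constant names and the two helper methods are made up for illustration, and the pattern assumes the parentheses around the second line were meant to be escaped (matched literally).

```ruby
# Sketch only: SAMPLE, PATTERN, REPLACEMENT and both helpers are illustrative.
# The escaped parentheses in PATTERN are an assumption about the intended regex.
SAMPLE = "OLIVER\n(I want to tell you)\nI've got things to say.\nDr. Robert\n"

PATTERN = /^([A-Z]+.*[A-Z]*\s*)\n(\(.+\))\n(.+)$/

# '\1' is single-quoted so the backslash survives and gsub sees a
# backreference; in a double-quoted string "\1" would be an octal escape.
REPLACEMENT = "\n\n\t\t\t\t" + '\1' + "\n\t\t\t" + '\2' + "\n\n" + '\3'

# Roughly what -p does: the substitution only ever sees one line at a time.
def per_line(text)
  text.each_line.map { |line| line.gsub(PATTERN, REPLACEMENT) }.join
end

# What STDIN.read.gsub does: one substitution over the whole input.
def whole_input(text)
  text.gsub(PATTERN, REPLACEMENT)
end

puts per_line(SAMPLE) == SAMPLE   # true: no single line can match a 3-line pattern
print whole_input(SAMPLE)         # name and parenthetical come out indented
```

The per-line version returns its input unchanged, which matches the symptom in the original question: no error, and no tabs or newlines added.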

Paul McCann

2:32 p.m.


Hi Oliver,

...

I'm trying to run a perl search/replace command via a 'bundle command'. I've entered the following into the 'Edit Command' box:

perl -pe ' s/^([A-Z]+.*[A-Z]*\s*)\n(\(.+\))\n(.+)$/\n\n\t\t\t\t$1\n\t\t\t$2\n\n$3/g; '

I've set the input to 'Entire Document' and the output to 'Create New Document'.

The problem there is that the text won't be matching. You need to indicate to perl that you want a multiline match to occur (i.e., that ^ and $ should match at the start and end of each line within the text, not just at the ends of the whole string). This is done by using the "m" qualifier, as in

...

perl -pe 's/^([A-Z]+.*[A-Z]*\s*)\n(\(.+\))\n(.+)$/\n\n\t\t\t\t$1\n\t\t\t$2\n\n$3/mg;'

But written this way perl is still seeing just one line of the document at a time; in order for it to see a paragraph at a time (where a paragraph is delimited by a blank line) you can use the flag "-000" (three zeroes):

...

perl -000 -pe 's/^([A-Z]+.*[A-Z]*\s*)\n(\(.+\))\n(.+)$/\n\n\t\t\t\t$1\n\t\t\t$2\n\n$3/mg;'

This gets you pretty close to what you're seeking, and I imagine you can tweak it to get it exactly right:

=======================================================================

				OLIVER
			(I want to tell you)

I've got things to say.
Dr. Robert

=======================================================================

Good luck, Paul

Dr. Drang

4:17 p.m.


Oliver,

By default, Perl's -p switch causes each line of the input to be

1. read,
2. operated on by whatever commands are given (typically an s///), and
3. printed.

So, as Paul McCann said, Perl can't do what you want it to because the command is seeing only one line of input at a time. To operate on the whole file at once, use the -0777 switch. Using -000 to read a paragraph at a time would work for this specific case, but -0777 is usually the more general solution.
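The three ways of chunking the input can be imitated in Ruby with String#each_line, which may help in seeing what each switch hands to the substitution. This is only a sketch: the mode labels, the records helper and the sample text are made up, and the SPEECH pattern assumes the parentheses around the second line are escaped.

```ruby
# Sketch only. The mode labels and helper names are hypothetical;
# String#each_line does the actual splitting.
def records(text, mode)
  case mode
  when :line      then text.each_line.to_a      # plain -p: one line per record
  when :paragraph then text.each_line("").to_a  # like -000: blank-line-delimited
  when :slurp     then [text]                   # like -0777: the whole input
  else raise ArgumentError, "unknown mode: #{mode.inspect}"
  end
end

# Made-up sample: two speeches separated by a blank line.
SCRIPT = "OLIVER\n(I want to tell you)\nI've got things to say.\n\n" \
         "ROBERT\n(well, well, well)\nYou're feeling fine.\n"

# Three-line pattern; the escaped parentheses are an assumption.
SPEECH = /^([A-Z ]+)\n(\(.+\))\n(.+)$/
# A pattern that has to cross the blank line between the two speeches.
ACROSS = /say\.\n\nROBERT/

def matches_somewhere?(text, mode, pattern)
  records(text, mode).any? { |record| record.match?(pattern) }
end

[:line, :paragraph, :slurp].each do |mode|
  puts format("%-9s records=%d speech=%s across=%s",
              mode, records(SCRIPT, mode).length,
              matches_somewhere?(SCRIPT, mode, SPEECH),
              matches_somewhere?(SCRIPT, mode, ACROSS))
end
```

Paragraph mode is enough for the three-line speech pattern, but only the slurped input can match something that spans a blank line, which is the sense in which reading the whole file is the more general choice.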

I am wondering, though, why you are trying to do this as a one-liner. As far as I know, bundle commands can start with a "shebang" line, so you can use a full, multiline Perl program if the command starts with #!/usr/bin/perl. Thus:

#!/usr/bin/perl
local $/;      # put Perl in "slurp" mode
$text = <>;    # read in the whole file
$text =~ s/^([A-Z ]+)\n(\(.+\))\n(.+)$/\n\n\t\t\t\t$1\n\t\t\t$2\n\n$3/mg;
print $text;

should do the trick. I changed the first part of your regex so the capitalized line can contain only capital (unaccented) letters and spaces, which is what I thought you wanted. The '.*' you have in your regex allows any character in that line.

FYI, the 'g' option at the end of the substitution command makes the substitution work for every instance of the pattern rather than just the first. The 'm' option allows ^ and $ to match beginnings and endings of lines in multiline text. It's common to see the 's' option in multiline regexes, but that would break this one; it allows '.' to match the newline character, and we're relying on '.' *not* matching newline.
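For anyone trying the Ruby route suggested earlier in the thread, note that these option letters do not carry over one-to-one. In Ruby, ^ and $ always match at line boundaries (what Perl needs 'm' for), Ruby's own /m makes '.' match newline (Perl's 's'), and gsub takes the place of 'g'. A small sketch with made-up sample text:

```ruby
# Sketch only: the sample text and variable names are illustrative.
text = "ONE\ntwo\nTHREE\nfour\n"

# Perl's /g: sub replaces the first match only, gsub replaces every match.
first_only = text.sub(/^[A-Z]+$/, "X")
every_one  = text.gsub(/^[A-Z]+$/, "X")

# Perl's /m behaviour is Ruby's default: ^ and $ match at every line.
# \A anchors to the start of the whole string instead.
line_starts   = text.scan(/^\w+/).length
string_starts = text.scan(/\A\w+/).length

# Perl's /s is spelled /m in Ruby: with it, '.' also matches newline.
without_dotall = text[/t.+/]
with_dotall    = text[/t.+/m]

p first_only      # "X\ntwo\nTHREE\nfour\n"
p every_one       # "X\ntwo\nX\nfour\n"
p line_starts     # 4
p string_starts   # 1
p without_dotall  # "two"
p with_dotall     # "two\nTHREE\nfour\n"
```

As in the Perl version, the screenplay pattern depends on '.' stopping at the end of a line, so Ruby's /m should be left off it.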

I meant to mention yesterday that the most commonly recommended reference on regexes is Friedl's _Mastering Regular Expressions_ from O'Reilly. It's probably more encyclopedic than you want, but it's very highly regarded. For Perl-specific regex help, there are any number of tutorials on the Internet. There's also _Programming Perl_ from O'Reilly and the 'perlre' man page.


-- Dr. Drang


textmate@lists.macromates.com
