[TxMt] help with perl search/replace for paragraphs

Thu Sep 14 18:51:35 UTC 2006

I'm not sure why this is on [TxMt], but you can do it in two steps:

(assuming $_ contains the full body, newlines and all)

## wrap any line that contains at least one non-whitespace character in <p>..</p>
  s{^(.*\S.*)$}{<p>$1</p>}mg;          # /m means let ^ and $ anchor to each line

## now join consecutive paragraphs (no blank line between them) with <br />
  s{</p>\n<p>}{<br />\n}sg;            # /s lets . match newlines; harmless here, since the pattern has no .

that gets you (using the text of your message as input):
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do  
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim  
ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut  
aliquip ex ea commodo consequat.</p>

<p>This is just like what you'd get from Markdown, paragraphs  
separated by a blank line are wrapped in a paragraph tag. But I'd  
like to add a rule that looks for paragraphs that have hard-breaks in  
them, like this:</p>

<p>Lorem ipsum dolor sit amet,<br />
consectetur adipisicing elit, sed do<br />
eiusmod tempor incididunt ut labore<br />
et dolore magna aliqua.</p>
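
If it helps, here are the same two steps as a small standalone script. Treat it as a sketch rather than anything definitive: the sample text and the to_html name are just assumptions for illustration, and in practice you'd run the two substitutions on whatever is in $_.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input (made up for illustration): a one-line paragraph,
# a blank line, then a paragraph with hard breaks.
my $text = join "\n",
    'First paragraph on one line.',
    '',
    'Lorem ipsum dolor sit amet,',
    'consectetur adipisicing elit, sed do',
    'et dolore magna aliqua.',
    '';

# to_html is a hypothetical helper name, not part of TextMate or any module.
sub to_html {
    my ($body) = @_;

    # Step 1: wrap every line that has at least one non-whitespace
    # character in <p>..</p>. /m lets ^ and $ anchor to each line.
    $body =~ s{^(.*\S.*)$}{<p>$1</p>}mg;

    # Step 2: a </p> followed directly by <p> on the next line means the
    # two lines belong to one paragraph, so turn that seam into a break.
    $body =~ s{</p>\n<p>}{<br />\n}g;

    return $body;
}

print to_html($text);
```

Blank lines never match the first pattern, so they stay as they are and keep the paragraphs apart.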

On Sep 14, 2006, at 11:02 AM, Oliver Taylor wrote:

> I'm using a perl search/replace string to re-format documents into  
> HTML. I'm using the following for regular paragraphs:
>
> s/^[^\n\t\<].*/<p>$&<\/p>/g;
>
> That takes care of single-line paragraphs like this:
>
> Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do  
> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim  
> ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut  
> aliquip ex ea commodo consequat.
>
> This is just like what you'd get from Markdown, paragraphs  
> separated by a blank line are wrapped in a paragraph tag. But I'd  
> like to add a rule that looks for paragraphs that have hard-breaks  
> in them, like this:
>
> Lorem ipsum dolor sit amet,
> consectetur adipisicing elit, sed do
> eiusmod tempor incididunt ut labore
> et dolore magna aliqua.
>
> and wrap them in markup like this:
>
> <p>Lorem ipsum dolor sit amet, <br />
> consectetur adipisicing elit, sed do <br />
> eiusmod tempor incididunt ut labore <br />
> et dolore magna aliqua.</p>
>
> The key is finding lines that end only to be followed by more lines  
> in the same paragraph. My RegEx-fu is okay, but not great, so I've  
> come close, but I can't get it right.
>
> Thanks in advance.

---
michael reece :: software engineer :: mreece at vinq.com
