I'm using a Perl search-and-replace expression to reformat documents into HTML. I use the following for regular paragraphs:
s/^[^\n\t<].*/<p>$&<\/p>/g;   # the "/" in </p> must be escaped, or it ends the replacement early
That takes care of single-line paragraphs like this:
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
This is just like what you'd get from Markdown: paragraphs separated by a blank line are wrapped in a paragraph tag. But I'd like to add a rule that looks for paragraphs that have hard breaks in them, like this:
Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore
et dolore magna aliqua.
and wrap them in markup like this:
<p>Lorem ipsum dolor sit amet, <br /> consectetur adipisicing elit, sed do <br /> eiusmod tempor incididunt ut labore <br /> et dolore magna aliqua.</p>
The key is finding line ends that are followed by another line of the same paragraph. My regex-fu is okay but not great; I've come close, but I can't get it right.
Thanks in advance.
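The condition described above, a line that ends but is followed by another line of the same paragraph, can be written almost literally with Perl's lookaround assertions. This is a minimal sketch, not code from this thread; the sub name paragraphs_to_html is hypothetical, and it assumes paragraphs are separated by lines that are empty or contain only spaces/tabs, with the whole document held in one string.

```perl
use strict;
use warnings;

# Hypothetical sub (not from this thread). Assumes paragraphs are
# separated by lines that are empty or contain only spaces/tabs, and
# that the whole document is held in one string.
sub paragraphs_to_html {
    my ($text) = @_;

    # 1. Hard break = a line with visible text whose newline is followed
    #    by another line with visible text (same paragraph):
    #      (?<=\S)       last visible character of the current line
    #      [ \t]*\n      trailing blanks plus the newline (these are replaced)
    #      (?=[ \t]*\S)  the next line is not blank
    $text =~ s{(?<=\S)[ \t]*\n(?=[ \t]*\S)}{<br />\n}g;

    # 2. Wrap each run of consecutive non-blank lines in <p>..</p>.
    #    /m lets ^ and $ anchor at every line; "." never crosses "\n".
    $text =~ s{^(.*\S.*(?:\n.*\S.*)*)$}{<p>$1</p>}mg;

    return $text;
}

my $body = <<'END';
A one-line paragraph.

Lorem ipsum dolor sit amet,
consectetur adipisicing elit.
END

print paragraphs_to_html($body);
# Prints:
# <p>A one-line paragraph.</p>
#
# <p>Lorem ipsum dolor sit amet,<br />
# consectetur adipisicing elit.</p>
```

Because the lookbehind and lookahead only inspect the neighboring text without consuming it, a blank line between paragraphs never receives a <br />, and any trailing spaces before a hard break are dropped.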
I'm not sure why this is on [TxMt], but you can do it in two steps:
(assuming $_ contains the full body, newlines and all)
## wrap any line that contains at least one non-whitespace character in <p>..</p>
s{^(.*\S.*)$}{<p>$1</p>}mg;   # /m means let ^$ anchor to each line
## now join consecutive paragraphs (no blank line between) with <br />
s{</p>\n<p>}{<br />\n}g;      # \n matches the newline literally, so /s isn't needed
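For reference, here is a self-contained sketch that wraps the same two substitutions in a sub and runs them on a small sample. The sub name and sample text are illustrative assumptions; "local $_" is used so the substitutions can operate on $_ as in the reply, and the /s flag is left off because the second pattern contains no ".".

```perl
use strict;
use warnings;

# Illustrative wrapper (the sub name is hypothetical) around the two
# substitutions from the reply above.
sub paragraphs_to_html {
    local $_ = shift;    # the reply's code assumes the full body is in $_

    ## wrap any line containing a non-whitespace character in <p>..</p>
    s{^(.*\S.*)$}{<p>$1</p>}mg;    # /m: ^ and $ anchor at every line

    ## lines of the same paragraph (no blank line between) get <br />
    s{</p>\n<p>}{<br />\n}g;       # \n matches literally; /s not needed

    return $_;
}

my $body = <<'END';
A one-line paragraph.

Lorem ipsum dolor sit amet,
consectetur adipisicing elit.
END

print paragraphs_to_html($body);
# Prints:
# <p>A one-line paragraph.</p>
#
# <p>Lorem ipsum dolor sit amet,<br />
# consectetur adipisicing elit.</p>
```

To run it on a file instead of the sample, the whole file can be read into one string in slurp mode, e.g. my $body = do { local $/; <> };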
That gets you:
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</p>
<p>This is just like what you'd get from Markdown: paragraphs separated by a blank line are wrapped in a paragraph tag. But I'd like to add a rule that looks for paragraphs that have hard breaks in them, like this:</p>
<p>Lorem ipsum dolor sit amet,<br />
consectetur adipisicing elit, sed do<br />
eiusmod tempor incididunt ut labore<br />
et dolore magna aliqua.</p>
(In reply to Oliver Taylor's message of Sep 14, 2006, at 11:02 AM, the question at the top of this thread.)
For new threads USE THIS: textmate@lists.macromates.com (threading gets destroyed and the universe will collapse if you don't) http://lists.macromates.com/mailman/listinfo/textmate
--- michael reece :: software engineer :: mreece@vinq.com