On Jan 24, 2006, at 10:02 AM, Allen wrote:
Ah okay -- so basically it's the conversion you want help with?
yes.
I assume it's because you are not familiar with any programming language (which I must say, your bundle is pretty impressive if you are not -- even if you are, it still is impressive -- do you mind if I link to the intro screencast in the RSS feed as an example of behavioral patterns in TM? I can keep a local cache of the bundle if you're concerned about bandwidth).
I'd be glad to host the screencast; if bandwidth becomes a problem, I'll let you know. And yes, I know exactly zero about programming languages. The first time I used a regexp was for this bundle.
I would suggest using htmldoc [1] for the HTML -> PDF conversion.
I'll check it out.
So what you want is to make a regular expression to match each construct in your format, which you already did in the language grammar, and then as the replacement string you specify how it should be transformed. Here you can use $& to refer to the entire match and $1-$n for captures (stuff captured with (…)).
Okay, based on what you posted here before, I added a few things that were missing and synced these with the language def.
#!/usr/bin/perl -p
s/&/&amp;/g;                                #ampersands
s/</&lt;/g;                                 #reserved for HTML
s/>/&gt;/g;                                 #reserved for HTML - maybe this is unnecessary?
s/^EXT\..*$/<h2>$&<\/h2>/;                  #scene heading
s/^INT\..*$/<h2>$&<\/h2>/;                  #scene heading
s/^I\/E\..*$/<h2>$&<\/h2>/;                 #scene heading
s/^[A-Z].*-\s[A-Z].*/<h2>$&<\/h2>/;         #arbitrary scene heading ending with a time
s/^[A-Z].*-\s*$/<h2>$&<\/h2>/;              #arbitrary scene heading NOT ending with a time
s/^\w.*$/<p>$&<\/p>/;                       #paragraph
s/\/\/(.*)\/\//<!-- $1 -->/g;               #comments
s/\*(.*)\*/<em>$1<\/em>/;                   #italics
s/^(\t{4})([^\t].*)$/<dl>$1<dt>$2<\/dt>/;   #characters
s/^(\t{3})([^\t].*)$/$1<dd class="parenthetical"> $2 <\/dd>/;   #parenthetical
s/^(\t{2})([^\t].*)$/$1<dd>$2<\/dd><\/dl>/; #dialogue
s/^(\t{10})([^\t].*:)$/$1<h3>$2<\/h3>/;     #transition (right)
s/^[A-Z].*:\s*/<h4>$&<\/h4>/;               # transition (left)
The only one that's not working properly is the last one. It's baffling to me because it's the same regexp as in the language grammar.
If you need further help, let me know (as I have no idea what your shell/programming skills are).
I have no programming skills other than those I've already demonstrated. Zip.
There are a few steps left in the process that need to be addressed. Next, the HTML marked-up text (as generated by the above script) needs to be inserted into an actual HTML document with doctype declarations, CSS, etc. And somehow (again, I have no idea how) it needs to be transferred to a PDF authoring environment (htmldoc or whatever).
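A rough sketch of how those remaining steps might be chained together in the shell. Everything named here is an assumption for illustration: the function wrap_html, the title, the CSS rule, and the file names markup.pl, draft.txt, draft.html and draft.pdf. How much of the CSS htmldoc honours should be checked against its documentation.

```shell
# wrap_html: read the marked-up body on stdin and write a complete HTML
# page on stdout. Title and CSS are placeholders, not part of the bundle.
wrap_html() {
  cat <<'EOF'
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Screenplay</title>
<style type="text/css">
body { font-family: Courier, monospace; font-size: 12pt; }
</style>
</head>
<body>
EOF
  cat                        # pass the body through unchanged
  printf '</body>\n</html>\n'
}

# Hypothetical usage, assuming the Perl script is saved as markup.pl:
#   perl markup.pl draft.txt | wrap_html > draft.html
#   htmldoc --webpage -f draft.pdf draft.html
```

The two commented commands are left inactive because they depend on files and on an htmldoc install that may not be present.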
Lastly, thank all of you. It's great to give something to a community and get so much back.