Hello everybody,
First, the caveats and the waffling: I'm new to TextMate, I'm not a programmer, I hope I'm not missing something obvious, I've never posted to a mailing list before.
There, now that's out of the way, here's my problem:
I process a lot of texts for Project Gutenberg. PG has a homemade C program (GutCheck -- http://gutcheck.sourceforge.net) which checks for various features in the text to make sure it's in good shape for posting to PG. I normally run it from Terminal and thought I could easily make a Command to do it from TextMate.
I made my command, and as long as I don't add any arguments, I get back the "usage" information for GutCheck. So far, so good.
However, when I put in my file name (e.g. gutcheck "$TM_FILEPATH"), I don't get anything other than a new empty untitled document and a short-lived beach ball. If I make the command redirect its output into another (specific) document (e.g. gutcheck "$TM_FILEPATH" > mynewfile.out), I get one line of response from gutcheck, and the rest of the information is stored in the other document. I then have to open it separately, so it's not much different from running the command in Terminal, which is what I'm trying to avoid.
I thought it had something to do with GutCheck reporting to stderr instead of stdout, but there's a switch available to turn that off. It didn't make any difference.
The _weird_ thing, though, is that when I set my command to "Output: Show as HTML" (instead of Create New Document), everything is there in the HTML window (but unreadable because there are no line breaks).
What am I missing or doing incorrectly?
Ah, another clue, perhaps: when I run the command on a short file (like this message), it works as I expect (Command: gutcheck "$TM_FILEPATH"; Output: Create New Document). But few ebooks are 17 lines long. How do I make it work for real texts (600 to 6000 or more lines)?
Thanks in advance for any advice,
Barbara
On Oct 11, 2006, at 5:14 PM, logista wrote:
> gutcheck "$TM_FILEPATH"
Works fine over here. Of course, this requires the current document to be a saved file, so you might want to set the command to "Save Current File". Other than that, this should work. Perhaps you could try it on increasingly bigger files? For instance does it work for 400 characters? 1000? 2000?
Haris
On 10/11/06, Charilaos Skiadas <skiadas@hanover.edu> wrote:
> set the command to "Save Current File".
I have from the beginning. Sorry I didn't say so in my first message.
> Other than that, this should work. Perhaps you could try it on increasingly bigger files? For instance does it work for 400 characters? 1000? 2000?
With my test file, I can get a good response with 464 lines/4,705 words/26 KB, but if I add one more newline, it fails. I can add more words to the end of line 464 and it will still work, though. The whole test file is about 690 lines.
When I tried a different file (about 6210 lines), the break point changed to line 2981 (27,000+ words/154 KB). Again, one more newline causes the problem, but I can add a lot more text to that last line (I added the Lorem Ipsum snippet).
---time passes---
OK. I looked at the structure of the lines. The command fails when the ae ligature (æ) appears in the middle of a word. It doesn't fail when it appears at the beginning or the end of a word. A brief test shows that some accented characters also fail (á fails but not é, sometimes ö fails and sometimes it doesn't). I'll have to do some more testing.
Thanks for your response -- seems like I have some more work to do :)
Barbara
On 12. Oct 2006, at 02:09, logista wrote:
> [...] OK. I looked at the structure of the lines. The command fails when the ae ligature (æ) appears in the middle of a word [...]
The problem is that in this case GutCheck returns a partial UTF-8 multi-byte sequence. The UTF-8 decoding function which I use seems to just give up on malformed UTF-8, which is why you get no output in this situation.
I will switch to my own UTF-8 function.
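As a rough illustration of what a "partial UTF-8 multi-byte" sequence looks like, here is a minimal shell sketch. It assumes a POSIX-style printf and an iconv that supports the -c flag; the sample word and the variable names are made up for the example, and it says nothing about how GutCheck or TextMate handle the bytes internally. The letter æ is the two bytes 0xC3 0xA6 in UTF-8, so emitting only the first byte leaves a malformed sequence that a strict decoder rejects.

```shell
# Hypothetical sample: "encyclop" + only the first byte of æ (0xC3) + "dia".
# \303 is octal for 0xC3; the second byte (0xA6) is deliberately missing.
sample='encyclop\303dia\n'

# Strict check: iconv exits non-zero when the input is not valid UTF-8.
if printf "$sample" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
  status=valid
else
  status=malformed
fi
echo "strict decode: $status"

# Lenient pass: -c asks iconv to drop what it cannot convert instead of stopping.
# Some iconv builds still exit non-zero here, hence the "|| true".
cleaned=$(printf "$sample" | iconv -c -f UTF-8 -t UTF-8 2>/dev/null || true)
echo "lenient decode: $cleaned"
```

A decoder that behaves like the strict check and then shows nothing would explain the empty document, while a lenient one would lose at most the damaged character.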
For now, what you can do is use the HTML output option and then add ‘| pre’ to the command; this should keep the output from showing all on one line (as you initially experienced).
I.e. the command would be:
gutcheck "$TM_FILEPATH"|pre
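To see why piping through pre helps, here is a minimal sketch of what such a filter might do. The function name pre_sketch and the sample report lines are made up for illustration; this is an assumption about the general idea (wrap the text in a <pre> element and escape HTML-special characters), not TextMate's actual pre implementation.

```shell
# Hypothetical stand-in for a "pre"-style filter: reads plain text on stdin,
# writes HTML in which line breaks are preserved.
pre_sketch() {
  printf '<pre>'
  # Escape & first so the entities produced for < and > are not re-escaped.
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
  printf '</pre>\n'
}

# Made-up report lines standing in for a checker's output.
html=$(printf 'Line 12 - <i> unmatched & odd\nLine 40 - short line\n' | pre_sketch)
printf '%s\n' "$html"
```

In an HTML view, text inside <pre> keeps its line breaks, which is why the report stops running together on one line.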
Maybe we should add a GutCheck command to the Text bundle? It looks like it could be generally useful, if it actually does a good job of what it says it does.
It's very much specialized for PG (most normal people don't care if they're using non-ASCII characters), but I sure would like it :)
On 10/11/06, Allan Odgaard <throw-away-1@macromates.com> wrote:
> For now what you can do is use the HTML output option and then add '| pre' to the command, this should make it not show all on one line (as you initially experienced).
> I.e. the command would be:
> gutcheck "$TM_FILEPATH"|pre
Thanks! That helps a lot.
Regards, Barbara
logista wrote:
> I process a lot of texts for Project Gutenberg. PG has a homemade C program (GutCheck -- http://gutcheck.sourceforge.net) which checks for various features in the text to make sure it's in good shape for posting to PG. I normally run it from Terminal and thought I could easily make a Command to do it from TextMate.
Maybe we should add a GutCheck command to the Text bundle? It looks like it could be generally useful, if it actually does a good job of what it says it does.
-Jacob