Hello,
I'm wondering if anyone here can help with two problems I've encountered that are only slightly related to TextMate. I've been given a couple of hundred documents in different formats (.doc, .pdf, .rtf) that need to be combined into one big document (which will be about 200 pages long).
I used 'textutil' to convert most of the documents to plain text (and ps2ascii for the PDFs), and then gave each file a filename containing a number. Using TextMate (TM), I created a macro that adds the filename at the top of each individual file, and then converted this into "##number ##". I then used 'textutil -cat' to merge them into one file. So now each 'section' has a filename, a title, and a block of text. The next step is to turn this into a file format that allows formatting (LaTeX or maybe RTF; all I need is one big PDF at the end).
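The header-and-merge step described above can also be sketched outside TextMate. This is a minimal Python sketch, not the macro that was actually used: the function names `add_header` and `merge_directory` are hypothetical, and it assumes the converted files are UTF-8 `.txt` files in a single directory, with the number as the filename minus its extension.

```python
from pathlib import Path


def add_header(stem: str, text: str) -> str:
    """Prefix one converted document with its '##number ##' marker line."""
    # Trailing whitespace is trimmed so every section ends with one newline.
    return f"##{stem} ##\n{text.rstrip()}\n"


def merge_directory(directory: str) -> str:
    """Concatenate every .txt file in `directory`, each under its marker.

    Assumption: files are UTF-8. Note that sorted() orders the paths
    lexicographically, so '100.txt' comes before '67.txt'.
    """
    paths = sorted(Path(directory).glob("*.txt"))
    sections = [add_header(p.stem, p.read_text(encoding="utf-8")) for p in paths]
    # Joining with a newline leaves one blank line between sections.
    return "\n".join(sections)
```

For example, `add_header("137a", "Title\nBody\n")` produces a section that starts with the line `##137a ##`.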
I have come across two issues:
1. There are some strange characters (most likely from different character sets?) that have appeared in the text. I am assuming that these are accented characters, smart quotes, and so forth, and was wondering: *How can I automatically convert these characters?* I started doing this with find and replace for the ones I know, but I was wondering if there is an easier way to do it.
Some examples follow: actor –meaning Prüm Integration : « new regionalism » i EU’s energy
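One possible way to automate the conversion, sketched in Python rather than with TextMate's find and replace. The function name `normalize_text` and the contents of the replacement table are assumptions chosen to cover the examples above, not a complete list. The sketch also assumes the text has already been read with the correct encoding; if a file was decoded with the wrong character set, that needs fixing first.

```python
import unicodedata

# Hypothetical replacement table: typographic punctuation to plain ASCII.
PUNCTUATION = {
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote / apostrophe
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
    "\u00ab": '"',   # left guillemet
    "\u00bb": '"',   # right guillemet
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
    "\u00a0": " ",   # non-breaking space
}
_TABLE = str.maketrans(PUNCTUATION)


def normalize_text(text: str, strip_accents: bool = False) -> str:
    """Replace typographic punctuation; optionally drop accents too."""
    text = text.translate(_TABLE)
    if strip_accents:
        # NFKD splits an accented letter into base letter + combining mark;
        # the combining marks are then filtered out.
        decomposed = unicodedata.normalize("NFKD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return text
```

Accents are kept by default, since both LaTeX (with UTF-8 input) and RTF can represent them; `strip_accents=True` is only for cases where plain ASCII is really required.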
2. The numbering system I used for the files was based on a primary key from our database, but I've been asked to renumber all the files starting from 1 (for a silly reason, the database started with a primary key of 67, and manually entered records start at 400). *Is there any way that I can use TM to convert "##number ##" into "## incremental number ##"?* For example, in find and replace:
find: ##(\d*)\w? ##
replace: ##(x+1) ##
(The \w? is there because a couple of documents are labelled like 137a.)
It's the (x+1) that's bothering me.
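A plain regex replacement cannot count, which is why the (x+1) part has no direct equivalent. A replacement callback can do it, though. Below is a minimal Python sketch under two assumptions: every marker has the exact form "##number ##" with an optional trailing letter, and each marker should simply receive the next integer in order of appearance. The function name `renumber` is hypothetical. It uses `\d+` rather than `\d*` so that an empty "## ##" is not treated as a marker.

```python
import itertools
import re

# Matches markers such as "##67 ##" or "##137a ##".
MARKER = re.compile(r"##\d+\w? ##")


def renumber(text: str, start: int = 1) -> str:
    """Replace each marker with a running number, in order of appearance."""
    counter = itertools.count(start)
    # re.sub calls the lambda once per match, left to right,
    # so each marker gets the next value from the counter.
    return MARKER.sub(lambda match: f"##{next(counter)} ##", text)
```

Applied to the merged file's contents, this would turn the markers "##67 ##", "##137a ##", "##400 ##" into "##1 ##", "##2 ##", "##3 ##" while leaving the text between them untouched.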
Sorry if this is outside the scope of this list (any recommendations for a better place to ask?), but any pointers would be really appreciated!
Many thanks,
Jamal