Hello,
I'm wondering if anyone here can help with two 'slightly-related to TextMate' problems I've encountered. I've been given a couple of hundred documents in different formats (.doc, .pdf, .rtf) that need to be put into one big document (which will be about 200pp long).
I used 'textutil' to convert most of the documents to txt (ps2ascii for the pdfs), and then gave them all a filename with a number. Using TM, I created a macro that would add the filename at the top of each individual file, and then converted this into "##number ##". I then used 'textutil -cat' to merge them into one file. So now each 'section' has a filename, title, and block of text. I will then turn this into a file format which will allow formatting (LaTeX or maybe RTF, all I need is one big PDF at the end).
I have come across two issues:
1. There are some strange characters (most likely from different character sets?) that have appeared in the text. I am assuming that these are accented characters, smart quotes, and so forth, and was wondering: *How can I automatically convert these characters?* I started doing this using find and replace for the ones I know, but I was wondering if there is an easier way to do it.
some examples follow: actor –meaning Prüm Integration : « new regionalism » i EU’s energy
2. The numbering system I used for the files was based on a primary key from our database, but I've been asked to renumber all the files starting from 1 (for a silly reason, the database started with a primary key of 67, and manually-entered records start at 400). *Is there any way that I can use TM to convert "##number ##" into "## incremental number ##"?* E.g. in find and replace, use:
find: ##(\d*)\w? ## (the \w? is there as a couple of documents are labelled like: 137a)
replace: ##(x+1) ##
It's the (x+1) that's bothering me.
Sorry if this is not in the realm of this list (any recommendations?), but any pointers would be really appreciated! Many thanks,
Jamal
On 03/10/2007, Jamal Shahin jshahin@gmail.com wrote:
> the numbering system I used for the files was based on a primary key from our database, but I've been asked to renumber all the files starting from 1 (for a silly reason, the database started with a primary key of 67, and manually-entered records start at 400.). *Is there any way that I can use TM to convert "##number ##" into "## incremental number ##"?* e.g. in find and replace, use: find: ##(\d*)\w? ## (the \w? is there as a couple of documents are labelled like: 137a) replace: ##(x+1) ##
> It's the (x+1) that's bothering me.
You can do this easily enough using a one-line Perl script, as follows.
1. Choose "Filter through command…" from the Text menu, or press apple-option-R.
2. Choose Input: Document, and Output: Replace document.
3. For the command, put the following:
perl -le 's/##\d+\w? ##/++$n/eg'
That will convert your numbers to 1, 2, 3, 4, etc. in order, which is what I think you want.
Robin
On 03/10/2007, Robin Houston robin.houston@gmail.com wrote:
> 3. For the command, put the following:
> perl -le 's/##\d+\w? ##/++$n/eg'
Sorry, I pasted the wrong thing here! It should read
perl -pe 's/##\d+\w? ##/"##".++$n." ##"/eg'
Robin
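For anyone who would rather not use Perl, the same renumbering idea can be sketched in Python. This is only an illustration of the technique, not part of Robin's answer: it uses the same pattern as the corrected one-liner and swaps each "##number ##" header for a running count. The function name renumber and the sample text are made up for the example.

```python
import re

# Same pattern as the corrected Perl one-liner: digits, an optional
# trailing letter (as in "137a"), a space, then the closing "##".
HEADER = re.compile(r"##\d+\w? ##")


def renumber(text: str) -> str:
    """Replace every '##<number> ##' header with 1, 2, 3, ... in order."""
    counter = 0

    def next_header(match: re.Match) -> str:
        # Called once per header, in document order (plays the role of ++$n).
        nonlocal counter
        counter += 1
        return f"##{counter} ##"

    return HEADER.sub(next_header, text)


# Hypothetical sample input, using the primary-key numbers mentioned above.
sample = "##67 ##\nFirst title\n##137a ##\nSecond title\n##400 ##\nThird title\n"
print(renumber(sample))
```

The old number is matched but never used; the replacement depends only on how many headers have been seen so far, which is why a plain find and replace with (x+1) cannot express it.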
Thank you Robin, that works perfectly, and was just what I needed!
Regarding the other issue I had, I managed to solve it myself: I just put an '-inputencoding UTF-8' argument on the textutil command, which seemed to make everything appear properly in the merged document.
Thanks again Robin.
Best wishes, Jamal
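As a rough illustration of why the '-inputencoding UTF-8' argument helps: strange characters of this kind typically appear when UTF-8 bytes are read as if they were in some other encoding. The Python sketch below reproduces the effect and reverses it. Which encoding textutil actually assumed is not stated in this thread, so the use of Mac Roman here is an assumption, and the function names garble and repair are invented for the example.

```python
def garble(text: str, wrong_encoding: str = "mac_roman") -> str:
    """Simulate reading UTF-8 bytes as if they were in another encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled: str, wrong_encoding: str = "mac_roman") -> str:
    """Undo the damage: recover the original bytes, then decode as UTF-8."""
    return garbled.encode(wrong_encoding).decode("utf-8")


# Sample text built from the kinds of characters mentioned in the thread.
original = "Prüm « new regionalism » EU’s"
broken = garble(original)

# Each accented or typographic character turns into two or three odd ones.
print(len(original), len(broken))
print(repair(broken) == original)
```

Telling the converter the correct input encoding up front avoids the problem, so no find and replace is needed afterwards.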