[TxMt] Huge files in Textmate ( ***long*** )

Richard Vile hashiru at gmail.com
Tue Mar 28 01:07:08 UTC 2006


Hi,

I'm new to TextMate and enjoying it a lot.

This weekend, I was playing around with the good old Sieve of Eratosthenes.
That was the very first computer program I ever wrote (way back in 1970 in
Fortran IV - ran it on an IBM1130 with 8K - yes that's K! of core memory).
It computed all the primes less than 10,000 and its runtime (determined by
using my wristwatch while looking through the window into the computer room
- timed from when the operator loaded the cards into the reader until the
output started on the line printer) was about 90 seconds, not counting the
printout which probably took longer than the compute time.

Just out of curiosity, I wrote a cheesy, q&d little python script to do the
sieve:

#!/usr/bin/env python
import sys

"""
The sieve of Eratosthenes. Compute all primes less than some integer
(given here by 'bound')
"""
def printem(sieve):
    # Any sieve entries still not 0 must be primes
    # (Eratosthenes - antiquity)
    px = 0
    for p in range(2, len(sieve)):  # skip 0 and 1, which are not prime
        if sieve[p] != 0:
            print sieve[p],
            px += 1
            if (px % 10) == 0:
                print

def sievem(bound=0):
    if bound == 0:
        sys.stdout.write('How many integers should I sieve? ')
        bound = int(sys.stdin.readline()[:-1])
    sieve = range(0, bound + 1)
    remove = 2
    while remove * remove <= bound:
        # Cross out multiples of the current prime, starting at its square
        for pm in range(remove * remove, bound + 1, remove):
            sieve[pm] = 0
        # Find the "next" prime among the remaining sieve elements
        for np in range(remove + 1, bound + 1):
            if sieve[np] != 0:
                remove = np
                break
        else:
            break  # nothing left to find; avoid looping forever on tiny bounds
    printem(sieve)
    print

if __name__ == '__main__':
    sievem(10000)
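(For anyone trying this today: the script above is Python 2 — the print statement and the list-returning range() won't run under Python 3. Here is a minimal Python 3 sketch of the same algorithm that counts the primes instead of printing them; the name sieve_count is mine, not part of the original script.)

```python
def sieve_count(bound):
    """Sieve of Eratosthenes: count the primes <= bound."""
    flags = bytearray([1]) * (bound + 1)  # flags[n] == 1 means "n still prime"
    flags[0:2] = b"\x00\x00"              # 0 and 1 are not prime
    p = 2
    while p * p <= bound:
        if flags[p]:
            # cross out multiples of p, starting at p*p
            flags[p * p :: p] = bytes(len(range(p * p, bound + 1, p)))
        p += 1
    return sum(flags)

print(sieve_count(10000))  # 1229
```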

I ran 'time python sieve.py' and got:

real    0m0.816s
user    0m0.127s
sys     0m0.215s

This was on my dual 867 MHz Mac G4 with 2 GB of RAM. Not surprising that it's faster.
But comparing Python to compiled Fortran seems a little unfair, so I ginned
up a little C program, compiled it for N = 10000, and timed it:

(list of primes < 10000)
...
found 1229 primes

real    0m0.331s
user    0m0.005s
sys     0m0.022s

That's about 272 times as fast as the old IBM machine, but that's not the
point of this story.

I decided to see how far I could push the calculations on my machine without
tying it up for a week and without spending a lot of time fiddling with the
algorithm. I finally wound up computing all the primes less than 1 billion
and redirecting the output to a file. (Googling confirmed that the number of
primes computed by my code was correct). The resulting file is fairly large:


-rw-rw-r--   1 dick  staff  507044566 Mar 27 17:54 primes_upto_a_billion.txt
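That size is about what you'd predict: π(10^9) = 50,847,534 (a known value), almost all of those primes have 9 digits, and each one needs roughly one separator byte. A quick back-of-the-envelope check (the 9-digits-plus-one-byte assumption is mine):

```python
PI_1E9 = 50847534     # pi(10**9): the number of primes below one billion
observed = 507044566  # bytes, from the ls listing above

# assume ~9 digits plus one separator byte per prime
estimate = PI_1E9 * (9 + 1)
print(abs(estimate - observed) / float(observed) < 0.01)  # True: within 1%
```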

So, I decided to see how various programs would handle loading, displaying
and (shudder) manipulating this file. First, I burned it to a CD using Toast
with little or no difficulty.

Then I experimented with BBEdit (Lite), irEdit, Eclipse 3.1, SeaMonkey
(the Mozilla browser), Alpha, etc., with very mixed results. Most of them
either loaded the file (and then were very sluggish about navigating it),
had to be force-killed, or cleanly gave up. The latter was the case for most
of the Java-based apps, since they probably defaulted to starting the Java VM
with too little heap: it's obviously going to take at least 600 MB or more. The
programs that succeeded usually showed both real and virtual memory sizes in
the range of 900+ MB.

Finally, I got around to trying TextMate. Well, the results were
disappointing. It crashed before finishing its load:

TextMate(21976,0xa000ef98) malloc: *** vm_allocate(size=1073741824) failed
(error code=3)
TextMate(21976,0xa000ef98) malloc: *** error: can't allocate region
TextMate(21976,0xa000ef98) malloc: *** set a breakpoint in szone_error to
debug
terminate called after throwing an instance of 'std::bad_alloc'
  what():  St9bad_alloc

My interpretation of this console output is that TextMate was trying to
acquire over a gigabyte of memory with a single request. It's not clear (a) why
this much would be needed for a file of slightly over 500 MB, and (b) why my
machine couldn't satisfy the request, since I regularly have an Inactive size
at or near 1 GB, with lots more reclaimable from idle programs whose real
memory could be paged out to disk.
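One purely speculative reading of the exact figure (my assumption, not anything the crash log confirms): 1073741824 is exactly 2^30, i.e. 1 GiB, which is just what a text store that keeps two bytes per character (UTF-16-style) and grows its buffer by doubling would request for this file:

```python
file_bytes = 507044566        # size of the primes file
failed_request = 1073741824   # size from the vm_allocate error
print(failed_request == 2 ** 30)  # True: exactly 1 GiB

# hypothetical: two bytes per character, buffer grown by doubling
needed = file_bytes * 2
buf = 1
while buf < needed:
    buf *= 2
print(buf == failed_request)  # True
```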

Anyway, I thought it was all interesting, and I would like to hear people's
reactions to the whole topic of editor scalability and editing huge
files. This has real-world ramifications. E.g., the product I work on
is a large financial-analysis programming language written in C and bits of
C++, which implements a proprietary database format for time-series data
storage. When there are problems to debug, we generate "slog" files
(selective logs), which in the worst cases routinely approach 1 GB in
size. We've never really found a reasonable way to deal with the largest of
these and usually resort to other methods, but it would be nice some day to
be able to actually edit them with some measure of efficiency. I'm sure most
readers of this list have had similar experiences.
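One approach that sidesteps the editor entirely is to stream the file line by line and pull out just the interesting records; a sketch (the function name and arguments are illustrative, not from our tooling):

```python
def grep_lines(path, needle):
    """Yield (line_number, line) pairs from 'path' where the line contains 'needle'.

    Reads the file as a stream, so memory use stays flat no matter how
    large the log is.
    """
    with open(path, 'r', errors='replace') as fh:
        for lineno, line in enumerate(fh, start=1):
            if needle in line:
                yield lineno, line.rstrip('\n')
```

Feeding only the matching slice of a huge slog into an editor is usually a lot more pleasant than asking the editor to swallow the whole file.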

In addition, when I finished experimenting with the other editors /
browsers / IDEs and went to quit my existing TextMate session, it took
quite some time; I got several spinning beachballs in the process. My take
is that my experimenting caused a lot of TextMate's working storage to
be paged out, and that it had to fault all of that back into its working
set. That kind of thing seems to happen with TextMate in general: e.g., when
I accidentally hover too long over Open Recent in the File menu, TextMate
spins the beachball for all it's worth, often taking 10 or 15 seconds to
come back to life. What's up with that? Does it try to generate the list from
project files and have to read through the equivalent of thousands of status
entries, many perhaps out on disk? Whatever the cause, it's quite annoying, and
I've tried to force myself to use Cmd-O whenever possible.

Sorry for the length of this, I just couldn't resist. (How many of you made
it all the way to the end?)

-- Dick Vile at home in Dexter, MI USA