Hi,
I'm new to TextMate and enjoying it a lot.
This weekend, I was playing around with the good old Sieve of Eratosthenes. That was the very first computer program I ever wrote (way back in 1970, in Fortran IV, on an IBM 1130 with 8K - yes, that's K! - of core memory). It computed all the primes less than 10,000. Its runtime, determined by my wristwatch while looking through the window into the computer room (timed from when the operator loaded the cards into the reader until the output started on the line printer), was about 90 seconds, not counting the printout, which probably took longer than the compute time.
Just out of curiosity, I wrote a cheesy, q&d little Python script to do the sieve:
#!/usr/bin/env python
"""
The sieve of Eratosthenes.
Compute all primes less than some integer (given here by 'bound').
"""
import sys

def printem(sieve):
    # Any sieve entries still not 0 must be primes
    # (Eratosthenes - antiquity)
    px = 0
    for p in range(2, len(sieve)):
        if sieve[p] != 0:
            print sieve[p],
            px += 1
            if (px % 10) == 0:
                print

def sievem(bound=0):
    if bound == 0:
        sys.stdout.write('How many integers should I sieve? ')
        bound = int(sys.stdin.readline()[:-1])
    sieve = range(0, bound + 1)
    remove = 2
    while remove * remove < bound + 1:
        for pm in range(remove * remove, bound + 1, remove):
            sieve[pm] = 0
        # Find "next" prime from the remaining sieve elements
        for np in range(remove + 1, bound + 1):
            if sieve[np] != 0:
                remove = np
                break
    printem(sieve)
    print

if __name__ == '__main__':
    sievem(10000)
I ran 'time python sieve.py' and got:
real    0m0.816s
user    0m0.127s
sys     0m0.215s
This was on my dual 867 MHz Mac G4 with 2 GB of RAM. Not surprising that it's faster. But comparing Python to compiled Fortran seems a little unfair, so I ginned up a little C program, compiled it for N = 10000, and timed it:
(list of primes < 10000) ... found 1229 primes
real    0m0.331s
user    0m0.005s
sys     0m0.022s
That's about 272 times as fast as the old IBM machine, but that's not the point of this story.
I decided to see how far I could push the calculations on my machine without tying it up for a week and without spending a lot of time fiddling with the algorithm. I finally wound up computing all the primes less than 1 billion and redirecting the output to a file. (Googling confirmed that the number of primes computed by my code was correct). The resulting file is fairly large:
-rw-rw-r-- 1 dick staff 507044566 Mar 27 17:54 primes_upto_a_billion.txt
So, I decided to see how various programs would handle loading, displaying and (shudder) manipulating this file. First, I burned it to a CD using Toast with little or no difficulty.
Then I experimented with BBEdit (Lite), irEdit, Eclipse 3.1, SeaMonkey (the Mozilla browser), Alpha, etc., with very mixed results. Most of them either loaded the file (and then were very sluggish about navigating it), had to be force-killed, or cleanly gave up. The last was the case for most of the Java-based apps, which probably defaulted to starting the Java VM with too little heap: the file is obviously going to take at least 600 MB. The programs that succeeded usually showed both Real and Virtual memory sizes in the 900+ MB range.
Finally, I got around to trying TextMate. Well, the results were disappointing. It crashed before finishing its load:
TextMate(21976,0xa000ef98) malloc: *** vm_allocate(size=1073741824) failed (error code=3)
TextMate(21976,0xa000ef98) malloc: *** error: can't allocate region
TextMate(21976,0xa000ef98) malloc: *** set a breakpoint in szone_error to debug
terminate called after throwing an instance of 'std::bad_alloc'
  what(): St9bad_alloc
My interpretation of this console output is that TextMate was trying to acquire over a gigabyte of memory with a single request. It's not clear (a) why this much would be needed for a file of slightly over 500 MB, and (b) why my machine couldn't satisfy the request, since I regularly have Inactive memory at or near 1 GB, with lots more reclaimable from idle programs whose real memory could be paged out to disk.
Anyway, I thought it was all interesting, and I would like to hear people's reactions to the whole topic of editor scalability and editing huge files. This has real-world ramifications. For example, the product I work on is a large financial-analysis programming language written in C and bits of C++, which implements a proprietary database format for time-series data storage. When there are problems to debug, we generate "slog" files (selective logs), which in the worst cases routinely approach 1 GB in size. We've never really found a reasonable way to deal with the largest of these and usually resort to other methods, but it would be nice some day to be able to actually edit them with some measure of efficiency. I'm sure most readers of this list have had similar experiences.
In addition, when I had finished experimenting with the other editors / browsers / IDEs and went to quit my existing TextMate session, it took quite some time, with several spinning beachballs in the process. My take is that my experimenting caused a lot of TextMate's working storage to be paged out, and that it had to fault all of that back into its working set. That kind of thing seems to happen with TextMate in general: e.g. when I accidentally hover too long over Open Recent in the File menu, TextMate spins that beachball for all it's worth, often taking 10 or 15 seconds to come back to life. What's up with that? Does it try to generate the list from project files and have to read through the equivalent of thousands of status entries, many perhaps paged out to disk? Whatever the cause, it's quite annoying, and I've tried to force myself to use Cmd-O whenever possible.
Sorry for the length of this, I just couldn't resist. (How many of you made it all the way to the end?)
-- Dick Vile at home in Dexter, MI USA
You should try using the Unix program less to open the file. In some ways, this is exactly what it was designed for: it pages through a file without reading the whole thing into memory. I'm not sure TextMate was designed with such huge files in mind.
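The same trick works from Python if you want to poke at the file programmatically: stream it one line at a time instead of loading it whole. A minimal sketch (the helper name and the search value are mine, just for illustration):

import sys

def show_matching(path, needle):
    # Stream the ~500 MB prime list one line at a time; memory use
    # stays flat no matter how large the file is.
    f = open(path)
    for line in f:
        if needle in line:
            sys.stdout.write(line)
    f.close()

# 999999937 is the largest prime below a billion, so this match
# should be near the very end of the file.
show_matching('primes_upto_a_billion.txt', '999999937')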
Ram.
On Mar 28, 2006, at 3:07 AM, Richard Vile wrote:
Sorry for the length of this, I just couldn't resist. (How many of you made it all the way to the end?)
i did. and thanks for bringing up the subject. i'm very interested myself... you've clearly put your finger on the wound here... i often experience similar behaviour with much smaller files (xml files of ca. 10,000 lines that bring my dual-core imac to a grinding halt...)
however, most of the files i work with perform just fine and i'd rather wait for performance improvements while enjoying all of the luxuries that TM has to offer in the meantime than vice versa ;-)
but still... there's a lot of work to be done for TM in the performance arena...
just my $0.02,
tom
On Mar 27, 2006, at 6:28 PM, Tom Lazar wrote:
i did. and thanks for bringing up the subject. i'm very interested myself... you've clearly put your finger on the wound here... i often experience similar behaviour with much smaller files (xml files of ca. 10,000 lines that bring my dual-core imac to a grinding halt...)
Likewise, I have serious issues with huge project files - once I do a "Find in Project", TM gobbles up huge amounts of RAM and doesn't let it go, even after closing the project... until I quit. My computer is no slouch for RAM (4 GB), but when you have multiple projects open, each at about 100k LOC, it bogs down fast :(
D
On 28 Mar 2006, at 3:28, Tom Lazar wrote:
but still... there's a lot of work to be done for TM in the performance arena...
On a vaguely related performance note, I have noticed a delay in applying syntax highlighting when switching between tabs or opening new windows. I'm working on a fairly aged PB15 (1 GHz, 512 MB).
On Mar 27, 2006, at 8:38 PM, Afternoon wrote:
On a vaguely related performance note, I have noticed a delay in applying syntax highlighting when switching between tabs or opening new windows. I'm working on a fairly aged PB15 (1 GHz, 512 MB).
This sometimes depends on the language and on deep nesting of scopes, for whatever reason. It used to be a problem in LaTeX in older versions of the bundle, where certain scopes would nest pretty deeply. What types of files are you noticing this with, or is it all of them? At some random location in the file, how many things show up in the scope?
Haris
Python is the main one. I'm using the Django bundle. I just caught it happening in a source.python.django scope, but that's the only scope applied.
On 28 Mar 2006, at 4:47, Charilaos Skiadas wrote:
On Mar 27, 2006, at 8:38 PM, Afternoon wrote:
On a vaguely related performance note, I have noticed a delay in applying syntax highlighting when switching between tabs or opening new windows. I'm working on a fairly aged PB15 (1 GHz, 512 MB).
This sometimes depends on the language and on deep nesting of scopes, for whatever reason. It used to be a problem in LaTeX in older versions of the bundle, where certain scopes would nest pretty deeply. What types of files are you noticing this with, or is it all of them? At some random location in the file, how many things show up in the scope?
Haris
On 28/3/2006, at 4:38, Afternoon wrote:
On a vaguely related performance note, I have noticed a delay in applying syntax highlighting when switching between tabs or opening new windows. I'm working on a fairly aged PB15 (1 GHz, 512 MB).
What do you mean by delay? That the text shows before it is colored? That is because syntax highlighting requires that the document be parsed first, and parsing isn’t instant.
On 28 Mar 2006, at 13:20, Allan Odgaard wrote:
What do you mean by delay? That the text shows before it is colored? That is because syntax highlighting requires that the document be parsed first, and parsing isn’t instant.
Yes, uncoloured text is shown.
The file I'm editing right now, which has exhibited the problem, is 250 lines long. Parsing it shouldn't take a long time.
On 28/3/2006, at 14:25, Afternoon wrote:
What do you mean by delay? That the text shows before it is colored? That is because syntax highlighting requires that the document be parsed first, and parsing isn’t instant.
Yes, uncoloured text is shown.
The file I'm editing right now, which has exhibited the problem, is 250 lines long. Parsing it shouldn't take a long time.
So how long does it (roughly) take?
On 28 Mar 2006, at 13:31, Allan Odgaard wrote:
So how long does it (roughly) take?
Slightly under a second. The window opens with uncoloured text and the registration window is shown. Nothing happens to the text until I click later. The syntax is coloured about a second after that click.
On 28/3/2006, at 14:49, Afternoon wrote:
So how long does it (roughly) take?
Slightly under a second. The window opens with uncoloured text and the registration window is shown. Nothing happens to the text until I click later.
The registration window runs a modal event loop, basically blocking other events.
It has to parse the text using NFA-based regular expressions (not an efficient parser written/generated for the grammar), and it assigns colors to the text using a scope selector system.
So while it can probably be made faster down the road, taking up to a second is likely not indicative of any performance problem (especially since it’s threaded); it’s just the price to pay for the flexibility this system provides.
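For anyone curious what parsing with regular expressions plus scope assignment amounts to, here is a toy sketch in Python - invented for illustration, not TextMate's actual code or any real bundle's rules. Each grammar rule pairs a regex with a scope name, and every line is scanned against every rule; the real parser additionally handles nested begin/end rules, captures, and scope-selector matching, which is where the cost goes:

import re

# Hypothetical mini-grammar: (regex, scope name) pairs.
toy_grammar = [
    (re.compile(r'\b(def|class|import|return)\b'), 'keyword.control.python'),
    (re.compile(r'#.*$'), 'comment.line.number-sign.python'),
    (re.compile(r'"[^"]*"'), 'string.quoted.double.python'),
]

def scopes_for_line(line):
    # Match every rule against the line and record (start, end, scope);
    # a theme would then map each scope to a colour and style.
    found = []
    for pattern, scope in toy_grammar:
        for m in pattern.finditer(line):
            found.append((m.start(), m.end(), scope))
    return sorted(found)

print(scopes_for_line('def f(): return "hi"  # demo'))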
On 28 Mar 2006, at 14:10, Allan Odgaard wrote:
It has to parse the text using NFA-based regular expressions (not an efficient parser written/generated for the grammar), and it assigns colors to the text using a scope selector system.
Did I see you mention somewhere that you'd built your own RE engine for some parts of TextMate? If so, how does that compare to off-the-shelf parsers for speed?
So while it can probably be made faster down the road, taking up to a second is likely not indicative of any performance problem (especially since it’s threaded); it’s just the price to pay for the flexibility this system provides.
Of course, it is very flexible. It's just a bit surprising when we're used to other editors parsing and highlighting text instantly.
On 28/3/2006, at 19:08, Afternoon wrote:
Did I see you mention somewhere that you'd built your own RE engine for some parts of TextMate? If so, how does that compare to off-the-shelf parsers for speed?
I currently use Oniguruma.
So while it can probably be made faster down the road, taking up to a second is likely not indicative of any performance problem (especially since it’s threaded); it’s just the price to pay for the flexibility this system provides.
Of course, it is very flexible. It's just a bit surprising when we're used to other editors parsing and highlighting text instantly.
TM could probably make you think it was instant by redrawing the first visible lines, as soon as these were parsed -- as for surprising, let’s see how other editors perform when they get to the same level as TM wrt language grammars.
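To make the "redraw the first visible lines" idea concrete, a rough sketch (the function names are hypothetical, and real incremental parsing would also have to carry per-line parser state forward):

def highlight_window(lines, first_visible, visible_count, parse_line):
    # Colour the on-screen lines first, so the user gets immediate
    # visual feedback...
    last = min(first_visible + visible_count, len(lines))
    for i in range(first_visible, last):
        parse_line(lines[i])
    # ...then sweep the rest of the document, which could happen on
    # the existing background parser thread.
    for i in range(0, first_visible):
        parse_line(lines[i])
    for i in range(last, len(lines)):
        parse_line(lines[i])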
On 28/3/2006, at 4:28, Tom Lazar wrote:
[...] however, most of the files i work with perform just fine and i'd rather wait for performance improvements while enjoying all of the luxuries that TM has to offer in the meantime than vice versa ;-)
Glad to hear someone express this view -- perfect is the enemy of the good, meaning that had I insisted on making everything perfect, TM would never have shipped.
Handling 500 MB files, while certainly a long-term goal, was not a concern when I had to decide which features were necessary to bring TM to market.
Richard Vile wrote:
My interpretation of this console output is that TextMate was trying to acquire over a gigabyte of memory with a single request. It's not clear (a) why this much would be needed for a file of slightly over 500 MB
Because TextMate uses UCS-2 (or UTF-16) as its internal representation, taking up two bytes per character.
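The arithmetic roughly checks out against the crash log above (a back-of-the-envelope sketch; whether TextMate pads or rounds the allocation is my guess):

file_bytes = 507044566         # primes_upto_a_billion.txt, per the ls -l above
utf16_bytes = 2 * file_bytes   # two bytes per character in UCS-2/UTF-16
print(utf16_bytes)             # 1014089132 -- in the neighbourhood of the
                               # 1 GiB (1073741824-byte) vm_allocate that
                               # failed, assuming some rounding or overhead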
But it's definitely fair to say, especially on 'slower' machines (I have a 1 GHz G4 with 768 MB of RAM), that TextMate currently performs quite poorly on large files.
-- Sune.