C bundle: Functions with multi-line argument lists


John Kooker

8 Jun 2007

2:10 a.m.

Hi all,

My C/C++ code doesn't always get parsed correctly, and I think I've narrowed the problem down: the bundle doesn't seem to like it when my parentheses are on separate lines. An example:

---------------------------

int a (int *p);

int main (
    void
)
{
    int *p = 0; /* null pointer */
    return a (p);
}

int a (
    int *p
)
{
    int y = *p;
    return y;
}

---------------------------

Is this intended? If not, how can I fix it?

Thanks! John


Allan Odgaard

8 Jun 2007

9:40 a.m.

On 8. Jun 2007, at 02:10, John Kooker wrote:

> My C/C++ code doesn't always get parsed correctly, and I think I've narrowed the problem down: the bundle doesn't seem to like it when my parentheses are on separate lines. An example:

Function prototypes are only matched when they are on a single line. This is a technical limitation of the parser, and unlikely to go away.

John Kooker

6:51 p.m.

wow, not even for 2.0?

For new threads USE THIS: textmate@lists.macromates.com (threading gets destroyed and the universe will collapse if you don't) http://lists.macromates.com/mailman/listinfo/textmate

Allan Odgaard

9 Jun 2007

10:29 a.m.

On 8. Jun 2007, at 18:51, John Kooker wrote:

> wow, not even for 2.0?

No, not even for 2.0. As said, this is a technical limitation!

If you allow patterns to match more than a single line, you have a situation where a change, no matter where it is done in the document, can affect every other part of the document.

From a performance perspective this is bad, because technically TM would then have to re-parse your entire document every time you make a single change.

Other text editors can live with this: they may re-parse from 5 lines above the caret and then, after 0.5 s of idle time, re-parse the full document, or they may simply leave it to the user to fix out-of-sync syntax highlighting.

In TextMate the parser is used for more than syntax highlighting. For example, it is used to decide how to interpret your keystrokes, which means that if the current line is not parsed, TM cannot decode your keystroke. In practice it could rely on outdated information, but that leads to a situation where the exact same key sequence can give different results (depending on whether or not the parser finished in time, or whether or not the outdated information is “good enough”).
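To make the trade-off above concrete, here is a minimal sketch in C of line-at-a-time parsing with a cached end state per line. It is not TextMate's implementation: the names `scan_line`, `parse_all`, and `reparse_from` are hypothetical, and the only construct tracked is a `/* ... */` comment. Because each line's result depends only on its own text and the state inherited from the previous line, a re-parse after an edit can stop at the first line whose end state matches the cached one.

```c
#include <stddef.h>

/* Parser state carried from the end of one line to the start of the next. */
enum state { CODE, COMMENT };

/* Scan a single line that starts in state `s`; return the state at its end. */
enum state scan_line(const char *line, enum state s)
{
    while (*line) {
        if (s == CODE && line[0] == '/' && line[1] == '*') {
            s = COMMENT;
            line += 2;
        } else if (s == COMMENT && line[0] == '*' && line[1] == '/') {
            s = CODE;
            line += 2;
        } else {
            ++line;
        }
    }
    return s;
}

/* Full parse: record the end state of every line in end[]. */
void parse_all(const char *const *lines, size_t n, enum state *end)
{
    enum state s = CODE;
    for (size_t i = 0; i < n; ++i)
        end[i] = s = scan_line(lines[i], s);
}

/* Incremental parse after line `edited` changed. Rescans forward and stops
   at the first line whose new end state equals the cached one, since no
   later line can then be affected. Returns the number of lines rescanned. */
size_t reparse_from(const char *const *lines, size_t n,
                    enum state *end, size_t edited)
{
    enum state s = edited ? end[edited - 1] : CODE;
    size_t scanned = 0;
    for (size_t i = edited; i < n; ++i) {
        enum state now = scan_line(lines[i], s);
        ++scanned;
        if (now == end[i])
            break;
        end[i] = s = now;
    }
    return scanned;
}
```

In a four-line buffer, editing one line so that it opens a comment which the next line closes rescans two lines, and an edit that leaves the line's end state unchanged rescans only that line. This early stopping point is what a pattern spanning several lines would put at risk.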

Maybe some good heuristics for this can be created, but 2.0 is not going to be the “fix all problems ever reported”-release, which your reply seems to imply it is.

Steve King

11 Jun 2007

3:33 p.m.

On Sat, 9 Jun 2007, Allan Odgaard wrote:

> On 8. Jun 2007, at 18:51, John Kooker wrote:
>
>> wow, not even for 2.0?
>
> No, not even for 2.0. As said, this is a technical limitation!
>
> If you allow patterns to match more than a single line, you have a situation where a change, no matter where it is done in the document, can affect every other part of the document.

Just out of curiosity, C is chock full of multi-line constructs. In fact, the language itself (ignoring the pre-processor) assigns no special meaning to the end-of-line. It's just another whitespace character. How are multiline language constructs handled elsewhere, and why do function prototypes cause a particular problem?

-- Steve King, steve@narbat.com

Allan Odgaard

5:09 p.m.

On 11. Jun 2007, at 15:33, Steve King wrote:

> [...] Just out of curiosity, C is chock full of multi-line constructs. In fact, the language itself (ignoring the pre-processor) assigns no special meaning to the end-of-line. It's just another whitespace character. How are multiline language constructs handled elsewhere, and why do function prototypes cause a particular problem?

TextMate only has the current line as look-ahead, and that is not enough to say if what we’re looking at is starting a function or some other construct. If C functions were all prefixed with a ‘function’ keyword or if there weren’t a zillion other constructs, which given only the first line, could be mistaken for a function, there wouldn’t be a problem.

So the problem boils down to, given just the first line of a construct, can you say what the construct is? In C you sometimes can’t, in most other languages you generally can.
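As an illustration of that ambiguity, the following C fragment contains four constructs whose first lines look alike. It is an assumed example, not taken from the C bundle, and the names `counter`, `twice`, and `op` are made up. The first line of each is either `int` alone or `int` followed by a name or an opening parenthesis, and only the later lines reveal whether it is a variable, a prototype, a function definition, or a function pointer.

```c
/* First line is just "int": this turns out to be a variable. */
int
counter = 3;

/* First line is again just "int": this one is a prototype. */
int
twice(int x);

/* First line is "int twice(": here it is a definition, but the
   prototype above could have been split in exactly the same way. */
int twice(
    int x
)
{
    return 2 * x;
}

/* First line is "int (": a pointer to a function, not a function. */
int (
    *op)(int) = twice;
```

A parser that sees only the first line of each construct has no basis for choosing between these readings.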

Steve King

9:28 p.m.

On Mon, 11 Jun 2007, Allan Odgaard wrote:

> So the problem boils down to, given just the first line of a construct, can you say what the construct is? In C you sometimes can’t, in most other languages you generally can.

How about scanning the file backwards from the insertion point to find the start of a multi-line construct whenever the parser finds itself in ambiguous territory?

How does it currently handle things like double-quoted strings or '/*' comments spanning lines?

-- Steve King, steve@narbat.com

Pavan Gunupudi

9:50 p.m.

On 11-Jun-07, at 3:28 PM, Steve King wrote:

> On Mon, 11 Jun 2007, Allan Odgaard wrote:
>
>> So the problem boils down to, given just the first line of a construct, can you say what the construct is? In C you sometimes can’t, in most other languages you generally can.
>
> How about scanning the file backwards from the insertion point to find the start of a multi-line construct whenever the parser finds itself in ambiguous territory?
>
> How does it currently handle things like double-quoted strings or '/*' comments spanning lines?

'/*' comments are actually quite messy. Try this:

---------------------------

/* if (cond) { */
    blah1;
/* } */
blah2;

---------------------------

Notice the relative indentation between blah1 and blah2. You can even fold the if-block!! Actually no surprises here because that's the way TextMate's one-line-at-a-time parser works. It just can't find more information.
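The effect can be reproduced with a small sketch in C. It assumes that indentation and folding are decided by looking for braces on each line without consulting comment state, which is a guess about the mechanism and not a description of TextMate's code. Both function names are hypothetical, and string and character literals are ignored for brevity.

```c
/* Net change in brace depth on one line, treating every brace as code. */
int naive_brace_delta(const char *line)
{
    int delta = 0;
    for (; *line; ++line) {
        if (*line == '{')
            ++delta;
        else if (*line == '}')
            --delta;
    }
    return delta;
}

/* Same count, but braces inside block comments are skipped.
   *in_comment carries the comment state from one line to the next. */
int comment_aware_brace_delta(const char *line, int *in_comment)
{
    int delta = 0;
    while (*line) {
        if (*in_comment) {
            if (line[0] == '*' && line[1] == '/') {
                *in_comment = 0;
                line += 2;
            } else {
                ++line;
            }
        } else if (line[0] == '/' && line[1] == '*') {
            *in_comment = 1;
            line += 2;
        } else {
            if (*line == '{')
                ++delta;
            else if (*line == '}')
                --delta;
            ++line;
        }
    }
    return delta;
}
```

For the line `/* if (cond) { */` the naive count reports one opened block, which matches the extra indentation and the foldable region described above, while the comment-aware count reports none.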

I've got issues with this because there are several areas in the code I work on where C code is commented out. That just upsets everything.

For now, thanks to you and James, I am using a combination of astyle and gnu-indent to format my code after every block I write to get indentation right. But I still can't stop those commented if-blocks from folding :)

Pavan

Ps: Well it's a good thing I am realizing the limitations of TextMate early in my migration-phase from Emacs. I thought TextMate would be the final frontier but I guess not or not yet ;)


Allan Odgaard

9:58 p.m.

On 11. Jun 2007, at 21:28, Steve King wrote:

> On Mon, 11 Jun 2007, Allan Odgaard wrote:
>
>> So the problem boils down to, given just the first line of a construct, can you say what the construct is? In C you sometimes can’t, in most other languages you generally can.
>
> How about scanning the file backwards from the insertion point to find the start of a multi-line construct whenever the parser finds itself in ambiguous territory?

It would need to scan *forward* to settle the ambiguity. But this is where performance drops: if we need to scan 20 lines forward to determine the current state, then any edit within those 20 lines would require scanning back to the point where the ambiguity was first seen (since any edit in those 20 lines could change the outcome of what we decided 20 lines earlier).

> How does it currently handle things like double-quoted strings or '/*' comments spanning lines?

The example grammar in the manual does double-quoted strings: http://macromates.com/textmate/manual/language_grammars#example_grammar

Seeing a /* is not a problem because we know with 100% certainty that this starts a comment (if we are in a state that allows comments). As said the problem is with C functions where we do not know if “int” starts a function, nor if “int foo” does, nor if “int foo(” does, etc. -- we only know it when we have seen the entire thing, but since the look-ahead is limited to the rest of the current line, it will not be able to tell.

That said, people who use “simple” C should probably be able to write a rule for multi-line functions, just some C++ might cause a false positive, or some of the more esoteric flavors of function declarations will go unmatched.
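A rough idea of what such a rule for "simple" C might test is sketched below in C with POSIX `<regex.h>`. The pattern and the function name `starts_multiline_function` are assumptions for illustration, not the C bundle's grammar: a line counts as the start of a multi-line function if it consists of one or more type words, a name, and an opening parenthesis that is not closed on the same line.

```c
#include <regex.h>
#include <stddef.h>

/* Heuristic: type word(s), a name, then "(" with no ")" before end of line. */
static const char *const BEGIN_PATTERN =
    "^[ \t]*[A-Za-z_][A-Za-z0-9_]*"
    "([ \t*]+[A-Za-z_][A-Za-z0-9_]*)+"
    "[ \t]*\\([^)]*$";

/* Returns 1 if `line` looks like the first line of a multi-line function
   declaration or definition, 0 otherwise (including on regex errors). */
int starts_multiline_function(const char *line)
{
    regex_t re;
    int matched;

    if (regcomp(&re, BEGIN_PATTERN, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    matched = (regexec(&re, line, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

The sketch accepts `int a (` and rejects the single-line `int a (int *p);`, but it also accepts `return a (`, and would accept a C++ object construction split after its opening parenthesis. Those are the kinds of false positives mentioned above, and declarations such as function pointers go unmatched.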

Pavan Gunupudi

6:03 p.m.

On 9-Jun-07, at 4:29 AM, Allan Odgaard wrote:

> On 8. Jun 2007, at 18:51, John Kooker wrote:
>
>> wow, not even for 2.0?
>
> No, not even for 2.0. As said, this is a technical limitation!
>
> If you allow patterns to match more than a single line, you have a situation where a change, no matter where it is done in the document, can affect every other part of the document.
>
> From a performance perspective this is bad because technically TM would then have to re-parse your entire document, each time you do a single change.
>
> Other text editors can live with this, that is, they may re-parse from 5 lines above the caret, then after .5s of idle time, re-parse the full document, or sometimes they may just leave it to the user to fix out-of-sync syntax highlight.
>
> In TextMate the parser is used for more than syntax highlighting, it is for example used to decide how to interpret your key strokes, that means if the current line is not parsed, TM cannot decode your key stroke -- in practice it could rely on outdated information, but that leads to a situation where the exact same key sequence can give different results (depending on whether or not the parser finished in time, or whether or not the outdated information is “good enough”).

This is very interesting. But thinking out loud, what stops TextMate from, say, having two parsers: one that deals with what's on the current line, decoding keystrokes and so on, and another that runs over the whole file during idle time to update the information collected from function identification and similar things? For example, in Xcode, if you break a function declaration into separate lines, it takes a short while before that function is added back to the selection list. That, to me, indicates that Xcode could be parsing the whole file when it detects idle time.

Pavan


Allan Odgaard

6:31 p.m.

On 11. Jun 2007, at 18:03, Pavan Gunupudi wrote:

> [...] what stops TextMate from say having two parsers.

I think that would be a bad design: it adds complexity, duplicates code, requires more from each language, and for what? Getting multi-line C function prototypes to parse better? ;)

John Kooker

13 Jun 2007

8:06 p.m.

Ha, sorry to imply that 2.0 needed to be the "achieve world peace" release. I just thought it was pretty common practice to have multi-line function prototypes, which would make this bug a pretty big deal. But I guess you're right - it's features vs. performance, so there will always be trade-offs. Thanks for the explanation!

John

