On Aug 15, 2007, at 08:00, Alex Ross wrote:
I agree that prefixing all re's is not ideal.
So, we have five options:
1. Match all raw strings unambiguously as regular expressions. We will sometimes have false positives.
2. Match raw strings that are arguments to methods from the re module. We will sometimes fail to match raw strings that are regular expressions, but we can be pretty well guaranteed never to have a false positive.
3. Require some prefix to a raw string to "turn on" regular expression matching. This has an extremely high probability of removing false positives and false negatives, but at the cost of additional cruft.
4. A combination of 2 and 3. Match raw strings that are arguments to re.compile, and raw strings prefixed with (?#), as regular expressions, but no others.
5. Don't match re's at all.
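To make option 4 concrete, here is a minimal Python sketch that picks out the raw string literals option 4 would highlight: the first argument of an re.<function>(...) call, or any raw string beginning with (?#). It is only an illustration of the rule. A TextMate grammar matches with regular expressions, not with Python's tokenizer. The names RE_FUNCS, is_raw and regex_literals are hypothetical, and the set of re functions treated as taking a pattern is an assumption. The final assert checks that an empty (?#) comment is a no-op, so the prefixed pattern still matches the same text.

```python
import io
import re
import tokenize

# Hypothetical: re functions whose first argument is treated as a pattern.
RE_FUNCS = {"compile", "search", "match", "fullmatch", "split",
            "findall", "finditer", "sub", "subn"}
PREFIX_CHARS = "rRbBuUfF"


def is_raw(literal):
    """True if a string literal's prefix (e.g. r, R, br, Rb) includes r/R."""
    body = literal.lstrip(PREFIX_CHARS)
    prefix = literal[: len(literal) - len(body)]
    return "r" in prefix.lower()


def regex_literals(source):
    """Return the raw string literals that option 4 would highlight as regexes."""
    skip = (tokenize.NL, tokenize.NEWLINE, tokenize.COMMENT)
    toks = [t for t in tokenize.generate_tokens(io.StringIO(source).readline)
            if t.type not in skip]
    found = []
    for i, tok in enumerate(toks):
        if tok.type != tokenize.STRING or not is_raw(tok.string):
            continue
        body = tok.string.lstrip(PREFIX_CHARS).lstrip("'\"")
        # Option 3 half: an explicit (?#) comment prefix turns highlighting on.
        if body.startswith("(?#)"):
            found.append(tok.string)
            continue
        # Option 2 half: the literal is the first argument of re.<func>(
        before = [t.string for t in toks[max(0, i - 4):i]]
        if (len(before) == 4 and before[0] == "re" and before[1] == "."
                and before[2] in RE_FUNCS and before[3] == "("):
            found.append(tok.string)
    return found


SAMPLE = r'''import re
PAT = r"(\w+)@(\w+)\.com"
EMAIL = re.compile(r"[a-z]+@example\.com")
WIN_PATH = r"C:\temp\new"
FLAGGED = r"(?#)\d{3}-\d{4}"
'''

for literal in regex_literals(SAMPLE):
    print(literal)

# The (?#) prefix is an empty regex comment, so it does not change what matches.
assert re.fullmatch(r"(?#)\d{3}-\d{4}", "555-1234")
```

Only the re.compile argument and the (?#)-prefixed literal are reported. WIN_PATH is correctly left alone, but PAT is missed because it is defined on one line and would only reach an re function later.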
It would seem there is no perfect option. I propose that we put it to a vote, and perhaps appeal to our BDFL Allan.
–Alex
My vote would be for 4, but I'll add two more options:
6. Parse r' and r''' but not r" and r""" (or vice-versa) as regexes.
7. Parse the "r" prefix, but not the "R" prefix, as regexes.
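Options 6 and 7 rely on the fact that the interpreter ignores these distinctions while a tokenizer preserves them. r'\d+' and R"\d+" evaluate to the same string, but the literal's source text still shows which prefix and quote style were used. Here is a minimal sketch of the two rules applied to a literal's source text. The function names are hypothetical.

```python
import io
import tokenize

PREFIX_CHARS = "rRbBuUfF"


def split_literal(literal):
    """Split a literal's source text into (prefix, rest), e.g. "R'x'" -> ("R", "'x'")."""
    rest = literal.lstrip(PREFIX_CHARS)
    return literal[: len(literal) - len(rest)], rest


def option6_is_regex(literal):
    """Hypothetical option 6: single-quoted raw strings are regexes, double-quoted ones are not."""
    prefix, rest = split_literal(literal)
    return "r" in prefix.lower() and rest.startswith("'")


def option7_is_regex(literal):
    """Hypothetical option 7: a lowercase r prefix marks a regex, an uppercase R does not."""
    prefix, _ = split_literal(literal)
    return "r" in prefix  # deliberately case-sensitive


# At runtime the prefix case and quote style make no difference...
assert r'\d+' == R"\d+"

# ...but the tokenizer keeps each literal exactly as written.
SOURCE = "a = r'\\d+'\nb = R'\\d+'\nc = r\"\\d+\"\n"
literals = [t.string for t in tokenize.generate_tokens(io.StringIO(SOURCE).readline)
            if t.type == tokenize.STRING]
for lit in literals:
    print(lit, option6_is_regex(lit), option7_is_regex(lit))
```

The two rules disagree on R'\d+' and r"\d+". That disagreement is why each one amounts to choosing a different "special" prefix.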
The last option is probably the simplest. I don't think I've ever seen the "R" prefix in use and didn't even know it was an option until I just read the spec moments ago.
j.
6 and 7 are really variations of 3: which 'special' prefix do you use to turn highlighting on or off, and what is its default state? I have incorporated them below.
The last option is probably the simplest. I don't think I've ever seen the "R" prefix in use and didn't even know it was an option until I just read the spec moments ago.
I've never seen 'R' in use either, but I'm sure somebody, somewhere is using it. I think 6 and 7 are going to be too confusing.
I think that using R' instead, or having r' turn regex highlighting ON and R' turn it OFF, is no more confusing than having r'(?# as the lead prefix. It also has the benefit of being easier to read, and since there doesn't seem to be any standard, common use of R', making it the 'I'm not a regex' raw-string marker is reasonable.
And adding a no-op "signal" to raw strings that will later be used as regexes just to turn on some coloring seems very unPythonic in that:
It is ugly. It is implicit. It adds complexity. It detracts from readability. It is not the obvious way to do it.
Partially true, but this is not a language definition, nor Python code; it is something different: a highlighter FOR Python. The underlying code that makes up Python is very unpythonic... and it certainly will make the code MORE readable in TextMate, since it will then be highlighted correctly!
I vote for 4 as well. Method number 2 will cover the most common uses of regexes and keep the regex-using folk (like me) happy, while not highlighting non-regex raw strings keeps that group happy too. Part 3 is more touchy...
Once we get to the edge cases, such as a raw string defined on one line and fed into an re.compile on another, we can either:
a - Never highlight: keeps raw-string users happy, annoys regex users quite a bit;
b - Highlight when a prefix turns it on: obscure and a bit ugly, but keeps raw-string users happy and annoys regex users much less, though still a little;
c - Highlight by default, but let a prefix turn it OFF, e.g. R instead of r: again obscure and a bit ugly; annoys raw-string users slightly, keeps regex users happy; or
d - Always highlight all raw strings as regexes: annoys raw-string users, keeps regex users happy. This is what we have now.
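The four edge-case policies can be compared side by side with a small sketch. It assumes the literal is not passed directly to an re function, since option 2 covers that case either way. It takes the literal's source text as input. Policy and highlight_as_regex are hypothetical names, not part of any TextMate API.

```python
from enum import Enum

PREFIX_CHARS = "rRbBuUfF"


class Policy(Enum):
    NEVER = "a"    # never highlight
    OPT_IN = "b"   # highlight only when a (?# prefix turns it on
    OPT_OUT = "c"  # highlight unless an uppercase R turns it off
    ALWAYS = "d"   # highlight every raw string (current behavior)


def highlight_as_regex(literal, policy):
    """Decide whether a raw string literal outside an re.* call gets regex highlighting."""
    rest = literal.lstrip(PREFIX_CHARS)
    prefix = literal[: len(literal) - len(rest)]
    if "r" not in prefix.lower():
        return False  # not a raw string, so never a candidate
    if policy is Policy.ALWAYS:
        return True
    if policy is Policy.NEVER:
        return False
    if policy is Policy.OPT_IN:
        return rest.lstrip("'\"").startswith("(?#")
    return "R" not in prefix  # Policy.OPT_OUT: r highlights, R opts out


# A pattern defined on one line (compiled elsewhere), plus a Windows path.
PATTERN = r"r'(\w+)@(\w+)'"
WIN_PATH = r"R'C:\temp'"
for policy in Policy:
    print(policy.value, highlight_as_regex(PATTERN, policy),
          highlight_as_regex(WIN_PATH, policy))
```

Only policy c separates the two literals. It highlights the regex and leaves the R-prefixed path alone. Policies a and b highlight neither unless (?# is added, and d highlights both.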
I use regexes quite a bit, but I could foresee a case where I might want a raw string left unhighlighted. If we change it at all, I would vote for 4c.
Nick