Re: [TxMt] Spell checking using Google


Andy Herbert

30 Apr 2007

9:27 p.m.

...

Hi,

I found a tiny error. Here is the corrected UTF-8 version of gspell.

BTW, in addition to that, it is very easy to write out all Google suggestions, delimited by a '|', within the multiple-word spell checking. With the help of a small command that shows this list as a pull-down menu, you can edit each error very conveniently ;)

Cheers,

Hans

Thanks Hans. Before reading your recent replies with the attached files, I had already started work on a UTF-8 version with environment variable support (I receive the list in digest form, so I'm not always up to date with the activity here). I'm not sure whether my contribution is any better or worse than yours, because I essentially cannibalised the HTML bundle for encoding and decoding UTF-8 when POSTing to and GETting from Google's servers. As a result, my effort relies on entities.txt existing in the Support directory in the TextMate bundle, which means that I can no longer supply it as a single tmCommand. I genuinely do not know if this is a better method for handling this sort of thing; perhaps someone on this list could help clarify.

Incidentally, I think there was an error in the original version that was inherited by the version you've supplied: the final tab stop was not being preserved. To correct this, remove the line 'text += "$0"' and replace 'print $text' with 'print "#{$text}$0"'.
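A minimal sketch of the fix described above, assuming the script keeps the checked text in a global `$text` as the quoted lines suggest (the sample value is made up). In a double-quoted Ruby string a global is only interpolated when written as `#$0` or `#{$0}`, so the trailing `$0` is emitted literally and reaches TextMate as the final tab stop.

```ruby
# Hypothetical sample value; the real script fills $text with the checked text.
$text = "Dies ist ein ${1:Test}."

# "#{$text}" is interpolated. The trailing "$0" stays literal, because Ruby
# only interpolates globals written as "#$0" or "#{$0}".
snippet = "#{$text}$0"
print snippet
```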

[The GSpell bundle is available here][1]

[1]: http://homepage.mac.com/andy.herbert/.Public/GSpell.tmbundle.zip


Hans-Jörg Bibiko

1 May 2007

12:25 a.m.

New subject: Spell checking using Google

On 30.04.2007, at 21:27, Andy Herbert wrote:

...

...
Hi,

I found a tiny error. Here is the corrected UTF-8 version of gspell.

BTW, in addition to that, it is very easy to write out all Google suggestions, delimited by a '|', within the multiple-word spell checking. With the help of a small command that shows this list as a pull-down menu, you can edit each error very conveniently ;)

Cheers,

Hans

Thanks Hans, before reading your recent replies with the attached files, I had already started work on UTF-8 version with [...] Support directory in the Textmate bundle (this means that I can no longer supply it as a singular tmCommand), I genuinely do not know if this is a better method for handling this sort of thing - perhaps someone on this list could help clarify.

Your approach is more general. BTW, I do not believe that gspell returns HTML entities like &uuml;. I believe it returns the hex code.

But anyway, when I execute your code against a German text, I get the following error messages if the text contains German umlauts (äöü):

/tmp/temp_textmate.Rn3S55:24:in `unpack': malformed UTF-8 character (expected 2 bytes, given 1 bytes) (ArgumentError)
    from /tmp/temp_textmate.Rn3S55:24:in `encodeUTF'
    from /tmp/temp_textmate.Rn3S55:22:in `gsub'
    from /tmp/temp_textmate.Rn3S55:22:in `encodeUTF'
    from /tmp/temp_textmate.Rn3S55:27

These error messages are shown if you add the line 'print encodeUTF ($text)'. I did this because nothing happened as long as I had German umlauts in the text, and I wanted to look at the request string.

On the other hand I have a problem with '\n\n' in the text.

...

Incidentally, I think there was an error in the original version that was inherited by the version you've supplied: the final tab stop was not being preserved. To correct this, remove the line 'text += "$0"' and replace 'print $text' with 'print "#{$text}$0"'.

Yes, OK.

Cheers,

Hans

Hans-Jörg Bibiko

1:10 a.m.

New subject: Spell checking using Google

Hi,

On 01.05.2007, at 00:25, Hans-Jörg Bibiko wrote:

...

On the other hand I have a problem with '\n\n' in the text.

OK. I solved this problem by replacing '\n' with a Japanese space ;)

Another problem arose with runs of multiple spaces. I also solved it with a Japanese character ;) [I know that this is a hack and a quick-and-dirty solution, but it works, and unfortunately Japanese isn't supported :( ]

Now it works for the entire text with many newlines and many spaces. I checked it with English, Danish, French, and German, and it seems to work properly.

Hans

Allan Odgaard

3:52 a.m.

New subject: Spell checking using Google

On 1. May 2007, at 01:10, Hans-Jörg Bibiko wrote:

...

...
On the other hand I have a problem with '\n\n' in the text.

OK. I solved this problem by replacing '\n' with a Japanese space ;)

Another problem arose with runs of multiple spaces. I also solved it with a Japanese character ;)

Attached is my take on it (based on yours and Andy’s code).

I use the offset and length given back by Google to figure out where the corrections should go.

Google seems to a) give the count in “code points” (regardless of encoding) and b) collapse all successive whitespace into one character. Here is the helper I use to adjust for that:

    def substr(str, idx, len = 10000)
      $1 if str =~ /^(?:\s+|.){#{idx}}((?:\s+|.){0,#{len}})/m
    end
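To illustrate how the helper compensates for the collapsed whitespace, here is a small sketch. The helper is copied from the message; the sample text and the offset/length pair are made up, on the assumption that Google would report the three spaces after "Dies" as a single character.

```ruby
# Helper as posted in the thread: every run of whitespace counts as ONE
# position, every other character counts as one position.
def substr(str, idx, len = 10000)
  $1 if str =~ /^(?:\s+|.){#{idx}}((?:\s+|.){0,#{len}})/m
end

# Hypothetical input: "istt" is assumed to be reported at offset 5, length 4,
# because the three spaces after "Dies" are counted as one character.
text = "Dies   istt ein Test."
word = substr(text, 5, 4)
puts word
```

A plain `text[5, 4]` would start inside the run of spaces instead, which is the mismatch the regular expression avoids.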

When Google returns multiple suggestions, the command will show a menu, with the original word at the top. It does this for every misspelled word in the text. When there is only one suggestion, the command uses that, and makes the word a tab-stop.

Personally I think the menu is generally not wanted, so that part should probably be removed (and the first suggestion should always be picked).

I gave it a key equivalent of ⌥⇧F2 which is analogous with ⌥F2 for the regular context menu (which has spelling suggestions).

Hans-Joerg Bibiko

2 May 2007

10:23 a.m.

New subject: Spell checking using Google

Hi Allan,

nice coding (I'm learning Ruby step by step ;) )

Here are my comments:

1) ESC behaviour: If I press ESC while the pull-down menu is shown, it takes Google's first suggestion. In my opinion it is more logical that pressing ESC changes nothing. Thus I wrote:

    ...
    res = TextMate::UI.menu(items.flatten)
    correction = (res) ? res['string'] : org
    ...

2) \n\n problem: There is still a problem if the selected text contains '\n\n'. I managed it by replacing all \n with an unused character.

3) Google returns only one suggestion

...

When there is only one suggestion, the command uses that, and makes the word a tab-stop.

In my opinion it would be better to always show a pull-down menu with the original and the suggestion. Google isn't always right.

...

Personally I think the menu is generally not wanted, so that part should probably be removed (and the first suggestion should always be picked).

As you mentioned, popping up many pull-down menus is a bit confusing, because I don't know which word is currently being checked. But how about this:

- If a single word is checked: same behaviour (except that the command doesn't return a snippet; it returns the result as exit_replace_text).

- If a text is selected, meaning it contains at least one space, the command only returns a snippet of the misspelled words without any changes. After that you can navigate with TAB through the snippet (the misspelled words), and if you find a word that really is misspelled, you can invoke the command again, and so forth.

5) For security reasons, the command returns the snippet: new_str + "${0:}"
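The ESC handling from point 1 can be sketched outside TextMate as follows. This is only an illustration: `pick_correction` and the two lambdas are hypothetical names, and the lambdas stand in for `TextMate::UI.menu`, which is assumed (based on the lines quoted above) to return nil on ESC and otherwise something that answers `['string']`.

```ruby
# `menu` stands in for TextMate::UI.menu. Assumption: it returns nil when the
# user presses ESC, otherwise a hash with the chosen text under 'string'.
def pick_correction(org, suggestions, menu)
  items = [org, suggestions].flatten
  res = menu.call(items)
  res ? res['string'] : org # keep the original word when the menu is cancelled
end

cancelled = ->(_items) { nil }                     # simulates pressing ESC
first_hit = ->(items) { { 'string' => items[1] } } # simulates picking the first suggestion

puts pick_correction("Tesst", ["Test", "Tests"], cancelled)
puts pick_correction("Tesst", ["Test", "Tests"], first_hit)
```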

Attached is my version of gspell.

Hans

Allan Odgaard

3 May 2007

1:42 p.m.

New subject: Spell checking using Google

On 2. May 2007, at 10:23, Hans-Joerg Bibiko wrote:

...

ESC behaviour

If I press ESC while the pull-down menu is shown, it takes Google's first suggestion. In my opinion it is more logical that pressing ESC changes nothing. Thus I wrote: ... res = TextMate::UI.menu(items.flatten) correction = (res) ? res['string'] : org ...

Actually, I think escape should abort the entire thing.

...

\n\n problem

There is still a problem if the selected text contains '\n\n'. I managed it by replacing all \n with an unused character.

I cannot reproduce this. What text are you testing with?

...

Google returns only one suggestion

...
When there is only one suggestion, the command uses that, and makes the word a tab-stop.

In my opinion it would be better to always show a pull-down menu with the original and the suggestion. Google isn't always right.

hmm… you can always undo. If Google is right >50% of the time, I think always accepting, and requiring an undo, is better than showing a pop-up.

...

...
Personally I think the menu is generally not wanted, so that part should probably be removed (and the first suggestion should always be picked).

As you mentioned, popping up many pull-down menus is a bit confusing, because I don't know which word is currently being checked. But how about this:

If a single word is checked: same behaviour (except that the command doesn't return a snippet; it returns the result as exit_replace_text).

If a text is selected, meaning it contains at least one space, the command only returns a snippet of the misspelled words without any changes. After that you can navigate with TAB through the snippet (the misspelled words), and if you find a word that really is misspelled, you can invoke the command again, and so forth.

Yes, this is probably a desired behavior most of the time.

...

For security reasons, the command returns the snippet: new_str + "${0:}"

Huh?

Hans-Jörg Bibiko

2:25 p.m.

New subject: Spell checking using Google

On 03.05.2007, at 13:42, Allan Odgaard wrote:

...

On 2. May 2007, at 10:23, Hans-Joerg Bibiko wrote:

...

ESC behaviour

If I press ESC while the pull-down menu is shown, it takes Google's first suggestion. In my opinion it is more logical that pressing ESC changes nothing. Thus I wrote: ... res = TextMate::UI.menu(items.flatten) correction = (res) ? res['string'] : org ...

Actually, I think escape should abort the entire thing.

I agree, but if you use my way it does the same.

...

...

\n\n problem

There is still a problem if the selected text contains '\n\n'. I managed it by replacing all \n with an unused character.

I cannot reproduce this. What text are you testing with?

I was wrong. The problem is not \n\n but a line like " \n" or "\t\n". If I don't replace the \n, the snippet matching is wrong after such a line.

German example (TM_GSPELL_LANG = de); comment out str = str.gsub(/\n/, "　") in my script.

Text: Dies ist ein Tesst.\n \n Dies istt ein Test.\n

(The second line is SPACE\n.) Invoke the script: Tesst is highlighted. Press TAB, and now 'tt e' is highlighted instead of istt. If you replace \n, everything is fine.
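The workaround can be sketched as a round trip. This is an illustration, not the attached script: the helper names are hypothetical, and the replacement character is assumed to be U+3000 IDEOGRAPHIC SPACE (the "Japanese space" mentioned earlier). Ruby's `\s` does not match U+3000 by default, so after the replacement a line such as " \n" no longer forms one whitespace run together with its newline.

```ruby
# Assumption: the "Japanese space" used in the thread is U+3000.
PLACEHOLDER = "\u3000"

# Hypothetical helper: hide newlines before the text is sent for checking.
def protect_newlines(str)
  str.gsub(/\n/, PLACEHOLDER)
end

# Hypothetical helper: put the newlines back afterwards. This is lossy if the
# original text already contained U+3000.
def restore_newlines(str)
  str.gsub(PLACEHOLDER, "\n")
end

text = "Dies ist ein Tesst.\n \nDies istt ein Test.\n"
protected_text = protect_newlines(text)
puts restore_newlines(protected_text) == text
```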

...

...

Google returns only one suggestion

...
When there is only one suggestion, the command uses that, and makes the word a tab-stop.

In my opinion it would be better to always show a pull-down menu with the original and the suggestion. Google isn't always right.

hmm… you can always undo. If Google is right >50% of the time, I think always accepting, and requiring an undo, is better than showing a pop-up.

OK - undo is also fine ;)

...

...

For security reasons, the command returns the snippet: new_str + "${0:}"

Huh?

What do you mean by 'Huh?'? "Huh, I forgot it" or "I don't understand why"? If it is the second: if you don't append "${0:}" and you reach the last misspelled word and press TAB, you will replace the last highlighted word with a TAB character instead of jumping to the end. You don't know that this word is the last misspelled word. ;)
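A sketch of how such a snippet could be assembled, with the trailing "${0:}" so that TAB on the last word jumps to the end of the text. The function names and the word list are hypothetical, and the escaping is simplified (it covers `$`, backtick, and backslash only).

```ruby
# Escape characters that are special inside a TextMate snippet (simplified).
def escape_snippet(str)
  str.gsub(/[$`\\]/) { |c| "\\" + c }
end

# `words` is a hypothetical list of misspelled words, in text order; each one
# is assumed to occur in the remaining text.
def to_snippet(text, words)
  out = "".dup
  rest = text
  words.each_with_index do |word, i|
    before, match, rest = rest.partition(word)
    out << escape_snippet(before) << "${#{i + 1}:#{escape_snippet(match)}}"
  end
  # Final tab stop: TAB after the last word moves the caret to the end.
  out << escape_snippet(rest) << "${0:}"
end

puts to_snippet("Dies ist ein Tesst. Dies istt ein Test.", ["Tesst", "istt"])
```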

Attached is the modified script.

Hans

Jacob Rus

9:42 p.m.

New subject: Spell checking using Google

Allan Odgaard wrote:

...

...

If a text is selected, meaning it contains at least one space, the command only returns a snippet of the misspelled words without any changes. After that you can navigate with TAB through the snippet (the misspelled words), and if you find a word that really is misspelled, you can invoke the command again, and so forth.

Yes, this is probably a desired behavior most of the time.

Hmm, two points about "invoke the command again and so forth."

1. I believe Google's spellcheck takes the context of the word into account when deciding what the correct word is. So popping up a list only for particular words is probably not the greatest idea.

2. As soon as a command is run on one of the tab stop misspelled words, all the other snippet tab stops are lost.

Hans-Jörg Bibiko

4 May 2007

12:08 a.m.

New subject: Spell checking using Google

On 03.05.2007, at 21:42, Jacob Rus wrote:

...

Allan Odgaard wrote:

...
...

If a text is selected, meaning it contains at least one space, the command only returns a snippet of the misspelled words without any changes. After that you can navigate with TAB through the snippet (the misspelled words), and if you find a word that really is misspelled, you can invoke the command again, and so forth.

Yes, this is probably a desired behavior most of the time.

Hmm, two points about "invoke the command again and so forth."

I believe Google's spellcheck takes the context of the word into account when deciding what the correct word is. So popping up a list only for particular words is probably not the greatest idea.

Well, I believe it should, but up to now I couldn't find an example of such context-aware spell checking behaviour. If you find one, please let me know.

It seems to me that Google's suggestions are based on the following: if the word is not in Google's corpus, it allows exactly one operation on it, meaning deletion, insertion, or replacement of one character, to get a word that is in the corpus. The output is sorted according to these operations, I guess.
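The guess above can be written down as a small sketch. It is not Google's algorithm, only an illustration of the "exactly one operation" hypothesis; the function names, the a-z alphabet, and the tiny corpus are assumptions made for the example.

```ruby
require "set"

ALPHABET = ("a".."z").to_a

# All strings reachable from `word` by exactly one deletion, replacement,
# or insertion of a single character (lowercase a-z only, for brevity).
def single_edits(word)
  chars = word.chars
  edits = Set.new
  (0...chars.size).each do |i|
    edits << (chars[0...i] + chars[(i + 1)..-1]).join          # deletion
    ALPHABET.each do |c|
      edits << (chars[0...i] + [c] + chars[(i + 1)..-1]).join  # replacement
    end
  end
  (0..chars.size).each do |i|
    ALPHABET.each do |c|
      edits << (chars[0...i] + [c] + chars[i..-1]).join        # insertion
    end
  end
  edits.delete(word)
  edits
end

# `corpus` is a hypothetical word list standing in for Google's corpus.
def suggestions(word, corpus)
  return [] if corpus.include?(word)
  single_edits(word).select { |w| corpus.include?(w) }.sort
end

corpus = Set["test", "rough", "wrought"]
puts suggestions("tesst", corpus).inspect
puts suggestions("rought", corpus).inspect
```

Under this rule both "rough" and "wrought" are one edit away from "rought", so the rule alone cannot rank them; any ranking would need extra information.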

...

As soon as a command is run on one of the tab stop misspelled words, all the other snippet tab stops are lost.

Not in this case. The script distinguishes whether a text is sent or a single word. If text - output a snippet; if word - output replace text. By doing so it won't destroy the other snippet tab stops.

Hans

Jacob Rus

12:41 a.m.

New subject: Spell checking using Google

Hans-Jörg Bibiko wrote:

...

On 03.05.2007, at 21:42, Jacob Rus wrote:

...

I believe Google's spellcheck takes the context of the word into account when deciding what the correct word is. So popping up a list only for particular words is probably not the greatest idea.

Well, I believe it should, but up to now I couldn't find an example of such context-aware spell checking behaviour. If you find one, please let me know.

It seems to me that Google's suggestions are based on the following: if the word is not in Google's corpus, it allows exactly one operation on it, meaning deletion, insertion, or replacement of one character, to get a word that is in the corpus. The output is sorted according to these operations, I guess.

No, I think you are pretty clearly wrong about that. As an example, if you spell check "rought with", the first suggestion will be "wrought", as "wrought with «foo»" is a commonly-used construction. If you replace "with" with some other word, the suggestion will change, and in most cases will be "rough", instead.

In fact, in every example that I tried, Google figured out the intended word from context. I don't know their exact algorithm, but it clearly takes neighboring words into account.

-Jacob

Hans-Jörg Bibiko

7:14 a.m.

New subject: Spell checking using Google

On 04.05.2007, at 00:41, Jacob Rus wrote:

...

Hans-Jörg Bibiko wrote:

...
On 03.05.2007, at 21:42, Jacob Rus wrote:

...

I believe Google's spellcheck takes the context of the word into account when deciding what the correct word is. So popping up a list only for particular words is probably not the greatest idea.

Well, I believe it should, but up to now I couldn't find an example of such context-aware spell checking behaviour. If you find one, please let me know. It seems to me that Google's suggestions are based on the following: if the word is not in Google's corpus, it allows exactly one operation on it, meaning deletion, insertion, or replacement of one character, to get a word that is in the corpus. The output is sorted according to these operations, I guess.

No, I think you are pretty clearly wrong about that. As an example, if you spell check "rought with", the first suggestion will be "wrought", as "wrought with «foo»" is a commonly-used construction. If you replace "with" with some other word, the suggestion will change, and in most cases will be "rough", instead.

In fact, in every example that I tried, Google figured out the intended word from context. I don't know their exact algorithm, but it clearly takes neighboring words into account.

To be honest, I only checked it with German ;)

OK, if that is the case, then of course the script should be changed.

Hans


textmate@lists.macromates.com


participants (5)

Allan Odgaard
Andy Herbert
Hans-Joerg Bibiko
Hans-Jörg Bibiko
Jacob Rus