I have a page of text edited in TextMate. I want to know the number of characters in a specific paragraph (by highlighting that paragraph). Is this possible in TextMate and, if so, how? Thanks in advance.
CTRL+SHIFT+N. It's in the "Text" bundle.
On Tue, May 20, 2008 at 1:49 PM, Marc Chanliau marc.chanliau@gmail.com wrote:
I have a page of text edited in TextMate. I want to know the number of characters in a specific paragraph (by highlighting that paragraph). Is this possible in TextMate and, if so, how? Thanks in advance.
For new threads USE THIS: textmate@lists.macromates.com (threading gets destroyed and the universe will collapse if you don't) http://lists.macromates.com/mailman/listinfo/textmate
Great! Thanks for the fast response.
On Tue, May 20, 2008 at 11:07 AM, Patrick McElhaney pmcelhaney@gmail.com wrote:
CTRL+SHIFT+N. It's in the "Text" bundle.
On Tue, May 20, 2008 at 1:49 PM, Marc Chanliau marc.chanliau@gmail.com wrote:
I have a page of text edited in TextMate. I want to know the number of characters in a specific paragraph (by highlighting that paragraph). Is this possible in TextMate and, if so, how? Thanks in advance.
-- Patrick McElhaney 704.560.9117
On 20 May 2008, at 20:07, Patrick McElhaney wrote:
CTRL+SHIFT+N. It's in the "Text" bundle.
One should note, though, that C-S-N doesn't return the number of characters, but the number of bytes. This is only an issue if you use multi-byte characters, which is common enough to make the C-S-N command a bit broken IMHO.
I would be very grateful if anyone could point to a function that does the equivalent of C-S-N but returns the proper number of characters and not bytes (the ideal would be "full" statistics: words, characters and bytes). I made a quick hack but realised that I did not know how to tell Perl what character encoding was in use, i.e. whether it was UTF-8 or Latin-1.
Thanks.
/Jonas
On May 27, 2008, at 10:11 PM, Jonas Steverud wrote:
On 20 May 2008, at 20:07, Patrick McElhaney wrote:
CTRL+SHIFT+N. It's in the "Text" bundle.
One should note, though, that C-S-N doesn't return the number of characters, but the number of bytes. This is only an issue if you use multi-byte characters, which is common enough to make the C-S-N command a bit broken IMHO.
I would be very grateful if anyone could point to a function that does the equivalent of C-S-N but returns the proper number of characters and not bytes (the ideal would be "full" statistics: words, characters and bytes). I made a quick hack but realised that I did not know how to tell Perl what character encoding was in use, i.e. whether it was UTF-8 or Latin-1.
The command in the Text bundle does report the full statistics: lines, words, bytes. Perhaps you are using a modified word count command that uses the same key binding.
Best, Mark
On 27.05.2008, at 23:24, Mark Eli Kalderon wrote:
On May 27, 2008, at 10:11 PM, Jonas Steverud wrote:
On 20 May 2008, at 20:07, Patrick McElhaney wrote:
CTRL+SHIFT+N. It's in the "Text" bundle.
One should note, though, that C-S-N doesn't return the number of characters, but the number of bytes. This is only an issue if you use multi-byte characters, which is common enough to make the C-S-N command a bit broken IMHO.
First, just a short answer for counting Unicode characters:
cat | ruby -e 'print STDIN.read.split(//u).size'
input: selection or doc
output: Show Tooltip
--Hans
On 28.05.2008, at 00:34, Hans-Jörg Bibiko wrote:
On 27.05.2008, at 23:24, Mark Eli Kalderon wrote:
On May 27, 2008, at 10:11 PM, Jonas Steverud wrote:
On 20 May 2008, at 20:07, Patrick McElhaney wrote:
CTRL+SHIFT+N. It's in the "Text" bundle.
One should note, though, that C-S-N doesn't return the number of characters, but the number of bytes. This is only an issue if you use multi-byte characters, which is common enough to make the C-S-N command a bit broken IMHO.
First, just a short answer for counting Unicode characters:
cat | ruby -e 'print STDIN.read.split(//u).size'
input: selection or doc
output: Show Tooltip
Maybe better:
#!/usr/bin/ruby
bytes=chars=words=lines=0
STDIN.read.each_line { |l|
  lines+=1
  bytes+=l.split(//).size
  chars+=l.split(//u).size
  words+=l.split(/ +/).size
}
puts("Bytes: #{bytes}")
puts("Characters: #{chars}")
puts("Words: #{words}")
puts("Lines: #{lines}")
One could format the output much more prettily ;)
--Hans
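For readers on a current Ruby, the same four counts can be sketched without the split(//u) trick. This is only an illustration and not the bundle's actual command: it assumes Ruby 1.9 or later (where strings carry an encoding) and that the input is tagged as UTF-8. The name text_stats is made up for this example, and words are split on any whitespace rather than on spaces only.

```ruby
# Hypothetical helper: bytes, characters (code points), words and lines
# of a UTF-8 string. Assumes Ruby 1.9+ and a string tagged as UTF-8.
def text_stats(text)
  {
    bytes: text.bytesize,    # raw size in bytes
    chars: text.length,      # code points, not bytes
    words: text.split.size,  # runs of non-whitespace
    lines: text.lines.size   # a final line without "\n" still counts
  }
end

# "Räksmörgås", written with escapes so the file encoding does not matter.
sample = "R\u00E4ksm\u00F6rg\u00E5s"
stats = text_stats(sample)
puts "Bytes: #{stats[:bytes]}"
puts "Characters: #{stats[:chars]}"
puts "Words: #{stats[:words]}"
puts "Lines: #{stats[:lines]}"
```

In a TextMate command one would pass STDIN.read to text_stats instead of the sample string. Note that this still counts code points, so a decomposed é counts as two.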
On 28.05.2008, at 00:50, Hans-Jörg Bibiko wrote:
On 28.05.2008, at 00:34, Hans-Jörg Bibiko wrote:
On 27.05.2008, at 23:24, Mark Eli Kalderon wrote:
On May 27, 2008, at 10:11 PM, Jonas Steverud wrote:
On 20 May 2008, at 20:07, Patrick McElhaney wrote:
CTRL+SHIFT+N. It's in the "Text" bundle.
One should note, though, that C-S-N doesn't return the number of characters, but the number of bytes. This is only an issue if you use multi-byte characters, which is common enough to make the C-S-N command a bit broken IMHO.
First, just a short answer for counting Unicode characters:
cat | ruby -e 'print STDIN.read.split(//u).size'
input: selection or doc
output: Show Tooltip
Maybe better:
#!/usr/bin/ruby
bytes=chars=words=lines=0
STDIN.read.each_line { |l|
  lines+=1
  bytes+=l.split(//).size
  chars+=l.split(//u).size
  words+=l.split(/ +/).size
}
puts("Bytes: #{bytes}")
puts("Characters: #{chars}")
puts("Words: #{words}")
puts("Lines: #{lines}")
Three Unicode problems aren't solved by that script.
1) Combining diacritics: e.g. é can be written as the single code point U+00E9 or as e + combining ´ (U+0065 U+0301). This can be solved by ignoring these combining diacritics.
2) n-grams: some glyphs represent one phoneme but are written as two characters, e.g. ǳ U+01F3 (dz).
3) Ligatures: e.g. the ligature ﬁ (fi).
2) and 3) could be solved by Unicode's compatibility decomposition, NFKD.
One could write such a script, but I guess Ruby is not able to do an NFKD out of the box; I mean one has to install a separate library. But it should work with Python. Maybe I will find a bit of time to write such a script tomorrow, because it's late ;)
Cheers,
--Hans
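The approach described above (decompose with NFKD, then ignore combining diacritics) might look roughly like this on a newer Ruby. This is a sketch under assumptions: it relies on String#unicode_normalize, which exists from Ruby 2.2 onwards and was not available when this thread was written, and count_base_characters is a hypothetical name. "Combining diacritics" is approximated here by the Unicode category Mn (nonspacing marks).

```ruby
# Hypothetical helper, not part of any TextMate bundle.
# Requires Ruby 2.2+ (String#unicode_normalize) and a UTF-8 string.
def count_base_characters(text)
  # NFKD splits precomposed letters, digraph code points and ligatures
  # into their parts, e.g. U+00E9 -> e + U+0301, U+01F3 -> d z, U+FB01 -> f i.
  decomposed = text.unicode_normalize(:nfkd)
  # Drop nonspacing combining marks, then count the remaining code points.
  decomposed.gsub(/\p{Mn}/, "").length
end

puts count_base_characters("\u00E9")   # precomposed é
puts count_base_characters("e\u0301")  # e + combining acute
puts count_base_characters("\u01F3")   # dz digraph
puts count_base_characters("\uFB01")   # fi ligature
```

With this counting rule both spellings of é count as one character, while the dz digraph and the fi ligature each count as two. Whether that is the "right" answer depends on what one wants a character to mean.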
On 27 May 2008, at 23:24, Mark Eli Kalderon wrote:
On May 27, 2008, at 10:11 PM, Jonas Steverud wrote:
On 20 May 2008, at 20:07, Patrick McElhaney wrote:
CTRL+SHIFT+N. It's in the "Text" bundle.
One should note, though, that C-S-N doesn't return the number of characters, but the number of bytes. This is only an issue if you use multi-byte characters, which is common enough to make the C-S-N command a bit broken IMHO.
I would be very grateful if anyone could point to a function that does the equivalent of C-S-N but returns the proper number of characters and not bytes (the ideal would be "full" statistics: words, characters and bytes). I made a quick hack but realised that I did not know how to tell Perl what character encoding was in use, i.e. whether it was UTF-8 or Latin-1.
The command in the Text bundle does report the full statistics: lines, words, bytes. Perhaps you are using a modified word count command that uses the same key binding.
Yes, but I am not interested in the number of bytes, I would like to know the number of characters, which is not the same thing. Räksmörgås is ten characters but is reported as 13 bytes since åäö are stored as multi-byte characters. I use the Statistics for Document / Selection (word count) command from the Text Bundle and the ruby script uses wc -l for statistics, which is not Unicode aware AFAIK.
/Jonas
On 28 May 2008, at 18:55, Jonas Steverud wrote:
[...] Yes, but I am not interested in the number of bytes, I would like to know the number of characters, which is not the same thing. Räksmörgås is ten characters but is reported as 13 bytes since åäö are stored as multi-byte characters. I use the Statistics for Document / Selection (word count) command from the Text Bundle and the ruby script uses wc -l for statistics, which is not Unicode aware AFAIK.
Actually ‘wc’ _is_ multi-byte (encoding) aware. But for that, one has to use the -m[ulti-bytes] instead of -c[haracters].
So for a quick fix, change ‘wc -lwc’ in the command to ‘wc -lwm’ and it should work as you expect.
Of course this does not handle all the complex issues of precomposed/decomposed Unicode, diacritics, and ligatures that Hans-Joerg mentioned.
On 28.05.2008, at 19:30, Allan Odgaard wrote:
On 28 May 2008, at 18:55, Jonas Steverud wrote:
[...] Yes, but I am not interested in the number of bytes, I would like to know the number of characters, which is not the same thing. Räksmörgås is ten characters but is reported as 13 bytes since åäö are stored as multi-byte characters. I use the Statistics for Document / Selection (word count) command from the Text Bundle and the ruby script uses wc -l for statistics, which is not Unicode aware AFAIK.
Actually ‘wc’ _is_ multi-byte (encoding) aware. But for that, one has to use the -m[ulti-bytes] instead of -c[haracters].
So for a quick fix, change ‘wc -lwc’ in the command to ‘wc -lwm’ and it should work as you expect.
Not for me on Mac ppc 10.4.11.
--Hans
On 28.05.2008, at 19:38, Hans-Jörg Bibiko wrote:
On 28.05.2008, at 19:30, Allan Odgaard wrote:
On 28 May 2008, at 18:55, Jonas Steverud wrote:
[...] Yes, but I am not interested in the number of bytes, I would like to know the number of characters, which is not the same thing. Räksmörgås is ten characters but is reported as 13 bytes since åäö are stored as multi-byte characters. I use the Statistics for Document / Selection (word count) command from the Text Bundle and the ruby script uses wc -l for statistics, which is not Unicode aware AFAIK.
Actually ‘wc’ _is_ multi-byte (encoding) aware. But for that, one has to use the -m[ulti-bytes] instead of -c[haracters].
So for a quick fix, change ‘wc -lwc’ in the command to ‘wc -lwm’ and it should work as you expect.
Not for me on Mac ppc 10.4.11.
Oops. Of course, one has to set LC_ALL in the Ruby script.
In the bundle command 'Statistics for Document / Selection (Word Count)' one should write:
...
counts = `export LC_ALL=en_GB.UTF-8;wc -lwm`.scan(/\d+/)
...
--Hans
On 28 May 2008, at 19:49, Hans-Jörg Bibiko wrote:
[...] Oops. Of course, one has to set LC_ALL in the Ruby script.
TextMate sets LC_CTYPE for the programs it executes.
So on a normal system it should not be necessary to set this up in the script. However, other locale variables take precedence over LC_CTYPE, so most likely you have another one set (to something other than UTF-8).
On 28.05.2008, at 19:59, Allan Odgaard wrote:
On 28 May 2008, at 19:49, Hans-Jörg Bibiko wrote:
[...] Oops. Of course, one has to set LC_ALL in the Ruby script.
TextMate sets LC_CTYPE for the programs it executes.
Yes, I know, but I do not know whether a system command that Ruby itself starts via `` inherits the locale settings.
So on a normal system it should not be necessary to set this up in the script. However, other locale variables take precedence over LC_CTYPE, so most likely you have another one set (to something other than UTF-8).
Actually no.
--Hans
On 28.05.2008, at 20:20, Hans-Jörg Bibiko wrote:
On 28.05.2008, at 19:59, Allan Odgaard wrote:
On 28 May 2008, at 19:49, Hans-Jörg Bibiko wrote:
[...] Oops. Of course, one has to set LC_ALL in the Ruby script.
TextMate sets LC_CTYPE for the programs it executes.
Yes, I know, but I do not know whether a system command that Ruby itself starts via `` inherits the locale settings.
So on a normal system it should not be necessary to set this up in the script. However, other locale variables take precedence over LC_CTYPE, so most likely you have another one set (to something other than UTF-8).
Actually no.
If I write a tmcommand with:
- a shell script:
env | sort
I see LC_CTYPE
- a Ruby/Perl script:
#!/usr/bin/ruby
print `env|sort`
I do not see LC_CTYPE
Can you confirm this behaviour? Or am I wrong?
--Hans
On 28 May 2008, at 20:32, Hans-Jörg Bibiko wrote:
[...] If I write a tmcommand with:
- a shell script:
env | sort
I see LC_CTYPE
- a Ruby/Perl script:
#!/usr/bin/ruby
print `env|sort`
I do not see LC_CTYPE
Can you confirm this behaviour? Or am I wrong?
I actually thought I set LC_CTYPE in code, but it turns out it is only set in Support/lib/bash_init.sh.
It worked for me because I also set LC_CTYPE in ~/.MacOSX/environment.plist.
So yes, for this to work, the ruby script will have to set the encoding variable.
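As an alternative to writing "export LC_ALL=..." inside the backticks, a Ruby script can hand the locale to the child process directly. The following is a rough sketch, not what the bundle command does: it uses Open3.capture2 from Ruby's standard library, the helper names run_with_utf8_locale and wc_chars are invented for this example, and en_GB.UTF-8 is simply the locale used earlier in this thread. If that locale is not installed, wc may fall back to counting bytes.

```ruby
require "open3"

# Environment passed to the child only; the Ruby process itself is unchanged.
# The locale name is an assumption taken from the thread.
UTF8_ENV = { "LC_ALL" => "en_GB.UTF-8" }

# Hypothetical helper: run a command with the UTF-8 locale set,
# feed it `input` on stdin and return its stdout.
def run_with_utf8_locale(*cmd, input: "")
  out, status = Open3.capture2(UTF8_ENV, *cmd, stdin_data: input)
  raise "#{cmd.first} failed" unless status.success?
  out
end

# Hypothetical helper: character count as reported by `wc -m`.
def wc_chars(text)
  run_with_utf8_locale("wc", "-m", input: text).to_i
end

# The child really sees the variable, unlike in the `env|sort` experiment above.
puts run_with_utf8_locale("env").lines.grep(/\ALC_ALL=/)
```

LC_ALL is used here rather than LC_CTYPE because, as noted above, other locale variables take precedence over LC_CTYPE.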