
Kotoba is Fav’conified

 

Kotoba's new favicon.

Sometimes it is the little things that can make you smile. Case in point: Kotoba’s new favicon.

I wanted something that was clean and easily recognizable in the toolbar. While options included something abstract, I felt that a monogram might best represent the intent of the site. Although “K” is arguably more universal, I ultimately decided that the character 「言」 is both identifiable to many using the site and abstract enough to act as a “logo”.

To create the image, I went through the process of getting various Japanese-supporting applications installed, all for one very simple image. This process culminated in the installation of OpenOffice.org’s OpenOffice and Gimp.

There is a wealth of information available about installing Japanese versions of these pieces of software.

In the end, installing the latest versions of both OpenOffice and Gimp is sufficient to get out-of-the-box compatibility with Japanese display and input.  While OpenOffice supports Japanese input directly into the application, the same cannot be said for Gimp.   Nevertheless, it is relatively painless to use OpenOffice to create images with Japanese text that can then be imported directly into Gimp.  

Once I had a means of creating an image, the final step was to get it into a format that is universally understood by web browsers. I lucked out when I found the very wonderful website http://www.favicon.jp/, which will take any image and create a version in .ico format. Once this file is generated, it is straightforward to add it to your site and point to it in your HEAD tag.
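Pointing to the icon from the HEAD tag comes down to a single link element. Here is a minimal sketch in Ruby of what that element might look like. The helper name favicon_link and the /favicon.ico path are assumptions for illustration, not something taken from Kotoba itself.

```ruby
# Hypothetical helper: builds the <link> element that tells a browser
# where to find the favicon. The default path assumes the .ico file
# was saved at the root of the site.
def favicon_link(href = "/favicon.ico")
  %(<link rel="shortcut icon" href="#{href}" type="image/x-icon">)
end

# Paste the resulting line inside the HEAD tag of your layout.
puts favicon_link
```

If the icon lives somewhere else, pass that path in instead, e.g. favicon_link("/images/favicon.ico").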

And that is how we get something small that nonetheless gives us a large, warm fuzzy in our tummy.  😀

Phase II Complete

It took me a bit of wrangling, but I believe Phase II of the project is now complete! I went through three different UI approaches to managing vocabulary lists before I finally settled on a set of functionality that meets my design goals. It is now possible to add vocabulary lists that are only editable by you. While other people can see these lists and use them as-is for their own learning, they cannot edit them. Eventually, I will add the ability to share lists between people, but that is for a future day. Once a vocabulary list is created, you can add words either from the list itself or by browsing words. When editing a word, you will only see your own lists, that is, the ones you can attach the word to. Hopefully the current approach is one that people find intuitive and easy to use.

Multi-byte Strikes Again

Multi-byte support, while by and large now a well-supported staple of modern programming languages, is still something that can trip a person up from time to time. In particular, while UTF-8 is the de facto standard for encoding, it does pose an issue when you get down to the character level.

I ran into this today while trying to shrink a potentially large bit of text to a manageable “chunk” of characters. Ruby, it appears, separates a string into 8-bit bytes rather than characters. When grabbing a substring from a string with mixed Japanese and English, you end up with the dreaded 文字化け (mo-ji-ba-ke, garbled characters); something now only whispered in darkened corners by veterans reliving the horror days before Unicode.

Well, the long and short of it is a nice bit of hackery provided by 山下英孝. Basically, the trick involves grabbing the bytes directly from the string and then running a regular-expression slice over the result.

However, while this is a fun hack, it is still a hack. The better way to handle this is to use the chars instance method on the String class. This ensures that each character is a logical character (e.g. ‘a’ or ‘あ’) and not the physical byte returned by indexing directly into the string.

In summary, you want to use:

multi_byte_string = "私のマルチバイト文です。My multi-byte sentence."
# this is a hack of the physical characters
# you can, of course, use this; but, you should not
multi_byte_string[0,5].slice(/\A.{0,}/m)
# Instead, this is a much better approach which uses
# all the UTF-8 goodness to get logical characters from the string
multi_byte_string.chars[0,5]
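To make the difference between physical bytes and logical characters concrete, here is a minimal sketch. It assumes Ruby 1.9.3 or later, where strings are encoding-aware and chars returns an Array, which may not match the Ruby version this post was written against. The helper name truncate_chars is hypothetical.

```ruby
# encoding: utf-8
# A sketch under the assumptions above; truncate_chars is a made-up helper.

text = "私のマルチバイト文です。My multi-byte sentence."

# Cutting by bytes can split a character in half: each kanji or kana
# here takes three bytes in UTF-8, so five bytes end mid-character.
byte_cut = text.byteslice(0, 5)

# Cutting by logical characters keeps every character whole.
def truncate_chars(string, limit)
  string.chars.first(limit).join
end

char_cut = truncate_chars(text, 5)

puts byte_cut.valid_encoding?  # false: the cut landed inside a character
puts char_cut                  # 私のマルチ
```

The byte-based cut is where the garbled text comes from, while the character-based cut always ends on a character boundary.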

Phase I Complete

Well, I think we can safely say we have reached our first milestone with Kotoba: importation of JMDict.  Yeah!  We now have over 130,000 Japanese words with some 220,000 definitions for English alone, not to mention French, German, and Russian definitions.  We have not even begun to really delve into what Kotoba is all about, though: learning a new language!  Stay tuned as we bring on phase II, III, and more!

Cross References, ho!

What is a dictionary if you cannot discover relationships between words? Well, not much in my opinion. To make related words easier to find, Kotoba now supports cross-references between words. Ultimately, this lets you discover relationships (e.g. antonyms, synonyms, et cetera) between words, which helps you delve into their nuances.