It took me a bit of wrangling, but I believe Phase II of the project is now complete! It took me three different UI approaches to managing vocabulary lists before I finally settled on a set of functionality that meets my design goals. It is now possible to add vocabulary lists that are editable only by you. While other people can see these lists and use them as-is for their own learning, they cannot edit them. Eventually, I will add the ability to share lists between people, but that is for a future day. Once a vocabulary list is created, you can add words either from the list itself or by browsing words. When editing a word, you will see only your own lists as places to attach it. Hopefully the current approach is one that people find intuitive and easy to use.
Author: Ward
Multi-byte Strikes Again
Multi-byte support, while by and large now a well-supported staple of modern programming languages, is still something that trips a person up from time to time. In particular, while UTF-8 is the de facto standard for encoding, it does pose an issue when you get down to the character level.
I ran into this today while trying to shrink a potentially large bit of text to a manageable “chunk” of characters. Ruby, it appears, separates a string into 8-bit characters. When grabbing a substring from a string with mixed Japanese and English, you end up with the dreaded 文字化け (mo-ji-ba-ke, garbled characters), something now only whispered in darkened corners by veterans reliving the horror days before Unicode.
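To make the problem concrete, here is a minimal sketch of how a byte-based cut can land in the middle of a character. It assumes a current Ruby (1.9 or later), where strings are encoding-aware and you have to ask for bytes explicitly with byteslice; the Ruby described in this post handed back bytes by default. The variable names are only for illustration.

```ruby
text = "私のマルチバイト文です。My multi-byte sentence."

# Logical characters versus physical bytes: each Japanese character
# here takes three bytes in UTF-8, each ASCII character takes one.
char_count = text.length
byte_count = text.bytesize

# Cutting by bytes stops partway through the second character,
# leaving a string that is no longer valid UTF-8 (mojibake).
broken = text.byteslice(0, 5)
broken_is_valid = broken.valid_encoding?

# Cutting by characters always lands on a character boundary.
clean = text[0, 5]
clean_is_valid = clean.valid_encoding?

puts "#{char_count} characters, #{byte_count} bytes"
puts "byte cut valid? #{broken_is_valid}"
puts "character cut valid? #{clean_is_valid}"
```

The first five bytes cover all of 私 but only two of the three bytes of の, which is exactly the kind of half-character that shows up as garbage on screen.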
Well, the long and short of it is a nice bit of hackery provided by 山下英孝. Basically, the trick involves a slice after grabbing the characters directly from the string.
However, while this is a fun hack, it is still a hack. The better way to handle this is to use the chars instance method on the String class. This ensures that a character is a logical character (e.g. ‘a’ or ‘あ’) and not the physical char returned by indexing the string directly.
In summary, you want to use:
multi_byte_string = "私のマルチバイト文です。My multi-byte sentence."
# this is a hack of the physical characters
# you can, of course, use this; but, you should not
multi_byte_string[0,5].slice(/\A.{0,}/m)
# Instead, this is a much better approach which uses
# all the UTF-8 goodness to get logical characters from the string
multi_byte_string.chars[0,5]
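For anyone on a more recent Ruby, the same idea can be wrapped in a small helper. This is only a sketch under the assumption that chars yields one-character strings (as it does in Ruby 1.9 and later); first_characters is a made-up name, not something from Ruby or from Kotoba.

```ruby
# Hypothetical helper (not part of Ruby or Kotoba): return the first
# `limit` logical characters of `text` as a String.
def first_characters(text, limit)
  # String#chars yields one-character strings, so taking the first
  # `limit` of them and joining never splits a multi-byte character.
  text.chars.first(limit).join
end

multi_byte_string = "私のマルチバイト文です。My multi-byte sentence."
chunk = first_characters(multi_byte_string, 5)
puts chunk
```

Joining the characters back together gives a String rather than a list of characters, which is usually what you want when trimming text for display.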
Phase I Complete
Well, I think we can safely say we have reached our first milestone with Kotoba: importation of JMdict. Yeah! We now have over 130,000 Japanese words with some 220,000 definitions for English alone, not to mention French, German, and Russian definitions. We have not even begun to really delve into what Kotoba is all about, though: learning a new language! Stay tuned as we bring on Phase II, III, and more!
Cross References, ho!
What is a dictionary if you cannot discover relationships between words? Well, not much, in my opinion. To make it easier to find related words, Kotoba now supports cross-references between words. Ultimately, this lets you discover relationships (e.g. antonyms, synonyms, et cetera) between words, which in turn helps you delve into their nuances.
ActiveScaffold, Meet RecordSelect
A natural consequence of managing many-to-many relationships is the need to provide usable controls to select items. With JMdict alone we will have 130,000 Japanese entries that can be added to any number of vocabulary lists. While Rails provides a drop-down list, putting all 130,000 entries in one is not only unusable, it will more than likely crash your web browser the moment you click on it: not the user experience we are seeking.
Our solution to this problem is to leverage Lance Levy’s excellent RecordSelect. This project provides a fully AJAX’d solution for showing any number of entries in a searchable way. While the current RecordSelect is not suitable for use with ActiveScaffold out of the box, a little tender loving care eventually got the two working together. At this point, integration is nominally complete, although there is still room for greater usability. However, I believe we have a great start to ensuring that you can manage your vocabulary lists easily.
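The idea behind a searchable selector, as opposed to one giant drop-down, can be sketched in plain Ruby. To be clear, this is not RecordSelect’s API or Kotoba’s actual code: Entry, search_entries, and the page and per_page parameters are all made up for illustration, and a real implementation would query the database rather than filter an in-memory list.

```ruby
# Hypothetical in-memory stand-in for the dictionary table.
Entry = Struct.new(:id, :word)

# Return only one small page of entries whose word contains `query`,
# instead of handing the browser every entry at once.
def search_entries(entries, query, page: 1, per_page: 10)
  return [] if query.to_s.strip.empty?

  page = [page, 1].max # treat anything below 1 as the first page
  matches = entries.select { |entry| entry.word.include?(query) }
  matches.each_slice(per_page).to_a[page - 1] || []
end

entries = [
  Entry.new(1, "言葉"),
  Entry.new(2, "言語"),
  Entry.new(3, "辞書")
]

search_entries(entries, "言").each { |entry| puts entry.word }
```

However many entries exist, the browser only ever receives one page of matches, which is what keeps a 130,000-entry list manageable.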