We got CRUD!

For those not familiar with the term CRUD, it stands for “create”, “retrieve”, “update”, and “delete”.  Basically, these are the four basic operations you will ever need to maintain information.

After a bit of struggling to understand Rails’ support for many-to-many relationships, whether via HABTM (Has And Belongs To Many) or HMT (Has Many Through), as they intersect with ActiveScaffold, I was finally able to determine a reasonable approach to supporting editable many-to-many relationships.

For those interested, HABTM supports only weak entities.  In the majority of cases, that is all you really need when tracking a many-to-many relationship; however, there appears to be some consensus that HABTM and weak entities are a Bad Thing.  Frankly, while I think some people’s positions on HABTM are overstated, there are some real advantages to HMT over HABTM.  As previously stated, HABTM join tables are weak entities, which means they have no state of their own.  The use of HMT (a strong entity vis-a-vis the join table) allows you to track not just the relationship, but the state of the relationship between the two entities; itself a good thing if you need this.  To make the distinction concrete, here is a sketch of the two approaches using Kotoba’s domain (illustrative declarations, not Kotoba’s actual code):
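
    # HABTM: the join table is a weak entity; no model, no state of its own.
    class Word < ActiveRecord::Base
      has_and_belongs_to_many :vocabulary_lists
    end

    # HMT: the join table is promoted to a strong entity with its own model,
    # so the relationship itself can carry state (the added_on column below
    # is a hypothetical example).
    class VocabularyListsWord < ActiveRecord::Base
      belongs_to :vocabulary_list
      belongs_to :word
      # t.date :added_on  -- state that belongs to the relationship itself
    end

    class VocabularyList < ActiveRecord::Base
      has_many :vocabulary_lists_words
      has_many :words, :through => :vocabulary_lists_words
    end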

Ultimately I decided that I wanted to use HMT throughout Kotoba to keep things consistent and to provide insurance against future growth.  Unfortunately, ActiveScaffold’s support for HMT is read-only out of the box, while HABTM is fully editable; unfortunate in that I really want to be able to add words to vocabulary lists, and vocabulary lists to words, in a manner that is intuitive and usable.  As such, I needed a means of updating the vocabulary_lists_words join directly from either vocabulary lists or words.  Luckily, the wonderful developers of ActiveScaffold include the necessary callbacks (e.g. before_update_save) at the controller level, very similar in spirit to ActiveRecord’s before_* methods.  For the curious, the hook looks something like this (a minimal sketch; the params handling is an assumption, not Kotoba’s actual code):
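
    class VocabularyListsController < ApplicationController
      active_scaffold :vocabulary_list

      protected

      # ActiveScaffold invokes before_update_save(record) just before
      # persisting an edited record, which makes it a convenient place to
      # maintain the vocabulary_lists_words join by hand.
      def before_update_save(record)
        # Hypothetical form parameter carrying the selected word ids.
        record.word_ids = Array(params[:word_ids]).reject(&:blank?).map(&:to_i)
      end
    end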

What does this mean to you?  You can now manage your vocabulary list’s words directly from the vocabulary list.  And equally cool, you can add words to your vocabulary lists as you find them.

Kotoba goes I18n and L10n

Internationalization, aka I18n for those l33t (elite) readers out there, is the process of removing any specific language references from your application.  In its simplest form, this is accomplished with a map of key-value pairs for each target language and/or region.

Once this step is complete, the next phase is to localize (L10n) the application.  While internationalizing an application removes all assumptions about a human language, localizing an application is, ironically enough, the opposite; namely, re-inserting human language and/or region-specific requirements back into the application, albeit in an extensible fashion.

Why all this rather esoteric drivel about I18n and L10n?  As of Rails 2.2.2, internationalization is part of the base configuration.  While you can still use gems such as gettext, either directly or as the backend to Rails’ implementation, the built-in implementation is a great way to get started immediately, and it makes it much easier to add any number of languages to your Rails application.  For example, with the built-in backend a locale file and a lookup might look like this (the keys are illustrative, not Kotoba’s actual strings):
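
    # config/locales/ja.yml (illustrative):
    #   ja:
    #     words:
    #       title: "単語"

    I18n.locale = :ja     # switch the active locale
    I18n.t 'words.title'  # => "単語"
    I18n.locale = :en     # and back to English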

Which is a lot of write-up to say this: as of today I have included a first pass at internationalizing Kotoba.  I still have work to do on messages, some less-used views, and emails; however, at present the application supports both English and 日本語 (Japanese).  You can change your preferred language under Preferences.  Give it a shot and tell me what you think.  Under the hood, applying that preference per request amounts to something like this (a sketch; current_user and preferred_language are hypothetical names):
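
    class ApplicationController < ActionController::Base
      before_filter :set_locale

      private

      # Use the language chosen under Preferences, defaulting to English.
      def set_locale
        I18n.locale = current_user ? current_user.preferred_language : :en
      end
    end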

Degraded Performance Resolved

Two days ago I ran into an issue with ActiveScaffold (AS) and Kotoba whereby performance degraded as the number of words in the database increased.  At first I thought there was an issue with the way AS itself handled associations.  However, it turned out I had overlooked adding a table index on the child entity’s parent id (the foreign key).  Once the index was included in the schema, things took off.  For the record, the fix boils down to a migration along these lines (table and column names assumed from the join described earlier):
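
    class AddIndexesToVocabularyListsWords < ActiveRecord::Migration
      def self.up
        # Without these, every association lookup scans the whole join table.
        add_index :vocabulary_lists_words, :vocabulary_list_id
        add_index :vocabulary_lists_words, :word_id
      end

      def self.down
        remove_index :vocabulary_lists_words, :vocabulary_list_id
        remove_index :vocabulary_lists_words, :word_id
      end
    end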

Remember, kids.  Indexes are key.

Importing JMdict

Well, as of this weekend it seems progress toward Kotoba’s baseline functionality is nearer than farther away, with the ability to import words from JMdict.

Importing XML with Ruby is both straightforward and circuitous.  It is straightforward in that the REXML API is easy to use.  However, the dearth of competent examples means that implementing a parser is strewn with misdirection, especially if you take endorsements at face value.  In particular, determining which parser to use, and predicting parser performance, required me to write three different parsers over the weekend before ultimately arriving at one with decent performance.  While Enterprise Ruby :: Parsing is a well-written article, it makes some implicit assumptions about file size that limit the utility of XMLLib as a parser for large files.  More pointedly, anytime a tree/DOM parser is advocated for large files (10+ MB), users should seriously question the credibility of the source.  The only appropriate solution in these situations is a stream parser, whether callback-based (the SAX1 or SAX2 APIs) or a similar listener pattern.

To wit, JMdict is itself a 47 MB XML file with some 150,000 Japanese-English entries.  Moreover, there are a number of extraneous nodes, at least from Kotoba’s perspective, that need to be processed but ignored.  A tree parser will need to create objects for every node, even if a large portion of the tree will be ignored.  Given that each Ruby object minimally requires 12 bytes, loading a file of this size into run-time memory before processing it will significantly tax most modern machines.  As a rough illustration, if each entry averages even a few dozen nodes, that is millions of short-lived objects before a single word is persisted.

Fortunately, minus the cross-references and some trivial references to parts of speech and dialects, the majority of the information within each entry node is self-contained.  Consequently, DOM parsers are neither necessary nor realistic given their memory requirements.  It took some searching to find a great resource on Ruby XML parsers.  I ultimately wrote a StreamListener and a StreamParser that work in an incremental fashion: parse N entries, then persist those N entries.  This ensures we do not load our entire set of 150,000 words into memory before persisting to the database.  The shape of the listener is roughly as follows (a simplified sketch; the batch size and persistence details are assumptions, and the per-element text handling is elided):
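
    require 'rexml/document'
    require 'rexml/streamlistener'

    class JMdictListener
      include REXML::StreamListener

      BATCH_SIZE = 500  # parse N entries, then persist those N entries

      def initialize
        @batch = []
        @current = nil
      end

      def tag_start(name, attrs)
        @current = {} if name == 'entry'
      end

      def text(data)
        # accumulate text for the element we are inside (elided)
      end

      def tag_end(name)
        return unless name == 'entry'
        @batch << @current
        persist_batch if @batch.size >= BATCH_SIZE
      end

      def persist_batch
        # e.g. Word.create!(...) for each entry, then release the memory
        @batch.clear
      end
    end

    listener = JMdictListener.new
    File.open('JMdict') { |f| REXML::Document.parse_stream(f, listener) }
    listener.persist_batch  # flush the remainder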

Going Legit

What web 2.0 site is legit if it doesn’t have a PayPal Donate button somewhere on it?  Answer: none.  And Kotoba is no different.  So donate if you feel like it, though honestly it is too soon.  But if you do, thanks!

We have also started implementing our security model in the application.  This will be an ongoing effort to ensure that the right bits and bobs are accessible to the right people.  Frankly, ActiveScaffold makes this part of the job relatively painless with its own controller- and model-level security API.  As a taste of how that looks (the permission checks here are placeholders, and the hook names follow ActiveScaffold’s documented security conventions as I understand them):
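
    class WordsController < ApplicationController
      active_scaffold :word

      protected

      # Controller-level: gate an entire CRUD action for the current user.
      def delete_authorized?
        current_user && current_user.admin?
      end
    end

    # Model-level: per-record checks that ActiveScaffold also consults.
    class Word < ActiveRecord::Base
      def authorized_for_update?
        true  # hypothetical ownership check would go here
      end
    end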