Degraded Performance Resolved

Two days ago I ran into an issue with ActiveScaffold (AS) and Kotoba: performance degraded as the number of words in the database increased.  At first I suspected AS itself, specifically the way it handles associations.  However, I had overlooked a table index on the child entity’s parent id.  Once the index was added to the schema, things took off.

Remember, kids.  Indexes are key.
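
In a Rails migration the fix is typically a one-line add_index call on the child table's foreign key column.  To show why it matters, here is a rough in-memory analogy in plain Ruby, not the actual schema: the row layout and the names readings, word_id, children_by_scan and children_by_index are all made up for illustration.

```ruby
# Hypothetical child rows, each pointing at its parent word via :word_id.
readings = (1..10_000).map { |i| { id: i, word_id: (i % 1_000) + 1 } }

# Without an index: every lookup walks all rows (a full table scan).
def children_by_scan(rows, word_id)
  rows.select { |row| row[:word_id] == word_id }
end

# With an "index": group the rows once, then each lookup is a hash access.
def build_index(rows)
  rows.group_by { |row| row[:word_id] }
end

def children_by_index(index, word_id)
  index.fetch(word_id, [])
end

index = build_index(readings)
puts children_by_scan(readings, 42).size
puts children_by_index(index, 42).size
```

Both lookups return the same rows, but the scan does work proportional to the whole table for every parent, which is roughly what the database was doing for each association before the index existed.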

Importing JMdict

Well, as of this weekend Kotoba is much closer to its baseline functionality: it can now import words from JMdict.

Importing using Ruby is both straightforward and circuitous.  It is straightforward in that the REXML API is easy to use.  However, the dearth of competent examples means that implementing a parser is strewn with misdirection, especially if you take endorsements at face value.  In particular, choosing a parser and predicting its performance required me to write three different parsers over the weekend before I ended up with one that performs decently.  While Enterprise Ruby :: Parsing is a well-written article, it makes some implicit assumptions about file size that limit the utility of XMLLib as a parser for large files.  More pointedly, any time a tree/DOM parser is advocated for large files (10+ MB), readers should seriously question the credibility of the source.  The only appropriate solution in these situations is a stream parser, whether callback-based (the SAX1 or SAX2 APIs) or built on a similar listener pattern.
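
For anyone who has not seen the listener pattern, here is a minimal sketch using REXML's stream API.  It is not the parser Kotoba uses; the class name GlossListener and the two-entry sample document are invented for illustration, and the sample is a heavily simplified imitation of JMdict's entry/gloss structure.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Counts <entry> elements and collects <gloss> text as the parser streams
# through the document. No tree of node objects is ever built.
class GlossListener
  include REXML::StreamListener # supplies no-op defaults for other callbacks

  attr_reader :entry_count, :glosses

  def initialize
    @entry_count = 0
    @glosses = []
    @in_gloss = false
  end

  def tag_start(name, _attrs)
    @entry_count += 1 if name == 'entry'
    return unless name == 'gloss'
    @in_gloss = true
    @glosses << +''
  end

  def text(value)
    @glosses.last << value if @in_gloss
  end

  def tag_end(name)
    @in_gloss = false if name == 'gloss'
  end
end

# Simplified, made-up sample; the real file has many more element types.
SAMPLE_XML = <<~XML
  <JMdict>
    <entry><ent_seq>1</ent_seq><sense><pos>noun</pos><gloss>word</gloss></sense></entry>
    <entry><ent_seq>2</ent_seq><sense><pos>noun</pos><gloss>cat</gloss></sense></entry>
  </JMdict>
XML

listener = GlossListener.new
REXML::Document.parse_stream(SAMPLE_XML, listener)
puts "#{listener.entry_count} entries: #{listener.glosses.join(', ')}"
```

The parser calls back into the listener for every tag and text run, and the listener keeps only what it cares about.  Elements such as ent_seq and pos still pass through the callbacks but leave nothing behind in memory.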

To wit, JMdict is itself a 47 MB XML file with some 150,000 Japanese-English entries.  Moreover, there are a number of nodes that are extraneous, at least from the perspective of Kotoba, and that must be parsed but are then ignored.  A tree parser will need to create objects for each node, even if a large portion of the tree will be ignored.  Given that each Ruby object minimally requires 12 bytes, large files that are first loaded into run-time memory before being processed will significantly tax most modern machines.

Fortunately, minus the cross-references and some trivial references to parts of speech and dialects, the majority of information within each entry node is self-contained.  Consequently, DOM parsers are neither necessary nor realistic due to their memory requirements.  It took some searching to find a great resource on Ruby XML parsers.  I ultimately wrote a StreamListener and a StreamParser that work incrementally: we parse N entries, then persist those N entries.  This ensures that we never load our entire set of words (150,000) into memory before persisting to our database.
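
The parse-N-then-persist idea can be sketched as follows.  This is an illustration under assumptions rather than Kotoba's real importer: the class name BatchingEntryListener, the choice of keb/reb/gloss as the wanted elements, the sample XML and the block that stands in for the database write are all hypothetical.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Builds one small Hash per <entry> and hands the entries to a block in
# batches, so at most batch_size entries are held in memory at any time.
class BatchingEntryListener
  include REXML::StreamListener

  WANTED = %w[keb reb gloss].freeze # everything else is parsed but ignored

  def initialize(batch_size, &persist)
    @batch_size = batch_size
    @persist = persist
    @batch = []
    @entry = nil
    @buffer = nil
  end

  def tag_start(name, _attrs)
    if name == 'entry'
      @entry = Hash.new { |hash, key| hash[key] = [] }
    elsif @entry && WANTED.include?(name)
      @buffer = +''
    end
  end

  def text(value)
    @buffer << value if @buffer
  end

  def tag_end(name)
    if name == 'entry'
      @batch << @entry
      @entry = nil
      flush if @batch.size >= @batch_size
    elsif @buffer && WANTED.include?(name)
      @entry[name] << @buffer
      @buffer = nil
    end
  end

  # Persist whatever is pending; call once after parsing for the final batch.
  def flush
    return if @batch.empty?
    @persist.call(@batch)
    @batch = []
  end
end

# Simplified, made-up sample in the shape of JMdict entries.
BATCH_SAMPLE_XML = <<~XML
  <JMdict>
    <entry><ent_seq>1</ent_seq><r_ele><reb>ことば</reb></r_ele><sense><pos>noun</pos><gloss>word</gloss></sense></entry>
    <entry><ent_seq>2</ent_seq><r_ele><reb>ねこ</reb></r_ele><sense><pos>noun</pos><gloss>cat</gloss></sense></entry>
    <entry><ent_seq>3</ent_seq><r_ele><reb>いぬ</reb></r_ele><sense><pos>noun</pos><gloss>dog</gloss></sense></entry>
  </JMdict>
XML

saved_batches = [] # stand-in for the database
listener = BatchingEntryListener.new(2) { |entries| saved_batches << entries }
REXML::Document.parse_stream(BATCH_SAMPLE_XML, listener)
listener.flush
puts saved_batches.map(&:size).inspect
```

For the real file you would pass an open File instead of a string, so the XML itself is also read incrementally, and the block would write each batch to the database rather than appending to an array.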

Going Legit

What Web 2.0 site is legit without a PayPal Donate button somewhere on it?  Answer: none.  And Kotoba is no different.  So donate if you feel like it, though honestly it is too soon.  But if you do then thanks!

We also started implementing our security model into the application.  This will be an on-going effort to ensure that the right bits and bobs are accessible to the right people.  Frankly, ActiveScaffold makes this part of the job relatively painless with its own controller/model API.

Stopping Spammers

In my first implementation of password reset, a user just entered their email address and they were sent a new password.  Unfortunately, this meant that an unsavory person could just keep entering your email into the application, resetting your password and by-and-large being, well, an unsavory person.

Originally I had intended to implement security questions next in order to allow users to reset their passwords.  However, I faced a lot of design decisions about how to support and implement this, both in terms of functionality and usability.  While I think security questions are ultimately a superior approach, I opted for an easier one.

The current solution is to send the user a password-reset confirmation key at their email address.  Upon receipt, they can return to the site and, with their email address and confirmation key in hand, request that their password be reset.  At this point the original flow is followed; namely, we create a random password and send it to you by email.
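
The two-step flow looks roughly like this in plain Ruby.  It is a sketch with an in-memory stand-in for the user model and the mailer; the class PasswordResetter, its method names and the outbox array are hypothetical and not Kotoba's actual code.

```ruby
require 'securerandom'

# In-memory stand-in for the user model and mailer (hypothetical names).
class PasswordResetter
  attr_reader :outbox

  def initialize
    @pending = {} # email => confirmation key awaiting confirmation
    @outbox = []  # stand-in for emails that would be sent
  end

  # Step 1: email a confirmation key. The password is NOT changed here, so
  # repeatedly submitting someone's address cannot lock them out.
  def request_reset(email)
    key = SecureRandom.hex(16)
    @pending[email] = key
    @outbox << { to: email, body: "Your confirmation key: #{key}" }
    key # returned only so this example can use it without parsing the email
  end

  # Step 2: only a matching email/key pair triggers the original flow of
  # generating a random password and emailing it.
  def confirm_reset(email, key)
    return nil unless @pending.key?(email) && @pending[email] == key
    @pending.delete(email) # a key can be used once
    new_password = SecureRandom.urlsafe_base64(9)
    @outbox << { to: email, body: "Your new password: #{new_password}" }
    new_password
  end
end

resetter = PasswordResetter.new
key = resetter.request_reset('user@example.com')
puts resetter.confirm_reset('user@example.com', 'guess').inspect
puts resetter.confirm_reset('user@example.com', key).nil? ? 'rejected' : 'reset'
```

Someone who knows only your email address can still trigger the key email, but without access to your inbox they cannot get past the second step.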

The advantage of this approach is that it leverages extant functionality in the user model, thus requiring a minimum of coding.  However, the one downside is that resetting your password requires you to use both Kotoba and your email client.  Security questions would obviate the need for email and make the whole user experience better.

Nevertheless, expediency won the day.  For now.

Kotoba Gains Authentication

Kotoba’s user authentication and registration are functionally complete.  This includes account creation confirmation using a key that is sent to your email address.  If you forget your authorization key or need to reset your password, both of these can now be done from the web-site.

The next step will be to support security questions for resetting your password, as this will minimize (ideally eliminate) the chance of someone resetting another user’s password.

Also in the works is the ability to import JMdict/EDict XML files directly into the dictionary.  Once this import is complete, the real work can begin.