Caching RSS Feeds

We recently incorporated RSS feeds into Kotoba. While implementing my solution, I noticed that there are some tutorials [1, 2] out there that provide quick-and-dirty ways of getting RSS feeds into a Rails application. The tutorials are a good start, but they neglect the fact that fetching the feed on every request drives a lot of unnecessary traffic between your site and the site hosting the RSS feed.

What can you do? What else! The time-honored solution for all things that ail software: cache it.

First, let us create our RSS controller:

require 'kotoba_rss/feed'

class RssController < ApplicationController

  def parse_feed
    feed_url                = params['rss_url']                      || 'http://word.wardosworld.com/?feed=rss2&cat=6'
    # Apply the default before converting: nil.to_i is 0, and 0 is truthy in Ruby
    maximum_number_of_items = (params['maximum_number_of_items']     || 5).to_i
    title                   = params['title']                        || t('rss.rss_feed')
    details                 = params['details']                      || t('rss.read_more')
    details_url             = params['details_url']                  || 'http://word.wardosworld.com/?cat=6'
    feed(feed_url, maximum_number_of_items, title, details, details_url)
  end

  def feed(feed_url, maximum_number_of_items, title, details, details_url)
    kotoba_rss_feed = Kotoba_Rss::Feed.new
    result = kotoba_rss_feed.get(feed_url)
    return render( 
      :partial => '/rss/feed', 
      :layout => 'popup',
      :locals => { 
        :result => result, 
        :maximum_number_of_items => maximum_number_of_items, 
        :title => title,
        :details => details,
        :details_url => details_url
      }
    )
  end
end
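
A note on the defaults in parse_feed: in Ruby only nil and false are falsy, and nil.to_i is 0, so a fallback written with || has to be applied before the integer conversion. Here is a minimal sketch of the difference. The two helper names are hypothetical, and a plain Hash stands in for the params object that Rails would normally supply:

```ruby
# Converting first: nil.to_i is 0, and 0 is truthy in Ruby,
# so the fallback of 5 can never be reached.
def limit_convert_first(params)
  params['maximum_number_of_items'].to_i || 5
end

# Defaulting first: the fallback applies when the key is missing.
def limit_default_first(params)
  (params['maximum_number_of_items'] || 5).to_i
end

# Plain Hashes standing in for Rails' params (an assumption for illustration).
puts limit_convert_first({})                                    # 0
puts limit_default_first({})                                    # 5
puts limit_default_first({ 'maximum_number_of_items' => '10' }) # 10
```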

The next step is to create a class that fetches RSS feeds for the rest of our application, caching as appropriate, in lib/kotoba_rss/feed.rb:

require "rss"
require 'rss/2.0'
require 'open-uri'

class Kotoba_Rss::Feed
  
  def initialize
    @cache = Kotoba_Rss::Cache.instance
  end
  
  def get(url)
    result = nil
    if @cache.has_expired?(url)
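      # URI.open comes from open-uri; plain Kernel#open no longer
      # accepts URLs as of Ruby 3.0
      URI.open(url) do |http|
        response = http.read
        # second argument false: parse without validating the feed
        result   = RSS::Parser.parse(response, false)
      end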
      @cache.add(url,result)
    else
      result = @cache.get(url)
    end
    return result
  end
  
end
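
To see the read-through behaviour of get without Rails or a network connection, here is a minimal sketch. ReadThroughFeed and NeverExpiringCache are hypothetical stand-ins, not part of Kotoba; the fetcher is injected as a lambda so that we can count how often the "remote" side is actually hit:

```ruby
# Hypothetical stand-in for Kotoba_Rss::Feed: same read-through shape, but the
# fetcher and cache are injected so the sketch runs without Rails or a network.
class ReadThroughFeed
  def initialize(cache, fetcher)
    @cache   = cache    # must respond to has_expired?, add and get
    @fetcher = fetcher  # any callable that takes a URL and returns a result
  end

  def get(url)
    if @cache.has_expired?(url)
      result = @fetcher.call(url)
      @cache.add(url, result)
      result
    else
      @cache.get(url)
    end
  end
end

# Simplest possible cache for the demo: entries never expire once stored.
class NeverExpiringCache
  def initialize
    @store = {}
  end

  def has_expired?(url)
    !@store.key?(url)
  end

  def add(url, result)
    @store[url] = result
  end

  def get(url)
    @store[url]
  end
end

fetch_count = 0
fetcher = lambda do |url|
  fetch_count += 1
  "parsed feed for #{url}"
end

feed   = ReadThroughFeed.new(NeverExpiringCache.new, fetcher)
first  = feed.get('http://example.com/feed')
second = feed.get('http://example.com/feed')

puts fetch_count      # 1: the second call was served from the cache
puts first == second  # true
```

The caller only ever sees get(url); whether the result came from the cache or from a fresh fetch is invisible to it.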

Finally, we want to be able to cache our RSS feed results between calls, in lib/kotoba_rss/cache.rb:

require 'singleton'

# Keep a cache of fetched RSS feeds.  This is to ensure
# we do not hit an RSS feed too often; an issue when we
# have many page fetches.  
#
# This is also a very useful thing when we are fetching
# from sites that might be throttling requests from specific
# users and/or domains (e.g. twitter.com).
class Kotoba_Rss::Cache
  include Singleton
  
  ENTRY_EXPIRES_IN_SECONDS = 300 # 5 minutes
  
  def initialize
    @cache = Hash.new
  end
  
  def add(url, result)
    entity = Kotoba_Rss::Cache::Entry.new(Time.now,result)
    @cache[url] = entity
  end
  
  def get(url)
    @cache[url].feed_result
  end
    
  def has_expired?(url)
    # if we do not have the URL yet then
    # consider it expired
    return true unless @cache.has_key?(url)

    seconds_held = Time.now - @cache[url].feed_time
    seconds_held > ENTRY_EXPIRES_IN_SECONDS
  end

end

In the same file, lib/kotoba_rss/cache.rb, I also include:


class Kotoba_Rss::Cache::Entry
  attr_reader :feed_time, :feed_result
  
  def initialize(time, result)
    @feed_time   = time
    @feed_result = result
  end
  
end
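
If you want to watch the expiry logic work without waiting five minutes, here is a minimal sketch. TtlCache is a hypothetical class, not part of Kotoba: it mirrors the add/get/has_expired? interface above but takes the clock as a callable (my assumption, to make time fakeable), and unlike the original its get returns nil for an unknown URL:

```ruby
# Hypothetical TtlCache: mirrors the add/get/has_expired? interface of
# Kotoba_Rss::Cache, but takes the clock as a callable so time can be faked.
class TtlCache
  Entry = Struct.new(:feed_time, :feed_result)

  def initialize(ttl_seconds, clock = -> { Time.now })
    @ttl_seconds = ttl_seconds
    @clock       = clock
    @entries     = {}
  end

  def add(url, result)
    @entries[url] = Entry.new(@clock.call, result)
  end

  def get(url)
    entry = @entries[url]
    entry && entry.feed_result
  end

  def has_expired?(url)
    entry = @entries[url]
    return true if entry.nil? # never fetched counts as expired

    (@clock.call - entry.feed_time) > @ttl_seconds
  end
end

now   = Time.at(0)
cache = TtlCache.new(300, -> { now })

puts cache.has_expired?('feed')  # true, nothing stored yet
cache.add('feed', 'result')
now += 299
puts cache.has_expired?('feed')  # false, still inside the 5 minute window
now += 2
puts cache.has_expired?('feed')  # true, 301 seconds have passed
```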

I consider Kotoba_Rss::Cache::Entry an inner class that is only really useful to Kotoba_Rss::Cache, which makes it a great candidate for living in its parent’s file; however, if you prefer, you can certainly put it in its own file. In truth, Kotoba_Rss::Cache could also be included in lib/kotoba_rss/feed.rb, which would make an even better design (note to self to refactor afterwards).

To use it, we have the following in our ApplicationHelper:

  def link_to_blog_popup
    link_to(
         image_gif('blog',t('meta.blog')), 
         { 
           :action => 'parse_feed', 
           :controller => 'rss', 
           :rss_url => 'http://word.wardosworld.com/?feed=rss2&cat=6', 
           :maximum_number_of_items => 10, 
           :title => t('meta.blog'),
           :details => t('meta.blog_read_more'),
           :details_url => 'http://word.wardosworld.com/?cat=6'
         } , 
         :class => 'popup',
         :onclick => "return hs.htmlExpand(this, { objectType: 'ajax', contentId: 'popup', wrapperClassName: 'highslide-no-border', dimmingOpacity: 0.75, align: 'center'} )"
     ) 
  end

Note that we are using the I18n t() method for localization.

In summary, we have a good separation of responsibility between all our moving parts. We are using a controller the way it is intended, with our presentation properly off-loaded to views. In our library we have a means of making a service that invisibly caches its results.

In my experience, caches should almost always be hidden from the caller; otherwise, you begin to degrade cohesion and venture into the land of tight coupling. Additionally, multiple points of entry make the cache much harder to design.

That is it. Whenever a result has been held in our cache for longer than ENTRY_EXPIRES_IN_SECONDS, the next request fetches a fresh result and stores it in the cache. This ensures that as our site grows (and it will!) we will not be hitting external services with every page fetch, something that would yield little value to our users (slower fetches) and make us bad clients of other sites’ feeds.
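
To put a rough number on the savings, here is a back-of-the-envelope sketch. The helper name and the traffic figure are hypothetical, and the bound is per URL and per Ruby process, since the singleton cache lives in one process's memory; it also ignores concurrent requests that race on an expired entry:

```ruby
# Value taken from Kotoba_Rss::Cache above.
ENTRY_EXPIRES_IN_SECONDS = 300

# Upper bound on upstream fetches for one URL from one process in an hour:
# successive fetches are more than ttl_seconds apart, so round up.
def max_fetches_per_hour(ttl_seconds)
  (3600 + ttl_seconds - 1) / ttl_seconds
end

# Hypothetical traffic figure, purely for illustration.
feed_requests_per_hour = 10_000

upstream = max_fetches_per_hour(ENTRY_EXPIRES_IN_SECONDS)
puts upstream                           # 12
puts feed_requests_per_hour - upstream  # 9988 upstream requests avoided
```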

Author: Ward

