asemanfar - a blog about programming

Getting rid of that extra ActiveRecord fluff

March 15, 2008

On the web app I'm currently working on, I needed to get all the restaurants that a user was associated with either through visiting it or rating it. My original solution is shown below.

   1  from_outings = outings.find(:all, :include => {:restaurant => :ratings}).map {|o| o.restaurant}
   2  from_ratings = ratings.find(:all, :include => :restaurant).map{|r| r.restaurant}
   3  (from_ratings + from_outings).uniq

Here's what's going on here:

  1. It runs a query that will return all the visits (outings) the current user has visited.
  2. It creates an object for every Outing, Restaurant, and Rating returned from that query.
  3. Iterates through each Outing and collects the Restaurant object associated with each (there may be repeats.)
  4. Then it runs a query that gets all the ratings the particular user has made and the restaurants they were for.
  5. It takes the results from the previous step and creates Rating and Restaurant objects for each result returned (even if we have a Rating or Restaurant object for this particular record from the previous step.)
  6. Once again, iterates through the Rating objects and collects the Restaurant from each.
  7. Finally, combines both lists of Restaurant objects and takes out duplicates.

That's 2 queries and tons of unused Outing objects and possible repeated Restaurant and Rating objects.

After writing a custom SQL to just get all the Restaurants that I wanted in 1 query, I was still left with having to load the ratings for each restaurant which results in 1 + N queries which is worse than my 2 queries from above.

Luckily, I was able to get ActiveRecord to play nicely with my custom SQL and load the Rating objects from the joins I was already doing and came up with this.

   1  Restaurant.find(:all, :include => :ratings, :from => "`restaurants` LEFT OUTER 
   2    JOIN (`outings` INNER JOIN `attending_users` ON (`attending_users`.`outing_id` = `outings`.`id`))
   3    ON (`outings`.`restaurant_id` = `restaurants`.`id`)", 
   4    :conditions => ["(`ratings`.`user_id` = ? AND `ratings`.`restaurant_id` = `restaurants`.`id`) 
   5    OR (`attending_users`.`user_id` = ? AND `outings`.`restaurant_id` = `restaurants`.`id`)", self.id, self.id], 
   6    :order => "`outings`.`date` DESC")

It's long, but it seems to work fine. It doesn't create any objects I don't need, and it doesn't create any duplicate Restaurant objects.

I'm not 100% sure if it's logically equivalent to what I'm trying to do. I'm not sure if it will load all the Rating objects for a given restaurant (including the ones not by the current user.)

Any constructive criticism would be greatly appreciated.

Update:

After messing around some more with it, I came up with this. It's faster and it's more correct. Still got some work to do to make sure it's doing what I want.

   1  Restaurant.find(:all, :include => :ratings, 
   2    :from => "(select ratings.restaurant_id from ratings where ratings.user_id = #{self.id.to_i}) as m1, 
   3      (select outings.restaurant_id from outings inner join attending_users on (attending_users.outing_id = outings.id) 
   4      where attending_users.user_id = #{self.id.to_i}) as m2, restaurants", 
   5    :conditions => ["restaurants.id in (m1.restaurant_id) or 
   6      restaurants.id in (m2.restaurant_id)"], :order => "restaurants.name DESC")

Update 2: Changed it to a union, so it now looks like:

   1  Restaurant.find(:all, :include => :ratings, 
   2    :from => "((select ratings.restaurant_id from ratings where ratings.user_id = #{self.id.to_i}) union 
   3      (select outings.restaurant_id from outings inner join attending_users on (attending_users.outing_id = outings.id) 
   4      where attending_users.user_id = #{self.id.to_i})) as m1, restaurants", 
   5    :conditions => ["restaurants.id in (m1.restaurant_id)"], :order => "restaurants.name DESC")

Comments

posted by Rohith on 03/18/08 05:15 PM PDT

Wouldn't it make more sense to de-normalize the attending_users table to include the restaurant_id so that you can do something along the lines of:

   1  "select restaurant_id from attending_users where user_id = 'blah'"
   2    # union it with
   3    "select restaurant_id from ratings where user_id = 'blah'"
   4    # and then doing a uniq?

I am guessing that with a fair amount of rows in the rating and outing table, 2 separate small queries will be better than one gnarly sub-query.

All this logic could be nicely wrapped in

   1  Restaurant::find_related_to(user)
posted by Arya Asemanfar on 03/18/08 05:29 PM PDT

I recently changed it to be a union of m1 and m2 instead of doing that as two separate queries. It's only doing the selection of the restaurant_id once so it's not repeating the sub-query multiple times. De-normalization is a possibility in the future when the query attending_user x outing join starts becoming slow (not that it should since it's indexed) starts to become slow, I don't want to make it ugly quite yet.

Thanks for the reply, I didn't think of de-normalizing it until you mentioned it.


Leave a Comment