Getting rid of that extra ActiveRecord fluff
March 15, 2008On the web app I'm currently working on, I needed to get all the restaurants that a user was associated with either through visiting it or rating it. My original solution is shown below.
1 from_outings = outings.find(:all, :include => {:restaurant => :ratings}).map {|o| o.restaurant} 2 from_ratings = ratings.find(:all, :include => :restaurant).map{|r| r.restaurant} 3 (from_ratings + from_outings).uniq
Here's what's going on here:
- It runs a query that will return all the visits (outings) the current user has visited.
- It creates an object for every Outing, Restaurant, and Rating returned from that query.
- Iterates through each Outing and collects the Restaurant object associated with each (there may be repeats.)
- Then it runs a query that gets all the ratings the particular user has made and the restaurants they were for.
- It takes the results from the previous step and creates Rating and Restaurant objects for each result returned (even if we have a Rating or Restaurant object for this particular record from the previous step.)
- Once again, iterates through the Rating objects and collects the Restaurant from each.
- Finally, combines both lists of Restaurant objects and takes out duplicates.
That's 2 queries and tons of unused Outing objects and possible repeated Restaurant and Rating objects.
After writing a custom SQL to just get all the Restaurants that I wanted in 1 query, I was still left with having to load the ratings for each restaurant which results in 1 + N queries which is worse than my 2 queries from above.
Luckily, I was able to get ActiveRecord to play nicely with my custom SQL and load the Rating objects from the joins I was already doing and came up with this.
1 Restaurant.find(:all, :include => :ratings, :from => "`restaurants` LEFT OUTER 2 JOIN (`outings` INNER JOIN `attending_users` ON (`attending_users`.`outing_id` = `outings`.`id`)) 3 ON (`outings`.`restaurant_id` = `restaurants`.`id`)", 4 :conditions => ["(`ratings`.`user_id` = ? AND `ratings`.`restaurant_id` = `restaurants`.`id`) 5 OR (`attending_users`.`user_id` = ? AND `outings`.`restaurant_id` = `restaurants`.`id`)", self.id, self.id], 6 :order => "`outings`.`date` DESC")
It's long, but it seems to work fine. It doesn't create any objects I don't need, and it doesn't create any duplicate Restaurant objects.
I'm not 100% sure if it's logically equivalent to what I'm trying to do. I'm not sure if it will load all the Rating objects for a given restaurant (including the ones not by the current user.)
Any constructive criticism would be greatly appreciated.
Update:
After messing around some more with it, I came up with this. It's faster and it's more correct. Still got some work to do to make sure it's doing what I want.
1 Restaurant.find(:all, :include => :ratings, 2 :from => "(select ratings.restaurant_id from ratings where ratings.user_id = #{self.id.to_i}) as m1, 3 (select outings.restaurant_id from outings inner join attending_users on (attending_users.outing_id = outings.id) 4 where attending_users.user_id = #{self.id.to_i}) as m2, restaurants", 5 :conditions => ["restaurants.id in (m1.restaurant_id) or 6 restaurants.id in (m2.restaurant_id)"], :order => "restaurants.name DESC")
Update 2: Changed it to a union, so it now looks like:
1 Restaurant.find(:all, :include => :ratings, 2 :from => "((select ratings.restaurant_id from ratings where ratings.user_id = #{self.id.to_i}) union 3 (select outings.restaurant_id from outings inner join attending_users on (attending_users.outing_id = outings.id) 4 where attending_users.user_id = #{self.id.to_i})) as m1, restaurants", 5 :conditions => ["restaurants.id in (m1.restaurant_id)"], :order => "restaurants.name DESC")
Comments
Wouldn't it make more sense to de-normalize the attending_users table to include the restaurant_id so that you can do something along the lines of:
I am guessing that with a fair amount of rows in the rating and outing table, 2 separate small queries will be better than one gnarly sub-query.
All this logic could be nicely wrapped in
I recently changed it to be a union of m1 and m2 instead of doing that as two separate queries. It's only doing the selection of the restaurant_id once so it's not repeating the sub-query multiple times. De-normalization is a possibility in the future when the query attending_user x outing join starts becoming slow (not that it should since it's indexed) starts to become slow, I don't want to make it ugly quite yet.
Thanks for the reply, I didn't think of de-normalizing it until you mentioned it.
Leave a Comment