Tuesday, 2 July 2013

ActiveRecord breeds terrible programmers

I am starting to think the old saying "make a tool that any fool can use, and only a fool will use it" is starting to ring true for ActiveRecord.

Consider the following model relationships:

class User < ActiveRecord::Base
  has_many :subscriptions

class Subscription < ActiveRecord::Base
  has_many :payments
  belongs_to :user

class Payment < ActiveRecord::Base
  belongs_to :subscription

Extremely straight forward right. However, consider a situation where you want to give a user a list of all of their payments. A terrible solution might be to do payments = user.subscriptions.map(&:payments).flatten. Which is just a golfed version of what some would consider an "ok" solution, but when you think about it, is still bad (in HAML)

- if @user.subscriptions.count > 0
  %h2 User Payments
  - @user.subscriptions.each do |subscription|
    - subscription.payments.each do |payment|
      %li.reference= payment.reference

The problem with this solution, is that you are triggering a query for the count, and then the classic n+1 query issue for the subscription/payment relationship. Gems like bullet will help you find places in your code where your query issues can be improved, but why should a developer need a gem to show them something they should already know? Which is, make sure you know what SQL you are running at all times.

The solution to the above n+1 problem in ActiveRecord, as bullet will tell you, is to make sure you use includes(subscriptions: :payments) when you are setting the @user variable from a finder. But this is not always practical. Perhaps you have extracted the finder code out into a before_filter in your controller, especially if all actions require a @user.

Not to mention that User.includes(subscriptions: :payments).find(params[:id]) reads like complete crap. My main concern here is that ActiveRecord is hiding all this SQL generation away from developers, to the point where there are probably a huge amount of people calling themselves "developers" now that don't know a lick of SQL.

Using the above relationships, a common requirement in systems is to allow user objects to still be deleted, but not remove their payments as they are required for permanent record. So imagine a payment view with the following HAML:

- if @payment.subscription && @payment.subscription.user
  %h2 User details
  = render 'shared/_user', user: @payment.subscription.user

Here the developer probably doesn't know they are again triggering 2 more queries. And that's only 2 because of ActiveRecord's built in caching, it could be up to 5 in other ORMs. This non stop relationship chaining is what is causing developers to ignore what's happening at the database level.

For those playing along at home, the best way to get all of a user's payments in rails 4 is Payment.joins(:subscription).where('subscriptions.user_id = ?', user.id).references(:subscription) but putting all that in your controller is not recommended, and putting it into a model method makes it harder to customise the includes() that you may need to make your SQL more efficient.

Are you saving any code by writing that instead of select p.* from payments p join subscriptions s on (s.id = p.subscription_id) where s.user_id = ?? At least if you do it this way, you know exactly what is going on at the database layer.


  1. Try adding to User

    has_many :payments, through: :subscriptions

  2. I do somewhat agree with your post, but the funny thing is, that sooner or later you will inevitably add the user_id to the payments table as well (joins are expensive and they will bite at some point). Then the query simply becomes Payments.where('user_id = ?', user_id)

  3. @Anonymous, true, but I was merely trying to demonstrate that programmers are becoming increasingly ignorant to the queries they trigger, and I blame ActiveRecord for that.

    @Matt, adding a database column for efficiency based on what is essentially some view logic (displaying all of a user's payments in a single hit rather than grouped by subscription) seems illogical to me. With indexes in the right places, denormalizing the database in the way you have suggested should be no more efficient.

    I think this post leads nicely into an old post I wrote regarding "view models" which is located here: http://blog.tigris.id.au/2012/07/mvc-is-dead-not.html

    Basically a view such as the user payments controller in this example, is going to setup a few variables for view to use, and if you want to do those queries the most efficient way possible, that is really something quite specific to this particular view. If there was a model with an initialize method that pre-defined all required queries, this would be easily testable and easy to locate all the queries used.

  4. As I said on twitter: Who cares?

    Is the query causing you performance problems? Optimize it. Is it not causing performance problems? Move on to the next task.

    If a developer doesn't know SQL, they're gonna have a really bad time with AR anyway: any kind of complex join & conditional & table creation requires SQL knowledge. AR is actually one of the better ORMs for allowing you to callout to SQL when you need to, compared to some of the more restrictive everything-is-an-object ones on Python & Java.

    And if the developer doesn't have those kinds of complex situations, they're probably not gonna come up against those performance issues. If they do, anyway, then they're gonna have a great reason to learn SQL & how databases work.

  5. Thank you for the useful article.
    When it comes to data security I always suggest keeping employees informed of how to deal with important data in order to prevent a data breach. The statistics say that more than 50% of data breaches are caused by people. For better security I would use virtual data rooms.