Friday 26 July 2013

More than you ever wanted to know about email

So recently one of our CushyCMS users informed us that "welcome" emails were getting picked up by gmail as spam. I checked around a few other users that don't use gmail, and they seemed fine. The generic message from gmail was "other users who received emails like this one reported them as spam". Surely gmail doesn't take the popular word of the masses and mark certain emails as spam for everyone just because 100 other people marked it as such? Are we supposed to inform users that in order to receive emails from us they need to add us to their gmail contact list, before they even sign up?

Alas, I am forced into researching the intricacies of email headers. It turns out that gmail message is indeed just a "generic" one, and not surprisingly, it doesn't exactly work like that. In any case, I found some nice articles around the tubes that lead you in the right direction to making sure your email servers are providing all the relevant headers required for stricter spam filters like gmail.

There are of course the obvious things you need to do, which I have always known about and always done. They are making sure your reverse DNS works, make sure you aren't open relay and make sure the mailbox in the "From", "Reply-To" and "Sender" fields all exist, since bounces will go there and spammers hate it when you don't accept bounces. They can just go to /dev/null locally for all you care, spam filters don't care what you do with them as long as you don't reject them.

Up until now, the above has worked perfectly fine for me on pretty much everything I've built. I guess once you start generating a lot more email, more stuff is gonna start getting checked. This is where I discovered there is a whole lot more to spam fighting. There are 4 basic protocols that people suggest you implement. They sounded daunting at first, but they in fact ended up being a breeze once I got my head around it. I'm sure there are probably more, but these 4 seem to get most people going fine.

I'll start with SPF (Sender Policy Framework). This is quite simply, adding a record to your DNS that lists all the IP addresses of servers allowed to send email for your host. This started out as a TXT record in your zone file, but has evolved and SPF is now a valid record type. But not all DNS hosts allow you full access to your zone file, I know no-ip.com will only allow you to enter the TXT record and not the SPF one. For this reason, mail servers tend to check all the TXT records looking for ones that are in fact SPF records, as well as checking the SPF records themselves. Luckily I use gandi.net who allow you to edit the raw zone file, but it's still recommended you have both the TXT and SPF records. I hate duplication, but easy enough to do, wait for propagation and you're done.

Next was Sender ID, a lovely Microsoft invention. SPF actually evolved from this, but a lot of spam filters still use it, and since it's almost identical to SPF, we've already done the hard yards so easy enough to add it at the exact same time. This is simply another TXT record in the DNS. With even the same syntax (well, for the parts of SPF we are using anyway. SPF has more functionality, but we don't need it)

Lastly comes DKIM (Domain Keys Identified Mail). Much like the SPF/Sender ID relationship, once you have done the hard yards with DKIM, you also get the Yahoo invention Domain Keys at little cost. This is basically PGP for email. You will generate a public/private key pair and put your public key in your DNS under the hostname xxx._domainkeys.yourdomain.com where xxx is some identifier you will use in the DKIM headers. Most guides will just use "mail". Outgoing mail will use the private key to add an encrypted message to the header of the email. Receiving servers will see the header, grab your public key from DNS records and decrypt that message in the header to verify it really came from your server (the email itself is not actually encrypted, just one of the headers). A neat little idea really, and luckily there are plenty of guides for setting up the "opendkim" package on linux based servers, as well as in my case, a chef recipe that required no configuraion at all. Happy days.

The tricky part for me though was IPv6. For whatever reason, outgoing mail from my servers seem to alternate between the IPv6 address and the IPv4. When gmail was receiving an email from the IPv4 address, everything was hunky dory. But alternate times it got it from the IPv6 and couldn't verify any of the authenticity. Solution was to duplicate ALL of the above steps I'd done for IPv4, with IPv6. This requires adding AAAA records everywhere you have an A record. Making sure all your SPF/TXT records that mention your IPv4 address also mention your IPv6. And most importantly, setup reverse DNS for your IPv6.

Who knew email was so complicated and so tightly tied to your DNS. I guess we have the spammers out there to blame for all this. Thanks guys!

Tuesday 2 July 2013

ActiveRecord breeds terrible programmers

I am starting to think the old saying "make a tool that any fool can use, and only a fool will use it" is starting to ring true for ActiveRecord.

Consider the following model relationships:

class User < ActiveRecord::Base
  has_many :subscriptions
end

class Subscription < ActiveRecord::Base
  has_many :payments
  belongs_to :user
end

class Payment < ActiveRecord::Base
  belongs_to :subscription
end


Extremely straight forward right. However, consider a situation where you want to give a user a list of all of their payments. A terrible solution might be to do payments = user.subscriptions.map(&:payments).flatten. Which is just a golfed version of what some would consider an "ok" solution, but when you think about it, is still bad (in HAML)

- if @user.subscriptions.count > 0
  %h2 User Payments
  %ul
  - @user.subscriptions.each do |subscription|
    - subscription.payments.each do |payment|
      %li.reference= payment.reference


The problem with this solution, is that you are triggering a query for the count, and then the classic n+1 query issue for the subscription/payment relationship. Gems like bullet will help you find places in your code where your query issues can be improved, but why should a developer need a gem to show them something they should already know? Which is, make sure you know what SQL you are running at all times.

The solution to the above n+1 problem in ActiveRecord, as bullet will tell you, is to make sure you use includes(subscriptions: :payments) when you are setting the @user variable from a finder. But this is not always practical. Perhaps you have extracted the finder code out into a before_filter in your controller, especially if all actions require a @user.

Not to mention that User.includes(subscriptions: :payments).find(params[:id]) reads like complete crap. My main concern here is that ActiveRecord is hiding all this SQL generation away from developers, to the point where there are probably a huge amount of people calling themselves "developers" now that don't know a lick of SQL.

Using the above relationships, a common requirement in systems is to allow user objects to still be deleted, but not remove their payments as they are required for permanent record. So imagine a payment view with the following HAML:

- if @payment.subscription && @payment.subscription.user
  %h2 User details
  = render 'shared/_user', user: @payment.subscription.user


Here the developer probably doesn't know they are again triggering 2 more queries. And that's only 2 because of ActiveRecord's built in caching, it could be up to 5 in other ORMs. This non stop relationship chaining is what is causing developers to ignore what's happening at the database level.

For those playing along at home, the best way to get all of a user's payments in rails 4 is Payment.joins(:subscription).where('subscriptions.user_id = ?', user.id).references(:subscription) but putting all that in your controller is not recommended, and putting it into a model method makes it harder to customise the includes() that you may need to make your SQL more efficient.

Are you saving any code by writing that instead of select p.* from payments p join subscriptions s on (s.id = p.subscription_id) where s.user_id = ?? At least if you do it this way, you know exactly what is going on at the database layer.