Tuesday 28 July 2015

Windows, why u so hard to install?

Forgive me father, for I have sinned. It's been 2 years and 2 days since my last confession.

I purchased an Intel NUC to replace my outdated WD TV Live, which has no grunt, will never be able to play Full HD movies, and lacks support for newer codecs.

My initial intention was to install Kodibuntu or OpenELEC on it; since I've used XBMC in the past, I assumed I'd be familiar with it. Unfortunately OpenELEC's website is so terrible that you can't even download their software without the download constantly timing out, and Kodibuntu is based on LTS versions of Ubuntu, so the drivers are so out of date that the installer won't work properly on the brand new, top of the range hardware in my "just released" Intel i7 based NUC. (In particular, it's the Iris 6100 graphics chip causing the issues.)

So, I hadn't purchased Windows OEM or anything when I purchased the NUC, since I expected some form of Linux to be fine. But after 2 hours of stuffing around, I gave up. To the Microsoft Store! I should be able to purchase a Windows key online, download a USB bootable ISO and be away, right? RIGHT?

Well, I purchased Windows no problem. Nowhere on the purchase screen does it tell you that you need an existing Windows machine to be able to create the boot media. I loaded a "live chat" with an "answer tech" at the Microsoft Store. 30 minutes of "you need a Windows computer to create the DVD or USB"... "what if I don't have a Windows computer"... "find one, or install Windows on one"... "how do I install Windows on one?"... "create the boot media and install it"... "how do I create the boot media?"... "from a Windows computer". SHOOT ME.

So I found a really old Windows Vista DVD lying around and a really crappy motherboard/CPU in the cupboard doing nothing, and attached a power supply and monitor. Hell, I didn't even bother with a case. Installed Vista without activating. Ran the stupid "mediacreationtool.exe" file I had to run and voila. We have boot media.

OK, onto installing it on the NUC. Booting, happy days, enter product key, yep. All reminding me of my old Windows days. What have I become?

Oooh, here we go, partition time. It sees my brand new 128GB SSD, straight from the packaging into the NUC, formatted as "unknown". Oh what's this, Windows, you won't let me re-format it? OK, delete it. Yep, that worked. Create new partition. Hmm, nope, nothing happens? Try again. Same thing. Hmm.

OK, Google around. Other people saying it could be a corrupt SSD. Some say unplug it and plug it back in. Didn't work. So I took out the drive, plugged it into a Linux machine, manually partitioned it with GNU fdisk, and put it back in the NUC. Windows setup now likes my drive, yay. It can now delete and re-format partitions. Away we goooo. And it's installing.
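
For anyone stuck at the same screen, the Linux detour was roughly this (a sketch only: the device name /dev/sdb is an assumption, so check yours with lsblk first, and the keystrokes are the standard util-linux fdisk ones):

sudo fdisk /dev/sdb
#  o  - create a fresh, empty DOS partition table
#  n  - add a new primary partition, accepting the defaults for the whole disk
#  w  - write the table to disk and exit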

And booted. WINDOWS!

Sigh. Why was that so hard? AND so expensive. $149 for that experience, Microsoft? Puuuh-lease. Grand total of over 6 hours to get to that point. Un. Buh. Leavable.

Friday 26 July 2013

More than you ever wanted to know about email

So recently one of our CushyCMS users informed us that "welcome" emails were getting picked up by Gmail as spam. I checked with a few other users that don't use Gmail, and they seemed fine. The generic message from Gmail was "other users who received emails like this one reported them as spam". Surely Gmail doesn't take the popular word of the masses and mark certain emails as spam for everyone just because 100 other people marked them as such? Are we supposed to inform users that in order to receive emails from us they need to add us to their Gmail contact list, before they even sign up?

Alas, I was forced into researching the intricacies of email headers. It turns out that Gmail message is indeed just a generic one, and not surprisingly, it doesn't exactly work like that. In any case, I found some nice articles around the tubes that point you in the right direction for making sure your email servers are providing all the relevant headers required by stricter spam filters like Gmail's.

There are of course the obvious things you need to do, which I have always known about and always done: make sure your reverse DNS works, make sure you aren't an open relay, and make sure the mailboxes in the "From", "Reply-To" and "Sender" fields all exist, since bounces will go there and spammers hate it when you don't accept bounces. They can just go to /dev/null locally for all you care; spam filters don't care what you do with them as long as you don't reject them.
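
If you just want those bounces swallowed, a sendmail/Postfix-style /etc/aliases entry does the trick (a sketch; the address names are made up):

# /etc/aliases - accept the bounces, then throw them away
noreply: /dev/null
bounces: /dev/null

Run newaliases afterwards so the change takes effect.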

Up until now, the above has worked perfectly fine for me on pretty much everything I've built. I guess once you start generating a lot more email, more stuff is gonna start getting checked. This is where I discovered there is a whole lot more to spam fighting. There are 4 basic protocols that people suggest you implement. They sounded daunting at first, but they ended up being a breeze once I got my head around them. I'm sure there are probably more, but these 4 seem to get most people going fine.

I'll start with SPF (Sender Policy Framework). This is, quite simply, adding a record to your DNS that lists all the IP addresses of servers allowed to send email for your host. This started out as a TXT record in your zone file, but things have evolved and SPF is now a valid record type in its own right. Not all DNS hosts give you full access to your zone file though; I know no-ip.com will only allow you to enter the TXT record and not the SPF one. For this reason, mail servers tend to check all the TXT records looking for ones that are in fact SPF records, as well as checking the SPF records themselves. Luckily I use gandi.net, who allow you to edit the raw zone file, but it's still recommended you have both the TXT and SPF records. I hate duplication, but it's easy enough to do; wait for propagation and you're done.
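
For reference, the pair of records ends up looking something like this (a sketch with a made-up domain and a documentation IP address; adjust the mechanisms to match your own senders):

; zone file snippet - the same value published as both TXT and SPF
example.com.  IN TXT  "v=spf1 a mx ip4:203.0.113.25 -all"
example.com.  IN SPF  "v=spf1 a mx ip4:203.0.113.25 -all"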

Next was Sender ID, a lovely Microsoft invention that was derived from SPF. A lot of spam filters still use it, and since it's almost identical to SPF, we've already done the hard yards, so it's easy enough to add at the exact same time. This is simply another TXT record in the DNS, with basically the same syntax (well, for the parts of SPF we are using anyway; SPF has more functionality, but we don't need it).
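
The Sender ID version is just another TXT record with a different version tag on the front (again a sketch, same made-up domain and address):

example.com.  IN TXT  "spf2.0/mfrom,pra a mx ip4:203.0.113.25 -all"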

Lastly comes DKIM (DomainKeys Identified Mail). Much like the SPF/Sender ID relationship, once you have done the hard yards with DKIM, you also get the Yahoo invention DomainKeys at little cost. This is basically PGP for email. You generate a public/private key pair and put your public key in your DNS under the hostname xxx._domainkey.yourdomain.com, where xxx is a selector you will use in the DKIM headers; most guides will just use "mail". Outgoing mail uses the private key to add a signature to the headers of the email. Receiving servers see the header, grab your public key from your DNS records and verify the signature to confirm the mail really came from your server (the email itself is not encrypted at all, just signed). A neat little idea really, and luckily there are plenty of guides for setting up the "opendkim" package on Linux based servers, as well as, in my case, a chef recipe that required no configuration at all. Happy days.
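
Once opendkim has generated the key pair, the public half goes into DNS as another TXT record, something like this (a sketch; "mail" is the selector and the key value is truncated):

mail._domainkey.example.com.  IN TXT  "v=DKIM1; k=rsa; p=MIGfMA0GCSqGSIb3...rest-of-public-key..."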

The tricky part for me though was IPv6. For whatever reason, outgoing mail from my servers seems to alternate between the IPv6 address and the IPv4 one. When Gmail received an email from the IPv4 address, everything was hunky dory. But the other times it got it from the IPv6 address and couldn't verify any of the authenticity. The solution was to duplicate ALL of the above steps I'd done for IPv4, for IPv6. That means adding AAAA records everywhere you have an A record, making sure all your SPF/TXT records that mention your IPv4 address also mention your IPv6, and most importantly, setting up reverse DNS for your IPv6.
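
Concretely, the additions look something like this (a sketch using documentation addresses; the IPv6 reverse zone itself is usually delegated to you by your hosting provider):

; forward records for the mail host
mail.example.com.  IN A     203.0.113.25
mail.example.com.  IN AAAA  2001:db8::25

; SPF/TXT updated to list both addresses
example.com.  IN TXT  "v=spf1 a mx ip4:203.0.113.25 ip6:2001:db8::25 -all"

; plus a PTR for 2001:db8::25 under ip6.arpa pointing back at mail.example.com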

Who knew email was so complicated and so tightly tied to your DNS? I guess we have the spammers out there to blame for all this. Thanks guys!

Tuesday 2 July 2013

ActiveRecord breeds terrible programmers

I am starting to think the old saying "make a tool that any fool can use, and only a fool will use it" is ringing true for ActiveRecord.

Consider the following model relationships:

class User < ActiveRecord::Base
  has_many :subscriptions
end

class Subscription < ActiveRecord::Base
  has_many :payments
  belongs_to :user
end

class Payment < ActiveRecord::Base
  belongs_to :subscription
end


Extremely straightforward, right? However, consider a situation where you want to give a user a list of all of their payments. A terrible solution might be payments = user.subscriptions.map(&:payments).flatten, which is just a golfed version of what some would consider an "ok" solution, but which, when you think about it, is still bad (in HAML):

- if @user.subscriptions.count > 0
  %h2 User Payments
  %ul
    - @user.subscriptions.each do |subscription|
      - subscription.payments.each do |payment|
        %li.reference= payment.reference


The problem with this solution is that you are triggering a query for the count, and then the classic n+1 query issue for the subscription/payment relationship. Gems like bullet will help you find places in your code where your queries can be improved, but why should a developer need a gem to show them something they should already know? Which is: know what SQL you are running at all times.

The solution to the above n+1 problem in ActiveRecord, as bullet will tell you, is to make sure you use includes(subscriptions: :payments) when you are setting the @user variable from a finder. But this is not always practical. Perhaps you have extracted the finder code out into a before_filter in your controller, especially if all actions require a @user.
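
If you do go the includes() route via a filter, it ends up as something like this (a sketch; the controller and filter names are made up):

class PaymentsController < ApplicationController
  before_filter :find_user   # before_action in later Rails versions

  private

  def find_user
    # eager load the whole chain so the view doesn't fire n+1 queries
    @user = User.includes(subscriptions: :payments).find(params[:id])
  end
end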

Not to mention that User.includes(subscriptions: :payments).find(params[:id]) reads like complete crap. My main concern here is that ActiveRecord is hiding all this SQL generation away from developers, to the point where there are probably a huge number of people calling themselves "developers" now who don't know a lick of SQL.

Using the above relationships, a common requirement is to allow user objects to be deleted without removing their payments, since those are required as a permanent record. So imagine a payment view with the following HAML:

- if @payment.subscription && @payment.subscription.user
  %h2 User details
  = render 'shared/user', user: @payment.subscription.user


Here the developer probably doesn't know they are again triggering 2 more queries. And that's only 2 because of ActiveRecord's built-in caching; it could be up to 5 in other ORMs. This non-stop relationship chaining is what causes developers to ignore what's happening at the database level.

For those playing along at home, the best way to get all of a user's payments in Rails 4 is Payment.joins(:subscription).where('subscriptions.user_id = ?', user.id).references(:subscription), but putting all that in your controller is not recommended, and putting it into a model method makes it harder to customise the includes() that you may need to make your SQL more efficient.

Are you saving any code by writing that instead of select p.* from payments p join subscriptions s on (s.id = p.subscription_id) where s.user_id = ?? At least if you do it this way, you know exactly what is going on at the database layer.
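
If you do wrap the ActiveRecord version up in the model anyway, despite the includes() caveat above, it's basically a one-line scope, though note how little it actually hides (a sketch; the scope name is made up):

class Payment < ActiveRecord::Base
  belongs_to :subscription

  # roughly: SELECT payments.* FROM payments
  #          INNER JOIN subscriptions ON subscriptions.id = payments.subscription_id
  #          WHERE subscriptions.user_id = ?
  scope :for_user, ->(user) {
    joins(:subscription)
      .where('subscriptions.user_id = ?', user.id)
      .references(:subscription)
  }
end

# usage: Payment.for_user(user)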

Tuesday 18 September 2012

Weirdest problem you ever had to solve?

A common question in job interviews is "What is the weirdest problem you ever had to solve?". Every time I am asked this, I can never think of something on the spot. I vaguely remember debugging things like nutty mod_perl caching that persisted through Apache restarts, and issues with variables ending up containing bits of other variables due to memory overflows and such.

But today, I am pretty sure I discovered the weirdest thing I have ever seen. FTP servers blocking my connection attempts, AFTER letting me put in the password.

In migrating my application to a new machine, located on Linode in the Fremont data center, I was obviously allocated a new IP address. Everything seemed hunky dory for a day or so. Then I started noticing an increasing number of users saying that FTP connections from that IP address to their web host were being rejected with a failed password. This is nothing new; failed passwords are extremely common for my application. But the rate at which these complaints were coming in was becoming hard to ignore.

So I started debugging. I grabbed a short list of 5 of the supposedly bad credentials and tested them from the new server, and sure enough, I got a bad password error. Not a "connection refused" error or anything like that; all was normal in the connection process, the FTP server headers appeared, the username was requested and given, etc. I then went over to the old server, and a few other random servers I had access to. All connections were absolutely fine from these other servers.

Now I was completely confused. If web hosts out there were blocking my new IP, why weren't they blocking at the network level with a "host unreachable" or "connection refused" response? Why let it connect only to reject the password? This would imply the blocking is being done in the FTP server software itself, which just doesn't sit right with me.

In any case, I could not figure it out. I asked the kind folk at Linode to change my IP address and they obliged. All FTP connections are now working perfectly again. A few customers will be annoyed at having to update their firewalls with new IP addresses twice in the last few days, but it was that or have a whole bunch of users not even able to connect.

What could it be? Surely the IP is not blocked at the FTP server level. Surely not that fast, after only setting up the machine 5 days ago. Perhaps that IP previously belonged to some hax0r and was already blocked, but as mentioned, why block in the FTP server instead of with iptables? Hopefully I haven't been MITM'd somewhere along the way!

Edit: The second IP Linode gave me started experiencing the same problems after a day or 2. The end result was a few web hosts out there blocking my IP after supposed "suspect" activity. Connections work from other servers as a once-off, but prolonged use from my server is what causes them to ultimately block the IP in the weird manner mentioned originally. The resolution is that affected clients will need to whitelist my IP in the FTP section of their web hosting control panel.

Monday 3 September 2012

MySQL to PostgreSQL database transition

There are quite a few blog posts around about how to convert your database from MySQL to PostgreSQL. I feel none of them really cover the entirety of the problem; I think the majority of people are dealing with small datasets or simple data and don't see the issues I had. So I figured I'd document my findings. I'll leave the arguments about why you should use one database or another for a different time.

First you'll want to get your data out of MySQL. Personally, I found it faster/easier to just rewrite my entire database schema from scratch for Postgres. You will need to be careful with datatypes to make sure the data will match up OK. This is where I hit my first problem: MySQL doesn't have a real boolean type, it just uses a tinyint with the data being 1 and 0. Since we will be dumping the raw data later, it means the corresponding Postgres column will have to be a smallint for now (you can change it once the data is in).
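
Once the rows are in, converting that column to a real boolean on the Postgres side is a one-liner (a sketch; the table and column names are made up):

-- after the import: turn the temporary smallint column into a proper boolean
ALTER TABLE users ALTER COLUMN is_active TYPE boolean USING is_active <> 0;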

Once you have your schema written up, it's time to dump the data. This is where I hit problem number 2: no matter what the default charset is on your tables, the data will be dumped using the connection's default character set. I assumed the MySQL tables were utf8 and the Postgres tables were utf8, good 'nuff. Wrong. You'll want to dump with --default-character-set=utf8 to make sure.

Since we already wrote our schema, we can dump with --skip-create-info. Some other handy options, so you don't lock up your MySQL tables or run out of memory, are --skip-opt --quick --single-transaction. And there are others just so you don't get bloat in your dump file: --skip-comments --skip-dump-date.

Now, the main dump args you want, to make sure the output is importable by Postgres, are --skip-add-locks --complete-insert --compatible=postgres.
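
Putting all of that together, the dump command ends up looking roughly like this (a sketch; the database name and output file are made up, and the flags are just the ones discussed above):

mysqldump --default-character-set=utf8 --skip-create-info \
  --skip-opt --quick --single-transaction --skip-comments --skip-dump-date \
  --skip-add-locks --complete-insert --compatible=postgres \
  mydatabase > mydatabase.sql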

Onto problem number 3: MySQL uses backslash as an escape char in its dump data, escaping a whole bunch of stuff like tabs, newlines etc., when it doesn't really need to. At first I thought I just needed to fix the escaped quotes, replacing \' with ''. This allowed me to fully insert the data and I thought I was done. But after a while, I noticed all the extra backslashes in my data. Ouch. I thought I was going to have to get my sed programming hat on and replace every single escaped MySQL character (there are lots). But alas, thanks to some dudes in the PostgreSQL IRC channel, all you need is PGOPTIONS='-c standard_conforming_strings=off -c escape_string_warning=off'.
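
That environment variable just needs to be set for the session doing the import, e.g. (a sketch; the database and file names are made up):

PGOPTIONS='-c standard_conforming_strings=off -c escape_string_warning=off' \
  psql -d mydatabase -f mydatabase.sql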

Now we're good, right? We have our huge data file, the escape chars are taken care of, the encoding is fixed. Nope. Remember the datatypes I referred to earlier? Turns out Postgres doesn't allow the null character (not the NULL value) in text or varchar fields. For any of your data that contains the null char, you are going to need to change that field to a bytea. And then guess what: your database dump is now useless, as Postgres can't nicely import that "string" content into a bytea.

So make a call: do you really care about all your null characters? If you do, you're gonna have to find another way to import your data; if you don't, strip them out with perl or similar, making sure to use a positive lookbehind so you strip \0 but not \\0 (escaped backslash followed by a literal zero). Why perl? Cos sed doesn't support lookarounds.
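
The strip itself ends up as a one-liner along these lines (a sketch; the file names are made up, and it only covers the common case described above):

perl -pe 's/(?<=[^\\])\\0//g' mydatabase.sql > mydatabase-clean.sql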

All in all, it's a bit of a nightmare to keep all your data intact; I hope your transition goes smoother than mine.

Friday 3 August 2012

The future of gaming

A while back I was having a discussion about the future of PC gaming, and there were arguments that it was on the improve due to the release of StarCraft II and the take-up of professional gaming in America (it's been huge in South Korea for a long time). However, my argument was not about the amount of money PC games might bring in for development companies, but that the average age of PC gamers was only getting higher. In 2011 the average age reported was 37, and that figure had simply climbed in step with the calendar over the previous 5 years, i.e. in 2006 the average age was about 32. The only conclusion you can draw from this is that it's the same people buying/playing PC games now that bought them 5 years ago. The industry isn't gaining any new customers.

The reasoning behind this, 5 years ago, was (in my insignificant opinion) the increase in console gaming. The Xbox 360 and PS3 had just come out, and young 20-somethings would rather sit on the couch and play online against randoms, with the server-side matchmaking/ranking facilities that Sony/Microsoft made available. Something the PC gaming world severely lacked. Some games implemented it, such as the arena rankings in WoW, but it was left up to individual game developers rather than being a platform-wide system.

Console gaming has now dropped off massively, and PC gaming is smashing consoles in sales. Surely this is due to the outdated hardware of the PS3 et al. But where are these gamers going? Not to the PC, as the average age and sales have not changed much.

Enter mobile gaming. The same mob that did the survey last year did the study again this year and decided they had no choice but to start including mobile gaming. As such, the average "gamer" age has dropped dramatically, to 30. Little Johnny from grade 5 can't afford $100 for the new Modern Warfare, but he can afford $1.99 for Angry Birds. Will mobile gaming mean the death of PC gaming? I highly doubt it, since existing PC gamers aren't going anywhere. But I do worry about the future of PC gaming if all these teens playing Song Pop aren't switching over to Guild Wars any time soon. In 10 years' time, will the average age of a PC gamer be 47?

It's hard to say, but mobile gaming is clearly having an effect on PC gaming, much like console gaming did. 5 years from now, will something kill mobile gaming the same way it killed console gaming? Maybe virtual reality gaming is making a comeback; if John Carmack makes it, I'll play it.

In the meantime, grab your console, whack it behind your door cos it's only good as a doorstop, delete the games off your mobile cos they aren't really "games", install Steam and check out what real games are all about.

Tuesday 3 July 2012

MVC is dead..... NOT

So I stumbled across a post on the twitterverse claiming that "MVC is dead". After a bit of hacker talk with Iordy, it was discovered there was already a heated discussion going on over on Hacker News. It seems even in the HN thread people get confused and start contradicting themselves.

The original article lost me pretty early on when it said "you end up stuffing too much code into your controllers, because you don't know where else to put it". If you're stuffing code in your controllers, then you're doing it wrong already.

The problem here stems from Rails programmers thinking the M in MVC is tightly coupled to a database table (via ActiveRecord). You can in fact have as many models as you want; not all of them have to be database tables, and you can even inherit from existing models that are database tables if you want to.

To expand on that theory: on the last few projects I've been a part of, we've sort of evolved into what I will call SMVC. We took the "database" side of things out of the model and put it into "schemes". Your scheme does all the get/set work as well as the serialize/de-serialize. The model then inherits from the scheme and contains all the business logic but none of the "easy" serialization stuff.

Your controller in this case should not have any business logic. If you want to do certain things depending on the request/environment, you can pass that info to the model via initialize and handle it in the model. Since you decoupled your scheme, you can easily add new accessors/mutators or just private instance variables that you can use later, without affecting ActiveRecord in any way. And since you inherited from ActiveRecord, you can overload any of the update/save methods you want and check your initial request/environment settings before calling super if you need to.
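
A rough sketch of what that split looks like (all names here are made up, and it assumes ActiveRecord is the persistence layer with no STI "type" column on the table):

# The "scheme": persistence and (de)serialization only.
class UserScheme < ActiveRecord::Base
  self.table_name = 'users'
end

# The model: business logic plus any request/environment specific state.
class User < UserScheme
  attr_reader :request_ip   # per-request info, not a database column

  # the controller passes request/environment info in via initialize
  def initialize(attributes = nil, *args)
    @request_ip = attributes.delete(:request_ip) if attributes.is_a?(Hash)
    super
  end

  def save(*args)
    # check the request/environment settings before handing off to ActiveRecord
    return false if request_ip && suspicious_ip?(request_ip)
    super
  end

  private

  def suspicious_ip?(ip)
    false   # hypothetical placeholder for a real business rule
  end
end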

There is also not much need for fixtures when you are testing: since you still have access to the raw scheme, you can easily manage database data through those objects, bypassing HTTP-specific environments if you ever need to.

Looking back at the original "MOVE" article above, the SMVC example here is pretty much the same thing. It's just that the original article used the term "model" when referring to the serialization of data, probably because they thought that's just what models do. They simply "evolved" into the "MOVE" pattern because they were using MVC wrong in the first place. I.e. MVC is not dead, you're just doing it wrong; in fact, your new pattern is MVC renamed and ever so slightly abstracted.

Now that we've focused on "too much logic in your controller", I'd like you to think about how much logic you have in your views. If it's way too much, I encourage you to check out MVVM.