Tuesday 18 September 2012

Weirdest problem you ever had to solve?

A common question in job interviews is "What is the weirdest problem you ever had to solve?". Every time I am asked this, I can never think of something on the spot. I vaguely remember debugging things like nutty mod_perl caching through apache restarts, issues with variables ending up with some info from other variables due to memory overflows and such.

But today, I am pretty sure I discovered the weirdest thing I have ever seen. FTP servers blocking my connection attempts, AFTER letting me put in the password.

In migrating my application to a new machine, located on Linode in the Fremont data center, I was obviously allocated a new IP address. Everything seemed hunky dory for a day or so. Then I started noticing an increasing number of users saying that FTP connections from that IP address to their webhost were being rejected with password failed. This is nothing new, failed passwords are extremely common for my application. But the rate at which these complaints were coming in were becoming hard to ignore.

So I started debugging. I grabbed a short list of 5 of the supposed bad credentials and tested them from the new server and sure enough, I got a bad password error. Not a "connection refused" error or anything like that, all was normal in the connection process, the FTP server headers appeared, the username was requested and given, etc. I then went over to the old server, and a few other random servers I had access to. All connections were absolutely fine from these other servers.

Now I was completely confused. If web hosts out there were blocking my new IP, why weren't they blocking at the network level with a "host unreachable" or "connection refused" response. Why let it connect only to reject the password. This would imply the blocking is being done in the FTP server software itself, which just doesn't sit right with me.

In any case, I could not figure it out. I asked the kind folk at Linode to change my IP address and they obliged. All FTP connections are now working perfectly again. A few customers will be annoyed at having to update their firewalls with new IP addresses twice in the last few days, but it was that or have a whole bunch of users not even able to connect.

What could it be? Surely the IP is not blocked at the FTP server level. Surely not that fast after only just setting up the machine 5 days ago. Perhaps that IP previously belonged to some hax0r and was already blocked, but as mentioned, why block with FTP servers instead of iptables. Hopefully I haven't been MITM'd somewhere along the way!

Edit: The second IP Linode gave me started experiencing the same problems after a day or 2. The end result was a few web hosts out there blocking my IP after supposed "suspect" activity. Connections work from other servers as a once off, but prolonged use from my server is what causes them to ultimately block the IP in the weird manner mentioned originally. The resolution is that affected clients will need to whitelist my IP in the FTP section of their web hosting control panel.

Monday 3 September 2012

MySQL to PostgreSQL database transition

There are quite a few blog posts around about how to convert your database from mysql to postgresql. I feel none of them really cover the entirety of the problem, I think the majority of people are dealing with small datasets or simple data and don't see the issues I had. So I figured I'd document my findings. I'll leave the arguments about why you should use one database or another for a different time.

First you'll want to get your data out of mysql. Personally, I found it faster/easier to just rewrite my entire database schema from scratch for postgres. You will need to be careful with datatypes to make sure the data will match up ok. This is where I hit my first problem, mysql doesn't have a boolean type, it uses a tinyint with the data being 1 and 0. Since we will be dumping the raw data later, it means your postgres database will have to be also a tinyint for now (you can change it once the data is in).

Once you have your schema written up, it's time to dump the data. This is where I hit problem number 2. No matter what your default charset is on your tables, the data will be dumped with the database default. I assumed mysql tables were utf8, postgres tables were utf8, good 'nuff. Wrong. You'll want to dump with --default-charset=utf8 to make sure.

Since we already wrote our schema, we can dump with --skip-create-info. Some other handy options you will want so you don't lock your mysql or run out of memory are --skip-opt --quick --single-transaction. And there are others just so you don't get bloat in your dump file, which are --skip-comments --skip-dump-date.

Now the main dump args you want to make sure it's importable by postgres are --skip-add-locks --complete-insert --compatible=postgres.

Onto problem number 3, mysql uses backslash as an escape char in it's dump data, escaping a whole bunch of stuff like tabs, newlines etc, when it doesn't really need to. At first I thought I just needed to fix the escaped quotes of \' and replace with ''. This allowed me to fully insert the data and I thought i was done. But after a while, I noticed all the extra backslashes in my data. Ouch. I thought I was going to have to get my sed programming hat on and replace every single escaped mysql character (there are lots). But alas, thanks to some dudes in the postgresql IRC channel, all you need is PGOPTIONS='-c standard_conforming_strings=off -c escape_string_warning=off'.

Now we're good right? We have our huge data file, the escape chars are taken care of, encoding is fixed. Nope. Remember the datatypes I referred to earlier. Turns out, postgres doesn't allow the null character (not the NULL value), in text or varchar fields. Any of your data that contains the null char you are going to need to change that field to a bytea. And then guess what, your database dump is now useless, postgres can't nicely import that "string" content into a bytea.

So make a call, do you really care about all your null characters? If you do, you're gonna have to find another way to import your data, if you don't, strip them out with perl or similar, making sure to use a positive look behind so you strip \0 but not \\0 (escaped backslash followed by literal zero). Why perl? Cos sed doesn't support look arounds.

All in all, it's a bit of a nightmare to keep all your data intact, I hope your transition goes smoother than mine.