I am continuously asked by wide-eyed green web 2.0 developers about Master-Master Replication (M-M). Development Managers and CTOs/CIOs aren’t immune to this either. They read a couple of posts about it, on the web, and seem to think that you can put together a couple of commodity servers, through in a Load Balancer and voila! Instant cluster, without the high price tag of Oracle’s RAC.
Unfortunately, its not that simple. MySQL actually has a cluster product which it acquired from Ericsson. It has some limitations. V5.0 needs to be able to have both data and index completely in memory. V5.1 needs to be able to have all index in memory. Either later choices performs a 2 phase commit, so at anytime, querying any node with the same query, will return the same result. In a M-M environment, depending on the load on either node, replication can be lagging, thus returning different results for the same query from each node.
If such a case is not handled by the application, at best, you will have inconsistent results, at worse dire results.
Some applications will simply not be able to use it at all. Here is an example. Say your DB Schema has a column that needs to be unique, however, not part of the Primary Key. If you insert a record on one node, then insert an identical record on the second node, you will end up with two legal records on their respective nodes. When it gets time to replicate that record in both directions, the unique constraint will make replication fail on both servers.
But doesn’t it offer High-Availability?
It does, if you are willing to live with a potentially inconsistent dataset after a node failure. Remember we specified earlier that replication can lag between nodes. If disaster strikes on one of your nodes, you have lost every transaction for the period that the other node was lagging behind the lost one.
The reuslt? Your website or app is still running, however data that was previously on one server will no longer be available. depending on your app, that could have huge consequences.
So Why use it?
If you could mitigate the above mentioned risks, and design your app properly, you could take advantage of M-M. Reads would be shared between both nodes, theoretically allowing you to handle more load. You could also pull out one node at a time, deploy possible long schema changes, rebuild indexes, etc. while the other handles the whole load, then do the same to the other.
I tend to stay away from M-M. It adds an extra, often unnecessary, layer of complexity to your environment. It does have its place, and should be part of your little bag of tricks, but only pull it out when you need to, and when it absolutely fits. Don’t let that green PHP developper tell you he built it at home and it works, then you get stuck supporting a nightmare. His pager isn’t going to go off at 3am…yours is!