Before discussing the replication formats in MySQL it is necessary to discuss how replication works in MySQL. Why is replication required? The answer lies in my earlier post in which I described the process of scaling MySQL for reads (
Scale-Out MySQL). Lets proceed by discussing the replication process. Replication process consists of the following steps:
MySQL Master writes any changes that occur to the 'binlog'. binlog is a log that contains any updates/inserts/deletes made on the master MySQL.
Slave's I/O thread reads the binlog from the master and writes the events in its 'relaylog'.
The MySQL thread reads the events from the relaylog and applies those events to the slave.
Steps 1 till 3 are repeated so the slave is synchronized with the master all the time.
Below is the diagram that will aid you in understanding the process of replication in MySQL.
Now that you are aware to some extent about the process of replication, lets dive deep into the replication formats. By replication format I mean the format in which events are recorded in master's binlog. There are three types of replication formats:
SBR or Statement based Replication
In this format of replication, the MySQL master, records the events as SQL statements into the binlog. The statements are picked up by the MySQL slaves and replayed in the same way as they are played at master
RBR or Row based Replication
In this format of replication, the MySQL master, records the events as actual rows that indicate how the rows are changed at the master.
Mixed Mode Replication
This format of replication is a mix of RBR and SBR. MySQL switches the format in real-time depending on the type of event.
Of course there are pros and cons of each format, I will primarily address the RBR and SBR. I will discuss the pros and cons of each replication format type.
Statement based Replication
Pros:
As in case of statement based replication the events are logged as SQL statements and not in form of row changes, hence the log files are much much smaller and utilize less storage space.
The log file contains all the SQL statements and hence can be utilized for analysis, audit or restoring from backup.
In terms of bandwidth, statement based replication is much more efficient because its the queries that are transferred to slave and not the actual row updates.
Cons:
Statements that possess non-deterministic properties are difficult to replicate using this format.
Statements that use stored procedures or user defined functions (UDF) are considered to be nondeterministic statements and hence are difficult to replicate using SBR.
DELETE and UPDATE statements that use LIMIT without using ORDER BY are considered to be nondeterministic and hence cannot be replicated using SBR.
UPDATE statements with a WHERE clause that doesn't utilize the index require more locks in SBR then in RBR.
INSERT statements that use auto-increment, block other non-conflicting INSERT statements. This is applicable to InnoDB engine. Its a very important point to consider as this point effects concurrancy.
More locks are required on the slave for INSERT/UPDATE/DELETE
Complex statements that are evaluated and executed on the master, need to go through the same process on the slave before applying changes to the slave.
Table definations should be identical on the master and slave.
Row based Replication
Pros:
Its the safest form of replication though not as old as statement based replication.
Fewer row locks are required for INSERT statements with auto-increment.
Fewer locks are required for INSERT/UPDATE/DELETE on the slave when compared to statement based replication.
Fewer locks are required when UPDATE and DELETE statement's WHERE clause does not make use of the index.
Cons:
Contrary to statement based replication, row based replication records changes to each row. The most probable disadvantage from this behavior is the size of the binlog. If a SQL statement effects 100 rows then in SBR just one query is logged whereas in RBR changes to 100 rows are recorded. This behaviour makes binlogs in RBR much much larger then SBR.
As in case of SBR, the SQL queries are logged and hence they can be reused and read for audit. In RBR, there is no way to figure out what statements were executed on the master and recieved on the slave. However what rows has been changed or inserted, can be decoded.
There is no mechanism available in RBR to ensure that the binlog in MySQL master was processed without any problem at the slave.
After comparison of both the formats, I personally believe that Row based Replication in MySQL is the way to go because:
Its the safest form of replication
It is safer when it comes to replicating triggers and stored procedures.
It requires fewer locks and hence its much faster and achieves high concurrancy.
Regarding the drawback of having large binlog, I would say that not all applications may issue updates that effect thousands of rows. Comparing to the other advantages that row based replication provides and an informed guess I would say that this disadvantage may be ignored.
The choice of which format to choose depends on the application and requirements. I have not touched mixed format. My guess is that mixed format may bring in the best of both world. I will be discussing mixed format replication in detail, in my upcoming post.