Data Replication
- Data Replication — approach of storing the same data on multiple storage devices
-
Benefits
- +Availability
- +Reliability
- +Read/Write Throughput
- -Latency (in case of geo-distributed replication)
-
Drawbacks
- +Complexity
- +Cost
- Database Coupling
-
Types
-
By user's request processing
- Active: every replica process client's request
- Passive: only one replica process client's request and updates others
-
By time coupling
- Synchronous: client waits until all replicas to be updated
- Asynchronous: client waits until subset of replicas to be updated
-
Replica
- Replica — node in data replication approach
- Type
- Active — process all requests from clients
- Passive — process only some of client requests or do not process at all and receive results from active replica
Consistency
- Data Consistency — property of a system to keep data the same at different places
- Types
- Eventual Consistency — consistency is reached eventually
- Strong Consistency — consistency is always met from the client's point of view
Consistency Tricks
in case of asynchronous replication
Reading Your Own Writes
- If user is able to modify only small subset of resources then allow user read those resources from leader and rest of them from replicas (e.g. profile in social media)
- If application can suffer a replica lag then measure, for example, 99% percentile of resource replication lag and use it to switch read opeartions to replicas (e.g. 99% percentile of resource replication lag is 200ms then within 200ms after write operation read from leader and since 200ms have passed read from replicas)
- Keep track of most recent update timestamp on client side and send it as a part of read request. Then ensure that replica from which we are reading store data fresh enough according to provided timestamp
Monotonic Reads
- Keep track of most recent update timestamp on client side and send it as a part of read request. Then ensure that replica from which we are reading store data fresh anough according to provided timestamp
Approaches
Single Leader Replication
also called as primary-secondary backup, active/passive, leader-follower or master-slave replications
- Single Leader Replication — data replication with single leader (active replica)
-
Benefits
- +Read Throughput
- Drawbacks
- +Latency (synchronous mode)
- Reading stale data (asynchronous mode)
- Single point of failure
- Examples
- PostgreSQL
- MySQL
- Oracle
- MongoDB
- HBase
- Kafka
Workarounds
- Client stores timestamps for read-after-write consistency
- Sticky routing for monotonic read consistency (clients read from the same replica, if it dies, new replica is chosen for reading)
- Quorum and fencing
- If a network is partitioned into two subsets, the subset with the majority of nodes remains active while shooting the minority subset (this approach is literally called STONITH — Shoot The Other Node In The Head) by sending out a special signal to power supply controller
Multi Leader Replication
also called active/active or multi-master replications
- Multi Leader Replication — data replication with multiple leaders (active replicas)
-
Benefits
- +Distributed Write Load
- -Latency (in case of geo-distributed replication)
- Support offline client
- Drawbacks
- Require leaders coordination (synchronous mode)
- Require conflict resolution (asynchronous mode)
- Write order can be broken
- Examples
- WANdisco
- CouchDB
- Google Docs
Leaderless Replication
- Leaderless Replication — data replication with no leader (every replica is active)
-
Benefits
- +Failure Handling Simplicity
- +Availability
- Drawbacks
- -Consistency Maintenance Simplicity
- Discrepancies can appear
- Require conflict resolution
- Examples
- Dynamo
- Riak
- Cassandra
- Voldemort
- Configuration
- \(N\) — number of nodes, \(R\) — read quorum size, \(W\) — write quorum size
- \(W + R > N\) — reading only up-to-date records
- Setups
- \(N\) — odd is preferred (look at Fault-Tolerance section)
- \(W = R = (N + 1) / 2\) — balanced setup
- \(W = N\), \(R = 1\) — read optimized setup
- Fault-Tolerance
- \(W < N\), \(R < N\)
- \(N = 3\), \(W = 2\), \(R = 2\) — 1 node can fail
- \(N = 4\), \(W = 3\), \(R = 2\) — 1 node can fail
- \(N = 5\), \(W = 3\), \(R = 3\) — 2 nodes can fail
- Additional mechanisms
- Read Repair — updating stale data and conflict resolution on read operation
- Anti-Entropy — background process which compares states from replicas and resolves conflicts
- Hinted handoff — failed node will be temporarily replaced with another one
- Sloppy quorum — replacement nodes are counted multiple times to quorum size (operations can be served even if number of available replicas is less than quorum size)
- Strict quorum — replacement nodes are counted once to quorum size
Conflict Resolution
Activating
- On read operation
- On write operation
Policy
- Last Write Wins (LWW)
- Vector Clocks
- Vector Clock — data structure used for determining the partial ordering of events in distributed system and detecting causality violations
- Benefits
- +Causality
- Drawbacks
- -API Simplicity
- +Memory Usage
- Why Vector Clocks Are Hard
- Application Specific Procedure (including case of client's interaction)
- Conflict-Free Replication Data Types (CRDT) (when order doesn't matter)
Q&A
- What are advantages and disadvantages of synchronous replication compared to asynchronous?
- Advantages
- It's guaranteed that all replicas store up-to-date data
- no need for conflict resolution
- Disadvantages
- works slowly
- can't write if one replica disabled
- Advantages
- What is replication strategy for Google Docs?
- Several leaders
- one leader – big delay, especially in presence of network partition
- no leader – big delay, especially if there are some of replicas in other region
- Asynchronous replication
- synchronous replication - big delay, especially in presence of network partition
- Several leaders
- What are disadvantages of LWW?
- Clock drift causes wrong results of operations
- No conflict resolution logic
- What is a difference between vector clock and Lamport Clock?
- Lamport clock doesn't respect causality
- What is a usage example of vector clocks?
- Questions and replies in messenger's chats require causality