Consistency comes into the picture with reliability and write-speed requirements. What probability of eventually losing data can you live with? How fast should a write request return? If the answers are "zero" and "10 ms", you are already in eventual-consistency territory. The basic idea of eventual consistency is that when you write a data item, you write synchronously to only X machines (X being at least 1). The write returns success if that operation fully succeeds and failure otherwise. An asynchronous process then copies the data to Y other storage machines (preferably located somewhere else). The X/Y ratio depends on your degree of paranoia, while the absolute values of X and Y (especially Y) depend on your read requirements (use cases such as edge caching, or write in one region, read anywhere).
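Here is a minimal sketch of that write path, assuming hypothetical node objects with a `put()` method that returns a boolean; the names and constants are illustrative, not any particular store's API:

```python
import queue

SYNC_REPLICAS = 2      # "X": written synchronously before the write returns
ASYNC_REPLICAS = 4     # "Y": copied in the background, ideally cross-region

replication_queue = queue.Queue()

def write(key, value, sync_nodes, async_nodes):
    """Synchronously write to X nodes; succeed only if all of them ack."""
    acks = 0
    for node in sync_nodes[:SYNC_REPLICAS]:
        if node.put(key, value):        # assumed node API: put() -> bool
            acks += 1
    if acks < SYNC_REPLICAS:
        raise IOError("write failed: not all sync replicas acknowledged")
    # Hand off to an async copier; the caller does not wait for the Y nodes.
    replication_queue.put((key, value, async_nodes[:ASYNC_REPLICAS]))

def replicator():
    """Background loop that fans the write out to the Y nodes."""
    while True:
        key, value, nodes = replication_queue.get()
        for node in nodes:
            node.put(key, value)        # retries/error handling omitted
```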
Adding more X nodes slows writes down and makes them less reliable, which is why X is usually set to 2. It turns out people really do not understand why data written by their own application does not show up on the next read request. An even worse problem is landing a second write on top of an obsolete version of the data. There are various ways of dealing with this. One is to expose version information to the application and let it make some sort of intelligent decision, but that does not help application developers all that much, and the paradigm confuses them.
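A minimal sketch of that "expose the version" approach, with an in-memory toy store standing in for the real thing: the application reads a (value, version) pair and must present that version back on write, and the store rejects writes against an obsolete version. All names here are illustrative.

```python
class StaleVersionError(Exception):
    pass

class VersionedStore:
    def __init__(self):
        self._data = {}   # key -> (value, version)

    def read(self, key):
        return self._data.get(key, (None, 0))

    def write(self, key, value, expected_version):
        _, current = self._data.get(key, (None, 0))
        if expected_version != current:
            # The caller wrote on top of an obsolete version; it must
            # re-read, merge, and retry -- that decision is pushed onto
            # the application, which is exactly what confuses developers.
            raise StaleVersionError(f"{key}: have v{current}, got v{expected_version}")
        self._data[key] = (value, current + 1)

# Usage: a read-modify-write cycle with an explicit version check.
store = VersionedStore()
value, version = store.read("cart:42")
store.write("cart:42", ["book"], expected_version=version)
```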
The only solution that seems to work is to provide consistent reads for the process that wrote the data and let everyone else read whatever version is available (they cannot know what the current version is one way or another). This is known as read-after-write consistency. Most modern services achieve it through session stickiness at the request-routing level: subsequent requests from a writer hit one of the X nodes used in the write request.
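A minimal sketch of that sticky routing, under the assumption that the router can remember which sync ("X") nodes served a session's write; the class and method names are hypothetical:

```python
import random

class StickyRouter:
    def __init__(self, all_nodes):
        self.all_nodes = all_nodes
        self.sticky = {}   # session_id -> nodes used for that session's writes

    def route_write(self, session_id, sync_nodes):
        # Remember the X nodes that took this session's synchronous write.
        self.sticky[session_id] = sync_nodes
        return sync_nodes

    def route_read(self, session_id):
        # The writer reads from a node guaranteed to hold its own write;
        # every other session gets whatever replica is handy and may see
        # an older version.
        nodes = self.sticky.get(session_id, self.all_nodes)
        return random.choice(nodes)
```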
