Contents
A database transaction guarantees an all-or-nothing outcome, meaning that either all operations within a transaction succeed, or they are rolled back in the event of a failure. While transactions are a powerful mechanism that help us avoid many issues, they do not inherently solve concurrency problems. In modern multi-user systems, when multiple user requests are processed concurrently within transactions, it can lead to inconsistent and undesired results—this is where transaction isolation becomes crucial.
Isolation is a key property in Database Management Systems (DBMS) that dictates how and when changes made by one transaction become visible to others. To manage these interactions, databases offer various isolation levels, each designed to control the visibility of data and prevent undesirable read anomalies that can arise when transactions overlap.
Read Anomalies
Read Anomaly is a situation where a transaction reads data in an inconsistent or unintended state due to concurrent transactions writing the same data. There are 4 different Read Anomalies which occur in concurrent running transactions.
Notations used in the below diagrams.
T{n}:
Database Transaction.t{n}:
Time Events.v{n}:
A value in database.r{n}:
A record in a database. e.g. r1 is a row or record in a table.R{n}(V{n}):
Read by transaction T{n}. e.g. R1(V1) transaction T1 reads value V1.W{n}(V{n}):
Write by transaction T{n} e.g. W2(V1) transaction T2 reads value V1.C{n}:
Commit by transaction T{n}. e.g. C1 is a commit made by transaction T1.A{n}:
Abort by transaction T{n}. e.g. A1 is an abort made by transaction T1 i.e. all the operation performed by T1 is rolled back.R{n}(r | c):
Read all records (r) that matches the condition (c) performed by transaction T{n}.W{n}(r | c):
Write a record (r) that matches the condition (c) performed by transaction T{n}.
Dirty Read
A dirty read is when a transaction reads uncommitted changes made by another transaction.
At time t3, transaction T2 reads value v1 = v1'
which was not committed by T1 and is later rolled back/aborted at t4. As a result of this dirty read, T2 processes data that no longer exists at time t4, potentially leading to undesired outcomes for T2.
Problems
- Dirty Read can lead to situation where a transaction processes on value that is no longer valid or exists in the Database state.
Non-repeatable Read
A non-repeatable read occurs when a transaction reads the same value twice but receives different results because another transaction has modified and committed changes between the two reads.
In transaction T1, the value of v1 read at time t1 differs from the value read at time t4 because another transaction, T2, committed a change to v1 between the two reads. It’s worth noting that T1 reads the committed data at t4.
Problems
Non-repeatable reads can lead to issues with constraint validation. For example, consider the following constraint that must be maintained in a bank account system:
Account Balance = x = (Sum of Credit transactions value - Sum of Debit transactions value)
In the above example, the constraint still holds true for transaction T2, but for T1, the constraint is broken between the reads at t1 and t5 due to the actions of another transaction, T2.
Skewed Read
A skewed read is a variation of the non-repeatable read anomaly that involves multiple related values. In the previous example, the account balance and transactions are two linked entities. We read two different values at times t1 and t5. Imagine we didn’t check the correctness of the constraint and instead proceeded to generate an account statement based on the values read at times t1 and t5. This would result in an incorrect account statement that violates the constraint account balance = (sum of credits) - (sum of debits)
.
Phantom Read
A phantom read occurs when a transaction retrieves a different set of records for the same query condition (c) in two separate reads.
The read query R1 for a given condition c
returns an inconsistent set of records at time t2 and t4 within the same transaction T1.
Non-repeatable Read vs Phantom Read
- Phantom Read is an anomaly based on records i.e. different set of rows either a new or missing row(s) are returned for the same read query with condition
c
. - Non-repeatable read is anomaly based on value i.e. different value is returned for the same read query.
Serialization Anomaly
Serialization anomaly occurs when concurrent transactions produce a final database state that is inconsistent with any serial order of those transactions.
Problems
- Serialization anomaly produces incorrect results. As seen in the above diagram, the result produced by concurrent execution of transactions T1 and T2 is not equal to the serial execution of T1 and T2.
- Also, in the above illustration, we can see that the write
v''
made by T2 is completely lost. This is called a lost update anomaly.
Transaction Isolation
We have 4 different isolation levels in SQL and by choosing the right isolation level, we can prevent the above discussed anomalies.
- Read uncommitted
- Read committed
- Repeatable read
- Serializable
What does each transaction isolation prevent?
Isolation Level | Dirty read | Non-repeatable Read | Phantom Read | Serilizable Anomaly |
---|---|---|---|---|
Read uncommitted | ✅ | ✅ | ✅ | ✅ |
Read committed | ❌ | ✅ | ✅ | ✅ |
Repeatable read | ❌ | ❌ | ✅ | ✅ |
Serializable | ❌ | ❌ | ❌ | ❌ |
Isolation we get in Postgres
Isolation Level | Dirty read | Non-repeatable Read | Phantom Read | Serilizable Anomaly |
---|---|---|---|---|
Read uncommitted | ❌ (Lowest isolation level in PG is Read Committed) | ✅ | ✅ | ✅ |
Read committed | ❌ | ✅ | ✅ | ✅ |
Repeatable read | ❌ | ❌ | ❌ (Repeatable read isolation also prevents Phantom Read) | ✅ |
Serializable | ❌ | ❌ | ❌ | ❌ |
Performance: As we move to higher isolation levels, there is a tradeoff between consistency and performance. Higher isolation levels typically offer stronger consistency guarantees but may result in decreased performance.