I received a case where a developer had a session that hung waiting on enq: TX - row lock contention. Several things puzzled me: he claimed he was the only one running on this database, and the row lock was related to an insert.
insert into PRODUCT (ID, VIN, CHASSISSERIES, CHASSISNO, MODIFICATIONTIME, CREATIONTIME) values (:1 , :2 , :3 , :4 , :5 , :6 )
I found this a bit odd at first. I associate enq: TX - row lock contention with updates and deletes, not inserts, so I wanted to analyze this further.
This is how the issue looked in Grid Control
We can see one session waiting on the Application wait class, and no other sessions visible doing work. If we select and drill down on that sql_id, 6padky01y6s2k, we get the following picture.
We can see that it is an insert and that it is waiting on enq: TX – row lock contention.
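Grid Control aside, the same picture is available from SQL*Plus. A minimal sketch (run as a privileged user) querying v$lock for TX enqueues; a waiter on a unique-key collision shows up requesting mode 4:

```sql
-- List TX enqueues: LMODE is the mode held, REQUEST the mode requested.
-- The holder shows LMODE = 6, REQUEST = 0; the waiter shows LMODE = 0
-- and a non-zero REQUEST (4 for the unique-key case discussed below).
select sid, type, id1, id2, lmode, request
from   v$lock
where  type = 'TX';
```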
And still one session running the sql.
Luckily we have good tools in Oracle to dig deeper. We can have a look at the Blocking Sessions screen in Grid Control; it is located under the Performance tab.
We see our session holding the lock; it is session id 105. Looking at that session, what is it waiting for? It is idle. A session that actively holds a row lock is not doing anything? Idle time for a session is not uncommon, but how can it cause a locking issue?
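If you do not have Grid Control at hand, v$session holds the same information. A minimal sketch listing blocked sessions and who blocks them:

```sql
-- For each waiting session, BLOCKING_SESSION is the SID of the holder.
-- SECONDS_IN_WAIT shows how long the waiter has been stuck.
select sid, serial#, username, event,
       blocking_session, seconds_in_wait
from   v$session
where  blocking_session is not null;
```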
Let's dig deeper and look at that session by selecting the sid.
What can we see if we look at session 105? It is indeed idle, waiting on SQL*Net message from client. It has been logged on for 1 day, 20 hours, 15 minutes and 30 seconds. Now have a look at the row WAIT: wait duration: 1 day, 20 hours, 14 minutes and 14 seconds. This session has been idle for more than 99% of its time. I asked the developer what happened the day before. He replied that they had some issues with Java and that he killed the Java app manually. This is most likely the reason we have that idle session. I plan to explain how this and similar scenarios can happen in another blog post later on.
Still, we have to answer the question: how can an insert cause row lock contention?
If we look in the Database Performance Tuning Guide for the description of the wait event:
10.3.6.2.4 TX enqueue

These are acquired exclusive when a transaction initiates its first change and held until the transaction does a COMMIT or ROLLBACK.

- Waits for TX in mode 6: occurs when a session is waiting for a row level lock that is held by another session. This occurs when one user is updating or deleting a row, which another session wants to update or delete. This type of TX enqueue wait corresponds to the wait event enq: TX - row lock contention. The solution is to have the first session already holding the lock perform a COMMIT or ROLLBACK.
- Waits for TX in mode 4 can occur if the session is waiting for an ITL (interested transaction list) slot in a block. This happens when the session wants to lock a row in the block but one or more other sessions have rows locked in the same block, and there is no free ITL slot in the block. Usually, Oracle Database dynamically adds another ITL slot. This may not be possible if there is insufficient free space in the block to add an ITL. If so, the session waits for a slot with a TX enqueue in mode 4. This type of TX enqueue wait corresponds to the wait event enq: TX - allocate ITL entry. The solution is to increase the number of ITLs available, either by changing the INITRANS or MAXTRANS for the table (either by using an ALTER statement, or by re-creating the table with the higher values).
- Waits for TX in mode 4 can also occur if a session is waiting due to potential duplicates in a UNIQUE index. If two sessions try to insert the same key value, the second session has to wait to see if an ORA-0001 should be raised or not. This type of TX enqueue wait corresponds to the wait event enq: TX - row lock contention. The solution is to have the first session already holding the lock perform a COMMIT or ROLLBACK.
- Waits for TX in mode 4 are also possible if the session is waiting due to a shared bitmap index fragment. Bitmap indexes index key values and a range of rowids. Each entry in a bitmap index can cover many rows in the actual table. If two sessions want to update rows covered by the same bitmap index fragment, then the second session waits for the first transaction to either COMMIT or ROLLBACK by waiting for the TX lock in mode 4. This type of TX enqueue wait corresponds to the wait event enq: TX - row lock contention.
- Waits for TX in mode 4 can also occur waiting for a PREPARED transaction.
- Waits for TX in mode 4 also occur when a transaction inserting a row in an index has to wait for the end of an index block split being done by another transaction. This type of TX enqueue wait corresponds to the wait event enq: TX - index contention.
The most common scenario for enq: TX - row lock contention is the one the first bullet describes: a session is updating or deleting a set of rows and holds the locks for those rows, and another session then tries to run another DML (update or delete) against one or more of the same rows. But our locking scenario is not related to updates or deletes.
Reading on, the bullet mentioning inserts and potential duplicates in a unique index is more interesting. Let us look at the table PRODUCT.
We have one Primary key constraint and two unique constraints, all supported by indexes.
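The scenario is easy to reproduce with two sessions. A sketch, assuming a simplified PRODUCT table with the columns from the captured statement (the values are made up for illustration):

```sql
-- Session 1 (yesterday's "killed" Java session - the transaction is
-- still open because the insert was never committed or rolled back):
insert into PRODUCT (ID, VIN, CHASSISSERIES, CHASSISNO,
                     MODIFICATIONTIME, CREATIONTIME)
values (1, 'VIN123', 'S1', 'C1', sysdate, sysdate);
-- (no commit)

-- Session 2 (today's run, same key values):
-- this statement hangs on enq: TX - row lock contention (mode 4)
-- until session 1 commits (then session 2 gets ORA-00001)
-- or rolls back (then session 2's insert succeeds).
insert into PRODUCT (ID, VIN, CHASSISSERIES, CHASSISNO,
                     MODIFICATIONTIME, CREATIONTIME)
values (1, 'VIN123', 'S1', 'C1', sysdate, sysdate);
```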
Our blocked session waits on another insert that is not yet committed or rolled back. Since the session that blocks our session has been idle for more than one day, we can assume it will not commit or roll back that transaction. I want to verify my findings. Looking at session 105 as we did previously, under the label Application we have Open Cursors; if we select that, we get to a screen that shows all open cursors for session 105.
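The same list the Open Cursors screen shows can be pulled straight from the dictionary; a minimal sketch for our blocking session:

```sql
-- Open cursors for the blocking session (SID 105 in this case).
select sql_id, sql_text
from   v$open_cursor
where  sid = 105;
```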
Here we can see the same insert sql as our locked session is running.
This is what I think happened: yesterday the developer ran the application and did the same activity he did today. The key issue is that he inserted exactly the same data today as he did yesterday; if he had used other values for the primary/unique keys today, this issue would not have occurred. Having issues with Java, he killed the application. The session stayed connected, with an ongoing transaction that had not been committed or rolled back.
How to resolve
We killed session 105. Note that killing the session rolls back its transaction, which can introduce logical corruption in your application; you might need to spend more time on cleanup activities, making sure your data is correct.
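Killing the session needs both the sid and the serial#; a sketch (the serial# below is a placeholder, look it up first):

```sql
-- Find the serial# that goes with SID 105.
select sid, serial# from v$session where sid = 105;

-- Then kill it, substituting the serial# returned above.
-- This rolls back the open transaction.
alter system kill session '105,<serial#>';
```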
Our customer uses WebSphere as application server, and I learned yesterday that you can set a connection pool parameter in WebSphere called aged timeout. You set this to some number of seconds, like 600.
If that session has been idle for 600 seconds, the connection is terminated (active transactions will be rolled back) and the connection is returned to the connection pool manager. You need to be careful not to set the value too low: it is normal for a session to have idle time, and you do not want to terminate sessions that are active. Using this feature makes it more difficult to handle the logical corruption explained above, so think carefully before implementing it. I don't know the name of the parameter for other app servers, but I am sure it can easily be found.
I am also looking into Dead Connection Detection (DCD), but I need to evaluate it more before I can say for sure that it is possible to use. I would prefer to use the feature in the app server if available. If you are experiencing frequent scenarios like the one I explain here, you probably need to spend more time analyzing your application; issues like these most often come from the application. More on this topic will come later.
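For reference, DCD is enabled server-side through the sqlnet.ora parameter SQLNET.EXPIRE_TIME; a sketch of the configuration (the value of 10 minutes is just an example, not a recommendation):

```
# sqlnet.ora on the database server:
# send a probe to idle client connections every 10 minutes;
# connections whose client is gone get cleaned up by the server.
SQLNET.EXPIRE_TIME = 10
```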
ASH reports are often used when analyzing performance issues, and they are a very useful tool. I want to show you how my ASH report looks for the issue above.
As expected, our top wait event is enq: TX - row lock contention, accounting for 97.4% of the sampled time.
We have one insert statement accounting for 97.40% of the total activity; note how that matches the percentage of the top wait event.
Session 173 is our top session; it spends 97.40% of its time on the row lock wait. We can also see which user it is: LDE_DEV4, the same username reported by the developer.
And as the top blocking session we have the session causing it all: session 105. Note the label Event Caused; it says that session 105 is the session causing the enq: TX - row lock contention that session 173 suffers from. Grid Control gives you more information, but the ASH report is all you need to fix the issue.
Database Performance Tuning Guide 11.2.0
Dead Connection Detection
Oracle Net 12c: New Implementation of Dead Connection Detection