During a batch job we noticed we had several sessions suffering from buffer busy waits.
Many sessions was running the same insert, one was related to auditing user logon.
We could see the pattern several days back.
This is how it looked in grid control
The gray rectangle is what is shown below, we have many sessions waiting on concurrency. Looking att he statements they references different tables. So the inserts/updates are towards diffrent tables. But we have many sessions running them.
if we check the top sql 55n805vncuccw, we get the following picture
So we have multiple (30) sessions suffering from buffer busy waits, the top one is waiting on log file sync (checkpoint incomplete). What is that ?
log file sync (checkpoint incomplete)
I found some information in the following note
Checkpoint Tuning and Troubleshooting Guide (Doc ID 147468.1)
Sometimes, you can see in your alert.log file, the following corresponding messages: Thread 1 advanced to log sequence 248 Current log# 2 seq# 248 mem# 0: /prod1/oradata/logs/redologs02.log Thread 1 cannot allocate new log, sequence 249 Checkpoint not complete This message indicates that Oracle wants to reuse a redo log file, but the current checkpoint position is still in that log. In this case,Oracle must wait until the checkpoint position passes that log. Because the incremental checkpoint target never lags the current log tail by more than 90% of the smallest log file size, this situation may be encountered if DBWR writes too slowly, or if a logswitch happens before the log is completely full, or if log filesizes are too small. When the database waits on checkpoints,redo generation is stopped until the log switch is done.
We do have the message in our alertlog
Thu Nov 05 17:59:51 2015 Thread 1 cannot allocate new log, sequence 89115 Checkpoint not complete Current log# 3 seq# 89114 mem# 0: /jdedbp/orluRedo/dzjdep/redo_03.log
Information above says redo generations is stopped when we wait on the checkpoint to be completed. A checkpoint is triggered for every log switch. this means all user sessions are “frozen” untill the checkpoint/log switch has completed.
Can this have an impact on my sessions waiting for buffer busy waits ?
I think so, the session issued a commit needs to write data to the redolog. Writing to the redo has stopped since we wait for the checkpoint/log switch to complete. This means our transaction is not complete. Other sessions also inserting into the same table has to wait for that transaction to finish. The table is a audit log table with data for a long time, no deletes has been performed for years (which you can have your thoughts about). The lack of deletes would trigger all inserts to store data in blocks above highwater mark, like if you added append to the insert. I think that is the reason for the major buffer busy block contention.
This is normally a very fast process, but with the introduction of checkpoint not complete it will cause issues.
From the text above we can see that we have several scenarios that can cause this.
- DBWR writes too slowly
- a logswitch happens before the log is completely full
- log file sizes are too small
DBWR writes too slowl:
At the time we have 8 DBWR processes running, you can see that on the first picture.
I started to read about multiple DBWR and came a cross Kevin Clossons blog about multiple DBWR. (link at the bottom) in part 2 I found an intersting comment.
Scheduling anomalies. If you have more than 1 DBWR process, you run the increased risk of having DBWR processes that can’t get on CPU. Consider a checkpoint. All LRUs will have modified buffers that need flushed during a checkpoint. Any given LRU can only be cleaned by the boot-time assigned DBWR process. That means that a checkpoint requires action from all DBWR processes. It’s easier for 1 process to get scheduled on a CPU on, say, an 8 CPU system than 8 processes.
Interesting since this looks like our scenario, all DBWR needs cpu time to flush which delays the checkpoint. I checked the ckpt tracefile for the timestamp above, I found this.
*** 2015-11-05 18:17:51.369 1: 7090ms (rw) file: kcrf.c line: 10001 count: 140733193388195 total: 142120ms time: 2304835 2: 6110ms (rw) file: ktu.c line: 25705 count: 140733193388104 total: 60300ms time: 2301773 3: 5780ms (rw) file: kcv.c line: 11894 count: 140733193390241 total: 2089910ms time: 2301977 4: 3810ms (rw) file: kcrf.c line: 1390 count: 140733193388035 total: 10610ms time: 2499898 5: 3290ms (rw) file: krbm.c line: 8845 count: 140733193388034 total: 3860ms time: 1549139 6: 2890ms (rw) file: kct.c line: 7799 count: 140733193388681 total: 423650ms time: 1965758 7: 1950ms (rw) file: krbb.c line: 14189 count: 140733193388182 total: 110230ms time: 2423785 8: 1920ms (rw) file: kct.c line: 2522 count: 140733193388033 total: 1920ms time: 1368916 9: 1410ms (ro) file: kcf.c line: 4758 count: 140733193388033 total: 1410ms time: 1368914 10: 1290ms (rw) file: krse.c line: 1706 count: 140733193388045 total: 9910ms time: 2439472 Control file enqueue hold time tracking dump at time: 2512454
Not sure how to read this, but I have several blocks like the ones above with ~30 minutes difference. compared it to another db with 1 dbwr and it had 3 rows per block, I have to ask around for details how to read this. I can see some really high timestamps and for now I just take that as a confirmation that checkpoints takes a long time.
A logswitch happens before the log is completely full:
I’m a bit puzzled about this, because I don’t know why a log switch should occour before it is full, the only reason is if you issue the alter system switch logfile command.
I looked at the average size of the archivelogs for the day we had our issue, and it was:
SELECT Trunc(first_time) DAY, Avg(blocks * block_size) "Bytes" FROM v$archived_log WHERE Trunc(first_time) = '05-NOV-15' GROUP BY Trunc(first_time); DAY Bytes --------- ---------------- 05-NOV-15 846,203,211
The average size is 846 Mb, the size of the logfiles was 1GB at the time, I have now increased them to 1.6 Gb. Searching for an answer I came acroos this from Note
Archived redolog is (significant) smaller than the redologfile. (Doc ID 1356604.1)
The archive logs do not have to be even in size. This was decided avery long time ago, when blank padding the archive logs was stopped, for a very good reason - in order to save disk space. The log switch does not occur when a redo log file is 100% full. There is an internal algorithm that determines the log switch moment. This also has a very good reason - doing the log switch at the last moment could incur performance problems (for various reasons, out of the scope of this note). As a result, after the log switch occurs, the archivers are copyingonly the actual information from the redo log files. Since the redologs are not 100% full after the log switch and the archive logs are not blank padded after the copy operation has finished, this results in uneven, smaller files than the original redo log files. There are a number of factors which combine to determine the log switch frequency. These are the most relevant factors in this case:
So its seems it is related to internal algoritms.
Log file sizes are too small:
Increasing the size should delay the log switch (with the same load). I have done that. I raised the size from 1Gb to 1.6 Gb.
Last nights batch was not as heavy as the one the day before. So I didn’t see any impact
of my increased log file. I think the behavior is load related. From the beginning I was pretty sure it was related to the log file size, but after reading Kevin Clossons blog about multiple DBWR processes (we have 8) I believe I need to discuss and evaluate the settings with my customer.