In my previous posts 1 and 2 I analyzed high wait times for Log File Sync. This is the last part, and we will see what happens now that we have added really fast Fusion-io disk cards to our new system. This is also a 12c database.
Let's follow the same path as in my first blog post about Log File Sync.
We have now installed 12c using Standard Edition, since those licenses are included with the JDE application. This analysis is therefore based on Statspack.
Same report time as before, but the load is not identical.
If you read post 1, perhaps you remember that we had four areas to check:
– LGWR I/O performance
– LGWR IO Peak
– Application commits
– CPU capacity
LGWR I/O performance
The first thing we want to analyze is whether writing to the log files is slow. We can compare the average wait time for log file sync with that of log file parallel write.
As you can see, the average wait time is now 0 ms for both wait events. This is thanks to the Fusion-io disks, so LGWR I/O performance is not an issue.
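The comparison can be sketched as follows. The numbers here are made up for illustration, not taken from the Statspack report; the idea is that when log file sync waits are much longer than log file parallel write waits, the extra time is being spent outside LGWR I/O (queuing, CPU scheduling, post/wait latency).

```python
# Sketch: compare average waits for 'log file sync' vs 'log file parallel write'.
# All figures below are hypothetical, not from the actual report.
def avg_wait_ms(time_waited_ms, total_waits):
    """Average wait per event occurrence in milliseconds."""
    return time_waited_ms / total_waits if total_waits else 0.0

sync_avg = avg_wait_ms(time_waited_ms=1500.0, total_waits=2_041_053)   # hypothetical
write_avg = avg_wait_ms(time_waited_ms=900.0, total_waits=1_900_000)   # hypothetical

# Fraction of each sync wait explained by the physical write itself;
# the remainder is non-I/O time (the part new disks cannot remove).
io_share = write_avg / sync_avg
print(sync_avg, write_avg, io_share)
```

With both averages well under 1 ms, a Statspack report rounds them to 0 ms, which is exactly what we see here.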
LGWR I/O Peak
With an average wait time of 0 ms, I do not expect to see a peak > 500 ms.
And as expected, there is no information about peak I/O.
Another interesting piece of information in the LGWR trace file is:
*** 2015-10-25 02:42:27.011
Adaptive scalable LGWR disabling workers
kcrfw_slave_adaptive_updatemode: single->scalable redorate=5500711 switch=2499756
*** 2015-10-25 04:47:18.996
Adaptive scalable LGWR enabling workers
kcrfw_slave_adaptive_updatemode: scalable->single group0=43068858 all=43068861 delay=26 rw=52
This is related to 12c, where we can now have multiple LGWR worker processes. You can read Craig Shallahamer's blog about it.
When it comes to adaptive log file sync, which means Oracle switches between post/wait and polling, it still works the same way in 12c. But I cannot see any evidence that we switched methods.
Regarding the size of the redo logs, we have the following info:
We have 8 switches per hour, which is higher than the Oracle-recommended 4.
And over time ….
we have a peak between 19:00 and 22:00.
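As a rough sizing sketch: if we wanted to bring the switch rate down from 8 to the recommended ~4 per hour, we can estimate the redo log size needed from the redo generation rate. Here I borrow the redorate value from the LGWR trace above and assume it is in bytes per second; both that unit and the steady-rate assumption are mine, so treat this as a back-of-the-envelope check, not a sizing recommendation.

```python
# Sketch: redo log size needed to hit a target switch rate, assuming a
# steady redo generation rate (redorate from the LGWR trace, taken here
# as bytes/second -- an assumption, not documented fact).
def redo_log_size_for_target(redo_bytes_per_sec, target_switches_per_hour):
    redo_per_hour = redo_bytes_per_sec * 3600
    return redo_per_hour / target_switches_per_hour

size_bytes = redo_log_size_for_target(5_500_711, 4)
print(f"{size_bytes / 1024**3:.1f} GiB per log")  # ≈ 4.6 GiB
```

In practice the peak-hour rate matters more than the average, so a real sizing exercise would use the redo rate from the busiest interval.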
Do we suffer from a CPU shortage?
The formula for calculating utilization is U = R/C, where R is the requirement (Busy Time) and C is the capacity (Busy Time + Idle Time). (Note that the timings in the Operating System statistics are in centiseconds.) Our formula becomes 168,765/(168,765 + 5,592,223) ≈ 0.029, or about 2.9%. So we have plenty of CPU resources available.
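The calculation can be checked directly, using the BUSY_TIME and IDLE_TIME values (in centiseconds) from the report's Operating System statistics:

```python
# U = R / C: utilization from OS statistics (values in centiseconds).
busy_cs = 168_765                       # Busy Time (R)
idle_cs = 5_592_223                     # Idle Time
utilization = busy_cs / (busy_cs + idle_cs)   # C = busy + idle
print(f"{utilization:.4f} -> {utilization * 100:.1f}%")  # 0.0293 -> 2.9%
```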
If the application commits too often, it can cause high waits on log file sync, since each commit flushes redo data from the redo buffer to the redo logs. Oracle's recommendation is that user calls per (commits + rollbacks) should not be lower than 30; if it is, the application is committing too frequently. In our case we have:
The formula is user calls/(user commits + user rollbacks), which gives us
11,067,197/(2,041,053 + 92) = 11,067,197/2,041,145 ≈ 5.4 average user calls per commit. This is way below Oracle's recommendation.
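The same arithmetic as a quick check, using the user calls, user commits, and user rollbacks statistics from the report:

```python
# Commit-frequency check: user calls per (commits + rollbacks).
# A ratio below ~30 suggests the application commits too frequently.
user_calls = 11_067_197
user_commits = 2_041_053
user_rollbacks = 92
calls_per_commit = user_calls / (user_commits + user_rollbacks)
print(f"{calls_per_commit:.1f}")  # 5.4 -- well below the recommended 30
```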
What does this tell us? I see it as the new disks making a tremendous improvement over our earlier environment. We reduced Log File Sync from 59% to 14%. This is a good indication, even though the load is not identical. And the fact that the average wait time for both log file sync and log file parallel write is 0 ms tells us the disks had a good impact.
But we still have 14% waits on log file sync, and we know it is not related to LGWR writing. I think it is related to the commit frequency: even if writing is very fast, sessions queue up waiting for other sessions to finish their commits. The only solution is to rewrite the application to perform fewer commits.