High Buffer Busy Waits – Securefiles

In an EDI application we use, we had periods of very high buffer busy waits
and enq: TX – contention. This was seen while processing EDI messages in the system. Database is


The top 5 wait events show this when the problem is seen.

Top 5 Timed Foreground Events

Event                           Waits  Time(s)  Avg wait (ms)  % DB time  Wait Class
buffer busy waits             654,952   49,655             76      65.87  Concurrency
enq: TX – contention        1,273,449   23,157             18      30.72  Other
DB CPU                                   1,228                     1.63
db file sequential read       367,266      604              2       0.80  User I/O
latch: enqueue hash chains    109,790      256              2       0.34  Other

During these periods the application processed very few messages, and it quickly became a severe problem for the company.

Our further analysis showed that the buffer busy waits were related to one single block.
Looking in v$session for all sessions with buffer busy waits as the wait event, the p1, p2 and p3 values show file, block and reason code.
The reason code for us was 8, which means "1st level bmb".
This is the ASSM bitmap block; see the Concepts Guide for details about ASSM.
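The check can be sketched with a query like this (standard v$session columns; note that for buffer busy waits the third parameter carries the block class, and class 8 corresponds to "1st level bmb" in the v$waitstat class list):

```sql
-- Sessions currently stuck on buffer busy waits, and the block they wait for.
-- p1 = absolute file number, p2 = block number, p3 = reason/class code
-- (8 = "1st level bmb", i.e. an ASSM first-level bitmap block).
SELECT sid, event, p1 AS file#, p2 AS block#, p3 AS reason_code
FROM   v$session
WHERE  event = 'buffer busy waits';
```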

This was also confirmed by looking into the AWR report.

Buffer Wait Statistics

Class            Waits  Total Wait Time (s)  Avg Time (ms)
1st level bmb  720,881               49,593             69
free list       14,224                   34              2
undo header     40,229                   10              0
data block         328                    0              1
3rd level bmb       38                    0              9
undo block           6                    0              0
2nd level bmb        2                    0              0

Looking at the SQL that was running, we can see the following:

SQL ordered by Elapsed Time

Elapsed Time (s)  Executions  Elapsed Time per Exec (s)  %Total  %CPU   %IO  SQL Id         SQL Module        SQL Text
27,091.58                254                     106.66   35.94  1.05  0.78  0u0vxfm0k4yba  SeeBIS6 AS 19686  INSERT INTO TBPINSTANCES (cID,…
8,593.48                  48                     179.03   11.40  1.07  0.71  4xfhztsqxh1nq  SeeBIS6 AS 19686  UPDATE TBPINSTANCES SET cProce…
7,013.41                  29                     241.84    9.30  1.01  0.84  20a291w9y9ruy  SeeBIS6 AS 19686  UPDATE TBPINSTANCES SET cProce…
6,238.21                  13                     479.86    8.28  1.01  0.83  ght7b34v6f7g4  SeeBIS6 AS 19686  UPDATE TBPINSTANCES SET cProce…
5,731.39                  38                     150.83    7.60  1.04  0.73  3bfqvwrqqpgmr  SeeBIS6 AS 19686  UPDATE TBPINSTANCES SET cProce…
5,187.19                  50                     103.74    6.88  1.12  0.68  7u1b7rtp4stn3  SeeBIS6 AS 19686  UPDATE TBPINSTANCES SET cState…
5,059.49                  14                     361.39    6.71  1.03  0.80  8a1urkdd3y048  SeeBIS6 AS 19686  UPDATE TBPINSTANCES SET cState…
4,855.83                  28                     173.42    6.44  1.08  0.65  899f7bqcg1rg7  SeeBIS6 AS 19686  UPDATE TBPINSTANCES SET cProce…
1,342.83                  31                      43.32    1.78  0.99  0.79  88w0d433ftf1v  SeeBIS6 AS 27951  UPDATE TBPINSTANCES SET cState…
1,088.53                   9                     120.95    1.44  1.07  0.82  4554dhmrq0kht  SeeBIS6 AS 19686  UPDATE TBPINSTANCES SET cProce…

Looking at the SQL we can see that all the top statements perform inserts or updates on one specific table, tbpinstances. Further analysis shows that the top segment for
buffer busy waits is a LOB segment.

Segments by Buffer Busy Waits

Owner   Tablespace Name  Object Name                Subobject Name  Obj. Type  Buffer Busy Waits  % of Capture
GBSASP  GBSLOB           SYS_LOB0000017961C00006$$                  LOB                  728,771         99.96
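The AWR segment statistics already point at the LOB segment, but the same conclusion can be reached from the file#/block# pair alone with a lookup like this (a sketch; dba_extents scans can be slow on large databases):

```sql
-- Map a waited-for file#/block# (p1/p2 from v$session) to its owning segment.
SELECT owner, segment_name, segment_type
FROM   dba_extents
WHERE  file_id = :file_no
AND    :block_no BETWEEN block_id AND block_id + blocks - 1;
```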

We learned that the application uses BLOBs for the EDI message data: when processing, it inserts the row and then updates it during the various stages of the process.
This can occur between 5 and 25 times depending on what type of message we are processing.
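We do not have the application source, but the pattern is roughly the following (cMessage and the bind names are illustrative placeholders, not the application's real identifiers):

```sql
-- Sketch of the application's LOB life cycle: one insert with an empty BLOB,
-- followed by 5-25 updates of the same row while the message is processed.
-- Every update rewrites the SecureFile LOB.
INSERT INTO tbpinstances (cID, cMessage) VALUES (:id, EMPTY_BLOB());

-- repeated once per processing stage:
UPDATE tbpinstances SET cMessage = :msg WHERE cID = :id;
COMMIT;
```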

We logged a P1 service request with Oracle Support and got help analyzing the problem. We first thought this was a bug, but development
didn't agree.

The solution:

We were recommended to change the parameter "_securefiles_concurrency_estimate"=50 (the default is 12).
IMPORTANT: You must rebuild the table for this change to be effective.
In our case the table had grown to 260 GB. Our solution was to create an empty table (after changing the parameter)
and insert the unprocessed EDI messages into it. When that was done, and the application was started and
confirmed to be working as expected, we started to add older data that was mainly used by customers to verify that their messages
had been processed.
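For reference, the recommended change looks like this; the MOVE LOB statement is the generic way to rebuild an existing LOB segment (we used the create-empty-table-and-reload approach instead, and cMessage is an illustrative column name):

```sql
-- Hidden, undocumented parameter: change only on advice from Oracle Support.
ALTER SYSTEM SET "_securefiles_concurrency_estimate" = 50 SCOPE = SPFILE;

-- After a restart, rebuild so the LOB segment is re-created with the
-- new concurrency estimate:
ALTER TABLE tbpinstances MOVE LOB (cMessage)
  STORE AS SECUREFILE (TABLESPACE gbslob);
```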

The parameter is hidden and undocumented, so I cannot say what changes when we set it. But I assume it has to do with
block structure.

2013-09-18: The problem was experienced again; rebuilding with the parameter changed did not help.

We are currently working with Oracle to find the root cause. We rebuilt the table again,
this time using retention = none for the securefiles (Oracle's recommendation), and the problem was gone.
This disables read consistency for securefiles.
I doubt that this is the real solution; I will update when we have more information from Oracle.
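For completeness, that rebuild can be expressed as (cMessage again being an illustrative column name):

```sql
-- Rebuild the LOB with retention disabled. This turns off read consistency
-- for the SecureFile; concurrent readers can then hit snapshot-too-old
-- style errors on the LOB instead of being served a consistent version.
ALTER TABLE tbpinstances MOVE LOB (cMessage)
  STORE AS SECUREFILE (RETENTION NONE);
```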


We have Bug 17479510 : SECUREFILES DMLS CAUSE HIGH ‘BUFFER BUSY WAITS’ & ‘ENQ: TX – CONTENTION’ registered with Oracle Development.

I believe they are having difficulties analysing the issue; I don't expect this to be solved for some time.
Meanwhile we schedule frequent rebuilds of the table.
The rebuild fixes the problem, but the problem is reintroduced over time. We did our last rebuild two months ago and have started to see the problem again, so we scheduled another rebuild this weekend.


Latest update in this case:

As it seems now, this is not a new bug. When we took over the database we noticed several Wnnn processes at the top on the machine, each using 100% CPU.
This was confirmed by Oracle Support to match "Wnnn processes consuming high CPU" (Doc ID 1492880.1).
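From inside the database, those slaves are easy to spot with something like this (a sketch; the W-slaves show up with "(Wnnn)" in v$process.program):

```sql
-- List SMCO worker (Wnnn) background processes and their OS PIDs;
-- compare the SPIDs against the top CPU consumers in top/ps output.
SELECT spid, program
FROM   v$process
WHERE  program LIKE '%(W0%';
```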

The recommended solution in the note is: SQL> ALTER SYSTEM SET "_ENABLE_SPACE_PREALLOCATION"=0;
If you want more info about space preallocation, see:
SMCO (Space Management Coordinator) For Autoextend On Datafiles And How To Disable/Enable (Doc ID 743773.1)

So the recommendation was to patch, set "_ENABLE_SPACE_PREALLOCATION"=3 (the default value) and
rebuild the table. We tried without rebuilding the table and the problem was still there. Rebuilding the table fixed the issue,
and it is also our current workaround. After we have rebuilt the table it takes anything from 2 to 6 months until we see the
issue again, which means it can take several months for us to verify the fix.

Development says that we can confirm this by:

“You can verify that the suggested solution is working in the following way:
In AWR report, you can search for “dispenser allocations” and “CFS allocations”.
That should give the number of chunks allocated from dispenser vs CFS.
CFS allocations should be very minimal. CFS allocations will happen only when dispenser is not able to meet requests.”
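The same counters are visible outside AWR in v$sysstat (both statistic names appear in the AWR excerpt below), so a quick check looks like:

```sql
-- Chunks handed out by the space dispenser (preallocated) vs. allocated
-- directly from committed free space (CFS). Dispenser should dominate.
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('segment cfs allocations',
                'segment dispenser allocations');
```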

These are our values:

AWR from a test system where we reproduced the issue:

Statistic                        Total  per Second  per Trans
segment cfs allocations            153         0.1        0.0
segment dispenser allocations    3,662         2.0        0.1

After the patch and rebuild:

Statistic                         Total  per Second  per Trans
segment cfs allocations               1         0.0        0.0
segment dispenser allocations    12,626         3.5        0.1

The cfs allocation count is lower, but I expected a bigger difference between good and bad.

But we will patch and set the parameter back to its default value in production,
and hopefully this issue is solved.

12 thoughts on “High Buffer Busy Waits – Securefiles”

  1. Dear sir, so the solution for this situation is to rebuild the table with the LOB segment using retention = none? However, must we rebuild the table periodically?
    Thank you very much for your great advice

  2. Pingback: Part 3: Oracle SecureFile LOB is better than BasicFile, isn’t it? – Jakub Wartak's blog

  3. Looks like we have something similar on our production. Oracle 12c (
    Top events are ‘DB CPU’ and ‘enq: TX – contention’.
    However, it has only been reproduced once so far.

    • Hi Nik, the occurrence for us was very inconsistent. After each rebuild it could take anything from 2 to 4 months to occur again.

  4. Just out of curiosity, we had a similar issue. The issue was on our BPM platform. Can you please confirm whether this was a BPM platform that gave this kind of issue? Very similar pattern: insert with an empty LOB and then updates in quick succession. We have recreated the table and are keeping our fingers crossed.

    • Hi Dina, in our case it was related to space management.
      Securefiles relies on the SMCO process to preallocate space to LOB segments. We ran into a bug with SMCO which created 10+ Wnnn processes working on allocating space. These were always on top and took a lot of the CPU power from the server.
      Since we weren’t allowed to patch at the time, Oracle’s recommendation was to disable SMCO. The problem was very inconsistent; it took several weeks before we ran into the above error, and after rebuilding the table we could run for months before it was seen again. It wasn’t until the “lob” development team worked with us that we realized SMCO was disabled and that it could be related. After patching and enabling SMCO again we haven’t had the issue.

      • Thanks Magnus. Your site has been quite informative in us solving (or mitigating should I say) our issue. We have rebuilt the table and waiting to see if this holds good. In a parallel effort we are testing pounding the new table in a standby env (opened it in snapshot stby mode) to mimic 3 months worth of DML to see if the issue returns.

    • Hey Dina – just curious: which BPM product were you using? It didn’t happen to be IBM BPM did it?

  5. Hi Magnus,

    We just ran into this issue and would like to know if the last fix did indeed correct your issue. We are currently being told to rebuild the securefiles with _securefiles_concurrency_estimate=50 and change disable-storage-in-row to enable-storage-in-row.
    Thanks in advance

    • Hi Virginia,
      Yes, it seems to have fixed the issue. We haven’t experienced
      the issue since we enabled space preallocation again.
      Securefiles seems to rely heavily on space preallocation.
      I hope it works for you!
      Br Magnus
