High Buffer Busy Waits – Securefiles

In an EDI application we use,  we had times of very high buffer busy waits
and enq:  TX – contention. This was seen while processing edi messages in the system. Database is 11.2.0.3.4

bbw

The top 5 wait events shows this when the problem is seen.

Top 5 Timed Foreground Events

Event Waits Time(s) Avg wait (ms) % DB time Wait Class
buffer busy waits 654,952 49,655 76 65.87 Concurrency
enq: TX – contention 1,273,449 23,157 18 30.72 Other
DB CPU 1,228 1.63
db file sequential read 367,266 604 2 0.80 User I/O
latch: enqueue hash chains 109,790 256 2 0.34 Other

At this time the application didn’t process many messages at this time and it quickly became a sever problem for the comapny.

Our further analysis showed that buffer busy waits was related to one single block.
Looking in v$session for all sessions having buffer busy waits as event the p1,p2 and p3 values shows file,block and reasoncode.
Reasoncode for us was 8 which means 1st level bmp
This is the ASSM bitmap block, see the concept guide for details about ASSM.

This was also confirmed looking into the awr report

Buffer Wait Statistics

Class Waits Total Wait Time (s) Avg Time (ms)
1st level bmb 720,881 49,593 69
free list 14,224 34 2
undo header 40,229 10 0
data block 328 0 1
3rd level bmb 38 0 9
undo block 6 0 0
2nd level bmb 2 0 0

Looking at the sql that was running we can see the following

SQL ordered by Elapsed Time

Elapsed Time (s) Executions Elapsed Time per Exec (s) %Total %CPU %IO SQL Id SQL Module SQL Text
27,091.58 254 106.66 35.94 1.05 0.78 0u0vxfm0k4yba SeeBIS6 AS 19686 INSERT INTO TBPINSTANCES (cID,…
8,593.48 48 179.03 11.40 1.07 0.71 4xfhztsqxh1nq SeeBIS6 AS 19686 UPDATE TBPINSTANCES SET cProce…
7,013.41 29 241.84 9.30 1.01 0.84 20a291w9y9ruy SeeBIS6 AS 19686 UPDATE TBPINSTANCES SET cProce…
6,238.21 13 479.86 8.28 1.01 0.83 ght7b34v6f7g4 SeeBIS6 AS 19686 UPDATE TBPINSTANCES SET cProce…
5,731.39 38 150.83 7.60 1.04 0.73 3bfqvwrqqpgmr SeeBIS6 AS 19686 UPDATE TBPINSTANCES SET cProce…
5,187.19 50 103.74 6.88 1.12 0.68 7u1b7rtp4stn3 SeeBIS6 AS 19686 UPDATE TBPINSTANCES SET cState…
5,059.49 14 361.39 6.71 1.03 0.80 8a1urkdd3y048 SeeBIS6 AS 19686 UPDATE TBPINSTANCES SET cState…
4,855.83 28 173.42 6.44 1.08 0.65 899f7bqcg1rg7 SeeBIS6 AS 19686 UPDATE TBPINSTANCES SET cProce…
1,342.83 31 43.32 1.78 0.99 0.79 88w0d433ftf1v SeeBIS6 AS 27951 UPDATE TBPINSTANCES SET cState…
1,088.53 9 120.95 1.44 1.07 0.82 4554dhmrq0kht SeeBIS6 AS 19686 UPDATE TBPINSTANCES SET cProce…

Looking at the sql we can see that all top sql performs insert or updates into one specific table, tbpinstances. Analyzing further shows that top segment for
buffer busy waits is a lobsegment.

Segments by Buffer Busy Waits

Owner Tablespace Name Object Name Subobject Name Obj. Type Buffer Busy Waits % of Capture
GBSASP GBSLOB SYS_LOB0000017961C00006$$ LOB 728,771 99.96
GBSASP GBSDTA TJMSMESSAGES TABLE 114 0.02
GBSASP GBSIDX IDX_JMSMESSAGES_TXOP_TXID INDEX 65 0.01
GBSASP GBSIDX PK_TJMSTRANSACTIONS INDEX 45 0.01
GBSASP GBSIDX PK_TJMSMESSAGES INDEX 25 0.00

We learned that  the application is using blobs for edi messages data, when processing it inserts the row and then updates it during the various stages of the process.
This could accours between 5-25 times depending what type of mesage we are processing.

We logged a P1 servcie request with Oracle Support and got help analyzing the problem. We first thought this was a bug but development
didn’t agree on that.

The solution:

We was recommended to change the parameter “_securefiles_concurrency_estimate”=50 (default set to 12)
IMPORTANT: You must rebuild the table for this change to be effective.
In our case the table has grown to 260Gb. Our solution was to create an empty table (after changing the parameter)
and inserting the non processed edi messages into the table, when that was done and the application started and
confirmed to working as expected we staretd to add odler data that was mainly used by customers to verify that there messages
has been processed.

The parameter is hidden and undocumented so I can not say anything what changes when we set it. But I assume it has to do with
block structure.

2013-09-18. Problem was experienced again, rebuilding with the parameter changed did not help.

We are currently working with Oracle to try to find the root cause. We did rebuild the table again
and used retention = none (Oracles recommendation) for the securefiles and the problem was gone.
This disables read consistency for securefiles.
I doubt that this is the solution, I will update when we have more information from Oracle.

2014-02-05

We have Bug 17479510 : SECUREFILES DMLS CAUSE HIGH ‘BUFFER BUSY WAITS’ & ‘ENQ: TX – CONTENTION’ registered with Oracle Development.

I believe they have difficulties to analyse the issue, I don’t expect this to be solved for some time.
meanwhile we schedule frequent rebuilds of the table.
The rebuild fixes the problem, but it is introduced again. We did our last rebuild 2 month ago and we start to see the problem again, so we scheduled another rebuild this weekend.

2014-04-03

Latest update in this case:

As it seems now this is not a new bug, when we took over the db we noticed that we had several Wnnn processes on top in the machine using 100% cpu.
This was confirmed by Oracle Support to be Wnnn processes consuming high CPU (Doc ID 1492880.1)

The recommended solution in the note is to SQL> ALTER SYSTEM SET “_ENABLE_SPACE_PREALLOCATION”=0;
If you want more info about space pre allocation see:
SMCO (Space Management Coordinator) For Autoextend On Datafiles And How To Disable/Enable (Doc ID 743773.1)

So the recommendation was to patch to 11.2.0.4 and set _ENABLE_SPACE_PREALLOCATION”=3 (default value) and
rebuild the table, we tried without rebuildingt he table and the table was still there. Rebuilding the table fixed the issue
but it is also our current workaround. After we have rebuilt the table it takes anything from 2 to 6 months until we see the
issue again. So this is means that it can take several months for us to verify the fix.

Development say that we can confirm this by:

“You can verify that the suggested solution is working in the following way:
In AWR report, you can search for “dispenser allocations” and “CFS allocations”.
That should give the number of chunks allocated from dispenser vs CFS.
CFS allocations should be very minimal. CFS allocations will happen only when dispenser is not able to meet requests.”

This is our values:

AWR from a test system where we reproduced the issue:

Statistic                                                            Total per Second per Trans
——————————– —————— ————– ————-
segment cfs allocations                                  153               0.1             0.0
segment dispenser allocations               3,662                2.0             0.1

After the patch and rebuild:

Statistic                                                            Total per Second per Trans
——————————– —————— ————– ————-
segment cfs allocations                                     1                0.0              0.0
segment dispenser allocations           12,626                3.5                0.1

cfs allocation is lower but I expected bigger differences between good and bad.

But we will patch and set the parameter back to its default value in production
and hopefully this issue is solved.

Advertisements

7 thoughts on “High Buffer Busy Waits – Securefiles

  1. just out of curiosity, we had similar issue. The issue was on our BPM platform. Can you please confirm if this was a BPM platform that gave this kind of issue? very similar pattern, insert with emptylob and then updates in quick succession. We have recreated the table and keeping our fingers crossed.

    • Hi Dina, In our case it was related to space management.
      Securefiles relies on the SMCO process to preallocate space to lobsegments. We ran into a bug with SMCO which created 10+ Wnnn processes working with allocating space. These where always on top and took a lot of the cpu power from the server.
      Since we wasn’t allowed to patch at the time Oracle recommendation was to disable SMCO. The problem was very inconsistent, it took several weeks before we ran into the above error. And after rebuilding the table we could run for months before it was seen again. It wasn’t until the “lob” development team worked with us that we realized that SMCO was disabled and it could be related. After patching and enabling SMCO again we haven’t had the issue.

      • Thanks Magnus. Your site has been quite informative in us solving (or mitigating should I say) our issue. We have rebuilt the table and waiting to see if this holds good. In a parallel effort we are testing pounding the new table in a standby env (opened it in snapshot stby mode) to mimic 3 months worth of DML to see if the issue returns.

  2. Hi Magnus,

    We just ran into this issue and would like to know if the last fix did indeed correct your issue. We are currently being told by rebuild the securefiles with the _securefiles_concurrency_estimate=50 and change disable-storage-in-row to enable-storage-in-row.
    Thanks in advance

    • Hi Virginia
      Yes it seemes to have fixed the issue. We haven’t experienced
      The issue since we enabled space preallocation again.
      Securefiles seemes to rely heavily on space preallocation.
      I hope it works for you !
      Br Magnus

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s