This problem started some weeks ago, users were complaining that certain search functions didn’t perform. it was either very slow or the application froze.
I noticed different plans over time.
But soon we got more complains, other search fields also started to respond poorly, and new ones were added if not daily so frequent. I started to analyze the sqls and noticed that they were constructed the same way. All sql statement having an issue had an in with the same subquery in the where clause, see the file sql.pdf for details. In the bottom we have..
... AND T5.bu_flg = 'Y' ) T4.row_id IN ( SELECT sq1_t1.ou_ext_id FROM siebel.s_accnt_postn sq1_t1, siebel.s_party_rpt_rel sq1_t2 WHERE ( sq1_t1.position_id = sq1_t2.sub_party_id ) AND ( sq1_t2.party_id = :v3 ))
All of them had this, I should also say that I had similar sqls performing fine as well, but I have identified 3 sqls having the issue. I do believe it is more I have just not found them yet.
What I found in the plan was that I found a view att the top of the plan with the name VM_NWVW_2. This is the plan (sorry for the bad formatting) we got..
As you might know Oracle has a set of query transformation features that the optimizer carries out on all sqls this to make the sql better to execute, this is what he manual says about query transformation.
"For some statements, the query transformer determines whether it is advantageous to rewrite the original SQL statement into a semantically equivalent SQL statement with a lower cost. When a viable alternative exists, the database calculates the cost of the alternatives separately and chooses the lowest-cost alternative. Chapter 5, "Query Transformations" describes the different types of optimizer transformations. "Looking at chapter five we have several of them...
This is what says about subquery unnesting.
"Subqueries are nested when they appear in the WHERE clause of the parent statement. When Oracle Database evaluates a statement with a nested subquery, it must evaluate the subquery portion multiple times and may overlook some efficient access paths or joins. Subquery unnesting unnests and merges the body of the subquery into the body of the statement that contains it, allowing the optimizer to consider them together when evaluating access paths and joins. The optimizer can unnest most subqueries, with some exceptions. Those exceptions include hierarchical subqueries and subqueries that contain a ROWNUM pseudocolumn, one of the set operators, a nested aggregate function, or a correlated reference to a query block that is not the immediate outer query block of the subquery." And the way I understand how this works is that when we have a subquery as in my case, the optimizer tries to.. 1. transform it to an inline view and evaluate that further. 2. If possible it transform that new sql with for example view merge.
The VM_NWVW_2 view is related to the merge part, if you want to read more about it check this blog. Since it seems we are doing an unnest of the query my first test was to add a hint to the sub query, it looks like this: /*+ NO_UNNEST */ with the hint we got a much better plan.
The execution time was reduced from ~10 minutes to
< 1 second. Buffer gets was reduced from 3200K to 20K.
Since I could hint for a better plan, I created a case with Oracle support. Since this is a Siebel application it was transfered to Siebel support. I thought it can be a bug involved or bad statistics. But I wanted supports help to understand why the better plan wasn’t discovered by the optimizer. Knowing that Siebel likes altering a lot of the database/session parameters it might had an impact.
But they had no interest, all my arguments was met with:
“There is no bug here, just an ultra inefficient custom query that under certain conditions may perform well. “. I will say that I normally have a very good response from Oracle Support so I was a bit puzzled over the prompt declined for this to be a bug.
Well I thought they were wrong, I also believed this wasn’t Siebel related since I could reproduce it outside the application.
So I continued my analyze. We had patched the database to 220.127.116.11.6 some weeks ago, and I had a feeling that the problem might started after that. I did a test and set optimzer_features_enable=’18.104.22.168′, I was not totally suprised when I got a good plan after that. Not as good as with my hint, the new plan converted my subselect into an inline view, but didn’t merge it. It ran in 3 seconds and performed ~50K buffer gets.
Now the work started to find what caused the change, normally it is related to a parmeter change between the two versions. But I tested to set the parameters to the same settings as it was after changing O.F.E to 22.214.171.124. But no success, next is bugs, plenty of bugs are altered ON/OFF between the two versions. You can use a 10053 traces to compare.
I did extract all bugs with altered values, altered them in my session and now I got the good plan again. So some more work was done to pinpoint THE bug fixing my issue.
alter session set "_fix_control" = '12555499:0';
This was the bug changing the behavior, so what happens here ?
The information is in this Bug 18255105 – Patch for upgrade scripts to identify histograms affected by fix of bug 15898932 (Doc ID 18255105.8)
Due to fix for bug 15898932 the coding of histograms for CHAR columns changed in 126.96.36.199 compared to earlier versions.
So it seems Oracle has changed the way they collect histograms for char columns. Char columns is not very much used these days, varchar columns are much more frequent I would say. But it seems that Siebel uses char columns for there*_flg columns (flag).
And we had one in our sql as well, T5.BU_FLG = ‘Y’.
So what I did was to disable the bug by using the recommended fix_control, and then I regathered statistics on the table S_BU, which I identified as the key table. Another thing to check is that Siebel now have updated it recommendations for Siebel, see Performance Tuning Guidelines for Siebel CRM Application on Oracle Database (Doc ID 781927.1). In this Note you have the New version 2015 – Performance Tuning Guidelines for Siebel on Oracle Database and also script for new Siebel Stats. There is a few changes that needs to be done in the script, the new documentation says on page 5 that three tables should have method_opt changed from “for all indexed columns size 254” to “for all columns size 254”. This is not done in the script what I noticed so I did that change for my S_BU table since it was one of them..
Now I got the good plan again, so it had a big impact for us. This is the plan. (newplan1)
I used dbms_stats.diff_table_stats_in_history to compare my old and new stats.
REPORT ------------------------------------------------------------------- MAXDIFFPCT ---------- ###################################################################
STATISTICS DIFFERENCE REPORT FOR: .................................
TABLE : S_BU OWNER : SIEBEL SOURCE A : Statistics as of 11-NOV-15 07.51.06.484796 AM -05:00 SOURCE B : Statistics as of 13-NOV-15 09.30.14.125030 AM -05:00 PCTTHRESHOLD : 10 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ NO DIFFERENCE IN TABLE / (SUB)PARTITION STATISTICS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ COLUMN STATISTICS DIFFERENCE: ............................. COLUMN_NAME SRC NDV DENSITY HIST NULLS LEN MIN MAX SAMPSIZ ................................................................... BU_FLG A 1 .000132696 YES 0 2 59 59 3768 B NO_STATS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NO DIFFERENCE IN INDEX / (SUB)PARTITION STATISTICS ###################################################################
As you see no statistics differs on table or index level, but on column level and it is the same flag column that we used in our sql that has different statistics.
My last step was to disable the new way of collecting stats and recollect stats for all my Siebel tables. I used the new script. We have now identified 8 similar sqls to have the same issue, and all of the 8 listed sqls was now performing good !!
Also after this I believe we got a much more stable database when it comes to sql executiong, the sudden changes in plans seems gone now.
The concluion is that if you run Siebel and recently have updated to version 188.8.131.52 of the databses. Be aware that changes was done how Oracle collects statistics on char columns. Char columns is frequently used by Siebel on flag columns (*_flg).
I changed the parameter on system level and noticed many of my poor performing sqls to improve a lot.
alter system set "_fix_control" = '12555499:0' scope=both;