So group column statistics, what is it ? it is information about relationship between columns in the same table. There are numerous blogs about this. One I read is the Oracle Optimizer Blog explaining the details. This is from 11g when you had to create it manually.
From 126.96.36.199 it is done on the fly, which can cause issue. See my previous post about my experiences. This post is a spin off from that post explaining the details a bit more.
I will use the same data as in the optimizer blog but hopefully add some more interesting stuff. As i said there are a lot of posts about this topic. I can direct you to one or the other posts but prefer to show the steps here again. It is just to keep all information together and to use it in my next post going into more details. Let’s go thru the steps and see how this works.
We will use the same table as in the oracle optimizer blog, but I will use a bit different sql.
SELECT Count(*) FROM sh.customers WHERE cust_city = 'White Plains' AND cust_state_province = 'NY' AND country_id = 52790;
If we run it the first time we get the following information.
We can see that the estimation (E-ROWS) that the optimizer does is way of actual rows (A-ROWS), it is estimated to return 1 row when in real life it is 62. How does the optimizer calculates estimated rows to be 1 ?
With basic statistics the formula are:
select CEIL(NumRowsInTable*(1/ndv)*(1/ndv)*(1/ndv)) "E-ROWS" from dual;
1/ndv (ndv=distinct values) is for the columns involved in our case cust_city,cust_state_province and country_id. With the statistics we have it will be
select CEIL(55500*(1/19)*(1/145)*(1/620)) "E-ROWS" from dual; E-ROWS ---------- 1.0000000
But Oracle detects that the estimation and the real value differs a lot and it initiates a tuning step. Statistics feedback is what is used to call cardinality feedback in 11g, if we look at our cursor from my first execution, we can see:
select child_number, IS_REOPTIMIZABLE "IS_REOPTIMIZABLE" from v$sql where sql_id='fy3ffyvh27ynj' CHILD_NUMBER IS_REOPTIMIZABLE --------------- -------------------- 0 Y
IS_REOPTIMIZABLE is set to Y, which means that Oracle will use information from our previous execution in our next execution. If we run it again it looks like this.
Now we can see that the estimation is on track, E-Rows is equal to A-Rows. And we can see at the end that statistics feedback is used for this statement.
Now we have a second cursor.
Cursor 0 will not be used since USE_FEEDBACK is Y, so Oracle is hard parsing a new cursor (1) using statistics from our first execution. The optimizer will not reoptimize cursor (1) any further, since the is_reoptimizable value is (N)o for cursor (1). We can use dbms_stats to check column usage for table customers, the function is report_col_usage.
select dbms_stats.report_col_usage(‘SH’, ‘CUSTOMERS’) from dual;
We have information that our columns has been used in single table equality predicates.
The columns listed matches our sql executed.
One more thing also happening in the background is creation of a sql plan directive (SPD). But it will take some time until it is visible, it is written to the sysaux tablespace every 15:n minute. You can force it by issue: EXEC DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE.
We can use the dba_sql_plan_directives to get information about our SPD. We have one on the table and one for each columns in our predicate list.
Now I execute the sql the third time and I change White Plains to Union Springs. This to initiate a hard parse. Now we have the SPD listed as well.
After we have executed the same sql but with other literal values (hard parse) we get more information in our report_col_usage function.
There is a relationship between the columns, cust_city is influenced by cust_state_province which is influenced by country_id.
Oracle gathers information about the relationship between these columns and store it as an extension. Before it shows up in dba_stat_extensions, you need to gather new stats on the table using DBMS_STATS. I have done that and above you can see how the extension details looks like.
If we look at the column statistics for table customers we can see that we have a new system generated column with the same name as the extension_name. Now we have statistics for our extension. This is used by the optimizer hard parsing any sql using those columns. Remember that extended statistics is not related to a specific sql_id but to a group of columns.