New and old histograms: an overview (11g and 12c)

When I visited IOUG Collaborate 15 in Las Vegas earlier this year I attended a presentation about the new histograms in 12c. While listening to the presenter I started to think about what impact they could have on my systems. I am DBA for two Siebel systems, and Siebel collects statistics in its own way, so I started to blog about Siebel and the new histograms. You can see it here. But first I wanted to review the old and the new histograms, and here is that review.

Histograms are all about distribution: the more detailed information you give the optimizer, the better it will perform. So let's look at an example. We can use the example data that can be installed with the database: a schema named SH and a table named products. In that table we have a column named prod_category. The data for prod_category looks like this:

PROD_CATEGORY               COUNT
--------------------------- -----
Hardware                        2
Photo                          10
Electronics                    13
Accessories                    21
Software                       26
                            -----
                               72
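
The distribution above can be reproduced with a simple aggregate query, something like:

select prod_category, count(*) as cnt
from sh.products
group by prod_category
order by cnt;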

The optimizer expects an even distribution of prod_categories within the products table. Since we have 5 distinct prod_categories, the selectivity = 1/5 = 20%. We have 72 rows in the table, and 20% of 72 rows is 14.4 rows.
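
Once statistics are gathered you can do the same arithmetic from the dictionary; a minimal sketch (num_rows and num_distinct are the standard statistics columns):

select t.num_rows, c.num_distinct,
       round(t.num_rows / c.num_distinct, 1) as expected_rows
from user_tables t
join user_tab_columns c on c.table_name = t.table_name
where t.table_name = 'PRODUCTS'
and c.column_name = 'PROD_CATEGORY';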

[Figure: even distribution across the five categories]

So how does it look if we run a SQL statement searching for a specific product category and have no histograms?

SQL>  exec dbms_stats.gather_table_stats(ownname=>NULL,
tabname=>'PRODUCTS',
method_opt=>'for all indexed columns size 1')

select column_name,num_distinct,num_buckets,histogram 
from user_tab_columns 
where table_name='PRODUCTS' and column_name='PROD_CATEGORY';

COLUMN_NAME    NUM_DISTINCT NUM_BUCKETS HISTOGRAM
-------------- ------------ ----------- ----------- 
PROD_CATEGORY             5           1 NONE

SQL> explain plan for 
     select * from sh.products where prod_category='Hardware';

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
Plan hash value: 139359126

----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                      |    14 |  2324 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| PRODUCTS             |    14 |  2324 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | PRODUCTS_PROD_CAT_IX |    14 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("PROD_CATEGORY"='Hardware')

As you see from above, the estimated rows returned are 14, and it doesn't matter what you search for: 14 rows will always be the estimated rows returned.

We know that this is not true; the distribution is uneven, as we saw from the data above.

[Figure: uneven distribution across the five categories]

Some product categories have more products than others: the Software category has the most products while Hardware has the fewest. This information can be crucial for the optimizer to make the correct decision; it can make a big difference whether we are interested in Software or Hardware. So if we collect histograms on the prod_category column we can have two types of histograms (<= 11g): frequency or height balanced.

Frequency Histogram

Frequency histograms are the more accurate of the two: a frequency histogram creates as many buckets as there are distinct values (NDV). In our case we have 5 categories, so we will have 5 buckets. Each bucket will contain the number of rows for its category.

[Figure: uneven distribution with the row count per category]

Now we can see that we have 26 products in category Software and 2 in category Hardware. This will help the optimizer make the correct decision. So frequency histograms have one bucket per value, as long as NDV <= 254. If we had 300 categories instead of our five, a frequency histogram could not be used.

This is how we collect a histogram and list the histogram information:

exec dbms_stats.gather_table_stats(ownname=>NULL,
tabname=>'PRODUCTS',
method_opt=>'for all indexed columns size 254')

SQL>select column_name,num_distinct,num_buckets,histogram 
    from user_tab_columns 
    where table_name='PRODUCTS' and column_name='PROD_CATEGORY';

COLUMN_NAME       NUM_DISTINCT NUM_BUCKETS HISTOGRAM
----------------- ------------ ----------- ---------
PROD_CATEGORY                5           5 FREQUENCY

SELECT ROW_NUMBER() OVER (ORDER BY ENDPOINT_VALUE) AS Bucket,
ENDPOINT_VALUE,ENDPOINT_NUMBER 
FROM USER_TAB_HISTOGRAMS
WHERE TABLE_NAME = 'PRODUCTS' AND COLUMN_NAME = 'PROD_CATEGORY'
ORDER BY ENDPOINT_NUMBER;

BUCKET  ENDPOINT_VALUE                         ENDPOINT_NUMBER
------- -------------------------------------- ---------------
      1   339515129341172000000000000000000000              21
      2   360467016300303000000000000000000000              34
      3   375821830649412000000000000000000000              36
      4   417501949642057000000000000000000000              46
      5   433220104041556000000000000000000000              72

The endpoint values are our categories (Hardware, Software, etc.) and you can convert the numbers back to text if you like. The endpoint number is the cumulative row count where each bucket ends, so bucket 1 has 21-0 = 21 rows, bucket 2 has 34-21 = 13 rows, and so on. A bucket whose row count is at least two times that of the previous bucket is a popular value; otherwise it is referred to as a non-popular value. Both buckets 4 and 5 are popular values.
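
For a character column, ENDPOINT_VALUE is a numeric encoding of roughly the first six bytes of the string, so the conversion back to text is only approximate. A hedged sketch of the usual decoding trick (the hex format mask and substr length are the common choices; only the leading characters come back cleanly):

select endpoint_number,
       utl_raw.cast_to_varchar2(hextoraw(
         substr(to_char(endpoint_value,
                'fmxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'), 1, 12)
       )) as decoded_prefix
from user_tab_histograms
where table_name = 'PRODUCTS' and column_name = 'PROD_CATEGORY'
order by endpoint_number;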

How does it look if we execute a query?

SQL> explain plan for
  2  select * from products where prod_category='Hardware';

Explained.

SQL> select plan_table_output from table(dbms_xplan.display('plan_table',null,'serial'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
Plan hash value: 139359126

----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                      |     2 |   332 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| PRODUCTS             |     2 |   332 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | PRODUCTS_PROD_CAT_IX |     2 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("PROD_CATEGORY"='Hardware')

And with another value?

SQL> explain plan for
  2  select * from products where prod_category='Software';

Explained.

SQL> select plan_table_output from table(dbms_xplan.display('plan_table',null,'serial'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
Plan hash value: 1954719464

-----------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows    | Bytes   | Cost (%CPU)  | Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |      26 |    4316 |       3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| PRODUCTS |      26 |    4316 |       3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
---------------------------------------------------

   1 - filter("PROD_CATEGORY"='Software')

As you see from the two executions, the estimated rows returned are identical to the actual values we have: the count for Hardware is 2 and for Software 26. And as you see, based on the histograms the optimizer chooses different plans for the two executions.

What if we use bind variables? Some years ago I was often told that OLTP applications should use bind variables and that histograms then were not useful; they could even cause issues. I think the reason behind that was bind peeking. As you might know, Oracle can peek into bind variables when a SQL statement is hard parsed. If you have a statement that is executed frequently, it is parsed once and then executed many times, exactly as we want it in an OLTP application. But if you had histograms and the peeked plan was not an optimal one, you could get inconsistent performance: depending on what value you supplied at the hard parse, you could be locked into that plan for a long time. So a common recommendation was to turn off bind peeking in OLTP applications. For data warehouses it was different: there you often had SQL executed few times handling large amounts of data, and bind variables should not be used, because using literals together with histograms could really boost performance.

We define a variable v1 and assign the value we are interested in to the bind variable.

SQL> variable v1 varchar2(30);
SQL> exec :v1 :='Hardware';

PL/SQL procedure successfully completed.

SQL> explain plan for
  2  select * from products where prod_category=:v1;

Explained.

SQL> select plan_table_output from table(dbms_xplan.display('plan_table',null,'serial'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
Plan hash value: 139359126

----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                      |    14 |  2324 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| PRODUCTS             |    14 |  2324 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | PRODUCTS_PROD_CAT_IX |    14 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("PROD_CATEGORY"=:V1)

14 rows selected.

SQL> exec :v1 :='Software';

PL/SQL procedure successfully completed.

SQL> explain plan for
  2  select * from products where prod_category=:v1;

Explained.

SQL> select plan_table_output from table(dbms_xplan.display('plan_table',null,'serial'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
Plan hash value: 139359126

----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                      |    14 |  2324 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| PRODUCTS             |    14 |  2324 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | PRODUCTS_PROD_CAT_IX |    14 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("PROD_CATEGORY"=:V1)

14 rows selected.

As you see, the estimation is back to the default value without a histogram: 1/5 = 20%, which is 72 * 0.2 = 14.4 ≈ 14 rows. We are also using the same plan for both values, even though we know that Software would benefit from a full table scan.
So are histograms useless, or even bad, when using bind variables?
Not from version 11g onward: in 11g we got something called Adaptive Cursor Sharing (ACS), which means Oracle can change plan based on histograms and bind variables.
You can see an example of ACS here.
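
A quick way to watch ACS at work is to actually execute the statement with different bind values (EXPLAIN PLAN alone does not peek at binds) and then look at the cursor flags in V$SQL. A minimal sketch, assuming you can query V$SQL; ACS usually needs a few repeated executions before a bind-aware child cursor shows up:

variable v1 varchar2(30);
exec :v1 := 'Hardware';
select count(*) from sh.products where prod_category = :v1;
exec :v1 := 'Software';
select count(*) from sh.products where prod_category = :v1;

select sql_id, child_number, is_bind_sensitive, is_bind_aware, executions
from v$sql
where sql_text like 'select count(*) from sh.products%';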

Height-Balanced Histogram

A height-balanced histogram is used when we have NDV > 254. Let's use cust_city_id from table sh.customers for our example with height-balanced histograms. We have 620 distinct values, which is higher than our maximum of 254 buckets for frequency histograms. The table has 55500 rows in total.

CUST_CITY                      CUST_CITY_ID      ROWS
------------------------------ ------------ ----------
Aachen                                51040        68
Aalborg                               51041        79
Aalen                                 51042        28
Abingdon                              51043       166
Accomac                               51044       145
Adelaide                              51045       158
Aix-en-Provence                       51046        25
Aix-les-Bains                         51047        79
Aladdin                               51048       110
Albion                                51049        60
Aldridge                              51050        79
Alicante                              51051        28
Aline                                 51052        42
Alkmaar                               51053       128
Alleppey                              51054        80
..
Zeist                                 52530       135
Zoetermeer                            52531        54

If we create a height-balanced histogram for our CUST_CITY_ID (I use it since it is easier to see the values in the histogram than with CUST_CITY):

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'SH',
TABNAME=>'CUSTOMERS',
ESTIMATE_PERCENT=>100,
METHOD_OPT=>'FOR ALL COLUMNS SIZE 254')

So we now have a height-balanced histogram for our CUST_CITY_ID column.

SQL> select column_name,num_distinct,num_buckets,histogram 
     from user_tab_columns 
     where table_name='CUSTOMERS' and column_name='CUST_CITY_ID';

COLUMN_NAME               NUM_DISTINCT NUM_BUCKETS HISTOGRAM
------------------------- ------------ ----------- ---------------
CUST_CITY_ID                       620         254 HEIGHT BALANCED

How is the histogram information collected? We start by sorting the data and selecting every nth row from the table, where n is calculated as total rows / number of buckets. In our case that is 55500/254, so roughly every 218th row is recorded in our height-balanced histogram.

You can think of this like walking down a road with our data: if we sort the data on CUST_CITY_ID and lay each row out on the road, then we start walking, and every 218th "step" we record the value there as the endpoint value for a bucket. We have 254 buckets that should get a value. Our first stop at position 218 gives us 51043, the second 51044, the third 51046, and so on:

X->
 --------------------------------------------------------------
 0              218               436              654
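
You can mimic this bucketing conceptually with NTILE: split the sorted rows into 254 equal slices and take the highest value in each slice as the endpoint. This is only an approximation of what dbms_stats actually does, but it illustrates the idea:

select bucket, max(cust_city_id) as endpoint_value
from (select cust_city_id,
             ntile(254) over (order by cust_city_id) as bucket
      from sh.customers)
group by bucket
order by bucket;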

The statistics from the histogram look like this:

SQL> SELECT ENDPOINT_NUMBER, ENDPOINT_VALUE 
     FROM USER_HISTOGRAMS
     WHERE  TABLE_NAME='CUSTOMERS'
     AND    COLUMN_NAME='CUST_CITY_ID';

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
          0      51040
          1      51043
          2      51044
          3      51046
          4      51049
          5      51053
          6      51055

As you see, our figures match the real values: starting with bucket 1 we have 51043, 51044 and 51046. Here you can also note that we have a bucket 0. Bucket 0 is automatically added if the first bucket has multiple values; in our case that is true, since values 51040, 51041, 51042 and 51043 are stored there.

If a value spans more than one bucket, the buckets can be compressed, which means some buckets are removed and only the last bucket for that value is stored. We can see this happening in our height-balanced histogram.

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
          0      51040
          1      51043
          2      51044
          3      51046
          4      51049
          5      51053
          6      51055
          7      51057
          8      51059
          9      51061
         10      51062
         11      51067
         14      51069
         15      51073

So we can see that buckets 12 and 13 have been compressed into bucket 14. 51067 is the endpoint value for bucket 11; 51068 is in bucket 12, and so is 51069, but 51069 has so many rows (576) that it fills all of bucket 13 and becomes the endpoint value for bucket 14.
A value like 51069 is called a popular value since it spans more than one bucket; a value that does not is called non-popular.

The optimizer uses this information and calculates the cardinality like this:

For POPULAR values: cardinality = (num rows in table) * (num of endpoints spanned by this value / total num of endpoints).

For NON-POPULAR values: cardinality = (num rows in table) * density, where density is calculated internally and is a value between 0 and 1.
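
The density the optimizer uses is visible in the dictionary; a minimal sketch:

select column_name, num_distinct, density
from user_tab_col_statistics
where table_name = 'CUSTOMERS' and column_name = 'CUST_CITY_ID';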

Let us run the select searching for cust_city_id = 51069. Following the formula for popular values above it should be: 55500 * (3/254) = 55500 * 0.01181 ≈ 655.5 rows.

SQL > explain plan for 
      select * from customers where cust_city_id=51069;

Explained.

SQL > @?/rdbms/admin/utlxpls

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
Plan hash value: 2008213504

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time      |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |   656 |   115K|   405    (1)| 00:00:05 |
|*  1 |  TABLE ACCESS FULL| CUSTOMERS |   656 |   115K|   405    (1)| 00:00:05 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CUST_CITY_ID"=51069)

The estimation is very accurate: our math gave 655.5 rows, which rounds to the 656 in the plan. What if we ask for an unpopular value like 51042?

SQL > explain plan for 
      select * from customers where cust_city_id=51042;

Explained.

SQL > @?/rdbms/admin/utlxpls

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
Plan hash value: 2008213504

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time      |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |    66 | 11946 |   405    (1)| 00:00:05 |
|*  1 |  TABLE ACCESS FULL| CUSTOMERS |    66 | 11946 |   405    (1)| 00:00:05 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CUST_CITY_ID"=51042)

The estimation is 66 rows, but looking at our data we can see that 51042 has 28 rows. So the estimation is not that good.

New histograms in 12c

Top Frequency Histograms

Frequency histograms are the best histograms since they are the most accurate, but when NDV > the number of buckets, a height-balanced histogram is normally created. What if a few values occupy most of the rows? Let's use country_id in sh.customers:

COUNTRY_ID     ROWS
---------- ----------
     52787       75
     52791       88
     52788       91
     52785      244
     52777      383
     52773      403
     52769      597
     52782      624
     52786      708
     52771      712
     52774      831
     52775      832
     52772     2010
     52778     2039
     52779     3833
     52789     7557
     52770     7780
     52776     8173
     52790    18520

We can see that country_id 52790 is very popular while 52787 is not. We can now have a histogram for the popular values, while non-popular values are ignored. This is how I collect the stats:

exec dbms_stats.gather_table_stats(ownname=>'SH',
tabname=>'CUSTOMERS',
method_opt=>'for all columns size 14')

Note that we have removed the estimate_percent parameter; it defaults to dbms_stats.auto_sample_size, and this is a must for the new histograms to be created.

A top frequency histogram is generated when:

- The data set has more than n (254) distinct values.
- The percentage of rows occupied by the top n frequent values is 
  equal to or greater than threshold p, where p is (1-(1/n))*100.
- The estimate_percent parameter is set to AUTO_SAMPLE_SIZE

So what does it mean in our case? We set n = 14; n in this case equals the number of buckets. p = (1 - (1/14)) * 100 = (1 - 0.0714) * 100 ≈ 92.9%. So is the percentage of rows for the top n (14) values greater than 92.9%? The top 14 values have 54619 rows out of 55500, which is about 98.4% of the total rows.
Yes, it is bigger, so we will generate a top frequency histogram.
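
You can check the threshold condition yourself; a sketch using the 12c row-limiting syntax (the 55500 total comes from the data above):

with topn as (
  select count(*) as cnt
  from sh.customers
  group by country_id
  order by cnt desc
  fetch first 14 rows only
)
select sum(cnt) as top14_rows,
       round(100 * sum(cnt) / 55500, 1) as pct_of_total
from topn;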

Now we can see that we have a top frequency histogram.

TABLE_NAME     COLUMN_NAME  NUM_DISTINCT NUM_BUCKETS HISTOGRAM
-------------- ------------ ------------ ----------- ---------------
CUSTOMERS      COUNTRY_ID             19          14 TOP-FREQUENCY

So what do the histogram statistics look like?
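
We can use a query along the same lines as the earlier ones against USER_TAB_HISTOGRAMS:

select endpoint_number, endpoint_value
from user_tab_histograms
where table_name = 'CUSTOMERS' and column_name = 'COUNTRY_ID'
order by endpoint_number;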

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
        597              52769
       8377              52770
       9089              52771
      11099              52772
      11930              52774
      12762              52775
      20935              52776
      22974              52778
      26807              52779
      27431              52782
      28139              52786
      35696              52789
      54216              52790
      54217              52791

Now we can see that we have 14 buckets, and endpoint_number tells us how many rows we have, just like in the old frequency histogram. Note that some values are missing since they are considered non-popular. (Here I am a bit puzzled: my assumption is that the n (14) most frequent values are the popular values, but my example does not support that. I had a case with support and they confirmed this to be Bug 16549928 FIRST BUCKET NOT REMOVED WHEN GENERATING A TOP FREQUENCY HISTOGRAM.)
The optimizer calculates the cardinality as

C = num_rows * (num rows for the value / num_rows)

For value 52774: C = 55500 * (831/55500) = 830.99

Since this is a frequency-style histogram, the cardinality is equal to the bucket size. Non-popular values use density.

We can verify our findings.

SQL> explain plan for
     select * from customers where country_id=52774;

SQL> @?/rdbms/admin/utlxpls

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2008213504

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |   831 |   146K|   405    (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| CUSTOMERS |   831 |   146K|   405    (1)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

   1 - filter("COUNTRY_ID"=52774)

As you see, the estimation is 831 rows, and if you check the details above you can see that it is spot on: value 52774 has 831 rows. It also matches the calculation above.
What if we choose an unpopular value?

SQL > explain plan for
     select * from customers where country_id=52773;

SQL > @?/rdbms/admin/utlxpls

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
Plan hash value: 2008213504

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU) | Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |   257 | 46517 |   405    (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| CUSTOMERS |   257 | 46517 |   405    (1)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COUNTRY_ID"=52773)

The estimation is 257 rows but the real value is 403, so a bit off.

Hybrid Histograms

Hybrid histograms are a mixture of height-balanced and frequency histograms; combining the two gives a better result. Height-balanced histograms can produce bad estimates for almost popular values: values that span almost two buckets are not popular values. This problem is solved by hybrid histograms. In a hybrid histogram no value spans more than one bucket; instead we have an endpoint repeat count. By using the repeat count, the optimizer can give good estimates for almost popular values. In my example I will use my own table TEST with column NR. The data looks like this:

SQL> select nr,count(1) from test group by nr order by 1;

    NR   COUNT(1)
---------- ----------
     1      100
     2       80
     3       60
     4       40
     5       20
     6       10
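
If you want to reproduce this, a table with that distribution can be built like this (a sketch; TEST and NR are simply the names I chose):

create table test (nr number);

insert into test select 1 from dual connect by level <= 100;
insert into test select 2 from dual connect by level <= 80;
insert into test select 3 from dual connect by level <= 60;
insert into test select 4 from dual connect by level <= 40;
insert into test select 5 from dual connect by level <= 20;
insert into test select 6 from dual connect by level <= 10;
commit;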

I have collected stats on the table

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'SH',
TABNAME=>'TEST',
METHOD_OPT=>'FOR ALL COLUMNS SIZE 3')

We can see that we have a hybrid histogram with 3 buckets.

TABLE_NAME  COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM
----------- ----------- ------------ ----------- --------- 
TEST                 NR            6           3 HYBRID

This is how our buckets look:
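
Note the new ENDPOINT_REPEAT_COUNT column in USER_TAB_HISTOGRAMS; the query follows the same pattern as before:

select endpoint_number, endpoint_value, endpoint_repeat_count
from user_tab_histograms
where table_name = 'TEST' and column_name = 'NR'
order by endpoint_number;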

ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_REPEAT_COUNT
--------------- -------------- ---------------------
        100                  1                   100
        240                  3                    60
        310                  6                    10

With a height-balanced histogram we would share all rows almost equally among the buckets: we have 310 rows in TEST and 3 buckets, so 310/3 ≈ 103 rows per bucket, and if this were a height-balanced histogram we would sort the data and pick every 103rd value in the list. With a hybrid histogram we make some adjustments: we put the data into buckets so that no value spans more than one bucket. For the last value in each bucket we count how many times it is repeated and store that in the column endpoint_repeat_count. If the value we search for is the last one in a bucket, it has a repeat count and the optimizer will use that value to calculate the cardinality; if it is not, the optimizer uses density.

We walk down our road of values, 103 steps at a time. We stop at value 2, then we back up and put all rows having value 1 into bucket 1, count the frequency (100) and store it in endpoint_repeat_count. Then we walk another 103 steps and stop at value 3; we put all rows for values 2 and 3 into the second bucket, and since 3 is the last value in that bucket we count it (60) and store the count in endpoint_repeat_count. Another 103 steps takes us to the end of the data at value 6; values 4, 5 and 6 are stored in the last bucket, and since 6 is the last value we count it (10) and store the count in endpoint_repeat_count. If a value would span more than one bucket, the bucket boundaries are adjusted so that it is stored in only one bucket.

What is the result if we search for a value that is a bucket endpoint?

SQL> explain plan for 
     select * from test where nr=3;

SQL> @?/rdbms/admin/utlxpls

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
Plan hash value: 1357081020

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    60 |   180 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST |    60 |   180 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------

   1 - filter("NR"=3)

The estimation of 60 rows is good; it matches what our data says. What if we search for a value that is not a bucket endpoint?

SQL> explain plan for 
     select * from test where nr=5;

SQL> @?/rdbms/admin/utlxpls

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
Plan hash value: 1357081020

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    52 |   156 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST |    52 |   156 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------

   1 - filter("NR"=5)

The estimation is not that good: value 5 has 20 rows. The optimizer uses density to estimate the number of rows returned.

During my tests I noticed a lot of odd behaviour and, as I see it, false results. New features tend to be a bit buggy in the first release.

I also wrote this in relation to another blog post I have done, 12c new statistics with Siebel.

Reference:

Database SQL Tuning Guide
