Not In versus Not Exists

I have helped a customer archiving some data from a big table to a backup table.
When he verified data he said he found some oddities, he is running an update script to synchronize data.  This is his comments.

“Now to complicate things a bit, when I run these queries (not EXISTS vs not IN) I get different results. That’s the strange thing to me, NOT EXISTS returns what I would expect in the test/uat DBs as compared with the NOT IN.  Though the NOT IN and NOT EXISTS appears to “work” in PROD.” I ran this in the test database where the issue was seen.

SELECT Count(*) 
FROM   account a 
WHERE  a.account_transaction_type = 4 
  AND NOT EXISTS (SELECT DISTINCT( c.account_transaction_id ) 
                       FROM   charge c 
                       WHERE a.account_transaction_id = 
                             c.account_transaction_id)

The result is

  COUNT(*)
----------
 7636

And for not in

SELECT Count(*) 
FROM   billink.account a 
WHERE  a.account_transaction_type = 4 
       AND a.account_transaction_id NOT IN 
           (SELECT DISTINCT( c.account_transaction_id ) 
            FROM   billink.charge c)

The result are

  COUNT(*)
----------
 0

To understand why this is happening we need to read the  Database SQL Language Reference 11.2. In  the section about Nulls.

Nulls with Comparison Conditions

To test for nulls, use only the comparison conditions IS NULL and IS NOT NULL. 
If you use any other condition with nulls and the result depends on the value of the null, then the result is UNKNOWN.
Because null represents a lack of data, a null cannot be equalor unequal to any value or to another null. 
However, Oracle considers two nulls to be equal when 
evaluating a DECODE function. Refer to DECODE for syntax and 
additional information.

Oracle also considers two nulls to be equal if they appear in 
compound keys. That is, Oracle considers identical two 
compound keys containing nulls if all the non-null components of the keys are equal.
Nulls in Conditions

A condition that evaluates to UNKNOWN acts almost like FALSE. 
For example, a SELECT statement with a condition in the WHERE 
clause that evaluates to UNKNOWN returns no rows. 
However, a condition evaluating to UNKNOWN differs from FALSE 
in that further operations on an UNKNOWN condition evaluation 
will evaluate to UNKNOWN. Thus, NOT FALSE evaluates to TRUE, 
but NOT UNKNOWN evaluates to UNKNOWN.

So what does that mean in our case, since the column ACCOUNT_TRANSACTION_ID in the charge table allows null and the data has rows where ACCOUNT_TRANSACTION_ID is null we have this issue, as stated above a condition evaluating to UNKNOWN will for further operations continue to evaluate to UNKNOWN.

The solution is to not allow nulls, but perhaps that is not possible so one other option would be to add “and c.account_transaction_id is not null” in the sub select of the not in.

SELECT Count(*) 
FROM   billink.account a 
WHERE  a.account_transaction_type = 4 
       AND a.account_transaction_id NOT IN 
           (SELECT DISTINCT( c.account_transaction_id ) 
            FROM   billink.charge c 
            WHERE  c.account_transaction_id IS NOT NULL

The result will be

 COUNT(*)
----------
 7636

So by removing rows where c.account_transaction_id is null, we will fix the problem.

But if you recall my colleague stated: “Though the NOT IN and NOT EXISTS appears to work in PROD” . This is related to not having any rows where ACCOUNT_TRANSACTION_ID is null in production. This makes the issue even more complex since it relates to your data.

But why does NOT IN  and NOT EXISTS behave differently, aren’t both conditions ?

Sure they are I believe it is related to how Oracle transform the query.

If we trace with 10053 trace for the NOT EXISTS query we can see
Final query after transformations:******* UNPARSED QUERY IS *******

SELECT Count(*) "COUNT(*)" 
FROM   "BILLINK"."charge" "C", 
       "BILLINK"."account" "A" 
WHERE  "A"."account_transaction_type" = 4 
AND    "A"."account_transaction_id" = 
       "C"."account_transaction_id"

And if we do the same for the NOT IN query

SELECT Count(*) "COUNT(*)" 
FROM   "BILLINK"."charge" "C", 
       "BILLINK"."account" "A" 
WHERE  "A"."account_transaction_type" = 4 
AND    "A"."account_transaction_id" = 
       "C"."account_transaction_id" 
AND NOT EXISTS (SELECT 0 
                FROM   "BILLINK"."charge" "C" 
                WHERE  "C"."account_transaction_id" IS NULL)

So basically the same, but the NOT IN adds a not exists, I don’t know why the optimizer rewrites the sql like this. I will investigate some more and see if I can find the answer.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s