Login or Sign Up to become a member!
LessThanDot Site Logo

LessThanDot

Community Wiki

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Navigation

Google Ads

5 ways to return rows from one table not in another table

From Wiki

Jump to: navigation, search

There are at least 5 ways to return data from one table which is not in another table. Two of these are SQL Server 2005 and greater only

NOT IN

NOT EXISTS

LEFT and RIGHT JOIN

OUTER APPLY (2005+)

EXCEPT (2005+)


First create these two tables

  1. CREATE TABLE testnulls (ID INT)
  2. INSERT INTO testnulls VALUES (1)
  3. INSERT INTO testnulls VALUES (2)
  4. INSERT INTO testnulls VALUES (null)
  5.  
  6. CREATE TABLE testjoin (ID INT)
  7. INSERT INTO testjoin VALUES (1)
  8. INSERT INTO testjoin VALUES (3)


NOT IN

Run the following Code

  1. SELECT * FROM testjoin WHERE ID NOT IN(SELECT ID FROM testnulls)

What happened? Nothing gets returned! The reason is because the subquery returns a NULL and you can't compare a NULL to anything

Now run this

  1. SELECT * FROM testjoin WHERE ID NOT IN(SELECT ID FROM testnulls WHERE ID IS NOT NULL)


That worked because we eliminated the NULL values in the subquery


NOT EXISTS

NOT EXISTS doesn't have the problem that NOT IN has. Run the following code

  1. SELECT *
  2. FROM testjoin j
  3. WHERE NOT EXISTS (
  4.    SELECT 1
  5.    FROM testnulls n
  6.    WHERE n.ID = j.ID
  7. )

Everything worked as expected

LEFT and RIGHT JOIN

Plain vanilla LEFT and RIGHT JOINS

  1. SELECT j.*
  2. FROM
  3.    testjoin j
  4.    LEFT JOIN testnulls n ON n.ID = j.ID
  5. WHERE n.ID IS NULL
  6.  
  7. SELECT j.*
  8.    FROM  testnulls n
  9.    RIGHT OUTER JOIN testjoin j ON n.ID = j.ID
  10.    WHERE n.ID IS NULL


OUTER APPLY (SQL 2005 +)

OUTER APPLY is something that got added to SQL 2005

  1. SELECT j.*
  2.    FROM testjoin j
  3.    OUTER APPLY (
  4.       SELECT id
  5.       FROM testnulls n
  6.       WHERE n.ID = j.ID
  7.    ) a
  8. WHERE a.ID IS NULL


EXCEPT(SQL 2005 +)

EXCEPT was added in SQL 2005. It returns everything from the top table which is not in the bottom table.

  1. SELECT * FROM testjoin
  2. EXCEPT
  3. SELECT * FROM testnulls


INTERSECT returns whatever is in both tables (like a regular join).

  1. SELECT * FROM testjoin
  2. INTERSECT
  3. SELECT * FROM testnulls

There seems to be a higher cost associated with the EXCEPT and INTERSECT queries vs. the IN or EXISTS approach due to a sort.

In the case of this test:

  1. DECLARE @TABLE1 TABLE (id int IDENTITY(1,1), name varchar(10))
  2. DECLARE @TABLE2 TABLE (id int IDENTITY(1,1), name varchar(10))
  3.  
  4. INSERT @TABLE1 VALUES ('DAVE')
  5. INSERT @TABLE1 VALUES ('MARK')
  6.  
  7. INSERT @TABLE2 VALUES ('DAVE')
  8. INSERT @TABLE2 VALUES (NULL)
  9.  
  10. SELECT * FROM @TABLE1 t1 WHERE NOT EXISTS (SELECT 1 FROM @TABLE2 t2 WHERE t1.NAME = t2.NAME)
  11.  
  12. SELECT * FROM @TABLE1
  13. EXCEPT
  14. SELECT * FROM @TABLE2


The plan produced shows the higher cost:

File:Sqlplan.png


Is this a fair assumption to make or are there other factors to take into consideration (such as number of records, indexes etc)?

This is also the same reason that UNION ALL is much faster than UNION

Take this for example

  1. CREATE TABLE #TABLE1  (id int IDENTITY(1,1), name varchar(10))
  2. CREATE TABLE #TABLE2  (id int IDENTITY(1,1), name varchar(10))
  3.  
  4. INSERT #TABLE1 VALUES ('DAVE')
  5. INSERT #TABLE1 VALUES ('MARK')
  6.  
  7. INSERT #TABLE2 VALUES ('DAVE')
  8. INSERT #TABLE2 VALUES (NULL)


Now run this and check the plan

  1. SELECT * FROM #TABLE1
  2. UNION
  3. SELECT * FROM #TABLE2
  4.  
  5.  
  6. SELECT * FROM #TABLE1
  7. UNION ALL
  8. SELECT * FROM #TABLE2


Execution plans (in text)

UNION

  1. |--Sort(DISTINCT ORDER BY:([Union1004] ASC, [Union1005] ASC))
  2.            |--Concatenation
  3.                 |--Table Scan(OBJECT:([tempdb].[dbo].[#TABLE1000000005630]))
  4.                 |--Table Scan(OBJECT:([tempdb].[dbo].[#TABLE2000000005630]))


UNION ALL

  1. |--Concatenation
  2.            |--Table Scan(OBJECT:([tempdb].[dbo].[#TABLE1000000005630]))
  3.            |--Table Scan(OBJECT:([tempdb].[dbo].[#TABLE2000000005630]))

Take a look at SSIS way of solving this problem at SSIS: Checking if a row exists and if it does, has it changed?

See also performance tests for these 5 methods at Timings of different techniques for finding missing records


Contributed by: --SQLDenis 02:45, 31 May 2008 (GMT)

Part of SQL Server Programming Hacks

Section Handy tricks

169 Rating: 3.1/5 (125 votes cast)