Login or Sign Up to become a member!

EXPERTS, INFORMATION, IDEAS & KNOWLEDGE

Social bookmarker Add this

5 ways to return rows from one table not in another table

From Wiki

Jump to: navigation, search

There are at least 5 ways to return data from one table which is not in another table. Two of these are SQL Server 2005 and greater only

NOT IN

NOT EXISTS

LEFT and RIGHT JOIN

OUTER APLY (2005+)

EXCEPT (2005+)


First create these two tables

  1. CREATE TABLE testnulls (ID INT)
  2.     INSERT INTO testnulls VALUES (1)
  3.     INSERT INTO testnulls VALUES (2)
  4.     INSERT INTO testnulls VALUES (null)
  5.  
  6.     CREATE TABLE testjoin (ID INT)
  7.     INSERT INTO testjoin VALUES (1)
  8.     INSERT INTO testjoin VALUES (3)


Contents

NOT IN

Run the following Code

  1. SELECT * FROM testjoin WHERE ID NOT IN(SELECT ID FROM testnulls)

What happened? Nothing gets returned! The reason is because the subquery returns a NULL and you can't compare a NULL to anything

Now run this

  1. SELECT * FROM testjoin WHERE ID NOT IN(SELECT ID FROM testnulls WHERE ID IS NOT NULL)


That worked because we eliminated the NULL values in the subquery


NOT EXISTS

NOT EXISTS doesn't have the problem that NOT IN has. Run the following code

  1. SELECT * FROM testjoin j
  2.     WHERE NOT EXISTS (SELECT n.ID
  3.     FROM testnulls n
  4.     WHERE n.ID = j.ID)

Everything worked as expected


LEFT and RIGHT JOIN

Plain vanilla LEFT and RIGHT JOINS

  1. SELECT j.* FROM testjoin j
  2.     LEFT OUTER JOIN testnulls n ON n.ID = j.ID
  3.     WHERE n.ID IS NULL
  4.  
  5.     SELECT j.* FROM  testnulls n
  6.     RIGHT OUTER JOIN testjoin j  ON n.ID = j.ID
  7.     WHERE n.ID IS NULL


OUTER APPLY (SQL 2005 +)

OUTER APPLY is something that got added to SQL 2005

  1. SELECT j.* FROM testjoin j
  2.     OUTER APPLY
  3.     (SELECT id  FROM testnulls n
  4.     WHERE n.ID = j.ID) a
  5.     WHERE a.ID IS NULL


EXCEPT(SQL 2005 +)

EXCEPT is something that got added to SQL 2005. It basically returns everything from the top table which is not in the bottom table

  1. SELECT * FROM testjoin
  2.     EXCEPT
  3.     SELECT * FROM testnulls


INTERSECT returns what ever is in both tables(like a regular join)

  1. SELECT * FROM testjoin
  2.     INTERSECT
  3.     SELECT * FROM testnulls

There seems to be a higher cost associated with the EXCEPT and INTERSECT queries vs the IN or EXISTS approach due to a Sort.

In the case of this test:

  1. DECLARE @TABLE1 TABLE (id INT IDENTITY(1,1), name VARCHAR(10))
  2.     DECLARE @TABLE2 TABLE (id INT IDENTITY(1,1), name VARCHAR(10))
  3.  
  4.     INSERT @TABLE1 VALUES ('DAVE')
  5.     INSERT @TABLE1 VALUES ('MARK')
  6.  
  7.     INSERT @TABLE2 VALUES ('DAVE')
  8.     INSERT @TABLE2 VALUES (NULL)
  9.  
  10.     SELECT * FROM @TABLE1 t1 WHERE NOT EXISTS (SELECT NAME FROM @TABLE2 t2 WHERE t1.NAME = t2.NAME)
  11.  
  12.     SELECT * FROM @TABLE1
  13.     EXCEPT
  14.     SELECT * FROM @TABLE2


this plan is produced showing the higher cost:

Image:Sqlplan.png


Is this a fair assumption to make or are there other factors to take into consideration (such as number of records, indexes etc)?

This is also the same reason that UNION ALL is much faster than UNION

Take this for example

  1. CREATE TABLE #TABLE1  (id INT IDENTITY(1,1), name VARCHAR(10))
  2.     CREATE TABLE #TABLE2  (id INT IDENTITY(1,1), name VARCHAR(10))
  3.  
  4.     INSERT #TABLE1 VALUES ('DAVE')
  5.     INSERT #TABLE1 VALUES ('MARK')
  6.  
  7.     INSERT #TABLE2 VALUES ('DAVE')
  8.     INSERT #TABLE2 VALUES (NULL)


Now run this and check the plan

  1. SELECT * FROM #TABLE1
  2.     UNION
  3.     SELECT * FROM #TABLE2
  4.  
  5.  
  6.     SELECT * FROM #TABLE1
  7.     UNION ALL
  8.     SELECT * FROM #TABLE2


execution plan (in text)

Union

  1. |--Sort(DISTINCT ORDER BY:([Union1004] ASC, [Union1005] ASC))
  2.            |--Concatenation
  3.                 |--Table Scan(OBJECT:([tempdb].[dbo].[#TABLE1000000005630]))
  4.                 |--Table Scan(OBJECT:([tempdb].[dbo].[#TABLE2000000005630]))


Union all

  1. |--Concatenation
  2.            |--Table Scan(OBJECT:([tempdb].[dbo].[#TABLE1000000005630]))
  3.            |--Table Scan(OBJECT:([tempdb].[dbo].[#TABLE2000000005630]))


Contributed by: --SQLDenis 02:45, 31 May 2008 (GMT)

Part of SQL Server Programming Hacks

Section Handy tricks

169 Rating: 3.4/5 (11 votes cast)