5 ways to return rows from one table not in another table
From Wiki
There are at least 5 ways to return data from one table which is not in another table. Two of these are SQL Server 2005 and greater only
First create these two tables
- CREATE TABLE testnulls (ID INT)
- INSERT INTO testnulls VALUES (1)
- INSERT INTO testnulls VALUES (2)
- INSERT INTO testnulls VALUES (null)
- CREATE TABLE testjoin (ID INT)
- INSERT INTO testjoin VALUES (1)
- INSERT INTO testjoin VALUES (3)
Contents |
NOT IN
Run the following Code
- SELECT * FROM testjoin WHERE ID NOT IN(SELECT ID FROM testnulls)
What happened? Nothing gets returned! The reason is because the subquery returns a NULL and you can't compare a NULL to anything
Now run this
- SELECT * FROM testjoin WHERE ID NOT IN(SELECT ID FROM testnulls WHERE ID IS NOT NULL)
That worked because we eliminated the NULL values in the subquery
NOT EXISTS
NOT EXISTS doesn't have the problem that NOT IN has. Run the following code
- SELECT * FROM testjoin j
- WHERE NOT EXISTS (SELECT n.ID
- FROM testnulls n
- WHERE n.ID = j.ID)
Everything worked as expected
LEFT and RIGHT JOIN
Plain vanilla LEFT and RIGHT JOINS
- SELECT j.* FROM testjoin j
- LEFT OUTER JOIN testnulls n ON n.ID = j.ID
- WHERE n.ID IS NULL
- SELECT j.* FROM testnulls n
- RIGHT OUTER JOIN testjoin j ON n.ID = j.ID
- WHERE n.ID IS NULL
OUTER APPLY (SQL 2005 +)
OUTER APPLY is something that got added to SQL 2005
- SELECT j.* FROM testjoin j
- OUTER APPLY
- (SELECT id FROM testnulls n
- WHERE n.ID = j.ID) a
- WHERE a.ID IS NULL
EXCEPT(SQL 2005 +)
EXCEPT is something that got added to SQL 2005. It basically returns everything from the top table which is not in the bottom table
- SELECT * FROM testjoin
- EXCEPT
- SELECT * FROM testnulls
INTERSECT returns what ever is in both tables(like a regular join)
- SELECT * FROM testjoin
- INTERSECT
- SELECT * FROM testnulls
There seems to be a higher cost associated with the EXCEPT and INTERSECT queries vs the IN or EXISTS approach due to a Sort.
In the case of this test:
- DECLARE @TABLE1 TABLE (id INT IDENTITY(1,1), name VARCHAR(10))
- DECLARE @TABLE2 TABLE (id INT IDENTITY(1,1), name VARCHAR(10))
- INSERT @TABLE1 VALUES ('DAVE')
- INSERT @TABLE1 VALUES ('MARK')
- INSERT @TABLE2 VALUES ('DAVE')
- INSERT @TABLE2 VALUES (NULL)
- SELECT * FROM @TABLE1 t1 WHERE NOT EXISTS (SELECT NAME FROM @TABLE2 t2 WHERE t1.NAME = t2.NAME)
- SELECT * FROM @TABLE1
- EXCEPT
- SELECT * FROM @TABLE2
this plan is produced showing the higher cost:
Is this a fair assumption to make or are there other factors to take into consideration (such as number of records, indexes etc)?
This is also the same reason that UNION ALL is much faster than UNION
Take this for example
- CREATE TABLE #TABLE1 (id INT IDENTITY(1,1), name VARCHAR(10))
- CREATE TABLE #TABLE2 (id INT IDENTITY(1,1), name VARCHAR(10))
- INSERT #TABLE1 VALUES ('DAVE')
- INSERT #TABLE1 VALUES ('MARK')
- INSERT #TABLE2 VALUES ('DAVE')
- INSERT #TABLE2 VALUES (NULL)
Now run this and check the plan
- SELECT * FROM #TABLE1
- UNION
- SELECT * FROM #TABLE2
- SELECT * FROM #TABLE1
- UNION ALL
- SELECT * FROM #TABLE2
execution plan (in text)
Union
- |--Sort(DISTINCT ORDER BY:([Union1004] ASC, [Union1005] ASC))
- |--Concatenation
- |--Table Scan(OBJECT:([tempdb].[dbo].[#TABLE1000000005630]))
- |--Table Scan(OBJECT:([tempdb].[dbo].[#TABLE2000000005630]))
Union all
- |--Concatenation
- |--Table Scan(OBJECT:([tempdb].[dbo].[#TABLE1000000005630]))
- |--Table Scan(OBJECT:([tempdb].[dbo].[#TABLE2000000005630]))
Contributed by: --SQLDenis 02:45, 31 May 2008 (GMT)
Part of SQL Server Programming Hacks
Section Handy tricks


