How Can I Select Multiple TOP N Data Sets With SQL Server 2005
From Wiki
A common question I see in user groups is "How can I select multiple TOP N datasets in a single query?".
I am sure most SQL Server users are aware of the TOP predicate offered by Microsoft. This can be useful, but it is not all that useful if you need the top X records per (column value to group by).
for more background on this problem, and a SQL 2000 solution, please see this link: How Can I Select Multiple TOP N Data Sets With SQL Server (2000 and earlier)
Anyway here is the sample data:
- CREATE TABLE #SpeedingTickets (
- TicketID INT IDENTITY(1,1)
- , DriverName VARCHAR(15)
- , TicketDate SMALLDATETIME
- , TicketAmount NUMERIC(10,2)
- )
- INSERT #SpeedingTickets
- SELECT 'Alex', '20070101', 197.36
- UNION all SELECT 'Alex', '20050327', 122.5
- UNION all SELECT 'Kellie', '20070903', 137.82
- UNION all SELECT 'Steve', '20060928', 153.2
- UNION all SELECT 'Kyle', '20070314', 119.8
- UNION all SELECT 'Alex', '20001225', 3097.96
- UNION all SELECT 'Steve', '20010813', 300
- UNION all SELECT 'Alex', '20060915', 97.89
- UNION all SELECT 'Dave', '20070930', 55.64
- UNION all SELECT 'Kyle', '20030322', 98.9
- UNION all SELECT 'Steve', '19990422', 156.3
- UNION all SELECT 'Kyle', '19980921', 29.75
- UNION all SELECT 'Steve', '19890324', 136.92
As before, imagine you are working for SpeedingTicketTrackCo, and your boss needs a report containing the 3 most recent tickets for each DriverName.
However, SpeedingTicketTrackCo just upgraded to SQL 2005, and your life is now much easier thanks to the ROW_NUMBER() function. This function lets you add two clauses, ORDER BY and PARTITION BY. ORDER BY is required, as every good SQL developer knows it should be (because without an order by you cannot guarantee consistent ranking of your rows). PARTITION BY is optional, but it is this one that lets us accomplish our goal. PARTITION BY is similar to a group by, but it is used to tell ROW_NUMBER when to start counting back at one. So our query with ROW_NUMBER() to establish rank will look like this:
- SELECT TicketID
- , DriverName
- , TicketDate
- , TicketAmount
- , ROW_NUMBER() OVER (partition BY DriverName ORDER BY TicketDate ASC) AS ROWNUM
- FROM #SpeedingTickets
We want to eliminate ROWNUM from our select, but we can't do this because, as SQL will so politely tell you:
Msg 4108, Level 15, State 1, Line 24 Windowed functions can only appear in the SELECT or ORDER BY clauses.
This is no problem though, we can simply use the query above as the table we are selecting from, and limit what is returned to rows where ROWNUM <= 3, like this:
- SELECT a.TicketID
- , a.DriverName
- , a.TicketDate
- , a.TicketAmount
- FROM (
- SELECT TicketID
- , DriverName
- , TicketDate
- , TicketAmount
- , ROW_NUMBER() OVER (partition BY DriverName ORDER BY TicketDate ASC) AS ROWNUM
- FROM #SpeedingTickets
- ) a
- WHERE a.ROWNUM <= 3
- ORDER BY a.DriverName, a.TicketDate
This gives a pretty big advantage, in that you don't have to use the self join. These joins can get pretty expensive when you are working with large tables. I will come back and add execution plan screenshots for each at a later date, but even with this tiny dataset I see about a 25% difference in Query Cost using the ROW_NUMBER() method.
I hope that this helps to learn about the ROW_NUMBER() function. If you are anything like me, you will start thinking of lots and lots of uses for this in your day to day coding (even if you don't work for SpeedingTicketTrackCo )
Contributed by: AlexCuse


