Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

How Can I Select Multiple TOP N Data Sets With SQL Server (2000 and earlier)

From Wiki

Jump to: navigation, search

A common question I see in user groups is "How can I select multiple TOP N datasets in a single query?".

I am sure most SQL Server users are aware of the TOP predicate offered by Microsoft. This can be useful, but it is not all that useful if you need the top X records per (table item to group by). Many times, people think it would be nice if you could do something like this:

  1. select top 5 CustID, OrderID
  2.     from Orders
  3.     group by CustID
  4.     order by OrderDate desc


Unfortunately, the way queries are processed, this would not work because TOP and DISTINCT predicates are processed before the GROUP BY. But surely there must be another way to get this, right?

The first thing to come to mind is usually looping and inserting the TOP N dataset for each grouping item into a temp table. But digging a little deeper we'll find we can do this by joining the table to itself. Consider the following sample data:

  1. create table #SpeedingTickets (
  2.        TicketID int identity(1,1)
  3.        , DriverName varchar(15)
  4.        , TicketDate smalldatetime
  5.        , TicketAmount numeric(10,2)
  6.     )
  7.  
  8.     insert #SpeedingTickets
  9.     select 'Alex', '20070101', 197.36
  10.     union all select 'Alex', '20050327', 122.5
  11.     union all select 'Kellie', '20070903', 137.82
  12.     union all select 'Steve', '20060928', 153.2
  13.     union all select 'Kyle', '20070314', 119.8
  14.     union all select 'Alex', '20001225', 3097.96
  15.     union all select 'Steve', '20010813', 300
  16.     union all select 'Alex', '20060915', 97.89
  17.     union all select 'Dave', '20070930', 55.64
  18.     union all select 'Kyle', '20030322', 98.9
  19.     union all select 'Steve', '19990422', 156.3
  20.     union all select 'Kyle', '19980921', 29.75
  21.     union all select 'Steve', '19890324', 136.92

We will use this #SpeedingTickets table for the rest of the exercise. Imagine you work for SpeedingTicketTrackCo, and you are asked to produce a report detailing the 3 most recent tickets for each Driver in this table. We want to do this in a single query if at all possible, because this will be a popular report and we don't want to tie up the server too much with looping and building temp tables. Maybe we could use a union query, each containing the top 3 most recent tickets for 1 driver. This is not really practical either, because it requires we know every single DriverName at the time we are writing the query, so we are back at some kind of dynamic SQL solution that will require lots of looping. So what can we use to tell which records we want?

We know right away the relevant info (at least to the problem at hand) is DriverName and TicketDate. But how can we identify the top 3 from this? The answer is to join the table to itself, with some rather odd conditions. We want to join on DriverName and TicketDate. But TicketDate = TicketDate is not going to do anything for us, and we will end up with the whole table.

We know we want the most recent tickets. Another way to look at this is that we want the three tickets with the GREATEST dates. Now we are getting somewhere. To set ourselves up to at least know which tickets have the GREATEST dates, we can run this query:

  1. select a.TicketID
  2.        , a.DriverName
  3.        , a.TicketDate
  4.        , a.TicketAmount
  5.        , count(b.DriverName)
  6.     from #SpeedingTickets a
  7.     inner join #SpeedingTickets b
  8.     on a.DriverName = b.DriverName
  9.        and a.TicketDate <= b.TicketDate
  10.     group by a.TicketID
  11.        , a.DriverName
  12.        , a.TicketDate
  13.        , a.TicketAmount
  14.     order by count(b.DriverName) asc

You will notice that the most recent tickets will have the lowest count of DriverName on the right side of the join. This is because each DriverName on the left side is matched to only rows on the right side with a date equaling or exceeding that of the row in question. So ordering by count(b.DriverName) asc is equivalent to ordering by a.TicketName desc. Now that we know this, an easy way to filter for the top 3 per driver becomes apparent. Enter the HAVING clause:

  1. select a.TicketID
  2.        , a.DriverName
  3.        , a.TicketDate
  4.        , a.TicketAmount
  5.     from #SpeedingTickets a
  6.     inner join #SpeedingTickets b
  7.     on a.DriverName = b.DriverName
  8.        and a.TicketDate <= b.TicketDate
  9.     group by a.TicketID
  10.        , a.DriverName
  11.        , a.TicketDate
  12.        , a.TicketAmount
  13.     having count(b.DriverName) <= 3
  14.     order by a.DriverName, a.TicketDate desc

We have removed the count from our columns selected, because we don't really care about that. All that we care about is the 3 most recent dates. This is accomplished by adding the having clause. We can easily adjust this to be TOP 1, TOP 5, or top 50 simply by changing our having clause.

We can also use the same approach to get the first 3 tickets a driver received, by changing our join condition:

  1. select a.TicketID
  2.        , a.DriverName
  3.        , a.TicketDate
  4.        , a.TicketAmount
  5.     from #SpeedingTickets a
  6.     inner join #SpeedingTickets b
  7.     on a.DriverName = b.DriverName
  8.        and a.TicketDate >= b.TicketDate
  9.     group by a.TicketID
  10.        , a.DriverName
  11.        , a.TicketDate
  12.        , a.TicketAmount
  13.     having count(b.DriverName) <= 3
  14.     order by a.DriverName, a.TicketDate

And this will also work for other kinds of values, not just dates. Imagine you want the 3 most expensive tickets for each driver:

  1. select a.TicketID
  2.        , a.DriverName
  3.        , a.TicketDate
  4.        , a.TicketAmount
  5.     from #SpeedingTickets a
  6.     inner join #SpeedingTickets b
  7.     on a.DriverName = b.DriverName
  8.        and a.TicketAmount <= b.TicketAmount
  9.     group by a.TicketID
  10.        , a.DriverName
  11.        , a.TicketDate
  12.        , a.TicketAmount
  13.     having count(b.DriverName) <= 3
  14.     order by a.DriverName, a.TicketAmount desc

I hope that this leaves anyone who reads it with a good grasp of how to obtain multiple TOP N datasets. I have had to use this technique many times, and it has always worked well for me. With SQL 2005's new ROW_NUMBER feature we get a new way to do this, I will write a similar post about that shortly.

For a SQL 2005 (and presumably later) technique, see How Can I Select Multiple TOP N Data Sets With SQL Server 2005?


Contributed By: AlexCuse

86 Rating: 2.5/5 (52 votes cast)