Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.
Basic Query Design Patterns-Selecting Records From Related Tables
From Wiki
Sometimes you need to get records from more than one table. If the tables are related (they have one or more fields in common), then a SELECT statment with a JOIN is the design pattern you use.
In this article, I'm going to discuss the simplest of those JOIN queries, the one when you want to return data from all the tables participating in the JOIN. This is called an INNER JOIN.
You can join any number of tables together to get data, but the more tables you have the longer the query will probably take to run and the more complicated it will be to get the correct result set. So in this article I will limit myself to discussing no more than two JOINs.
The basic format of a query between two related tables is:
- SELECT Column List
- FROM TableName1 Alias1
- INNER JOIN TableName2 Alias2
- ON Alias1.SomeColumn = Alias2.SomeColumn
- WHERE Some Conditions
The basic format of a query between three related tables is:
- SELECT Column List
- FROM TableName1 Alias1
- INNER JOIN TableName2 Alias2
- ON Alias1.SomeColumn = Alias2.SomeColumn
- INNER JOIN TableName3 Alias3
- ON Alias2.SomeColumn = Alias3.SomeColumn
- WHERE Some Conditions
Why do I show aliases for the tables as part of the basic pattern? Because when you start adding mulitple tables to a query, you need to be able to easily identify which table a column came from. Since most table names (especially if you happen to need three or four part names) can get long and involved; aliases help make the query clearer. At the simplest level you don't need them, but it is good to start with good habits. They become necessary for some of the more advanced query techniques and it helps to be in the habit of using them.
What is an alias? Just a shortcut name for the table. Developers often have the same shortcut they use whenever querying the table, it is good to be in that habit. When I list the columns after the SELECT keyword, I provide the alias as well. Strictly speaking, you only need to do this when the tables have columns with the same name. However, I find it is alot easier when you need to maintain a complex query to know right up front where each column came from, so again it's a matter of starting with good habits, so they are second nature when you really need them.
Now in listing columns, again you don't want to use SELECT * to get all the columns. This is even more critical than when you are querying only one table. This is because in order to do an INNER JOIN, you must have at least one column in common between the two tables and it wastes server and network resources to return the same data twice.
Now to the JOIN itself. To do an INNER JOIN, you need to tell the database how the records should relate together. At the simplest and (by far the most common) level, you generally have two columns that are equal to each other. If the personID in table1 is 10, it will match to the records in table2 which have a person_id of 10. The ON clause shows what columns the tables are to join on.
Critical to understanding the INNER JOIN is to understand that if there is a one-to-many relationship between the tables and there is more than one record in in the second table related to the first table, then the data from the first table will be repeated for each record.
A sample query for two related tables is:
- SELECT Per.First_Name, Per.Last_Name, Addr.Street, Addr.City, Addr.State
- FROM Mydatabase.dbo.Person Per
- INNER JOIN MyDatabase.dbo.Addres Addr
- ON Per.Person_ID = Addr.Person_ID
- WHERE Addr.State in ('IL', 'TN', 'IN')
So suppose you had a Person table with the following people:
John Smith Sally Jones Willian Porter
In the Address table John Smith has two addresses, Sally Jones has one address and Willam Porter has no addresses. The results set returned from the INNER JOIN will show something like the following:
John Smith 10 King St. Chicago IL John Smith 125 State St. Chicago IL Sally Jones 312 Main St. Memphis TN
The inner join would exclude William Porter because he has no record in the address table. It would show two records for John and one for Sally because that is the number of records they have in the address table. Understanding what records should be returned gets complex as you add tables.
When you have more than two tables that you need to get data from in the query, you can continue to add them as JOINs to the query. For the third table, you will need to relate it to one or both of the tables already listed in the query. So if I wanted to add the records from the Phone table to the query, I would probably JOIN it to the Person table. But suppose I had a Directions table that held the directions to the Addresses, then I would JOIN to the Address table.
Using the Phone table, let's look at what happens to the query results:
Sally Jones has two phone numbers, John Smith has one phone number and William Porter has three phone numbers.
The query might look like:
- SELECT Per.First_Name, Per.Last_Name, Addr.Street, Addr.City, Addr.State, PH.Phone_Number
- FROM Mydatabase.dbo.Person Per
- INNER JOIN MyDatabase.dbo.Addres Addr
- ON Per.Person_ID = Addr.Person_ID
- INNER JOIN MyDatabase.dbo.Phone PH
- ON Per.Person_ID = PH.Person_ID
- WHERE Addr.State in ('IL', 'TN', 'IN')
The result set would be something like:
John Smith 10 King St. Chicago IL 613-217-9999 John Smith 125 State St. Chicago IL 613-217-9999 Sally Jones 312 Main St. Memphis TN 412-333-6989 Sally Jones 312 Main St. Memphis TN 412-528-7452
Now you see that even though Sally only has one address, she has two records now in the result set because she has two phone numbers. John has only one phone number but he has two addresses, so just like the name is repeated in each record, so is the phone number. William Porter still doesn't show even though he has no phone because he isn't in the address table.
The example WHERE clause came from the first table in the join, but it could have come from any of them or a combination of them. You also don't always need a WHERE clause.
If you need the records returned in a specific order, you should also add an ORDER BY clause. This tells the query what order to display the records. By default, the ORDER BY sorts in ascending order (alphabetical order for string data and ordinary numeric order for numeric data). If a column appears to contain numbers but is actually string data like phone numbers, it will sort in aphabetical order. This means the order could be something like:
1 10 2 3 31 4
You can also sort on more than one column.
An example sorted INNER JOIN query would be:
- SELECT Per.First_Name, Per.Last_Name, Addr.Street, Addr.City, Addr.State, PH.Phone_Number
- FROM Mydatabase.dbo.Person Per
- INNER JOIN MyDatabase.dbo.Addres Addr
- ON Per.Person_ID = Addr.Person_ID
- INNER JOIN MyDatabase.dbo.Phone PH
- ON Per.Person_ID = PH.Person_ID
- WHERE Addr.State in ('IL', 'TN', 'IN')
- ORDER BY Last_name, First_name
Remember, you cannot rely on the query to return records in a particular order unless you have an ORDER BY clause.
Something that can cause problems in joining tables is if the join fields are not the same data type. It is critical to make sure the columns that will be joined together have the same data type.
Some people also like to use implict joins instead of the explicit ones shown here. Do not fall into this bad habit. Implicit joins are subject to accidental cross joins (matching every record in table1 to every record in table2) and returning the wrong data if you need to use left joins (the implicit LEFT JOIN syntax is broken as far back as SQL Server 2000 and is deprecated). They are also harder to maintain. You can recognize an implicit join because more than one table is referenced, but the keyword JOIN is not in the query. Instead they put the relationship in the WHERE clause. The implicit join was replaced by the explicit joins shown here in 1992. There is no need to ever write a query using this syntax and any that you see when doing maintenance should be replaced.



LTD Social Sitings
Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.