crop multiracial people joining hands together during break in modern workplace

Understanding INNER, OUTER, LEFT, and RIGHT Joins in SQL

One of the first and most common things newcomers to SQL struggle with is how each JOIN is different from the other. In this article, I’ll explain the differences between the inner vs outer JOIN and left vs right JOIN in SQL using examples from each.

Introduction to Joining Tables

In SQL, we use JOIN in our queries to create result sets made up of multiple tables. For example, it’s common to have countries and states tables when capturing address information, so an address record may look something like this:

street_addresscitystate_or_provincecountry_id
123 Main StHoustonTexas1
558 Maple AveTorontoOntario2
addresses table

Here, the numbers in the country_id column of the addresses table refer to the IDs of their respective country in the countries table, which may look like this:

idcountry_name
1United States
2Canada
countries table

Now, say we want to write a query that displays a full address, something like 123 Main St, Houston, TX United States. Since the street, city, and state names are recorded in the addresses table, and the country name is recorded in the countries table, we have to join the two tables to get the result set that we want. The relevant SQL would probably look something like this:

SELECT
    a.street_address,
    a.city,
    a.state_or_province,
    c.country_name
FROM
    addresses a
JOIN
    countries c
ON
    a.country_id = c.id

The SQL code above shows us something like the following results:

street_addresscitystate_or_provincecountry_name
123 Main StHoustonTexasUnited States
558 Maple AveTorontoOntarioCanada
Result of query

Notice here that we have values from the street_address, city, and state_or_province columns of the addresses table, and values from the country_name column of the countries table. We were able to join these two tables together by matching the country_id value in addresses with the id column of countries.

There are a few kinds of joins in SQL, each one behaving slightly differently. This article aims to help you understand the difference between the different kinds of joins in SQL.

Download and Build the Sample Database

If you’d like to follow along with this article, go ahead and clone the code to build the example database . To build the database in pgAdmin, first run the command in create_database.sql then make sure to change connections to the newly created employee_database database and run the commands in build_and_seed_tables.sql.

You should now have a database with four tables: addresses, countries, departments, and employees. Let’s quickly discuss the data model before moving on to the actual point of this article.

The addresses and countries tables are just like the ones in the examples above; each address record references a country record. Then, we have an employees table, each record of which references an address record as well as a department record. The department records are pretty simple; the departments table simply has an id and name column and refers to what department an employee works.

Ok, now on to actual joins!

Right vs. Left Joins

Let’s start by talking about the difference left and right joins. It’s helpful to think about all the tables in a SQL query as a horizontal list of tables. The leftmost table is the table in the FROM clause, the next table to the right of the leftmost table is whatever table we are joining. So for example, if we write the query

SELECT * FROM employees e JOIN addresses a ON e.address_id = a.id

Then the employees table is the leftmost table while the addresses table is the next one to the right, kind of like this:

two rounded squares in a horizontal row, the left square is labeled "employees" and the right square is labeled "addresses"
Order of tables

If we take it a step further and join the countries table to the addresses, the next table to the right would be countries. In other words, the following SQL query:

SELECT *
FROM employees e
JOIN addresses a ON e.address_id = a.id
JOIN countries c ON a.country_id = c.id

can be visualized as the following:

three rounded squares in a horizontal row, the left square is labeled "employees", the middle square is labeled "addresses", and the right square is labeled "countries"

Notice again that whenever a table is joined to another, the already-existing table (for lack of a better word) is the left table and the joining table is the right table. So in our example above, our base table is employees so it’s on the left; then we joined addresses to it, so it’s to the right of employees. Finally, we then joined countries to addresses, putting addresses to the right of countries and thus the rightmost table.

This visualization is helpful in understanding how left and right joins work. For example, query all of the employees records and LEFT JOIN the addresses table and see what happens. The query:

SELECT e.first_name, a.street_address, a.city
FROM employees e
LEFT JOIN addresses a ON e.address_id = a.id

The results:

"Bob" "123 Main St" "Houston"
"Jordan" "220 C 30" "Izamal"
"Nicole" "3030 Burgos" "Xalapa"
"Erik" "1010 2nd St" "Omaha"
"Lina" "1010 2nd St" "Omaha"
"Sandra", NULL, NULL

Notice here that all records for employees, the leftmost table, appear as results in this query, including a record with no associated addresses record. The records in the rightmost table, addresses, only appear if they can be joined to the leftmost table. There are records in addresses that cannot be joined to employees because no employees record has a corresponding address_id.

Conversely, if we use RIGHT JOIN to join the addresses table, we will see results that include all of the addresses table–the rightmost table–despite not being able to be joined to the employees table. For example, the query:

SELECT e.first_name, a.street_address, a.city
FROM employees e
RIGHT JOIN addresses a ON e.address_id = a.id

The results:

"Bob" "123 Main St" "Houston"
"Jordan" "220 C 30" "Izamal"
"Nicole" "3030 Burgos" "Xalapa"
"Erik" "1010 2nd St" "Omaha"
"Lina" "1010 2nd St" "Omaha"
null "558 Maple Ave" "Toronto"
null "99 Jackson Rd" "Flin Flon"
null "821 Carol" "Nuuk"

Notice this time that we have three addresses records in the result set that have null values in their respective addresses column. Also notice that the employee record with no associated address record is not in this result set. This is because by doing a RIGHT JOIN, we essentially tell the database to give us all results of the rightmost table, and join only the leftmost table’s records if they are associated with the rightmost.

Conversely, we can switch which table is right and left by referring to the addresses table in the FROM clause and then joining the employees table. For example, let’s SELECT from addresses and LEFT JOIN the employees table. The query:

SELECT e.first_name, a.street_address, a.city
FROM addresses a
LEFT JOIN employees e ON e.address_id = a.id

The result:

"Bob" "123 Main St" "Houston"
"Jordan" "220 C 30" "Izamal"
"Nicole" "3030 Burgos" "Xalapa"
"Erik" "1010 2nd St" "Omaha"
"Lina" "1010 2nd St" "Omaha"
null "558 Maple Ave" "Toronto"
null "99 Jackson Rd" "Flin Flon"
null "821 Carol" "Nuuk"

The results when we LEFT JOIN the employees table to the addresses table are the same as when we RIGHT JOIN the addresses table to the employees table. That’s because in this case, the leftmost table is addresses so by left joining the employees table to it, we are telling the database to give us all the addresses records and then join the employees table records if they exist.

Now, let’s take it a step further and see what happens when we chain JOIN commands together. First, let’s SELECT FROM the addresses table and RIGHT JOIN the countries table so we can see how many of the countries records are not associated to addresses. The query:

SELECT
    a.street_address,
    a.city,
    a.state_or_province,
    c.country_name
FROM addresses a
RIGHT JOIN countries c
ON a.country_id = c.id

The results:

"1010 2nd St"	"Omaha"	"NE"	"United States"
"123 Main St"	"Houston"	"TX"	"United States"
"3030 Burgos"	"Xalapa"	"Ver"	"Mexico"
"220 C 30"	"Izamal"	"Yuc"	"Mexico"
"99 Jackson Rd"	"Flin Flon"	"MB"	"Canada"
"558 Maple Ave"	"Toronto"	"ON"	"Canada"
"821 Carol"	"Nuuk"	"SQ"	"Greenland"
NULL	NULL	NULL	"Iceland"

As we can see from the result set of our previous query, the only country in our database that isn’t related to an address record is Iceland.

Knowing this, let’s see what happens when we SELECT FROM the employees table, LEFT JOIN the addresses table to it, and then RIGHT JOIN the countries table to that. Do you think we’ll see all the countries? Let’s find out! The query:

SELECT
    e.first_name,
    e.last_name,
    a.street_address,
    a.city,
    a.state_or_province,
    c.country_name
FROM employees e
LEFT JOIN addresses a
ON e.address_id = a.id
RIGHT JOIN countries c
ON a.country_id = c.id

The results:

"Lina"	"Mazin"	"1010 2nd St"	"Omaha"	"NE"	"United States"
"Erik"	"Whiting"	"1010 2nd St"	"Omaha"	"NE"	"United States"
"Bob"	"Robertson"	"123 Main St"	"Houston"	"TX"	"United States"
"Nicole"	"Nicholson"	"3030 Burgos"	"Xalapa"	"Ver"	"Mexico"
"Jordan"	"Mays"	"220 C 30"	"Izamal"	"Yuc"	"Mexico"
NULL	NULL	NULL	NULL	NULL	"Canada"
NULL	NULL	NULL	NULL	NULL	"Greenland"
NULL	NULL	NULL	NULL	NULL	"Iceland"

These are interesting results that show us something about the way RIGHT and LEFT joins behave. Notice that we only get 5 rows of results that include records from the addresses table even though there are 7 records in that table. This is because there are only 5 addresses that are associated to employees records, and when we LEFT JOINed the addresses table, we told the database that we only want records from the addresses table if they’re associated with an employee record. However, when we RIGHT JOIN the countries table, we tell the database we want all records from the countries table, even if there’s no associated addresses record in the result set.

Now that we’ve talked all about the difference between RIGHT and LEFT, let’s talk about INNER and OUTER joins.

Inner vs. Outer Joins

The difference between INNER and OUTER joins are very similar to the differences between LEFT and RIGHT joins. However, in this case, it’s more helpful to think of tables as Venn diagrams. You know, the ones where two circles meet in the middle and the overlapping section is some shared attribute between the circles?

We still want to think of tables being joined from left to right, but with INNER joins, we have to think about how records might overlap.

NOTE: The default JOIN operation in Postgres (and every other RDBMS I've ever worked with) is the INNER JOIN. That means if you write something like SELECT * FROM A JOIN B ON A.b_id = B.id, it is implied that you mean to do an inner join and the SQL SELECT * FROM A INNER JOIN B ON A.b_id = B.id is exactly the same thing.

Let’s see how INNER JOIN works when we join addresses to employees. The query:

SELECT e.first_name, a.street_address
FROM employees e
INNER JOIN addresses a
ON e.address_id = a.id

The results:

"Bob"	"123 Main St"
"Jordan"	"220 C 30"
"Nicole"	"3030 Burgos"
"Erik"	"1010 2nd St"
"Lina"	"1010 2nd St"

Notice this time that we have five results, despite there being six employees records in the database. This is unlike the LEFT JOIN from the previous section where the employee record with no associated address record was still in the result set. In this way, INNER JOIN means we only want results from the leftmost table if the rightmost table can be joined to them. That’s why you often see Venn diagrams used to explain joins. In this case, INNER JOIN is this Venn diagram:

a Venn diagram with "employees" on the left and "addresses" on the right, and an arrow pointing to where they overlap saying "inner join results".

So, the main difference between LEFT and INNER joining that we’ve seen so far is that if you want all records of the leftmost table–whether they have associated records in the rightmost table or not–you want to use a LEFT join. If you only want records from the leftmost table if a record from the right most table can be joined to it, you want to use INNER JOIN.

So what about OUTER? The OUTER JOIN is a special case because even though it seems antimonious with INNER JOIN, OUTER joins need to to be specified with either RIGHT, LEFT, or FULL. For example, the following SQL will result in a syntax error:

-- Doesn't work!
SELECT e.first_name
FROM employees e
OUTER JOIN addresses a
ON e.address_id = a.id

We have to specify if we want the leftmost or right most table to be outer-joined. Let’s see how LEFT OUTER JOIN behaves. The SQL:

SELECT
    e.first_name,
    a.street_address
FROM employees e
LEFT OUTER JOIN addresses a
ON e.address_id = a.id

The results:

"Bob"	"123 Main St"
"Jordan"	"220 C 30"
"Nicole"	"3030 Burgos"
"Erik"	"1010 2nd St"
"Lina"	"1010 2nd St"
"Sandra"	NULL

Here we specified that we want the leftmost table to be outer-joined and therefore got all results from employees regardless of whether a record from the addresses table can be joined to it or not.

The RIGHT OUTER JOIN on the other hand will include all records from the rightmost table regardless of whether a record from the leftmost table can be joined to it. The SQL:

SELECT
    e.first_name,
    a.street_address
FROM employees e
RIGHT OUTER JOIN addresses a
ON e.address_id = a.id

The results:

"Bob"	"123 Main St"
"Jordan"	"220 C 30"
"Nicole"	"3030 Burgos"
"Erik"	"1010 2nd St"
"Lina"	"1010 2nd St"
NULL	"558 Maple Ave"
NULL	"99 Jackson Rd"
NULL	"821 Carol"

See here that all the addresses records were included in the result set, even if there was no employees record that could be joined to it. Notice also that the value from employees that has no associated addresses record is not included in the result set.

Did you notice that the results for LEFT OUTER JOIN and RIGHT OUTER JOIN are the exact same results we get from LEFT JOIN and RIGHT JOIN respectively? If so, good on you! LEFT JOIN and LEFT OUTER JOIN are exactly the same thing; same with RIGHT JOIN and RIGHT OUTER JOIN. Whenever you use RIGHT or LEFT in your JOIN, the OUTER is implied and you actually don’t have to write it (though some people do because they say it adds clarity).

There’s one more kind of OUTER join: FULL. The FULL OUTER JOIN will include all results of both the left and rightmost tables, regardless of whether they can be joined to each other. Check it out, the SQL:

SELECT
    e.first_name,
    a.street_address
FROM employees e
FULL OUTER JOIN addresses a
ON e.address_id = a.id

The results:

"Bob"	"123 Main St"
"Jordan"	"220 C 30"
"Nicole"	"3030 Burgos"
"Erik"	"1010 2nd St"
"Lina"	"1010 2nd St"
"Sandra"	NULL
NULL	"558 Maple Ave"
NULL	"99 Jackson Rd"
NULL	"821 Carol"

See how this time we have all records from each table. The leftmost table, employees, includes even the record with no associated addresses record. Likewise, the result set includes three addresses records with no associated employees record. That’s because the FULL join is both a LEFT and RIGHT join put together.

Conclusion

Nearly everyone struggles with the different kinds of joins when they’re first learning SQL, so let’s review what we learned:

  • LEFT, RIGHT, and FULL joins are the same as LEFT OUTER, RIGHT OUTER, and FULL OUTER joins
    • LEFT will include all records from the leftmost table, even if there are no records in the rightmost table that can join to them
    • RIGHT will include all records from the rightmost table, even if there are no records in the leftmost table that can join to them
    • FULL includes records from both the rightmost and leftmost tables, even if the records from one table have no joining records in the other
  • INNER JOIN is the default join
    • INNER will only include records where both the leftmost and rightmost tables have associated records
    • Use this when you want only full results; records from one table with no associated records in the other table will not show up in the result set

Now that you know how to use joins, try out some of them in the example database using the departments column. As always, feel free to tweet me at @erikwhiting4 or send me an email at erik@erikwhiting.com if you have any questions. Good luck!


Posted

in

by