Skip to main content

SQL Joins

What You'll Learn

By the end of this page, you will be able to:

  • Explain why joins are necessary in relational databases
  • Write INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN queries
  • Understand the difference between each join type using Venn diagrams and result tables
  • Use CROSS JOIN and SELF JOIN for specialized scenarios
  • Avoid common performance pitfalls when writing join queries
  • Choose the right join type for a given business question

The Problem: Data Lives in Multiple Tables

Relational databases split data across tables to avoid duplication and maintain integrity (normalization). But to answer real questions, you often need to combine data from multiple tables.

employees table:          departments table:
+----+--------+--------+ +--------+------------------+
| id | name | dept_id| | dept_id| dept_name |
+----+--------+--------+ +--------+------------------+
| 1 | Alice | 10 | | 10 | Engineering |
| 2 | Bob | 20 | | 20 | Marketing |
| 3 | Carol | 30 | | 30 | Sales |
| 4 | Dave | NULL | | 40 | HR |
+----+--------+--------+ +--------+------------------+

Questions you can't answer from a single table:

  • "Which department does Alice work in?" — needs both tables
  • "Which departments have no employees?" — needs both tables
  • "Who has no department?" — needs both tables

JOINs are the SQL mechanism that combines rows from two tables based on a related column between them.

Sample Tables

All examples use these two tables:

CREATE TABLE departments (
dept_id INT PRIMARY KEY,
dept_name VARCHAR(50)
);

INSERT INTO departments VALUES
(10, 'Engineering'),
(20, 'Marketing'),
(30, 'Sales'),
(40, 'HR');

CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(50),
dept_id INT REFERENCES departments(dept_id)
);

INSERT INTO employees VALUES
(1, 'Alice', 10),
(2, 'Bob', 20),
(3, 'Carol', 30),
(4, 'Dave', NULL),
(5, 'Eve', 10);

Key observations:

  • Department 40 (HR) has no employees
  • Employee 4 (Dave) has no department (dept_id is NULL)
  • This asymmetry makes the differences between join types visible

Join Types Overview

INNER JOIN

Returns only the rows where there is a match in both tables. Non-matching rows from either table are excluded.

SELECT e.name, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;

Result:

namedept_name
AliceEngineering
BobMarketing
CarolSales
EveEngineering
  • 4 rows returned — only employees with a valid department
  • Dave is excluded (no department)
  • HR is excluded (no employees)
tip

INNER JOIN is the default — writing just JOIN is equivalent to INNER JOIN.

LEFT JOIN (Left Outer Join)

Returns all rows from the left table and the matching rows from the right table. If there's no match, the right side columns are filled with NULL.

SELECT e.name, d.dept_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id;

Result:

namedept_name
AliceEngineering
BobMarketing
CarolSales
DaveNULL
EveEngineering
  • 5 rows returned — all employees are included
  • Dave's dept_name is NULL because he has no department
  • HR is still excluded (no employees)

Finding rows with no match

A common pattern is using LEFT JOIN ... WHERE right.id IS NULL to find rows that exist in the left table but have no match in the right table:

-- Find employees with no department
SELECT e.name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id
WHERE d.dept_id IS NULL;

Result:

name
Dave

RIGHT JOIN (Right Outer Join)

Returns all rows from the right table and the matching rows from the left table. If there's no match, the left side columns are filled with NULL.

SELECT e.name, d.dept_name
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.dept_id;

Result:

namedept_name
AliceEngineering
BobMarketing
CarolSales
NULLHR
EveEngineering
  • 5 rows returned — all departments are included
  • HR has NULL for name because it has no employees
  • Dave is now excluded (no department)
tip

RIGHT JOIN is functionally equivalent to LEFT JOIN with the table order swapped. In practice, most developers prefer LEFT JOIN for readability — it keeps the "primary" table first.

FULL OUTER JOIN

Returns all rows from both tables. Matching rows are combined; non-matching rows from either side get NULL for the other table's columns.

SELECT e.name, d.dept_name
FROM employees e
FULL OUTER JOIN departments d ON e.dept_id = d.dept_id;

Result:

namedept_name
AliceEngineering
BobMarketing
CarolSales
DaveNULL
EveEngineering
NULLHR
  • 6 rows returned — every employee and every department
  • Dave has NULL for department (no department)
  • HR has NULL for name (no employees)

Finding unmatched rows from both sides

-- Find employees with no department AND departments with no employees
SELECT e.name, d.dept_name
FROM employees e
FULL OUTER JOIN departments d ON e.dept_id = d.dept_id
WHERE e.dept_id IS NULL OR d.dept_id IS NULL;

Result:

namedept_name
DaveNULL
NULLHR

CROSS JOIN

Returns the Cartesian product — every row from the left table combined with every row from the right table. No ON condition is needed.

SELECT e.name, d.dept_name
FROM employees e
CROSS JOIN departments d;

Result: 5 employees x 4 departments = 20 rows

namedept_name
AliceEngineering
AliceMarketing
AliceSales
AliceHR
BobEngineering
......
warning

CROSS JOIN can produce very large result sets. A cross join of two tables with 10,000 rows each produces 100,000,000 rows. Use it intentionally, not accidentally.

When to use CROSS JOIN

  • Generating all combinations (e.g., all sizes x all colors)
  • Creating a matrix or grid of values
  • Building reporting periods by crossing date ranges with entities

SELF JOIN

A table joined to itself. Useful when rows in the same table have hierarchical or peer relationships.

Example: Employee and Manager

ALTER TABLE employees ADD COLUMN manager_id INT REFERENCES employees(id);

-- Alice and Bob report to Eve (manager_id = 5)
-- Carol reports to Alice (manager_id = 1)
-- Dave and Eve have no manager
UPDATE employees SET manager_id = 5 WHERE id IN (1, 2);
UPDATE employees SET manager_id = 1 WHERE id = 3;
SELECT
e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

Result:

employeemanager
AliceEve
BobEve
CarolAlice
DaveNULL
EveNULL

Common self-join scenarios

ScenarioTable references itself to find...
Org chartEmployee → Manager relationship
Bill of materialsAssembly → Component relationship
Category treesCategory → Parent category
Friend/ follower graphsUser → Related user

Join Type Comparison

INNERLEFTRIGHTFULL OUTERCROSS
ReturnsMatching rows onlyAll left + matching rightAll right + matching leftAll rows from bothEvery combination
Unmatched left rowsExcludedIncluded (NULL right)ExcludedIncluded (NULL right)N/A
Unmatched right rowsExcludedExcludedIncluded (NULL left)Included (NULL left)N/A
Result sizeSmallestDependsDependsLargest (no CROSS)Left x Right
Common useDefault joinMost business queriesRare (swap to LEFT)Data auditingCombinations

Multiple Joins

Real queries often join more than two tables. Joins chain left-to-right:

SELECT
e.name AS employee,
d.dept_name,
p.project_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
LEFT JOIN employee_projects ep ON e.id = ep.employee_id
LEFT JOIN projects p ON ep.project_id = p.id;
tip

Mix join types deliberately. Use INNER JOIN for required relationships and LEFT JOIN for optional ones.

Join Performance

How the database executes joins

Databases use different join algorithms depending on table size, indexes, and query complexity:

AlgorithmBest ForTime ComplexityMemory
Nested LoopIndexed joins, small outer tableO(M x log N) with indexLow
Hash JoinLarge unsorted tables, equijoinsO(M + N)O(smaller table)
Merge JoinPre-sorted data, large tablesO(M + N)Low

Performance tips

  1. Index the join columns — the most impactful optimization. Foreign key columns should be indexed.
-- This makes joins on dept_id fast
CREATE INDEX idx_employees_dept_id ON employees(dept_id);
  1. Filter early — apply WHERE conditions before joining to reduce row counts:
-- Good: filters before joining
SELECT e.name, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
WHERE d.dept_name = 'Engineering';

-- Bad: joins everything, then filters
-- (optimizer may fix this, but be explicit)
  1. Select only needed columns — avoid SELECT * in join queries:
-- Good
SELECT e.name, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;

-- Avoid
SELECT *
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;
  1. Be cautious with multiple joins — each join can multiply the intermediate result set. Check row counts with EXPLAIN.

  2. Use explicit join conditions — avoid comma-style joins (cross joins with a WHERE filter) as they are harder to read and more error-prone.

-- Good: explicit JOIN syntax
SELECT e.name, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;

-- Avoid: implicit join (old syntax)
SELECT e.name, d.dept_name
FROM employees e, departments d
WHERE e.dept_id = d.dept_id;

Common Pitfalls

  • Accidental cross join — forgetting the ON clause produces a Cartesian product. Always include a join condition.
  • NULL in join columnsNULL = NULL evaluates to UNKNOWN (not TRUE), so rows with NULL join keys are excluded from INNER JOINs silently.
  • Duplicate rows from multiple matches — if the join column has duplicate values, you get more rows than expected. Check for duplicates with COUNT(*).
  • Wrong join type — using INNER JOIN when you need LEFT JOIN silently drops rows with no match. Think about whether unmatched rows matter.
  • Joining on the wrong column — always verify the join condition references the correct foreign key relationship.
  • Filtering in WHERE vs ON — for LEFT JOIN, putting right-table conditions in WHERE turns it into an effective INNER JOIN:
-- This EXCLUDES departments with no employees (WHERE filters after join)
SELECT d.dept_name, e.name
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id
WHERE e.name LIKE 'A%';

-- This KEEPS all departments (ON filters before join)
SELECT d.dept_name, e.name
FROM departments d
LEFT JOIN employees e ON d.dept_id = d.dept_id AND e.name LIKE 'A%';

Best Practices

  1. Use explicit JOIN syntax — not comma-separated tables with WHERE filters
  2. Always qualify column names — use table aliases (e.name, d.dept_name) to avoid ambiguity
  3. Index join columns — foreign keys and join predicates should be indexed
  4. Choose the correct join type — INNER for required matches, LEFT for optional data
  5. Prefer LEFT over RIGHT — swap table order and use LEFT for readability
  6. Use table aliases — keep queries concise, but use meaningful aliases (e for employees, d for departments)
  7. Check with EXPLAIN — verify the query plan uses the right join algorithm and indexes

Interview Questions

1. What is the difference between INNER JOIN and LEFT JOIN?

INNER JOINLEFT JOIN
ReturnsOnly matching rows from both tablesAll rows from left table + matching right rows
Unmatched left rowsExcludedIncluded (right columns are NULL)
Use case"Show me employees and their departments""Show me all employees, even those without departments"
Result sizeLess than or equal to left table rowsEqual to left table rows

2. What is the difference between LEFT JOIN and RIGHT JOIN?

They are mirror images of each other:

  • A LEFT JOIN B = all rows from A + matching rows from B
  • A RIGHT JOIN B = all rows from B + matching rows from A

Any RIGHT JOIN can be rewritten as a LEFT JOIN by swapping the table order:

-- These are equivalent:
SELECT * FROM A RIGHT JOIN B ON A.id = B.a_id;
SELECT * FROM B LEFT JOIN A ON B.a_id = A.id;

In practice, LEFT JOIN is preferred for readability — keep the "driving" table first.

3. What is a FULL OUTER JOIN and when would you use it?

FULL OUTER JOIN returns all rows from both tables, with NULLs where there's no match. Use it when you need:

  • Data reconciliation — comparing two datasets to find mismatches
  • Finding orphaned records — rows in either table with no match in the other
  • Merging datasets — combining two sources while preserving all records
-- Find all unmatched rows in either table
SELECT e.name, d.dept_name
FROM employees e
FULL OUTER JOIN departments d ON e.dept_id = d.dept_id
WHERE e.id IS NULL OR d.dept_id IS NULL;

4. What is the difference between ON and WHERE in a LEFT JOIN?

This is a common trap:

-- WHERE clause: filters AFTER the join → turns LEFT into effective INNER
SELECT d.dept_name, e.name
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id
WHERE e.name IS NOT NULL;

-- ON clause: filters DURING the join → preserves LEFT behavior
SELECT d.dept_name, e.name
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id AND e.name IS NOT NULL;
  • ON — determines which rows to join. Unmatched left rows still appear with NULLs.
  • WHERE — filters the final result. WHERE e.name IS NOT NULL removes rows where e.name is NULL, effectively turning the LEFT JOIN into an INNER JOIN.

5. How do you find rows in one table that have no match in another?

Use LEFT JOIN ... WHERE right.key IS NULL:

-- Employees with no department
SELECT e.name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id
WHERE d.dept_id IS NULL;

-- Departments with no employees
SELECT d.dept_name
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id
WHERE e.id IS NULL;

Alternatively, use NOT EXISTS (often preferred for readability):

SELECT e.name
FROM employees e
WHERE NOT EXISTS (
SELECT 1 FROM departments d WHERE d.dept_id = e.dept_id
);

6. What is a self join and when would you use it?

A self join is a regular join where both sides reference the same table. It requires table aliases to distinguish the two "copies":

-- Find employees and their managers
SELECT
e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

Common use cases: organizational hierarchies, category trees, adjacency lists, and comparing rows within the same table.

Learn More