SQL Joins
What You'll Learn
By the end of this page, you will be able to:
- Explain why joins are necessary in relational databases
- Write INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN queries
- Understand the difference between each join type using Venn diagrams and result tables
- Use CROSS JOIN and SELF JOIN for specialized scenarios
- Avoid common performance pitfalls when writing join queries
- Choose the right join type for a given business question
The Problem: Data Lives in Multiple Tables
Relational databases split data across tables to avoid duplication and maintain integrity (normalization). But to answer real questions, you often need to combine data from multiple tables.
employees table: departments table:
+----+--------+--------+ +--------+------------------+
| id | name | dept_id| | dept_id| dept_name |
+----+--------+--------+ +--------+------------------+
| 1 | Alice | 10 | | 10 | Engineering |
| 2 | Bob | 20 | | 20 | Marketing |
| 3 | Carol | 30 | | 30 | Sales |
| 4 | Dave | NULL | | 40 | HR |
+----+--------+--------+ +--------+------------------+
Questions you can't answer from a single table:
- "Which department does Alice work in?" — needs both tables
- "Which departments have no employees?" — needs both tables
- "Who has no department?" — needs both tables
JOINs are the SQL mechanism that combines rows from two tables based on a related column between them.
Sample Tables
All examples use these two tables:
CREATE TABLE departments (
dept_id INT PRIMARY KEY,
dept_name VARCHAR(50)
);
INSERT INTO departments VALUES
(10, 'Engineering'),
(20, 'Marketing'),
(30, 'Sales'),
(40, 'HR');
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(50),
dept_id INT REFERENCES departments(dept_id)
);
INSERT INTO employees VALUES
(1, 'Alice', 10),
(2, 'Bob', 20),
(3, 'Carol', 30),
(4, 'Dave', NULL),
(5, 'Eve', 10);
Key observations:
- Department 40 (HR) has no employees
- Employee 4 (Dave) has no department (
dept_idis NULL) - This asymmetry makes the differences between join types visible
Join Types Overview
INNER JOIN
Returns only the rows where there is a match in both tables. Non-matching rows from either table are excluded.
SELECT e.name, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;
Result:
| name | dept_name |
|---|---|
| Alice | Engineering |
| Bob | Marketing |
| Carol | Sales |
| Eve | Engineering |
- 4 rows returned — only employees with a valid department
- Dave is excluded (no department)
- HR is excluded (no employees)
INNER JOIN is the default — writing just JOIN is equivalent to INNER JOIN.
LEFT JOIN (Left Outer Join)
Returns all rows from the left table and the matching rows from the right table. If there's no match, the right side columns are filled with NULL.
SELECT e.name, d.dept_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id;
Result:
| name | dept_name |
|---|---|
| Alice | Engineering |
| Bob | Marketing |
| Carol | Sales |
| Dave | NULL |
| Eve | Engineering |
- 5 rows returned — all employees are included
- Dave's
dept_nameisNULLbecause he has no department - HR is still excluded (no employees)
Finding rows with no match
A common pattern is using LEFT JOIN ... WHERE right.id IS NULL to find rows that exist in the left table but have no match in the right table:
-- Find employees with no department
SELECT e.name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id
WHERE d.dept_id IS NULL;
Result:
| name |
|---|
| Dave |
RIGHT JOIN (Right Outer Join)
Returns all rows from the right table and the matching rows from the left table. If there's no match, the left side columns are filled with NULL.
SELECT e.name, d.dept_name
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.dept_id;
Result:
| name | dept_name |
|---|---|
| Alice | Engineering |
| Bob | Marketing |
| Carol | Sales |
| NULL | HR |
| Eve | Engineering |
- 5 rows returned — all departments are included
- HR has
NULLfornamebecause it has no employees - Dave is now excluded (no department)
RIGHT JOIN is functionally equivalent to LEFT JOIN with the table order swapped. In practice, most developers prefer LEFT JOIN for readability — it keeps the "primary" table first.
FULL OUTER JOIN
Returns all rows from both tables. Matching rows are combined; non-matching rows from either side get NULL for the other table's columns.
SELECT e.name, d.dept_name
FROM employees e
FULL OUTER JOIN departments d ON e.dept_id = d.dept_id;
Result:
| name | dept_name |
|---|---|
| Alice | Engineering |
| Bob | Marketing |
| Carol | Sales |
| Dave | NULL |
| Eve | Engineering |
| NULL | HR |
- 6 rows returned — every employee and every department
- Dave has
NULLfor department (no department) - HR has
NULLfor name (no employees)
Finding unmatched rows from both sides
-- Find employees with no department AND departments with no employees
SELECT e.name, d.dept_name
FROM employees e
FULL OUTER JOIN departments d ON e.dept_id = d.dept_id
WHERE e.dept_id IS NULL OR d.dept_id IS NULL;
Result:
| name | dept_name |
|---|---|
| Dave | NULL |
| NULL | HR |
CROSS JOIN
Returns the Cartesian product — every row from the left table combined with every row from the right table. No ON condition is needed.
SELECT e.name, d.dept_name
FROM employees e
CROSS JOIN departments d;
Result: 5 employees x 4 departments = 20 rows
| name | dept_name |
|---|---|
| Alice | Engineering |
| Alice | Marketing |
| Alice | Sales |
| Alice | HR |
| Bob | Engineering |
| ... | ... |
CROSS JOIN can produce very large result sets. A cross join of two tables with 10,000 rows each produces 100,000,000 rows. Use it intentionally, not accidentally.
When to use CROSS JOIN
- Generating all combinations (e.g., all sizes x all colors)
- Creating a matrix or grid of values
- Building reporting periods by crossing date ranges with entities
SELF JOIN
A table joined to itself. Useful when rows in the same table have hierarchical or peer relationships.
Example: Employee and Manager
ALTER TABLE employees ADD COLUMN manager_id INT REFERENCES employees(id);
-- Alice and Bob report to Eve (manager_id = 5)
-- Carol reports to Alice (manager_id = 1)
-- Dave and Eve have no manager
UPDATE employees SET manager_id = 5 WHERE id IN (1, 2);
UPDATE employees SET manager_id = 1 WHERE id = 3;
SELECT
e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
Result:
| employee | manager |
|---|---|
| Alice | Eve |
| Bob | Eve |
| Carol | Alice |
| Dave | NULL |
| Eve | NULL |
Common self-join scenarios
| Scenario | Table references itself to find... |
|---|---|
| Org chart | Employee → Manager relationship |
| Bill of materials | Assembly → Component relationship |
| Category trees | Category → Parent category |
| Friend/ follower graphs | User → Related user |
Join Type Comparison
| INNER | LEFT | RIGHT | FULL OUTER | CROSS | |
|---|---|---|---|---|---|
| Returns | Matching rows only | All left + matching right | All right + matching left | All rows from both | Every combination |
| Unmatched left rows | Excluded | Included (NULL right) | Excluded | Included (NULL right) | N/A |
| Unmatched right rows | Excluded | Excluded | Included (NULL left) | Included (NULL left) | N/A |
| Result size | Smallest | Depends | Depends | Largest (no CROSS) | Left x Right |
| Common use | Default join | Most business queries | Rare (swap to LEFT) | Data auditing | Combinations |
Multiple Joins
Real queries often join more than two tables. Joins chain left-to-right:
SELECT
e.name AS employee,
d.dept_name,
p.project_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
LEFT JOIN employee_projects ep ON e.id = ep.employee_id
LEFT JOIN projects p ON ep.project_id = p.id;
Mix join types deliberately. Use INNER JOIN for required relationships and LEFT JOIN for optional ones.
Join Performance
How the database executes joins
Databases use different join algorithms depending on table size, indexes, and query complexity:
| Algorithm | Best For | Time Complexity | Memory |
|---|---|---|---|
| Nested Loop | Indexed joins, small outer table | O(M x log N) with index | Low |
| Hash Join | Large unsorted tables, equijoins | O(M + N) | O(smaller table) |
| Merge Join | Pre-sorted data, large tables | O(M + N) | Low |
Performance tips
- Index the join columns — the most impactful optimization. Foreign key columns should be indexed.
-- This makes joins on dept_id fast
CREATE INDEX idx_employees_dept_id ON employees(dept_id);
- Filter early — apply
WHEREconditions before joining to reduce row counts:
-- Good: filters before joining
SELECT e.name, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
WHERE d.dept_name = 'Engineering';
-- Bad: joins everything, then filters
-- (optimizer may fix this, but be explicit)
- Select only needed columns — avoid
SELECT *in join queries:
-- Good
SELECT e.name, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;
-- Avoid
SELECT *
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;
-
Be cautious with multiple joins — each join can multiply the intermediate result set. Check row counts with
EXPLAIN. -
Use explicit join conditions — avoid comma-style joins (cross joins with a
WHEREfilter) as they are harder to read and more error-prone.
-- Good: explicit JOIN syntax
SELECT e.name, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;
-- Avoid: implicit join (old syntax)
SELECT e.name, d.dept_name
FROM employees e, departments d
WHERE e.dept_id = d.dept_id;
Common Pitfalls
- Accidental cross join — forgetting the
ONclause produces a Cartesian product. Always include a join condition. - NULL in join columns —
NULL = NULLevaluates toUNKNOWN(notTRUE), so rows with NULL join keys are excluded from INNER JOINs silently. - Duplicate rows from multiple matches — if the join column has duplicate values, you get more rows than expected. Check for duplicates with
COUNT(*). - Wrong join type — using INNER JOIN when you need LEFT JOIN silently drops rows with no match. Think about whether unmatched rows matter.
- Joining on the wrong column — always verify the join condition references the correct foreign key relationship.
- Filtering in WHERE vs ON — for
LEFT JOIN, putting right-table conditions inWHEREturns it into an effectiveINNER JOIN:
-- This EXCLUDES departments with no employees (WHERE filters after join)
SELECT d.dept_name, e.name
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id
WHERE e.name LIKE 'A%';
-- This KEEPS all departments (ON filters before join)
SELECT d.dept_name, e.name
FROM departments d
LEFT JOIN employees e ON d.dept_id = d.dept_id AND e.name LIKE 'A%';
Best Practices
- Use explicit
JOINsyntax — not comma-separated tables withWHEREfilters - Always qualify column names — use table aliases (
e.name,d.dept_name) to avoid ambiguity - Index join columns — foreign keys and join predicates should be indexed
- Choose the correct join type — INNER for required matches, LEFT for optional data
- Prefer LEFT over RIGHT — swap table order and use LEFT for readability
- Use table aliases — keep queries concise, but use meaningful aliases (
efor employees,dfor departments) - Check with EXPLAIN — verify the query plan uses the right join algorithm and indexes
Interview Questions
1. What is the difference between INNER JOIN and LEFT JOIN?
| INNER JOIN | LEFT JOIN | |
|---|---|---|
| Returns | Only matching rows from both tables | All rows from left table + matching right rows |
| Unmatched left rows | Excluded | Included (right columns are NULL) |
| Use case | "Show me employees and their departments" | "Show me all employees, even those without departments" |
| Result size | Less than or equal to left table rows | Equal to left table rows |
2. What is the difference between LEFT JOIN and RIGHT JOIN?
They are mirror images of each other:
A LEFT JOIN B= all rows from A + matching rows from BA RIGHT JOIN B= all rows from B + matching rows from A
Any RIGHT JOIN can be rewritten as a LEFT JOIN by swapping the table order:
-- These are equivalent:
SELECT * FROM A RIGHT JOIN B ON A.id = B.a_id;
SELECT * FROM B LEFT JOIN A ON B.a_id = A.id;
In practice, LEFT JOIN is preferred for readability — keep the "driving" table first.
3. What is a FULL OUTER JOIN and when would you use it?
FULL OUTER JOIN returns all rows from both tables, with NULLs where there's no match. Use it when you need:
- Data reconciliation — comparing two datasets to find mismatches
- Finding orphaned records — rows in either table with no match in the other
- Merging datasets — combining two sources while preserving all records
-- Find all unmatched rows in either table
SELECT e.name, d.dept_name
FROM employees e
FULL OUTER JOIN departments d ON e.dept_id = d.dept_id
WHERE e.id IS NULL OR d.dept_id IS NULL;
4. What is the difference between ON and WHERE in a LEFT JOIN?
This is a common trap:
-- WHERE clause: filters AFTER the join → turns LEFT into effective INNER
SELECT d.dept_name, e.name
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id
WHERE e.name IS NOT NULL;
-- ON clause: filters DURING the join → preserves LEFT behavior
SELECT d.dept_name, e.name
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id AND e.name IS NOT NULL;
ON— determines which rows to join. Unmatched left rows still appear with NULLs.WHERE— filters the final result.WHERE e.name IS NOT NULLremoves rows wheree.nameis NULL, effectively turning the LEFT JOIN into an INNER JOIN.
5. How do you find rows in one table that have no match in another?
Use LEFT JOIN ... WHERE right.key IS NULL:
-- Employees with no department
SELECT e.name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id
WHERE d.dept_id IS NULL;
-- Departments with no employees
SELECT d.dept_name
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id
WHERE e.id IS NULL;
Alternatively, use NOT EXISTS (often preferred for readability):
SELECT e.name
FROM employees e
WHERE NOT EXISTS (
SELECT 1 FROM departments d WHERE d.dept_id = e.dept_id
);
6. What is a self join and when would you use it?
A self join is a regular join where both sides reference the same table. It requires table aliases to distinguish the two "copies":
-- Find employees and their managers
SELECT
e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
Common use cases: organizational hierarchies, category trees, adjacency lists, and comparing rows within the same table.
Learn More
- Database Indexing — how indexes speed up join operations
- ACID Properties — transaction guarantees that protect data integrity
- Database Replication — how data is distributed across nodes