SQL UNION
What You'll Learn
By the end of this page, you will be able to:
- Explain what set operations are and how they combine result sets in SQL
- Write UNION, UNION ALL, INTERSECT, and EXCEPT queries
- Understand the difference between UNION and UNION ALL and when to use each
- Apply the column compatibility rules that govern set operations
- Avoid common performance and correctness pitfalls
- Choose the right set operation for a given problem
The Problem: Combining Results from Multiple Queries
Sometimes you need data that spans multiple tables or multiple conditions that can't be expressed in a single SELECT. Set operations let you combine the results of two or more queries into a single result set.
customers table: suppliers table:
+----+--------+-------+ +----+--------+-------+
| id | name | city | | id | name | city |
+----+--------+-------+ +----+--------+-------+
| 1 | Alice | Hanoi | | 1 | Gamma | HCMC |
| 2 | Bob | HCMC | | 2 | Delta | Hanoi |
| 3 | Carol | Hanoi | | 3 | Epslon| Da Nang|
+----+--------+-------+ +----+--------+-------+
Questions you can't answer from a single query:
- "Give me all contacts — both customers and suppliers" — needs to combine two tables
- "Which cities have both customers and suppliers?" — needs to find the overlap
- "Which cities have customers but no suppliers?" — needs to find the difference
Set operations solve these by treating query results as mathematical sets.
Sample Tables
All examples use these tables:
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(50),
city VARCHAR(50)
);
INSERT INTO customers VALUES
(1, 'Alice', 'Hanoi'),
(2, 'Bob', 'HCMC'),
(3, 'Carol', 'Hanoi');
CREATE TABLE suppliers (
id INT PRIMARY KEY,
name VARCHAR(50),
city VARCHAR(50)
);
INSERT INTO suppliers VALUES
(1, 'Gamma', 'HCMC'),
(2, 'Delta', 'Hanoi'),
(3, 'Epsilon', 'Da Nang');
Key observations:
- Hanoi appears in both tables (Alice, Carol are customers; Delta is a supplier)
- HCMC appears in both tables (Bob is a customer; Gamma is a supplier)
- Da Nang appears only in suppliers
- The
namecolumn has no overlap — useful for demonstrating deduplication
Set Operations Overview
Rules for Set Operations
Before diving into each operation, all set operations follow these rules:
- Same number of columns — both queries must return the same column count
- Compatible data types — corresponding columns must have compatible types
- Column names — the result uses column names from the first query
- ORDER BY — can only appear once, at the very end, and applies to the entire result
-- Valid: same columns, compatible types
SELECT name, city FROM customers
UNION
SELECT name, city FROM suppliers;
-- Invalid: different column counts
SELECT name, city, email FROM customers
UNION
SELECT name, city FROM suppliers; -- ERROR: each UNION query must have the same number of columns
UNION
Returns all distinct rows from both queries. Duplicate rows are removed.
SELECT name, city FROM customers
UNION
SELECT name, city FROM suppliers;
Result:
| name | city |
|---|---|
| Alice | Hanoi |
| Bob | HCMC |
| Carol | Hanoi |
| Delta | Hanoi |
| Epsilon | Da Nang |
| Gamma | HCMC |
- 6 rows returned — all distinct combinations
- Since all names are unique across both tables, no duplicates are removed here
When duplicates exist
-- Get all cities from both tables
SELECT city FROM customers
UNION
SELECT city FROM suppliers;
Result:
| city |
|---|
| Hanoi |
| HCMC |
| Da Nang |
- 3 rows returned —
HanoiandHCMCappear in both tables but UNION keeps only one copy - Without UNION (using UNION ALL), you'd get 6 rows
UNION ALL
Returns all rows from both queries, including duplicates. No deduplication step is performed.
SELECT city FROM customers
UNION ALL
SELECT city FROM suppliers;
Result:
| city |
|---|
| Hanoi |
| HCMC |
| Hanoi |
| HCMC |
| Hanoi |
| Da Nang |
- 6 rows returned — every row from both queries, no deduplication
Always prefer UNION ALL when you know duplicates are impossible or desired. UNION removes duplicates by sorting/hashing the result set, which adds overhead. UNION ALL simply appends results — it's faster.
UNION vs UNION ALL
| UNION | UNION ALL | |
|---|---|---|
| Duplicates | Removed | Kept |
| Performance | Slower (dedup step) | Faster (no dedup) |
| Use when | You need unique rows only | Duplicates don't matter or are impossible |
| Equivalent to | SELECT DISTINCT ... across all results | Concatenation of results |
INTERSECT
Returns only the rows that appear in both query results.
-- Cities that have BOTH customers and suppliers
SELECT city FROM customers
INTERSECT
SELECT city FROM suppliers;
Result:
| city |
|---|
| Hanoi |
| HCMC |
- Only cities present in both tables are returned
Da Nangis excluded because no customer lives there
MySQL does not support INTERSECT natively. Use INNER JOIN or WHERE ... IN as a workaround:
-- MySQL alternative for INTERSECT
SELECT DISTINCT c.city
FROM customers c
INNER JOIN suppliers s ON c.city = s.city;
EXCEPT
Returns rows from the first query that are not in the second query. Also known as MINUS in Oracle.
-- Cities with customers but NO suppliers
SELECT city FROM customers
EXCEPT
SELECT city FROM suppliers;
Result:
| city |
|---|
| (empty — all customer cities also have suppliers) |
Now let's reverse it:
-- Cities with suppliers but NO customers
SELECT city FROM suppliers
EXCEPT
SELECT city FROM customers;
Result:
| city |
|---|
| Da Nang |
Da Nanghas a supplier but no customer — only it survives the EXCEPT
MySQL does not support EXCEPT natively. Use LEFT JOIN ... WHERE IS NULL or NOT EXISTS:
-- MySQL alternative for EXCEPT
SELECT DISTINCT s.city
FROM suppliers s
LEFT JOIN customers c ON s.city = c.city
WHERE c.city IS NULL;
In Oracle, EXCEPT is written as MINUS. They are functionally identical.
-- Oracle syntax
SELECT city FROM customers
MINUS
SELECT city FROM suppliers;
Set Operation Comparison
| UNION | UNION ALL | INTERSECT | EXCEPT | |
|---|---|---|---|---|
| Returns | All distinct rows | All rows | Rows in both | Rows in first only |
| Duplicates | Removed | Kept | Removed | Removed |
| Performance | Moderate | Fastest | Slowest | Slow |
| MySQL support | Yes | Yes | No (use JOIN) | No (use JOIN) |
| Analogy | A ∪ B | A ∪ B (with dupes) | A ∩ B | A − B |
Practical Examples
Combining data from similar tables
A common use case is combining rows from tables with similar structure:
-- All contacts (customers + suppliers) with a source label
SELECT name, city, 'Customer' AS source
FROM customers
UNION ALL
SELECT name, city, 'Supplier' AS source
FROM suppliers;
Result:
| name | city | source |
|---|---|---|
| Alice | Hanoi | Customer |
| Bob | HCMC | Customer |
| Carol | Hanoi | Customer |
| Gamma | HCMC | Supplier |
| Delta | Hanoi | Supplier |
| Epsilon | Da Nang | Supplier |
Merging reports across time periods
-- Sales from Q1 and Q2, combined
SELECT product_id, amount, 'Q1' AS quarter FROM sales_q1
UNION ALL
SELECT product_id, amount, 'Q2' AS quarter FROM sales_q2
ORDER BY product_id, quarter;
Finding orphaned records
-- Products that exist in the catalog but have never been ordered
SELECT product_id FROM products
EXCEPT
SELECT DISTINCT product_id FROM order_items;
Finding complete vs incomplete records
-- Users who have both a profile AND an active subscription
SELECT user_id FROM profiles
INTERSECT
SELECT user_id FROM subscriptions WHERE status = 'active';
Performance Tips
1. Prefer UNION ALL over UNION
UNION performs an implicit DISTINCT operation — it must sort or hash the entire result set to eliminate duplicates. UNION ALL simply appends.
-- Slow: implicit dedup
SELECT city FROM customers
UNION
SELECT city FROM suppliers;
-- Fast: no dedup (use when you know there are no duplicates)
SELECT city FROM customers
UNION ALL
SELECT city FROM suppliers;
2. Filter before combining
Apply WHERE clauses in each individual query rather than filtering after the set operation:
-- Good: filters in each query, less data to combine
SELECT name, city FROM customers WHERE city = 'Hanoi'
UNION ALL
SELECT name, city FROM suppliers WHERE city = 'Hanoi';
-- Inefficient: combines everything, then filters
SELECT * FROM (
SELECT name, city FROM customers
UNION ALL
SELECT name, city FROM suppliers
) combined
WHERE city = 'Hanoi';
3. Index columns used in set operations
If you're frequently running INTERSECT or EXCEPT on a column, make sure it's indexed:
CREATE INDEX idx_customers_city ON customers(city);
CREATE INDEX idx_suppliers_city ON suppliers(city);
4. Limit early
Use LIMIT / TOP in subqueries when you only need a sample:
-- Only need 10 recent entries from each source
(SELECT id, name, created_at FROM customers ORDER BY created_at DESC LIMIT 10)
UNION ALL
(SELECT id, name, created_at FROM suppliers ORDER BY created_at DESC LIMIT 10)
ORDER BY created_at DESC;
When combining ORDER BY or LIMIT with set operations, wrap each individual query in parentheses. The final ORDER BY applies to the entire combined result.
Common Pitfalls
- Mixing up UNION and UNION ALL — UNION removes duplicates (slower), UNION ALL keeps them (faster). Choose deliberately.
- Column mismatch — the most common runtime error. Both queries must return the same number of columns with compatible types.
- Misleading column names — the result uses column names from the first query. If the second query's columns have different names, you may get confusing headers. Use aliases in the first query.
- ORDER BY placement —
ORDER BYcan only appear once, at the end of the entire statement. Placing it in individual queries (without parentheses) causes an error. - NULL comparisons in INTERSECT/EXCEPT — set operations treat two NULLs as equal, unlike standard SQL comparisons where
NULL = NULLisUNKNOWN. - Assuming order — set operations do not guarantee row order. Always use
ORDER BYif order matters.
Best Practices
- Use UNION ALL by default — only switch to UNION when you explicitly need deduplication
- Add a source column — when combining similar tables, include a label column to track where each row came from
- Wrap individual queries in parentheses — when using
ORDER BYorLIMITper query - Verify column alignment — double-check that corresponding columns have compatible types and semantics
- Use INTERSECT/EXCEPT for data validation — great for finding overlaps and differences between datasets
- Consider alternatives — sometimes a single query with
OR,IN, orJOINis simpler and faster than a set operation
-- Sometimes simpler than UNION
SELECT * FROM contacts
WHERE type = 'Customer' OR type = 'Supplier';
-- Sometimes simpler than INTERSECT
SELECT DISTINCT c.city
FROM customers c
INNER JOIN suppliers s ON c.city = s.city;
Interview Questions
1. What is the difference between UNION and UNION ALL?
| UNION | UNION ALL | |
|---|---|---|
| Duplicates | Removed | Kept |
| Performance | Slower (needs dedup) | Faster (no dedup) |
| When to use | Need unique rows | Duplicates don't matter |
UNION performs an extra step to eliminate duplicate rows (like wrapping the result in DISTINCT). UNION ALL simply concatenates the results. If you know duplicates are impossible or acceptable, always use UNION ALL for better performance.
2. What are the rules for SQL set operations?
- Both queries must return the same number of columns
- Corresponding columns must have compatible data types
- The result uses column names from the first query
ORDER BYcan only appear once, at the end of the entire statement- NULLs are treated as equal for deduplication purposes
3. How would you find rows in one table that don't exist in another?
Use EXCEPT (or MINUS in Oracle):
SELECT product_id FROM products
EXCEPT
SELECT DISTINCT product_id FROM order_items;
In MySQL, use a workaround:
SELECT p.product_id
FROM products p
LEFT JOIN order_items oi ON p.product_id = oi.product_id
WHERE oi.product_id IS NULL;
4. When would you use INTERSECT vs INNER JOIN?
Both find overlapping data, but they work at different levels:
| INTERSECT | INNER JOIN | |
|---|---|---|
| Compares | Entire rows | Specific columns |
| Returns | Distinct rows from both queries | Combined columns from both tables |
| Duplicates | Removed automatically | May produce duplicates (one-to-many) |
| Use when | Comparing complete row values | Need columns from both tables |
-- INTERSECT: cities in both tables (just the city value)
SELECT city FROM customers
INTERSECT
SELECT city FROM suppliers;
-- INNER JOIN: same question, but can include more columns
SELECT DISTINCT c.city, c.name AS customer_name, s.name AS supplier_name
FROM customers c
INNER JOIN suppliers s ON c.city = s.city;
5. Can you use ORDER BY with UNION? How?
Yes, but only once — at the very end of the entire statement:
SELECT name, city FROM customers
UNION ALL
SELECT name, city FROM suppliers
ORDER BY city, name; -- applies to the combined result
To order individual queries before combining, wrap them in parentheses:
(SELECT name, city FROM customers ORDER BY name)
UNION ALL
(SELECT name, city FROM suppliers ORDER BY name)
ORDER BY city; -- final ordering of combined result
6. How do you add a "source" column to distinguish rows in a UNION?
Add a literal string in each query:
SELECT name, city, 'Customer' AS source FROM customers
UNION ALL
SELECT name, city, 'Supplier' AS source FROM suppliers;
This is a common pattern when merging data from similar tables and you need to track which table each row came from.
Learn More
- SQL Joins — combining columns from multiple tables
- SQL Aggregation — GROUP BY, HAVING, and aggregate functions
- Database Indexing — how indexes speed up query performance