SQL Aggregation & Filtering

What You'll Learn

By the end of this page, you will be able to:

Use aggregate functions (COUNT, SUM, AVG, MIN, MAX) to summarize data
Write GROUP BY queries to produce subtotals per category
Explain the critical difference between WHERE and HAVING
Use DISTINCT and understand when it differs from GROUP BY
Write queries with multiple grouping levels and filtered aggregations
Avoid common pitfalls with NULLs, empty groups, and incorrect filtering

The Problem: Summarizing Data

Raw data lives in rows — one row per event, per order, per employee. But business questions ask about groups, not individual rows:

"What is the average salary per department?"
"How many orders did each customer place last month?"
"Which departments have more than 5 employees?"

You need a way to collapse many rows into summary rows. That's what aggregation does.

Sample Tables

All examples use these tables:

CREATE TABLE departments (
    dept_id   INT PRIMARY KEY,
    dept_name VARCHAR(50)
);

INSERT INTO departments VALUES
    (10, 'Engineering'),
    (20, 'Marketing'),
    (30, 'Sales'),
    (40, 'HR');

CREATE TABLE employees (
    id        INT PRIMARY KEY,
    name      VARCHAR(50),
    dept_id   INT REFERENCES departments(dept_id),
    salary    DECIMAL(10,2),
    hire_date DATE
);

INSERT INTO employees VALUES
    (1, 'Alice',   10, 90000, '2020-01-15'),
    (2, 'Bob',     20, 75000, '2021-03-20'),
    (3, 'Carol',   10, 85000, '2022-06-10'),
    (4, 'Dave',    30, 70000, '2023-09-01'),
    (5, 'Eve',     10, 95000, '2019-11-05'),
    (6, 'Frank',   20, 78000, '2024-02-14'),
    (7, 'Grace',   30, 72000, '2023-07-22'),
    (8, 'Hank',    NULL, 65000, '2024-01-10');

Key observations:

Employee 8 (Hank) has no department (dept_id is NULL)
Department 40 (HR) has no employees
Engineering has 3 employees, Marketing and Sales each have 2

Aggregate Functions

Aggregate functions compute a single value from a set of input rows:

Function	Returns	NULL Handling
`COUNT(*)`	Total number of rows	Counts all rows, including NULLs
`COUNT(col)`	Number of non-NULL values in `col`	Skips NULLs
`SUM(col)`	Sum of non-NULL values	Skips NULLs, returns NULL if no rows
`AVG(col)`	Average of non-NULL values	Skips NULLs
`MIN(col)`	Minimum non-NULL value	Skips NULLs
`MAX(col)`	Maximum non-NULL value	Skips NULLs

-- Aggregate across the entire table
SELECT
    COUNT(*) AS total_employees,
    COUNT(dept_id) AS with_department,
    AVG(salary) AS avg_salary,
    MIN(salary) AS min_salary,
    MAX(salary) AS max_salary,
    SUM(salary) AS total_payroll
FROM employees;

Result:

total_employees	with_department	avg_salary	min_salary	max_salary	total_payroll
8	7	78750.00	65000.00	95000.00	630000.00

tip

COUNT(*) counts all rows. COUNT(column) counts only rows where column is not NULL. This distinction matters when your column contains NULLs — Hank has no department, so COUNT(dept_id) returns 7, not 8.

GROUP BY: Grouping Rows

GROUP BY splits rows into groups based on column values, then applies aggregate functions to each group independently.

SELECT
    dept_id,
    COUNT(*) AS employee_count,
    AVG(salary) AS avg_salary,
    SUM(salary) AS total_salary
FROM employees
GROUP BY dept_id;

Result:

dept_id	employee_count	avg_salary	total_salary
10	3	90000.00	270000.00
20	2	76500.00	153000.00
30	2	71000.00	142000.00
NULL	1	65000.00	65000.00

Rules of GROUP BY

Every column in SELECT (that is not aggregated) must appear in GROUP BY:

-- Valid: dept_id is in GROUP BY
SELECT dept_id, AVG(salary)
FROM employees
GROUP BY dept_id;

-- INVALID in most databases: name is not in GROUP BY
-- (PostgreSQL/SQL Server will error; MySQL may allow it with unpredictable results)
SELECT dept_id, name, AVG(salary)
FROM employees
GROUP BY dept_id;

NULL forms its own group — all rows with NULL in the grouping column are grouped together.
You can group by multiple columns:

SELECT dept_id, hire_date
FROM employees
GROUP BY dept_id, hire_date;

GROUP BY with JOIN

Combine grouping with joins to produce reports across related tables:

SELECT
    d.dept_name,
    COUNT(*) AS employee_count,
    AVG(e.salary) AS avg_salary
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
GROUP BY d.dept_name;

Result:

dept_name	employee_count	avg_salary
Engineering	3	90000.00
Marketing	2	76500.00
Sales	2	71000.00

tip

Notice that Hank (no department) and HR (no employees) are absent. INNER JOIN excludes both. Use LEFT JOIN if you want to include all employees or all departments.

WHERE vs HAVING

This is the most common interview question about aggregation. The key insight is the order of execution:

Clause	Filters	When it runs	Can use aggregates?
WHERE	Individual rows	Before grouping	No
HAVING	Groups (after aggregation)	After grouping	Yes

WHERE: Filter Before Grouping

-- "Average salary per department for employees hired after 2022"
SELECT
    dept_id,
    COUNT(*) AS employee_count,
    AVG(salary) AS avg_salary
FROM employees
WHERE hire_date >= '2023-01-01'
GROUP BY dept_id;

Result:

dept_id	employee_count	avg_salary
30	2	71000.00
NULL	1	65000.00

Only employees hired after 2022 are included before grouping happens. Engineering (Alice, Carol, Eve all hired before 2023) disappears entirely.

HAVING: Filter After Grouping

-- "Departments with more than 1 employee"
SELECT
    dept_id,
    COUNT(*) AS employee_count,
    AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id
HAVING COUNT(*) > 1;

Result:

dept_id	employee_count	avg_salary
10	3	90000.00
20	2	76500.00
30	2	71000.00

All rows are grouped first, then groups with COUNT(*) <= 1 (Hank's NULL group) are removed.

Combining WHERE and HAVING

-- "Departments where employees hired after 2020 have an average salary over 72000"
SELECT
    dept_id,
    COUNT(*) AS employee_count,
    AVG(salary) AS avg_salary
FROM employees
WHERE hire_date >= '2020-01-01'    -- filter rows first
GROUP BY dept_id                    -- then group
HAVING AVG(salary) > 72000;         -- then filter groups

Result:

dept_id	employee_count	avg_salary
10	3	90000.00
20	2	76500.00

Execution breakdown:

FROM — all 8 employees
WHERE — remove Eve (hired 2019-11-05) → 7 rows remain
GROUP BY — group the 7 rows by dept_id
HAVING — keep only groups where AVG(salary) > 72000
SELECT — output the final columns

DISTINCT vs GROUP BY

Both can produce unique values, but they serve different purposes:

-- Get unique dept_ids (both produce the same result)
SELECT DISTINCT dept_id FROM employees;
SELECT dept_id FROM employees GROUP BY dept_id;

Result (both queries):

dept_id
10
20
30
NULL

When they differ

Aspect	DISTINCT	GROUP BY
Purpose	Remove duplicate rows	Aggregate data per group
Aggregate functions	Cannot use	Can use (`COUNT`, `SUM`, `AVG`, etc.)
Multiple columns	Deduplicates across all listed columns	Groups by all listed columns
Performance	Often similar	Often similar (same execution plan)
Filtering	No group-level filtering	Supports `HAVING`

Use DISTINCT when:

You only need unique values without any aggregation

-- "Which departments have employees?"
SELECT DISTINCT dept_id FROM employees;

Use GROUP BY when:

You need to aggregate data per group

-- "How many employees per department?"
SELECT dept_id, COUNT(*) FROM employees GROUP BY dept_id;

You need to filter groups with HAVING

-- "Departments with more than 2 employees"
SELECT dept_id FROM employees GROUP BY dept_id HAVING COUNT(*) > 2;

tip

Under the hood, most databases produce the same execution plan for SELECT DISTINCT col and SELECT col GROUP BY col. Use whichever communicates your intent more clearly.

SQL Clause Execution Order

Understanding the logical order of SQL clause execution helps you write correct queries:

Order	Clause	Purpose	Can use aliases?	Can use aggregates?
1	`FROM`	Identify source tables	No	No
2	`JOIN`	Combine tables	No	No
3	`WHERE`	Filter individual rows	No	No
4	`GROUP BY`	Group rows	No (in most DBs)	No
5	`HAVING`	Filter groups	No	Yes
6	`SELECT`	Compute output columns	N/A	Yes
7	`ORDER BY`	Sort results	Yes	Yes
8	`LIMIT`	Restrict row count	No	No

warning

Aliases defined in SELECT are not available in WHERE or HAVING — because SELECT runs after WHERE and HAVING. They are available in ORDER BY (which runs last).

Advanced Patterns

COUNT with conditionals

Count rows matching a condition within a group:

SELECT
    dept_id,
    COUNT(*) AS total,
    COUNT(CASE WHEN salary > 80000 THEN 1 END) AS high_earners,
    COUNT(CASE WHEN salary <= 80000 THEN 1 END) AS standard_earners
FROM employees
WHERE dept_id IS NOT NULL
GROUP BY dept_id;

Result:

dept_id	total	high_earners	standard_earners
10	3	3	0
20	2	1	1
30	2	0	2

This is often faster than writing multiple subqueries.

GROUP BY with multiple columns

SELECT
    d.dept_name,
    EXTRACT(YEAR FROM e.hire_date) AS hire_year,
    COUNT(*) AS hires_that_year
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
GROUP BY d.dept_name, EXTRACT(YEAR FROM e.hire_date)
ORDER BY d.dept_name, hire_year;

Result:

dept_name	hire_year	hires_that_year
Engineering	2019	1
Engineering	2020	1
Engineering	2022	1
Marketing	2021	1
Marketing	2024	1
Sales	2023	2

Aggregates with NULLs

SELECT
    COUNT(*) AS count_all,
    COUNT(dept_id) AS count_dept,
    SUM(salary) AS sum_salary,
    AVG(salary) AS avg_salary
FROM employees;

Result:

count_all	count_dept	sum_salary	avg_salary
8	7	630000.00	78750.00

COUNT(*) = 8 (counts all rows)
COUNT(dept_id) = 7 (skips Hank's NULL)
AVG(salary) = 78750.00 (average of all 8 non-NULL salaries)

warning

AVG ignores NULL values, not treating them as zero. If you want NULLs counted as zero, use COALESCE:

-- AVG treats NULLs as "not exist"
-- COALESCE forces NULLs to be 0, changing the average
SELECT AVG(COALESCE(bonus, 0)) FROM employees;

Common Pitfalls

Using WHERE with aggregate functions — WHERE COUNT(*) > 1 is invalid. Use HAVING COUNT(*) > 1 instead.

-- WRONG
SELECT dept_id, COUNT(*)
FROM employees
WHERE COUNT(*) > 1
GROUP BY dept_id;

-- CORRECT
SELECT dept_id, COUNT(*)
FROM employees
GROUP BY dept_id
HAVING COUNT(*) > 1;

SELECT column not in GROUP BY — Every non-aggregated column in SELECT must be in GROUP BY. Most databases will error.

-- WRONG: name is not aggregated and not in GROUP BY
SELECT dept_id, name, AVG(salary)
FROM employees
GROUP BY dept_id;

-- CORRECT: either aggregate name or add to GROUP BY
SELECT dept_id, MAX(name) AS representative_name, AVG(salary)
FROM employees
GROUP BY dept_id;

HAVING without GROUP BY — Using HAVING without GROUP BY treats the entire table as one group. This is valid but confusing — prefer WHERE.
Filtering in the wrong clause — Using HAVING when WHERE suffices. HAVING is evaluated after grouping, so it processes more data than necessary.

-- INEFFICIENT: HAVING filters after grouping all rows
SELECT dept_id, COUNT(*)
FROM employees
HAVING dept_id IS NOT NULL
GROUP BY dept_id;

-- BETTER: WHERE filters before grouping (fewer rows to group)
SELECT dept_id, COUNT(*)
FROM employees
WHERE dept_id IS NOT NULL
GROUP BY dept_id;

COUNT(*) vs COUNT(col) confusion — COUNT(*) counts all rows; COUNT(col) skips NULLs. Know which one you need.

Best Practices

Filter early with WHERE — reduce the number of rows before grouping for better performance
Use HAVING only for aggregate conditions — if the condition doesn't use an aggregate, it belongs in WHERE
Always GROUP BY all non-aggregated columns — don't rely on loose MySQL behavior
Use meaningful aliases — COUNT(*) AS employee_count is clearer than bare COUNT(*)
Index columns used in GROUP BY — grouping often involves sorting or hashing
Check execution plans — use EXPLAIN to verify the database is using indexes and not scanning the entire table
Consider materialized views — for expensive aggregation queries run frequently

Interview Questions

1. What is the difference between WHERE and HAVING?

	WHERE	HAVING
Filters	Individual rows	Groups (after aggregation)
Runs	Before GROUP BY	After GROUP BY
Can use aggregates?	No	Yes
Use case	"Filter employees with salary > 80000"	"Filter departments with avg salary > 80000"

-- WHERE: filter rows before grouping
SELECT dept_id, AVG(salary)
FROM employees
WHERE salary > 70000
GROUP BY dept_id;

-- HAVING: filter groups after grouping
SELECT dept_id, AVG(salary)
FROM employees
GROUP BY dept_id
HAVING AVG(salary) > 70000;

The first query excludes individual employees earning 70000 or less, then averages the rest. The second query averages all employees per department, then keeps departments where that average exceeds 70000.

2. What is the difference between COUNT(*) and COUNT(column)?

COUNT(*) counts all rows, including those with NULL values
COUNT(column) counts only rows where column is **not NULL`

SELECT
    COUNT(*) AS total_rows,        -- 8
    COUNT(dept_id) AS with_dept    -- 7 (Hank has NULL dept_id)
FROM employees;

3. What is the difference between DISTINCT and GROUP BY?

Both can return unique values:

-- These produce identical results
SELECT DISTINCT dept_id FROM employees;
SELECT dept_id FROM employees GROUP BY dept_id;

But GROUP BY can do more — it supports aggregation and HAVING:

-- GROUP BY with aggregation (DISTINCT can't do this)
SELECT dept_id, COUNT(*), AVG(salary)
FROM employees
GROUP BY dept_id;

Use DISTINCT for simple deduplication. Use GROUP BY when you need aggregation or group-level filtering.

4. Write a query to find departments with more than 2 employees earning above 75000.

This requires both WHERE and HAVING:

SELECT
    d.dept_name,
    COUNT(*) AS high_earner_count,
    AVG(e.salary) AS avg_high_salary
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
WHERE e.salary > 75000           -- filter individual rows first
GROUP BY d.dept_name             -- then group by department
HAVING COUNT(*) > 2;             -- then filter groups

5. What is the order of execution of SQL clauses?

FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT

This explains why:

You can't use SELECT aliases in WHERE (WHERE runs first)
You can't use aggregate functions in WHERE (aggregation hasn't happened yet)
You can use SELECT aliases in ORDER BY (ORDER BY runs last)

6. What happens when you GROUP BY a column that contains NULLs?

All rows with NULL in the grouping column are placed into one group:

SELECT dept_id, COUNT(*)
FROM employees
GROUP BY dept_id;

This produces a row for dept_id = NULL with count 1 (Hank). NULL is treated as a distinct grouping value.

7. How do you count rows matching a condition within a group?

Use COUNT(CASE WHEN ... THEN 1 END) or the FILTER clause (PostgreSQL):

-- Standard SQL (works everywhere)
SELECT
    dept_id,
    COUNT(*) AS total,
    COUNT(CASE WHEN salary > 80000 THEN 1 END) AS high_earners
FROM employees
GROUP BY dept_id;

-- PostgreSQL-specific (cleaner syntax)
SELECT
    dept_id,
    COUNT(*) AS total,
    COUNT(*) FILTER (WHERE salary > 80000) AS high_earners
FROM employees
GROUP BY dept_id;

Learn More

SQL Joins — combining data from multiple tables before aggregation
Database Indexing — how indexes speed up GROUP BY and WHERE
ACID Properties — transaction guarantees for data integrity
Materialized Views — pre-computed aggregation results for performance

What You'll Learn​

The Problem: Summarizing Data​

Sample Tables​

Aggregate Functions​

GROUP BY: Grouping Rows​

Rules of GROUP BY​

GROUP BY with JOIN​

WHERE vs HAVING​

WHERE: Filter Before Grouping​

HAVING: Filter After Grouping​

Combining WHERE and HAVING​

DISTINCT vs GROUP BY​

When they differ​

Use DISTINCT when:​

Use GROUP BY when:​

SQL Clause Execution Order​

Advanced Patterns​

COUNT with conditionals​

GROUP BY with multiple columns​

Aggregates with NULLs​

Common Pitfalls​

Best Practices​

Interview Questions​

1. What is the difference between WHERE and HAVING?​

2. What is the difference between COUNT(*) and COUNT(column)?​

3. What is the difference between DISTINCT and GROUP BY?​

4. Write a query to find departments with more than 2 employees earning above 75000.​

5. What is the order of execution of SQL clauses?​

6. What happens when you GROUP BY a column that contains NULLs?​

7. How do you count rows matching a condition within a group?​

Learn More​

What You'll Learn

The Problem: Summarizing Data

Sample Tables

Aggregate Functions

GROUP BY: Grouping Rows

Rules of GROUP BY

GROUP BY with JOIN

WHERE vs HAVING

WHERE: Filter Before Grouping

HAVING: Filter After Grouping

Combining WHERE and HAVING

DISTINCT vs GROUP BY

When they differ

Use DISTINCT when:

Use GROUP BY when:

SQL Clause Execution Order

Advanced Patterns

COUNT with conditionals

GROUP BY with multiple columns

Aggregates with NULLs

Common Pitfalls

Best Practices

Interview Questions

1. What is the difference between WHERE and HAVING?

2. What is the difference between COUNT(*) and COUNT(column)?

3. What is the difference between DISTINCT and GROUP BY?

4. Write a query to find departments with more than 2 employees earning above 75000.

5. What is the order of execution of SQL clauses?

6. What happens when you GROUP BY a column that contains NULLs?

7. How do you count rows matching a condition within a group?

Learn More