Primary Key vs Foreign Key
What You'll Learn
By the end of this page, you will be able to:
- Explain what a primary key is and why every table needs one
- Explain what a foreign key is and how it establishes relationships between tables
- Compare primary keys vs foreign keys — their purpose, constraints, and behavior
- Understand referential integrity and how foreign keys enforce it
- Choose between natural keys and surrogate keys with trade-offs
- Apply best practices for defining keys in real-world schemas
The Problem: Identifying and Relating Data
Imagine a database for an e-commerce platform. You have millions of orders, thousands of products, and thousands of users. Two fundamental questions arise:
- How do you uniquely identify each row? — Without a reliable identifier, you cannot distinguish one order from another, one user from another.
- How do you link related data across tables? — An order belongs to a user and contains products. Without explicit relationships, the data is disconnected and unreliable.
Primary keys solve the first problem. Foreign keys solve the second. Together, they are the backbone of relational database design.
Primary Key
A primary key is a column (or set of columns) that uniquely identifies each row in a table. Every table in a relational database should have exactly one primary key.
Properties
| Property | Description |
|---|---|
| Unique | No two rows can have the same primary key value |
| Not NULL | Primary key columns cannot contain NULL values |
| Immutable | Primary key values should not change over time |
| Only one per table | A table can have only one primary key (though it can be composite) |
Creating a primary key
-- Single-column primary key
CREATE TABLE users (
id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(255)
);
-- Named constraint (preferred for production)
CREATE TABLE users (
id INT,
name VARCHAR(100),
email VARCHAR(255),
CONSTRAINT pk_users PRIMARY KEY (id)
);
-- Composite primary key (multiple columns)
CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT,
PRIMARY KEY (order_id, product_id)
);
What makes a good primary key?
| Criteria | Why it matters |
|---|---|
| Unique | Must never duplicate — ensures every row is identifiable |
| Never NULL | NULL cannot be compared, so it cannot serve as a reliable identifier |
| Immutable | Changing a primary key requires updating all foreign keys referencing it |
| Simple | Prefer a single integer column over multi-column composite keys |
| Efficient | Small data types (INT, BIGINT) index and join faster than strings or UUIDs |
Foreign Key
A foreign key is a column (or set of columns) in one table that references the primary key of another table. It establishes a relationship between two tables and enforces referential integrity.
Properties
| Property | Description |
|---|---|
| References a primary key | The foreign key points to a primary key (or unique key) in another table |
| Can be NULL | Foreign keys can be NULL, meaning the relationship is optional |
| Can duplicate | Multiple rows can reference the same parent row (one-to-many) |
| Multiple per table | A table can have multiple foreign keys referencing different tables |
Creating a foreign key
CREATE TABLE orders (
id INT PRIMARY KEY,
user_id INT,
order_date DATE,
status VARCHAR(20),
CONSTRAINT fk_orders_users
FOREIGN KEY (user_id)
REFERENCES users(id)
);
Referential integrity
Referential integrity ensures that every foreign key value matches an existing primary key in the referenced table — no orphaned references.
- Orders 101, 102, and 103 are valid —
user_idreferences real users. - Order 104 is invalid —
user_id = 999does not exist inusers. The foreign key constraint would reject this insert.
ON DELETE / ON UPDATE actions
When a referenced row is deleted or updated, the database must decide what happens to the rows that reference it. Foreign keys support referential actions:
CONSTRAINT fk_orders_users
FOREIGN KEY (user_id)
REFERENCES users(id)
ON DELETE CASCADE -- delete orders when user is deleted
ON UPDATE CASCADE; -- update user_id when users.id changes
| Action | ON DELETE | ON UPDATE |
|---|---|---|
| CASCADE | Delete referencing rows automatically | Update referencing rows automatically |
| SET NULL | Set foreign key to NULL | Set foreign key to NULL |
| SET DEFAULT | Set foreign key to its default value | Set foreign key to its default value |
| RESTRICT | Prevent the delete (error) | Prevent the update (error) |
| NO ACTION | Same as RESTRICT (deferred in some DBs) | Same as RESTRICT (deferred in some DBs) |
RESTRICT (or NO ACTION) is the safest default — it forces the application to handle the relationship explicitly. Use CASCADE only when you are certain that deleting a parent should always delete its children (e.g., order items when an order is deleted).
Primary Key vs Foreign Key: Comparison
| Primary Key | Foreign Key | |
|---|---|---|
| Purpose | Uniquely identifies each row in a table | References a primary key in another table |
| Uniqueness | Must be unique | Can have duplicate values |
| NULL | Cannot be NULL | Can be NULL (optional relationship) |
| Per table | Exactly one | Zero or more |
| Direction | Is referenced by foreign keys | References a primary key |
| Auto-generated | Often auto-incremented or UUID | Values come from the referenced table |
| Index | Automatically indexed (clustered in most DBs) | Not auto-indexed in some DBs (e.g., PostgreSQL, MySQL InnoDB) |
| Constraint type | Entity integrity | Referential integrity |
Relationship cardinality
Foreign keys define the cardinality of relationships between tables:
| Relationship | Example | How it works |
|---|---|---|
| One-to-Many | One user → many orders | Foreign key on the "many" side (orders.user_id) |
| One-to-One | One user → one profile | Foreign key on either side with a UNIQUE constraint |
| Many-to-Many | Students ↔ courses | Junction table with two foreign keys |
-- One-to-One: user profile
CREATE TABLE profiles (
id INT PRIMARY KEY,
user_id INT UNIQUE, -- UNIQUE makes it 1:1
bio TEXT,
CONSTRAINT fk_profiles_users FOREIGN KEY (user_id) REFERENCES users(id)
);
-- Many-to-Many: students ↔ courses (junction table)
CREATE TABLE enrollments (
student_id INT,
course_id INT,
enrolled_at TIMESTAMP,
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES students(id),
FOREIGN KEY (course_id) REFERENCES courses(id)
);
Natural Key vs Surrogate Key
When choosing a primary key, you face a fundamental design decision:
| Natural Key | Surrogate Key | |
|---|---|---|
| Definition | A column with business meaning (e.g., email, ISBN, SSN) | An artificially generated column with no business meaning (e.g., auto-increment INT, UUID) |
| Example | email, isbn, tax_id | id INT AUTO_INCREMENT, id UUID |
| Pros | No extra column; directly meaningful | Never changes; simple; uniform across tables |
| Cons | May change (email, SSN); may be composite; can be large | Extra column; no business meaning |
| Performance | Can be slow if the key is a long string | Fast joins on small integer types |
Surrogate key (recommended)
-- Auto-increment integer (most common)
CREATE TABLE users (
id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- PostgreSQL 10+
name VARCHAR(100),
email VARCHAR(255) UNIQUE
);
-- SQL Server
CREATE TABLE users (
id INT IDENTITY(1,1) PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(255) UNIQUE
);
-- UUID (for distributed systems)
CREATE TABLE users (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY, -- PostgreSQL 13+
name VARCHAR(100),
email VARCHAR(255) UNIQUE
);
When to use natural keys
Natural keys make sense when the business identifier is truly immutable and guaranteed unique:
- ISO country codes (
US,VN,DE) - Currency codes (
USD,EUR) - Language codes (
en,vi)
Avoid using emails, phone numbers, or SSNs as primary keys. They can change, be reused, or have privacy implications. Use a surrogate key and put a UNIQUE constraint on the natural identifier instead.
Foreign Key Indexing
A critical performance consideration: foreign key columns should be indexed.
Without an index on orders.user_id, the database must perform a full table scan on orders every time a row in users is deleted or updated (to enforce the foreign key constraint). This can be devastating for performance.
-- PostgreSQL, MySQL InnoDB: foreign keys are NOT auto-indexed on the referencing side
-- You must create the index manually:
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- SQL Server: foreign keys ARE auto-indexed... wait, no. They are NOT.
-- Same rule applies — always index your foreign key columns.
Always index foreign key columns. This is one of the most common performance mistakes in database design. Some ORMs and frameworks don't do this automatically.
Common Pitfalls
- Using a natural key that changes — if your primary key changes, every foreign key referencing it must also change. Use surrogate keys to avoid this.
- Missing foreign key indexes — without an index on the foreign key column, deletes and updates on the parent table cause full table scans.
- Overusing CASCADE —
ON DELETE CASCADEcan silently delete large amounts of data. Use it intentionally, not by default. - Not handling circular references — two tables referencing each other can make inserts and deletes complicated. Break the cycle with nullable foreign keys or deferred constraints.
- Using GUID/UUID as clustered index — random UUIDs cause index fragmentation. Use sequential UUIDs or integer surrogate keys for clustered primary keys.
- Composite primary keys in large tables — multi-column primary keys make all foreign key references wider and slower. Prefer single-column surrogate keys.
Best Practices
- Always define a primary key — every table should have one. Without it, you cannot reliably identify or update rows.
- Prefer surrogate keys — use auto-increment integers or UUIDs. Keep natural identifiers as UNIQUE constraints.
- Always define foreign keys — don't rely on application logic to maintain relationships. The database enforces integrity even when the application has bugs.
- Index every foreign key column — this is critical for join performance and constraint enforcement.
- Use RESTRICT by default — choose CASCADE only when you have a clear business requirement for cascading behavior.
- Keep primary keys small — INT or BIGINT is ideal. Every foreign key duplicates the primary key value, so smaller keys mean smaller indexes and faster joins.
- Never change primary key values — if you need to change an identifier, add a new surrogate key and keep the old value as a unique column.
Interview Questions
1. What is the difference between a primary key and a foreign key?
| Primary Key | Foreign Key | |
|---|---|---|
| Purpose | Uniquely identifies a row | References a row in another table |
| Uniqueness | Must be unique | Can have duplicates |
| NULL | Cannot be NULL | Can be NULL |
| Per table | Exactly one | Zero or more |
| Enforces | Entity integrity | Referential integrity |
| Auto-indexed | Yes (always) | Depends on the database |
2. What is referential integrity and how is it enforced?
Referential integrity ensures that every foreign key value matches an existing primary key in the referenced table. It prevents:
- Inserting a child row with a non-existent parent
- Deleting a parent row that is still referenced by children (unless CASCADE)
- Updating a parent's primary key while children still reference the old value
It is enforced by the database through foreign key constraints — the database checks every INSERT, UPDATE, and DELETE against the constraint.
3. What is the difference between a natural key and a surrogate key?
| Natural Key | Surrogate Key | |
|---|---|---|
| Source | Business data (email, ISBN, SSN) | Generated (auto-increment, UUID) |
| Meaningful | Yes — has business meaning | No — artificial identifier |
| Stability | May change over time | Never changes |
| Size | Can be large (strings) | Small and uniform (INT, BIGINT) |
| Recommendation | Use as UNIQUE constraint | Use as primary key |
Surrogate keys are generally preferred because they are stable, simple, and performant.
4. What happens when you delete a row that is referenced by a foreign key?
It depends on the ON DELETE action defined on the foreign key constraint:
| Action | Result |
|---|---|
| RESTRICT / NO ACTION | Delete is rejected — error is raised |
| CASCADE | All referencing rows are automatically deleted |
| SET NULL | Foreign key columns in referencing rows are set to NULL |
| SET DEFAULT | Foreign key columns are set to their default value |
The default is RESTRICT, which is the safest option.
5. Why should foreign key columns be indexed?
Without an index on the foreign key column:
- Deleting a parent row requires a full table scan on the child table to verify no references exist (or to find rows to cascade)
- Joining parent and child tables is slow — the database cannot efficiently look up matching rows
- Locking can escalate from row-level to table-level under some databases (e.g., MySQL InnoDB)
Indexing foreign key columns ensures O(log n) lookups instead of O(n) scans.
6. Can a table have multiple foreign keys?
Yes. A table can have as many foreign keys as needed, each referencing a different table (or even the same table for self-referencing relationships).
CREATE TABLE order_items (
id INT PRIMARY KEY,
order_id INT REFERENCES orders(id), -- FK to orders
product_id INT REFERENCES products(id), -- FK to products
quantity INT
);
Common examples:
order_itemsreferences bothordersandproductsemployeesreferencesdepartments(department) andemployees(manager) — self-referencingpostsreferencesusers(author) andcategories(category)
Learn More
- ACID Properties — how transactions guarantee consistency
- SQL Joins — querying data across related tables
- Database Indexing — how indexes speed up queries on keys
- Database Locking — concurrency control when modifying related data