Skip to main content

Primary Key vs Foreign Key

What You'll Learn

By the end of this page, you will be able to:

  • Explain what a primary key is and why every table needs one
  • Explain what a foreign key is and how it establishes relationships between tables
  • Compare primary keys vs foreign keys — their purpose, constraints, and behavior
  • Understand referential integrity and how foreign keys enforce it
  • Choose between natural keys and surrogate keys with trade-offs
  • Apply best practices for defining keys in real-world schemas

The Problem: Identifying and Relating Data

Imagine a database for an e-commerce platform. You have millions of orders, thousands of products, and thousands of users. Two fundamental questions arise:

  1. How do you uniquely identify each row? — Without a reliable identifier, you cannot distinguish one order from another, one user from another.
  2. How do you link related data across tables? — An order belongs to a user and contains products. Without explicit relationships, the data is disconnected and unreliable.

Primary keys solve the first problem. Foreign keys solve the second. Together, they are the backbone of relational database design.

Primary Key

A primary key is a column (or set of columns) that uniquely identifies each row in a table. Every table in a relational database should have exactly one primary key.

Properties

PropertyDescription
UniqueNo two rows can have the same primary key value
Not NULLPrimary key columns cannot contain NULL values
ImmutablePrimary key values should not change over time
Only one per tableA table can have only one primary key (though it can be composite)

Creating a primary key

-- Single-column primary key
CREATE TABLE users (
id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(255)
);

-- Named constraint (preferred for production)
CREATE TABLE users (
id INT,
name VARCHAR(100),
email VARCHAR(255),
CONSTRAINT pk_users PRIMARY KEY (id)
);

-- Composite primary key (multiple columns)
CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT,
PRIMARY KEY (order_id, product_id)
);

What makes a good primary key?

CriteriaWhy it matters
UniqueMust never duplicate — ensures every row is identifiable
Never NULLNULL cannot be compared, so it cannot serve as a reliable identifier
ImmutableChanging a primary key requires updating all foreign keys referencing it
SimplePrefer a single integer column over multi-column composite keys
EfficientSmall data types (INT, BIGINT) index and join faster than strings or UUIDs

Foreign Key

A foreign key is a column (or set of columns) in one table that references the primary key of another table. It establishes a relationship between two tables and enforces referential integrity.

Properties

PropertyDescription
References a primary keyThe foreign key points to a primary key (or unique key) in another table
Can be NULLForeign keys can be NULL, meaning the relationship is optional
Can duplicateMultiple rows can reference the same parent row (one-to-many)
Multiple per tableA table can have multiple foreign keys referencing different tables

Creating a foreign key

CREATE TABLE orders (
id INT PRIMARY KEY,
user_id INT,
order_date DATE,
status VARCHAR(20),
CONSTRAINT fk_orders_users
FOREIGN KEY (user_id)
REFERENCES users(id)
);

Referential integrity

Referential integrity ensures that every foreign key value matches an existing primary key in the referenced table — no orphaned references.

  • Orders 101, 102, and 103 are valid — user_id references real users.
  • Order 104 is invaliduser_id = 999 does not exist in users. The foreign key constraint would reject this insert.

ON DELETE / ON UPDATE actions

When a referenced row is deleted or updated, the database must decide what happens to the rows that reference it. Foreign keys support referential actions:

CONSTRAINT fk_orders_users
FOREIGN KEY (user_id)
REFERENCES users(id)
ON DELETE CASCADE -- delete orders when user is deleted
ON UPDATE CASCADE; -- update user_id when users.id changes
ActionON DELETEON UPDATE
CASCADEDelete referencing rows automaticallyUpdate referencing rows automatically
SET NULLSet foreign key to NULLSet foreign key to NULL
SET DEFAULTSet foreign key to its default valueSet foreign key to its default value
RESTRICTPrevent the delete (error)Prevent the update (error)
NO ACTIONSame as RESTRICT (deferred in some DBs)Same as RESTRICT (deferred in some DBs)
tip

RESTRICT (or NO ACTION) is the safest default — it forces the application to handle the relationship explicitly. Use CASCADE only when you are certain that deleting a parent should always delete its children (e.g., order items when an order is deleted).

Primary Key vs Foreign Key: Comparison

Primary KeyForeign Key
PurposeUniquely identifies each row in a tableReferences a primary key in another table
UniquenessMust be uniqueCan have duplicate values
NULLCannot be NULLCan be NULL (optional relationship)
Per tableExactly oneZero or more
DirectionIs referenced by foreign keysReferences a primary key
Auto-generatedOften auto-incremented or UUIDValues come from the referenced table
IndexAutomatically indexed (clustered in most DBs)Not auto-indexed in some DBs (e.g., PostgreSQL, MySQL InnoDB)
Constraint typeEntity integrityReferential integrity

Relationship cardinality

Foreign keys define the cardinality of relationships between tables:

RelationshipExampleHow it works
One-to-ManyOne user → many ordersForeign key on the "many" side (orders.user_id)
One-to-OneOne user → one profileForeign key on either side with a UNIQUE constraint
Many-to-ManyStudents ↔ coursesJunction table with two foreign keys
-- One-to-One: user profile
CREATE TABLE profiles (
id INT PRIMARY KEY,
user_id INT UNIQUE, -- UNIQUE makes it 1:1
bio TEXT,
CONSTRAINT fk_profiles_users FOREIGN KEY (user_id) REFERENCES users(id)
);

-- Many-to-Many: students ↔ courses (junction table)
CREATE TABLE enrollments (
student_id INT,
course_id INT,
enrolled_at TIMESTAMP,
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES students(id),
FOREIGN KEY (course_id) REFERENCES courses(id)
);

Natural Key vs Surrogate Key

When choosing a primary key, you face a fundamental design decision:

Natural KeySurrogate Key
DefinitionA column with business meaning (e.g., email, ISBN, SSN)An artificially generated column with no business meaning (e.g., auto-increment INT, UUID)
Exampleemail, isbn, tax_idid INT AUTO_INCREMENT, id UUID
ProsNo extra column; directly meaningfulNever changes; simple; uniform across tables
ConsMay change (email, SSN); may be composite; can be largeExtra column; no business meaning
PerformanceCan be slow if the key is a long stringFast joins on small integer types
-- Auto-increment integer (most common)
CREATE TABLE users (
id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- PostgreSQL 10+
name VARCHAR(100),
email VARCHAR(255) UNIQUE
);

-- SQL Server
CREATE TABLE users (
id INT IDENTITY(1,1) PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(255) UNIQUE
);

-- UUID (for distributed systems)
CREATE TABLE users (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY, -- PostgreSQL 13+
name VARCHAR(100),
email VARCHAR(255) UNIQUE
);

When to use natural keys

Natural keys make sense when the business identifier is truly immutable and guaranteed unique:

  • ISO country codes (US, VN, DE)
  • Currency codes (USD, EUR)
  • Language codes (en, vi)
warning

Avoid using emails, phone numbers, or SSNs as primary keys. They can change, be reused, or have privacy implications. Use a surrogate key and put a UNIQUE constraint on the natural identifier instead.

Foreign Key Indexing

A critical performance consideration: foreign key columns should be indexed.

Without an index on orders.user_id, the database must perform a full table scan on orders every time a row in users is deleted or updated (to enforce the foreign key constraint). This can be devastating for performance.

-- PostgreSQL, MySQL InnoDB: foreign keys are NOT auto-indexed on the referencing side
-- You must create the index manually:
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- SQL Server: foreign keys ARE auto-indexed... wait, no. They are NOT.
-- Same rule applies — always index your foreign key columns.
tip

Always index foreign key columns. This is one of the most common performance mistakes in database design. Some ORMs and frameworks don't do this automatically.

Common Pitfalls

  • Using a natural key that changes — if your primary key changes, every foreign key referencing it must also change. Use surrogate keys to avoid this.
  • Missing foreign key indexes — without an index on the foreign key column, deletes and updates on the parent table cause full table scans.
  • Overusing CASCADEON DELETE CASCADE can silently delete large amounts of data. Use it intentionally, not by default.
  • Not handling circular references — two tables referencing each other can make inserts and deletes complicated. Break the cycle with nullable foreign keys or deferred constraints.
  • Using GUID/UUID as clustered index — random UUIDs cause index fragmentation. Use sequential UUIDs or integer surrogate keys for clustered primary keys.
  • Composite primary keys in large tables — multi-column primary keys make all foreign key references wider and slower. Prefer single-column surrogate keys.

Best Practices

  1. Always define a primary key — every table should have one. Without it, you cannot reliably identify or update rows.
  2. Prefer surrogate keys — use auto-increment integers or UUIDs. Keep natural identifiers as UNIQUE constraints.
  3. Always define foreign keys — don't rely on application logic to maintain relationships. The database enforces integrity even when the application has bugs.
  4. Index every foreign key column — this is critical for join performance and constraint enforcement.
  5. Use RESTRICT by default — choose CASCADE only when you have a clear business requirement for cascading behavior.
  6. Keep primary keys small — INT or BIGINT is ideal. Every foreign key duplicates the primary key value, so smaller keys mean smaller indexes and faster joins.
  7. Never change primary key values — if you need to change an identifier, add a new surrogate key and keep the old value as a unique column.

Interview Questions

1. What is the difference between a primary key and a foreign key?

Primary KeyForeign Key
PurposeUniquely identifies a rowReferences a row in another table
UniquenessMust be uniqueCan have duplicates
NULLCannot be NULLCan be NULL
Per tableExactly oneZero or more
EnforcesEntity integrityReferential integrity
Auto-indexedYes (always)Depends on the database

2. What is referential integrity and how is it enforced?

Referential integrity ensures that every foreign key value matches an existing primary key in the referenced table. It prevents:

  • Inserting a child row with a non-existent parent
  • Deleting a parent row that is still referenced by children (unless CASCADE)
  • Updating a parent's primary key while children still reference the old value

It is enforced by the database through foreign key constraints — the database checks every INSERT, UPDATE, and DELETE against the constraint.

3. What is the difference between a natural key and a surrogate key?

Natural KeySurrogate Key
SourceBusiness data (email, ISBN, SSN)Generated (auto-increment, UUID)
MeaningfulYes — has business meaningNo — artificial identifier
StabilityMay change over timeNever changes
SizeCan be large (strings)Small and uniform (INT, BIGINT)
RecommendationUse as UNIQUE constraintUse as primary key

Surrogate keys are generally preferred because they are stable, simple, and performant.

4. What happens when you delete a row that is referenced by a foreign key?

It depends on the ON DELETE action defined on the foreign key constraint:

ActionResult
RESTRICT / NO ACTIONDelete is rejected — error is raised
CASCADEAll referencing rows are automatically deleted
SET NULLForeign key columns in referencing rows are set to NULL
SET DEFAULTForeign key columns are set to their default value

The default is RESTRICT, which is the safest option.

5. Why should foreign key columns be indexed?

Without an index on the foreign key column:

  • Deleting a parent row requires a full table scan on the child table to verify no references exist (or to find rows to cascade)
  • Joining parent and child tables is slow — the database cannot efficiently look up matching rows
  • Locking can escalate from row-level to table-level under some databases (e.g., MySQL InnoDB)

Indexing foreign key columns ensures O(log n) lookups instead of O(n) scans.

6. Can a table have multiple foreign keys?

Yes. A table can have as many foreign keys as needed, each referencing a different table (or even the same table for self-referencing relationships).

CREATE TABLE order_items (
id INT PRIMARY KEY,
order_id INT REFERENCES orders(id), -- FK to orders
product_id INT REFERENCES products(id), -- FK to products
quantity INT
);

Common examples:

  • order_items references both orders and products
  • employees references departments (department) and employees (manager) — self-referencing
  • posts references users (author) and categories (category)

Learn More