What is the difference between a clustered and a non-clustered index?
sql-mid-001
Your answer
Answer as you would in a real interview — explain your thinking, not just the conclusion.
Model answer
A clustered index determines the physical order of rows on disk — the table's data pages are sorted according to the clustered index key. Because of this, a table can only have one clustered index, and it is usually the primary key. A range scan on the clustered key is very efficient because the rows are physically adjacent. A non-clustered index is a separate structure that stores the index key plus a pointer (row ID or clustered key) back to the actual row, so you pay an extra lookup to fetch non-indexed columns. When a query can be satisfied entirely from the non-clustered index columns it becomes a 'covering index' and avoids the lookup altogether. In SQL Server the heap (table without a clustered index) uses RID lookups; in PostgreSQL all indexes are non-clustered but CLUSTER command can physically reorder the heap once.
Code example
-- SQL Server: clustered index is created on PK by default
CREATE TABLE orders (
id INT PRIMARY KEY CLUSTERED, -- clustered
customer_id INT,
status VARCHAR(20),
created_at DATETIME2
);
-- Non-clustered index — separate structure, pointer to row
CREATE NONCLUSTERED INDEX idx_nc_status
ON orders(status)
INCLUDE (customer_id, created_at); -- covering adds columns
-- PostgreSQL equivalent
CREATE INDEX idx_status ON orders(status) INCLUDE (customer_id, created_at);
Follow-up
When a non-clustered index lookup performs a 'key lookup' back to the base table, what is the performance implication, and how would you fix it?