Design a time-series data store for IoT sensor readings at scale.
sql-sys-004
Your answer
Answer as you would in a real interview — explain your thinking, not just the conclusion.
Model answer
For IoT time-series at scale I first consider purpose-built engines like TimescaleDB (a PostgreSQL extension), InfluxDB, or ClickHouse before vanilla PostgreSQL, because they handle the specific access patterns (high-frequency appends, range queries over time, automatic data retention) far more efficiently. If staying in PostgreSQL, the critical design decisions are: partition the table by time range (weekly or monthly) so old partitions can be dropped or compressed without full-table locks; make (sensor_id, recorded_at) the primary key and index key, with sensor_id leading so per-sensor range scans stay index-ordered and recorded_at DESC so the newest readings come first; and consider column-oriented storage or BRIN indexes for broad range scans, since BRIN stays tiny on append-only data whose physical order tracks insertion time. For retention I automate dropping expired partitions with a scheduled job rather than issuing row-by-row DELETEs, which bloat the table and churn the WAL. For aggregation I materialise summaries (hourly/daily rollups) in a separate table and update them asynchronously, because querying raw data at millisecond granularity for dashboards is impractical. I also design the ingestion path to batch writes and use COPY instead of individual INSERTs to maximise throughput.
Code example
-- PostgreSQL native range partitioning for time-series
CREATE TABLE sensor_readings (
    sensor_id   UUID NOT NULL,
    recorded_at TIMESTAMPTZ NOT NULL,
    value       DOUBLE PRECISION,
    unit        VARCHAR(20)
) PARTITION BY RANGE (recorded_at);
-- Create monthly partitions
CREATE TABLE sensor_readings_2025_05
    PARTITION OF sensor_readings
    FOR VALUES FROM ('2025-05-01') TO ('2025-06-01');
CREATE TABLE sensor_readings_2025_06
    PARTITION OF sensor_readings
    FOR VALUES FROM ('2025-06-01') TO ('2025-07-01');
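The retention policy from the answer can be sketched as a scheduled job that detaches and drops expired partitions; the partition name below is hypothetical, following the monthly naming convention used here, and the detach-then-drop split is one common pattern rather than the only option:

```sql
-- Retention sketch (assumed policy: drop partitions older than 12 months).
-- DETACH ... CONCURRENTLY (PostgreSQL 14+) avoids holding an
-- ACCESS EXCLUSIVE lock on the parent while queries are running.
ALTER TABLE sensor_readings
    DETACH PARTITION sensor_readings_2024_05 CONCURRENTLY;
DROP TABLE sensor_readings_2024_05;
```

Detaching first keeps the drop itself a cheap metadata operation on a standalone table.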
-- Index on the parent; PostgreSQL (11+) cascades it to every partition
CREATE INDEX ON sensor_readings (sensor_id, recorded_at DESC);
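The BRIN option mentioned in the answer can complement the B-tree for broad time-range scans; a sketch, noting that its usefulness depends on the table's physical order actually tracking recorded_at:

```sql
-- BRIN stores one summary entry per block range, so it stays tiny on
-- append-only tables and serves wide range scans over recorded_at.
CREATE INDEX ON sensor_readings USING brin (recorded_at);
```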
-- Materialised hourly rollup
CREATE TABLE sensor_hourly_avg (
    sensor_id    UUID NOT NULL,
    hour_bucket  TIMESTAMPTZ NOT NULL, -- truncated to the hour
    avg_value    DOUBLE PRECISION,
    sample_count INT,
    PRIMARY KEY (sensor_id, hour_bucket)
);
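The asynchronous rollup refresh described in the answer can be sketched as an upsert over the most recent bucket; the one-hour lookback window is an assumption, and in practice the job would be driven by a scheduler:

```sql
-- Recompute the latest hourly bucket from raw readings and upsert it,
-- so re-running the job (or late rows inside the window) just overwrites.
INSERT INTO sensor_hourly_avg (sensor_id, hour_bucket, avg_value, sample_count)
SELECT sensor_id,
       date_trunc('hour', recorded_at) AS hour_bucket,
       avg(value),
       count(*)
FROM sensor_readings
WHERE recorded_at >= date_trunc('hour', now() - interval '1 hour')
GROUP BY sensor_id, date_trunc('hour', recorded_at)
ON CONFLICT (sensor_id, hour_bucket)
DO UPDATE SET avg_value    = EXCLUDED.avg_value,
              sample_count = EXCLUDED.sample_count;
```

Recomputing whole buckets rather than incrementing counters keeps the rollup idempotent.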
-- Efficient range query using partition pruning
SELECT sensor_id, recorded_at, value
FROM sensor_readings
WHERE sensor_id = '123e4567-e89b-12d3-a456-426614174000' -- any valid UUID
  AND recorded_at BETWEEN '2025-05-20' AND '2025-05-21'
ORDER BY recorded_at;
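The batched ingestion path from the answer can be sketched with COPY; the file path and CSV shape are assumptions, and in an application the batch would more likely stream through the driver's COPY API than a server-side file:

```sql
-- Bulk-load a batch of readings in a single command; COPY avoids
-- per-statement parse/plan overhead compared with individual INSERTs.
COPY sensor_readings (sensor_id, recorded_at, value, unit)
FROM '/var/tmp/readings_batch.csv'
WITH (FORMAT csv);
```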
Follow-up
How would you handle late-arriving data that comes in hours after the fact? How does TimescaleDB's chunk architecture differ from Postgres native range partitioning?