Jul 04, 2026

OLAP – Phase 4 Table, Catalog, and Bulk Loading

Phases 1-3 built the storage primitives — vectors, row groups, column segments, and compression. But there’s no concept of a “table” yet. There’s no way to define a schema (CREATE TABLE), no way to add rows one at a time, no way to bulk-load from a CSV, and no way to persist and reload the database across restarts.

This phase adds the Table (schema + row groups + append state), the Catalog (table registry with JSON manifest), and a CSV loader for bulk ingestion.

DuckDB

Jul 01, 2026

OLAP – Phase 3 Compression

Columnar storage has a property that row-based storage doesn’t: values in a column tend to be similar. A status column might have only 5 distinct values across millions of rows. A timestamp column is monotonically increasing. A year column spans a narrow range. Compression codecs exploit these patterns to shrink data 5-10x, and because columns are stored independently, each column gets the codec that fits its data best.

This phase implements four compression codecs — RLE, Dictionary, Bitpacking, and Delta — with automatic selection that tries all applicable codecs and picks the smallest.

DuckDB

Jun 27, 2026

OLAP – Phase 2 Columnar Storage

Phase 1 gave us in-memory vectors and data chunks. But a database needs to persist data to disk and read it back. The challenge is designing a disk format that preserves the benefits of columnar layout: read only the columns you need, skip entire sections when the data can’t match your query.

This phase builds two structures: RowGroups (horizontal partitions of a table, each holding up to 122,880 rows) and ColumnSegments (one column’s serialized data within a row group, annotated with min/max zone maps).

DuckDB

Jun 24, 2026

OLAP – Phase 1 Vectors and DataChunks

The fundamental difference between OLTP and OLAP databases is how they store and process data. An OLTP database like PostgreSQL stores rows — when you insert a customer record, all columns (name, email, balance) sit together on one page. An OLAP database like DuckDB stores columns — all names sit together, all emails sit together, all balances sit together. This layout is what makes analytical queries fast: SELECT SUM(balance) FROM customers only touches the balance column, ignoring everything else.

This phase builds the two foundational data structures of a columnar engine: Vectors (typed columnar arrays) and DataChunks (batches of column vectors).

DuckDB

Jun 20, 2026

OLTP – Phase 10 Transactions, Concurrency, REPL, and Server

The database is fully functional — it can store data, run SQL, use indexes, and survive crashes. But it’s missing three things that make a database a database:

  1. Transactions: grouping multiple operations into an atomic unit (BEGIN/COMMIT/ROLLBACK)
  2. Concurrency control: preventing multiple connections from corrupting each other’s data
  3. An interface: a way for users and applications to actually connect and run queries

This phase ties everything together. The WAL from Phase 9 gave us durability (the “D” in ACID). Now we add atomicity (“A”) via transactions, and isolation (“I”) via locking.

PostgreSQL

Jun 17, 2026

OLTP – Phase 9 Write-Ahead Log (WAL)

Everything we’ve built so far has a fatal flaw: if the power goes out, data is lost. The buffer pool keeps dirty pages in memory and flushes them to disk lazily. A crash before that flush means those changes vanish.

The write-ahead log solves this. Before changing any data page, we write a description of the change to a separate, sequential log file. If the database crashes, we replay the log on restart and reconstruct the lost changes. This guarantees durability — the “D” in ACID.

The key insight: writing to a sequential log file is cheap. It’s one append to one file, always at the end. Writing random data pages is expensive — they’re scattered across multiple files. So we make the cheap write first (WAL), and let the expensive writes (data pages) happen whenever convenient. If we crash before the data pages are written, the WAL has everything we need to recover.

In PostgreSQL, this is xlog.c, xlogrecord.h, and xlogrecovery.c.

PostgreSQL

Jun 13, 2026

OLTP – Phase 8 B-Tree Index

Without an index, every query scans every row. WHERE id = 42 on a million-row table reads all million rows to find one. That’s O(n) — slow.

A B+ tree index is a sorted data structure that maps key values (like id) to the physical location of rows (page number + slot). It enables O(log n) lookups — finding a row in a million-row table reads 3-4 pages instead of thousands.

This is the same reason a book index exists: instead of reading every page to find “B-tree,” you check the index at the back, which tells you the page number directly.

PostgreSQL

Jun 10, 2026

OLTP – Phase 7 Query Executor (The Integration Phase)

Before this phase, the parser and storage engine were separate systems. The parser could turn SQL into an AST, and the storage engine could read/write rows — but they didn’t talk to each other. Queries were still built manually in Go code.

Phase 7 connects everything. The Executor takes a SQL string, parses it, looks up the table in the catalog, converts the AST to an execution plan, runs it, and returns results.

PostgreSQL

Jun 06, 2026

OLTP – Phase 6 SQL Parser

Until now, every query is built manually in Go — constructing executor nodes and expression structs by hand. That works for testing, but a real database needs to understand SQL text. Phase 6 adds a parser that turns a SQL string like SELECT name, age FROM users WHERE age > 25 into a tree of Go structs (an AST) that the executor can work with. This phase is pure syntax — no storage, no execution. It takes a string and produces a data structure.

PostgreSQL

Jun 03, 2026

OLTP – Phase 5 Buffer Pool Manager

Every time the database reads a row, it calls HeapFile.ReadPage() which seeks to a position in the file, reads 4096 bytes from disk, and copies them into memory. When a sequential scan reads page 0, then page 1, then page 0 again — it reads page 0 from disk twice. The OS has a page cache that might help, but the database can’t control or rely on it.

Phase 5 adds a buffer pool: a fixed-size array of page-sized slots in memory. When you need a page, the pool checks if it’s already cached. If so, you get the in-memory copy instantly. If not, it reads from disk and caches it for next time.

This is the single biggest performance optimization in any database. Everything else — indexes, query optimization, parallel scans — builds on top of having fast page access.

PostgreSQL

May 30, 2026

OLTP – Phase 4 Sequential Scan and Filtering

Phase 3 gave us table.ScanAll() — iterate through every row. But there’s no way to say “only rows where age > 25” or “only the name column.” Phase 4 adds that.

This is the first time the database actually looks at what’s inside the rows. Before this, data was just stored and retrieved. Now it’s evaluated.

PostgreSQL

May 27, 2026

OLTP – Phase 3 Table and Catalog

Phase 1 gave us pages and heap files (raw byte storage on disk). Phase 2 gave us schemas and serialization (typed values to/from bytes). But these two layers don’t know about each other — you’d have to manually serialize values, manually find a page with space, manually write it back.

Phase 3 connects them. A Table ties a schema to a heap file so you can just say InsertRow(values). A Catalog manages multiple tables so you can say GetTable("users").

PostgreSQL

May 23, 2026

OLTP – Phase 2 Tuple/Record Format

In Phase 1, pages store raw bytes — InsertTuple([]byte("alice")). The page has no idea what those bytes mean. Phase 2 adds the layer that gives bytes meaning: a schema that defines columns and types, and a serializer that converts typed values to/from bytes.

After this phase, we can say “this row has an INT called id, a VARCHAR called name, and a BOOL called active” and reliably pack/unpack those values.

In PostgreSQL, this is implemented in src/backend/access/common/heaptuple.c with the tuple header defined in src/include/access/htup_details.h.

PostgreSQL

May 16, 2026

OLTP – Phase 1 Pages and Heap Files

Every database needs to store data on disk and read it back. But disks don’t work like memory — you can’t read a single byte efficiently. Disk I/O works in blocks. The operating system reads and writes in chunks (typically 4KB). Databases lean into this by organizing all data into fixed-size pages that match the OS block size.

A heap file is the simplest way to organize pages — just stack them end-to-end in a file. “Heap” means unordered: rows go wherever there’s room.

PostgreSQL

Jul 15, 2022

Prevent Unnecessary Rendering When Using React Hooks

When working with state hooks like useReducer or useState or useContext, it can become expensive to render components. If we let the framework handle rendering, all components and children components will be rendered if the parent uses state or context hooks or an action is dispatch into the reducer hook. Today we will look at optimisation to stop the rendering propagation for components which do not need to be rendered.

Typescript React