Enhancing Database Performance with ULID: A Go and PostgreSQL Guide

Backend Development

Explore ULID (Universally Unique Lexicographically Sortable Identifier) as a powerful alternative to traditional UUIDs. Learn how ULID enhances database performance through its sortable structure and integrate it into Go applications with PostgreSQL, understanding its benefits and minor limitations.

Universally Unique Identifiers (UUIDs) are a widely adopted standard for generating unique identifiers. However, their broad application often reveals inherent limitations that can hinder system performance and usability. Traditional UUID versions are not always character-efficient or human-readable. UUID v1/v2 require unique MAC addresses, making them impractical in many environments, while UUID v3/v5 depend on unique seeds. UUID v4, based purely on randomness, can lead to database fragmentation, particularly in B-tree indexes, negatively impacting write performance.

Introducing ULID: A Superior Identifier

Several projects have successfully adopted ULID (Universally Unique Lexicographically Sortable Identifier) as an alternative, offering significant advantages. This article explores the benefits of ULID, with a specific focus on its implementation in Go applications utilizing PostgreSQL databases, though the principles apply broadly across languages and database systems.

The full specification for ULID can be found here.

ulid() // 01ARZ3NDEKTSV4RRFFQ69G5FAV

Why ULID Excels

ULID addresses the shortcomings of conventional UUIDs through four key characteristics:

  1. Lexicographically Sortable: This is ULID's most significant advantage, enabling efficient database indexing as IDs can be naturally sorted by creation time.
  2. Case Insensitive: Simplifies handling and storage.
  3. No Special Characters: Ensures URL safety and easy integration into various systems.
  4. UUID Compatible: ULID can often be used with existing UUID columns in databases, minimizing schema changes.

Structurally, a ULID is 128 bits long, similar to a UUID, but its bits are deliberately arranged: 48 bits represent a timestamp, followed by 80 bits of cryptographically secure randomness.

01AN4Z07BY 79KA1307SR9X4MV3 |----------| |----------------| Timestamp Randomness 48bits 80bits

Practical Example: Go and PostgreSQL

ULID's power lies in its seamless integration, even with systems already using the UUID data type. Here's a demonstration using Go with the popular pgx driver for PostgreSQL and the oklog/ulid package.

The following Go code connects to a PostgreSQL instance, creates a table with a UUID primary key, and then inserts records using both standard UUID v4 and ULID.

package main

import (
  "context"
  "fmt"
  "os"

  "github.com/google/uuid"
  "github.com/jackc/pgx/v5"
  "github.com/oklog/ulid/v2"
)

func main() {
  ctx := context.Background()

  conn, err := pgx.Connect(ctx, "postgres://...")
  if err != nil {
    panic(err)
  }
  defer conn.Close(ctx)

  _, err = conn.Exec(ctx, `
CREATE TABLE IF NOT EXISTS ulid_test (
  id UUID PRIMARY KEY,
  kind TEXT NOT NULL,
  value TEXT NOT NULL
);`)
  if err != nil {
    panic(err)
  }

  insertUUID(ctx, conn, "1")
  insertUUID(ctx, conn, "2")
  insertUUID(ctx, conn, "3")
  insertUUID(ctx, conn, "4")
  insertUUID(ctx, conn, "5")

  insertULID(ctx, conn, "1")
  insertULID(ctx, conn, "2")
  insertULID(ctx, conn, "3")
  insertULID(ctx, conn, "4")
  insertULID(ctx, conn, "5")
}

func insertUUID(ctx context.Context, conn *pgx.Conn, value string) {
  id := uuid.New()
  conn.Exec(ctx, "INSERT INTO ulid_test (id, value, kind) VALUES ($1, $2, 'uuid')", id, value)

  fmt.Printf("Inserted UUID: %s\
", id.String())
}

func insertULID(ctx context.Context, conn *pgx.Conn, value string) {
  id := ulid.Make()

  // As you can see, we don’t need to format the ULID as a string; it can be used directly
  conn.Exec(ctx, "INSERT INTO ulid_test (id, value, kind) VALUES ($1, $2, 'ulid')", id, value)

  fmt.Printf("Inserted ULID: %s\
", id.String())
}

The oklog/ulid package implements database/sql/driver.Valuer and encoding.TextMarshaler interfaces, enabling automatic conversion to a format compatible with PostgreSQL's UUID column type via the pgx driver. This allows developers to leverage ULID's sortable benefits without altering the underlying database schema.

Sortability and Enhanced Performance

The time-based prefix in ULIDs ensures that newer IDs are always greater than older ones. This property leads to new records being physically appended to the end of database indexes, a stark contrast to UUID v4's scattered, random insertions.

With traditional UUID v4, sorting records by insertion time typically requires an additional timestamp column. ULID, however, inherently provides this sort order. Observe the output of a simple database query:

select * from ulid_test where kind = 'ulid' order by id;

Output:

019aaae4-be9c-d307-238f-be1692b3e8d7 | ulid | 1 019aaae4-be9d-011f-b82e-b870ca2abe9d | ulid | 2 019aaae4-be9f-e9d7-6efc-5b298ecc572b | ulid | 3 019aaae4-bea0-deae-6408-d89e7e3ce030 | ulid | 4 019aaae4-bea1-8ed2-c2f5-144bb1ffedde | ulid | 5

As demonstrated, the records are retrieved in their insertion order. Furthermore, ULIDs are more concise and cleaner for use in contexts like URLs: /users/01KANDQMV608PBSMF7TM9T1WR4. ULID can generate an astounding 1.21e+24 unique IDs per millisecond, sufficient for most applications.

Limitations and Future Outlook

While ULID offers significant advantages, it's important to understand its limitations. For extremely high-volume write systems, ULIDs can potentially lead to "hot spots" around the current index key. Since all new writes cluster based on the current timestamp, this can cause contention and increased latency on specific index blocks.

Despite this, ULID's benefits have propelled its recognition as a major influence in identifier standards. Other alternatives exist, such as CUID or NanoID. Notably, the proposed UUID v7 standard aims to resolve the sortability and database performance issues of older UUID versions by adopting a similar time-ordered structure to ULID, signifying a convergence towards more performant identifier designs.

Further Resources