TanStack DB 0.5: Revolutionizing Data Loading with Query-Driven Sync

TanStack DB 0.5 introduces Query-Driven Sync, turning client queries into precise API calls. It tackles API sprawl, optimizes data fetching, and offers eager, on-demand, and progressive sync modes for efficient app development.

As TanStack tools gain wider adoption, developers increasingly seek efficient and predictable data management solutions that avoid the proliferation of API endpoints. TanStack DB's newest feature, Query-Driven Sync (QDS), directly addresses this need.

Applications often face "API sprawl" as they grow in size and complexity, necessitating numerous endpoints for various clients or components. This complexity hinders developer experience, requiring extensive API documentation navigation and complicating lifecycle management for updates or deprecations.

With its v0.5 update, TanStack DB introduces Query-Driven Sync, fundamentally altering how developers approach data fetching, caching, and updates. Instead of crafting custom backend endpoints or GraphQL resolvers for each data view, QDS transforms a component's query into the actual API call. Developers define queries directly within their client components, and TanStack DB automatically translates these into precise network requests.

This article will delve into the mechanics of Query-Driven Sync and demonstrate how to leverage it for building efficient, scalable React applications. A foundational understanding of TanStack DB and its core concepts is recommended to follow along; an introductory guide on TanStack DB is available for newcomers.

What is Query-Driven Sync?

TanStack DB operates on a declarative principle: developers specify the required data, and the system manages fetching, caching, and updates. While generally effective, this approach can lead to performance inconsistencies between large and small datasets due to overfetching and excessive upfront data propagation.

Query-Driven Sync (QDS) was developed to mitigate these issues by empowering developers to control data loading into collections using client-defined queries. Essentially, QDS is the mechanism for synchronizing TanStack DB collections through live queries.

Prior to this update, collections would sync all records upon initialization. This meant a client had to fetch the entire dataset from the server before a collection was considered ready.

Consider the following code example:

import { createCollection } from '@tanstack/db'
import { queryCollectionOptions } from '@tanstack/query-db-collection'
import { useLiveQuery } from '@tanstack/react-db'

const todoCollection = createCollection(
  queryCollectionOptions({
    queryKey: ['todos'],
    queryFn: async () => {
      // This fetch returns all todos from the backend
      const response = await fetch('/api/todos')
      return response.json() 
    },
    getKey: (todo) => todo.id,
  })
)

// In a React component
function TodoList() {
  const { data: todos, status } = useLiveQuery((q) =>
    q.from({ todos: todoCollection })
  )

  if (status === 'loading') {
    return <div>Loading …</div>
  }

  return (
    <ul>
      {todos.map((t) => (
        <li key={t.id}>{t.text}</li>
      ))}
    </ul>
  )
}

In this example, when todoCollection is created and begins syncing, it immediately invokes queryFn(). This triggers a network request to the /api/todos endpoint, fetching all todos from the server. The collection.status only becomes 'ready' after the complete list is returned and written into the collection, at which point useLiveQuery can be used to query the data locally.

This approach becomes problematic with large datasets, such as 50,000 rows or more. Query-Driven Sync resolves this by allowing collections to define schema and security, while queries determine the specific subset of data to load into a collection.

Sync Modes

TanStack DB 0.5 introduces three distinct sync modes for implementing Query-Driven Sync: Eager, On-demand, and Progressive. Each mode is designed to cater to varying data-loading requirements across components and applications.

Eager Mode

Eager mode is the default sync behavior and was the sole option before version 0.5. As demonstrated earlier, this mode pre-loads the entire dataset. Once loaded, all subsequent client-side operations like filters, joins, sorts, and reads execute with sub-millisecond performance.

Because eager is the default, it needs no configuration. To make the choice explicit, add a syncMode property with the value 'eager' to your collection's options, as shown:

const todoCollection = createCollection(
  queryCollectionOptions({
    queryKey: ['todos'],
    queryFn: async () => {
      const resp = await fetch('/api/todos');
      return resp.json();
    },
    getKey: (todo) => todo.id,
    syncMode: 'eager',
  })
);

However, eager mode's efficiency diminishes with larger datasets because it necessitates immediate fetching of the entire collection, potentially leading to increased load times and bandwidth consumption. Therefore, eager mode is best suited for small (under 10,000 rows) and relatively static datasets, such as user preferences or small reference tables.

On-Demand Mode

On-demand mode addresses the limitations of eager mode by enabling collections to selectively load large, dynamic datasets based on specific query requirements. When activated, a collection fetches only the data actively requested by your queries.

Implementing on-demand mode is more involved than eager mode. For REST-style APIs, TanStack DB hands the query's predicates to your loader but does not build the network request itself. That translation step, called predicate mapping, is implemented manually.

Before predicate mapping, TanStack DB performs predicate pushdown. When queries are defined, QDS pushes the query predicates (where clauses, orderBy, and limit) to the collection's loader through the loadSubsetOptions metadata available in the context.
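To make pushdown concrete, here is a minimal sketch in plain TypeScript of a query builder that records its predicates and hands them to a loader instead of filtering locally. QuerySketch, SubsetOptions, and the loader signature are hypothetical names used only for illustration, not TanStack DB's API; the library itself stores predicates as expression trees.

```typescript
// Hypothetical, simplified stand-ins for the predicates a live query collects.
type Predicate = { field: string; op: "eq" | "lt"; value: unknown };
type OrderBy = { field: string; direction: "asc" | "desc" };
type SubsetOptions = { where: Predicate[]; orderBy?: OrderBy; limit?: number };

class QuerySketch {
  private options: SubsetOptions = { where: [] };

  where(field: string, op: Predicate["op"], value: unknown): this {
    this.options.where.push({ field, op, value });
    return this;
  }

  orderBy(field: string, direction: OrderBy["direction"]): this {
    this.options.orderBy = { field, direction };
    return this;
  }

  limit(count: number): this {
    this.options.limit = count;
    return this;
  }

  // "Pushdown": the loader receives the predicates and decides what to fetch.
  run<T>(loader: (options: SubsetOptions) => T): T {
    return loader(this.options);
  }
}

// The loader sees exactly what the query asked for.
const pushed = new QuerySketch()
  .where("category", "eq", "electronics")
  .where("price", "lt", 100)
  .orderBy("price", "asc")
  .limit(10)
  .run((options) => options);

console.log(JSON.stringify(pushed));
```

The point of the design is that filtering decisions travel with the request, so the loader can ask the server for a subset rather than everything.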

Let's examine this in practice. Suppose we have a product collection. We can enable on-demand sync by adding syncMode: 'on-demand' to the collection loader, like this:

import { createCollection } from '@tanstack/db';
import { queryCollectionOptions } from '@tanstack/query-db-collection';

const productCollection = createCollection(
  queryCollectionOptions({
    queryKey: ['products'],
    // queryFn receives a context object that carries loadSubsetOptions
    queryFn: async (ctx) => { /* ... */ },
    getKey: (product) => product.id,
    syncMode: 'on-demand',
  })
);

With on-demand enabled, queryFn gains access to loadSubsetOptions via the query collection's context, where query predicates are stored as expression trees. These can be parsed into a structured object using TanStack's parseLoadSubsetOptions helper function:

import { parseLoadSubsetOptions } from '@tanstack/query-db-collection';

const productCollection = createCollection(
  queryCollectionOptions({
    queryKey: ['products'],
    queryFn: async (ctx) => {
      // Parse your query predicates into a structured object
      const parsed = parseLoadSubsetOptions(ctx.meta?.loadSubsetOptions);

      // Example: GET /api/products with query-specific filters
      const response = await fetch(/* map params to make network request */);
      return response.json();
    },
    getKey: (product) => product.id,
    syncMode: 'on-demand',
  })
);

When a live query is written on the client, such as:

import { eq, lt } from '@tanstack/db';

const { data: cheapElectronics } = useLiveQuery((q) =>
  q
    .from({ products: productCollection })
    .where(({ products }) => eq(products.category, 'electronics'))
    .where(({ products }) => lt(products.price, 100))
    .orderBy(({ products }) => products.price, 'asc')
    .limit(10)
);

TanStack DB collects these predicates (category = electronics, price < 100, order by price ascending, limit 10) and passes them to queryFn, where parseLoadSubsetOptions turns them into a structured object. This object can then be converted into API parameters resembling: GET /api/products?category=electronics&price_lt=100&sort=price:asc&limit=10.

Predicate Mapping

Predicate mapping is the crucial step of converting parsed query predicates into valid query strings for network requests. Understanding the structure of these parsed predicates is essential. They typically contain:

filters: An array of objects, each with { field, operator, value }
sorts: An array of objects, each with { field, direction }
limit: A number or null

Here’s an example of a parsed predicate output:

{
  "filters": [
    { "field": ["products", "category"], "operator": "eq", "value": "electronics" },
    { "field": ["products", "price"], "operator": "lt", "value": 100 }
  ],
  "sorts": [
    { "field": ["products", "price"], "direction": "asc" }
  ],
  "limit": 10
}

This object precisely describes the client's data requirements:

category must be equal (eq) to "electronics"
price must be less than (lt) 100
The list should be sorted by price in ascending (asc) order
Only the first 10 items should be returned (limit)

While these properties can be accessed from the parsed variable, it's often cleaner to destructure them directly from the value that parseLoadSubsetOptions returns:

const { filters, sorts, limit } = parseLoadSubsetOptions(ctx.meta?.loadSubsetOptions);

To translate these predicates into a query string, implement the following logic within your queryFn:

queryFn: async (ctx) => { 
  const { filters, sorts, limit } = parseLoadSubsetOptions(ctx.meta?.loadSubsetOptions); 

  const params = new URLSearchParams(); 

  filters.forEach(({ field, operator, value }) => { 
    const name = field.join('.'); 
    switch (operator) { 
      case 'eq': 
        params.set(name, String(value)); 
        break; 
      case 'lt': 
        params.set(`${name}_lt`, String(value)); 
        break; 
      // Handle other operators, e.g., 'lte', 'gt', etc. 
    } 
  }); 

  sorts.forEach(({ field, direction }) => { 
    params.set('sortBy', field.join('.')); 
    params.set('order', direction); 
  }); 

  if (limit != null) { 
    params.set('limit', String(limit)); 
  } 

  const resp = await fetch(`/api/products?${params.toString()}`); 
  return resp.json(); 
}

Let's break down this process:

A new URLSearchParams instance is created to simplify query string construction.
```
const params = new URLSearchParams(); 
```
The filters array is iterated. For each filter, hierarchical field names are joined with a dot (e.g., products.category) to form the name variable. The operator then determines how the parameter is written.
- An equality operator (eq) maps to a standard query parameter like products.category=electronics.
```
case 'eq':
  params.set(name, String(value));
  break;
```
- A "less than" operator (lt) appends _lt to the parameter name, such as products.price_lt=100.
```
case 'lt':
  params.set(`${name}_lt`, String(value));
  break;
```
The sorts array is processed similarly: the joined field name is written to a sortBy parameter, and the direction (asc or desc) to an order parameter. Because params.set overwrites any existing value, only the last entry survives if a query defines several sorts.
```
sorts.forEach(({ field, direction }) => {
  params.set('sortBy', field.join('.'));
  params.set('order', direction);
});
```
This translates to query parameters like sortBy=products.price&order=asc.

Finally, if the limit property is not null, it's added as a parameter.

if (limit != null) {
  params.set('limit', String(limit));
}

This predicate mapping generates a precise, minimal network request, fetching only the required data slices (e.g., 10 products in the 'electronics' category, sorted ascending, with a price less than 100). This embodies the core principle of Query-Driven Sync: making your component's query the API call.
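The same mapping can also be pulled out of queryFn into a pure function, which makes it easy to test without a network. The sketch below is one possible shape, not the library's implementation: the ParsedFilter, ParsedSort, and ParsedSubset types mirror the parsed output described above, while the operator suffixes and the sort=field:direction format are assumptions about an imaginary REST API.

```typescript
// Hypothetical shapes mirroring the parsed output described above.
type ParsedFilter = { field: string[]; operator: string; value: unknown };
type ParsedSort = { field: string[]; direction: "asc" | "desc" };
type ParsedSubset = { filters: ParsedFilter[]; sorts: ParsedSort[]; limit?: number | null };

// Suffix this imaginary API expects for each comparison operator (assumption).
const OPERATOR_SUFFIX: Record<string, string> = {
  eq: "",
  lt: "_lt",
  lte: "_lte",
  gt: "_gt",
  gte: "_gte",
};

function toQueryString({ filters, sorts, limit }: ParsedSubset): string {
  const params = new URLSearchParams();

  for (const { field, operator, value } of filters) {
    const suffix = OPERATOR_SUFFIX[operator];
    if (suffix === undefined) {
      // Fail loudly rather than silently dropping a filter.
      throw new Error(`Unsupported operator: ${operator}`);
    }
    params.set(`${field.join(".")}${suffix}`, String(value));
  }

  // Encode every sort as "field:direction" so multiple sorts survive.
  if (sorts.length > 0) {
    const encoded = sorts.map((s) => `${s.field.join(".")}:${s.direction}`);
    params.set("sort", encoded.join(","));
  }

  if (limit != null) {
    params.set("limit", String(limit));
  }
  return params.toString();
}

const productQuery = toQueryString({
  filters: [
    { field: ["products", "category"], operator: "eq", value: "electronics" },
    { field: ["products", "price"], operator: "lt", value: 100 },
  ],
  sorts: [{ field: ["products", "price"], direction: "asc" }],
  limit: 10,
});

console.log(productQuery);
```

Throwing on an unknown operator is a deliberate choice here: a filter that is silently ignored would make the server return more rows than the query asked for.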

For APIs with custom formats, such as GraphQL, predicate mapping works similarly but uses parseWhereExpression instead of parseLoadSubsetOptions, mapping predicates to the where argument your GraphQL API expects:

queryFn: async (ctx) => {
  // loadSubsetOptions may be undefined, so fall back to an empty object
  const { where, orderBy, limit } = ctx.meta?.loadSubsetOptions ?? {};

  // Map to GraphQL's where clause format
  const whereClause = parseWhereExpression(where, {
    handlers: {
      eq: (field, value) => ({ [field.join('_')]: { _eq: value } }),
      lt: (field, value) => ({ [field.join('_')]: { _lt: value } }),
      and: (...conditions) => ({ _and: conditions }),
    },
  });

  // Use whereClause in your GraphQL query...
}

The advantage of predicate mapping is its one-time implementation per collection. Once configured, TanStack DB automatically generates the correct query strings for appropriate API calls, regardless of how live queries are defined on the client.
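To show what such handlers do with a nested expression, here is a self-contained sketch that walks a small expression tree and emits a where clause with _eq, _lt, and _and keys like the handlers above. The WhereExpr type is a hypothetical, simplified stand-in for illustration; it is not the expression format TanStack DB uses internally.

```typescript
// Hypothetical, simplified expression tree (not TanStack DB's internal format).
type WhereExpr =
  | { type: "compare"; op: "eq" | "lt" | "gt"; field: string[]; value: unknown }
  | { type: "and" | "or"; args: WhereExpr[] };

type WhereClause = { [key: string]: unknown };

// Recursively converts the tree into an object with _eq/_lt/_and style keys.
function toWhereClause(expr: WhereExpr): WhereClause {
  if (expr.type === "compare") {
    return { [expr.field.join("_")]: { [`_${expr.op}`]: expr.value } };
  }
  return { [`_${expr.type}`]: expr.args.map(toWhereClause) };
}

const clause = toWhereClause({
  type: "and",
  args: [
    { type: "compare", op: "eq", field: ["products", "category"], value: "electronics" },
    { type: "compare", op: "lt", field: ["products", "price"], value: 100 },
  ],
});

console.log(JSON.stringify(clause));
// {"_and":[{"products_category":{"_eq":"electronics"}},{"products_price":{"_lt":100}}]}
```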

Progressive Mode

Progressive mode offers a blend of eager and on-demand functionalities for optimized data loading. It immediately fetches the initial queried data batch using on-demand techniques, ensuring a fast first paint or initial render for the user. Concurrently, while the user interacts with this initial data subset, TanStack DB progressively synchronizes the remaining dataset in the background.

const projectCollection = createCollection( 
  electricCollectionOptions({ 
    table: 'projects', 
    syncMode: 'progressive', 
  }) 
); 

// On the client 
const { data: myProjects } = useLiveQuery(q => 
  q 
    .from({ projects: projectCollection }) 
    .where(({ projects }) => eq(projects.ownerId, currentUserId)) 
    .limit(20) 
);

While Query-Driven Sync supports REST, GraphQL, or tRPC APIs, the TanStack team advises using progressive mode with dedicated sync engines like Electric, Trailbase, and PowerSync. These engines seamlessly integrate with QDS, offering enhanced data synchronization and performance.
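The two phases of progressive mode can be modeled in a few lines. The sketch below is an in-memory simulation under stated assumptions: ProgressiveStore, its batch size, and the fake server array are hypothetical and stand in for what a collection and its sync engine would do over the network.

```typescript
type ProjectRow = { id: number; ownerId: string };

class ProgressiveStore {
  private rows = new Map<number, ProjectRow>();
  private cursor = 0;
  private server: ProjectRow[];
  private batchSize: number;

  constructor(server: ProjectRow[], batchSize: number) {
    this.server = server;
    this.batchSize = batchSize;
  }

  // Phase 1: load only what the first query needs, so the UI can render.
  loadSubset(matches: (row: ProjectRow) => boolean, limit: number): ProjectRow[] {
    const subset = this.server.filter(matches).slice(0, limit);
    subset.forEach((row) => this.rows.set(row.id, row));
    return subset;
  }

  // Phase 2: copy one more batch; returns true while rows remain.
  syncNextBatch(): boolean {
    const batch = this.server.slice(this.cursor, this.cursor + this.batchSize);
    batch.forEach((row) => this.rows.set(row.id, row));
    this.cursor += batch.length;
    return this.cursor < this.server.length;
  }

  get size(): number {
    return this.rows.size;
  }
}

// 100 fake projects; every tenth one belongs to "u1".
const serverRows: ProjectRow[] = Array.from({ length: 100 }, (_, id) => ({
  id,
  ownerId: id % 10 === 0 ? "u1" : "other",
}));

const store = new ProgressiveStore(serverRows, 25);
const firstPaint = store.loadSubset((row) => row.ownerId === "u1", 20);
const sizeAfterFirstPaint = store.size;

let batches = 0;
let more = true;
while (more) {
  more = store.syncNextBatch();
  batches += 1;
}

console.log(firstPaint.length, sizeAfterFirstPaint, batches, store.size); // 10 10 4 100
```

The first query is answered from 10 rows, and the remaining rows arrive in four background batches without the query having to wait for them.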

Enhanced Performance with Sync Engines

Sync engines are specialized systems for real-time data synchronization across distributed environments, typically between multiple clients and a central database. They automatically detect database changes and efficiently propagate these updates as incremental "deltas" to all subscribed clients.

Employing progressive mode with a traditional fetch approach can become costly, as it still requires background fetches for the entire dataset after the initial request. In contrast, sync engines transmit only the changed rows (deltas), enabling the maintenance of large client-side datasets without the overhead of repeated full data fetches.
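As a rough illustration of delta-based syncing, the sketch below applies a stream of insert, update, and delete messages to a local map. The RowDelta message shape is an assumption made for this example; each sync engine defines its own wire format.

```typescript
type TodoRow = { id: string; text: string };

// Hypothetical delta messages; real sync engines define their own formats.
type RowDelta =
  | { kind: "insert"; row: TodoRow }
  | { kind: "update"; id: string; changes: Partial<TodoRow> }
  | { kind: "delete"; id: string };

function applyDeltas(local: Map<string, TodoRow>, deltas: RowDelta[]): void {
  for (const delta of deltas) {
    switch (delta.kind) {
      case "insert":
        local.set(delta.row.id, delta.row);
        break;
      case "update": {
        const existing = local.get(delta.id);
        // Ignore updates for rows that were never loaded locally.
        if (existing !== undefined) {
          local.set(delta.id, { ...existing, ...delta.changes });
        }
        break;
      }
      case "delete":
        local.delete(delta.id);
        break;
    }
  }
}

const localTodos = new Map<string, TodoRow>([
  ["a", { id: "a", text: "Write docs" }],
  ["b", { id: "b", text: "Old task" }],
]);

// Three changed rows are transmitted instead of the whole table.
applyDeltas(localTodos, [
  { kind: "update", id: "a", changes: { text: "Write better docs" } },
  { kind: "delete", id: "b" },
  { kind: "insert", row: { id: "c", text: "Ship release" } },
]);

console.log(Array.from(localTodos.values()));
```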

Furthermore, sync engines eliminate the need for manual predicate mapping with Query-Driven Sync, as they handle predicate translation automatically. This allows developers to simply define live queries on the client, with the sync engine managing the rest, thus requiring no explicit API endpoints. This represents Query-Driven Sync operating precisely as envisioned.

Benefits of Query-Driven Sync for Developers and Teams

Query-Driven Sync (QDS) significantly enhances the developer experience through optimized dataset loading and several key "request economics" improvements inherent to TanStack DB's design:

Automatic Request Collapsing

If multiple components define identical queries (e.g., with the same filters and limits), TanStack DB automatically deduplicates these requests, sending only a single network call. For instance:

// Component A 
const { data: electronics } = useLiveQuery((q) => 
  q.from({ products }).where(({ products }) => eq(products.category, "electronics")) 
); 

// Component B (same query, different component) 
const { data: electronics } = useLiveQuery((q) => 
  q.from({ products }).where(({ products }) => eq(products.category, "electronics")) 
);

Despite two declarations, only one network request will be made to /api/products?category=electronics.
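The idea behind request collapsing can be sketched with an in-flight map: concurrent callers with the same key share one pending promise. This is a simplified illustration, not TanStack DB's implementation; createDeduper is a hypothetical helper, and it keys on the URL string, whereas the library compares query predicates.

```typescript
// Wraps a loader so concurrent calls with the same key share one request.
function createDeduper<T>(load: (key: string) => Promise<T>) {
  const inFlight = new Map<string, Promise<T>>();

  return (key: string): Promise<T> => {
    const existing = inFlight.get(key);
    if (existing !== undefined) return existing;

    // Remove the entry once the request settles, whether it succeeds or fails.
    const request = load(key).then(
      (value) => {
        inFlight.delete(key);
        return value;
      },
      (error) => {
        inFlight.delete(key);
        throw error;
      }
    );
    inFlight.set(key, request);
    return request;
  };
}

let networkCalls = 0;
// Stand-in for fetch: counts calls instead of hitting a server.
const loadProducts = createDeduper(async (url: string) => {
  networkCalls += 1;
  return [{ url }];
});

const fromComponentA = loadProducts("/api/products?category=electronics");
const fromComponentB = loadProducts("/api/products?category=electronics");

console.log(fromComponentA === fromComponentB, networkCalls); // true 1
```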

Subset/Delta Loading

Should you expand your query to include additional data, TanStack DB intelligently fetches only the incremental data (the "delta") instead of reloading the entire dataset.
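One simple way to picture delta loading is a tracker that remembers which subsets are already local and reports only the missing ones. The sketch below assumes queries that filter by a list of category values; LoadedSubsetTracker is a hypothetical helper, and TanStack DB's own bookkeeping for arbitrary predicates is more general than this.

```typescript
class LoadedSubsetTracker {
  private loaded = new Set<string>();

  // Returns the requested values that have not been fetched yet.
  missing(requested: string[]): string[] {
    return requested.filter((value) => !this.loaded.has(value));
  }

  markLoaded(values: string[]): void {
    values.forEach((value) => this.loaded.add(value));
  }
}

const tracker = new LoadedSubsetTracker();

// First query: nothing is local yet, so "electronics" must be fetched.
const firstFetch = tracker.missing(["electronics"]);
tracker.markLoaded(firstFetch);

// The query expands to two categories; only the new one needs a request.
const secondFetch = tracker.missing(["electronics", "books"]);
tracker.markLoaded(secondFetch);

console.log(firstFetch, secondFetch);
```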

Optimized Joins

When a query involves joining two or more collections, TanStack DB precisely identifies and fetches only the necessary related records, rather than entire collections. Consider this query:

useLiveQuery((q) =>
  q
    .from({ todos: todoCollection })
    .join({ projects: projectCollection }, ({ todos, projects }) =>
      eq(todos.projectId, projects.id)
    )
    .where(({ todos }) => eq(todos.completed, false))
)

This query seeks all incomplete todos along with their associated project details. Prior to Query-Driven Sync, achieving such a join typically involved:

Loading all todos from /api/todos
Loading all projects from /api/projects

followed by client-side joining or the creation of a custom endpoint like /api/todos-with-projects.

With Query-Driven Sync, TanStack DB analyzes the join conditions and determines the minimal backend calls required to fulfill the query:

Fetch only the incomplete todos: /api/todos?completed=false
Fetch only the projects those todos reference: /api/projects?ids=abc,def,ghi

This intelligent optimization allows QDS to significantly outperform many hand-written endpoints, which often overlook efficient batching or lead to over-fetching.
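The two-step plan above can be sketched as plain functions: derive the project request from the todos that came back, then join locally. TodoItem, ProjectItem, projectsUrlFor, and joinTodos are hypothetical names for this example, and the ids query parameter follows the URL shown above.

```typescript
type TodoItem = { id: string; projectId: string; completed: boolean };
type ProjectItem = { id: string; name: string };

// The first response drives the second request: collect distinct project IDs.
function projectsUrlFor(todos: TodoItem[]): string | null {
  const ids = Array.from(new Set(todos.map((todo) => todo.projectId))).sort();
  return ids.length > 0 ? `/api/projects?ids=${ids.join(",")}` : null;
}

// Join locally once both subsets have arrived.
function joinTodos(todos: TodoItem[], projects: ProjectItem[]) {
  const byId = new Map<string, ProjectItem>();
  projects.forEach((project) => byId.set(project.id, project));
  return todos.map((todo) => ({ todo, project: byId.get(todo.projectId) }));
}

// Pretend this came back from /api/todos?completed=false
const incompleteTodos: TodoItem[] = [
  { id: "t1", projectId: "abc", completed: false },
  { id: "t2", projectId: "def", completed: false },
  { id: "t3", projectId: "abc", completed: false },
];

const projectsUrl = projectsUrlFor(incompleteTodos);
console.log(projectsUrl); // /api/projects?ids=abc,def

const joined = joinTodos(incompleteTodos, [
  { id: "abc", name: "Website" },
  { id: "def", name: "Mobile app" },
]);
console.log(joined.map((row) => `${row.todo.id}:${row.project?.name}`));
```

Deduplicating the IDs before building the URL is what keeps the second request small: three todos here need only two projects.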

Incremental Adoption and Backend-Agnostic Setup

Should your backend evolve to support more flexible querying capabilities, only the predicate mapping layer requires updates, leaving your UI queries unchanged. Furthermore, you can seamlessly integrate a sync engine (such as Electric or PowerSync) over time to achieve real-time or delta-only syncing without modifying your existing query logic.

Enhanced Cache Utilization

TanStack DB leverages TanStack Query’s robust cache policies (e.g., staleTime, gcTime). Queries executed within their "fresh" window are served directly from the cache, bypassing unnecessary network requests. When query parameters change (e.g., for filtering or sorting), TanStack DB intelligently fetches only the newly required data, rather than re-fetching all previously loaded information.
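The effect of a freshness window can be shown with a small read-through cache. This is a deliberately simplified model, assuming a synchronous loader and an explicit clock: FreshnessCache is a hypothetical class, and it ignores behaviors such as background refetching of stale data and gcTime-based eviction.

```typescript
type CachedEntry<T> = { data: T; fetchedAt: number };

class FreshnessCache<T> {
  private entries = new Map<string, CachedEntry<T>>();
  private staleTimeMs: number;

  constructor(staleTimeMs: number) {
    this.staleTimeMs = staleTimeMs;
  }

  // Returns cached data while it is fresh; otherwise calls load() and stores the result.
  read(key: string, now: number, load: () => T): T {
    const entry = this.entries.get(key);
    if (entry !== undefined && now - entry.fetchedAt < this.staleTimeMs) {
      return entry.data;
    }
    const data = load();
    this.entries.set(key, { data, fetchedAt: now });
    return data;
  }
}

let loads = 0;
// Stand-in for a network request: counts how often it runs.
const loadElectronics = () => {
  loads += 1;
  return ["tv", "radio"];
};

const productCache = new FreshnessCache<string[]>(5000); // staleTime of 5 seconds
productCache.read("category=electronics", 0, loadElectronics); // miss: loads
productCache.read("category=electronics", 1000, loadElectronics); // fresh: served from cache
productCache.read("category=electronics", 6000, loadElectronics); // stale: loads again

console.log(loads); // 2
```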

Best Practices for Query-Driven Sync

Query-Driven Sync offers a potent and declarative approach for UI queries to dictate data loading. To maximize its benefits, ensuring fast queries, efficient network usage, and a maintainable data layer at scale, consider these recommended best practices:

Model Collections Around Stable Boundaries

Design your collections to represent stable, predictably changing entities within your system, such as database tables, resources, or domain objects. Effective collection modeling simplifies predicate mapping and enhances predictability:

One collection per backend resource (e.g., products, todos, projects).
Avoid mixing unrelated data: Maintain distinct collections for disparate data types.
Ensure typed, versioned, and documented schemas: Clarity in schema definition is crucial.

Well-defined collection boundaries provide Query-Driven Sync with the necessary context to generate optimal subset loads and incremental deltas.

Keep Joins Lightweight and Purposeful

When performing collection joins, limit relationships to only what the UI genuinely requires. Avoid joining massive datasets without applying significant constraints. More focused join predicates enable TanStack DB to more efficiently batch related lookups and minimize unnecessary network overhead.

Use Sync Modes Intentionally

Effective Query-Driven Sync implementation involves selecting the appropriate sync mode for each collection, based on its specific use case and dataset size. Eager mode is suitable for small, stable datasets; on-demand mode is ideal for large collections; and progressive mode offers a balanced approach. Strategically mixing sync modes across different data domains within your application optimizes resource utilization and user experience.
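These guidelines can be written down as a small decision helper. The sketch below is only a starting heuristic: suggestSyncMode is a hypothetical function, the 10,000-row threshold comes from the eager-mode guidance earlier in this article, and preferring progressive mode only when a sync engine is available reflects the recommendation above rather than a hard rule.

```typescript
type SyncMode = "eager" | "on-demand" | "progressive";

// Heuristic only: small datasets load eagerly; larger ones load progressively
// when a sync engine can stream changes, and on demand otherwise.
function suggestSyncMode(rowCount: number, hasSyncEngine: boolean): SyncMode {
  if (rowCount < 10000) return "eager";
  return hasSyncEngine ? "progressive" : "on-demand";
}

console.log(suggestSyncMode(200, false)); // eager
console.log(suggestSyncMode(50000, false)); // on-demand
console.log(suggestSyncMode(50000, true)); // progressive
```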

Leverage Cached Results and Delta Loads

Query-Driven Sync excels when coupled with TanStack DB’s built-in caching and intelligent request deduplication. By maintaining stable query inputs and allowing the sync engine to compute minimal differences between prior and expanded results, you gain from efficient delta-fetching rather than complete reloads. This pattern is especially advantageous for components implementing infinite scrolling or progressive data refinement.

Conclusion

This article has explored the mechanics of Query-Driven Sync (QDS) and its transformative impact on data fetching and synchronization within TanStack DB. By converting each component query into a precise API call, QDS effectively addresses the pervasive challenge of API sprawl in modern applications.

Regardless of application scale, Query-Driven Sync offers significant advantages in performance, efficiency, and simplicity. With the 0.5 update signifying the completion of its core architecture, TanStack DB is now progressing towards version 1. Early adoption and feedback are encouraged to help the team refine any outstanding issues and ensure a seamless final release.