Introducing the IDE for Data Engineering: The New Way to Build Pipelines on Databricks

Data Engineering

Databricks introduces a new IDE for Spark Declarative Pipelines, enhancing productivity, debugging, and collaboration. This dedicated developer experience streamlines pipeline creation from development to deployment.

Published: November 19, 2025

Announcements | 5 min read

by Adriana Ispas, Lennart Kats, Camiel Steenstra and Monica Alvarez Vicente

Summary

  • Spark Declarative Pipelines now feature a dedicated IDE developer experience within the Databricks Workspace.
  • This new IDE significantly improves productivity and debugging capabilities through features like dependency graphs, data previews, and execution insights.
  • It supports a wide range of use cases, from quick onboarding for new users to advanced scenarios involving Git integration, CI/CD, and robust observability.

At this year’s Data + AI Summit, we unveiled the IDE for Data Engineering: a groundbreaking developer experience designed specifically for authoring data pipelines directly within the Databricks Workspace. As the new default development environment, this IDE embodies our structured approach to data engineering: it's declarative by default, modular in its architecture, seamlessly integrated with Git, and enhanced with AI assistance.

In essence, the IDE for Data Engineering provides all the tools you need to author and test data pipelines—all consolidated in one intuitive location.

With this new development experience now available in Public Preview, this blog post aims to explain why declarative pipelines greatly benefit from a dedicated IDE and to highlight the core features that accelerate pipeline development, enhance organization, and simplify debugging.

Declarative data engineering gets a dedicated developer experience

Declarative pipelines streamline data engineering by allowing you to specify what you want to achieve, rather than providing detailed, step-by-step instructions on how to build it. While declarative programming is an incredibly powerful method for constructing data pipelines, managing multiple datasets and the entire development lifecycle can become challenging without specialized tooling.

This is precisely why we developed a comprehensive IDE experience for declarative pipelines, integrated directly into the Databricks Workspace. Available as a new editor for Lakeflow Spark Declarative Pipelines, it empowers you to declare datasets and quality constraints in files, organize them into folders, and visualize their connections through an automatically generated dependency graph displayed alongside your code. The editor intelligently evaluates your files to determine the most efficient execution plan and enables rapid iteration by allowing you to rerun individual files, a selected set of changed datasets, or the entire pipeline.
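To make the declarative model concrete, here is a minimal, purely illustrative Python sketch (not the Databricks API; names like `table` and `registry` are invented for this example) of how a framework can build a dependency graph from dataset declarations alone, with a simple quality constraint applied along the way:

```python
# Illustrative sketch (not the Databricks API): a declarative framework can
# infer a dependency graph from dataset definitions, with no orchestration code.
registry = {}

def table(*, depends_on=()):
    """Register a dataset-producing function and its declared inputs."""
    def wrap(fn):
        registry[fn.__name__] = {"fn": fn, "deps": tuple(depends_on)}
        return fn
    return wrap

@table()
def raw_orders():
    return [{"order_id": 1, "amount": 42.0}, {"order_id": 2, "amount": -5.0}]

@table(depends_on=("raw_orders",))
def clean_orders():
    # A simple quality constraint: drop rows with non-positive amounts.
    return [r for r in raw_orders() if r["amount"] > 0]

def dependency_graph():
    """The graph the editor could render alongside the code."""
    return {name: spec["deps"] for name, spec in registry.items()}
```

Because each file only declares datasets and their inputs, tooling can derive the execution plan (here, `dependency_graph()` returns `{"raw_orders": (), "clean_orders": ("raw_orders",)}`) rather than requiring the author to script it.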

The editor also surfaces valuable execution insights, provides built-in data previews, and includes advanced debugging tools to help you fine-tune your code. Furthermore, it integrates with version control and scheduled execution via Lakeflow Jobs, consolidating all pipeline-related tasks onto a single interface.

By uniting these capabilities into a single, IDE-like surface, the editor delivers the productivity and practices data engineers expect from a modern IDE, while steadfastly adhering to the declarative paradigm.

The video embedded below showcases these features in action, with further details discussed in the subsequent sections.

"The new editor brings everything into one place - code, pipeline graph, results, configuration, and troubleshooting. No more juggling browser tabs or losing context. Development feels more focused and efficient. I can directly see the impact of each code change. One click takes me to the exact error line, which makes debugging faster. Everything connects - code to data; code to tables; tables to the code. Switching between pipelines is easy, and features like auto-configured utility folders remove complexity. This feels like the way pipeline development should work." — Chris Sharratt, Data Engineer, Rolls-Royce

"In my opinion, the new Pipelines Editor is a huge improvement. I find it much easier to manage complex folder structures and switch between files thanks to the multi-soft tab experience. The integrated DAG view really helps me stay on top of intricate pipelines, and the enhanced error handling is a game changer - it helps me pinpoint issues quickly and streamlines my development workflow." — Matt Adams, Senior Data Platforms Developer, PacificSource Health Plans

Ease of getting started

We designed the editor to ensure that even users new to the declarative paradigm can quickly build their first pipeline.

  • Guided setup allows new users to begin with sample code, while experienced users can configure advanced setups, such as pipelines integrated with CI/CD via Databricks Asset Bundles.
  • Suggested folder structures provide an excellent starting point for organizing assets without imposing rigid conventions, enabling teams to implement their own established organizational patterns. For example, transformations can be grouped into folders for each medallion stage, with one dataset per file.
  • Default settings let users write and run their initial code without significant upfront configuration overhead, with the flexibility to adjust settings later once their end-to-end workload is defined.

These features empower users to become productive rapidly and smoothly transition their work into production-ready pipelines.

Efficiency in the inner development loop

Building pipelines is inherently an iterative process. The editor optimizes this process with features designed to simplify authoring and accelerate testing and refinement:

  • AI-powered code generation and code templates speed up the definition of datasets and data quality constraints in code, eliminating repetitive steps.
  • Selective execution allows you to run a single table, all tables within a file, or the entire pipeline.
  • Interactive pipeline graph provides a clear overview of dataset dependencies and offers quick actions such as data previews, reruns, navigation to code, or adding new datasets with auto-generated boilerplate.
  • Built-in data previews enable you to inspect table data directly within the editor, without needing to switch contexts.
  • Contextual errors appear alongside the relevant code, often accompanied by suggested fixes from the Databricks Assistant.
  • Execution insights panels display crucial dataset metrics, expectations, and query performance, offering direct access to query profiles for precise performance tuning.

These capabilities minimize context switching, allowing developers to remain focused on constructing pipeline logic.

A single surface for all tasks

Pipeline development encompasses more than just writing code. The new developer experience brings all related tasks together onto a unified surface, from modularizing code for maintainability to setting up automation and observability:

  • Organize adjacent code, such as exploratory notebooks or reusable Python modules, into dedicated folders. Edit files in multiple tabs and execute them independently from the pipeline logic, ensuring related code is discoverable and your pipeline remains clean.
  • Integrated version control via Git folders facilitates safe, isolated work, streamlined code reviews, and efficient pull requests into shared repositories.
  • CI/CD with Databricks Asset Bundles support for pipelines seamlessly connects inner-loop development to deployment. Data administrators can enforce testing and automate promotion to production using templates and configuration files, all without adding complexity to a data practitioner’s workflow.
  • Built-in automation and observability enable scheduled pipeline execution and provide quick access to past runs for comprehensive monitoring and troubleshooting.
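As a rough sketch of the Asset Bundles workflow, a pipeline can be declared in a `databricks.yml` at the project root. The structure below is a hypothetical minimal example (all names, paths, and schemas are placeholders; consult the Databricks Asset Bundles documentation for the authoritative schema):

```yaml
# Hypothetical minimal bundle config; names and paths are placeholders.
bundle:
  name: orders_pipeline

targets:
  dev:
    mode: development
    default: true
  prod:
    mode: production

resources:
  pipelines:
    orders_pipeline:
      name: orders-pipeline
      catalog: main
      libraries:
        - file:
            path: ./transformations/clean_orders.py
```

A setup along these lines lets administrators wire deployment and promotion into CI while practitioners keep iterating in the editor against the `dev` target.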

By unifying these capabilities, the editor streamlines both daily development workflows and long-term pipeline operations.

Check out the video below for a more detailed demonstration of all these features in action.

What’s next

Our commitment to innovation continues. Here’s a glimpse of what we are currently exploring:

  • Native support for data tests in Lakeflow Spark Declarative Pipelines and integrated test runners within the editor.
  • AI-assisted test generation to accelerate validation processes.
  • An agentic experience for Lakeflow Spark Declarative Pipelines.

Your feedback is invaluable and drives our development — please let us know what else you’d like to see.

Get started with the new developer experience today

The IDE for data engineering is available across all cloud platforms. To enable it, simply open a file associated with an existing pipeline, click the ‘Lakeflow Pipelines Editor: OFF’ banner, and toggle it on. Alternatively, you can enable it during pipeline creation via a similar toggle or from the User Settings page.

Learn more using these resources: