SQL: The Enduring Language of Data in Modern Data Engineering

Data Engineering

Discover SQL's essential role in data management, its evolution in modern analytics, and its synergy with data engineering tools and paradigms.

SQL is a domain-specific language for managing data in relational database management systems and for stream processing in relational data stream management systems. Within dbt, it is tightly combined with Jinja templates. Beyond traditional Business Intelligence tools, a growing number of data engineering tools now integrate exclusively with SQL. This integration is explored further in "Building an Analytics API with GraphQL - The Next Level of Data Engineering" and in "Components of an Analytics API".

SQL and Databases

SQL is widely recognized as the language of data and represents a fundamental skill for anyone engaged in data-related work. Its utility is nearly universal; even when interacting with a REST API that lacks a direct SQL interface, understanding SQL is invaluable, as the REST service will most likely translate your request into a SQL query against a database. A strong grasp of concepts like joins and filters allows you to optimize requests by adding key filters or splitting complex queries, thus circumventing potentially resource-intensive operations.
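As a rough illustration of why a key filter matters, the sketch below uses Python's built-in sqlite3 module with two made-up tables (`customers` and `orders` are hypothetical names, not from any real service). A REST filter such as `?country=CH` would plausibly end up as the `WHERE` clause in the second query, so the database only returns the matching rows instead of the full join.

```python
import sqlite3

# In-memory database with hypothetical demo tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'CH'), (2, 'DE'), (3, 'CH');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 2, 35.0), (12, 3, 15.0), (13, 1, 5.0);
""")

# Unfiltered request: every order is joined to its customer and returned.
all_rows = conn.execute("""
    SELECT o.id, c.country, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

# Filtered request: the key filter is passed as a bound parameter,
# so only orders from Swiss customers come back.
swiss_rows = conn.execute("""
    SELECT o.id, c.country, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE c.country = ?
    ORDER BY o.id
""", ("CH",)).fetchall()

print(len(all_rows), len(swiss_rows))
```

On real data volumes the difference is not four rows versus three, but potentially a full table scan versus an index lookup.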

Python, by contrast, is the primary tooling language for data engineers, the 'glue' that binds the individual data engineering steps together. It is well suited to tasks such as pulling data from REST APIs or web sources, cleaning incomplete or messy data, and storing it in databases like Postgres. Python provides the framework to perform these operations in a safe and ordered manner.
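A minimal sketch of this glue role follows, under loud assumptions: `RAW_RESPONSE` is a hard-coded string standing in for a REST API response body, SQLite stands in for Postgres so the example runs without a server, and the `clean` and `load` helpers are hypothetical names chosen for illustration.

```python
import json
import sqlite3

# Hypothetical payload standing in for the body of a REST API response.
RAW_RESPONSE = (
    '[{"id": 1, "name": " Ada ", "age": "36"},'
    ' {"id": 2, "name": "", "age": null},'
    ' {"id": 3, "name": "Grace", "age": "45"}]'
)

def clean(records):
    """Drop records without a name or age; trim and cast the rest."""
    cleaned = []
    for rec in records:
        name = (rec.get("name") or "").strip()
        age = rec.get("age")
        if not name or age is None:
            continue  # skip incomplete records
        cleaned.append((rec["id"], name, int(age)))
    return cleaned

def load(conn, rows):
    """Store rows in one transaction: commit on success, roll back on error."""
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS users "
            "(id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
        )
        conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")  # SQLite used here in place of Postgres
load(conn, clean(json.loads(RAW_RESPONSE)))
stored = conn.execute("SELECT id, name, age FROM users ORDER BY id").fetchall()
print(stored)
```

The steps run in a fixed order (extract, clean, load), and the transaction keeps a failed load from leaving partial data behind.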

Historically, stored procedures were common. Within the closed ecosystems of large vendors like SAP, Oracle, or Microsoft, you could embed 'glue code' directly into the database using languages like PL/SQL or T-SQL. However, for integrations with external tools, such as REST APIs, Python's extensive and robust tooling ecosystem offers a significantly easier and more flexible approach.

History

SQL was invented in the 1970s, building on the principles of the Relational Data Model. It was initially called the Structured English Query Language (SEQUEL), a name later shortened to SQL. Oracle, then known as Relational Software, was the first to offer a commercial SQL relational database management system. Since then, the evolution of data paradigms built around SQL can be broadly traced as:

SQL -> Data Mart -> Materialized View -> BI Report -> Traditional OLAP -> BI Dashboard -> Modern OLAP -> dbt tables -> One Big/Wide/Super Table -> Semantic Layer -> Natural Language Queries

SQL vs. NoSQL or Big Data

Since its inception, SQL has largely dominated the data landscape. As the saying goes, "Never bet against SQL." Many users have moved to cloud SQL services like BigQuery and Snowflake, prioritizing convenience over the raw power and flexibility of systems like Apache Hadoop or Spark. SQL's capabilities have expanded to encompass various workloads (including streaming, data transformation, document management, geospatial analysis, and machine learning), many of which were once exclusive to non-SQL systems. However, two key areas where SQL has yet to fully prevail are Business Intelligence and data-intensive computing. Whether SQL or alternative solutions will ultimately dominate these domains remains open, a topic Julian Hyde explores in "Into the Wilderness".

ANSI SQL

What defines ANSI SQL?

SQL Syntax

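Noteworthy newer syntax includes UNION ALL BY NAME, which combines the rows of two queries by matching column names instead of column positions (supported by DuckDB, for example).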
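To make the by-name semantics concrete, here is a small emulation, not the native feature: SQLite does not support `UNION ALL BY NAME`, so the hypothetical helper `union_all_by_name` builds an ordinary positional `UNION ALL` whose columns are aligned by name and padded with `NULL`. In DuckDB, the same result would be written directly as `SELECT * FROM t1 UNION ALL BY NAME SELECT * FROM t2`. Table and column names are invented for the demo.

```python
import sqlite3

def union_all_by_name(conn, tables):
    """Build a positional UNION ALL whose columns are aligned by name.

    Columns missing from a table are padded with NULL. Table names are
    interpolated into the SQL, so only pass trusted identifiers.
    """
    # Column names per table, in declaration order (field 1 of table_info).
    cols = {
        t: [row[1] for row in conn.execute(f'PRAGMA table_info("{t}")')]
        for t in tables
    }
    # Ordered union of all column names; first appearance wins.
    all_cols = list(dict.fromkeys(c for t in tables for c in cols[t]))
    selects = []
    for t in tables:
        exprs = [
            f'"{c}"' if c in cols[t] else f'NULL AS "{c}"'
            for c in all_cols
        ]
        select_list = ", ".join(exprs)
        selects.append(f'SELECT {select_list} FROM "{t}"')
    return " UNION ALL ".join(selects)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER, b TEXT);
    CREATE TABLE t2 (b TEXT, a INTEGER, c REAL);  -- different order, extra column
    INSERT INTO t1 VALUES (1, 'x');
    INSERT INTO t2 VALUES ('y', 2, 3.5);
""")

sql = union_all_by_name(conn, ["t1", "t2"])
rows = conn.execute(sql + " ORDER BY 1").fetchall()
print(rows)
```

A plain positional `UNION ALL` over these two tables would fail on the differing column counts, and even with equal counts it would silently mix `a` and `b`.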

Modern SQL

For a deeper dive into contemporary SQL, Markus Winand's website "Modern SQL: A lot has changed since SQL-92" showcases the features that have been added to SQL since the SQL-92 standard.

Declarative Nature

SQL is inherently declarative: a query states what result is wanted, and the database engine decides how to compute it.
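One way to see the difference is to compute the same totals twice: once with an imperative Python loop that spells out every step, and once with a declarative SQL query. This is only a sketch on invented data, using SQLite through Python's standard library.

```python
import sqlite3

# Invented sample data: (region, amount).
sales = [("north", 10), ("south", 7), ("north", 5), ("south", 3)]

# Imperative: we spell out *how* to build the totals, step by step.
totals_imperative = {}
for region, amount in sales:
    totals_imperative[region] = totals_imperative.get(region, 0) + amount

# Declarative: we state *what* we want; the engine picks the strategy.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
totals_declarative = dict(
    conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
)

print(totals_imperative == totals_declarative)
```

The SQL version says nothing about iteration order, hashing, or sorting, which leaves the engine free to choose or change its execution strategy without the query being rewritten.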

Extending SQL for Analytics

SQL, being the foundational language for data, has continuously evolved with new extensions over the years. Features like geospatial capabilities and window functions have significantly enhanced its utility. This raises the question: why not further extend SQL to incorporate more robust analytics semantics? This concept is further explored in "Extending SQL for analytics".
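As a reminder of what one of these extensions looks like, the sketch below computes a running total with a window function. It assumes the SQLite bundled with Python is version 3.25 or newer (the first release with window functions); the `revenue` table and its values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (day TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO revenue VALUES (?, ?)",
    [("2024-01-01", 100), ("2024-01-02", 50), ("2024-01-03", 75)],
)

# SUM(...) OVER (ORDER BY day) keeps every row and adds a cumulative sum,
# something a plain GROUP BY cannot express without collapsing the rows.
running = conn.execute("""
    SELECT day, amount, SUM(amount) OVER (ORDER BY day) AS running_total
    FROM revenue
    ORDER BY day
""").fetchall()

for row in running:
    print(row)
```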

Anatomy of an SQL Statement

An SQL statement passes through several phases, from parsing to compilation to execution. Tools like SDF and SQLGlot work with these phases, for example by parsing and analyzing SQL.
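These phases can be observed in a small way with SQLite, used here instead of SDF or SQLGlot so that the sketch runs on the Python standard library alone. A malformed statement is rejected at parse time, `EXPLAIN` lists the bytecode the statement was compiled into, `EXPLAIN QUERY PLAN` shows the chosen access strategy, and only the final call executes the query. Other engines expose these phases differently; the table `t` is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'a'), (2, 'b')")

query = "SELECT v FROM t WHERE id = 2"

# Parsing: a malformed statement is rejected before anything runs.
try:
    conn.execute("SELEC v FROM t")
    parse_error = None
except sqlite3.OperationalError as exc:
    parse_error = str(exc)

# Compilation: EXPLAIN returns the bytecode program, one opcode per row
# (the opcode name is the second field of each row).
opcodes = [row[1] for row in conn.execute("EXPLAIN " + query)]

# Planning: EXPLAIN QUERY PLAN describes how the table will be accessed
# (the human-readable detail is the last field of each row).
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

# Execution: only now is the compiled program actually run.
result = conn.execute(query).fetchall()

print(parse_error)
print(opcodes)
print(plan)
print(result)
```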

The Future of SQL

Julian Hyde presented "More Than Query: Future Directions of Query Languages, from SQL to Morel" at Data Council on June 4, 2025.

Enduring Patterns of SQL

Matthias Broecheler, in a LinkedIn post, highlights SQL's tendency to absorb and integrate new paradigms:

  • MapReduce? Evolved into Hive, then SparkSQL.
  • NoSQL? Transformed into "Not Only SQL" and subsequently NewSQL.
  • Graph databases? Influenced pattern matching capabilities within SQL.
  • Vector databases? Integrated as a SQL index type.
  • Streaming data? Manifested in solutions like Flink SQL.

He elaborates on this phenomenon in his article "The Gravitational Pull of SQL".

Resources

Recommended Reading:

  • Efficient Data Processing in SQL by Joseph Machado

Comprehensive Guide:

  • Ultimate SQL Guide from Count-co

Unique Learning Approach:

  • SQL Noir (Learn SQL as a detective)

Original Creation Date: August 8, 2022

Last Updated: December 3, 2025

Authored by: Simon Späti