SQL Window Functions: SUM, AVG, LAG, and LEAD

Posted by Kyle January 25, 2026


Window functions perform calculations across a set of rows related to the current row — without collapsing the result into groups like GROUP BY does. They are one of the most powerful features in modern SQL.

The OVER() Clause

Every window function uses OVER() to define which rows to include:

SELECT name, department, salary,
    SUM(salary) OVER () AS total_salary
FROM employees;

This adds a total_salary column with the sum of all salaries — without grouping. Every row still appears individually.

PARTITION BY

PARTITION BY divides rows into groups (like GROUP BY, but without collapsing):

SELECT name, department, salary,
    SUM(salary) OVER (PARTITION BY department) AS dept_total,
    AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;

Each row shows the department total and average alongside the individual salary.

ORDER BY in OVER()

Adding ORDER BY creates a running calculation:

SELECT order_date, amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;
order_date amount running_total
Jan 1 100 100
Jan 2 150 250
Jan 3 75 325
Jan 4 200 525

LAG and LEAD

Compare the current row to previous or next rows:

SELECT
    month,
    revenue,
    LAG(revenue) OVER (ORDER BY month) AS prev_month,
    revenue - LAG(revenue) OVER (ORDER BY month) AS month_over_month
FROM monthly_revenue;
month revenue prev_month month_over_month
Jan 10000 NULL NULL
Feb 12000 10000 2000
Mar 11500 12000 -500

LAG(col, n) looks back n rows (default 1). LEAD(col, n) looks forward n rows.

Default Values

Avoid NULLs for the first/last row:

LAG(revenue, 1, 0) OVER (ORDER BY month)  -- returns 0 instead of NULL

FIRST_VALUE and LAST_VALUE

SELECT name, department, salary,
    FIRST_VALUE(name) OVER (PARTITION BY department ORDER BY salary DESC) AS highest_paid
FROM employees;

Percent of Total

SELECT name, department, salary,
    ROUND(100.0 * salary / SUM(salary) OVER (PARTITION BY department), 1) AS pct_of_dept
FROM employees;

Moving Average

Use a frame specification to average over a sliding window:

SELECT order_date, amount,
    AVG(amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS seven_day_avg
FROM daily_sales;

Frame Types

Frame Meaning
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW Current row + 2 rows before
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW All rows from start to current (default for running totals)
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING 3-row window centered on current
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING All rows in partition

Combining Multiple Window Functions

SELECT
    name, department, salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
    SUM(salary) OVER (PARTITION BY department) AS dept_total,
    ROUND(100.0 * salary / SUM(salary) OVER (), 2) AS pct_of_company
FROM employees
ORDER BY department, salary DESC;

WINDOW Clause (Reusable Definitions)

Avoid repeating the same OVER clause:

SELECT
    order_date, amount,
    SUM(amount) OVER w AS running_total,
    AVG(amount) OVER w AS running_avg,
    COUNT(*) OVER w AS running_count
FROM orders
WINDOW w AS (ORDER BY order_date);

Supported in PostgreSQL, MySQL 8.0+, and SQLite 3.28+. Not supported in SQL Server or Oracle.

Database Compatibility

Feature MySQL PostgreSQL SQL Server Oracle SQLite
Basic window functions 8.0+ 8.4+ 2005+ 8i+ 3.25+
LAG / LEAD 8.0+ 8.4+ 2012+ 8i+ 3.25+
Frame specification 8.0+ 8.4+ 2012+ 8i+ 3.28+
WINDOW clause 8.0+ Yes No No 3.28+

UNION vs UNION ALL in SQL

Posted by Kyle November 15, 2025


UNION and UNION ALL both combine results from multiple SELECT statements, but they handle duplicate rows differently — and the performance difference can be significant.

The Difference

  • UNION removes duplicate rows from the combined result (like an implicit DISTINCT)
  • UNION ALL keeps all rows, including duplicates
-- Table A: 1, 2, 3
-- Table B: 2, 3, 4

SELECT id FROM table_a
UNION
SELECT id FROM table_b;
-- Result: 1, 2, 3, 4 (duplicates removed)

SELECT id FROM table_a
UNION ALL
SELECT id FROM table_b;
-- Result: 1, 2, 3, 2, 3, 4 (all rows kept)

Performance

UNION must sort or hash the entire result set to find and remove duplicates. On large datasets, this is expensive.

UNION ALL simply appends the results — no additional processing.

Rule of thumb: Use UNION ALL unless you specifically need duplicate removal. In practice, this is the majority of cases.

When UNION ALL is Correct

If the source queries cannot produce overlapping rows, deduplication is wasted work:

-- Orders from different months — no overlap possible
SELECT * FROM orders WHERE order_date >= '2025-01-01' AND order_date < '2025-02-01'
UNION ALL
SELECT * FROM orders WHERE order_date >= '2025-02-01' AND order_date < '2025-03-01';
-- Different tables entirely
SELECT 'customer' AS source, name, email FROM customers
UNION ALL
SELECT 'vendor' AS source, name, email FROM vendors;

When UNION is Correct

Use UNION when you genuinely need to deduplicate:

-- Find all cities where we have customers OR offices
SELECT city FROM customers
UNION
SELECT city FROM offices;

Rules for UNION / UNION ALL

Both require:

  1. Same number of columns in each SELECT
  2. Compatible data types (the database will attempt implicit conversion)
-- This works
SELECT name, email FROM customers
UNION ALL
SELECT company_name, contact_email FROM vendors;

-- This fails (different column counts)
SELECT name, email, phone FROM customers
UNION ALL
SELECT company_name, contact_email FROM vendors;

Column names in the result come from the first SELECT statement.

ORDER BY with UNION

ORDER BY applies to the final combined result. Place it after the last SELECT:

SELECT name, 'customer' AS type FROM customers
UNION ALL
SELECT name, 'vendor' AS type FROM vendors
ORDER BY name;

You cannot put ORDER BY inside individual SELECT statements (except in subqueries).

UNION ALL with Aggregation

A common pattern — combine data then aggregate:

WITH all_transactions AS (
    SELECT amount, transaction_date FROM sales
    UNION ALL
    SELECT -amount, transaction_date FROM refunds
)
SELECT
    DATE_TRUNC('month', transaction_date) AS month,
    SUM(amount) AS net_revenue
FROM all_transactions
GROUP BY 1
ORDER BY 1;

Combining More Than Two Queries

You can chain multiple UNIONs:

SELECT name FROM customers
UNION ALL
SELECT name FROM vendors
UNION ALL
SELECT name FROM employees
ORDER BY name;

INTERSECT and EXCEPT

Related set operations worth knowing:

-- Rows in both queries
SELECT city FROM customers
INTERSECT
SELECT city FROM offices;

-- Rows in the first query but not the second
SELECT city FROM customers
EXCEPT            -- SQL Server, PostgreSQL, SQLite
SELECT city FROM offices;

-- Oracle uses MINUS instead of EXCEPT
SELECT city FROM customers
MINUS
SELECT city FROM offices;

MySQL 8.0.31+ supports INTERSECT and EXCEPT. Earlier versions do not.


Common Table Expressions (CTEs) in SQL

Posted by Kyle July 10, 2025


A Common Table Expression (CTE) is a named temporary result set that exists for the duration of a single query. CTEs make complex queries easier to read, write, and debug.

Basic CTE Syntax

WITH active_users AS (
    SELECT id, name, email
    FROM users
    WHERE status = 'active'
)
SELECT * FROM active_users WHERE email LIKE '%@company.com';

The WITH clause defines the CTE, and the main query uses it like a table. The CTE only exists for that one statement.

Why Use CTEs?

Compare a nested subquery to a CTE:

-- Nested subquery (hard to read)
SELECT d.name, stats.avg_salary
FROM departments d
JOIN (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    WHERE status = 'active'
    GROUP BY department_id
) stats ON d.id = stats.department_id
WHERE stats.avg_salary > 75000;

-- CTE (much clearer)
WITH dept_salaries AS (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    WHERE status = 'active'
    GROUP BY department_id
)
SELECT d.name, ds.avg_salary
FROM departments d
JOIN dept_salaries ds ON d.id = ds.department_id
WHERE ds.avg_salary > 75000;

Multiple CTEs

Chain multiple CTEs with commas:

WITH
active_users AS (
    SELECT * FROM users WHERE status = 'active'
),
user_orders AS (
    SELECT user_id, COUNT(*) AS order_count, SUM(total) AS total_spent
    FROM orders
    GROUP BY user_id
)
SELECT au.name, uo.order_count, uo.total_spent
FROM active_users au
JOIN user_orders uo ON au.id = uo.user_id
ORDER BY uo.total_spent DESC;

Each CTE can reference the ones defined before it.

Recursive CTEs

Recursive CTEs solve hierarchical data problems like org charts, category trees, and bill-of-materials.

Employee Org Chart

WITH RECURSIVE org_chart AS (
    -- Anchor: start with the CEO (no manager)
    SELECT id, name, manager_id, 0 AS depth
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive: find each employee's direct reports
    SELECT e.id, e.name, e.manager_id, oc.depth + 1
    FROM employees e
    JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT REPEAT('  ', depth) || name AS org_tree, depth
FROM org_chart
ORDER BY depth, name;

Building a Path (Breadcrumbs)

WITH RECURSIVE category_path AS (
    SELECT id, name, parent_id, name AS path
    FROM categories
    WHERE parent_id IS NULL

    UNION ALL

    SELECT c.id, c.name, c.parent_id,
           cp.path || ' > ' || c.name
    FROM categories c
    JOIN category_path cp ON c.parent_id = cp.id
)
SELECT name, path FROM category_path ORDER BY path;

Result: Electronics > Computers > Laptops

Number Series

Generate a sequence of numbers without a table:

WITH RECURSIVE numbers AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < 100
)
SELECT n FROM numbers;

Database Compatibility

Database Basic CTE Recursive CTE Keyword
PostgreSQL Yes Yes WITH RECURSIVE
MySQL 8.0+ Yes Yes WITH RECURSIVE
SQL Server 2005+ Yes Yes WITH (no RECURSIVE keyword)
Oracle 11g+ Yes Yes WITH (no RECURSIVE keyword)
SQLite 3.8.3+ Yes Yes WITH RECURSIVE

SQL Server and Oracle do not use the RECURSIVE keyword — the database detects recursion automatically.

Preventing Infinite Loops

A bug in the recursive condition can cause infinite recursion. Protect against it:

-- PostgreSQL / MySQL: limit depth
WITH RECURSIVE tree AS (
    SELECT id, parent_id, 1 AS depth FROM nodes WHERE parent_id IS NULL
    UNION ALL
    SELECT n.id, n.parent_id, t.depth + 1
    FROM nodes n JOIN tree t ON n.parent_id = t.id
    WHERE t.depth < 50  -- safety limit
)
SELECT * FROM tree;

-- SQL Server: OPTION (MAXRECURSION)
WITH tree AS (...)
SELECT * FROM tree OPTION (MAXRECURSION 100);

CTE vs Subquery vs Temp Table

CTE Subquery Temp Table
Readability Best Worst (nested) Good
Reusable in same query Yes No Yes
Persists across queries No No Yes
Indexable No No Yes
Recursive support Yes No No

How to Drop a Database with Active Connections

Posted by Kyle March 20, 2025


Trying to drop a database while other sessions are connected to it will fail. Here is how to terminate those connections and drop the database safely in each database system.

PostgreSQL

The Error

ERROR: database "mydb" is being accessed by other users
DETAIL: There are 3 other sessions using the database.

Solution: Terminate Connections, Then Drop

-- Step 1: Prevent new connections
REVOKE CONNECT ON DATABASE mydb FROM PUBLIC;

-- Step 2: Terminate existing connections
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'mydb' AND pid <> pg_backend_pid();

-- Step 3: Drop the database
DROP DATABASE mydb;

PostgreSQL 13+: DROP DATABASE FORCE

PostgreSQL 13 added a FORCE option that terminates connections automatically:

DROP DATABASE mydb WITH (FORCE);

This is the simplest approach if you are on PostgreSQL 13 or later.

Checking Active Connections

SELECT pid, usename, application_name, state, query_start
FROM pg_stat_activity
WHERE datname = 'mydb';

SQL Server

The Error

Cannot drop database "mydb" because it is currently in use.

Solution: Set Single User Mode

-- Terminate all connections and switch to single-user mode
ALTER DATABASE mydb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

-- Drop the database
DROP DATABASE mydb;

WITH ROLLBACK IMMEDIATE rolls back any in-progress transactions and disconnects all users immediately.

Alternative: Kill Connections Manually

-- Find active sessions
SELECT session_id, login_name, status, last_request_start_time
FROM sys.dm_exec_sessions
WHERE database_id = DB_ID('mydb');

-- Kill each one
KILL 52;
KILL 53;

-- Then drop
DROP DATABASE mydb;

MySQL

MySQL is more lenient — DROP DATABASE does not require exclusive access. However, active queries will continue running against the dropped tables until they finish or error out.

To cleanly terminate connections first:

-- Find connections to the database
SELECT ID, USER, HOST, DB, COMMAND, TIME, STATE
FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE DB = 'mydb';

-- Kill each connection
KILL 42;
KILL 43;

-- Drop the database
DROP DATABASE mydb;

Or in MySQL 8.0+:

SELECT CONCAT('KILL ', ID, ';')
FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE DB = 'mydb';

Run the output to kill all connections at once.

Quick Reference

Database Command
PostgreSQL 13+ DROP DATABASE mydb WITH (FORCE)
PostgreSQL < 13 pg_terminate_backend() then DROP DATABASE
SQL Server ALTER DATABASE SET SINGLE_USER WITH ROLLBACK IMMEDIATE then DROP
MySQL KILL connections then DROP DATABASE

Safety Considerations

  • Always double-check the database name before dropping. There is no undo.
  • In production: Take a backup first. Use pg_dump, mysqldump, or SQL Server backup.
  • In CI/CD: The PostgreSQL WITH (FORCE) option is ideal for tearing down test databases.
  • Never run these commands against production databases unless you are absolutely certain. A typo in the database name can be catastrophic.

How to Improve SQLite INSERT Performance

Posted by Kyle December 5, 2024


SQLite is fast for reads, but naive inserts can be surprisingly slow. A few configuration changes can improve insert performance by 50x or more.

The Problem

By default, each individual INSERT in SQLite is wrapped in its own transaction. Each transaction forces a disk sync — a slow operation. Inserting 10,000 rows with 10,000 separate INSERTs means 10,000 disk syncs.

1. Wrap Inserts in a Transaction

The single biggest improvement:

BEGIN TRANSACTION;
INSERT INTO logs (message, level) VALUES ('Starting', 'INFO');
INSERT INTO logs (message, level) VALUES ('Processing', 'DEBUG');
-- ... thousands more inserts
INSERT INTO logs (message, level) VALUES ('Done', 'INFO');
COMMIT;

This changes 10,000 disk syncs to just one. In benchmarks, this alone can improve insert speed from ~85 inserts/second to ~50,000 inserts/second.

2. Use WAL Mode

Write-Ahead Logging mode allows concurrent reads during writes and batches disk writes more efficiently:

PRAGMA journal_mode = WAL;

WAL mode is persistent — you only need to set it once per database file. Benefits:

  • Readers do not block writers
  • Writers do not block readers
  • Better write performance due to sequential I/O

3. Reduce Synchronous Level

PRAGMA synchronous = NORMAL;
Value Safety Speed
FULL (default) Maximum durability Slowest
NORMAL Safe with WAL mode Fast
OFF Risk of corruption on crash Fastest

NORMAL is a good balance — with WAL mode, it is safe against application crashes (but not power loss during a write).

4. Use Prepared Statements

Parsing SQL is expensive. Prepare the statement once, then bind and execute repeatedly:

# Python example
cursor = conn.cursor()
stmt = "INSERT INTO logs (message, level) VALUES (?, ?)"

conn.execute("BEGIN")
for msg, level in data:
    cursor.execute(stmt, (msg, level))
conn.execute("COMMIT")
// Swift example
let stmt = try db.prepare("INSERT INTO logs (message, level) VALUES (?, ?)")
try db.transaction {
    for (msg, level) in data {
        try stmt.run(msg, level)
    }
}

5. Increase Cache Size

The default page cache is 2MB. For large imports, increase it:

PRAGMA cache_size = -20000;  -- 20MB (negative = kilobytes)

More cache means fewer disk reads during the import.

6. Use Memory-Mapped I/O

PRAGMA mmap_size = 268435456;  -- 256MB

Memory mapping lets the OS handle caching more efficiently for large databases.

7. Multi-Row INSERT

SQLite 3.7.11+ supports multi-row VALUES:

INSERT INTO logs (message, level) VALUES
    ('Starting', 'INFO'),
    ('Processing', 'DEBUG'),
    ('Done', 'INFO');

This reduces parsing overhead compared to individual INSERT statements.

Benchmark Summary

Inserting 100,000 rows (typical results):

Configuration Time Rows/sec
Default (no transaction) ~20 min ~85
With transaction ~1.5 sec ~65,000
Transaction + WAL ~0.8 sec ~125,000
Transaction + WAL + prepared ~0.5 sec ~200,000
All optimizations ~0.3 sec ~300,000+

Complete Setup

For maximum insert performance, run these PRAGMAs at connection time:

PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;
PRAGMA cache_size = -20000;
PRAGMA mmap_size = 268435456;
PRAGMA temp_store = MEMORY;

Then wrap all inserts in explicit transactions using prepared statements.

Using SQLPro Studio

SQLPro Studio opens SQLite database files directly — just open any .db, .sqlite, or .sqlite3 file. You can run PRAGMA commands and test insert performance in the query editor, and import data from CSV or JSON files using the built-in import feature.


Tags: SQLite

More articles: