How to Transform SQL Queries to Crosstabs in PostgreSQL
Table of Contents
A SELECT SQL query retrieves data from a database in a tabular form. The first row, or header row, has the column names, and all the other rows have the data that was retrieved.
In a crosstab, the data is condensed and the names of one or more columns are rotated. You can add row and column totals to a crosstab. For example, a list of salesmen’s monthly sales may include the months into the column headers. By rotating the data, it is easier to read and understand how the facts relate to one another.
Other names you might come across for crosstabs are matrix reports, pivot for SQL server databases, and pivot tables on spreadsheets like Microsoft Excel, Google Sheets, and LibreOffice.
The following topics are covered in this post:
- Crosstabs using CASE conditional expression
- Using Common Table Expressions (CTE) for a crosstab
- Crosstabs using aggregate FILTER clause
- Using the
crosstab()
function in PostgreSQL - Using a PostgreSQL
crosstab()
function with more than three columns - Using ARRAY data type to re-arrange extra crosstab columns
This post uses queries from a database populated with data from the Northwind database for Postgres. The queries are run using DBeaver SQL client.
Crosstabs Using CASE Conditional Expression
The SQL CASE expression enables you to choose a value depending on a condition, much like an if-then-else conditional statement. The syntax for the CASE expression is as follows:
CASE WHEN condition THEN result
[WHEN ...]
[ELSE result]
END
If a condition
evaluates to true
, the result
value will be chosen, otherwise; the result
value contained in the optional ELSE expression will be chosen.
In the example below, the SQL CASE statement is used to make a crosstab of monthly total sales by employees for the first four months of 1997.
If you run the query above, you will get a crosstab like the one below.
You can combine the previous query with a UNION ALL operator and the following SQL SELECT query to include the total for each month in the cross-tab.
The SQL CASE expression is supported by most database systems.
Using Common Table Expressions (CTE) For A Crosstab
Common Table Expressions (CTE) in SQL let you to create a temporary table in a SELECT SQL statement. It is then possible to reference this named temporary table in subsequent SELECT, INSERT, UPDATE, or DELETE SQL statements. The CTEs simplify the SQL queries by breaking them down into smaller, more manageable parts that are easier to read and understand. In PostgreSQL, CTEs are called WITH queries.
You can rewrite the previous CASE crosstab query using a CTE as follows:
Crosstabs Using Aggregate FILTER Clause
From PostgreSQL 9.4, you can use the FILTER clause to perform aggregate functions on specific records. The FILTER clause is less wordy and has a cleaner syntax than the CASE statement.
The following SQL query uses the FILTER clause to generate a crosstab.
The SQL COALESCE function replaces any null values in the crosstab with zero (0) values.
Using PostgreSQL crosstab()
Function
The crosstab()
function is part of the optional tablefunc
module.
You can run the SELECT COUNT(*) FROM pg_extension WHERE extname='tablefunc';
query to see if the tablefunc
extension is installed on the database you are using. If the result of the query is 0, install and activate the tablefunc
extension using the CREATE EXTENSION IF NOT EXIST tablefunc;
SQL command. The tablefunc
module can be installed by non-superusers who have the CREATE
privilege.
The crosstab
function has several options, crosstab(sql text)
, crosstabN(sql text)
, and crosstab(source_sql text, category_sql text)
. Examples in this post use the crosstab(source_sql text, category_sql text)
function option.
The first parameter (source_sql text
) in a crosstab(source_sql text, category_sql text)
function is the source SQL SELECT query statement and must return at least three (3) columns of data to pivot or rotate. The first column (row_name
) contains data values to be used as row identifiers in the final result; data in the second column (category
) represents category values that will be rotated to column headers in the pivot table, and the third column (value
) contains data to be assigned to each cell of the final crosstab. The second parameter text category_sql
is a query returning a category list for the columns.
The crosstab function returns a set of records with unknown data types. Therefore, you must alias the returned columns with column names and types using the AS (col1 type, col2 type, ...)
clause. Failing to alias the columns will cause a column definition list is required for functions returning “record” error.
The SQL query below shows how to use the crosstab(source_sql text, category_sql text)
function. A dollar-quoted string constant ($$
) has been used to maintain formatting consistency and remove the need to escape single quotes (’) by doubling them.
Using A PostgreSQL crosstab()
Function With More Than Three Columns
If your source_sql text
SQL query returns more than three columns, the additional or extra columns must be placed between the row_name
and category
columns. In the previous crosstab SQL query, any extra or additional columns must be placed between the salesman
and order_month
columns.
PostgreSQL converts identifiers/column names to lowercase by default. To capitalize the crosstab column headers, enclose them within double quotes.
Using ARRAY Data Type To Re-Arrange Extra Crosstab Columns
A crosstab with more than three columns has limitations in the order placement of the additional columns in the final crosstab. However, you can re-arrange the crosstab columns by inserting the extra columns into an arrays data type.
Conclusion
Crosstabs are a powerful way to summarize and analyze data from a database. By presenting data in a condensed and organized format, crosstabs make it easier to analyze relationships between different variables.
The CASE conditional expression, aggregate FILTER clause, and the PostgreSQL crosstab() function are some of the methods for creating crosstabs. If the crosstab() function uses more than three columns to summarize data, the ARRAY data type may be used to re-arrange the extra columns into the correct positions. Using crosstabs, PostgreSQL users can create reports and dashboards that help them make informed business decisions.