Skip to content
Ivan Kovmir edited this page Oct 8, 2024 · 56 revisions

PostgreSQL Development Introduction

For absolute beginners.

Warning! I am not a PostgreSQL wizard, and this is not the best guide in the world. If you think this article is garbage, you are probably right. If you can suggest improvements, do it.
Prerequisites Basic C, Git, GNU Make and SQL knowledge, feeling comfortable working within the Linux terminal.

Introduction

This article is a proper aggregation of the information already available at the time of writing. My goal is not to duplicate information, but to lead you through it step by step. It is important to do it as you read it.

Compile and Install PostgreSQL

First, learn to compile and install PostgreSQL.

Go to git.postgresql.org, find postgresql.git and git clone that repository. A build system is used to compile PostgreSQL, the build system is responsible for finding the dependencies/libraries on disk and enabling/disabling features. PostgreSQL versions 16 and newer use Meson build system, 15 and older rely on GNU Autoconf.

Different build systems require different commands to compile the source code. A simple online search would give this page showing you how to compile PostgreSQL. Here is a quick Meson and Autoconf command reference.

You will have to gather stack traces a lot, so you must compile PostgreSQL with -Og -g flags to instruct C compiler to add debug symbols and not to optimize the code beyond sufficient: meson setup -Ddebug=true -Dbuildtype=debugoptimized .... These are the defaults, but it is nice to be aware of them.

To be able to work with multiple major versions simultaneously, create a git worktree per each major version. This allows one to work with multiple branches at the same time. With worktrees there is no need to commit/stash changes, resolve conflicts, or lose build artifacts to switch branches.

Visit all the URLs, make sure you understand the commands you have to enter. Compile and install all the currently supported major PostgreSQL versions, satisfy the dependencies if needed. Install under $HOME, not /usr, to avoid sudo. Have a quick walk throughout the source tree, cd to the root folder of the repository and observe tree -d -L 5 | less output. Have a look at git tag and git branch --remote. Use online search to figure out how to run unit tests, and where to find the entire compilation output.

Read about PostgreSQL versioning policy, notice they support five major versions at a time.

Run PostgreSQL

This is what the folder structure looks like when you install multiple PostgreSQL major versions:

├── 13
│   ├── bin
│   ├── include
│   ├── lib
│   └── share
├── 15
│   ├── bin
│   ├── include
│   ├── lib
│   └── share
└── 16
    ├── bin
    ├── include
    ├── lib64
    └── share

To be able to execute PostgreSQL binaries by name, you have to add the appropriate bin folder to your $PATH. For the development purposes it is also necessary to switch bin directories quickly, without doing export PATH="$PATH:/whatever/path. Here is a shell function to serve that particular purpose:

# Choose PostgreSQL bin directory and append it to your $PATH.
pgenv() {
	# Major version
	_pg_ver="$1"
	# Full PostgreSQL `bin` directory path.
	_pg_path="$HOME/.local/usr/psql/$_pg_ver/bin"

	# Freak out if such a directory does not exist.
	if [ ! -d "$_pg_path" ]; then
		echo "$_pg_path does not exist"
		return 1
	fi

	# Remove all paths from $PATH that have `psql` directory in them.
	_clean_path="$(echo -n "$PATH" | tr ':' '\n' | grep -v psql | tr '\n' ':')"
	# Export new `bin` directory.
	export PATH="${_pg_path}:${_clean_path}"
}
# Call immediately to set a default version.
pgenv 16

Append the above function to your ~/.bashrc or ~/.zshrc, and reload your shell config. If you want to work with PostgreSQL 17, run pgenv 17 in your terminal window. To change the version, just call it again with a different number. To verify it works, invoke which pg_config to see which one is going to be executed.

pg_ctl -D /path/to/cluster/dir start is usually used to start PostgreSQL. To run the server in the foreground and see the logs, start the server binary directly postgres -D /path/to/cluster/dir. Specifying -D gets annoying, as is telling psql what database to connect to. Define PGDATA, PGDATABASE, and the other environment variables in order to bypass that. I have the following attached to pgenv function body:

# Set environment for PostgreSQL binaries.
export PGHOST="localhost"
export PGPORT="5432"
export PGDATA="$HOME/.local/share/psql/$_pg_ver/main"
export PGUSER="$USER"
export PGDATABASE="postgres"

Now it is postgres instead of postgres -D /path/to/cluster, and psql instead of psql database_name.

Extension Development

Before you start contributing code to PostgreSQL, it is easier to begin with extension development in C. Let us create an extension to define a function that returns a string:

postgres=# select postgres_func();
      postgres_func
--------------------------
 Hello from my extension!
(1 row)

postgres=#

To create such an extension we need to write:

  • C source code, the extension itself.
  • SQL glue code of the extension.
  • Control file, extension metadata.
  • Makefile.

C Code

Save as my_extension.c:

/* Supplied with PostgreSQL, contain the following macro definitions. */
#include "postgres.h"
#include "fmgr.h"

/* Linker magic, each PostgreSQL extension must have it. */
PG_MODULE_MAGIC;

/* C function prototypes. */
static const char *c_func(void);
Datum postgres_func(PG_FUNCTION_ARGS);

/* Internal C function, declare them as usual. */
const char *
c_func(void)
{
	const char *msg = "Hello from my extension!";
	return msg;
}

PG_FUNCTION_INFO_V1(postgres_func);
/* Functions defined like the one below can be called from PostgreSQL shell
 * and must be preceded by the above magic linker call. */
Datum
postgres_func(PG_FUNCTION_ARGS)
{
	/* Yes, I could have supplied the string literal
	 * directly to that macro call. */
	PG_RETURN_CSTRING(c_func());
}

SQL and Control Files

Save as my_extension--1.0.sql:

-- It is a dummy SQL function that invokes our real C function
-- with that same name.
CREATE FUNCTION postgres_func()
RETURNS cstring -- data type of the return value.
AS 'MODULE_PATHNAME' -- expands to `module_pathname` value in control file.
LANGUAGE C;

-- You can insert arbitraty SQL code in here,
-- views, table definitions, whatever.

Save as my_extension.control:

comment = 'Briefly describe your extension here'
# This number corresponds to the above SQL file name.
default_version = '1.0'
# `$libdir` expands to `lib` directory path
# under your PostgreSQL install (--prefix) directory.
module_pathname = '$libdir/my_extension'

Here are all the possible values.

Makefile

Run pg_config binary, and observe the output:

$ pg_config
BINDIR = /home/username/.local/usr/psql/16/bin
DOCDIR = /home/username/.local/usr/psql/16/share/doc/postgresql
HTMLDIR = /home/username/.local/usr/psql/16/share/doc/postgresql/html
INCLUDEDIR = /home/username/.local/usr/psql/16/include
PKGINCLUDEDIR = /home/username/.local/usr/psql/16/include/postgresql
INCLUDEDIR-SERVER = /home/username/.local/usr/psql/16/include/postgresql/server
LIBDIR = /home/username/.local/usr/psql/16/lib64
PKGLIBDIR = /home/username/.local/usr/psql/16/lib64/postgresql
LOCALEDIR = /home/username/.local/usr/psql/16/share/locale
MANDIR = /home/username/.local/usr/psql/16/share/man
SHAREDIR = /home/username/.local/usr/psql/16/share/postgresql
SYSCONFDIR = /home/username/.local/usr/psql/16/etc/postgresql
PGXS = /home/username/.local/usr/psql/16/lib64/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = 
CC = ccache cc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation -Wno-stringop-truncation
CFLAGS_SL = -fPIC
LDFLAGS = -Wl,--as-needed
LDFLAGS_EX = 
LDFLAGS_SL = 
LIBS = 
VERSION = PostgreSQL 16.1

As you can see, it knows all about our PostgreSQL installation. You can ask pg_config to print out only the information you want:

$ pg_config --libdir --bindir
/home/username/.local/usr/psql/16/lib64
/home/username/.local/usr/psql/16/bin

Notice --libdir value is what $libdir in control file will expand to. Have a look at --pgxs value, it is a path to a Makefile. Open it and read the first few lines:

# This file contains generic rules to build many kinds of simple
# extension modules.  You only need to set a few variables and include
# this file, the rest will be done here.

The above tells us we do not have to write recipes in order to compile a PostgreSQL extension. Instead, we have to define some variables and include this file.

Look further down to see how to include this file:

# Use the following layout for your Makefile:
#
#   [variable assignments, see below]
#
#   PG_CONFIG = pg_config
#   PGXS := $(shell $(PG_CONFIG) --pgxs)
#   include $(PGXS)
#
#   [custom rules, rarely necessary]

Basically, the above asks pg_config where to find --pgxs file and includes it. PG_CONFIG variable is defined to be able to supply custom path to pg_config binary.

Look further down to see the variables you can define within the Makefile you are going to write:

# Set one of these three variables to specify what is built:
#
#   MODULES -- list of shared-library objects to be built from source files
#     with same stem (do not include library suffixes in this list)
#   MODULE_big -- a shared library to build from multiple source files
#     (list object files in OBJS)
#   PROGRAM -- an executable program to build (list object files in OBJS)
#
# The following variables can also be set:
#
#   EXTENSION -- name of extension (there must be a $EXTENSION.control file)
#   MODULEDIR -- subdirectory of $PREFIX/share into which DATA and DOCS files
#     should be installed (if not set, default is "extension" if EXTENSION
#     is set, or "contrib" if not)
#   DATA -- random files to install into $PREFIX/share/$MODULEDIR
...

I will not include the full list, go see for yourself. So, now that we know how to write the Makefile for our extension, let us actually write it down:

# `my_extension.control` must exist.
EXTENSION = my_extension
# We want a single shared library.
MODULE_big = my_extension
# A single shared library from the following object files.
# As we have a single source file, we have a single object file.
OBJS = my_extension.o
# Files to install along with the extension.
DATA = my_extension--1.0.sql

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

Save all the four files, invoke make to build the extension, make install to install it. (Re)start the server, connect with psql to it, then invoke CREATE EXTENSION my_extension;. Now you can call the function from the extension with SELECT postgres_func();

At this point you should look around, investigate, make sure you understand everything you have just read. Most importantly, try to break things, observe the errors, then fix them. Look at other extensions to see what else you can do, try to install and use them, notice the differences in the process, some of them require more than CREATE EXTENSION ...;. Try to break other extensions, see what happens if you do not follow all the steps outlined in the installation requirements. Download this extension template I created and study it, it has things I will not mention in this article.

Core Contributions

There is no outlined procedure on how to do it, and there is no precise definition of prerequisites either. There can only be a an everlasting list of advice to follow.

A good way to get your feet wet is to participate in reviewing code during commit fests.

Read these PostgreSQL Wiki pages:

Getting Help

First thing you absolutely have to do is to read this book, it is the greatest overview of PostgreSQL internals out there. The book itself is brief, feels more like a series of articles, and to the point.

Read Git commit messages that introduce the code snippets you are working with, it is usually the best source of information.

PostgreSQL Doxygen is pointless.

Third source of useful information are README files scattered across the source tree, use find to find them:

$ find /path/to/postgresql/repository -iname readme
./src/backend/utils/mb/README
./src/backend/utils/misc/README
./src/backend/access/gist/README
./src/backend/access/brin/README
./src/backend/access/rmgrdesc/README
./src/backend/access/transam/README
./src/backend/access/hash/README
./src/backend/access/gin/README
./src/backend/access/spgist/README
./src/backend/access/nbtree/README
./src/backend/regex/README
./src/backend/snowball/README
./src/backend/optimizer/README
./src/backend/optimizer/plan/README
...

Your effort is all good, but it is crucial to reach out to other people for help. Join mailing lists, attend conferences, connect with other developers using whatever means. I would judge PostgreSQL community as civilized, I have never seen hostility, so do not hesitate asking.

Navigating the Code

The following is not a strict list, it is legal to adjust it to your preferences/environment.

Git

Working with PostgreSQL source code, you would have to navigate a large Git repository. You may find git bisect useful for finding a particular commit that introduced a given code snippet, though it may not always work. git log -S'code_line' /path/to/file sometimes also helps to identify commits. git blame will help you identify the latest commit to edit a particular line. git cherry-pick will help you to re-apply a series of commits (a patch) to a different branch. I like cherry-pick because it never fails, it wants you to manually resolve conflicts on the way rather than abort the whole thing.

Grep

Have a look at a number of random PostgreSQL project function bodies:

Datum
hashhandler(PG_FUNCTION_ARGS)
{
    /* ... */
}

IndexBuildResult *
hashbuild(Relation heap, Relation index, IndexInfo *indexInfo)
{
    /* ... */
}

void
hashbuildempty(Relation index)
{
    /* ... */
}

Notice the code style, each function name is at the beginning of the line and is followed by ( character. It means you can find any function's body by name across the entire source tree:

$ grep -rni --include='*.c' ^hashbuild\(
src/backend/access/hash/hash.c:115:hashbuild(Relation heap, Relation index, IndexInfo *indexInfo)

grep tells us the file path and the line number, and it works with any function. As Neovim allows specifying the line number to jump to, I can reach that function directly with nvim +115 src/backend/access/bash/bash.c. In fact, this can even be scripted:

#!/bin/bash
# Find a given function witin a source tree and open the file at the exact
# location in $EDITOR.

# Terminate the script execution on any non-zero exit code.
set -e

_func_name=''

# There are function names with underscores, like `init_custom_variable(...)`.
# I would rather type `init custom variable` than `init_custom_variable`, and
# then let the script substitute all space characters with underscores.
#
# Join arguments with '_'.
for _arg in "$@"
do
	_func_name="${_func_name}_${_arg}"
done
# Remove the leading '_'.
_func_name="${_func_name#_}"

# Find that function.
_location=$(grep -rni --include='*.c' "^$_func_name(" | head -n 1)
# Parse its location and line number from grep output.
IFS=':'
read -r -a _lparts <<< "$_location"
# Open the file.
$EDITOR +"${_lparts[1]}" "${_lparts[0]}"

There is a function named init_custom_variable(...) somewhere within the source tree. To immediately open its body in my text editor I just have to type ffunc init custom variable. ffunc script will grep the function, parse its location and open it in my Neovim.

LSP

Code comments atop function definitions are extremely helpful, which is why you must configure your code editor to be able to quickly jump to them:

/*
 * execCurrentOf
 *
 * Given a CURRENT OF expression and the OID of a table, determine which row
 * of the table is currently being scanned by the cursor named by CURRENT OF,
 * and return the row's TID into *current_tid.
 *
 * Returns true if a row was identified.  Returns false if the cursor is valid
 * for the table but is not currently scanning a row of the table (this is a
 * legal situation in inheritance cases).  Raises error if cursor is not a
 * valid updatable scan of the specified table.
 */
bool
execCurrentOf(CurrentOfExpr *cexpr,
			  ExprContext *econtext,
			  Oid table_oid,
			  ItemPointer current_tid)
{

Meson compilations of PostgreSQL yield compile_commands.json, take a peek at it:

[
  {
    "directory": "/home/uninstance/dev/c/rel16/build",
    "command": "ccache cc -Isrc/port/libpgport_srv_crc.a.p -Isrc/include -I../src/include -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O2 -g -fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation -Wno-stringop-truncation -fPIC -DBUILDING_DLL -msse4.2 -MD -MQ src/port/libpgport_srv_crc.a.p/pg_crc32c_sse42.c.o -MF src/port/libpgport_srv_crc.a.p/pg_crc32c_sse42.c.o.d -o src/port/libpgport_srv_crc.a.p/pg_crc32c_sse42.c.o -c ../src/port/pg_crc32c_sse42.c",
    "file": "../src/port/pg_crc32c_sse42.c",
    "output": "src/port/libpgport_srv_crc.a.p/pg_crc32c_sse42.c.o"
  },
  {
    "directory": "/home/uninstance/dev/c/rel16/build",
    "command": "ccache cc -Isrc/port/libpgport_srv.a.p -Isrc/include -I../src/include -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O2 -g -fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation -Wno-stringop-truncation -fPIC -DBUILDING_DLL -MD -MQ src/port/libpgport_srv.a.p/bsearch_arg.c.o -MF src/port/libpgport_srv.a.p/bsearch_arg.c.o.d -o src/port/libpgport_srv.a.p/bsearch_arg.c.o -c ../src/port/bsearch_arg.c",
    "file": "../src/port/bsearch_arg.c",
    "output": "src/port/libpgport_srv.a.p/bsearch_arg.c.o"
  },
  {
    "directory": "/home/uninstance/dev/c/rel16/build",

It is literally the entire sequence of compilation commands, it will help our code editor to natigate the source tree. Miscrosoft released Language Server Protocol specification, it is a standard for code completion/navigation. For any text editor to provide code completion and advanced features like code actions, or jumping to definitions in any language, it only has to speak LSP. Text editors support LSP either natively or via plugins. Neovim, for instance, supports LSP natively. Clang toolchain provides clangd, the language server for C and C++. This particular language server requires the aforementioned compile_commands.json to work.

In summary, for code completion and navigation you need:

  • Text editor of your choice with LSP support.
  • C/C++ language server, clangd.
  • compile_commands.json, generated by Meson.

compile_commands.json gets generated by Meson for PostgreSQL builds, but it does not work with extensions at the moment. And, obviously, it will not work with PostgreSQL versions 15 and older, as they do not use Meson. Bear can generate compile commands for arbitrary projects, including PostgreSQL extensions; simply start the build like that: bear -- make.

Here is LSP in action, it allows me to see all the references of portal variable:

LSP References

Tree-sitter

Tree-sitter is super helpful in navigating the code. It is a generic language parser and syntax tree generator. It integrates well with text/code editors. Among many things it provides a proper syntax highlight based on syntax tree (not regular expressions) and allows navigating the project symbols:

Tree-sitter

Conclusion

Being a good database administrator is not necessary for being a good database developer. Also, there are no people who know all about PostgreSQL, neither developers, nor administrators. Developers in particular, quite often, specialize on certain areas of PostgreSQL.