-
Notifications
You must be signed in to change notification settings - Fork 0
Home
For absolute beginners.
Warning!
I am not a PostgreSQL wizard, and this is not the best guide in the world. If you think this article is garbage, you are probably right. If you can suggest improvements, do it.Prerequisites
Basic C, Git, GNU Make and SQL knowledge, feeling comfortable working within the Linux terminal.This article is a proper aggregation of the information already available at the time of writing. My goal is not to duplicate information, but to lead you through it step by step. It is important to do it as you read it.
First, learn to compile and install PostgreSQL.
Go to git.postgresql.org, find postgresql.git
and git clone
that
repository. A build system is used to compile PostgreSQL, the build system is
responsible for finding the dependencies/libraries on disk and
enabling/disabling features. PostgreSQL versions 16
and newer use Meson
build system, 15
and older rely on GNU Autoconf.
Different build systems require different commands to compile the source code. A simple online search would give this page showing you how to compile PostgreSQL. Here is a quick Meson and Autoconf command reference.
You will have to gather stack traces a lot, so you must compile
PostgreSQL with -Og -g
flags to instruct C compiler to add debug symbols and
not to optimize the code beyond sufficient: meson setup -Ddebug=true -Dbuildtype=debugoptimized ...
. These are the defaults, but it is nice to be
aware of them.
To be able to work with multiple major versions simultaneously, create a git worktree
per each major version. This allows one to work with multiple
branches at the same time. With worktrees there is no need to commit/stash
changes, resolve conflicts, or lose build artifacts to switch branches.
Visit all the URLs, make sure you understand the commands you have to enter.
Compile and install all the currently supported major PostgreSQL versions,
satisfy the dependencies if needed. Install under $HOME
, not /usr
, to avoid
sudo
. Have a quick walk throughout the source tree, cd
to the root folder
of the repository and observe tree -d -L 5 | less
output. Have a look at git tag
and git branch --remote
. Use online search to figure out how to run unit
tests, and where to find the entire compilation output.
Read about PostgreSQL versioning policy, notice they support five major versions at a time.
This is what the folder structure looks like when you install multiple PostgreSQL major versions:
├── 13
│ ├── bin
│ ├── include
│ ├── lib
│ └── share
├── 15
│ ├── bin
│ ├── include
│ ├── lib
│ └── share
└── 16
├── bin
├── include
├── lib64
└── share
To be able to execute PostgreSQL binaries by name, you have to add the
appropriate bin
folder to your $PATH
. For the development purposes it is
also necessary to switch bin
directories quickly, without doing export PATH="$PATH:/whatever/path
. Here is a shell function to serve that particular
purpose:
# Choose PostgreSQL bin directory and append it to your $PATH.
pgenv() {
# Major version
_pg_ver="$1"
# Full PostgreSQL `bin` directory path.
_pg_path="$HOME/.local/usr/psql/$_pg_ver/bin"
# Freak out if such a directory does not exist.
if [ ! -d "$_pg_path" ]; then
echo "$_pg_path does not exist"
return 1
fi
# Remove all paths from $PATH that have `psql` directory in them.
_clean_path="$(echo -n "$PATH" | tr ':' '\n' | grep -v psql | tr '\n' ':')"
# Export new `bin` directory.
export PATH="${_pg_path}:${_clean_path}"
}
# Call immediately to set a default version.
pgenv 16
Append the above function to your ~/.bashrc
or ~/.zshrc
, and reload your
shell config. If you want to work with PostgreSQL 17, run pgenv 17
in your
terminal window. To change the version, just call it again with a different
number. To verify it works, invoke which pg_config
to see which one is going
to be executed.
pg_ctl -D /path/to/cluster/dir start
is usually used to start PostgreSQL. To
run the server in the foreground and see the logs, start the server binary
directly postgres -D /path/to/cluster/dir
. Specifying -D
gets annoying, as
is telling psql
what database to connect to. Define PGDATA
, PGDATABASE
,
and the other environment variables in order to bypass that. I have the
following attached to pgenv
function body:
# Set environment for PostgreSQL binaries.
export PGHOST="localhost"
export PGPORT="5432"
export PGDATA="$HOME/.local/share/psql/$_pg_ver/main"
export PGUSER="$USER"
export PGDATABASE="postgres"
Now it is postgres
instead of postgres -D /path/to/cluster
, and psql
instead of psql database_name
.
Before you start contributing code to PostgreSQL, it is easier to begin with extension development in C. Let us create an extension to define a function that returns a string:
postgres=# select postgres_func();
postgres_func
--------------------------
Hello from my extension!
(1 row)
postgres=#
To create such an extension we need to write:
- C source code, the extension itself.
- SQL glue code of the extension.
- Control file, extension metadata.
-
Makefile
.
Save as my_extension.c
:
/* Supplied with PostgreSQL, contain the following macro definitions. */
#include "postgres.h"
#include "fmgr.h"
/* Linker magic, each PostgreSQL extension must have it. */
PG_MODULE_MAGIC;
/* C function prototypes. */
static const char *c_func(void);
Datum postgres_func(PG_FUNCTION_ARGS);
/* Internal C function, declare them as usual. */
const char *
c_func(void)
{
const char *msg = "Hello from my extension!";
return msg;
}
PG_FUNCTION_INFO_V1(postgres_func);
/* Functions defined like the one below can be called from PostgreSQL shell
* and must be preceded by the above magic linker call. */
Datum
postgres_func(PG_FUNCTION_ARGS)
{
/* Yes, I could have supplied the string literal
* directly to that macro call. */
PG_RETURN_CSTRING(c_func());
}
Save as my_extension--1.0.sql
:
-- It is a dummy SQL function that invokes our real C function
-- with that same name.
CREATE FUNCTION postgres_func()
RETURNS cstring -- data type of the return value.
AS 'MODULE_PATHNAME' -- expands to `module_pathname` value in control file.
LANGUAGE C;
-- You can insert arbitraty SQL code in here,
-- views, table definitions, whatever.
Save as my_extension.control
:
comment = 'Briefly describe your extension here'
# This number corresponds to the above SQL file name.
default_version = '1.0'
# `$libdir` expands to `lib` directory path
# under your PostgreSQL install (--prefix) directory.
module_pathname = '$libdir/my_extension'
Here are all the possible values.
Run pg_config
binary, and observe the output:
$ pg_config
BINDIR = /home/username/.local/usr/psql/16/bin
DOCDIR = /home/username/.local/usr/psql/16/share/doc/postgresql
HTMLDIR = /home/username/.local/usr/psql/16/share/doc/postgresql/html
INCLUDEDIR = /home/username/.local/usr/psql/16/include
PKGINCLUDEDIR = /home/username/.local/usr/psql/16/include/postgresql
INCLUDEDIR-SERVER = /home/username/.local/usr/psql/16/include/postgresql/server
LIBDIR = /home/username/.local/usr/psql/16/lib64
PKGLIBDIR = /home/username/.local/usr/psql/16/lib64/postgresql
LOCALEDIR = /home/username/.local/usr/psql/16/share/locale
MANDIR = /home/username/.local/usr/psql/16/share/man
SHAREDIR = /home/username/.local/usr/psql/16/share/postgresql
SYSCONFDIR = /home/username/.local/usr/psql/16/etc/postgresql
PGXS = /home/username/.local/usr/psql/16/lib64/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE =
CC = ccache cc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation -Wno-stringop-truncation
CFLAGS_SL = -fPIC
LDFLAGS = -Wl,--as-needed
LDFLAGS_EX =
LDFLAGS_SL =
LIBS =
VERSION = PostgreSQL 16.1
As you can see, it knows all about our PostgreSQL installation. You can ask
pg_config
to print out only the information you want:
$ pg_config --libdir --bindir
/home/username/.local/usr/psql/16/lib64
/home/username/.local/usr/psql/16/bin
Notice --libdir
value is what $libdir
in control file will expand to. Have
a look at --pgxs
value, it is a path to a Makefile
. Open it and read the
first few lines:
# This file contains generic rules to build many kinds of simple
# extension modules. You only need to set a few variables and include
# this file, the rest will be done here.
The above tells us we do not have to write recipes in order to compile a PostgreSQL extension. Instead, we have to define some variables and include this file.
Look further down to see how to include this file:
# Use the following layout for your Makefile:
#
# [variable assignments, see below]
#
# PG_CONFIG = pg_config
# PGXS := $(shell $(PG_CONFIG) --pgxs)
# include $(PGXS)
#
# [custom rules, rarely necessary]
Basically, the above asks pg_config
where to find --pgxs
file and includes
it. PG_CONFIG
variable is defined to be able to supply custom path to
pg_config
binary.
Look further down to see the variables you can define within the Makefile
you
are going to write:
# Set one of these three variables to specify what is built:
#
# MODULES -- list of shared-library objects to be built from source files
# with same stem (do not include library suffixes in this list)
# MODULE_big -- a shared library to build from multiple source files
# (list object files in OBJS)
# PROGRAM -- an executable program to build (list object files in OBJS)
#
# The following variables can also be set:
#
# EXTENSION -- name of extension (there must be a $EXTENSION.control file)
# MODULEDIR -- subdirectory of $PREFIX/share into which DATA and DOCS files
# should be installed (if not set, default is "extension" if EXTENSION
# is set, or "contrib" if not)
# DATA -- random files to install into $PREFIX/share/$MODULEDIR
...
I will not include the full list, go see for yourself. So, now that we know how
to write the Makefile
for our extension, let us actually write it down:
# `my_extension.control` must exist.
EXTENSION = my_extension
# We want a single shared library.
MODULE_big = my_extension
# A single shared library from the following object files.
# As we have a single source file, we have a single object file.
OBJS = my_extension.o
# Files to install along with the extension.
DATA = my_extension--1.0.sql
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
Save all the four files, invoke make
to build the extension, make install
to install it. (Re)start the server, connect with psql
to it, then invoke
CREATE EXTENSION my_extension;
. Now you can call the function from the
extension with SELECT postgres_func();
At this point you should look around, investigate, make sure you understand
everything you have just read. Most importantly, try to break things, observe
the errors, then fix them. Look at other extensions to see what else you can
do, try to install and use them, notice the differences in the process, some of
them require more than CREATE EXTENSION ...;
. Try to break other extensions,
see what happens if you do not follow all the steps outlined in the
installation requirements. Download this extension template I created and
study it, it has things I will not mention in this article.
There is no outlined procedure on how to do it, and there is no precise definition of prerequisites either. There can only be a an everlasting list of advice to follow.
A good way to get your feet wet is to participate in reviewing code during commit fests.
Read these PostgreSQL Wiki pages:
First thing you absolutely have to do is to read this book, it is the greatest overview of PostgreSQL internals out there. The book itself is brief, feels more like a series of articles, and to the point.
Read Git commit messages that introduce the code snippets you are working with, it is usually the best source of information.
PostgreSQL Doxygen is pointless.
Third source of useful information are README
files scattered across the
source tree, use find
to find them:
$ find /path/to/postgresql/repository -iname readme
./src/backend/utils/mb/README
./src/backend/utils/misc/README
./src/backend/access/gist/README
./src/backend/access/brin/README
./src/backend/access/rmgrdesc/README
./src/backend/access/transam/README
./src/backend/access/hash/README
./src/backend/access/gin/README
./src/backend/access/spgist/README
./src/backend/access/nbtree/README
./src/backend/regex/README
./src/backend/snowball/README
./src/backend/optimizer/README
./src/backend/optimizer/plan/README
...
Your effort is all good, but it is crucial to reach out to other people for help. Join mailing lists, attend conferences, connect with other developers using whatever means. I would judge PostgreSQL community as civilized, I have never seen hostility, so do not hesitate asking.
The following is not a strict list, it is legal to adjust it to your preferences/environment.
Working with PostgreSQL source code, you would have to navigate a large Git
repository. You may find git bisect
useful for finding a particular
commit that introduced a given code snippet, though it may not always
work. git log -S'code_line' /path/to/file
sometimes also helps to
identify commits. git blame
will help you identify the latest commit to
edit a particular line. git cherry-pick
will help you to re-apply a
series of commits (a patch) to a different branch. I like cherry-pick because
it never fails, it wants you to manually resolve conflicts on the way rather
than abort the whole thing.
Have a look at a number of random PostgreSQL project function bodies:
Datum
hashhandler(PG_FUNCTION_ARGS)
{
/* ... */
}
IndexBuildResult *
hashbuild(Relation heap, Relation index, IndexInfo *indexInfo)
{
/* ... */
}
void
hashbuildempty(Relation index)
{
/* ... */
}
Notice the code style, each function name is at the beginning of the line and
is followed by (
character. It means you can find any function's body by name
across the entire source tree:
$ grep -rni --include='*.c' ^hashbuild\(
src/backend/access/hash/hash.c:115:hashbuild(Relation heap, Relation index, IndexInfo *indexInfo)
grep
tells us the file path and the line number, and it works with any
function. As Neovim allows specifying the line number to jump to, I can
reach that function directly with nvim +115 src/backend/access/bash/bash.c
.
In fact, this can even be scripted:
#!/bin/bash
# Find a given function witin a source tree and open the file at the exact
# location in $EDITOR.
# Terminate the script execution on any non-zero exit code.
set -e
_func_name=''
# There are function names with underscores, like `init_custom_variable(...)`.
# I would rather type `init custom variable` than `init_custom_variable`, and
# then let the script substitute all space characters with underscores.
#
# Join arguments with '_'.
for _arg in "$@"
do
_func_name="${_func_name}_${_arg}"
done
# Remove the leading '_'.
_func_name="${_func_name#_}"
# Find that function.
_location=$(grep -rni --include='*.c' "^$_func_name(" | head -n 1)
# Parse its location and line number from grep output.
IFS=':'
read -r -a _lparts <<< "$_location"
# Open the file.
$EDITOR +"${_lparts[1]}" "${_lparts[0]}"
There is a function named init_custom_variable(...)
somewhere within the
source tree. To immediately open its body in my text editor I just have to type
ffunc init custom variable
. ffunc
script will grep
the function, parse
its location and open it in my Neovim.
Code comments atop function definitions are extremely helpful, which is why you must configure your code editor to be able to quickly jump to them:
/*
* execCurrentOf
*
* Given a CURRENT OF expression and the OID of a table, determine which row
* of the table is currently being scanned by the cursor named by CURRENT OF,
* and return the row's TID into *current_tid.
*
* Returns true if a row was identified. Returns false if the cursor is valid
* for the table but is not currently scanning a row of the table (this is a
* legal situation in inheritance cases). Raises error if cursor is not a
* valid updatable scan of the specified table.
*/
bool
execCurrentOf(CurrentOfExpr *cexpr,
ExprContext *econtext,
Oid table_oid,
ItemPointer current_tid)
{
Meson compilations of PostgreSQL yield compile_commands.json
,
take a peek at it:
[
{
"directory": "/home/uninstance/dev/c/rel16/build",
"command": "ccache cc -Isrc/port/libpgport_srv_crc.a.p -Isrc/include -I../src/include -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O2 -g -fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation -Wno-stringop-truncation -fPIC -DBUILDING_DLL -msse4.2 -MD -MQ src/port/libpgport_srv_crc.a.p/pg_crc32c_sse42.c.o -MF src/port/libpgport_srv_crc.a.p/pg_crc32c_sse42.c.o.d -o src/port/libpgport_srv_crc.a.p/pg_crc32c_sse42.c.o -c ../src/port/pg_crc32c_sse42.c",
"file": "../src/port/pg_crc32c_sse42.c",
"output": "src/port/libpgport_srv_crc.a.p/pg_crc32c_sse42.c.o"
},
{
"directory": "/home/uninstance/dev/c/rel16/build",
"command": "ccache cc -Isrc/port/libpgport_srv.a.p -Isrc/include -I../src/include -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O2 -g -fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation -Wno-stringop-truncation -fPIC -DBUILDING_DLL -MD -MQ src/port/libpgport_srv.a.p/bsearch_arg.c.o -MF src/port/libpgport_srv.a.p/bsearch_arg.c.o.d -o src/port/libpgport_srv.a.p/bsearch_arg.c.o -c ../src/port/bsearch_arg.c",
"file": "../src/port/bsearch_arg.c",
"output": "src/port/libpgport_srv.a.p/bsearch_arg.c.o"
},
{
"directory": "/home/uninstance/dev/c/rel16/build",
It is literally the entire sequence of compilation commands, it will help our
code editor to natigate the source tree. Miscrosoft released Language Server
Protocol specification, it is a standard for code completion/navigation.
For any text editor to provide code completion and advanced features like code
actions, or jumping to definitions in any language, it only has to speak LSP.
Text editors support LSP either natively or via plugins. Neovim, for
instance, supports LSP natively. Clang toolchain provides clangd,
the language server for C and C++. This particular language server requires the
aforementioned compile_commands.json
to work.
In summary, for code completion and navigation you need:
- Text editor of your choice with LSP support.
- C/C++ language server, clangd.
-
compile_commands.json
, generated by Meson.
compile_commands.json
gets generated by Meson for PostgreSQL builds, but it
does not work with extensions at the moment. And, obviously, it will not work
with PostgreSQL versions 15 and older, as they do not use Meson. Bear can
generate compile commands for arbitrary projects, including PostgreSQL
extensions; simply start the build like that: bear -- make
.
Here is LSP in action, it allows me to see all the references of portal
variable:
Tree-sitter is super helpful in navigating the code. It is a generic language parser and syntax tree generator. It integrates well with text/code editors. Among many things it provides a proper syntax highlight based on syntax tree (not regular expressions) and allows navigating the project symbols:
Being a good database administrator is not necessary for being a good database developer. Also, there are no people who know all about PostgreSQL, neither developers, nor administrators. Developers in particular, quite often, specialize on certain areas of PostgreSQL.