Build A PostgreSQL Database

PostgreSQL is a free and open-source relational database management system (RDBMS) and is widely used as the primary database for many applications. In this article we will show you how to build a PostgreSQL database on a Linux machine.

Here are syntax notations we use to show the shell and psql commands:

$ psql -d postgres
postgres=# \q
$ 

The "$" sign is the Linux shell prompt. The PostgreSQL client program psql can be used to interact with database server and make queries. The "postgres=#" is the prompt of interactive mode. The "\q" command exits the interactive mode and brings us back to the Linux shell.

Let's start to build the database.

Initialize PostgreSQL Data Directory

Login to a user account that will be the owner of PostgreSQL database. From the PostgreSQL installation, a user account "postgres" may be created, but we can initialize database cluster in any account. For simplicity, we will choose an existing account, codespace, for our database cluster.

Choose or create a directory for PostgreSQL database. This is where the famous environment $PGDATA points to. The directory can be passed as an option (-D) or set as an environment. Let's set it as an environment and so we do not need to type it over and over again:

$ export PGDATA=~/database
$ initdb

After initialization, the contents in the directory look like

$ ls ~/database/
PG_VERSION  global        pg_dynshmem  pg_ident.conf  pg_multixact  pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspc    pg_wal   postgresql.auto.conf
base        pg_commit_ts  pg_hba.conf  pg_logical     pg_notify     pg_serial    pg_stat       pg_subtrans  pg_twophase  pg_xact  postgresql.conf

What we initialized is a new PostgreSQL database cluster. To bring up the PostgreSQL server from the directory, we can

$ pg_ctl start

To understand the files in the directory, let's get familiar with some PostgreSQL terminologies and its object hierarchies:

The object hierarchies form the logical structure of PostgreSQL database as shown below:

All objects in PostgreSQL are managed by their object identifiers (OIDs). The objects are stored in the tablespaces. The tablespace defines the location where PostgreSQL places the physical files of the database objects.

Two default tablespaces are automatically created by initdb: pg_global and pg_default. The pg_global is located at directory $PGDATA/global and the pg_default is at $PGDATA/base.

The pg_global tablespace is used for shared system catalogs. The pg_default tablespace is the default tablespace of the template1, template0 and postgres databases. Let's connect to the default postgres database and have a look of these two tablespaces:

$ psql -d postgres
postgres=# \db+
                                  List of tablespaces
    Name    |   Owner   | Location | Access privileges | Options |  Size  | Description 
------------+-----------+----------+-------------------+---------+--------+-------------
 pg_default | codespace |          |                   |         | 22 MB  | 
 pg_global  | codespace |          |                   |         | 565 kB | 
(2 rows)

postgres=# select oid, spcname from pg_tablespace;
 oid  |  spcname   
------+------------
 1663 | pg_default
 1664 | pg_global
(2 rows)

The two default tablespaces have OIDs 1663 and 1664, respectively. OIDs are 4 byte integers generated internally by the system. Let's also have a look of three default databases generated:

postgres=# \l
                                                     List of databases
   Name    |   Owner   | Encoding | Locale Provider | Collate |  Ctype  | ICU Locale | ICU Rules |    Access privileges    
-----------+-----------+----------+-----------------+---------+---------+------------+-----------+-------------------------
 postgres  | codespace | UTF8     | libc            | C.UTF-8 | C.UTF-8 |            |           | 
 template0 | codespace | UTF8     | libc            | C.UTF-8 | C.UTF-8 |            |           | =c/codespace           +
           |           |          |                 |         |         |            |           | codespace=CTc/codespace
 template1 | codespace | UTF8     | libc            | C.UTF-8 | C.UTF-8 |            |           | =c/codespace           +
           |           |          |                 |         |         |            |           | codespace=CTc/codespace
(3 rows)

postgres=# select oid, datname, dattablespace from pg_database;
 oid |  datname  | dattablespace 
-----+-----------+---------------
   5 | postgres  |          1663
   1 | template1 |          1663
   4 | template0 |          1663
(3 rows)

Three default databases have OIDs 1, 4, and 5. They are used as subdirectory names under $PGDATA/base (location of tablespace pg_default) to store database files:

$ ls -l database/base/
total 12
drwx------ 2 codespace codespace 4096 Nov 21 03:18 1
drwx------ 2 codespace codespace 4096 Nov 21 03:12 4
drwx------ 2 codespace codespace 4096 Nov 21 03:19 5

The template0 is the standard system database and should never be changed after the database cluster has been initialized. It contains the standard objects predefined by PostgreSQL. The initial contents of template1 database is same as template0. You can add more objects to template1, and these objects will be copied into subsequently created user databases. Thus it is the "template" from which the new database is made. The postgres is the default database for users and applications to connect to. It is simply a copy of template1 initially and can be dropped and recreated if needed later.

Create Tablespace

By default, the default tablespace pg_default is used for all newly created objects. We can create new tablespaces for storing the user objects. To do so, choose a directory (or create it) and do

pregres=# create tablespace mytblspc location '/home/codespace/tablespace';
pregres=# \db+
                                           List of tablespaces
    Name    |  Owner   |          Location          | Access privileges | Options |  Size   | Description 
------------+----------+----------------------------+-------------------+---------+---------+-------------
 mytblspc   | postgres | /home/codespace/tablespace |                   |         | 0 bytes | 
 pg_default | postgres |                            |                   |         | 22 MB   | 
 pg_global  | postgres |                            |                   |         | 565 kB  | 
(3 rows)

postgres=# select oid, spcname from pg_tablespace;
  oid  |  spcname   
-------+------------
  1663 | pg_default
  1664 | pg_global
 16388 | mytblspc
(3 rows)

Now we have a new tablespace of 0 size and OID 16388. PostgreSQL creates a symlink in $PGDATA/pg_tblspc to its location:

$ ls -l ~/database/pg_tblspc/
total 0
lrwxrwxrwx 1 codespace codespace 26 Nov 22 15:48 16388 -> /home/codespace/tablespace
ls -l ~/tablespace/
total 4
drwx------ 2 codespace codespace 4096 Nov 22 15:48 PG_16_202307071

Inside the tablespace directory, a PG_server_version directory has been created and all objects created in this tablespace will be put into this directory.

Create Database

By default, all objects are created in default tablespace pg_default unless specified by tablespace clause in create statement. Let's create a database in the newly created tablespace:

postgres=# create database mydb tablespace mytblspc;
postgres=# select oid, datname, dattablespace from pg_database;
  oid  |  datname  | dattablespace 
-------+-----------+---------------
     5 | postgres  |          1663
 16389 | mydb      |         16388
     1 | template1 |          1663
     4 | template0 |          1663
(4 rows)

A new database is created with OID 16389 in tablespace mytblspc (OID 16388):

$ ls -l ~/database/pg_tblspc/16388/PG_16_202307071/
total 4
drwx------ 2 codespace codespace 4096 Nov 22 16:06 16389

If we are not going to use the default postgres database, we can swich to the new database and drop postgres database:

postgres=# \c mydb
mydb=# drop database postgres;

mydb=# \l
                                                     List of databases
   Name    |   Owner   | Encoding | Locale Provider | Collate |  Ctype  | ICU Locale | ICU Rules |    Access privileges    
-----------+-----------+----------+-----------------+---------+---------+------------+-----------+-------------------------
 mydb      | codespace | UTF8     | libc            | C.UTF-8 | C.UTF-8 |            |           | 
 template0 | codespace | UTF8     | libc            | C.UTF-8 | C.UTF-8 |            |           | =c/codespace           +
           |           |          |                 |         |         |            |           | codespace=CTc/codespace
 template1 | codespace | UTF8     | libc            | C.UTF-8 | C.UTF-8 |            |           | =c/codespace           +
           |           |          |                 |         |         |            |           | codespace=CTc/codespace
(3 rows)

Notice that the prompt has been changed to "mydb=#" and remember next time you should use mydb to connect to the database server:

$ psql -d mydb
< PreviousNext >