2016/5/20

PostgreSQL Update Internals | jeremiah

PostgreSQL Update Internals


2011-04-19

Jeremiah Peschka

I recently covered the internals of a row in PostgreSQL, but that was just the storage piece. I got more curious
and decided that I would look into what happens when a row gets updated. There are a lot of complexities to
data, after all, and it's nice to know how our database is going to be affected by updates.

Getting Set Up
I started by using the customer table from the pagila sample database. Rather than come up with a set of
sample data, I figured it would be easy to work within an existing set of data.
The first trick was to find a customer to update. Since the goal is to look at an existing row, update it, and then
see what happens to the row, we'll need to be able to locate the row again. This is actually pretty easy to do.
The first thing I did was retrieve the ctid along with the rest of the data in the row. I did this by running:

SELECT ctid, *
FROM customer
ORDER BY ctid
LIMIT 10;

This gives us the primary key of a customer to mess with as well as the location of the row on disk. We're going
to be looking at the customer with a customer_id of 1: Mary Smith. Using that select statement, we can see
that Mary Smith's data lives on page 0 in item 1; her ctid is (0,1).
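
A ctid is a (block number, item number) pair. As a rough Python sketch, assuming the default 8 kB page size and the default 1 GB segment files (131072 pages per file), here is how the text form of a ctid maps to a byte offset in the table's data files. parse_ctid and page_location are hypothetical helpers, not part of PostgreSQL.

```python
import re
from typing import Tuple

BLCKSZ = 8192          # assumed default page size (8 kB)
RELSEG_SIZE = 131072   # assumed pages per 1 GB segment file with 8 kB pages


def parse_ctid(text: str) -> Tuple[int, int]:
    """Split a ctid such as '(0,1)' into (block number, line pointer number)."""
    m = re.fullmatch(r"\(\s*(\d+)\s*,\s*(\d+)\s*\)", text.strip())
    if m is None:
        raise ValueError(f"not a ctid: {text!r}")
    return int(m.group(1)), int(m.group(2))


def page_location(block: int) -> Tuple[int, int]:
    """Return (segment file number, byte offset of the page within that file)."""
    return block // RELSEG_SIZE, (block % RELSEG_SIZE) * BLCKSZ


block, item = parse_ctid("(0,1)")
print(block, item, page_location(block))  # 0 1 (0, 0)
```

Mary's row on page 0 starts in the very first 8 kB of the table's first data file; page 8, where her row ends up later, would start at byte 65536.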

Updating a Row
Now that we know who we're going to update, we can go ahead and mess around with the data. We can take a
look at the row on disk using the get_raw_page function (from the pageinspect contrib module) to examine
page 0 of the customer table. Mary Smith's data is at the end of the page.

Why is Mary's data the first row in the table but the last entry on the page? PostgreSQL writes tuple data
from the end of the page toward the front, but writes item identifiers (line pointers) from the beginning of the
page toward the back, so the first row added to a page ends up physically last.
We already know that Mary's row is in page 0, position 1 because of the ctid we retrieved in our first query.
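
To make that layout concrete, here is a toy Python model of a heap page. It assumes a 24-byte page header, 4-byte line pointers, 8-byte alignment (typical for 64-bit builds) and no special space at the end of the page; the Page class is illustrative only and ignores line pointer flags and tuple header contents.

```python
from dataclasses import dataclass, field
from typing import List

PAGE_SIZE = 8192
HEADER_SIZE = 24   # assumed page header size
ITEMID_SIZE = 4    # assumed size of one line pointer
MAXALIGN = 8       # assumed alignment on a typical 64-bit build


def maxalign(n: int) -> int:
    """Round n up to the next multiple of MAXALIGN."""
    return (n + MAXALIGN - 1) // MAXALIGN * MAXALIGN


@dataclass
class Page:
    lower: int = HEADER_SIZE  # end of the line pointer array (grows up)
    upper: int = PAGE_SIZE    # start of the tuple data area (grows down)
    line_pointers: List[int] = field(default_factory=list)  # tuple offsets for lp 1..n

    def add_tuple(self, tuple_len: int) -> int:
        """Place a tuple and return its 1-based line pointer number."""
        size = maxalign(tuple_len)
        if self.upper - size < self.lower + ITEMID_SIZE:
            raise ValueError("page full")
        self.upper -= size           # tuple data is written from the end of the page
        self.lower += ITEMID_SIZE    # line pointers are written from the front
        self.line_pointers.append(self.upper)
        return len(self.line_pointers)


page = Page()
first = page.add_tuple(100)   # e.g. the first row inserted
second = page.add_tuple(100)
print(first, page.line_pointers[0])   # 1 8088
print(second, page.line_pointers[1])  # 2 7984
print(page.lower, page.upper)         # 32 7984
```

The first tuple gets line pointer 1 but the highest address on the page, which matches seeing Mary's row at the end of the page 0 dump.
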
https://facility9.com/2011/04/postgresqlupdateinternals/

Let's see what happens when we update some of Mary's data. Open up a connection to PostgreSQL using your
favorite interactive querying tool. I use psql on the command prompt, but there are plenty of great tools out
there.

BEGIN TRANSACTION;
UPDATE customer
SET email = 'mary.smith@gmail.com'
WHERE customer_id = 1;

Don't commit the transaction yet!


When we go to look for Mary's data using the first select ordered by ctid, we won't see her data anywhere in those first ten rows.

Where did her data go? Interestingly enough, it's in two places right now because we haven't committed the
transaction. In the current query window, run the following command:

SELECT ctid, xmin, xmax, * FROM customer WHERE customer_id = 1;

After running this, we can see that the customer's row has moved off of page 0 and is now on page 8 in slot 2.
The other interesting thing to note is that the xmin value has changed: it now holds the ID of our open
transaction. Until that transaction commits, no other transaction will be able to see this version of the row.
In another query window, run the previous select again. You'll see that the row is still there with all of the
original data present; the email address hasn't changed. We can also see that both the xmin and xmax columns
now have values. This shows us the range of transactions where this row is valid.
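
The two-window experiment can be modeled in a few lines of Python. This is a deliberately simplified sketch: TupleVersion, update, visible and visible_tids are hypothetical names, the transaction IDs 100, 200 and 300 are made up, and the visibility rule ignores snapshots, aborted transactions, hint bits and command IDs, all of which PostgreSQL's real checks handle.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

TID = Tuple[int, int]  # (block number, line pointer number)


@dataclass
class TupleVersion:
    xmin: int    # transaction that created this version
    xmax: int    # transaction that updated/deleted it (0 = none)
    t_ctid: TID  # itself, or the location of the newer version
    email: str


def update(heap: Dict[TID, TupleVersion], old: TID, new: TID, xid: int, email: str) -> None:
    """MVCC update: write a new version and stamp the old one; nothing is overwritten."""
    heap[new] = TupleVersion(xmin=xid, xmax=0, t_ctid=new, email=email)
    heap[old].xmax = xid
    heap[old].t_ctid = new


def visible(t: TupleVersion, my_xid: int, committed: Set[int]) -> bool:
    """Simplified: created by me or a committed xact, and not deleted by either."""
    created = t.xmin == my_xid or t.xmin in committed
    deleted = t.xmax != 0 and (t.xmax == my_xid or t.xmax in committed)
    return created and not deleted


def visible_tids(heap: Dict[TID, TupleVersion], my_xid: int, committed: Set[int]) -> List[TID]:
    return [tid for tid, t in heap.items() if visible(t, my_xid, committed)]


heap = {(0, 1): TupleVersion(xmin=100, xmax=0, t_ctid=(0, 1), email="old@example.com")}
committed = {100}
update(heap, (0, 1), (8, 2), xid=200, email="mary.smith@gmail.com")  # BEGIN; UPDATE ...
print(visible_tids(heap, 200, committed))  # [(8, 2)]  the updating session
print(visible_tids(heap, 300, committed))  # [(0, 1)]  another session
committed.add(200)                         # COMMIT
print(visible_tids(heap, 300, committed))  # [(8, 2)]
```

Both versions exist on disk the whole time; only the xmin/xmax bookkeeping decides which one each session sees.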

Astute readers will have noticed that the row is on disk in two places at the same time. We're going to dig into
this in a minute, but for now go ahead and commit that first transaction. This is important because we want to
look at what's going on with the row after the update is complete. Looking at rows during the update process is
interesting, but the aftereffects are much more interesting.

Looking at page 0 of the customer table, we can see that the original row is still present. It hasn't been deleted
yet. However, PostgreSQL has marked the row as being old by setting the xmax value as well as setting the
t_ctid value to 000000080002. This tells us that if we look on page 8 in position 2 we'll find the newest
version of the data that corresponds to Mary Smith. Eventually this old row (commonly called a dead tuple) will
be cleaned up by the vacuum process.
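
Decoding t_ctid by hand is mechanical. A t_ctid is an ItemPointerData: a 32-bit block number stored as two 16-bit halves (high, then low), followed by a 16-bit line pointer number. This Python sketch assumes the 12 hex digits quoted in this post are written as block then offset, and also shows how the same pointer would appear as raw bytes on a little-endian machine; both helper names are hypothetical.

```python
import struct
from typing import Tuple


def decode_ctid_hex(digits: str) -> Tuple[int, int]:
    """Decode the 12-hex-digit form used in this post: 8 digits of block, 4 of offset."""
    if len(digits) != 12:
        raise ValueError("expected 12 hex digits")
    return int(digits[:8], 16), int(digits[8:], 16)


def decode_ctid_bytes(raw: bytes) -> Tuple[int, int]:
    """Decode the 6 raw bytes of an ItemPointerData from a little-endian machine."""
    bi_hi, bi_lo, offset = struct.unpack("<HHH", raw)  # block id is two uint16 halves
    return (bi_hi << 16) | bi_lo, offset


print(decode_ctid_hex("000000080002"))                   # (8, 2)
print(decode_ctid_bytes(bytes.fromhex("000008000200")))  # (8, 2)
```

In a raw hex dump on x86 the pointer to (8,2) therefore reads 00 00 08 00 02 00, since each 16-bit half is stored low byte first.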

If we update the row again, we'll see that it moves to a new position on the page, from (8,2) to (8,3). If we dig
back in and look at the row, we'll see that the t_ctid value in Mary Smith's record at page 8, slot 2 is updated
from 000000000000 to 000000080003. We can even see the original row in the hex dump from page
8. We can see the same information much more legibly by using the heap_page_items function:

select * from heap_page_items(get_raw_page('customer', 8));

There are three rows listed on the page. The row with lp 1 is the row that was originally on this page before we
started messing around with Mary Smith's email address. lp 2 is the first update to Mary's email address, and
lp 3 is the second. Looking at t_infomask2 on row 2, we can immediately see two things. I lied: I can't
immediately see anything apart from some large number. But once I applied the bitmap deciphering technology
that I call "swear repeatedly," I was able to determine that this row was HEAP_HOT_UPDATED and contains
10 attributes. Refer to htup.h for more info about the bitmap meanings.
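
The "swear repeatedly" step can be replaced with a few bitwise ANDs. The constants below are the t_infomask2 bits defined in htup.h (moved to htup_details.h in PostgreSQL 9.3). The input 16394 is hypothetical: it's simply what 0x4000 | 10 (a HOT-updated tuple with 10 attributes) works out to, not a value copied from the post.

```python
HEAP_NATTS_MASK = 0x07FF   # low 11 bits: number of attributes
HEAP_HOT_UPDATED = 0x4000  # tuple was HOT-updated
HEAP_ONLY_TUPLE = 0x8000   # tuple is a heap-only tuple


def decode_infomask2(value: int) -> dict:
    """Split a t_infomask2 value into its attribute count and HOT flags."""
    return {
        "natts": value & HEAP_NATTS_MASK,
        "hot_updated": bool(value & HEAP_HOT_UPDATED),
        "heap_only": bool(value & HEAP_ONLY_TUPLE),
    }


print(decode_infomask2(16394))  # {'natts': 10, 'hot_updated': True, 'heap_only': False}
```

The version produced by a HOT update is expected to carry HEAP_ONLY_TUPLE instead, which is the flag to look for on the newest row of a HOT chain.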

The HOTness

PostgreSQL has a unique feature called heap only tuples (HOT for short). The HOT mechanism is designed to
minimize the load on the database server in certain update conditions:
1. A tuple is repeatedly updated
2. The updates do not change indexed columns
3. The new version of the tuple fits on the same heap page as the old one
For definition purposes, an indexed column includes any columns in the index definition, whether they are
directly indexed or are used in a partial-index predicate. If your index definition mentions it, it counts.
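
The HOT rules reduce to a small predicate. Besides leaving indexed columns alone, PostgreSQL also needs room for the new version on the same heap page (see README.HOT). This Python sketch is simplified, and the index set in the example (a primary key on customer_id plus idx_last_name) is an assumption for illustration; check the real definitions with \d customer in psql.

```python
from typing import List, Set


def hot_eligible(index_columns: List[Set[str]], changed: Set[str], fits_on_same_page: bool) -> bool:
    """Simplified HOT test: no column mentioned by any index changed, and the
    new version fits on the same heap page as the old one.

    index_columns holds, for each index, every column its definition mentions:
    key columns, columns used in expressions, and partial-index predicate columns.
    """
    referenced = set().union(*index_columns)
    return fits_on_same_page and not (changed & referenced)


# Assumed indexes, for illustration only: the primary key and idx_last_name.
customer_indexes = [{"customer_id"}, {"last_name"}]
print(hot_eligible(customer_indexes, {"email"}, fits_on_same_page=True))      # True
print(hot_eligible(customer_indexes, {"email"}, fits_on_same_page=False))     # False
print(hot_eligible(customer_indexes, {"last_name"}, fits_on_same_page=True))  # False
```

A hypothetical partial index such as one on last_name WHERE active = 1 would contribute {"last_name", "active"}, so changing active alone would also rule out HOT.
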
In our case, there are no indexes on the email column of the customer table, so our updates don't touch any
indexed columns. The second update, which kept Mary's row on page 8, is a HOT update; the first one moved the
row from page 0 to page 8, and a HOT update has to stay on the same page. Whenever we HOT-update a row,
PostgreSQL is going to write a new version of the row and update the t_ctid column in the previous version to
point at it.
When we read from an index, PostgreSQL is going to read from the index and then follow the t_ctid chain to
find the current version of the row. This lets us avoid writing new index entries when we're updating rows:
PostgreSQL just links the old version of the row to the new one. The indexes will still point to the older row
version, which points to the most current version of the row. We potentially take an extra hit on read, but we
save on write.
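
The write-side savings show up when you count index entries. The toy table below is a hypothetical Python model, not PostgreSQL code: pages hold at most per_page tuples, there's a single index on last_name, and an update counts as HOT only when it stays on the same page and leaves last_name unchanged. Its page numbers (0 and 1) don't match the pagila example; only the pattern matters.

```python
from typing import Dict, List, Tuple

TID = Tuple[int, int]  # (block number, line pointer number)


class ToyTable:
    """Toy heap with one index on last_name; each page holds at most per_page tuples."""

    def __init__(self, per_page: int) -> None:
        self.per_page = per_page
        self.rows: Dict[TID, dict] = {}         # tid -> column values of that version
        self.t_ctid: Dict[TID, TID] = {}        # tid -> newer version's tid (or itself)
        self.index: List[Tuple[str, TID]] = []  # (last_name, tid) index entries

    def _slot_on(self, page: int) -> TID:
        """Next free slot on page, or slot 1 of a new page if page is full."""
        used = sum(1 for blk, _ in self.rows if blk == page)
        if used < self.per_page:
            return (page, used + 1)
        return (max(blk for blk, _ in self.rows) + 1, 1)

    def _write(self, tid: TID, row: dict) -> None:
        self.rows[tid] = row
        self.t_ctid[tid] = tid  # the newest version points to itself

    def insert(self, row: dict) -> TID:
        last_page = max((blk for blk, _ in self.rows), default=0)
        tid = self._slot_on(last_page)
        self._write(tid, row)
        self.index.append((row["last_name"], tid))
        return tid

    def update(self, tid: TID, changes: dict) -> Tuple[TID, bool]:
        """Write a new version; return (new tid, whether the update was HOT)."""
        old_row = self.rows[tid]
        new_row = {**old_row, **changes}
        new_tid = self._slot_on(tid[0])  # try to stay on the same page
        hot = new_tid[0] == tid[0] and new_row["last_name"] == old_row["last_name"]
        self._write(new_tid, new_row)
        self.t_ctid[tid] = new_tid       # the old version forwards to the new one
        if not hot:
            self.index.append((new_row["last_name"], new_tid))  # regular update: new entry
        return new_tid, hot


t = ToyTable(per_page=2)
mary = t.insert({"last_name": "SMITH", "email": "old@example.com"})
t.insert({"last_name": "JOHNSON", "email": "x@example.com"})  # fills page 0
v2, hot1 = t.update(mary, {"email": "mary.smith@gmail.com"})
v3, hot2 = t.update(v2, {"email": "second-update@example.com"})
print(v2, hot1, v3, hot2, len(t.index))  # (1, 1) False (1, 2) True 3
```

The first update spills onto a new page and so adds an index entry, much like Mary's move from page 0 to page 8; the second stays on its page and adds nothing to the index.
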
To verify this, we can look at the index page contents using the bt_page_items function:

SELECT *
FROM bt_page_items('idx_last_name', 2)
ORDER BY ctid DESC;

We can find our record by moving through the different pages of the index. I found the row on page 2. We can
locate our index row by matching up the ctid from earlier runs. Looking at that row, we can see that it points to
the ctid of a row with a forwarding ctid. PostgreSQL didn't add an index entry for the newest version of the row.
Instead, when we do a lookup based on idx_last_name, we'll read from the index, locate any tuples with a last
name of SMITH, and then look for those rows in the heap page. When we get to the heap page, we'll find that the
tuple has been updated. We'll follow the update chain until we get to the most recent tuple and return that data.
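
The read side, going from an index entry to the newest tuple, looks roughly like this Python sketch. The names are hypothetical, the dead flag stands in for a full visibility check, the email values are made up, and the index is assumed to hold an entry for the pre-update version on page 0 plus one for (8,2), the version that forwards to (8,3) in the walkthrough.

```python
from typing import Dict, List, NamedTuple, Tuple

TID = Tuple[int, int]  # (block number, line pointer number)


class HeapTuple(NamedTuple):
    email: str
    t_ctid: TID        # newer version, or itself if this is the newest
    hot_updated: bool  # HEAP_HOT_UPDATED: the next version is on this same page
    dead: bool         # stand-in for a real visibility check


def index_lookup(index: List[Tuple[str, TID]], heap: Dict[TID, HeapTuple], key: str) -> List[TID]:
    """For each index entry matching key, follow its HOT chain and keep live tuples."""
    found = []
    for name, tid in index:
        if name != key:
            continue
        while heap[tid].hot_updated:  # HOT chains never leave the page
            tid = heap[tid].t_ctid
        if not heap[tid].dead:
            found.append(tid)
    return found


heap = {
    (0, 1): HeapTuple("old@example.com", (8, 2), hot_updated=False, dead=True),  # moved pages: not HOT
    (8, 2): HeapTuple("mary.smith@gmail.com", (8, 3), hot_updated=True, dead=True),
    (8, 3): HeapTuple("second-update@example.com", (8, 3), hot_updated=False, dead=False),
}
index = [("SMITH", (0, 1)), ("SMITH", (8, 2))]
print(index_lookup(index, heap, "SMITH"))  # [(8, 3)]
```

The extra hop from (8,2) to (8,3) is the read-time cost mentioned above; the entry pointing at page 0 leads only to a dead tuple and is skipped.
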
If you want to find out more about the workings of the Heap Only Tuples feature of PostgreSQL, check out the
README.HOT file in the PostgreSQL source tree (src/backend/access/heap/README.HOT).
