Jan 05, 2008 13:51
A couple of days ago I debugged a performance issue that was coming up
when trying to remove rows from tables in a PostgreSQL database.
There were three tables involved:
parent:
id SERIAL PRIMARY KEY
child_a:
id SERIAL PRIMARY KEY
parent_id INTEGER NOT NULL ON DELETE CASCADE
child_b:
id SERIAL PRIMARY KEY
parent_id INTEGER NOT NULL ON DELETE CASCADE
The parent table contained about 1400 rows, 700 of which I wanted to
delete. The child_a and child_b tables contained 5 million and 2
million rows, respectively, related to the 700 parent rows I wanted to
delete. The first thing I did was run a delete query without changing
anything about the schema. The query took 198 minutes to run.
The first obvious problem, exposed by PostgreSQL's EXPLAIN statement,
was that the foreign key columns in the child tables weren't indexed.
Running the same delete query took 97 minutes after adding indexes for
the parent_id columns in the child_a and child_b tables. A
significant win, but still far too long. It's not acceptable to run a
query on our production servers that would effectively make the
database unusable for 97 minutes.
I dug further into PostgreSQL's EXPLAIN documentation and discovered
EXPLAIN ANALYZE. When the ANALYZE keyword is provided the query is
executed to help produce a more detailed query plan. Running the
query with ANALYZE showed that a significant portion of the time was
taken up with foreign key triggers. I wondered if the issue was
related to foreign key constraints. I removed the constraints
entirely and ran queries to remove the same set of data. They ran in
77 seconds. What a huge performance gain! I then wondered if the
issue wasn't really with the foreign key constraints themselves but
with the ON CASCADE DELETE rules. I added the constraints back
without the ON CASCADE DELETE rules and the delete query ran in 148
seconds. That's still fast enough without giving up the safety
provided by foreign key checks.
The conclusion is that ON CASCADE DELETE is not suitable for anything
beyond very small datasets.
postgres database