Skip to content

Cures

There are two sorts of responses to impaired database performance: immediate and delayed.

Immediate responses tend to be riskier and more consequential, but in some cases, there are no alternatives. If transactions or memory issues are causing the database to fall over, it is sometimes necessary to:

  • Cancel or kill idle/blocking transactions.
  • Restart the database (dire situation).
  • Optimise troublesome queries.
  • Add indices.
  • Rework application logic.

Cancel a query

Cancelling or killing a query is an extreme measure; stopping a data-modifying query can result in data loss, and even cancelling a SELECT query can cause a POST or PATCH operation to fail. As such, we should only cancel or kill queries when we can see that the query is related to a GET endpoint or is otherwise not likely to affect data integrity.

In some cases, however, we may be forced to cancel or kill a data-modifying query. If the query in question is likely to be stuck for an extended period of time and is forcing more and more other stuck queries to build up, it may be worth ending it. There is no hard and fast rule here; we need to treat each case on its own merits and in the context of what else is happening in the DB at that point in time.

Before cancelling or killing a process, however, it is important to copy the query text so that if specific objects are being referenced, we can track them and attend to any issues after the process has been stopped.

To cancel a query, first, get the process ID from the blocked queries script results, then run the following, making sure to replace <pid> with the process ID we found previously:

SELECT
   pg_cancel_backend(
       <pid>
   );

It can take anywhere from a few seconds to a couple of minutes for a query to be cancelled, but in some cases, the system will be unable to cancel it. In these cases, we can resort to killing the process:

SELECT
   pg_terminate_backend(
       <pid>
   );

Only terminate a back-end process if it has been cancelled multiple times and refuses to be cleared from the active processes list; this command has fewer safeguards than the cancel command, so we should only use it if absolutely necessary.

Restart the database

There are some cases we have encountered where the only solution appears to be to restart the database. These are usually due to very expensive queries filling up buffer memory, which results in the buffer IO wait events building up continuously even after the offending query has been terminated. It is important to ensure that we deal with the underlying cause or query that led to such a dire situation, as otherwise we will end up in exactly the same position. By the time we resort to restarting the database we should have already identified the queries that were responsible for it, and we should prioritise fixing those queries as soon as possible.

Optimise troublesome queries

Optimising queries is an entire subject all by itself. The very basics of it are:

  • Run EXPLAIN/ANALYSE for the query (caution: ANALYSE executes the query, so don’t use it for UPDATE/DELETE/INSERT queries).
  • Look for expensive steps.
  • Try to rewrite the query so that it avoids the expensive steps.

Sometimes, there isn’t a better way to write the query, so we either need to add a missing index or rework the application’s logic.

Add indices

In many cases, an index will help tremendously in executing a slow query. There are cases when an extra index can confuse the planner, however, so it is important to work through the EXPLAIN/ANALYSE step before trying to add indices. This is another deep topic, so go for more specific information and examples. Note: Remember to add indices using the CONCURRENTLY keyword. Not doing so will lock the table, which can be disastrous for large write-heavy tables.

Rework application logic

In some rare cases, there is no way to speed up the query, so we need to resort to changing how we interact with the database. This can be done through rollups or batching, requiring a different approach for each problem.