Fast Parallel Transaction Recovery

A colleague showed me an interesting scenario, after a restore and start up of a database we noticed that we had a lot of work being performed, but no users logged on.
We could see a lot of parallel processes running as SYS.

The processes was waiting on Other. When we selected wait event Other, we noticed that we where waiting on “Wait For a Undo Record”.Smon was waiting for “Wait for stopper event to be increased”

Background

On Wednesday that week the customer requested a restore of the database, at that point we had a hugh delete running.. The delete was on a table with 400 Million rows
and it has been running since Monday the same week.

Analyse

If we have long running transactions (insert,delete,update) aborted by killing the session or aborting the instance,we must clean up the aborted transaction.
We can expect the clean up to take a long time. SMON is handling the roll back and will use parallel roll back when appropriate.
(when recovery work reaches a certain limit)

When the limit is reached smon will start parallel processes handling the roll back.For some transactions PQ slaves can start competing with each other about resources,
when that happens parallel transaction recovery can be slower then serial transaction recovery. At times the database appear to hang when doing parallel transaction recovery.

Parallel Rollback is handled by parameter fast_start_parallel_rollback, valid values are:

False

Parallel rollback is disabled

Low (default)

DOP can be maximum 2 * CPU_COUNT

High

DOP can be maximum 4 * CPU_COUNT

Note Do not change the parameter dynamically on a system with many active transaction, it can cause severe problems.
Rollback strategy for active transactions may change. Do this change with a restart of the database.

How to solve

This type of clean up jobs must be allowed, so the best way is to wait for it to finish.We can monitor the progress by
accessing the view v$fast_start_transactions.

We have some options to influence the performance by adjusting the parameter fast_start_parallel_rollback.
We can also temporarily postpone the clean up by using an event, but remember to find the time to allow
the clean up to finish.

Please read the notes listed below for details how to monitor and handle problems related to fast parallel transaction rollback.
 

References:

  • Database Appears to Hang Waits for “Wait for a undo record” and “Wait for stopper event to be increased” Due to Parallel Transaction Recover [ID 464246.1]
  • Database Hangs Because SMON Is Taking 100% CPU Doing Transaction Recovery [ID 414242.1]
  • Oracle® Database Reference
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s