Thursday, 6 August 2009

The Background Processes

A memory area, like the SGA, is useless unless something (or, more likely, several somethings) can write data into it and read data out of it. You need, in short, a set of processes which know how to interact with the various memory areas which make up an Oracle SGA. There are indeed several such processes, and they are collectively known as background processes, because they run and do their work in the background, without interfering in any obvious way with what the users of the database are trying to achieve.

There are five compulsory background processes, and all the rest are optional. Taking them in turn...
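
If you want to see which background processes a given instance is actually running, a quick and purely illustrative way is to query the V$BGPROCESS view from SQL*Plus; the sketch below assumes you're connected as a suitably privileged user:

    -- List the background processes actually running in this instance.
    -- (A PADDR of '00' means that particular process hasn't been started.)
    SELECT name, description
    FROM   v$bgprocess
    WHERE  paddr <> '00'
    ORDER  BY name;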

SMON (System Monitor)

SMON checks the database for consistency and health every time you start up an instance and try to use it to open a database. If the database is, for whatever reason, inconsistent, SMON will either fix the problem entirely automatically, if it is in its power to do so, or it will prompt you to rectify the problem if it can't do it itself. SMON, in short, either performs or prompts for the performance of some sort of recovery. A recovery that SMON can perform entirely automatically is known as an instance recovery. One which requires manual intervention from the DBA is known as a media recovery.

A reasonable question to ask is, 'On what basis does SMON assess a database as being healthy or in need of recovery?'. To which the answer is: checkpoint change numbers. Periodically, the database is "timestamped" with a new sequence number (the latest number from a pool of ever-increasing sequence numbers). So long as every file in the database has been stamped with the same number, we can say that the database is 'consistent to that point of time'. If all the checkpoint change numbers that SMON reads during the database opening procedure, and which are stored in the headers of every physical file constituting the database, are the same, the database is healthy. If they are not all the same, and if there are no good reasons for the difference, the database is presumed to have suffered some sort of failure and a recovery procedure is therefore invoked.
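
If you want to see the numbers SMON is comparing, something like the following sketch (run as a privileged user against the standard V$ views) shows the checkpoint change number recorded in the controlfile alongside the one recorded in each datafile's header; on a healthy, open database they should all agree:

    -- Checkpoint change number according to the controlfile...
    SELECT checkpoint_change# FROM v$database;

    -- ...and according to each datafile's own header.
    SELECT file#, checkpoint_change#
    FROM   v$datafile_header
    ORDER  BY file#;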

CKPT (Checkpoint)

It is then reasonable to ask a further question: 'What writes the checkpoint change number, which SMON reads, into the headers of each database file?'. To which the answer is, 'Another background process called Checkpoint, abbreviated as CKPT'. The occasions when CKPT writes its information into the headers of all database files are known as checkpoint events. Every transaction that takes place on an Oracle database (that is, every insert, update or delete of a record) causes a system-wide sequence number called the System Change Number, or SCN, to be incremented. At a checkpoint event, the latest SCN is deemed to be the latest checkpoint change number, and it is this CCN which is written by CKPT into the headers of all database files.
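
You can watch this happening for yourself. The sketch below (assuming a 10g-or-later database and a privileged connection) notes the current SCN and checkpoint change number, forces a checkpoint event, and then re-queries: the checkpoint change number should have jumped up to roughly whatever the SCN was at the moment of the checkpoint:

    -- Before: current SCN and the last recorded checkpoint change number.
    SELECT current_scn, checkpoint_change# FROM v$database;

    -- Force a checkpoint event right now.
    ALTER SYSTEM CHECKPOINT;

    -- After: checkpoint_change# should now be close to the earlier SCN.
    SELECT current_scn, checkpoint_change# FROM v$database;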

It is fair to say that, during prolonged database operation, CKPT will forever be writing new checkpoint change numbers into the headers of all the database files; it is SMON's job, at the opening of the database, to ensure that every constituent component of the database agrees on what the latest checkpoint change number should be. If they are all in agreement, the database opens. If there is disagreement, recovery is called for.

To make things even simpler: SMON reads what CKPT writes, and both processes are intimately involved in identifying when databases need recovery, and how much recovery they need, if any.

DBWn (Database Writer)

Database Writer is the process responsible for periodically saving the modified contents of the Buffer Cache back down to the physical database files. By writing modified buffers back to disk, Database Writer makes that data safe, immune from the vagaries of operating system crashes or power failures.

Since a buffer in memory that contains modified data is known as a dirty buffer, and because Database Writer's job is, in effect, to re-synchronise the version of data on disk with the version of data held in the Buffer Cache, we sometimes say that Database Writer's job is to clean buffers.

Furthermore, since a clean buffer only contains data which could simply be re-read off disk (if we had to), it is OK to over-write the contents of a clean buffer. By contrast, a dirty buffer is the only version of a modified buffer -it hasn't yet been saved to disk- so it would not be very sensible to let you over-write its contents. Therefore, we can also say that Database Writer's job is to permit the recycling of the Buffer Cache: by 'cleansing' buffers, it makes them re-usable.
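
If you're curious how much of your Buffer Cache is dirty at any given moment, a rough sketch like this one gives you an idea (it uses the standard V$BH view, needs privileged access, and can be expensive to run against a very large cache):

    -- Count clean ('N') versus dirty ('Y') buffers currently in the Buffer Cache.
    SELECT dirty, COUNT(*) AS buffers
    FROM   v$bh
    GROUP  BY dirty;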

Although we call the process 'Database Writer', since 8.0 it's been possible to have multiple independent database writers, and that's why you see an 'n' in its abbreviated name: on a production system where several writers have been configured, you'd see processes DBW0, DBW1, DBW2 and so on.
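
To see how many Database Writers your own instance has been given, something along these lines will do (SHOW PARAMETER is a SQL*Plus command; the second query just confirms which DBWn processes are actually up):

    -- How many Database Writers has this instance been configured with?
    SHOW PARAMETER db_writer_processes

    -- Which DBWn processes are actually running?
    SELECT name
    FROM   v$bgprocess
    WHERE  name LIKE 'DBW%' AND paddr <> '00'
    ORDER  BY name;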

LGWR (Log Writer)

As Database Writer takes the contents of the Buffer Cache and writes them to disk, so the Log Writer background process periodically saves the contents of the Log Buffer to disk. The Log Buffer, you will recall, stores redo -a series of instructions about how to re-perform (and hence recover) data modifications. Log Writer therefore makes the record of transactions that was stored originally in memory permanently safe by writing them to hard disk.

In the process, Log Writer permits the 'recycling' of the contents of the Log Buffer, just as Database Writer allows data buffers to be re-used: you can't over-write the contents of any part of memory if they're the only record you've got of something happening to the database, but once there's a copy of the old stuff on disk, the memory in the Log Buffer can be re-used by new redo entries.
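
As a rough sketch, you can see both how big your Log Buffer is and how much work Log Writer has been doing since the instance started by querying a couple of the standard V$ views:

    -- Size of the Log Buffer, as carved out of the SGA.
    SELECT name, bytes FROM v$sgastat WHERE name = 'log_buffer';

    -- Total redo generated, and the number of times Log Writer has written it out.
    SELECT name, value
    FROM   v$sysstat
    WHERE  name IN ('redo size', 'redo writes');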

There is only ever one Log Writer process. Unlike Database Writer, for example, which can have several 'clone' processes sharing the work burden, redo log records must be written to disk sequentially -and that means only one process can be in charge of working out what to write to disk. If that wasn't true, you might come to grief when a series of transactions have dependencies. If you first create a new product record and then raise a new sale record to indicate that the new product has just been sold, for example, you need the product transaction to be recorded before the sale transaction -otherwise, during a recovery, you might end up trying to sell a product that doesn't exist yet. To prevent that logical nonsense from happening, therefore, a single Log Writer process writes everything in the correct sequence.

Whilst it guarantees the correct order of events, though, only having one Log Writer does sometimes mean that Log Writer becomes a bottleneck on the system: there's no way to speed it up by sharing its work amongst other processes, for example. That's why the files to which Log Writer saves data (called redo logs) should ideally be stored on the very fastest hard disks you have available. It's far more important to let Log Writer write as fast as possible than it is to let Database Writer write quickly, for example.
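
Two quick sketches you might use to act on that advice: V$LOGFILE tells you where your online redo logs actually live (so you can check they're on fast storage), and V$SYSTEM_EVENT gives a rough idea of how much time has been spent waiting on Log Writer:

    -- Where are the online redo logs?
    SELECT group#, member FROM v$logfile ORDER BY group#;

    -- How long have sessions, and LGWR itself, been waiting on redo writes?
    SELECT event, total_waits, time_waited
    FROM   v$system_event
    WHERE  event IN ('log file sync', 'log file parallel write');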

PMON (Process Monitor)

Process Monitor keeps an eye on all the other processes. If any of them hang or die, PMON can try to re-start them -and sometimes it actually succeeds! If it can't, then it takes appropriate action -which is, quite often, to crash the entire instance. PMON also keeps an eye on users: if they are in the middle of a transaction, for example, and their PC hangs or the network cable gets disconnected, PMON will spot the abnormal disconnection of the user and roll back whatever transaction they happened to be in the middle of. That means locks on rows are removed, memory is freed and any other 'database resources' the user was consuming at the time get reclaimed.

ARCn (Archiver)

I've mentioned already that Log Writer allows us to re-use the Log Buffer by periodically saving its contents to disk, by writing them to one of the online redo logs. If it writes enough redo to these logs, Log Writer must eventually run out of space -and must, at that point, begin to overwrite the redo previously stored in the online logs. Put another way: online redo log contents are recycled. That in turn must mean there is a limit to how far back you can go and still hope to be able to re-perform transactions during a recovery -and that is indeed the case.

Log Writer on its own protects the database for only the relatively short amount of time it takes to fill up all online logs. If you want more protection than that, you need to copy the contents of the online logs out to some more permanent store before you over-write them... and that is exactly what the Archiver Process does. Once one online log has filled up, it copies its contents to an archive redo log. The online log can then be over-written whenever LGWR feels like it and yet there is a continual stream of redo available from the archives. Recoveries going back days, weeks or months therefore remain entirely possible.
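
You can watch the recycling happen in the V$LOG view: the group marked CURRENT is the one Log Writer is writing to right now, and the ARCHIVED column tells you whether Archiver has copied a given group yet (a sketch only; run it as a privileged user):

    -- One row per online redo log group: its sequence number, its status,
    -- and whether its contents have been archived yet.
    SELECT group#, sequence#, status, archived
    FROM   v$log
    ORDER  BY group#;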

Archiver therefore makes a permanent record of redo. With that permanent record, you can guarantee recovery of a database without loss of committed data. Without it, you have to pray the redo you want to re-play is still available in the online logs, and it might or might not be. Technically, therefore, although Archiver is an optional background process, every production database that cares about not losing committed data would have it switched on.
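
Switching it on means running the database in archive log mode. As a sketch (connected as SYSDBA, and remembering that the database must be mounted rather than open when you issue the ALTER DATABASE command), the steps look like this:

    -- Check whether archiving is currently enabled (a SQL*Plus command).
    ARCHIVE LOG LIST

    -- Enable it: the database must be mounted, not open, at the time.
    SHUTDOWN IMMEDIATE
    STARTUP MOUNT
    ALTER DATABASE ARCHIVELOG;
    ALTER DATABASE OPEN;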

You'll note that the process name abbreviation includes an 'n', like Database Writer does. That's for the same reason, too: just as you can have up to ten Database Writers, so you can have up to ten Archivers, and hence the processes get named ARC0, ARC1, ARC2 and so on.
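
The cap is set by an initialisation parameter, and (as with Database Writer) you can check which Archivers are actually running; a quick sketch:

    -- Upper limit on the number of ARCn processes Oracle may start.
    SHOW PARAMETER log_archive_max_processes

    -- Which Archiver processes are actually up right now?
    SELECT name
    FROM   v$bgprocess
    WHERE  name LIKE 'ARC%' AND paddr <> '00'
    ORDER  BY name;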

Others

I've just listed the six background processes which are most closely involved in saving your data, saving the instructions for how to modify your data and checking the health and consistency of your database. Five of them are compulsory and one (ARCn) should be considered almost so.

There are, however, dozens of additional background processes that might be running in any given Oracle instance, depending on what extra software has been installed or what additional database options have been implemented. If you configure an Oracle RAC, for example, you will immediately get five additional background processes (LMON, LMD, LMS, LCK, DIAG). If you start to schedule database jobs, you'll see a new process called CJQ0 ("coordinator of the job queue"). If you configure the database to run in what is known as Shared Server mode, you'll see a lot of processes called S000, S001 and so on (shared servers).
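
Two of those 'extras' are switched on by nothing more than an initialisation parameter, so here's a quick sketch of how you'd check them (SHOW PARAMETER is a SQL*Plus command):

    -- A non-zero value here brings up CJQ0, the job queue coordinator.
    SHOW PARAMETER job_queue_processes

    -- A non-zero value here brings up the S000, S001... shared server processes.
    SHOW PARAMETER shared_servers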

When you install or configure these 'exotic' configurations of Oracle, it's as well to know what the extra processes are called, what they do and how they do it. But for now, you simply need to know that there can be many more background processes running than the quick half dozen I've described to you so far -but that half dozen are 'core', and every other background process you might see is a configurable extra.
