Sunday, 9 August 2009

The Database

Whilst an Oracle instance is made up of many different and various memory areas together with countless different background processes, each doing a specific task, an Oracle database consists of just three types of physical disk file. Theya re all binary files (so don't go opening them in vi or notepad!) and have an internal structure that only an Oracle instance can make sense of. Taking each of the file types in turn, therefore, we have:

Control Files

Every database must have at least one control file. It is a small binary file (typically, no more than about 10MB in size) and has two main jobs. In the first place, it points to the physical location of all the other files that make up the database. That is, it physically stores the full path and file name of every other constituent file. Secondly, it stores the latest checkpoint change number, which is the timestamp written by the CKPT process at the end of each checkpoint event. This is the "CKPT timestamp" I mentioned earlier.

Other files in the database get timestamped by CKPT at every checkpoint -but the control file knows how to find every other file in the database, it must be the one file which is read first whenever a database is opened. That means the timestamp stored in the control file is read first, and is assumed to be the timestamp to which all the other files in the database will agree. In a sense, therefore, you can say that the control file is the database's master clock.

In short, the control file tells us where everything else in the database can be found and how old it should be when we find it. If you lose your control file, therefore, you are in deep trouble: not only can your instance not find out where every other file is located, but it won't have any idea what checkpoint change number everything should be up to, and therefore SMON has no basis for working out whether the database is clean and consistent. For this reason, although you technically only need to have one control file, sensible people will multiplex their control files, so that the database sees two (or more) copies of it, and will therefore maintain both (or all) copies identically. At that point, if you lose one copy of the control file, there's another copy which you can use to recover with, and no major harm is done.

Note that multiplexing your control files is not the same thing as mirroring them. In a hardware mirror, the instance sees and maintains only one control file copy. Whatever is done to that one file, the hardware kicks in and replicates to a mirror copy. Multiplexing means telling the instance that there are two or more files to write to, and anything written by the instance to one copy is written afresh by the instance to the second copy. To really protect your control file, you really want to do both: multiplex two or three times, and then mirror: potentially, you'll have four or six copies of the control file, and if one copy is lost, there are three or five others you can rely on for recovery purposes.

Data Files

Data files are, funnily enough, where your data is stored, in tables, indexes, clusters and other more exotic kinds of storage structures. All Oracle databases start out with at least two data files: one called system and the other called sysaux (sysaux is new in 10g). Both these files store special system-related data, such as the data dictionary tables which tell us what users exist, what privileges they have, what tables have been created and what columns they contain. Although there's nothing stopping you from creating your own tables inside system or sysaux, it's a bad idea to do so: having good access to the data dictionary is extremely important, because it's needed every time a SQL statement gets parsed. Having your own tables being worked on inside the same file would cause I/O issues, and slow down parsing.

Therefore, although every database starts out with just two data files, you will soon begin to add more -and in fact you can add up to another 64,000 or so data files if you really needed to, each up to several Terabytes in size. The entire collection of data files in an Oracle database can therefore get significantly big!

Internally, data files are organised into discrete 'chunks' of data known as an Oracle block. Blocks are usually 2K, 4K, 8K, 16K or (on some O/Ses only) 32K in size. Once you've specified a block size, that's it: you can't change your mind and ask for an 8K-block data file to re-organise itself so that its blocks are now 16K in size. Whatever block size you decide to use, that is then the smallest unit of I/O that Oracle will use when reading or writing to the data file: ask to see one column of one row in a table, and an entire 16K (or 8K etc.) block is read instead.

The very first blocks of any data file are known as the data file header, and in the header of every data file, CKPT periodically writes the latest checkpoint change number. All data files in a healthy database must have the same checkpoint change number stored in their headers, and all must agree with whatever checkpoint change number CKPT also wrote into the control file. If these conditions are met, we say that the database is consistent. (Note that there one or two exceptions to the 'every file must have the same checkpoint number. Files which are marked read-only, for example, are allowed to be different).

Although a data file doesn't care what data is stored inside itself, it is common practice to make sure that each data file only stores one 'type' of data. For example, you will often see a data file which stores only index data, and another one which stores only tables -just as the system data file is supposed to store only the data dictionary information. Such 'specialisation' in the type of data stored in a data file is a management convenience only and it requires constant vigilence to make sure an index, for example, is not accidentally created inside the tables' usual data file (though no actual harm will arise if such a thing were to occur).

Online Redo Logs

There must be a minimum of two online redo logs, which are then each known as a redo log group. Each group is usually not that big (say, a couple of hundred megabytes), but that obviously depends on the specific requirements of each database installation.

Log Writer saves the contents of the Log Buffer to one of the log groups -which is therefore known as the current log group. When the current log is filled to capacity, Log Writer switches to writing to the other group. When that in turn is filled, Log Writer switches back to the first group and begins to over-write its contents. In this way, a couple of relatively small online logs are re-used in a cyclical fashion.

The contents of the online logs is known generically as redo. Redo lets the database re-perform transactions should the database ever need recovering. Therefore, losing redo is not good -in fact, it's about as close to a disaster as it ever gets with Oracle. Therefore, just as with the control files, each online redo log group should be multiplexed so that Log Writer writes into multiple copies of the same group. Lose one of the copies (known as a redo log member), and the other member is still able to be used and no redo has been irretrievably lost. Multiplexing of redo log groups is not actually compulsory... but if you care about not losing any of your data, it ought to be considered so.

Whereas data files are internally made up of Oracle blocks, redo logs are internally made up of file system blocks which are almost always 512 bytes in size. That size is a product of how a disk is formatted with a file system and is therefore not configurable from within Oracle at all. Some documentation available on the Internet refers occasionally to 'redo log blocks', but they're really just talking about 'file system blocks'.

Summary of the Oracle Database

Oracle databases can be huge: up to 8 Exabytes (that's eight million terabytes!) in fact. But however big they get, they are always only ever made up of three sorts of file: a 10MB-ish control file, two or more 100MB-ish online redo logs, and large numbers of extremely large data files. (The sizes mentioned here are just typical sorts of values. Specific requirements will mean each database is different, though).

Just as we can say 'An instance = SGA + Background Processes", so we can therefore say:

Database = Control Files + Data Files + Online Redo Logs

Each file type has a specific job to do. Data files store data (yours and the database's internal housekeeping variety). Redo logs store redo, to use in case of recovery of transactions. Control files store the location of every other file and the master timestamp for the database as a whole.

All database files are forever being read from, and written to, by background and other processes, and losing any one of them is not an option. If you lose just one of these files, your database must be considered damaged; it will probably cease functioning and the instance managing it will probably crash; and you will have to perform some sort of recovery. We can use multiplexing techniques to minimise the damage done by the loss of one file; we can also use hardware mirroring to protect against hardware failure. A truly safe database will be using both techniques.

No comments: