Sunday, 26 July 2009

Parts of the Database (5) - Data Blocks

Data Blocks

If you could only pick one thing in Oracle to know well, you should pick the DATA BLOCK. It is the key to everything worth doing in Oracle.

If you can understand the data block, everything else in the Oracle RDBMS is easy. EVERYTHING.

Another name for an Oracle data block is a "logical block", but you won't hear many people call it that and I'm glad. Because for now I want you to consider it a physical construct in our layers of logical and physical pieces.

That's because data blocks impact/influence/determine storage, backup and recovery, SQL performance, memory usage, I/O, scalability, and probably anything else I haven't mentioned.

And the cool thing is, data blocks are very easy to understand.

---------------------------------------------
A lot of my developer friends never get proper database instruction. As a result they are forced to guess about how an RDBMS like Oracle works. They know that databases store data and that a document server stores data. So it's easy for them I think to make the analogy that a database is just a fancy document storage tool with some fancy retrieval/manipulation tools. And they think, "So what? My app is cooler than that." So they don't invest too much time or attention in this "flat file repository."

The ones that do have the time and interest sometimes are discouraged when they approach a DBA who is busy with his/her own work and may not even have a mastery of the basics themselves (people get mad when I say it, but if you can run a script, you can find someone who will hire you as a DBA).

That's why you hear DBA's say "it's complicated" and developers say "Oracle sucks."

Hopefully I can do a good enough job explaining these basics that you can avoid these retorts.

--------------------------------------------


WHAT IF ORACLE DIDN'T USE DATA BLOCKS?

If Oracle didn't use data blocks, then data would be stored in one big file and developers would be right to be unimpressed. That would mean that the database would just be a big sequential file and it would be a mess to get the data in and out, you'd have to scan the whole thing every time you wanted the data.

At the other extreme, you could save every row of data as a separate file. Then you'd...well, it's not hard to see how unworkable that would be.

In short, without data blocks, Oracle really would suck.

Data blocks are the way Oracle groups things like rows into manageable sets. How many rows you can fit into a data block depends on how big your row is and how big you make your data block.

And the beautiful thing is, you never have to deal with data blocks directly. You just have to understand them so you can design how you want to use them.

HOW BIG IS A DATA BLOCK?

As big as you tell it to be (kinda).

Oracle doesn't force you into an arbitrary data block size. You determine what your needs are and pick an appropriate size. The most common sizes I see for data blocks in regular (OLTP) databases are 4k or 8k.

-------------------------------------------------------
NOTE: check the block size of your OS before picking a block size, you want the data block size to be some multiple of the OS block size (1...64?) to avoid unnecessary IO. OS's read from disks in blocks as well and you don't want an uneven relationship between the OS and Oracle block sizes.

Now Oracle even allows you have multiple data block sizes within the same database so you can really customize it to your needs.

--------------------------------------------------------

The basic idea in choosing a size is that if you do a lot of single row operations --you tend to like a smaller block size; and if you're doing primarily bulk operations --you tend to like a larger block size. That's mainly because of performance reasons, but we'll get to those when we start discussing SQL.

For now let's just say that for a transactional database we'll use a 4k block size and for a reporting database we'll use a 16k block size. Now let's forget about the reporting database and just focus on the transactional database for this discussion.

I have data blocks that are 4k in size. That means if I have rows that are 1k each (not a small row, but I'm keeping the math easy), I can squeeze 4 rows into each of those data blocks.

Got it? Ok, now forget it, because you won't be able to get 4 1k rows into that block. That's because Oracle is not a flat file repository. And there are two extra parts of a data block that Oracle gives you to make things better: data block overhead and empty space.

FREE SPACE

--The empty space is easy. The whole idea is that if you have a row of 1k which includes a text (VARCHAR2) column with the value "Dratz is a jerk", what if you want to update that record to a value of "Dratz is a genius, I hope he DOESN'T get eaten by a bear."

Dratz is a jerk.
Dratz is a genius, I hope he doesn't get eaten by a bear.

See how much more space the second sentence takes up? If your data block were full, how could you update it? Oracle would let you, it would just move that row to another block that has room to fit it all in and leave a note behind telling the query engine where to find the real row (row migration). That means two reads instead of one to get that data. So it's a good idea to leave free space in every data block that you think has a chance of being updated to included more bytes. Read only data is not updated, so you don't have to waste free space with those.

The free space also allows new rows to be added so you don't have to allocate new data blocks. When you delete rows, you are also freeing up space in a data block that be reused by new rows or expanding existing rows.

OVERHEAD

--The overhead contains some really neat pieces. In later chapters I'll go into some more detail and explain how and why they are used.

First, there are 2 header pieces.

One is fixed in size for every data block and contains information about the data block, like what kind of block it is (data, etc.) and what the unique block address is. That may not seem very useful at this point, but we'll see how important they are later.

The other header piece is variable in size and the important piece to remember here is ITL (Interested Transaction List). This is what allows concurrency.

No comments: