The Weekend DBA: May 2009

Wednesday, 20 May 2009

INSTR

Oracle INSTR function
The Oracle function instr returns an integer indicating the position of the character in string that is the first character of this occurrence.

This function has the following syntax:

instr(string, substring [,position [,occurrence]])

with:
string: the string that is searched.
substring: the substring which we are looking for in the string
position: The position from which we start the search (an integer value). If position is negative, then Oracle counts and searches backward from the end of string. If omitted, this defaults to 1.
occurrence is an integer indicating which occurrence of string Oracle should search for. The value of occurrence must be positive. If this is omitted, this defaults to 1.

Both string and substring can be any of the datatypes CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB. The value returned is of NUMBER datatype
If no value is found, the instr will return the value 0.

Examples

INSTR('CORPORATE FLOOR','OR', 3, 2) = 14
INSTR('CORPORATE FLOOR','OR', -3, 2) = 2
INSTR('ab ab ab','ab') = 1
INSTR('ab ab ab','ab',1,2) = 4
INSTR('ab ab ab','ab',2,2) = 7
INSTR('abcabcabcdef','de') = 7

Wednesday, 6 May 2009

External Tables (again)

External Tables

External tables are operating system flat files (stored outside of Oracle), but whose definition is maintained in Oracle’s data dictionary. External tables appear as read-only tables to Oracle applications. They’re useful for reference to flat files that you don’t want to load into Oracle tables (perhaps the flat files are big and rarely accessed, for example). Here are the steps to set up an external table:

Create a directory object to define the external table. Example:

CREATE OR REPLACE DIRECTORY test_dir AS ‘c:\exttabs\data’;

GRANT privileges on that directory object
Create the external table definition with the CREATE TABLE command. The key clause here is ORGANIZATION EXTERNAL. Use keyword PARALLEL if you want to enable parallel query. By default the access driver will be ORACLE LOADER.

You can not place an index on an external table. Remember that since external tables are actually operating system files, one could bypass Oracle’s security to access the table.

Tuesday, 5 May 2009

Oracle Greatest Function

Oracle/PLSQL: Greatest Function

Here is what the documentation says :
GREATEST- GREATEST returns the greatest of the list of expressions. All expressions after the first are implicitly converted to the datatype of the first expression before the comparison. Oracle compares the exprsessions using nonpadded comparison semantics. Character comparison is based on the value of the character in the database character set. One character is greater than another if it has a highercharacter set value. If the value returned by this function is character data, its datatype is always VARCHAR2.

Syntax:
The syntax for the greatest function is:
greatest( expr1, expr2, ... expr_n )
expr1, expr2, . expr_n are expressions that are evaluated by the greatest function.

Example:
SELECT GREATEST ('A', 'B', 'C') "Greatest" FROM DUAL;
Greatest
--------
C

SELECT GREATEST (1, 9, 11) "Greatest" FROM DUAL;
Greatest
--------
11

SELECT GREATEST ('1', '9', '11') "Greatest" FROM DUAL;
Greatest
--------
9

GREATEST is not to be used on a singular item. Although it will return values with single item, as input the effect is not useful. Use GREATEST when you need to get the greatest values from a list of values. MAX on the other hand will return the max value on a column.

Applies To: * Oracle 8i, Oracle 9i, Oracle 10g, Oracle 11g

Sunday, 3 May 2009

RMAN Commands

RMAN

RMAN is short for Recovery Manager.

When the databases use the controlfile as catalog.

Following is a short list of the most common commands.

To connect to RMAN:>>

rman target /

 
 show full backups:>> list backup of database;
 show backups:>> list backup summary;
 show the archivelogs:>> list archivelog all;
 show the datafiles:>> report schema;
 show RMAN settings:>> show all;
 remove a backup:>> delete backupset #;
 remove several backups:>> delete noprompt backup of database completed before 'sysdate-7';
 remove old archivelogs:>> delete noprompt archivelog until time 'sysdate-7';
 test restore of backup:>> restore database validate;
 test restore of controlfile:>> restore controlfile validate;
 check validity:>> crosscheck backup;
 see what needs backing up:>> report need backup;

Friday, 1 May 2009

Oracle Text Indexing

Text Indexes

Oracle Text Indexing

There is a mechanism in the Oracle database to do more than just index a column. These are called domain indexes, one of which is a text index. (Others include Spatial and XML).

Limitations we Know About.

The limitations of the ordinary character column index are all too apparent to many database developers. The current indexes allow for a match from the first position in the column (surname LIKE ‘SMI%’). The match will find all surnames that start with the initial letters ‘SMI’ and have any number of characters following. To gain any benefit from the index the match must be exact in the initial few characters, including case and spacing. To find a match for a column where only the word ending is known (surname LIKE ‘%ING’ or the word does not start at the beginning position in the column (name LIKE ‘%JOHN%’).

Why?

Context indexes can search millions of rows of documents faster than most LIKE operations can run over 10000 rows of character data. You may (will) notice several new tables (created by and maintained by Oracle) for each text index, these hold the reverse tree blobs used to match tokens (search terms) to documents (columns, URLs, files etc....).

The user CTXSYS maintains all the context operations and metadata.

Starting with Context Indexes

At some stage it becomes necessary to search for a word in a column of text. This is where a domain index shows its strengths.

Superficially the index is created as would a normal index on a text column but includes the following :


CREATE INDEX idx_test_index ON mytab(mycol)
INDEXTYPE IS CTXSYS.CONTEXT;

There are many options available here to do with the storage and treatment of the column, but beyond the scope of this paper.

So there is a text index and what has changed.

For a start the way the data is queried has changed.


SELECT *
  FROM mytab
 WHERE contains (mycol, 'dummy') > 0;

This will return all rows from mytab where the column mycol has the word dummy anywhere contained anywhere along its length.

Example

MYTAB : MYCOL
This is the first row for test
This is the second row for test

Searching for rows with the word ‘this’ in would look something like:


SELECT *
  FROM mytab
 WHERE contains (mycol, 'this') > 0;

And the result...... both rows are returned (case is ignored, though there is the option in the contains clause to ask for a case match).

What if you need to narrow things down more and have more than one word to use in the search but you are unsure of their order or if there are words between your search terms?


SELECT *
  FROM mytab
 WHERE contains (mycol, 'this and second') > 0;

And the result would be the second row.

This is the second row for test

Keywords

From the above query you can see that keyword can be included in the search string to and the list of words is extensive but the most common words are

· AND

· OR

· NOT

· NEAR

· FUZZY

· %

· ?

If you are searching for a keyword you can wrap the word in curly braces {and} and it will not be parsed as an instruction, so you could have a search string that looks like

‘{cats} and {and} and {dogs}’

Scoring

Scoring helps to find the most relevant document. This is similar to the mechanism used by Google etc. to return the most relevant results.

It works by ‘counting’ the number of ‘hits’ in each document and ranking the documents in order of hits, unfortunately this means that – like Google – you may not get the results you want merely the results with the most matches of search terms.

Progressive Relaxation
You may be asked to return results, regardless of the accuracy, and this mechanism allows you to ensure that the ‘bad’ results appear last.

The search term is progressively relaxed – according to rules you set up – and the results gathered and returned according to the level of match they achieved.


SELECT /*+FIRST_ROWS(10)*/
       *
  FROM my_search_table
 WHERE contains
          (searchcolumn,
           'odeon cinematransform((TOKENS, "", "", " "))transform((TOKENS,"", "", " and "))transform((TOKENS, "", "", " , "))transform((TOKENS,"$", "", " and "))'
          ,1) > 0;

In the query above we are searching for odeon cinema and the rules are

Search for the phrase ‘odeon cinema’
Search for the words ‘odeon’ and ‘cinema’ anywhere in the document.
Search for the words ‘odeon’ or ‘cinema’ anywhere in the document.
Search for words ending like ‘odeon’ and ‘cinema’

The query can be rewritten as


SELECT /*+FIRST_ROWS(10)*/
       *
  FROM my_search_table se
 WHERE contains
          (searchcolumn,
           'odeon cinemaodeon and cinemaodeon, cinema$odeon and $cinema',1
          ) > 0;

Which may or may not be more apparent.

Advanced features:

Multi table queries:

So far we have only looked at single column indexes, but it is possible to have one index over multiple columns in the same table, multiple columns across multiple tables and master detail table / columns.

Ignoring words

You can create a personalised stoplist where you do not want the word ‘the’ to be indexed. An index created using this stoplist would return no results if ‘the’ was the only search term.

Thesaurus

You can create a thesaurus that holds all the abbreviations / expansions of words e.g. st – street , rd – road etc..... You would store either abbreviation or expansion and can query on either.

Fuzzy matching

In much the same way that SOUNDEX works you can ask for fuzzy matching either accepting the defaults by using the ? (question mark) operator, or a finer control using the fuzzy operator where you specify the ‘degree’ of fuzziness you require and how many steps removed from the original you are willing to look.

These are probably the most commonly used features of text searching, but there are many more, but as with all domain indexes can be tricky (infuriating) to implement.