Friday, March 04, 2005

 

[Oracle]Query data faster using sorted hash clusters

When data is stored in a normal table, the rows are physically stored in allocated blocks in the order in which you insert them into the database.


For example, if you have a table of information about employees, the employees' names would typically be stored in the table in the order in which they were added to the table.

If you have a large number of employees, the table would gradually get slower. You could speed up employee queries if you choose a column that gives a relatively equal distribution of values, such as the department number of the employee, and if you create a cluster table.

In a cluster table, if the employees are in the same department, the rows would physically be stored in the same set of blocks. This makes queries for employees faster since it requires fewer database block reads to retrieve the employees for a specific department. In the non-clustered table, you might have to read every database block to find all the employees.

When you have a large number of keys, you'll start to see performance problems because now you have many cluster blocks. One way to get around this is by providing a hash function to restrict the number of cluster blocks. A hash function takes a numerical value and restricts the range to a predetermined number, but it provides a relatively equal distribution of values. For example, you might create a hash function on the department number that only looks at the last two digits.

One problem with hash functions is that the function value tends to randomize the order that rows are naturally returned. You can usually fix this with an ORDER BY; however, there are cases in which there are a large number of records. Oracle 10g fixes this problem by allowing you to define a "natural order" to data so you can retrieve hash cluster data in the desired order without sorting.

For example, suppose you maintain a database of credit card transactions. You decide that using the credit card number as a cluster key will give you a good distribution of data. But, because there are a large number of credit cards, you use a hash function to restrict the number of cluster blocks. Since you want your data to come back in chronological order for most of your reports, use a sorted hash cluster rather than using ORDER BY in every query.

Here's the syntax:

create cluster credit_cluster
(
card_no varchar2(16),
transdate date sort
)
hashkeys 10000 hash is ora_hash(card_no)
size 256;

create table credit_orders
(
card_no varchar2(16),
transdate date,
amount number
)
cluster credit_cluster(card_no,transdate);

alter session set nls_date_format = "YYYYMMDDHH24MISS";
insert into credit_orders (card_no,transdate,amount)
values ('4111111111111111','20050131000123',57.99);
insert into credit_orders (card_no,transdate,amount)
values ('4111111111111111','20050130071216',16.59);
insert into credit_orders (card_no,transdate,amount)
values ('4111111111111111','20050131111111',39.00);
insert into credit_orders (card_no,transdate,amount)
values ('4111111111111111','20050130081001',25.16);

Notice that I use the new function ORA_HASH to create a numeric hash value for the credit card. Now, you can simply query the data for a single credit card, and it automatically comes back in sorted order, like this:

alter session set nls_date_format = "FMDay, Month ddth, YYYY FMHH:MI:SSAM";
select * from credit_orders where card_no = '4111111111111111';

CARD_NO TRANSDATE AMOUNT
---------------- ---------------------------------------- ------------
4111111111111111 Sunday, January 30th, 2005 07:12:16AM 16.59
4111111111111111 Sunday, January 30th, 2005 08:10:01AM 25.16
4111111111111111 Monday, January 31st, 2005 12:01:23AM 57.99
4111111111111111 Monday, January 31st, 2005 11:11:11AM .39

Scott Stephens worked for Oracle for more than 13 years in technical support, e-commerce, marketing, and software development.

Comments: Post a Comment



<< Home

This page is powered by Blogger. Isn't yours?