Sunday, September 25, 2011

EXISTS

Most of us know the SQL keyword EXISTS as the condition with a subquery that you use in a WHERE clause. But if you look at the documentation of EXISTS, you'll see that it says nothing about restricting it to the WHERE clause. It's just a sort of function that accepts a query as input and returns "TRUE if a subquery returns at least one row". The SQL language doesn't have a boolean data type; instead it calls these things "conditions". And according to another part of the documentation, about SQL conditions:

You can use a condition in the WHERE clause of these statements:

DELETE
SELECT
UPDATE

You can use a condition in any of these clauses of the SELECT statement:

WHERE
START WITH
CONNECT BY
HAVING

So you might be tempted to believe it is not possible to use EXISTS in the SELECT clause. However, a CASE expression, more specifically the searched CASE expression, expects a "condition" after the WHEN keyword, as you can see in the syntax diagram. That means we can put an EXISTS condition after the WHEN.
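
Before the real example, a minimal sketch of the construct (my own illustration; any subquery will do):

select case
         when exists (select null from dual)
         then 'subquery returns at least one row'
         else 'subquery returns no rows'
       end as exists_check
  from dual
/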

An example of this came up recently when a colleague asked me for help with a query in his conversion project. His setup resembles this test case:

SQL> create table t1 (id)
2 as
3 select level
4 from dual
5 connect by level <= 4
6 /

Table created.

SQL> alter table t1 add constraint t1_pk primary key (id)
2 /

Table altered.

SQL> exec dbms_stats.gather_table_stats(user,'t1',cascade=>true)

PL/SQL procedure successfully completed.

SQL> create table t2 (id, t1_id, status, filler)
2 as
3 select 1, 1, 'w', lpad('*',1000,'*') from dual union all
4 select 2, 2, 'x', lpad('*',1000,'*') from dual union all
5 select 3, 2, 'y', lpad('*',1000,'*') from dual union all
6 select 4, 3, 'w', lpad('*',1000,'*') from dual union all
7 select 5, 3, 'x', lpad('*',1000,'*') from dual union all
8 select 6, 3, 'z', lpad('*',1000,'*') from dual union all
9 select 7, 4, 'w', lpad('*',1000,'*') from dual union all
10 select 8, 4, 'z', lpad('*',1000,'*') from dual
11 /

Table created.

SQL> alter table t2 add constraint t2_pk primary key (id)
2 /

Table altered.

SQL> alter table t2 add constraint t2_t1_fk foreign key (t1_id) references t1(id)
2 /

Table altered.

SQL> create index t2_i on t2 (status,t1_id)
2 /

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'t2',cascade=>true)

PL/SQL procedure successfully completed.

Two tables with a master-detail relationship. The question was to come up with a query that selects all rows of t1, prints 'yes' in an extra column if there is at least one row in t2 with status x or y, and prints 'no' otherwise.

The first query I came up with used a simple outer join, removing the duplicates with a DISTINCT:

SQL> select /*+ gather_plan_statistics */
2 distinct t1.id
3 , nvl2(t2.id,'yes','no') exists_t2
4 from t1
5 left outer join t2
6 on ( t1.id = t2.t1_id
7 and t2.status in ('x','y')
8 )
9 order by t1.id
10 /

ID EXISTS_T2
----- ----------
1 no
2 yes
3 yes
4 no

4 rows selected.

And it is done in 4 consistent gets using this plan:

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 2mtksgp0pzx59, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ distinct t1.id ,
nvl2(t2.id,'yes','no') exists_t2 from t1 left outer join t2
on ( t1.id = t2.t1_id and t2.status in ('x','y')
) order by t1.id

Plan hash value: 1021492722

------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 |00:00:00.01 | 4 | | | |
| 1 | SORT UNIQUE | | 1 | 4 | 4 |00:00:00.01 | 4 | 2048 | 2048 | 2048 (0)|
| 2 | MERGE JOIN OUTER | | 1 | 4 | 5 |00:00:00.01 | 4 | | | |
| 3 | INDEX FULL SCAN | T1_PK | 1 | 4 | 4 |00:00:00.01 | 1 | | | |
|* 4 | SORT JOIN | | 4 | 4 | 3 |00:00:00.01 | 3 | 2048 | 2048 | 2048 (0)|
| 5 | INLIST ITERATOR | | 1 | | 3 |00:00:00.01 | 3 | | | |
| 6 | TABLE ACCESS BY INDEX ROWID| T2 | 2 | 4 | 3 |00:00:00.01 | 3 | | | |
|* 7 | INDEX RANGE SCAN | T2_I | 2 | 4 | 3 |00:00:00.01 | 2 | | | |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T1"."ID"="T2"."T1_ID")
filter("T1"."ID"="T2"."T1_ID")
7 - access(("T2"."STATUS"='x' OR "T2"."STATUS"='y'))


29 rows selected.

Later, I offered an alternative, after realizing I can also use EXISTS in the SELECT clause, like this:

SQL> select /*+ gather_plan_statistics */
2 t1.id
3 , case
4 when exists
5 ( select ''
6 from t2
7 where t2.t1_id = t1.id
8 and t2.status in ('x','y')
9 )
10 then 'yes'
11 else 'no'
12 end exists_t2
13 from t1
14 order by t1.id
15 /

ID EXISTS_T2
----- ----------
1 no
2 yes
3 yes
4 no

4 rows selected.

So, that's the proof that you can use an EXISTS in a SELECT clause. But this query uses 8 consistent gets (6+2):

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID b13u3f5m9ybuu, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ t1.id , case
when exists ( select '' from t2 where
t2.t1_id = t1.id and t2.status in ('x','y') )
then 'yes' else 'no' end exists_t2 from t1 order
by t1.id

Plan hash value: 1623664937

-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 |00:00:00.01 | 2 |
| 1 | INLIST ITERATOR | | 4 | | 2 |00:00:00.01 | 6 |
|* 2 | INDEX RANGE SCAN| T2_I | 6 | 1 | 2 |00:00:00.01 | 6 |
| 3 | INDEX FULL SCAN | T1_PK | 1 | 4 | 4 |00:00:00.01 | 2 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access((("T2"."STATUS"='x' OR "T2"."STATUS"='y')) AND
"T2"."T1_ID"=:B1)


25 rows selected.

So with this data set it is not really interesting, although the plan is shorter in number of operations. With other data distributions it can be useful though. For example, after blowing up the t2 table:

SQL> insert into t2
2 select n * 1000 + t2.id
3 , t2.t1_id
4 , t2.status
5 , t2.filler
6 from t2
7 cross join (select level n from dual connect by level <= 1000)
8 /

8000 rows created.

SQL> exec dbms_stats.gather_table_stats(user,'t2',cascade=>true,method_opt=>'for all columns size auto')

PL/SQL procedure successfully completed.

Now the left outer join with the distinct has to join the 4 rows of t1 to 3003 rows of t2, and the resulting SQL uses 3017 consistent gets:

SQL> select /*+ gather_plan_statistics */
2 distinct t1.id
3 , nvl2(t2.id,'yes','no') exists_t2
4 from t1
5 left outer join t2
6 on ( t1.id = t2.t1_id
7 and t2.status in ('x','y')
8 )
9 order by t1.id
10 /

ID EXISTS_T2
----- ----------
1 no
2 yes
3 yes
4 no

4 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 2mtksgp0pzx59, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ distinct t1.id ,
nvl2(t2.id,'yes','no') exists_t2 from t1 left outer join t2
on ( t1.id = t2.t1_id and t2.status in ('x','y')
) order by t1.id

Plan hash value: 1021492722

------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 |00:00:00.01 | 3017 | | | |
| 1 | SORT UNIQUE | | 1 | 4 | 4 |00:00:00.01 | 3017 | 2048 | 2048 | 2048 (0)|
| 2 | MERGE JOIN OUTER | | 1 | 4 | 3005 |00:00:00.01 | 3017 | | | |
| 3 | INDEX FULL SCAN | T1_PK | 1 | 4 | 4 |00:00:00.01 | 1 | | | |
|* 4 | SORT JOIN | | 4 | 4 | 3003 |00:00:00.01 | 3016 | 115K| 115K| 102K (0)|
| 5 | INLIST ITERATOR | | 1 | | 3003 |00:00:00.01 | 3016 | | | |
| 6 | TABLE ACCESS BY INDEX ROWID| T2 | 2 | 4 | 3003 |00:00:00.01 | 3016 | | | |
|* 7 | INDEX RANGE SCAN | T2_I | 2 | 4 | 3003 |00:00:00.01 | 13 | | | |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T1"."ID"="T2"."T1_ID")
filter("T1"."ID"="T2"."T1_ID")
7 - access(("T2"."STATUS"='x' OR "T2"."STATUS"='y'))


29 rows selected.

And the second query is now a whole lot more efficient with only 15 consistent gets:

SQL> select /*+ gather_plan_statistics */
2 t1.id
3 , case
4 when exists
5 ( select ''
6 from t2
7 where t2.t1_id = t1.id
8 and t2.status in ('x','y')
9 )
10 then 'yes'
11 else 'no'
12 end exists_t2
13 from t1
14 order by t1.id
15 /

ID EXISTS_T2
----- ----------
1 no
2 yes
3 yes
4 no

4 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID b13u3f5m9ybuu, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ t1.id , case
when exists ( select '' from t2 where
t2.t1_id = t1.id and t2.status in ('x','y') )
then 'yes' else 'no' end exists_t2 from t1 order
by t1.id

Plan hash value: 1623664937

-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 |00:00:00.01 | 2 |
| 1 | INLIST ITERATOR | | 4 | | 2 |00:00:00.01 | 13 |
|* 2 | INDEX RANGE SCAN| T2_I | 6 | 1 | 2 |00:00:00.01 | 13 |
| 3 | INDEX FULL SCAN | T1_PK | 1 | 4 | 4 |00:00:00.01 | 2 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access((("T2"."STATUS"='x' OR "T2"."STATUS"='y')) AND
"T2"."T1_ID"=:B1)


25 rows selected.

The query selects the four rows of T1 with the INDEX FULL SCAN at line 3. The six starts of line 2 can be explained by the INLIST ITERATOR, which splits the T2 access into one probe for status = 'x' and one for status = 'y'. It searches for an 'x' row for all four rows of T1. For t1.id 2 and 3 it finds one, so it doesn't need to look for a 'y' row. Only for t1.id 1 and 4 does it need to search for a 'y' row as well, which explains the 4 + 2 = 6 starts.

Note that not only the data distribution is a factor; the index on (status,t1_id) is needed as well to make a difference here.
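
For completeness: a third way to write this, shown here only as an untested sketch, is a scalar subquery in which the rownum predicate stops the scan after the first matching row:

select t1.id
     , nvl( ( select 'yes'
                from t2
               where t2.t1_id = t1.id
                 and t2.status in ('x','y')
                 and rownum = 1
             )
           , 'no'
           ) exists_t2
  from t1
 order by t1.id
/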

I guess I just want to say that there is an alternative that can be useful in certain situations, one that you might not have thought of initially.




PS: Something completely different: Kamran Agayev has posted an interview with me.

Tuesday, July 12, 2011

Partitioned outer join bug

I ran into a strange bug while constructing an example of how a partitioned outer join works. I used a setup that resembles a situation with timeseries data I recently consulted on. The table can hold multiple timeseries whose data comes from multiple sources. The data itself contains volumes that are reported every five minutes. The table looks like this:

SQL> create table tradevolumes
2 ( timeseries_id number(4)
3 , tradedate date
4 , source_id number(4)
5 , volume number(4)
6 , constraint tradevolumes_pk primary key (timeseries_id,tradedate,source_id)
7 ) organization index
8 /

Table created.


An index organized table, which we fill with some data: 2 timeseries from 3 sources, and 108 five-minute periods:

SQL> insert into tradevolumes
2 select 1 + mod(level-1,2)
3 , trunc(sysdate) + numtodsinterval(5*mod(trunc((level-1)/2),108),'minute')
4 , ceil(level/(2 * 108))
5 , ceil(dbms_random.value(0,1000))
6 from dual
7 connect by level <= 2 * 108 * 3
8 /

648 rows created.


A common problem for timeseries data is missing data. I'll simulate that by deleting several rows:

SQL> delete tradevolumes
2 where ( tradedate
3 between trunc(sysdate) + interval '35' minute
4 and trunc(sysdate) + interval '50' minute
5 or (timeseries_id = 1 and source_id = 3 and volume < 30)
6 or (timeseries_id = 2 and source_id = 2 and tradedate > trunc(sysdate) + interval '20' minute)
7 )
8 /

127 rows deleted.


And now gather some basic statistics for the table:

SQL> exec dbms_stats.gather_table_stats(user,'tradevolumes')

PL/SQL procedure successfully completed.


To get an idea of the data that is now in the table:

SQL> select *
2 from tradevolumes
3 order by timeseries_id
4 , tradedate
5 , source_id
6 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 400
1 12-07-2011 00:00:00 2 265
1 12-07-2011 00:00:00 3 307
1 12-07-2011 00:05:00 1 223
1 12-07-2011 00:05:00 2 975
1 12-07-2011 00:05:00 3 173
1 12-07-2011 00:10:00 1 794
1 12-07-2011 00:10:00 2 17
1 12-07-2011 00:10:00 3 468
1 12-07-2011 00:15:00 1 552
1 12-07-2011 00:15:00 2 979
1 12-07-2011 00:15:00 3 84
1 12-07-2011 00:20:00 1 73
1 12-07-2011 00:20:00 2 710
1 12-07-2011 00:20:00 3 54
1 12-07-2011 00:25:00 1 271
1 12-07-2011 00:25:00 2 643
1 12-07-2011 00:25:00 3 443
1 12-07-2011 00:30:00 1 408
1 12-07-2011 00:30:00 2 526
1 12-07-2011 00:30:00 3 725
1 12-07-2011 00:55:00 1 13
1 12-07-2011 00:55:00 2 943
1 12-07-2011 00:55:00 3 319
... skipped a lot of rows here ...
1 12-07-2011 08:50:00 1 239
1 12-07-2011 08:50:00 2 502
1 12-07-2011 08:50:00 3 297
1 12-07-2011 08:55:00 1 620
1 12-07-2011 08:55:00 2 531
1 12-07-2011 08:55:00 3 377
2 12-07-2011 00:00:00 1 778
2 12-07-2011 00:00:00 2 302
2 12-07-2011 00:00:00 3 959
2 12-07-2011 00:05:00 1 920
2 12-07-2011 00:05:00 2 827
2 12-07-2011 00:05:00 3 395
... skipped a lot of rows here ...
2 12-07-2011 08:45:00 1 495
2 12-07-2011 08:45:00 3 519
2 12-07-2011 08:50:00 1 556
2 12-07-2011 08:50:00 3 193
2 12-07-2011 08:55:00 1 368
2 12-07-2011 08:55:00 3 18

521 rows selected.


To show how a partitioned outer join works, my plan is to add all missing key combinations to the result set for the first hour. So the query only uses the times between 0:00 to 0:55 and we'll ignore the rows between 1:00 and 8:55. This is the data for the first hour:

SQL> select *
2 from tradevolumes
3 where tradedate between trunc(sysdate) and trunc(sysdate) + interval '55' minute
4 order by timeseries_id
5 , tradedate
6 , source_id
7 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 400
1 12-07-2011 00:00:00 2 265
1 12-07-2011 00:00:00 3 307
1 12-07-2011 00:05:00 1 223
1 12-07-2011 00:05:00 2 975
1 12-07-2011 00:05:00 3 173
1 12-07-2011 00:10:00 1 794
1 12-07-2011 00:10:00 2 17
1 12-07-2011 00:10:00 3 468
1 12-07-2011 00:15:00 1 552
1 12-07-2011 00:15:00 2 979
1 12-07-2011 00:15:00 3 84
1 12-07-2011 00:20:00 1 73
1 12-07-2011 00:20:00 2 710
1 12-07-2011 00:20:00 3 54
1 12-07-2011 00:25:00 1 271
1 12-07-2011 00:25:00 2 643
1 12-07-2011 00:25:00 3 443
1 12-07-2011 00:30:00 1 408
1 12-07-2011 00:30:00 2 526
1 12-07-2011 00:30:00 3 725
1 12-07-2011 00:55:00 1 13
1 12-07-2011 00:55:00 2 943
1 12-07-2011 00:55:00 3 319
2 12-07-2011 00:00:00 1 778
2 12-07-2011 00:00:00 2 302
2 12-07-2011 00:00:00 3 959
2 12-07-2011 00:05:00 1 920
2 12-07-2011 00:05:00 2 827
2 12-07-2011 00:05:00 3 395
2 12-07-2011 00:10:00 1 534
2 12-07-2011 00:10:00 2 156
2 12-07-2011 00:10:00 3 2
2 12-07-2011 00:15:00 1 642
2 12-07-2011 00:15:00 2 278
2 12-07-2011 00:15:00 3 76
2 12-07-2011 00:20:00 1 472
2 12-07-2011 00:20:00 2 210
2 12-07-2011 00:20:00 3 549
2 12-07-2011 00:25:00 1 349
2 12-07-2011 00:25:00 3 332
2 12-07-2011 00:30:00 1 371
2 12-07-2011 00:30:00 3 396
2 12-07-2011 00:55:00 1 408
2 12-07-2011 00:55:00 3 818

45 rows selected.


All 12 five-minute periods should be present for the 2 timeseries and 3 sources, so we expect 72 rows. Now watch the bug with a standard partitioned outer join:

SQL> with all_tradedates as
2 ( select trunc(sysdate) + numtodsinterval(5*(level-1),'minute') tradedate
3 from dual
4 connect by level <= 12
5 )
6 select v.timeseries_id
7 , a.tradedate
8 , v.source_id
9 , v.volume
10 from all_tradedates a
11 left outer join tradevolumes v
12 partition by (v.timeseries_id,v.source_id)
13 on (a.tradedate = v.tradedate)
14 order by v.timeseries_id
15 , a.tradedate
16 , v.source_id
17 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 400
1 12-07-2011 00:00:00 2 265
1 12-07-2011 00:00:00 3 307
1 12-07-2011 00:05:00 1 400
1 12-07-2011 00:05:00 2 265
1 12-07-2011 00:05:00 3 307
1 12-07-2011 00:10:00 1 400
1 12-07-2011 00:10:00 2 265
1 12-07-2011 00:10:00 3 307
1 12-07-2011 00:15:00 1 400
1 12-07-2011 00:15:00 2 265
1 12-07-2011 00:15:00 3 307
1 12-07-2011 00:20:00 1 400
1 12-07-2011 00:20:00 2 265
1 12-07-2011 00:20:00 3 307
1 12-07-2011 00:25:00 1 400
1 12-07-2011 00:25:00 2 265
1 12-07-2011 00:25:00 3 307
1 12-07-2011 00:30:00 1 400
1 12-07-2011 00:30:00 2 265
1 12-07-2011 00:30:00 3 307
1 12-07-2011 00:35:00 1 400
1 12-07-2011 00:35:00 2 265
1 12-07-2011 00:35:00 3 307
1 12-07-2011 00:40:00 1 400
1 12-07-2011 00:40:00 2 265
1 12-07-2011 00:40:00 3 307
1 12-07-2011 00:45:00 1 400
1 12-07-2011 00:45:00 2 265
1 12-07-2011 00:45:00 3 307
1 12-07-2011 00:50:00 1 400
1 12-07-2011 00:50:00 2 265
1 12-07-2011 00:50:00 3 307
1 12-07-2011 00:55:00 1 400
1 12-07-2011 00:55:00 2 265
1 12-07-2011 00:55:00 3 307
2 12-07-2011 00:00:00 1 778
2 12-07-2011 00:00:00 2 302
2 12-07-2011 00:00:00 3 959
2 12-07-2011 00:05:00 1 778
2 12-07-2011 00:05:00 2 302
2 12-07-2011 00:05:00 3 959
2 12-07-2011 00:10:00 1 778
2 12-07-2011 00:10:00 2 302
2 12-07-2011 00:10:00 3 959
2 12-07-2011 00:15:00 1 778
2 12-07-2011 00:15:00 2 302
2 12-07-2011 00:15:00 3 959
2 12-07-2011 00:20:00 1 778
2 12-07-2011 00:20:00 2 302
2 12-07-2011 00:20:00 3 959
2 12-07-2011 00:25:00 1 778
2 12-07-2011 00:25:00 2 302
2 12-07-2011 00:25:00 3 959
2 12-07-2011 00:30:00 1 778
2 12-07-2011 00:30:00 2 302
2 12-07-2011 00:30:00 3 959
2 12-07-2011 00:35:00 1 778
2 12-07-2011 00:35:00 2 302
2 12-07-2011 00:35:00 3 959
2 12-07-2011 00:40:00 1 778
2 12-07-2011 00:40:00 2 302
2 12-07-2011 00:40:00 3 959
2 12-07-2011 00:45:00 1 778
2 12-07-2011 00:45:00 2 302
2 12-07-2011 00:45:00 3 959
2 12-07-2011 00:50:00 1 778
2 12-07-2011 00:50:00 2 302
2 12-07-2011 00:50:00 3 959
2 12-07-2011 00:55:00 1 778
2 12-07-2011 00:55:00 2 302
2 12-07-2011 00:55:00 3 959

72 rows selected.


The query returns 72 rows as expected, but look at the volume column. The made-up rows should have a null volume. And there's more: the rows that existed and weren't made up now show the volume of the "00:00:00" row instead of their own value...

You might ask: why did you create 108 rows? That's because I've found that with 107 rows or less the bug doesn't appear; with 108 rows or more it creeps in. Here is the same sequence with a table containing 107 rows:

SQL> drop table tradevolumes purge
2 /

Table dropped.

SQL> create table tradevolumes
2 ( timeseries_id number(4)
3 , tradedate date
4 , source_id number(4)
5 , volume number(4)
6 , constraint tradevolumes_pk primary key (timeseries_id,tradedate,source_id)
7 ) organization index
8 /

Table created.

SQL> insert into tradevolumes
2 select 1 + mod(level-1,2)
3 , trunc(sysdate) + numtodsinterval(5*mod(trunc((level-1)/2),107),'minute')
4 , ceil(level/(2 * 107))
5 , ceil(dbms_random.value(0,1000))
6 from dual
7 connect by level <= 2 * 107 * 3
8 /

642 rows created.

SQL> delete tradevolumes
2 where ( tradedate
3 between trunc(sysdate) + interval '35' minute
4 and trunc(sysdate) + interval '50' minute
5 or (timeseries_id = 1 and source_id = 3 and volume < 30)
6 or (timeseries_id = 2 and source_id = 2 and tradedate > trunc(sysdate) + interval '20' minute)
7 )
8 /

128 rows deleted.

SQL> exec dbms_stats.gather_table_stats(user,'tradevolumes')

PL/SQL procedure successfully completed.

SQL> select *
2 from tradevolumes
3 order by timeseries_id
4 , tradedate
5 , source_id
6 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 159
1 12-07-2011 00:00:00 2 333
1 12-07-2011 00:00:00 3 696
1 12-07-2011 00:05:00 1 674
1 12-07-2011 00:05:00 2 932
... skipped a lot of rows here ...
2 12-07-2011 08:45:00 1 136
2 12-07-2011 08:45:00 3 618
2 12-07-2011 08:50:00 1 705
2 12-07-2011 08:50:00 3 106

514 rows selected.

SQL> with all_tradedates as
2 ( select trunc(sysdate) + numtodsinterval(5*(level-1),'minute') tradedate
3 from dual
4 connect by level <= 12
5 )
6 select v.timeseries_id
7 , a.tradedate
8 , v.source_id
9 , v.volume
10 from all_tradedates a
11 left outer join tradevolumes v
12 partition by (v.timeseries_id,v.source_id)
13 on (a.tradedate = v.tradedate)
14 order by v.timeseries_id
15 , a.tradedate
16 , v.source_id
17 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 159
1 12-07-2011 00:00:00 2 333
1 12-07-2011 00:00:00 3 696
1 12-07-2011 00:05:00 1 674
1 12-07-2011 00:05:00 2 932
1 12-07-2011 00:05:00 3
1 12-07-2011 00:10:00 1 315
1 12-07-2011 00:10:00 2 872
1 12-07-2011 00:10:00 3 778
1 12-07-2011 00:15:00 1 398
1 12-07-2011 00:15:00 2 272
1 12-07-2011 00:15:00 3 239
1 12-07-2011 00:20:00 1 546
1 12-07-2011 00:20:00 2 863
1 12-07-2011 00:20:00 3 839
1 12-07-2011 00:25:00 1 558
1 12-07-2011 00:25:00 2 576
1 12-07-2011 00:25:00 3 296
1 12-07-2011 00:30:00 1 767
1 12-07-2011 00:30:00 2 957
1 12-07-2011 00:30:00 3 935
1 12-07-2011 00:35:00 1
1 12-07-2011 00:35:00 2
1 12-07-2011 00:35:00 3
1 12-07-2011 00:40:00 1
1 12-07-2011 00:40:00 2
1 12-07-2011 00:40:00 3
1 12-07-2011 00:45:00 1
1 12-07-2011 00:45:00 2
1 12-07-2011 00:45:00 3
1 12-07-2011 00:50:00 1
1 12-07-2011 00:50:00 2
1 12-07-2011 00:50:00 3
1 12-07-2011 00:55:00 1 398
1 12-07-2011 00:55:00 2 110
1 12-07-2011 00:55:00 3 629
2 12-07-2011 00:00:00 1 615
2 12-07-2011 00:00:00 2 924
2 12-07-2011 00:00:00 3 526
2 12-07-2011 00:05:00 1 147
2 12-07-2011 00:05:00 2 589
2 12-07-2011 00:05:00 3 779
2 12-07-2011 00:10:00 1 554
2 12-07-2011 00:10:00 2 108
2 12-07-2011 00:10:00 3 9
2 12-07-2011 00:15:00 1 601
2 12-07-2011 00:15:00 2 111
2 12-07-2011 00:15:00 3 526
2 12-07-2011 00:20:00 1 418
2 12-07-2011 00:20:00 2 818
2 12-07-2011 00:20:00 3 578
2 12-07-2011 00:25:00 1 824
2 12-07-2011 00:25:00 2
2 12-07-2011 00:25:00 3 112
2 12-07-2011 00:30:00 1 649
2 12-07-2011 00:30:00 2
2 12-07-2011 00:30:00 3 689
2 12-07-2011 00:35:00 1
2 12-07-2011 00:35:00 2
2 12-07-2011 00:35:00 3
2 12-07-2011 00:40:00 1
2 12-07-2011 00:40:00 2
2 12-07-2011 00:40:00 3
2 12-07-2011 00:45:00 1
2 12-07-2011 00:45:00 2
2 12-07-2011 00:45:00 3
2 12-07-2011 00:50:00 1
2 12-07-2011 00:50:00 2
2 12-07-2011 00:50:00 3
2 12-07-2011 00:55:00 1 651
2 12-07-2011 00:55:00 2
2 12-07-2011 00:55:00 3 641

72 rows selected.


And here the results are good. And when I add the rows for 08:55:00 afterwards:

SQL> insert into tradevolumes
2 select timeseries_id
3 , tradedate + interval '5' minute
4 , source_id
5 , 1000
6 from tradevolumes
7 where tradedate = (select max(tradedate) from tradevolumes)
8 /

4 rows created.

SQL> commit
2 /

Commit complete.

SQL> exec dbms_stats.gather_table_stats(user,'tradevolumes')

PL/SQL procedure successfully completed.

SQL> with all_tradedates as
2 ( select trunc(sysdate) + numtodsinterval(5*(level-1),'minute') tradedate
3 from dual
4 connect by level <= 12
5 )
6 select v.timeseries_id
7 , a.tradedate
8 , v.source_id
9 , v.volume
10 from all_tradedates a
11 left outer join tradevolumes v
12 partition by (v.timeseries_id,v.source_id)
13 on (a.tradedate = v.tradedate)
14 order by v.timeseries_id
15 , a.tradedate
16 , v.source_id
17 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 159
1 12-07-2011 00:00:00 2 333
1 12-07-2011 00:00:00 3 696
1 12-07-2011 00:05:00 1 674
1 12-07-2011 00:05:00 2 932
1 12-07-2011 00:05:00 3
1 12-07-2011 00:10:00 1 315
1 12-07-2011 00:10:00 2 872
1 12-07-2011 00:10:00 3 778
1 12-07-2011 00:15:00 1 398
1 12-07-2011 00:15:00 2 272
1 12-07-2011 00:15:00 3 239
1 12-07-2011 00:20:00 1 546
1 12-07-2011 00:20:00 2 863
1 12-07-2011 00:20:00 3 839
1 12-07-2011 00:25:00 1 558
1 12-07-2011 00:25:00 2 576
1 12-07-2011 00:25:00 3 296
1 12-07-2011 00:30:00 1 767
1 12-07-2011 00:30:00 2 957
1 12-07-2011 00:30:00 3 935
1 12-07-2011 00:35:00 1
1 12-07-2011 00:35:00 2
1 12-07-2011 00:35:00 3
1 12-07-2011 00:40:00 1
1 12-07-2011 00:40:00 2
1 12-07-2011 00:40:00 3
1 12-07-2011 00:45:00 1
1 12-07-2011 00:45:00 2
1 12-07-2011 00:45:00 3
1 12-07-2011 00:50:00 1
1 12-07-2011 00:50:00 2
1 12-07-2011 00:50:00 3
1 12-07-2011 00:55:00 1 398
1 12-07-2011 00:55:00 2 110
1 12-07-2011 00:55:00 3 629
2 12-07-2011 00:00:00 1 615
2 12-07-2011 00:00:00 2 924
2 12-07-2011 00:00:00 3 526
2 12-07-2011 00:05:00 1 147
2 12-07-2011 00:05:00 2 589
2 12-07-2011 00:05:00 3 779
2 12-07-2011 00:10:00 1 554
2 12-07-2011 00:10:00 2 108
2 12-07-2011 00:10:00 3 9
2 12-07-2011 00:15:00 1 601
2 12-07-2011 00:15:00 2 111
2 12-07-2011 00:15:00 3 526
2 12-07-2011 00:20:00 1 418
2 12-07-2011 00:20:00 2 818
2 12-07-2011 00:20:00 3 578
2 12-07-2011 00:25:00 1 824
2 12-07-2011 00:25:00 2
2 12-07-2011 00:25:00 3 112
2 12-07-2011 00:30:00 1 649
2 12-07-2011 00:30:00 2
2 12-07-2011 00:30:00 3 689
2 12-07-2011 00:35:00 1
2 12-07-2011 00:35:00 2
2 12-07-2011 00:35:00 3
2 12-07-2011 00:40:00 1
2 12-07-2011 00:40:00 2
2 12-07-2011 00:40:00 3
2 12-07-2011 00:45:00 1
2 12-07-2011 00:45:00 2
2 12-07-2011 00:45:00 3
2 12-07-2011 00:50:00 1
2 12-07-2011 00:50:00 2
2 12-07-2011 00:50:00 3
2 12-07-2011 00:55:00 1 651
2 12-07-2011 00:55:00 2
2 12-07-2011 00:55:00 3 641

72 rows selected.


And now it works!

And using 108 rows with a regular heap table also isn't a problem:

SQL> drop table tradevolumes purge
2 /

Table dropped.

SQL> create table tradevolumes
2 ( timeseries_id number(4)
3 , tradedate date
4 , source_id number(4)
5 , volume number(4)
6 , constraint tradevolumes_pk primary key (timeseries_id,tradedate,source_id)
7 )
8 /

Table created.

SQL> insert into tradevolumes
2 select 1 + mod(level-1,2)
3 , trunc(sysdate) + numtodsinterval(5*mod(trunc((level-1)/2),108),'minute')
4 , ceil(level/(2 * 108))
5 , ceil(dbms_random.value(0,1000))
6 from dual
7 connect by level <= 2 * 108 * 3
8 /

648 rows created.

SQL> delete tradevolumes
2 where ( tradedate
3 between trunc(sysdate) + interval '35' minute
4 and trunc(sysdate) + interval '50' minute
5 or (timeseries_id = 1 and source_id = 3 and volume < 30)
6 or (timeseries_id = 2 and source_id = 2 and tradedate > trunc(sysdate) + interval '20' minute)
7 )
8 /

126 rows deleted.

SQL> exec dbms_stats.gather_table_stats(user,'tradevolumes')

PL/SQL procedure successfully completed.

SQL> select *
2 from tradevolumes
3 order by timeseries_id
4 , tradedate
5 , source_id
6 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 200
1 12-07-2011 00:00:00 2 157
1 12-07-2011 00:00:00 3 950
1 12-07-2011 00:05:00 1 707
1 12-07-2011 00:05:00 2 932
1 12-07-2011 00:05:00 3 258
... skipped a lot of rows here ...
2 12-07-2011 08:50:00 1 61
2 12-07-2011 08:50:00 3 974
2 12-07-2011 08:55:00 1 692
2 12-07-2011 08:55:00 3 714

522 rows selected.

SQL> with all_tradedates as
2 ( select trunc(sysdate) + numtodsinterval(5*(level-1),'minute') tradedate
3 from dual
4 connect by level <= 12
5 )
6 select v.timeseries_id
7 , a.tradedate
8 , v.source_id
9 , v.volume
10 from all_tradedates a
11 left outer join tradevolumes v
12 partition by (v.timeseries_id,v.source_id)
13 on (a.tradedate = v.tradedate)
14 order by v.timeseries_id
15 , a.tradedate
16 , v.source_id
17 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 200
1 12-07-2011 00:00:00 2 157
1 12-07-2011 00:00:00 3 950
1 12-07-2011 00:05:00 1 707
1 12-07-2011 00:05:00 2 932
1 12-07-2011 00:05:00 3 258
1 12-07-2011 00:10:00 1 586
1 12-07-2011 00:10:00 2 193
1 12-07-2011 00:10:00 3 753
1 12-07-2011 00:15:00 1 287
1 12-07-2011 00:15:00 2 420
1 12-07-2011 00:15:00 3 113
1 12-07-2011 00:20:00 1 971
1 12-07-2011 00:20:00 2 93
1 12-07-2011 00:20:00 3 967
1 12-07-2011 00:25:00 1 685
1 12-07-2011 00:25:00 2 63
1 12-07-2011 00:25:00 3 832
1 12-07-2011 00:30:00 1 471
1 12-07-2011 00:30:00 2 764
1 12-07-2011 00:30:00 3 277
1 12-07-2011 00:35:00 1
1 12-07-2011 00:35:00 2
1 12-07-2011 00:35:00 3
1 12-07-2011 00:40:00 1
1 12-07-2011 00:40:00 2
1 12-07-2011 00:40:00 3
1 12-07-2011 00:45:00 1
1 12-07-2011 00:45:00 2
1 12-07-2011 00:45:00 3
1 12-07-2011 00:50:00 1
1 12-07-2011 00:50:00 2
1 12-07-2011 00:50:00 3
1 12-07-2011 00:55:00 1 332
1 12-07-2011 00:55:00 2 909
1 12-07-2011 00:55:00 3 873
2 12-07-2011 00:00:00 1 865
2 12-07-2011 00:00:00 2 982
2 12-07-2011 00:00:00 3 290
2 12-07-2011 00:05:00 1 936
2 12-07-2011 00:05:00 2 651
2 12-07-2011 00:05:00 3 141
2 12-07-2011 00:10:00 1 843
2 12-07-2011 00:10:00 2 818
2 12-07-2011 00:10:00 3 218
2 12-07-2011 00:15:00 1 85
2 12-07-2011 00:15:00 2 530
2 12-07-2011 00:15:00 3 650
2 12-07-2011 00:20:00 1 475
2 12-07-2011 00:20:00 2 988
2 12-07-2011 00:20:00 3 771
2 12-07-2011 00:25:00 1 422
2 12-07-2011 00:25:00 2
2 12-07-2011 00:25:00 3 48
2 12-07-2011 00:30:00 1 88
2 12-07-2011 00:30:00 2
2 12-07-2011 00:30:00 3 991
2 12-07-2011 00:35:00 1
2 12-07-2011 00:35:00 2
2 12-07-2011 00:35:00 3
2 12-07-2011 00:40:00 1
2 12-07-2011 00:40:00 2
2 12-07-2011 00:40:00 3
2 12-07-2011 00:45:00 1
2 12-07-2011 00:45:00 2
2 12-07-2011 00:45:00 3
2 12-07-2011 00:50:00 1
2 12-07-2011 00:50:00 2
2 12-07-2011 00:50:00 3
2 12-07-2011 00:55:00 1 793
2 12-07-2011 00:55:00 2
2 12-07-2011 00:55:00 3 601

72 rows selected.


The above was all executed in version 11.2.0.2. I also tried it on 10.2.0.4 and witnessed exactly the same phenomenon there; the only difference is that the boundary value is not 107/108 but 80/81.

I did some more basic investigation regarding space occupation and statistics, but I haven't found a clue so far. I'm wondering if I'm missing something obvious. So if you have more information about affected versions or the cause, I'd like to know.
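
In the meantime, a workaround is to sidestep the partitioned outer join and build the partitions manually. A sketch (not benchmarked) that takes the distinct (timeseries_id, source_id) combinations, cross joins them to the generated dates, and outer joins the volumes back:

with all_tradedates as
 ( select trunc(sysdate) + numtodsinterval(5*(level-1),'minute') tradedate
     from dual
  connect by level <= 12
 )
, all_keys as
 ( select distinct timeseries_id
        , source_id
     from tradevolumes
 )
select k.timeseries_id
     , a.tradedate
     , k.source_id
     , v.volume
  from all_keys k
       cross join all_tradedates a
       left outer join tradevolumes v
         on (    v.timeseries_id = k.timeseries_id
             and v.source_id = k.source_id
             and v.tradedate = a.tradedate
            )
 order by k.timeseries_id
     , a.tradedate
     , k.source_id
/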

Monday, July 11, 2011

Tutorials zur SQL Model Clause

Marcus Matzberger has translated five of my posts related to the SQL Model Clause for the German-speaking Oracle community. Those posts now contain links to the German versions, and I've listed them here for convenience as well.

SQL Model Clause Tutorial Part One

SQL Model Clause Tutorial Part Two

SQL Model Clause Tutorial Part Three

Calculating probabilities with N throws of a die

Choosing Between SQL and PL/SQL

Thanks Marcus!

Saturday, June 11, 2011

Back from Tallinn, Estonia

This morning I arrived back from a trip to Tallinn. Oracle Estonia had given me the opportunity to present my SQL Masterclass seminar at their training center in Tallinn on Thursday and Friday. Thanks to everyone who spent two days listening to me. Here is a short story about my trip, including some photos.

I arrived at Tallinn airport around 1PM local time on Wednesday. My hotel room was located on the 18th floor and had this splendid view of the old city of Tallinn:



I spent the rest of the afternoon taking a city tour by bus, learning a few things about the history of Tallinn and strolling for an hour or two through the old city. Here are a few of the photos that I took during that stroll:

The seminar itself was on Thursday and Friday. I had spent a huge amount of time preparing it. One of my goals was to present as many recognizable situations as possible that the attendees could apply in their daily jobs, so I was very pleased to receive feedback that they indeed found several immediately applicable queries. There is still room for improvement though, as a few parts were too short (I had planned one and a half hours per part) and my scripts contained an error or two that I had overlooked. Hopefully I will get a chance to fix those and present the seminar again somewhere. The feeling at the end was very positive though. The powerpoint and the scripts can be found on the "Presentations and papers" page.

This is a picture of the building where Oracle Estonia and its training center are located:



PS: As expected, during the preparation of the seminar I learnt a couple of new things: a bug with partitioned outer joins and something about connect-by filtering. I will blog about them soon.

Thursday, May 26, 2011

Instrumentation packages

At CIBER we are doing a series of knowledge sessions in which several colleagues each give a one-hour presentation about one or two packages of their choice. We have already heard several very interesting presentations about DBMS_HPROF, DBMS_PARALLEL_EXECUTE, UTL_DBWS, UTL_TCP and DBMS_FGA. This evening a colleague will present DBMS_PREPROCESSOR and DBMS_WARNING, and I will do one about the instrumentation-related packages LOGGER and DBMS_APPLICATION_INFO. I have added my presentation and the scripts to my Presentations and Papers page, and for convenience here as well:

Keynote presentation instrumentation packages

Powerpoint presentation instrumentation packages

Demonstration scripts instrumentation packages

Warning: they are in Dutch.
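
For those who don't read Dutch, here is a minimal flavor of the kind of instrumentation DBMS_APPLICATION_INFO offers (my own sketch with hypothetical module and action names, not taken from the slides):

begin
  -- make the current work visible in v$session.module and v$session.action
  dbms_application_info.set_module
  ( module_name => 'order_import'  -- hypothetical name
  , action_name => 'validate'
  );
  -- ... validation work here ...
  dbms_application_info.set_action(action_name => 'load');
  -- ... load work here ...
  dbms_application_info.set_module(null, null);  -- clear when done
end;
/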

EDIT: I just received a message that the sessions will be rescheduled to a later date because of a serious number of cancellations. Because the preparation is finished, I have decided to publish them anyway.

Wednesday, March 23, 2011

OGh APEX day 2011

Yesterday I attended OGh's APEX day. The previous year was already pretty successful, with 150 people attending. This year the maximum of 150 people was reached in only a few days, so we scaled up the event to a maximum of 250 people, and in the end we still had to disappoint several people. This year we also had several sponsors: Oracle, Ciber, Transfer Solutions and Itium. These must be signs that APEX is becoming increasingly popular here in the Netherlands.

The day started with three plenary sessions. The first one was by Tony Andrews and Nigel Blair, who told us about their huge APEX project for Northgate Revenues and Benefits. They sell software to local councils, and their application contained approximately 1500 Forms. The application had moved from character mode to Oracle Forms to Oracle Webforms. Their users complained that the Webforms user interface deviated too much from their normal internet experience, and the sales win rate dropped from 1 out of 2 to 1 out of 4 or less. With their new APEX application they focused on the user interface again and are now back to winning 50 to 60% of the deals. It was also good to see how Tony solved several technical problems, some of which are now part of APEX 4. This session was a great showcase that APEX is very much capable of supporting large, enterprise-level applications.

The next session was Hilary Farrell's, about APEX 4.1. Hilary is part of the APEX development team, and it always amazes me how much great work that team gets done with relatively few people. Over the last years I think I've seen almost every team member present somewhere: all highly competent and passionate people, and Hilary is no exception. Lots of small improvements are on their way in APEX 4.1, of which I think I'm going to like the improved error handling the most. You can read more about the new version in this statement of direction.

Next up were Dimitri Gielis and John Scott with "APEX Evangelists way of working in APEX Projects". They showed how they handle several aspects of a new APEX project: Ant scripts to generate a fresh project structure in Subversion, continuous integration, their "Glitchi" project management APEX tool, and Balsamiq, with which they design the user interfaces together with the client.

In the afternoon I saw Jasper Alblas' session demonstrating Eneco's new debtor application. I worked at Eneco for more than six years, so I was eager to see this session. A very informative session and a nice looking application. Well done, Jasper!

The second afternoon session I attended was John Scott's "Building APEX 4 Plugins". I always love John's presentations. He presents with humour and makes seemingly complex subjects appear simple, so you end up thinking "hey, I can do that myself without too much ado". I especially liked his Google Translate demos.

The last session I attended was Hans Wijnacker's Golfscores.eu session. Again a well delivered session, about an APEX application that keeps track of golf scores at tournaments. If you are organizing a golf tournament, you can hire them to take care of the IT. Hans faced some challenges similar to the ones I've faced with my own APEX application for Tourpoules (currently offline by the way, waiting for 11g XE to be released ...).

I had a great time and I want to thank OGh, and especially Natalie Rohlof, who organized the event, for the fantastic day. I hope Learco and I will get the chance to set up a program for a third edition in 2012.

Thursday, March 17, 2011

Open cursor paranoia

Most PL/SQL developers will likely have witnessed this phenomenon several times during their career. But only in other people's code, of course :-). I'm talking about PL/SQL code where every program unit ends like this:

exception
when others then
if c%isopen
then
close c;
end if;
raise;
end;

where lines 3 to 6 are repeated for every cursor in the block above.

Proponents of open cursor paranoia justify it as defensive programming. Even when you open-fetch-close a cursor, they say, the fetch could raise an exception and the close statement would not execute, leaving the cursor open. A next execution would then raise the dreaded "ORA-06511: PL/SQL: cursor already open".

Let's examine this claim, starting by defining which type of cursor we're talking about, since the term "cursor" is an overloaded one. Here I am talking about explicit session cursors in PL/SQL, the ones you declare and manage yourself. Now suppose you have this package:

SQL> create package pkg
2 as
3 procedure p;
4 end pkg;
5 /

Package created.

SQL> create package body pkg
2 as
3 procedure p
4 is
5 cursor c is select dummy from dual;
6 begin
7 open c;
8 end p;
9 end pkg;
10 /

Package body created.


What will the next piece of code do? Will it raise an ORA-06511: cursor already open?

SQL> begin
2 pkg.p;
3 pkg.p;
4 end;
5 /

No it doesn't:

PL/SQL procedure successfully completed.

An explicit cursor variable has a scope, just like every other variable. Its scope here is the procedure. When the procedure ends, the cursor variable is gone and you can safely do an "open c" again. Note that PL/SQL has a PL/SQL cursor cache which keeps the cursor silently open in the background, so that a reopen of this cursor will not cause a reparse, not even a soft parse:

SQL> select cursor_type
2 from v$open_cursor
3 where sql_text = 'SELECT DUMMY FROM DUAL'
4 /

CURSOR_TYPE
----------------------------------------------------------------
PL/SQL CURSOR CACHED

1 row selected.

By the way, in 11g, Oracle has added 3 extra columns to the V$OPEN_CURSOR view. The new column CURSOR_TYPE is very informative as you can see above.

The cursor type is PL/SQL CURSOR CACHED, because the SESSION_CACHED_CURSORS parameter is on its default of 50 and the cursor was closed in the PL/SQL code. When you switch this parameter off by setting it to 0, the entry disappears from V$OPEN_CURSOR, and the cursor is closed in the background as well:

SQL> alter session set session_cached_cursors = 0
2 /

Session altered.

SQL> begin
2 pkg.p;
3 end;
4 /

PL/SQL procedure successfully completed.

SQL> select cursor_type
2 from v$open_cursor
3 where sql_text = 'SELECT DUMMY FROM DUAL'
4 /

no rows selected

SQL> alter session set session_cached_cursors = 50
2 /

Session altered.

Is there really no way that closing open cursors in the exception handler is useful? Maybe it once made sense in an earlier version? If you know this was once the case, please leave a comment. It could be useful when the scope of the cursor is bigger than just the procedure, for example when it's declared globally in the package. In that case the package state holds the state of the cursor variable, so with this package:

SQL> create or replace package body pkg
2 as
3 cursor c is select dummy from dual
4 ;
5 procedure p
6 is
7 begin
8 open c;
9 end p;
10 end pkg;
11 /

Package body created.

, you do get the ORA-06511:

SQL> begin
2 pkg.p;
3 pkg.p;
4 end;
5 /
begin
*
ERROR at line 1:
ORA-06511: PL/SQL: cursor already open
ORA-06512: at "RWIJK.PKG", line 3
ORA-06512: at "RWIJK.PKG", line 8
ORA-06512: at line 3

And the cursor_type attribute of v$open_cursor now says:

SQL> select cursor_type
2 from v$open_cursor
3 where sql_text = 'SELECT DUMMY FROM DUAL '
4 /

CURSOR_TYPE
----------------------------------------------------------------
OPEN-PL/SQL

1 row selected.

But this raises the question why you would ever want a globally defined cursor. Recently I've seen more than one application with so-called cursor packages: packages containing only cursors in their specification, acting as helper packages for the other, non-cursor packages in the application. A truly bad idea, not only because the implementation details aren't hidden from the caller, but also because once such a package is called, it has a state and the cursor variables don't get closed automatically, making the application vulnerable to ORA-06511's. Although, I can imagine a cursor package in a situation where a mid-tier application wants to handle record sets without having to deal with the SQL. I don't have experience with that scenario myself though.

Odds are that when you see "if c%isopen then close c; end if;" constructs used abundantly throughout the code, most of them can happily be removed.
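
And if you want to avoid the cursor bookkeeping altogether, a cursor FOR loop opens and closes the cursor automatically, even when an exception escapes the loop. A minimal sketch:

begin
  for r in (select dummy from dual)
  loop
    dbms_output.put_line(r.dummy);
  end loop;
  -- no close and no %isopen check needed: the cursor is closed
  -- automatically, also when an exception is raised inside the loop
end;
/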

Tuesday, March 15, 2011

Runtime error ORA-01031: insufficient privileges

After a new version of the software was installed in production, the end users reported a runtime error when selecting from a view: ORA-01031: insufficient privileges. The developers of the code were investigating the problem, and halfway through they asked me to have a look. I saw a function in schema3, which was used in a view in schema2, which was used by schema1. I had just finished getting acquainted with their situation when they tried granting the EXECUTE privilege on the function with the grant option, which solved the error. The privilege with grant option was quickly granted in production, everything went back to normal, and all interest in the problem vanished. But I was puzzled: how could that missing grant option lead to an error at runtime? It should not have been possible to grant the SELECT privilege to schema1 at compile/install time, or at least I thought so. So I did a little investigation.

SQL> create user rwijk3 identified by rwijk3
2 /

User created.

SQL> grant create session, create procedure to rwijk3
2 /

Grant succeeded.

SQL> create user rwijk2 identified by rwijk2
2 /

User created.

SQL> grant create session, create view to rwijk2
2 /

Grant succeeded.

SQL> create user rwijk1 identified by rwijk1
2 /

User created.

SQL> grant create session to rwijk1
2 /

Grant succeeded.

SQL> conn rwijk3/rwijk3@ora11202
Connected.
Sessie: (no access to V$ tables)

SQL> create function f return number
2 is
3 begin
4 return 42;
5 end f;
6 /

Function created.

SQL> grant execute on f to rwijk2
2 /

Grant succeeded.

SQL> conn rwijk2/rwijk2@ora11202
Connected.
Sessie: (no access to V$ tables)

SQL> create view v
2 as
3 select rwijk3.f() x
4 from dual
5 /

View created.

SQL> grant select on v to rwijk1
2 /
grant select on v to rwijk1
*
ERROR at line 1:
ORA-01720: grant option does not exist for 'RWIJK3.F'

This is the scenario I expected: granting the select privilege leads to an error message at compile/install time. But our scenario was a little different from this clean install: the view already existed without the function call, and the privilege without the grant option had already been granted. Like this:

SQL> show user
USER is "RWIJK2"
SQL> drop view v
2 /

View dropped.

SQL> create view v
2 as
3 select 42 x
4 from dual
5 /

View created.

SQL> grant select on v to rwijk1
2 /

Grant succeeded.

And user rwijk1 could happily query the view of course:

SQL> conn rwijk1/rwijk1@ora11202
Connected.
Sessie: (no access to V$ tables)

SQL> select x
2 from rwijk2.v
3 /

X
----------
42

1 row selected.

The new version introduced the function call:

SQL> create or replace view v
2 as
3 select rwijk3.f() x
4 from dual
5 /

View created.

This now succeeds, even though select privileges on this view were already granted to schema1. Oracle didn't raise an ORA-01720 like it did in the fresh install scenario. So Oracle does not check whether all privileges that were already granted remain effective. It also doesn't remove the privileges, as user RWIJK1 can still describe the view thanks to the SELECT privilege without the grant option:

SQL> conn rwijk1/rwijk1@ora11202
Connected.
Sessie: (no access to V$ tables)

SQL> desc rwijk2.v
Name Null? Type
--------------------------------------- -------- ---------------------------
X NUMBER

It just issues a runtime error message when you issue a select against the view:

SQL> select x
2 from rwijk2.v
3 /
from rwijk2.v
*
ERROR at line 2:
ORA-01031: insufficient privileges

An ORA-01031 during the parse phase, which you can also see in the trace file:

PARSE ERROR #47952083822528:len=25 dep=0 uid=114 oct=3 lid=114 tim=1299371043385365 err=1031
select x
from rwijk2.v


When doing a "CREATE OR REPLACE VIEW", and the view is used inside a program unit, like a package, procedure or function, these dependent program units are marked invalid so they need to be compiled again against the new view definition.

When doing a "CREATE OR REPLACE VIEW", and the view has instead of triggers defined against the view, the triggers are removed when the view definition is replaced. And you need to install the instead of triggers again.

And when doing a "CREATE OR REPLACE VIEW", and privileges on this view were granted before replacing the view definition, these privileges are not re-evaluated when the view definition is replaced.
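
For completeness, the fix from the story above was simply to re-grant the execute privilege in schema rwijk3 with the grant option, after which rwijk1 could select from rwijk2.v again:

grant execute on f to rwijk2 with grant option
/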

Sunday, February 27, 2011

Questions about the result cache

Last year I presented a couple of times about the result cache. You can download this presentation from my new Presentations and papers tab page, by the way. After each of those sessions I received several good questions. To some questions I didn't know the answer, and for some others I did, but since I hadn't tested it I wasn't absolutely sure. I promised to address those questions in a separate blogpost, so this post is long overdue, but I finally took some time to investigate them all. Here are the results on version 11.2.0.2.

Question 1: If my result cached function is valid, and I recompile the valid function, will my cached results still become invalid?

SQL> exec dbms_result_cache.flush

PL/SQL procedure successfully completed.

SQL> create function f (p_factor in number) return number result_cache
2 is
3 begin
4 return 10 * p_factor;
5 end;
6 /

Function created.

SQL> select f(2) from dual
2 /

F(2)
----------
20

1 row selected.

SQL> select id
2 , type
3 , status
4 , name
5 , namespace
6 from v$result_cache_objects
7 /

ID TYPE STATUS NAME NAMES
---------- ---------- --------- ---------------------------------------- -----
0 Dependency Published RWIJK.F
1 Result Published "RWIJK"."F"::8."F"#fac892c7867b54c6 #1 PLSQL

2 rows selected.

SQL> alter function f compile
2 /

Function altered.

SQL> select id
2 , type
3 , status
4 , name
5 , namespace
6 from v$result_cache_objects
7 /

ID TYPE STATUS NAME NAMES
---------- ---------- --------- ---------------------------------------- -----
0 Dependency Published RWIJK.F
1 Result Invalid "RWIJK"."F"::8."F"#fac892c7867b54c6 #1 PLSQL

2 rows selected.

So the answer is yes. The results always become invalid when compiling a function.


Question 2: What happens when you execute a SQL query -of which the results are cached- that also has a result cached function in its select list?

SQL> exec dbms_result_cache.flush

PL/SQL procedure successfully completed.

SQL> select /*+ result_cache */
2 deptno
3 , f(deptno)
4 from dept
5 /

DEPTNO F(DEPTNO)
---------- ----------
10 100
20 200
30 300
40 400

4 rows selected.

SQL> select id
2 , type
3 , status
4 , name
5 , namespace
6 from v$result_cache_objects
7 /

ID TYPE STATUS NAME NAMES
---------- ---------- --------- ---------------------------------------- -----
1 Dependency Published RWIJK.F
0 Dependency Published RWIJK.DEPT
6 Result Published "RWIJK"."F"::8."F"#fac892c7867b54c6 #1 PLSQL
5 Result Published "RWIJK"."F"::8."F"#fac892c7867b54c6 #1 PLSQL
4 Result Published "RWIJK"."F"::8."F"#fac892c7867b54c6 #1 PLSQL
3 Result Published "RWIJK"."F"::8."F"#fac892c7867b54c6 #1 PLSQL
2 Result Published select /*+ result_cache */ SQL
deptno
, f(deptno)
from dept


7 rows selected.

So in that case both the SQL query and the PL/SQL function cache their results.


Question 3: If my table is partitioned and I query on only one partition and cache the results, will a DML statement that affects another partition invalidate my result set?

SQL> create table my_partitioned_emp
2 ( empno number(4) primary key
3 , deptno number(2) not null
4 , sal number(4) not null
5 )
6 partition by range(deptno)
7 ( partition p1 values less than (15)
8 , partition p2 values less than (25)
9 , partition p3 values less than (35)
10 , partition p4 values less than (maxvalue)
11 )
12 /

Table created.

SQL> insert into my_partitioned_emp
2 select empno
3 , deptno
4 , sal
5 from emp
6 /

14 rows created.

SQL> begin
2 dbms_stats.gather_table_stats
3 ( ownname => user
4 , tabname => 'my_partitioned_emp'
5 , partname => null
6 , cascade => true
7 );
8 end;
9 /

PL/SQL procedure successfully completed.

SQL> exec dbms_result_cache.flush

PL/SQL procedure successfully completed.

SQL> select /*+ result_cache */
2 sum(sal)
3 from my_partitioned_emp
4 where deptno = 10
5 /

SUM(SAL)
----------
8750

1 row selected.

SQL> insert into my_partitioned_emp
2 values (7777, 40, 1000)
3 /

1 row created.

SQL> commit
2 /

Commit complete.

SQL> select id
2 , type
3 , status
4 , name
5 , namespace
6 from v$result_cache_objects
7 /

ID TYPE STATUS NAME NAMES
---------- ---------- --------- ---------------------------------------- -----
0 Dependency Published RWIJK.MY_PARTITIONED_EMP
1 Result Invalid select /*+ result_cache */ SQL
sum(sal)
from my_partitioned_emp
where deptno = 10


2 rows selected.


So yes, the results will be invalidated, even if the DML is on another partition. The granularity is at table level. I'm not sure whether finer-grained dependencies for the result cache would be welcome, because they could imply a significant management overhead.


Question 4: Does the result cache work with analytic functions?

SQL> exec dbms_result_cache.flush

PL/SQL procedure successfully completed.

SQL> select /*+ result_cache */
2 empno
3 , sal
4 , sum(sal) over (partition by deptno)
5 from emp
6 /

EMPNO SAL SUM(SAL)OVER(PARTITIONBYDEPTNO)
---------- ---------- -------------------------------
7782 2450 8750
7839 5000 8750
7934 1300 8750
7566 2975 10875
7902 3000 10875
7876 1100 10875
7369 800 10875
7788 3000 10875
7521 1250 9400
7844 1500 9400
7499 1600 9400
7900 950 9400
7698 2850 9400
7654 1250 9400

14 rows selected.

SQL> /

EMPNO SAL SUM(SAL)OVER(PARTITIONBYDEPTNO)
---------- ---------- -------------------------------
7782 2450 8750
7839 5000 8750
7934 1300 8750
7566 2975 10875
7902 3000 10875
7876 1100 10875
7369 800 10875
7788 3000 10875
7521 1250 9400
7844 1500 9400
7499 1600 9400
7900 950 9400
7698 2850 9400
7654 1250 9400

14 rows selected.

SQL> select *
2 from v$result_cache_statistics
3 /

ID NAME VALUE
---------- ---------------------------------------- --------------------
1 Block Size (Bytes) 1024
2 Block Count Maximum 5120
3 Block Count Current 32
4 Result Size Maximum (Blocks) 256
5 Create Count Success 1
6 Create Count Failure 0
7 Find Count 1
8 Invalidation Count 0
9 Delete Count Invalid 0
10 Delete Count Valid 0
11 Hash Chain Length 1
12 Find Copy Count 1

12 rows selected.

One create and one find, so it does work. I could not think of a reason why it wouldn't work, but it doesn't hurt to check.


Question 5: Does fine grained auditing work correctly in combination with the result cache?

The idea behind this question is that with the result cache, your query skips a lot of work. Will a query that gets its results from the result cache also skip the auditing part?

SQL> exec dbms_result_cache.flush

PL/SQL procedure successfully completed.

SQL> begin
2 dbms_fga.add_policy
3 ( object_schema => user
4 , object_name => 'emp'
5 , policy_name => 'sal_policy'
6 , audit_condition => 'sal <= 1000'
7 , audit_column => 'sal'
8 );
9 end;
10 /

PL/SQL procedure successfully completed.

SQL> select sal
2 from emp
3 where sal = 800
4 /

SAL
----------
800

1 row selected.

SQL> /

SAL
----------
800

1 row selected.

SQL> select /*+ result_cache */
2 sal
3 from emp
4 where sal <= 800
5 /

SAL
----------
800

1 row selected.

SQL> /

SAL
----------
800

1 row selected.

SQL> select policy_name
2 , scn
3 , sql_text
4 from dba_fga_audit_trail
5 order by scn
6 /

POLICY_NAME SCN
------------------------------ ----------
SQL_TEXT
--------------------------------------------------------------------------------
SAL_POLICY 1382762
select sal
from emp
where sal = 800

SAL_POLICY 1382764
select sal
from emp
where sal = 800

SAL_POLICY 1382766
select /*+ result_cache */
sal
from emp
where sal <= 800

SAL_POLICY 1382768
select /*+ result_cache */
sal
from emp
where sal <= 800


4 rows selected.

Both queries with the result_cache hint are there. So even though the second query got its results from the result cache, the policy still recorded the query. And so the answer to this question is "yes". This is not surprising given the execution plan:

SQL> alter session set statistics_level = all
2 /

Session altered.

SQL> set serveroutput off
SQL> select /*+ result_cache */
2 sal
3 from emp
4 where sal <= 800
5 /

SAL
----------
800

1 row selected.

SQL> /

SAL
----------
800

1 row selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID fdfbdwgq5vqnn, child number 0
-------------------------------------
select /*+ result_cache */ sal from emp where sal <= 800

Plan hash value: 3956160932

---------------------------------------------------------------------------
| Id | Operation | Name | Starts | A-Rows |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 1 |
| 1 | RESULT CACHE | fxpb18zb9vn8w87wgw43w8cbyr | 1 | 1 |
|* 2 | TABLE ACCESS FULL| EMP | 0 | 0 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("SAL"<=800)

Result Cache Information (identified by operation id):
------------------------------------------------------

1 -


24 rows selected.

Note that I edited the E-rows and A-time columns out of the formatted plan for nicer display.

You can see that the result cache at line 1 prevents the full scan of EMP from executing, but not line 0, the SELECT, which is probably where the auditing takes place.

Friday, February 25, 2011

SQL Masterclass in Tallinn, Estonia

On June 9 and 10, I'll be doing a SQL Masterclass seminar for Oracle University in Tallinn, Estonia.

You can find the details here.

Tuesday, February 15, 2011

A tip, a conference, an extension and a challenge

This post contains four unrelated notes.

First a small SQL*Plus tip. I really like to know with which user I am connected to which database, so in my login.sql script I used to have this section:

define gname=idle
column global_name new_value gname
select lower(user) || '@' ||
substr(global_name,1,decode(dot,0,length(global_name),dot-1)) global_name
from (select global_name,instr(global_name,'.') dot from global_name);
set sqlprompt '&gname> '

I saw this piece of code in Tom Kyte's book Expert Oracle Database Architecture and I have used it ever since. The downside of this adjusted prompt is that the first line isn't properly aligned anymore with lines 2 and further, as you can see in this example:

rwijk@ORA11202> select ename
2 from emp
3 where deptno = 10
4 /

ENAME
----------
CLARK
KING
MILLER

3 rows selected.

I read a tip on Oracle-L which I now use: instead of "set sqlprompt '&gname> '", I use "host title &gname". This puts the user@database information in the title bar of my SQL*Plus window and leaves the prompt at "SQL> ", keeping my SQL statements aligned while I'm still able to see my connect string at all times.
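
So the section in my login.sql now looks like this (a sketch; the host title command assumes a Windows command window, where title sets the window caption):

define gname=idle
column global_name new_value gname
select lower(user) || '@' ||
substr(global_name,1,decode(dot,0,length(global_name),dot-1)) global_name
from (select global_name,instr(global_name,'.') dot from global_name);
host title &gname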

***

On March 22, the Dutch Oracle usergroup OGh organizes its second APEX day. Again we have a great lineup of speakers; you can read about it here. Due to the great response we had to scale up the event: there is now room for 250 people instead of the originally planned 150. And be sure to bring your manager: he or she doesn't want to miss the "APEX at work" track, where customers tell how they have successfully implemented parts of their business processes with APEX. I'm especially looking forward to the Northgate story, as this is a huge APEX project.

***

Today there was good news for future Oracle searches on Google. On their blog they announced a Personal Blocklist Extension. Unfortunately it currently doesn't work in the Dutch version of Chrome. The following quote from the Google blog is very promising though:

If installed, the extension also sends blocked site information to Google, and we will study the resulting feedback and explore using it as a potential ranking signal for our search results.


***

The Northern California Oracle Users Group (NoCOUG) has issued its Second International SQL Challenge. You can read about it here in their journal. SQL commands for creating the required data are available here. Initially it may look like insufficient explanation is given to solve the problem, but that's part of the fun. If you like SQL and puzzles, you should definitely give this one a go.

***