About Oracle: December 2008

Monday, December 29, 2008

Article in OGh Visie

For all Dutch readers interested in the SQL model clause (I wonder how selective those two predicates are ...): the last part of the SQL Model Clause Tutorial is published in the winter edition 2008 of OGh Visie. The subject is "Practicalities of the SQL Model Clause" and it tries to answer the question for which kind of problems a model clause query is best applied.

For all non Dutch readers interested in the SQL model clause: a translation of this article will appear on this blog very soon.

Wednesday, December 17, 2008

Plan a schedule with the SQL model clause

My DBA colleague Ronald Rood asked an interesting question on OTN's SQL and PL/SQL forum. I'll repeat the question here. First the setup of the table:

rwijk@ORA11GR1> create table sschedule (
  2  item varchar2(10)
  3  , days varchar2(30)
  4  , fixed_start_time varchar2(5)
  5  , minutes number
  6  )
  7  /

Tabel is aangemaakt.

rwijk@ORA11GR1> insert into sschedule (item,days, fixed_start_time, minutes)
  2  select 'lunch', 'mon,tue,wed,thu,fri','12:00',60 from dual union all
  3  select 'a', 'tue,fri','10:00',60 from dual union all
  4  select 'b', 'mon,wed',null, 120 from dual union all
  5  select 'opening','mon,tue,wed,thu,fri','08:55', 5 from dual union all
  6  select 'close','mon,tue,wed,thu','20:00', 5 from dual union all
  7  select 'close','fri','16:00', 5 from dual union all
  8  select 'c', 'mon,wed,fri',null, 20 from dual union all
  9  select 'd', 'mon,wed,fri',null, 20 from dual union all
 10  select 'diner','mon,tue,wed,thu','18:00', 60 from dual union all
 11  select 'e','tue,thu,fri',null, 40 from dual union all
 12  select 'keynote', 'mon','09:00', 120 from dual union all
 13  select 'bye', 'fri','14:00', 120 from dual union all
 14  select 'f','tue,thu,fri',null, 40 from dual union all
 15  select 'g', 'mon,wed',null, 120 from dual
 16  /

14 rijen zijn aangemaakt.

rwijk@ORA11GR1> select * from sschedule
  2  /

ITEM       DAYS                           FIXED    MINUTES
---------- ------------------------------ ----- ----------
lunch      mon,tue,wed,thu,fri            12:00         60
a          tue,fri                        10:00         60
b          mon,wed                                     120
opening    mon,tue,wed,thu,fri            08:55          5
close      mon,tue,wed,thu                20:00          5
close      fri                            16:00          5
c          mon,wed,fri                                  20
d          mon,wed,fri                                  20
diner      mon,tue,wed,thu                18:00         60
e          tue,thu,fri                                  40
keynote    mon                            09:00        120
bye        fri                            14:00        120
f          tue,thu,fri                                  40
g          mon,wed                                     120

14 rijen zijn geselecteerd.

And the accompanying question:

"I have a table that contains a list of items that are to be discussed in a meeting, all taking place in the same room. For some meetings a guest speaker is invited, those meetings have a fixed start time. All meetings have a fixed duration in minutes. The days all start with opening and end with close. How can we generate the start times of the items that have no fixed start time set?

for friday this gives:
ITEM FIXED MINUTES
------- ----- -------
opening 08:55 5
a 10:00 60
lunch 12:00 60
bye 14:00 120
close 16:00 5
c 20
d 20
e 40
f 40

e can start at 09:00 to fill the gap between opening and a.
c can start at 09:40 to fill the gap between e and a.
f can start at 11:00 to fill the gap between a and lunch.
d can start at 11:40 to fill the gap between f and lunch.

more combinations are very legal, as long as the fixed_start_times are honoured.
How can this list be generated in 1 sql?
When a day becomes overbooked the items that fall must get something like 'does not fit' to signal it's too busy that day."

In real life I would never implement a SQL only solution for such a problem, because the resulting SQL will be quite hard to maintain, even when properly documented. But it sure is fun to do it using only SQL.

If you clicked the link in the beginning of the post, you might have seen a solution already. In this post I will try to explain how I solved this problem which may look very difficult at first. I think it demonstrates the power of the SQL model clause very well.

The first thing to do here, is to normalize the table, by splitting the comma separated string into separate rows, one row for each day. Without such a normalized set, a solution would be much harder. To split the string into several rows, I use the technique that resembles the one described in this post, particularly the SQL model clause variant. This query does this part of the job:

rwijk@ORA11GR1> select item
  2       , cast(days as varchar2(3)) day
  3       , to_date(fixed_start_time,'hh24:mi') start_time
  4       , minutes
  5       , to_date(fixed_start_time,'hh24:mi')
  6         + numtodsinterval(minutes,'minute') end_time
  7    from sschedule
  8   model
  9         return updated rows
 10         partition by (item,fixed_start_time,minutes)
 11         dimension by (0 i)
 12         measures (days)
 13         ( days [for i from 1 to regexp_count(days[0],',') + 1 increment 1]
 14           = regexp_substr(days[0],'[^,]+',1,cv(i))
 15         )
 16   order by decode(day,'mon',1,'tue',2,'wed',3,'thu',4,'fri',5)
 17       , start_time
 18  /

ITEM       DAY START_TIME             MINUTES END_TIME
---------- --- ------------------- ---------- -------------------
opening    mon 01-12-2008 08:55:00          5 01-12-2008 09:00:00
keynote    mon 01-12-2008 09:00:00        120 01-12-2008 11:00:00
lunch      mon 01-12-2008 12:00:00         60 01-12-2008 13:00:00
diner      mon 01-12-2008 18:00:00         60 01-12-2008 19:00:00
close      mon 01-12-2008 20:00:00          5 01-12-2008 20:05:00
b          mon                            120
d          mon                             20
c          mon                             20
g          mon                            120
opening    tue 01-12-2008 08:55:00          5 01-12-2008 09:00:00
a          tue 01-12-2008 10:00:00         60 01-12-2008 11:00:00
lunch      tue 01-12-2008 12:00:00         60 01-12-2008 13:00:00
diner      tue 01-12-2008 18:00:00         60 01-12-2008 19:00:00
close      tue 01-12-2008 20:00:00          5 01-12-2008 20:05:00
f          tue                             40
e          tue                             40
opening    wed 01-12-2008 08:55:00          5 01-12-2008 09:00:00
lunch      wed 01-12-2008 12:00:00         60 01-12-2008 13:00:00
diner      wed 01-12-2008 18:00:00         60 01-12-2008 19:00:00
close      wed 01-12-2008 20:00:00          5 01-12-2008 20:05:00
b          wed                            120
g          wed                            120
d          wed                             20
c          wed                             20
opening    thu 01-12-2008 08:55:00          5 01-12-2008 09:00:00
lunch      thu 01-12-2008 12:00:00         60 01-12-2008 13:00:00
diner      thu 01-12-2008 18:00:00         60 01-12-2008 19:00:00
close      thu 01-12-2008 20:00:00          5 01-12-2008 20:05:00
e          thu                             40
f          thu                             40
opening    fri 01-12-2008 08:55:00          5 01-12-2008 09:00:00
a          fri 01-12-2008 10:00:00         60 01-12-2008 11:00:00
lunch      fri 01-12-2008 12:00:00         60 01-12-2008 13:00:00
bye        fri 01-12-2008 14:00:00        120 01-12-2008 16:00:00
close      fri 01-12-2008 16:00:00          5 01-12-2008 16:05:00
c          fri                             20
d          fri                             20
e          fri                             40
f          fri                             40

39 rijen zijn geselecteerd.

This query puts every row in its own partition, and indexes this row with dimension i set to 0. These rows are the original ones, which we won't return, because of the use of the keyword RETURN UPDATED ROWS. The model creates new cells with dimension values from 1 and upwards. In each partition the number of commas is calculated with regexp_count(days[0],','). The number of commas plus one is the number of rows that have to be generated. For each new row, regexp_substr(days[0],'[^,]+',1,cv(i)) gives the cv(i)-th element in the string.

The less trickier part of the query is to change some datatypes: days is set to varchar2(3) for a prettier layout only, and the start_time and end_time should be dates for easier calculating. You see that a default date is chosen, being the first day of the current month. This part of the date is irrelevant. Using intervals (numtodsinterval) I can now easily add the number of minutes with the start_time to calculate the end_time.

This was the easy part :-). Next challenge is to process all rows with an empty start_time and allocate a time slot to them. The question does not describe an algorithm how to do this, so I choose an easy one: process all rows with an empty start time and start with the longest one. For each of these rows, assign the row to the largest gap in time, at the beginning of the interval. A few challenges here: determining the gaps, adjusting the gaps after an item has been assigned in a gap and detecting when an item doesn't fit in any gap.

For these challenges I put up another model, partitioned by day. The rows inside each partition are indexed by a number with the row_number analytic function, the rows with an empty start_time first. Like this I have a nice dimension I can iterate over AND I can use the UNTIL clause to only iterate over the ones with a empty start_time. For this I have to introduce the cnt measure, that counts the number of empty start_times in a partition. So in each partition there will be as many iterations as there are cells with empty start_times.

Let's first show the resulting query and then explain what's going on:

rwijk@ORA11GR1> with schedule_normalized as
  2  ( select item
  3         , cast(days as varchar2(3)) day
  4         , to_date(fixed_start_time,'hh24:mi') start_time
  5         , minutes
  6         , to_date(fixed_start_time,'hh24:mi')
  7           + numtodsinterval(minutes,'minute') end_time
  8      from sschedule
  9     model
 10           return updated rows
 11           partition by (item,fixed_start_time,minutes)
 12           dimension by (0 i)
 13           measures (days)
 14           ( days [for i from 1 to regexp_count(days[0],',') + 1 increment 1]
 15             = regexp_substr(days[0],'[^,]+',1,cv(i))
 16           )
 17  )
 18  select item
 19       , day
 20       , to_char(st,'hh24:mi') start_time
 21       , to_char(et,'hh24:mi') end_time
 22       , minutes
 23       , fit
 24    from schedule_normalized
 25   model
 26         partition by (day)
 27         dimension by
 28         ( row_number() over
 29           (partition by day order by start_time nulls first, minutes desc) rn
 30         )
 31         measures
 32         ( item
 33         , start_time st
 34         , minutes
 35         , end_time et
 36         , lead(start_time) over
 37           (partition by day order by start_time) next_start_time
 38         , count(nvl2(start_time,null,1)) over (partition by day) cnt
 39         , 0 rn_with_largest_gap
 40         , row_number() over
 41           (partition by day order by start_time nulls first, minutes desc) rnm
 42         , cast(null as varchar2(11)) fit
 43         )
 44         rules iterate(1000) until (iteration_number + 1 = cnt[1])
 45         ( rn_with_largest_gap[1]
 46           = max(rnm) keep
 47             (dense_rank last order by next_start_time - et nulls first)[any]
 48         , fit[iteration_number+1]
 49           = case
 50               when max(next_start_time - et)[any]
 51                  < minutes[iteration_number+1]/24/60
 52               then 'doesn''t fit'
 53             end
 54         , st[iteration_number+1]
 55           = case
 56               when fit[iteration_number+1] is null
 57               then et[rn_with_largest_gap[1]]
 58             end
 59         , et[iteration_number+1]
 60           = case
 61               when fit[iteration_number+1] is null
 62               then et[rn_with_largest_gap[1]]
 63                    + numtodsinterval(minutes[iteration_number+1],'minute')
 64             end
 65         , next_start_time[iteration_number+1]
 66           = case
 67               when fit[iteration_number+1] is null
 68               then next_start_time[rn_with_largest_gap[1]]
 69             end
 70         , next_start_time[rn_with_largest_gap[1]]
 71           = case
 72               when fit[iteration_number+1] is not null
 73               then next_start_time[rn_with_largest_gap[1]]
 74             end
 75         )
 76   order by decode(day,'mon',1,'tue',2,'wed',3,'thu',4,'fri',5)
 77       , start_time
 78  /

ITEM       DAY START END_T    MINUTES FIT
---------- --- ----- ----- ---------- -----------
opening    mon 08:55 09:00          5
keynote    mon 09:00 11:00        120
d          mon 11:00 11:20         20
lunch      mon 12:00 13:00         60
g          mon 13:00 15:00        120
b          mon 15:00 17:00        120
diner      mon 18:00 19:00         60
c          mon 19:00 19:20         20
close      mon 20:00 20:05          5
opening    tue 08:55 09:00          5
a          tue 10:00 11:00         60
lunch      tue 12:00 13:00         60
f          tue 13:00 13:40         40
e          tue 13:40 14:20         40
diner      tue 18:00 19:00         60
close      tue 20:00 20:05          5
opening    wed 08:55 09:00          5
g          wed 09:00 11:00        120
lunch      wed 12:00 13:00         60
b          wed 13:00 15:00        120
d          wed 15:00 15:20         20
c          wed 15:20 15:40         20
diner      wed 18:00 19:00         60
close      wed 20:00 20:05          5
opening    thu 08:55 09:00          5
lunch      thu 12:00 13:00         60
e          thu 13:00 13:40         40
f          thu 13:40 14:20         40
diner      thu 18:00 19:00         60
close      thu 20:00 20:05          5
opening    fri 08:55 09:00          5
d          fri 09:00 09:20         20
c          fri 09:20 09:40         20
a          fri 10:00 11:00         60
f          fri 11:00 11:40         40
lunch      fri 12:00 13:00         60
e          fri 13:00 13:40         40
bye        fri 14:00 16:00        120
close      fri 16:00 16:05          5

39 rijen zijn geselecteerd.

In the with clause you see the normalized query discussed earlier. The part I'm discussing here, starts at line 18. For each iteration I have to choose which of the rows has the largest open time slot attached at the end. For this the auxiliary measure next_start_time is introduced, calculated with the lead analytic function. The index of the row with the largest open time slot is stored in the auxiliary measure rn_with_largest_gap. Each iteration starts with calculating this value again. All subsequent rules of the models use this rn_with_largest_gap measure. Rules 3 and 4 calculate the start_time and end_time of the rows that started out with an empty start_time. The last two rules adjust the next_start_time measures to the new situation: since the new allocated time slot is adjacent to the existing time slot, the next_start_time of the original one is set to null, and the next_start_time of the new time slot is the original next_start_time.

Last part of the solution is to addition of the fit measure. With this measure and the second rule, I can check whether the largest open time slot is large enough for the item. If it is not large enough, I put in a text "doesn't fit" in this cell. All subsequent rules effectively do nothing when a non null value is encountered in this cell. With the current data, everything fits, but if you click on the link to the original question, you'll see a situation where two rows don't fit.

If you have not given up and have read up through here, you maybe agree with me that this is another nice example of a complex algorithm that can be solved with only SQL. Although in this category, nothing beats this one of course. But remember kids: don't try this at work ;-)

Friday, December 5, 2008

GROUP_ID()

In this AMIS-post, Lucas Jellema was looking for a way to duplicate certain grouping sets for his ADF tree structure. Using ROLLUP this is not completely possible, but with GROUPING SETS it is, as I will show below. It was a small challenge however to distinguish the duplicate sets from each other in the select list and this is where I learned something new.

For example, let's start with this query:

rwijk@ORA11GR1> select deptno
  2       , job
  3       , empno
  4       , ename
  5       , sum(sal) sumsal
  6       , case grouping_id(deptno,job,empno)
  7           when 0 then 'grouped by deptno,job,empno,ename'
  8           when 1 then 'grouped by deptno,job'
  9           when 3 then 'grouped by deptno'
 10           when 7 then 'grouped by ()'
 11         end gr_text
 12    from emp
 13   group by rollup(deptno,job,(empno,ename))
 14   order by deptno
 15       , job
 16       , empno
 17  /

DEPTNO JOB        EMPNO ENAME      SUMSAL GR_TEXT
------ --------- ------ ---------- ------ ---------------------------------
    10 CLERK       7934 MILLER       1300 grouped by deptno,job,empno,ename
    10 CLERK                         1300 grouped by deptno,job
    10 MANAGER     7782 CLARK        2450 grouped by deptno,job,empno,ename
    10 MANAGER                       2450 grouped by deptno,job
    10 PRESIDENT   7839 KING         5000 grouped by deptno,job,empno,ename
    10 PRESIDENT                     5000 grouped by deptno,job
    10                               8750 grouped by deptno
    20 ANALYST     7788 SCOTT        3000 grouped by deptno,job,empno,ename
    20 ANALYST     7902 FORD         3000 grouped by deptno,job,empno,ename
    20 ANALYST                       6000 grouped by deptno,job
    20 CLERK       7369 SMITH         800 grouped by deptno,job,empno,ename
    20 CLERK       7876 ADAMS        1100 grouped by deptno,job,empno,ename
    20 CLERK                         1900 grouped by deptno,job
    20 MANAGER     7566 JONES        2975 grouped by deptno,job,empno,ename
    20 MANAGER                       2975 grouped by deptno,job
    20                              10875 grouped by deptno
    30 CLERK       7900 JAMES         950 grouped by deptno,job,empno,ename
    30 CLERK                          950 grouped by deptno,job
    30 MANAGER     7698 BLAKE        2850 grouped by deptno,job,empno,ename
    30 MANAGER                       2850 grouped by deptno,job
    30 SALESMAN    7499 ALLEN        1600 grouped by deptno,job,empno,ename
    30 SALESMAN    7521 WARD         1250 grouped by deptno,job,empno,ename
    30 SALESMAN    7654 MARTIN       1250 grouped by deptno,job,empno,ename
    30 SALESMAN    7844 TURNER       1500 grouped by deptno,job,empno,ename
    30 SALESMAN                      5600 grouped by deptno,job
    30                               9400 grouped by deptno
                                    29025 grouped by ()

27 rijen zijn geselecteerd.

A rollup with N arguments always leads to N+1 grouping sets. So here the rollup has 3 arguments and you can see 4 distinct values in the last gr_text column.

Now let's say we want the grouping sets () and deptno duplicated, leading to 4 extra rows. With rollup you can specify deptno twice:

rwijk@ORA11GR1> select deptno
  2       , job
  3       , empno
  4       , ename
  5       , sum(sal) sumsal
  6       , case grouping_id(deptno,job,empno)
  7           when 0 then 'grouped by deptno,job,empno,ename'
  8           when 1 then 'grouped by deptno,job'
  9           when 3 then 'grouped by deptno'
 10           when 7 then 'grouped by ()'
 11         end gr_text
 12    from emp
 13   group by rollup(deptno,deptno,job,(empno,ename))
 14   order by deptno
 15       , job
 16       , empno
 17  /

DEPTNO JOB        EMPNO ENAME      SUMSAL GR_TEXT
------ --------- ------ ---------- ------ ---------------------------------
    10 CLERK       7934 MILLER       1300 grouped by deptno,job,empno,ename
    10 CLERK                         1300 grouped by deptno,job
    10 MANAGER     7782 CLARK        2450 grouped by deptno,job,empno,ename
    10 MANAGER                       2450 grouped by deptno,job
    10 PRESIDENT   7839 KING         5000 grouped by deptno,job,empno,ename
    10 PRESIDENT                     5000 grouped by deptno,job
    10                               8750 grouped by deptno
    10                               8750 grouped by deptno
    20 ANALYST     7788 SCOTT        3000 grouped by deptno,job,empno,ename
    20 ANALYST     7902 FORD         3000 grouped by deptno,job,empno,ename
    20 ANALYST                       6000 grouped by deptno,job
    20 CLERK       7369 SMITH         800 grouped by deptno,job,empno,ename
    20 CLERK       7876 ADAMS        1100 grouped by deptno,job,empno,ename
    20 CLERK                         1900 grouped by deptno,job
    20 MANAGER     7566 JONES        2975 grouped by deptno,job,empno,ename
    20 MANAGER                       2975 grouped by deptno,job
    20                              10875 grouped by deptno
    20                              10875 grouped by deptno
    30 CLERK       7900 JAMES         950 grouped by deptno,job,empno,ename
    30 CLERK                          950 grouped by deptno,job
    30 MANAGER     7698 BLAKE        2850 grouped by deptno,job,empno,ename
    30 MANAGER                       2850 grouped by deptno,job
    30 SALESMAN    7499 ALLEN        1600 grouped by deptno,job,empno,ename
    30 SALESMAN    7521 WARD         1250 grouped by deptno,job,empno,ename
    30 SALESMAN    7654 MARTIN       1250 grouped by deptno,job,empno,ename
    30 SALESMAN    7844 TURNER       1500 grouped by deptno,job,empno,ename
    30 SALESMAN                      5600 grouped by deptno,job
    30                               9400 grouped by deptno
    30                               9400 grouped by deptno
                                    29025 grouped by ()

30 rijen zijn geselecteerd.

But there is no way to specify the empty grouping set, as it is implicit with the rollup operator. And so it's also impossible to duplicate this empty grouping set with rollup. It is possible though with the grouping sets notation. First, let's rewrite the original rollup expression to the more tedious but clearer grouping sets notation, like this:

rwijk@ORA11GR1> select deptno
  2       , job
  3       , empno
  4       , ename
  5       , sum(sal) sumsal
  6       , case grouping_id(deptno,job,empno)
  7           when 0 then 'grouped by deptno,job,empno,ename'
  8           when 1 then 'grouped by deptno,job'
  9           when 3 then 'grouped by deptno'
 10           when 7 then 'grouped by ()'
 11         end gr_text
 12    from emp
 13   group by grouping sets
 14         ( (deptno,job,empno,ename)
 15         , (deptno,job)
 16         , deptno
 17         , ()
 18         )
 19   order by deptno
 20       , job
 21       , empno
 22  /

DEPTNO JOB        EMPNO ENAME      SUMSAL GR_TEXT
------ --------- ------ ---------- ------ ---------------------------------
    10 CLERK       7934 MILLER       1300 grouped by deptno,job,empno,ename
    10 CLERK                         1300 grouped by deptno,job
    10 MANAGER     7782 CLARK        2450 grouped by deptno,job,empno,ename
    10 MANAGER                       2450 grouped by deptno,job
    10 PRESIDENT   7839 KING         5000 grouped by deptno,job,empno,ename
    10 PRESIDENT                     5000 grouped by deptno,job
    10                               8750 grouped by deptno
    20 ANALYST     7788 SCOTT        3000 grouped by deptno,job,empno,ename
    20 ANALYST     7902 FORD         3000 grouped by deptno,job,empno,ename
    20 ANALYST                       6000 grouped by deptno,job
    20 CLERK       7369 SMITH         800 grouped by deptno,job,empno,ename
    20 CLERK       7876 ADAMS        1100 grouped by deptno,job,empno,ename
    20 CLERK                         1900 grouped by deptno,job
    20 MANAGER     7566 JONES        2975 grouped by deptno,job,empno,ename
    20 MANAGER                       2975 grouped by deptno,job
    20                              10875 grouped by deptno
    30 CLERK       7900 JAMES         950 grouped by deptno,job,empno,ename
    30 CLERK                          950 grouped by deptno,job
    30 MANAGER     7698 BLAKE        2850 grouped by deptno,job,empno,ename
    30 MANAGER                       2850 grouped by deptno,job
    30 SALESMAN    7499 ALLEN        1600 grouped by deptno,job,empno,ename
    30 SALESMAN    7521 WARD         1250 grouped by deptno,job,empno,ename
    30 SALESMAN    7654 MARTIN       1250 grouped by deptno,job,empno,ename
    30 SALESMAN    7844 TURNER       1500 grouped by deptno,job,empno,ename
    30 SALESMAN                      5600 grouped by deptno,job
    30                               9400 grouped by deptno
                                    29025 grouped by ()

27 rijen zijn geselecteerd.

then it becomes very easy to duplicate the deptno and the empty grouping set. Just duplicate them in your grouping sets list of arguments:

rwijk@ORA11GR1> select deptno
  2       , job
  3       , empno
  4       , ename
  5       , sum(sal) sumsal
  6       , case grouping_id(deptno,job,empno)
  7           when 0 then 'grouped by deptno,job,empno,ename'
  8           when 1 then 'grouped by deptno,job'
  9           when 3 then 'grouped by deptno'
 10           when 7 then 'grouped by ()'
 11         end gr_text
 12    from emp
 13   group by grouping sets
 14         ( (deptno,job,empno,ename)
 15         , (deptno,job)
 16         , deptno
 17         , deptno
 18         , ()
 19         , ()
 20         )
 21   order by deptno
 22       , job
 23       , empno
 24  /

DEPTNO JOB        EMPNO ENAME      SUMSAL GR_TEXT
------ --------- ------ ---------- ------ ---------------------------------
    10 CLERK       7934 MILLER       1300 grouped by deptno,job,empno,ename
    10 CLERK                         1300 grouped by deptno,job
    10 MANAGER     7782 CLARK        2450 grouped by deptno,job,empno,ename
    10 MANAGER                       2450 grouped by deptno,job
    10 PRESIDENT   7839 KING         5000 grouped by deptno,job,empno,ename
    10 PRESIDENT                     5000 grouped by deptno,job
    10                               8750 grouped by deptno
    10                               8750 grouped by deptno
    20 ANALYST     7788 SCOTT        3000 grouped by deptno,job,empno,ename
    20 ANALYST     7902 FORD         3000 grouped by deptno,job,empno,ename
    20 ANALYST                       6000 grouped by deptno,job
    20 CLERK       7369 SMITH         800 grouped by deptno,job,empno,ename
    20 CLERK       7876 ADAMS        1100 grouped by deptno,job,empno,ename
    20 CLERK                         1900 grouped by deptno,job
    20 MANAGER     7566 JONES        2975 grouped by deptno,job,empno,ename
    20 MANAGER                       2975 grouped by deptno,job
    20                              10875 grouped by deptno
    20                              10875 grouped by deptno
    30 CLERK       7900 JAMES         950 grouped by deptno,job,empno,ename
    30 CLERK                          950 grouped by deptno,job
    30 MANAGER     7698 BLAKE        2850 grouped by deptno,job,empno,ename
    30 MANAGER                       2850 grouped by deptno,job
    30 SALESMAN    7499 ALLEN        1600 grouped by deptno,job,empno,ename
    30 SALESMAN    7521 WARD         1250 grouped by deptno,job,empno,ename
    30 SALESMAN    7654 MARTIN       1250 grouped by deptno,job,empno,ename
    30 SALESMAN    7844 TURNER       1500 grouped by deptno,job,empno,ename
    30 SALESMAN                      5600 grouped by deptno,job
    30                               9400 grouped by deptno
    30                               9400 grouped by deptno
                                    29025 grouped by ()
                                    29025 grouped by ()

31 rijen zijn geselecteerd.

Now my question was: how can I distinguish between the two equal grouping sets in my select list? Not with GROUPING_ID, because I'd have to mention the columns on which were grouped, not the grouping set. A grouping_id(deptno) would yield to 1 in both deptno grouping sets.

Using the row_number analytic function and including all columns present in a grouping set in the partition by clause, was what I thought of next. This would accomplish a "1" for all singular grouping sets, and an arbitrary 1 and 2 within the duplicate grouping sets, like this:

rwijk@ORA11GR1> select deptno
  2       , job
  3       , empno
  4       , ename
  5       , sum(sal) sumsal
  6       , case grouping_id(deptno,job,empno)
  7           when 0 then 'grouped by deptno,job,empno,ename'
  8           when 1 then 'grouped by deptno,job'
  9           when 3 then 'grouped by deptno, grouping set ' ||
 10             to_char(2+row_number() over
                  (partition by deptno,job,empno order by null))
 11           when 7 then 'grouped by (), grouping set ' ||
 12             to_char(4+row_number() over
                  (partition by deptno,job,empno order by null))
 13         end gr_text
 14    from emp
 15   group by grouping sets
 16         ( (deptno,job,empno,ename)
 17         , (deptno,job)
 18         , deptno
 19         , deptno
 20         , ()
 21         , ()
 22         )
 23   order by deptno
 24       , job
 25       , empno
 26  /

DEPTNO JOB        EMPNO ENAME      SUMSAL GR_TEXT
------ --------- ------ ---------- ------ ---------------------------------
    10 CLERK       7934 MILLER       1300 grouped by deptno,job,empno,ename
    10 CLERK                         1300 grouped by deptno,job
    10 MANAGER     7782 CLARK        2450 grouped by deptno,job,empno,ename
    10 MANAGER                       2450 grouped by deptno,job
    10 PRESIDENT   7839 KING         5000 grouped by deptno,job,empno,ename
    10 PRESIDENT                     5000 grouped by deptno,job
    10                               8750 grouped by deptno, grouping set 3
    10                               8750 grouped by deptno, grouping set 4
    20 ANALYST     7788 SCOTT        3000 grouped by deptno,job,empno,ename
    20 ANALYST     7902 FORD         3000 grouped by deptno,job,empno,ename
    20 ANALYST                       6000 grouped by deptno,job
    20 CLERK       7369 SMITH         800 grouped by deptno,job,empno,ename
    20 CLERK       7876 ADAMS        1100 grouped by deptno,job,empno,ename
    20 CLERK                         1900 grouped by deptno,job
    20 MANAGER     7566 JONES        2975 grouped by deptno,job,empno,ename
    20 MANAGER                       2975 grouped by deptno,job
    20                              10875 grouped by deptno, grouping set 3
    20                              10875 grouped by deptno, grouping set 4
    30 CLERK       7900 JAMES         950 grouped by deptno,job,empno,ename
    30 CLERK                          950 grouped by deptno,job
    30 MANAGER     7698 BLAKE        2850 grouped by deptno,job,empno,ename
    30 MANAGER                       2850 grouped by deptno,job
    30 SALESMAN    7499 ALLEN        1600 grouped by deptno,job,empno,ename
    30 SALESMAN    7521 WARD         1250 grouped by deptno,job,empno,ename
    30 SALESMAN    7654 MARTIN       1250 grouped by deptno,job,empno,ename
    30 SALESMAN    7844 TURNER       1500 grouped by deptno,job,empno,ename
    30 SALESMAN                      5600 grouped by deptno,job
    30                               9400 grouped by deptno, grouping set 3
    30                               9400 grouped by deptno, grouping set 4
                                    29025 grouped by (), grouping set 5
                                    29025 grouped by (), grouping set 6

31 rijen zijn geselecteerd.

The problem is the ordering. There is nothing to order by, so I chose NULL. Any other constant value would have sufficed as well. But this ordering is arbitrary, meaning that two rows can both be 1 and can both be 2. And some implementation detail of Oracle has to decide which one it's going to be. And if I use the row_number function twice, would both expressions give the same results? It depends on the same implementation, so yes, both expressions give the same results:

rwijk@ORA11GR1> select case grouping_id(deptno,job,empno)
  2           when 0 then 'grouped by deptno,job,empno,ename'
  3           when 1 then 'grouped by deptno,job'
  4           when 3 then 'grouped by deptno, grouping set ' ||
  5             to_char(2+row_number() over
                  (partition by deptno,job,empno order by null))
  6           when 7 then 'grouped by (), grouping set ' ||
  7             to_char(4+row_number() over
                  (partition by deptno,job,empno order by null))
  8         end gr_text
  9       , case grouping_id(deptno,job,empno)
 10           when 0 then 'grouped by deptno,job,empno,ename'
 11           when 1 then 'grouped by deptno,job'
 12           when 3 then 'grouped by deptno, grouping set ' ||
 13             to_char(2+row_number() over
                  (partition by deptno,job,empno order by null))
 14           when 7 then 'grouped by (), grouping set ' ||
 15             to_char(4+row_number() over
                  (partition by deptno,job,empno order by null))
 16         end gr_text2
 17    from emp
 18   group by grouping sets
 19         ( (deptno,job,empno,ename)
 20         , (deptno,job)
 21         , deptno
 22         , deptno
 23         , ()
 24         , ()
 25         )
 26   order by deptno
 27       , job
 28       , empno
 29  /

GR_TEXT                           GR_TEXT2
--------------------------------- ---------------------------------
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job             grouped by deptno,job
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job             grouped by deptno,job
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job             grouped by deptno,job
grouped by deptno, grouping set 3 grouped by deptno, grouping set 3
grouped by deptno, grouping set 4 grouped by deptno, grouping set 4
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job             grouped by deptno,job
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job             grouped by deptno,job
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job             grouped by deptno,job
grouped by deptno, grouping set 3 grouped by deptno, grouping set 3
grouped by deptno, grouping set 4 grouped by deptno, grouping set 4
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job             grouped by deptno,job
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job             grouped by deptno,job
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job             grouped by deptno,job
grouped by deptno, grouping set 3 grouped by deptno, grouping set 3
grouped by deptno, grouping set 4 grouped by deptno, grouping set 4
grouped by (), grouping set 5     grouped by (), grouping set 5
grouped by (), grouping set 6     grouped by (), grouping set 6

31 rijen zijn geselecteerd.

But still I felt uncomfortable with this solution. I was pressing the Previous and Next buttons in the SQL Reference Manual to find any related functions and saw the GROUP_ID function. "This has got to be the most useless function ever" must have been my thoughts when I read about this function years ago. In fact, I had completely forgotten about it. But it's entire reason for being is a case like this with duplicate grouping sets, to be able to distinguish between the sets:

rwijk@ORA11GR1> select deptno
  2       , job
  3       , empno
  4       , ename
  5       , sum(sal) sumsal
  6       , case grouping_id(deptno,job,empno)
  7           when 0 then 'grouped by deptno,job,empno,ename'
  8           when 1 then 'grouped by deptno,job'
  9           when 3 then 'grouped by deptno'
 10           when 7 then 'grouped by ()'
 11         end gr_text
 12       , group_id() grid
 13    from emp
 14   group by grouping sets
 15         ( (deptno,job,empno,ename)
 16         , (deptno,job)
 17         , deptno
 18         , deptno
 19         , ()
 20         , ()
 21         , ()
 22         , ()
 23         , ()
 24         )
 25   order by deptno
 26       , job
 27       , empno
 28  /

DEPTNO JOB        EMPNO ENAME      SUMSAL GR_TEXT                           GRID
------ --------- ------ ---------- ------ --------------------------------- ----
    10 CLERK       7934 MILLER       1300 grouped by deptno,job,empno,ename    0
    10 CLERK                         1300 grouped by deptno,job                0
    10 MANAGER     7782 CLARK        2450 grouped by deptno,job,empno,ename    0
    10 MANAGER                       2450 grouped by deptno,job                0
    10 PRESIDENT   7839 KING         5000 grouped by deptno,job,empno,ename    0
    10 PRESIDENT                     5000 grouped by deptno,job                0
    10                               8750 grouped by deptno                    1
    10                               8750 grouped by deptno                    0
    20 ANALYST     7788 SCOTT        3000 grouped by deptno,job,empno,ename    0
    20 ANALYST     7902 FORD         3000 grouped by deptno,job,empno,ename    0
    20 ANALYST                       6000 grouped by deptno,job                0
    20 CLERK       7369 SMITH         800 grouped by deptno,job,empno,ename    0
    20 CLERK       7876 ADAMS        1100 grouped by deptno,job,empno,ename    0
    20 CLERK                         1900 grouped by deptno,job                0
    20 MANAGER     7566 JONES        2975 grouped by deptno,job,empno,ename    0
    20 MANAGER                       2975 grouped by deptno,job                0
    20                              10875 grouped by deptno                    1
    20                              10875 grouped by deptno                    0
    30 CLERK       7900 JAMES         950 grouped by deptno,job,empno,ename    0
    30 CLERK                          950 grouped by deptno,job                0
    30 MANAGER     7698 BLAKE        2850 grouped by deptno,job,empno,ename    0
    30 MANAGER                       2850 grouped by deptno,job                0
    30 SALESMAN    7499 ALLEN        1600 grouped by deptno,job,empno,ename    0
    30 SALESMAN    7521 WARD         1250 grouped by deptno,job,empno,ename    0
    30 SALESMAN    7654 MARTIN       1250 grouped by deptno,job,empno,ename    0
    30 SALESMAN    7844 TURNER       1500 grouped by deptno,job,empno,ename    0
    30 SALESMAN                      5600 grouped by deptno,job                0
    30                               9400 grouped by deptno                    0
    30                               9400 grouped by deptno                    1
                                    29025 grouped by ()                        2
                                    29025 grouped by ()                        3
                                    29025 grouped by ()                        4
                                    29025 grouped by ()                        0
                                    29025 grouped by ()                        1

34 rijen zijn geselecteerd.

Here I have duplicated the empty grouping set 4 times and you see the outcome of the GROUP_ID() function for those five sets ranging from 0 to 4. This is exactly what is needed here, so now the initial query is quite easy, like this:

rwijk@ORA11GR1> select deptno
  2       , job
  3       , empno
  4       , ename
  5       , sum(sal) sumsal
  6       , case grouping_id(deptno,job,empno)
  7           when 0 then 'grouped by deptno,job,empno,ename'
  8           when 1 then 'grouped by deptno,job'
  9           when 3 then 'grouped by deptno, grouping set ' || to_char(3+group_id())
 10           when 7 then 'grouped by (), grouping set ' || to_char(5+group_id())
 11         end gr_text
 12    from emp
 13   group by grouping sets
 14         ( (deptno,job,empno,ename)
 15         , (deptno,job)
 16         , deptno
 17         , deptno
 18         , ()
 19         , ()
 20         )
 21   order by deptno
 22       , job
 23       , empno
 24  /

DEPTNO JOB        EMPNO ENAME      SUMSAL GR_TEXT
------ --------- ------ ---------- ------ ---------------------------------
    10 CLERK       7934 MILLER       1300 grouped by deptno,job,empno,ename
    10 CLERK                         1300 grouped by deptno,job
    10 MANAGER     7782 CLARK        2450 grouped by deptno,job,empno,ename
    10 MANAGER                       2450 grouped by deptno,job
    10 PRESIDENT   7839 KING         5000 grouped by deptno,job,empno,ename
    10 PRESIDENT                     5000 grouped by deptno,job
    10                               8750 grouped by deptno, grouping set 4
    10                               8750 grouped by deptno, grouping set 3
    20 ANALYST     7788 SCOTT        3000 grouped by deptno,job,empno,ename
    20 ANALYST     7902 FORD         3000 grouped by deptno,job,empno,ename
    20 ANALYST                       6000 grouped by deptno,job
    20 CLERK       7369 SMITH         800 grouped by deptno,job,empno,ename
    20 CLERK       7876 ADAMS        1100 grouped by deptno,job,empno,ename
    20 CLERK                         1900 grouped by deptno,job
    20 MANAGER     7566 JONES        2975 grouped by deptno,job,empno,ename
    20 MANAGER                       2975 grouped by deptno,job
    20                              10875 grouped by deptno, grouping set 3
    20                              10875 grouped by deptno, grouping set 4
    30 CLERK       7900 JAMES         950 grouped by deptno,job,empno,ename
    30 CLERK                          950 grouped by deptno,job
    30 MANAGER     7698 BLAKE        2850 grouped by deptno,job,empno,ename
    30 MANAGER                       2850 grouped by deptno,job
    30 SALESMAN    7499 ALLEN        1600 grouped by deptno,job,empno,ename
    30 SALESMAN    7521 WARD         1250 grouped by deptno,job,empno,ename
    30 SALESMAN    7654 MARTIN       1250 grouped by deptno,job,empno,ename
    30 SALESMAN    7844 TURNER       1500 grouped by deptno,job,empno,ename
    30 SALESMAN                      5600 grouped by deptno,job
    30                               9400 grouped by deptno, grouping set 3
    30                               9400 grouped by deptno, grouping set 4
                                    29025 grouped by (), grouping set 5
                                    29025 grouped by (), grouping set 6

31 rijen zijn geselecteerd.

About Oracle

Pages

Monday, December 29, 2008

Article in OGh Visie

Wednesday, December 17, 2008

Plan a schedule with the SQL model clause

Friday, December 5, 2008

GROUP_ID()

About me

My APEX applications

Upcoming presentations

Twitter

Oracle related links

Oracle database and APEX blogs

Blog Archive

Total Pageviews