Monty Latiolais: July 2015

Wednesday, July 15, 2015

A Few Lines of SQL Tames a Weird Requirement

A few days back I answered a question on the OTN SQL Discussion Forum and thought it blog worthy.

The requirement itself is kind of strange, but I'll try and explain it. Basically there is an outbound trip and an inbound trip. Each record has a direction identifier, a sequence and various waypoints aka tp points.

The data looks like this:

create table a(dirid number, seq number, tp number);

insert into a values (1,1,101);

insert into a values (1,2,143);

insert into a values (1,3,120);

insert into a values (1,4,119);

insert into a values (4,1,61);

insert into a values (4,2,119);

insert into a values (4,3,120);

insert into a values (4,4,143);

insert into a values (4,5,101);

The requirement is for the data to be transformed

from this to this

Dirid Seq Tr Dirid Seq Tr

1 1 101 1 1 101

1 2 143 1 2 143

1 3 120 1 3 120

1 4 119 1 4 119

4 1 61 1 5 NULL

4 2 119 4 5 61

4 3 120 4 4 119

4 4 143 4 3 120

4 5 101 4 2 143

4 1 101

So I see a couple of things right off the bat. First off, we've added a row. It appears to be an endpoint to dirid = 1. Secondly we've changed the sequence values for dirid = 4. Thirdly, the sort order are different for the various dirid's. Ascending for dirid=1 and descending for dirid=4

Hmmm, how do we attack this.

Since, we're manufacturing rows, I'm thinking MODEL clause.

Next, I'll need to populate that NULL row with the maximum seq value.

I'll need to modify the seq values where dirid=4. I can do that with a CASE statement around multiple ROW_NUMBER analytics.

Lastly, I'll need to again put a CASE statement around multiple ROW_NUMBER analytics, because he sorting rules change based on dirid.

I warned you the requirement seemed a little sketchy, but fun nonetheless.

So let's start to put it all together.

First, let's add that row using the MODEL clause

SELECT dirid,seq, tp

FROM a

MODEL

DIMENSION BY ( dirid,seq )

MEASURES ( tp )

RULES UPSERT ( tp [1,0] = NULL)

which gives us the following output...so far so good.

Dirid Seq Tr

1 1 101

1 2 143

1 3 120

1 4 119

4 1 61

4 2 119

4 3 120

4 4 143

4 5 101

1 0 NULL

Next we need to populate the new row with the correct seq value. Let's use a MAX analytic function

SELECT dirid,DECODE(seq,0,MAX(seq) OVER (PARTITION BY NULL),seq) seq,tp

FROM (SELECT dirid,seq, tp

FROM a

MODEL

DIMENSION BY ( dirid,seq )

MEASURES ( tp )

RULES UPSERT ( tp [1,0] = NULL))

which gives us the following output...again, so far so good.

Dirid Seq Tr

1 1 101

1 2 143

1 3 120

1 4 119

4 1 61

4 2 119

4 3 120

4 4 143

4 5 101

1 5 NULL

Next we need to modify the seq values where dirid = 4. Let's place a CASE statement around a few ROW_NUMBER analytics.

SELECT dirid,seq,tp,

CASE

WHEN dirid = 1 THEN

row_number() OVER (PARTITION BY dirid ORDER BY seq)

ELSE

row_number() OVER (PARTITION BY dirid ORDER BY seq DESC)

END rn

FROM (SELECT dirid,DECODE(seq,0,MAX(seq) OVER (PARTITION BY NULL),seq) seq,tp

FROM (SELECT dirid,seq, tp

FROM a

MODEL

DIMENSION BY ( dirid,seq )

MEASURES ( tp )

RULES UPSERT ( tp [1,0] = NULL)))

Cool. The rn column contains the values needed for seq.

Dirid Seq Tr rn

1 1 101 1

1 2 143 2

1 3 120 3

1 4 119 4

1 5 NULL 5

4 1 61 5

4 2 119 4

4 3 120 3

4 4 143 2

4 5 101 1

Let's replace seq with rn and again wrap a CASE statement to handle the different sorting requirements of dirid=1 and dirid=4

SELECT dirid,rn seq,tp,

CASE

WHEN dirid = 1 THEN

row_number() OVER (PARTITION BY dirid ORDER BY rn)

ELSE

row_number() OVER (PARTITION BY dirid ORDER BY rn DESC)

END rn2

FROM (SELECT dirid,seq,tp,

CASE

WHEN dirid = 1 THEN

row_number() OVER (PARTITION BY dirid ORDER BY seq)

ELSE

row_number() OVER (PARTITION BY dirid ORDER BY seq DESC)

END rn

FROM (SELECT dirid,DECODE(seq,0,MAX(seq) OVER (PARTITION BY NULL),seq)seq

,tp

FROM (SELECT dirid,seq, tp

FROM a

MODEL

DIMENSION BY ( dirid,seq )

MEASURES ( tp )

RULES UPSERT ( tp [1,0] = NULL))))

Almost there.

Dirid Seq Tr rn2

1 1 101 1

1 2 143 2

1 3 120 3

1 4 119 4

1 5 NULL 5

4 1 101 5

4 2 143 4

4 3 120 3

4 4 119 2

4 5 61 1

The final step is to sort on dirid, rn2.

SELECT dirid,seq,tp

FROM ( SELECT dirid,rn seq,tp,

CASE

WHEN dirid = 1 THEN

row_number() OVER (PARTITION BY dirid ORDER BY rn)

ELSE

row_number() OVER (PARTITION BY dirid ORDER BY rn DESC)

END rn2

FROM (SELECT dirid,seq,tp,

CASE

WHEN dirid = 1 THEN

row_number() OVER (PARTITION BY dirid ORDER BY seq)

ELSE

row_number() OVER (PARTITION BY dirid ORDER BY seq DESC)

END rn

FROM (SELECT dirid,DECODE(seq,0,MAX(seq) OVER (PARTITION BY NULL),seq) seq,tp

FROM (SELECT dirid,seq, tp

FROM a

MODEL

DIMENSION BY ( dirid,seq )

MEASURES ( tp )

RULES UPSERT ( tp [1,0] = NULL)))))

ORDER BY dirid,rn2

Sweetness! The outbound trip (dirid=1) goes out to the following waypoint in seq order, then the inbound trip (dirid=4) comes back to the starting point in reverse.

Dirid Seq Tr

1 1 101

1 2 143

1 3 120

1 4 119

1 5 NULL

4 5 61

4 4 119

4 3 120

4 2 143

4 1 101

I warned you it was a weird requirement, but those are the kind of things that make you think "I can do that in a few lines of SQL"....now with a smattering of analytic functions you can too.

...our journey continues

Thursday, July 9, 2015

NTILE vs WIDTH_BUCKET and the Winner is...PIVOT

At a recent meeting of developers, a requirement was discussed where a finite number of columns are to be populated in alphabetical order - think the opposite of telephone book.

As is often the case, there are multiple ways to do this, but at it's heart, this really is a pure pivot. I think we should rule out both NTILE and WIDTH_BUCKET.

NTILE creates equiheight histograms, while WIDTH_BUCKET creates equiwidth histograms. We could probably force the use of either, but we would still need to pivot based on the quartile values.

NTILE, in reality, allows us to specify the number of rows not the number of columns across. For example,

SELECT ename , NTILE(5) OVER ( ORDER BY ename ) AS quartile FROM emp

ENAME	QUARTILE
ADAMS	1
ALLEN	1
BLAKE	1
CLARK	2
FORD	2
JAMES	2
JONES	3
KING	3
MARTIN	3
MILLER	4
SCOTT	4
SMITH	4
TURNER	5
WARD	5

Modifying the query actually adds to the number of columns. For example,

SELECT ename , NTILE(3) OVER ( ORDER BY ename ) AS quartile FROM emp

ENAME	QUARTILE
ADAMS	1
ALLEN	1
BLAKE	1
CLARK	1
FORD	1
JAMES	2
JONES	2
KING	2
MARTIN	2
MILLER	2
SCOTT	3
SMITH	3
TURNER	3
WARD	3

At first glance, WIDTH_BUCKET would seem to make better sense, because we want to specify a number of columns, but these columns are assigned based on numerical ranges, not strings.

So the most straightforward way to do this, I feel, is a pure PIVOT...something like this.

SELECT col_1 , col_2 , col_3

FROM ( SELECT ename ,

TRUNC ( ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1 ) / 3 ) AS row_num ,

MOD ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1, 3 ) AS col_num

FROM emp ) PIVOT

( MAX(ename) FOR col_num IN ( 0 as col_1 , 1 as col_2 , 2 as col_3 )

)

ORDER BY row_num

COL_1	COL_2	COL_3
ADAMS	ALLEN	BLAKE
CLARK	FORD	JAMES
JONES	KING	MARTIN
MILLER	SCOTT	SMITH
TURNER	WARD

To display 4 columns instead of 3....

SELECT col_1 , col_2 , col_3 , col_4

FROM ( SELECT ename ,

TRUNC ( ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1 ) / 4 ) AS row_num ,

MOD ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1, 4 ) AS col_num

FROM emp ) PIVOT

( MAX(ename) FOR col_num IN ( 0 as col_1 , 1 as col_2 , 2 as col_3 , 3 as col_4 )

)

ORDER BY row_num

COL_1	COL_2	COL_3	COL_4
ADAMS	ALLEN	BLAKE	CLARK
FORD	JAMES	JONES	KING
MARTIN	MILLER	SCOTT	SMITH
TURNER	WARD

To completely eliminate the PIVOT, one could use the MODEL clause, but that can get pretty tedious, pretty fast.

SELECT col_1, col_2, col_3

FROM emp

MODEL

RETURN UPDATED ROWS

DIMENSION BY

( TRUNC ( ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1 ) / 3 ) AS row_num ,

MOD ( ROW_NUMBER() OVER ( ORDER BY ename ) - 1, 3 ) AS col_num

)

MEASURES ( ename AS col_1, ename AS col_2, ename AS col_3 )

RULES UPSERT

...not this kind of MODEL

(

col_2[ANY,0] = col_1[CV(),1] ,

col_3[ANY,0] = col_1[CV(),2]

)

ORDER BY row_num

COL_1	COL_2	COL_3
ADAMS	ALLEN	BLAKE
CLARK	FORD	JAMES
JONES	KING	MARTIN
MILLER	SCOTT	SMITH
TURNER	WARD

...our journey continues