Wednesday, July 15, 2015

A Few Lines of SQL Tames a Weird Requirement

A few days back I answered a question on the OTN SQL Discussion Forum and thought it blog worthy.

The requirement itself is kind of strange, but I'll try and explain it. Basically there is an outbound trip and an inbound trip. Each record has a direction identifier, a sequence and various waypoints aka tp points.

The data looks like this:

create table a(dirid number, seq number, tp number);

insert into a values (1,1,101);
insert into a values (1,2,143);
insert into a values (1,3,120);
insert into a values (1,4,119);

insert into a values (4,1,61);
insert into a values (4,2,119);
insert into a values (4,3,120);
insert into a values (4,4,143);
insert into a values (4,5,101);


The requirement is for the data to be transformed 
from this                                 to this
Dirid Seq  Tr                     Dirid Seq  Tr
1     1    101                    1     1    101
1     2    143                    1     2    143
1     3    120                    1     3    120
1     4    119                    1     4    119
4     1    61                     1     5    NULL
4     2    119                    4     5    61
4     3    120                    4     4    119
4     4    143                    4     3    120
4     5    101                    4     2    143
                                  4     1    101

So I see a couple of things right off the bat. First off, we've added a row. It appears to be an endpoint to dirid = 1. Secondly we've changed the sequence values for dirid = 4. Thirdly, the sort order are different for the various dirid's. Ascending for dirid=1 and descending for dirid=4

Hmmm, how do we attack this.

Since, we're manufacturing rows, I'm thinking MODEL clause.
Next, I'll need to populate that NULL row with the maximum seq value.
I'll need to modify the seq values where dirid=4. I can do that with a CASE statement around multiple ROW_NUMBER analytics.
Lastly, I'll need to again put a CASE statement around multiple ROW_NUMBER analytics, because he sorting rules change based on dirid.

I warned you the requirement seemed a little sketchy, but fun nonetheless. 

So let's start to put it all together.

First, let's add that row using the MODEL clause

SELECT dirid,seq, tp
  FROM a
 MODEL
DIMENSION BY ( dirid,seq )
    MEASURES ( tp )
RULES UPSERT ( tp [1,0] = NULL)

which gives us the following output...so far so good.

Dirid Seq  Tr
1     1    101
1     2    143
1     3    120
1     4    119
4     1    61
4     2    119
4     3    120
4     4    143
4     5    101
1     0    NULL

Next we need to populate the new row with the correct seq value. Let's use a MAX analytic function

SELECT dirid,DECODE(seq,0,MAX(seq) OVER (PARTITION BY NULL),seq) seq,tp
  FROM (SELECT dirid,seq, tp
          FROM a
         MODEL
        DIMENSION BY ( dirid,seq )
            MEASURES ( tp )
        RULES UPSERT ( tp [1,0] = NULL))

which gives us the following output...again, so far so good.

Dirid Seq  Tr
1     1    101
1     2    143
1     3    120
1     4    119
4     1    61
4     2    119
4     3    120
4     4    143
4     5    101
1     5    NULL

Next we need to modify the seq values where dirid = 4. Let's place a CASE statement around a few ROW_NUMBER analytics.

SELECT dirid,seq,tp,
       CASE 
        WHEN dirid = 1 THEN  
             row_number() OVER (PARTITION BY dirid ORDER BY seq) 
        ELSE  
             row_number() OVER (PARTITION BY dirid ORDER BY seq DESC) 
        END rn
  FROM (SELECT dirid,DECODE(seq,0,MAX(seq) OVER (PARTITION BY NULL),seq) seq,tp
          FROM (SELECT dirid,seq, tp
                  FROM a
                 MODEL
                DIMENSION BY ( dirid,seq )
                    MEASURES ( tp )
                RULES UPSERT ( tp [1,0] = NULL)))

Cool. The rn column contains the values needed for seq.

Dirid Seq  Tr    rn
1     1    101   1
1     2    143   2
1     3    120   3
1     4    119   4
1     5    NULL  5
4     1    61    5
4     2    119   4
4     3    120   3
4     4    143   2
4     5    101   1

Let's replace seq with rn and again wrap a CASE statement to handle the different sorting requirements of dirid=1 and dirid=4


SELECT dirid,rn seq,tp,
       CASE 
         WHEN dirid = 1 THEN  
              row_number() OVER (PARTITION BY dirid ORDER BY rn) 
         ELSE  
              row_number() OVER (PARTITION BY dirid ORDER BY rn DESC) 
       END rn2
FROM (SELECT dirid,seq,tp,
             CASE 
                WHEN dirid = 1 THEN  
                     row_number() OVER (PARTITION BY dirid ORDER BY seq) 
                ELSE  
                     row_number() OVER (PARTITION BY dirid ORDER BY seq DESC) 
             END rn
       FROM (SELECT dirid,DECODE(seq,0,MAX(seq) OVER (PARTITION BY NULL),seq)seq
                    ,tp
              FROM (SELECT dirid,seq, tp
                    FROM a
                    MODEL
                    DIMENSION BY ( dirid,seq )
                    MEASURES ( tp )
                    RULES UPSERT ( tp [1,0] = NULL))))

Almost there.

Dirid Seq  Tr    rn2
1     1    101   1
1     2    143   2
1     3    120   3
1     4    119   4
1     5    NULL  5
4     1    101   5
4     2    143   4
4     3    120   3
4     4    119   2
4     5    61    1


The final step is to sort on dirid, rn2.

SELECT dirid,seq,tp
FROM (   SELECT dirid,rn seq,tp,
                CASE 
                   WHEN dirid = 1 THEN  
                        row_number() OVER (PARTITION BY dirid ORDER BY rn) 
                   ELSE  
                        row_number() OVER (PARTITION BY dirid ORDER BY rn DESC) 
                END rn2
          FROM (SELECT dirid,seq,tp,
                      CASE 
                        WHEN dirid = 1 THEN  
                             row_number() OVER (PARTITION BY dirid ORDER BY seq) 
                        ELSE  
                             row_number() OVER (PARTITION BY dirid ORDER BY seq DESC) 
                      END rn
                  FROM (SELECT dirid,DECODE(seq,0,MAX(seq) OVER (PARTITION BY NULL),seq) seq,tp
                          FROM (SELECT dirid,seq, tp
                                  FROM a
                                 MODEL
                                DIMENSION BY ( dirid,seq )
                                    MEASURES ( tp )
                                RULES UPSERT ( tp [1,0] = NULL)))))
ORDER BY dirid,rn2   


Sweetness! The outbound trip (dirid=1) goes out to the following waypoint in seq order, then the inbound trip (dirid=4)  comes back to the starting point in reverse.

Dirid Seq  Tr    
1     1    101   
1     2    143   
1     3    120   
1     4    119   
1     5    NULL  
4     5    61    
4     4    119   
4     3    120   
4     2    143   
4     1    101   


I warned you it was a weird requirement, but those are the kind of things that make you think "I can do that in a few lines of SQL"....now with a smattering of analytic functions you can too.


...our journey continues


1 comment:

Unknown said...

That is a very weird requirement. Once you get into "throw out the actual value and construct a different value" it's tempting to just whip out the pl/sql or even (argh) the hard coding. Thanks for the nice example of MODEL clause!

-Natalka Roshak - http://rdbms-insight.com