3.7 Working with Sequences3.7.1 ProblemYou want to find any arithmetically increasing sequence in the data. In our example, you want to find any sequences in which the purity level is arithmetically increasing (100, 101, 102, etc.). Any such sequences that are three or more containers long indicate that a production line is overheating . 3.7.2 SolutionFind sequences in the ProductionLine data using the following query: SELECT p1.ContainerId SeqBeg, p2.ContainerId SeqEnd, p2.ContainerId-p1.ContainerId+1 SequenceSize FROM ProductionLine p1, ProductionLine p2 WHERE (p1.ContainerId < p2.ContainerId) AND NOT EXISTS(SELECT * FROM ProductionLine p3 WHERE (p3.Purity-p3.ContainerId!=p1.Purity-p1.ContainerId AND p3.ContainerId BETWEEN p1.ContainerId AND p2.ContainerId) OR (p3.ContainerId=p1.ContainerId-1 AND p3.Purity-p3.ContainerId=p1.Purity-p1.ContainerId) OR (p3.ContainerId=p2.ContainerId+1 AND p3.Purity-p3.ContainerId=p1.Purity-p1.ContainerId)) SeqBeg SeqEnd SequenceSize ----------- ----------- ------------ 2 5 4 8 9 2 3.7.3 DiscussionThis query uses a framework similar to that used to find regions . The difference is that the subquery contains a WHERE clause condition to identify sequences. To explain exactly how that works, let's begin by looking at the raw data: ContainerId Purity Diff ----------- ----------- ----------- 1 100 99 2 100 98 3 101 98 4 102 98 5 103 98 6 100 94 7 103 96 8 108 100 9 109 100 10 100 90 11 100 89 12 100 88 Notice that the purity levels of containers 2-5 represent an arithmetically increasing sequence. Notice also that the container IDs represent an arithmetically increasing sequence. We can use the fact that both sequences are monotonic to our advantage. It means that, within any given sequence, the difference between the ContainerId and Purity will be a constant. For example: 100 - 2 = 98 101 - 3 = 98 ... We put knowledge of this pattern to use in the subquery, which uses the following WHERE condition to find any candidates that break the sequence: p3.Purity-p3.ContainerId!=p1.Purity-p1.ContainerId If any row in a candidate sequence (p3) has the same difference as the first row of the sequence (p1), it is a member of the sequence. If not, the candidate pair (p1, p2) should be discarded. The rest of the framework is exactly the same as for finding regions, and you can easily extend it for sequences in the same ways that you can when finding regions. For example, to find only sequences larger than three rows, add the following WHERE condition: p2.ContainerId-p1.ContainerId>=2 AND For example: SELECT p1.ContainerId SeqBeg, p2.ContainerId SeqEnd, p2.ContainerId-p1.ContainerId+1 SequenceSize FROM ProductionLine p1, ProductionLine p2 WHERE (p1.ContainerId < p2.ContainerId) AND p2.ContainerId-p1.ContainerId>=2 AND NOT EXISTS(SELECT * FROM ProductionLine p3 WHERE (p3.Purity-p3.ContainerId!=p1.Purity-p1.ContainerId AND p3.ContainerId BETWEEN p1.ContainerId AND p2.ContainerId) OR (p3.ContainerId=p1.ContainerId-1 AND p3.Purity-p3.ContainerId=p1.Purity-p1.ContainerId) OR (p3.ContainerId=p2.ContainerId+1 AND p3.Purity-p3.ContainerId=p1.Purity-p1.ContainerId)) SeqBeg SeqEnd SequenceSize ----------- ----------- ------------ 2 5 4 With this framework, you can use algorithms for regions and apply them to sequences with minimal changes to the code. Usually, you just have to add an additional condition such as the one we added in this recipe. |