Recipe 14.4. Extracting Elements of a String from Unfixed LocationsProblemYou have a string field that contains serialized log data. You want to parse through the string and extract the relevant information. Unfortunately, the relevant information is not at fixed points in the string. Instead, you must use the fact that certain characters exist around the information you need, to extract said information. For example, consider the following strings: xxxxxabc[867]xxx[-]xxxx[5309]xxxxx xxxxxtime:[11271978]favnum:[4]id:[Joe]xxxxx call:[F_GET_ROWS()]b1:[ROSEWOOD…SIR]b2:[44400002]77.90xxxxx film:[non_marked]qq:[unit]tailpipe:[withabanana?]80sxxxxx You want to extract the values between the square brackets, returning the following result set: FIRST_VAL SECOND_VAL LAST_VAL --------------- ------------------- --------------- 867 - 5309 11271978 4 Joe F_GET_ROWS( ) ROSEWOOD…SIR 44400002 non_marked unit withabanana? SolutionDespite not knowing the exact locations within the string of the interesting values, you do know that they are located between square brackets [], and you know there are three of them. Use Oracle's built-in function INSTR to find the locations to of the brackets. Use the built-in function SUBSTR to extract the values from the string. View V will contain the strings to parse and is defined as follows (its use is strictly for readability): create view V as select 'xxxxxabc[867]xxx[-]xxxx[5309]xxxxx' msg from dual union all select 'xxxxxtime:[11271978]favnum:[4]id:[Joe]xxxxx' msg from dual union all select 'call:[F_GET_ROWS()]b1:[ROSEWOOD…SIR]b2:[44400002]77.90xxxxx' msg from dual union all select 'film:[non_marked]qq:[unit]tailpipe:[withabanana?]80sxxxxx' msg from dual 1 select substr(msg, 2 instr(msg,'[',1,1)+1, 3 instr(msg,']',1,1)-instr(msg,'[',1,1)-1) first_val, 4 substr(msg, 5 instr(msg,'[',1,2)+1, 6 instr(msg,']',1,2)-instr(msg,'[',1,2)-1) second_val, 7 substr(msg, 8 instr(msg,'[',-1,1)+1, 9 instr(msg,']',-1,1)-instr(msg,'[',-1,1)-1) last_val 10 from V DiscussionUsing Oracle's built-in function INSTR makes this problem fairly simple to solve. Since you know the values you are after are enclosed in [], and that there are three sets of [], the first step to this solution is to simply use INSTR to find the numeric positions of [] in each string. The following example returns the numeric position of the opening and closing brackets in each row: select instr(msg,'[',1,1) "1st_[", instr(msg,']',1,1) "]_1st", instr(msg,'[',1,2) "2nd_[", instr(msg,']',1,2) "]_2nd", instr(msg,'[',-1,1) "3rd_[", instr(msg,']',-1,1) "]_3rd" from V 1st_[ ]_1st 2nd_[ ]_2nd 3rd_[ ]_3rd ------ ----- ---------- ----- ---------- ----- 9 13 17 19 24 29 11 20 28 30 34 38 6 19 23 38 42 51 6 17 21 26 36 49 At this point, the hard work is done. All that is left is to plug the numeric positions into SUBSTR to parse MSG at those locations. You'll notice that in the complete solution there's some simple arithmetic on the values returned by INSTR, particularly, +1 and1; this is necessary to ensure the opening square bracket, [, is not returned in the final result set. Listed below is the solution less addition and subtraction of 1 on the return values from INSTR; notice how each value has a leading square bracket: select substr(msg, instr(msg,'[',1,1), instr(msg,']',1,1)-instr(msg,'[',1,1)) first_val, substr(msg, instr(msg,'[',1,2), instr(msg,']',1,2)-instr(msg,'[',1,2)) second_val, substr(msg, instr(msg,'[',-1,1), instr(msg,']',-1,1)-instr(msg,'[',-1,1)) last_val from V FIRST_VAL SECOND_VAL LAST_VAL --------------- -------------------- ------- [867 [- [5309 [11271978 [4 [Joe [F_GET_ROWS( ) [ROSEWOOD…SIR [44400002 [non_marked [unit [withabanana? From the result set above, you can see that the open bracket is there. You may be thinking: "OK, put the addition of 1 to INSTR back and the leading square bracket goes away. Why do we need to subtract 1?" The reason is this: if you put the addition back but leave out the subtraction, you end up including the closing square bracket, as can be seen below: select substr(msg, instr(msg,'[',1,1)+1, instr(msg,']',1,1)-instr(msg,'[',1,1)) first_val, substr(msg, instr(msg,'[',1,2)+1, instr(msg,']',1,2)-instr(msg,'[',1,2)) second_val, substr(msg, instr(msg,'[',-1,1)+1, instr(msg,']',-1,1)-instr(msg,'[',-1,1)) last_val from V FIRST_VAL SECOND_VAL LAST_VAL --------------- --------------- ------------- 867] -] 5309] 11271978] 4] Joe] F_GET_ROWS( )] ROSEWOOD…SIR] 44400002] non_marked] unit] withabanana?] At this point it should be clear: to ensure you include neither of the square brackets, you must add 1 to the beginning index and subtract one from the ending index. |