Recipe14.4.Extracting Elements of a String from Unfixed Locations


Recipe 14.4. Extracting Elements of a String from Unfixed Locations

Problem

You have a string field that contains serialized log data. You want to parse through the string and extract the relevant information. Unfortunately, the relevant information is not at fixed points in the string. Instead, you must use the fact that certain characters exist around the information you need, to extract said information. For example, consider the following strings:

 xxxxxabc[867]xxx[-]xxxx[5309]xxxxx xxxxxtime:[11271978]favnum:[4]id:[Joe]xxxxx call:[F_GET_ROWS()]b1:[ROSEWOOD…SIR]b2:[44400002]77.90xxxxx film:[non_marked]qq:[unit]tailpipe:[withabanana?]80sxxxxx 

You want to extract the values between the square brackets, returning the following result set:

 FIRST_VAL       SECOND_VAL          LAST_VAL --------------- ------------------- --------------- 867             -                   5309 11271978        4                   Joe F_GET_ROWS( )   ROSEWOOD…SIR 44400002 non_marked      unit                withabanana? 

Solution

Despite not knowing the exact locations within the string of the interesting values, you do know that they are located between square brackets [], and you know there are three of them. Use Oracle's built-in function INSTR to find the locations to of the brackets. Use the built-in function SUBSTR to extract the values from the string. View V will contain the strings to parse and is defined as follows (its use is strictly for readability):

 create view V as select 'xxxxxabc[867]xxx[-]xxxx[5309]xxxxx' msg     from dual    union all   select 'xxxxxtime:[11271978]favnum:[4]id:[Joe]xxxxx' msg     from dual    union all   select 'call:[F_GET_ROWS()]b1:[ROSEWOOD…SIR]b2:[44400002]77.90xxxxx' msg      from dual    union all   select 'film:[non_marked]qq:[unit]tailpipe:[withabanana?]80sxxxxx' msg     from dual  1 select substr(msg,  2         instr(msg,'[',1,1)+1,  3         instr(msg,']',1,1)-instr(msg,'[',1,1)-1) first_val,  4        substr(msg,  5         instr(msg,'[',1,2)+1,  6         instr(msg,']',1,2)-instr(msg,'[',1,2)-1) second_val,  7        substr(msg,  8         instr(msg,'[',-1,1)+1,  9         instr(msg,']',-1,1)-instr(msg,'[',-1,1)-1) last_val 10  from V 

Discussion

Using Oracle's built-in function INSTR makes this problem fairly simple to solve. Since you know the values you are after are enclosed in [], and that there are three sets of [], the first step to this solution is to simply use INSTR to find the numeric positions of [] in each string. The following example returns the numeric position of the opening and closing brackets in each row:

  select instr(msg,'[',1,1) "1st_[",                   instr(msg,']',1,1) "]_1st",                   instr(msg,'[',1,2) "2nd_[",                   instr(msg,']',1,2) "]_2nd",                   instr(msg,'[',-1,1) "3rd_[",                   instr(msg,']',-1,1) "]_3rd"              from V   1st_[ ]_1st      2nd_[ ]_2nd      3rd_[ ]_3rd   ------ ----- ---------- ----- ---------- -----        9    13         17    19         24    29       11    20         28    30         34    38        6    19         23    38         42    51        6    17         21    26         36    49 

At this point, the hard work is done. All that is left is to plug the numeric positions into SUBSTR to parse MSG at those locations. You'll notice that in the complete solution there's some simple arithmetic on the values returned by INSTR, particularly, +1 and1; this is necessary to ensure the opening square bracket, [, is not returned in the final result set. Listed below is the solution less addition and subtraction of 1 on the return values from INSTR; notice how each value has a leading square bracket:

  select substr(msg,         instr(msg,'[',1,1),         instr(msg,']',1,1)-instr(msg,'[',1,1)) first_val,        substr(msg,         instr(msg,'[',1,2),         instr(msg,']',1,2)-instr(msg,'[',1,2)) second_val,        substr(msg,         instr(msg,'[',-1,1),         instr(msg,']',-1,1)-instr(msg,'[',-1,1)) last_val   from V FIRST_VAL       SECOND_VAL           LAST_VAL --------------- -------------------- ------- [867            [-                   [5309 [11271978       [4                   [Joe [F_GET_ROWS( )  [ROSEWOOD…SIR [44400002 [non_marked     [unit                [withabanana? 

From the result set above, you can see that the open bracket is there. You may be thinking: "OK, put the addition of 1 to INSTR back and the leading square bracket goes away. Why do we need to subtract 1?" The reason is this: if you put the addition back but leave out the subtraction, you end up including the closing square bracket, as can be seen below:

  select substr(msg,         instr(msg,'[',1,1)+1,         instr(msg,']',1,1)-instr(msg,'[',1,1)) first_val,    substr(msg,     instr(msg,'[',1,2)+1,     instr(msg,']',1,2)-instr(msg,'[',1,2)) second_val,     substr(msg,     instr(msg,'[',-1,1)+1,     instr(msg,']',-1,1)-instr(msg,'[',-1,1)) last_val   from V FIRST_VAL       SECOND_VAL           LAST_VAL --------------- ---------------      ------------- 867]            -]                   5309] 11271978]       4]                   Joe] F_GET_ROWS( )]  ROSEWOOD…SIR]  44400002] non_marked]     unit]                withabanana?] 

At this point it should be clear: to ensure you include neither of the square brackets, you must add 1 to the beginning index and subtract one from the ending index.




SQL Cookbook
SQL Cookbook (Cookbooks (OReilly))
ISBN: 0596009763
EAN: 2147483647
Year: 2005
Pages: 235

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net