Search and Replace | OpenOffice.org Macros Explained

The search process is directed by a search descriptor, which can search only the object that created it. In other words, you cannot use the same search descriptor to search multiple documents. The search descriptor specifies the search text and how the text is searched (see Table 14). The search descriptor is the most complicated component of searching.

Table 14: Properties of the com.sun.star.util.SearchDescriptor service.
Property	Description
SearchBackwards	If True, search the document backwards.
SearchCaseSensitive	If True, the case of the letters affects the search.
SearchWords	If True, only complete words are found.
SearchRegularExpression	If True, the search string is treated as a regular expression.
SearchStyles	If True, text is found based on applied style names, not on the text content.
SearchSimilarity	If True, a "similarity search" is performed.
SearchSimilarityRelax	If True, the properties SearchSimilarityRemove, SearchSimilarityAdd, and SearchSimilarityExchange are all applied together.
SearchSimilarityRemove	Short Integer specifying how many characters may be ignored in a match.
SearchSimilarityAdd	Short Integer specifying how many characters may be added in a match.
SearchSimilarityExchange	Short Integer specifying how many characters may be replaced in a match.

Although not included in Table 14, a search descriptor supports the string property SearchString, which contains the text to search for. The XSearchDescriptor interface defines the methods getSearchString() and setSearchString() to get and set the property, if you prefer to use a method rather than setting the property directly. The XSearchable interface defines the methods used for searching and for creating the search descriptor (see Table 15).
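The following small Python sketch makes the effect of the most common descriptor flags concrete by modelling a few of them on a plain string. It is an analogy only. The `SearchOptions` class and the `find_all` function are hypothetical, and Python's `re` module stands in for OOo's search engine. The `\b` anchors only approximate what Writer treats as a whole word for SearchWords.

```python
import re
from dataclasses import dataclass


@dataclass
class SearchOptions:
    """Hypothetical stand-in for a few SearchDescriptor properties."""
    search_string: str
    case_sensitive: bool = False      # like SearchCaseSensitive
    words: bool = False               # like SearchWords
    regular_expression: bool = False  # like SearchRegularExpression
    backwards: bool = False           # like SearchBackwards


def find_all(text: str, opts: SearchOptions) -> list:
    """Return (start, end) offsets of every match in text."""
    # Without SearchRegularExpression, the search string is literal text.
    if opts.regular_expression:
        pattern = opts.search_string
    else:
        pattern = re.escape(opts.search_string)
    if opts.words:
        # Approximate "whole words only" with word-boundary anchors.
        pattern = r"\b(?:" + pattern + r")\b"
    flags = 0 if opts.case_sensitive else re.IGNORECASE
    spans = [(m.start(), m.end()) for m in re.finditer(pattern, text, flags)]
    # Simplification: a backwards search visits the same matches in reverse order.
    return spans[::-1] if opts.backwards else spans
```

For example, searching "Hello hello helloworld" for "hello" yields three matches by default. Setting `words=True` reduces this to two matches, and also setting `case_sensitive=True` leaves only one.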

Table 15: Methods defined by the com.sun.star.util.XSearchable interface.
Method	Description
createSearchDescriptor()	Create a new SearchDescriptor.
findAll(XSearchDescriptor)	Return an XIndexAccess containing all occurrences.
findFirst(XSearchDescriptor)	Starting from the beginning of the searchable object, return a text range containing the first found text, or null if nothing is found.
findNext(XTextRange, XSearchDescriptor)	Starting from the provided text range, return a text range containing the next found text, or null if nothing is found.

The macro in Listing 30 is very simple. It sets the CharWeight character property of all occurrences of the word "hello" to com.sun.star.awt.FontWeight.BOLD. This works because text ranges support character and paragraph properties.

Listing 30: Set all occurrences of the word "hello" to bold text.

Sub SimpleSearchExample
  Dim oDescriptor  'The search descriptor
  Dim oFound       'The found range
  oDescriptor = ThisComponent.createSearchDescriptor()
  With oDescriptor
    .SearchString = "hello"
    .SearchWords = True           'The attributes default to False
    .SearchCaseSensitive = False  'So setting one to False is redundant
  End With
  ' Find the first one
  oFound = ThisComponent.findFirst(oDescriptor)
  Do While Not IsNull(oFound)
    Print oFound.getString()
    oFound.CharWeight = com.sun.star.awt.FontWeight.BOLD
    oFound = ThisComponent.findNext(oFound.End, oDescriptor)
  Loop
End Sub
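The findFirst()/findNext() loop in Listing 30 is a general pattern. You find once, then keep searching from the end of the previous match until nothing is found. The Python sketch below reproduces that control flow on a plain string. It rests on several assumptions: `find_first`, `find_next`, and `mark_all` are hypothetical helpers, a list of booleans stands in for the CharWeight property, and the search is a simple case-sensitive substring search rather than OOo's whole-word search.

```python
def find_next(text, needle, start):
    """Return (start, end) of the next occurrence at or after start, or None."""
    index = text.find(needle, start)
    return None if index < 0 else (index, index + len(needle))


def find_first(text, needle):
    """Like findFirst(): begin at the start of the text."""
    return find_next(text, needle, 0)


def mark_all(text, needle):
    """Flag every character that belongs to an occurrence of needle."""
    if not needle:
        raise ValueError("needle must not be empty")  # would never advance
    bold = [False] * len(text)
    found = find_first(text, needle)
    while found is not None:              # Do While Not IsNull(oFound)
        start, end = found
        for k in range(start, end):       # oFound.CharWeight = BOLD
            bold[k] = True
        # Continue from the end of this match, as findNext(oFound.End, ...) does.
        found = find_next(text, needle, end)
    return bold
```

In this sketch, resuming from `end` rather than `start` is what prevents the loop from finding the same match forever.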

Searching Selected Text or a Specified Range

The trick to searching a specified range of text is to notice that you can pass any text range, including a text cursor, to findNext(). After each call to findNext(), check the end points of the found range to see if the search went too far. You can, therefore, constrain a search to any text range. The primary purpose of the findFirst() method is to obtain the initial text range for findNext(). You can easily use the selected text framework to search a range of text (see Listing 31).

Listing 31: Iterate through all occurrences of text between two cursors.

Sub SearchSelectedWorker(oLCursor, oRCursor, oText, oDescriptor)
  REM Nothing to search unless the left cursor is before the right cursor.
  If oText.compareRegionEnds(oLCursor, oRCursor) <= 0 Then Exit Sub
  REM Collapse any selection held by the left cursor.
  oLCursor.goRight(0, False)
  Dim oFound
  REM There is no reason to perform a findFirst.
  oFound = ThisComponent.findNext(oLCursor, oDescriptor)
  Do While Not IsNull(oFound)
    REM See if we searched past the end
    If -1 = oText.compareRegionEnds(oFound, oRCursor) Then Exit Do
    Print oFound.getString()
    oFound = ThisComponent.findNext(oFound.End, oDescriptor)
  Loop
End Sub

The text object cannot compare two regions unless they both belong to that text object. Text that resides in a different frame, section, or even a text table, uses a different text object than the main document text object. As an exercise, investigate what happens if the found text is in a different text object than the text object that contains oRCursor in Listing 31. Is the code in Listing 31 robust?
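You can model the range check in Listing 31 with plain character offsets. Start searching at the left bound, and stop as soon as a match ends past the right bound. This Python sketch assumes a single flat string, so it sidesteps the multiple-text-object problem described above. The `find_in_range` function is a hypothetical helper, and the comparison `end > right` plays the role of compareRegionEnds() returning -1.

```python
def find_in_range(text, needle, left, right):
    """Return (start, end) spans of needle lying entirely within text[left:right]."""
    if not needle or left >= right:      # nothing to search, like the Exit Sub check
        return []
    spans = []
    index = text.find(needle, left)
    while index >= 0:
        end = index + len(needle)
        if end > right:                  # searched past the right bound
            break
        spans.append((index, end))
        index = text.find(needle, end)   # continue from the end of the match
    return spans
```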

Searching for all Occurrences

It is significantly faster to search for all occurrences of the text at one time by using the findAll() object method than to call findNext() repeatedly. Take care, however, when you modify the found occurrences. The macro in Listing 32 is an extreme example of code gone bad on purpose.

Listing 32: Find and replace all occurrences of the word "helloxyzzy".

Sub SimpleSearchAllExample
  Dim oDescriptor  'The search descriptor
  Dim oFound       'A single found range
  Dim oFoundAll    'All of the found ranges
  Dim n%           'General index variable
  oDescriptor = ThisComponent.createSearchDescriptor()
  oDescriptor.SearchString = "helloxyzzy"
  oFoundAll = ThisComponent.findAll(oDescriptor)
  For n% = 0 To oFoundAll.getCount() - 1
    oFound = oFoundAll.getByIndex(n%)
    Print oFound.getString()
    oFound.setString("hello" & n%)
  Next
End Sub

The macro in Listing 32 obtains a list of text ranges that encapsulate the text "helloxyzzy". This text is then replaced with a shorter piece of text. In a perfect world, the occurrences of "helloxyzzy" would be replaced with "hello0", "hello1", "hello2", and so on. When each instance is replaced with the shorter text, the total length of the document changes. Remember that the text-range objects were all obtained before the first instance of text was modified. Although the text-range interface is clearly defined, the internal workings are not. I purposely created this example because I expected it to fail, and with no defined behavior my only option was to create a test. Experimentally, I observed that if multiple occurrences of "helloxyzzy" are contained in the same word, poor behavior results. I also observed that if all occurrences of "helloxyzzy" are contained in separate words, everything works great, and only occurrences of "helloxyzzy" are changed, leaving the surrounding text intact. I can but nod my head in approval at the brilliance of the programmers who allowed this behavior, while remaining cautiously paranoid, expecting that code relying on this behavior will fail in the future.

Note	After observing the behavior produced by the macro in Listing 32, I expect similar behavior from text cursors as well.
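Listing 32 works in Writer only because, as observed above, the found text ranges appear to follow the edits made to the document. Plain character offsets do not. The Python sketch below shows what goes wrong when offsets collected up front are still used after shorter replacement text has shifted everything that follows. It also shows a common workaround: apply the replacements from the last match to the first, so that the earlier offsets remain valid. The helper names are hypothetical, and nothing here describes Writer's internals.

```python
import re


def find_all_offsets(text, needle):
    """Collect every (start, end) span before any text is modified."""
    return [(m.start(), m.end()) for m in re.finditer(re.escape(needle), text)]


def replace_forward_naive(text, spans, make_replacement):
    """Replace first to last; later spans go stale once the text shrinks."""
    for n, (start, end) in enumerate(spans):
        text = text[:start] + make_replacement(n) + text[end:]
    return text


def replace_reverse(text, spans, make_replacement):
    """Replace last to first, so the spans still to be used never move."""
    for n in reversed(range(len(spans))):
        start, end = spans[n]
        text = text[:start] + make_replacement(n) + text[end:]
    return text
```

Applied to "helloxyzzy and helloxyzzy" with replacements "hello0" and "hello1", the reverse version produces "hello0 and hello1". The naive forward version produces "hello0 and hellhello1", because the second span still points at the old positions.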

Investigating the behavior of the macro in Listing 32 is more than an academic exercise. I use a similar macro regularly in my own writing. For various reasons, the numbering used during the creation of this book was done manually rather than with the built-in numbering capabilities of OOo; in other words, I should have read Chapter 5 of Taming OpenOffice.org Writer 1.1 by Jean Hollis Weber. Manually inserted numbering is a problem when a table, figure, or code listing is deleted, inserted, or moved. If I delete the first table in my document, all of my tables are numbered incorrectly. I created a macro that verifies that the items are sequentially numbered starting with 1. Consider tables, for example. The table captions use the paragraph style "_table caption", and tables are numbered using the form "Table 1". The macro in Listing 33 verifies that the captions are sequentially numbered, and it renumbers them if required. When the text "Table #" is found in a paragraph that uses the "_table caption" paragraph style, it is assumed to identify a table, and the macro renumbers these based on their order of occurrence in the document. Each occurrence of "Table #" that is not in the "_table caption" paragraph style is assumed to be a reference to a table.

Listing 33: Reorder_Captions is found in the Writer module in this chapter's source code files as SC13.sxw.

Sub Reorder_Captions(Optional oDoc)
  REM If the first argument is missing, then use the current document.
  If IsMissing(oDoc) Then oDoc = ThisComponent
  Dim sFront As String      'Figure, Listing, or Table
  Dim sStyleType As String  'Style to search.
  Dim oSearchDescr          'Search descriptor used for searching
  Dim oAllFound             'All found occurrences
  Dim oFound                'One single occurrence
  Dim v()                   'Contains the caption numbers
  Dim x()                   'Work array for splitting words
  Dim n As Integer          'General index variable
  Dim i As Integer          'General index variable
  Dim FirstOutOfSync As Integer
  REM First, determine the type to search.
  sFront = InputBox(_
    "Reorder which figure type (Figure, Listing, Table, etc...):",_
    "Reorder Type",_
    "Listing")
  REM InputBox returns an empty string if the user cancels.
  sFront = Trim(sFront)
  If sFront = "" Then Exit Sub
  REM Set styles to "_listing caption", "_figure caption", "_table caption"
  sStyleType = "_" & LCase(sFront) & " caption"
  REM Search the document using a regular expression.
  REM [:digit:] searches for a digit. Placing the "*" after digit
  REM says to look for zero or more occurrences.
  REM This finds things such as "Table 5" and "Table 11"
  oSearchDescr = oDoc.createSearchDescriptor()
  With oSearchDescr
    .SearchString = sFront & " [:digit:][:digit:]*"
    .SearchRegularExpression = True
  End With
  oAllFound = oDoc.findAll(oSearchDescr)
  If oAllFound.getCount() > 0 Then                 'Require at least one
    REM The array purposely starts at 1 rather than 0, so the
    REM first caption is stored at location 1.
    ReDim v(1 To oAllFound.getCount())             'Set to correct dimension
    For n = 0 To oAllFound.getCount()-1            'Look at each occurrence
      oFound = oAllFound.getByIndex(n)             'Obtain the occurrence
      If (oFound.ParaStyleName = sStyleType) Then  'Is this the correct style?
        REM Because the style is the correct type, this is a caption and
        REM not merely a reference to a caption. The found text is of the
        REM form: "Table 11"; no extra text is included.
        REM Split this text into words so that the number is the second word.
        REM i is tracking the number of captions found. If i is 1, then
        REM the caption should also be 1.
        i = i + 1                                  'Found another
        x() = Split(oFound.String)                 'Split at the space
        v(i) = CInt(x(1))                          'Second word is the number
        REM Check if the captions match. The most interesting
        REM caption is the first one that does not match.
        If FirstOutOfSync = 0 AND v(i) <> i Then FirstOutOfSync = i
      End If
    Next
    REM Change the dimension of the array yet again, but this time
    REM make it smaller to include only the used elements.
    If i > 0 Then ReDim Preserve v(1 To i)
  End If
  If FirstOutOfSync = 0 Then
    Print "Nothing is out of sync"
    Exit Sub
  End If
  Dim OldFigureNum%
  Dim NewFigureNum%
  Dim s$
  REM Look at all found occurrences, this time putting the instances
  REM back in sync. There are difficulties if the same caption number is
  REM used twice. Out of order, however, is not a problem.
  For n = 0 To oAllFound.getCount()-1
    oFound = oAllFound.getByIndex(n)
    REM Split found text again. Remember that the second word is the number.
    REM If the caption number is before the first one that is out of sync
    REM then nothing needs to be done.
    x() = Split(oFound.String)
    OldFigureNum% = CInt(x(1))
    If OldFigureNum% >= FirstOutOfSync Then
      REM This one is probably out of sync. The index of the array
      REM indicates what the caption should be. Find out where this
      REM particular number exists in the array.
      NewFigureNum% = IndexInArray(v(), OldFigureNum%)
      REM IndexInArray returns -1 if no caption uses this number;
      REM leave such references alone.
      If NewFigureNum% > 0 AND NewFigureNum% <> OldFigureNum% Then
        REM Track what was done so that it can be displayed to the user.
        s$ = s$ & "Change " & OldFigureNum% & " To " & NewFigureNum% & CHR$(10)
        REM This may change the length of the string.
        REM Hopefully it will not cause mayhem and destruction.
        oFound.String = sFront & " " & CStr(NewFigureNum%)
      End If
    End If
  Next
  MsgBox s$, 0, "Changes"
End Sub

Function IndexInArray(v(), x) As Integer
  IndexInArray = -1
  Dim i%
  For i% = LBound(v()) To UBound(v())
    If v(i%) = x Then
      IndexInArray = i%
      Exit For
    End If
  Next
End Function
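The core of Listing 33 is a mapping from each old caption number to that caption's position in the document. The Python sketch below isolates that logic. It assumes that the occurrences have already been found and classified as captions or references in document order, and `renumber` is a hypothetical helper. Like IndexInArray(), it uses the first caption with a given number, so duplicated caption numbers remain ambiguous. References to numbers that no caption uses are left unchanged.

```python
def renumber(occurrences):
    """occurrences: list of (number, is_caption) pairs in document order.

    Returns the corrected number for every occurrence.
    """
    captions = [number for number, is_caption in occurrences if is_caption]
    # Map each old caption number to its 1-based position; keep the first
    # position for duplicates, as IndexInArray() does.
    new_number = {}
    for position, old in enumerate(captions, start=1):
        new_number.setdefault(old, position)
    # Captions and references are both rewritten through the same mapping.
    return [new_number.get(number, number) for number, _ in occurrences]
```

For example, if Table 3 was deleted, the captions read 1, 2, 4. The caption "Table 4" and every reference to it then become 3, while a reference to a nonexistent Table 9 stays as it is.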

Searching and Replacing

You can perform simple searching and replacing by searching and manually replacing each found occurrence with the replacement text. OOo also defines the XReplaceable interface, which adds the ability to replace all occurrences using one method. You must use an XReplaceDescriptor rather than an XSearchDescriptor, however. Replacing all occurrences of text is very simple (see Listing 34).

Listing 34: Replace "hello you" with "hello me".

oDescriptor = oDoc.createReplaceDescriptor()
With oDescriptor
  .SearchString = "hello you"
  .ReplaceString = "hello me"
End With
oDoc.replaceAll(oDescriptor)

Note	The XReplaceable interface is derived from the XSearchable interface, and the XReplaceDescriptor is derived from the XSearchDescriptor interface.
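The relationship described in the note maps naturally onto class inheritance: a replace descriptor is a search descriptor with a replacement string added. The Python sketch below is a model, not the UNO API, and its class and function names are hypothetical. It returns the number of replacements together with the new text. This mirrors replaceAll() in the XReplaceable interface, which returns the number of occurrences replaced.

```python
import re
from dataclasses import dataclass


@dataclass
class SearchDescriptor:
    """Hypothetical model of the search-related properties."""
    search_string: str
    regular_expression: bool = False


@dataclass
class ReplaceDescriptor(SearchDescriptor):
    """A replace descriptor adds a replacement string to a search descriptor."""
    replace_string: str = ""


def replace_all(text, desc):
    """Replace every match; return (new_text, number_of_replacements)."""
    if desc.regular_expression:
        pattern = desc.search_string
    else:
        pattern = re.escape(desc.search_string)
    # A function as the replacement inserts replace_string literally.
    return re.subn(pattern, lambda match: desc.replace_string, text)
```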

Advanced Search and Replace

While using the OOo GUI for searching and replacing, it is possible to search for and replace attributes as well as text. An inspection of the search-and-replace descriptors reveals the object methods setSearchAttributes() and setReplaceAttributes(). I discovered how to use these object methods when I found some code written by Alex Savitsky, whom I do not know, and Laurent Godard, whom I do.

The macro in Listing 35 finds all text that is in bold type, converts the text to regular type, and then surrounds the text with two sets of curly brackets. The conversion of attributes to text tags is frequently done while converting formatted text to regular ASCII text with no special formatting capabilities. While reading Listing 35, look for the following interesting techniques:

To search for all text that is bold regardless of the content, you must use a regular expression. In OOo, the period matches any single character and the asterisk means "find zero or more occurrences of the previous character." Placed together, the regular expression ".*" matches any text. Regular expressions are required to find "any text" that is bold.
When searching with regular expressions, an ampersand in the replacement text is replaced by the found text. In Listing 35, the replacement text "{{ & }}" causes the found text "hello" to become "{{ hello }}".
Text that is set to bold using an applied style is found only while searching character attributes if SearchStyles is set to True. If the SearchStyles attribute is set to False, only text that has been directly set to bold will be found.
To search for text with specific attributes, create an array of structures of type PropertyValue. There should be one entry in the array for each attribute you want to search. Set the property name to the name of the attribute to search, and the property value to the value for which to search. Although this is complicated to describe using words, it is clearly shown in Listing 35.
You can set attribute values by specifying the replacement attributes in the same way that you set the search attributes.

Listing 35: ReplaceFormatting is found in the Writer module in this chapter's source code files as SC13.sxw.

Sub ReplaceFormatting
  REM original code : Alex Savitsky
  REM modified by : Laurent Godard
  REM modified by : Andrew Pitonyak
  REM The purpose of this macro is to surround all BOLD elements with {{ }}
  REM and change the Bold attribute to NORMAL by using a regular expression.
  Dim oReplace
  Dim SrchAttributes(0) as new com.sun.star.beans.PropertyValue
  Dim ReplAttributes(0) as new com.sun.star.beans.PropertyValue
  oReplace = ThisComponent.createReplaceDescriptor()
  oReplace.SearchString = ".*"            'Regular expression. Match any text
  oReplace.ReplaceString = "{{ & }}"      'Note the & places the found text back
  oReplace.SearchRegularExpression = True 'Use regular expressions
  oReplace.searchStyles = True            'We want to search styles
  oReplace.searchAll = True               'Do the entire document
  REM This is the attribute to find
  SrchAttributes(0).Name = "CharWeight"
  SrchAttributes(0).Value = com.sun.star.awt.FontWeight.BOLD
  REM This is the attribute to replace it with
  ReplAttributes(0).Name = "CharWeight"
  ReplAttributes(0).Value = com.sun.star.awt.FontWeight.NORMAL
  REM Set the attributes in the replace descriptor
  oReplace.SetSearchAttributes(SrchAttributes())
  oReplace.SetReplaceAttributes(ReplAttributes())
  REM Now do the work!
  ThisComponent.replaceAll(oReplace)
End Sub
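Attribute-based replacement is easier to picture with a toy model that represents formatted text as a list of (text, bold) runs. The Python sketch below imitates the effect of Listing 35 on that model. Each bold run is wrapped using a template in which "&" stands for the found text, as in OOo's replacement strings, and the run is then made non-bold. The run model and the `replace_bold_runs` function are hypothetical simplifications; Writer's real matching of ".*" against attributed text is more involved.

```python
def replace_bold_runs(runs, template="{{ & }}"):
    """runs: list of (text, is_bold) pairs. Returns a new list of runs."""
    result = []
    for text, bold in runs:
        if bold:
            # "&" in the template is replaced by the found (bold) text,
            # and the replacement attribute sets the run back to normal.
            result.append((template.replace("&", text), False))
        else:
            result.append((text, bold))
    return result
```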

Note	Regular expressions don't work in OOo prior to version 1.1.

Table 16 lists the regular expressions that are supported as of OOo version 1.1.1.

Table 16: Supported regular expression characters.
Character	Description
.	A period represents any single character. The search term "sh.rt" finds both "shirt" and "short".
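*	An asterisk represents zero or more occurrences of the preceding character. The search term "sh*rt" finds "srt", "shrt", and "shhrt", but not "shirt". Combine it with the period to match any text: "sh.*rt" finds "shrt", "shirt", "short", and "shioibaldawpclasdfa asdf asdfrt", to name a few things that it can find.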
^	A caret represents the beginning of a paragraph. The search term "^Bob" only finds the word "Bob" if it is at the beginning of a paragraph. The search term "^." finds the first character in a paragraph.
$	A dollar sign represents the end of a paragraph. The search term "Bob$" only finds the word "Bob" if it is at the end of a paragraph.
^$	Search for an empty paragraph. This is listed here only because it is used so frequently.
+	A plus sign indicates that the preceding character must appear at least once. The plus sign also works with the wildcard character ".". For example, "t.+s" finds a section of text that starts with a "t" and ends with an "s". The longest possible text within the paragraph is always found. In other words, multiple words may be found, but the found text always resides in the same paragraph.
?	A question mark marks the previous character as optional, so it matches zero or one occurrence of that character. For example, "birds?" finds both "bird" and "birds".
\n	The text "\n" has two uses. When searching, this finds a hard row break inserted with Shift+Enter. In the replace field, this represents a paragraph break. You can, therefore, replace all hard breaks with a paragraph break.
\t	The text "\t" is used to find a tab. In the replace field, this adds a tab.
\>	Using the text "\>" indicates that the preceding text must end a word. For example, "book\>" finds "checkbook" but not "bookmark".
\<	Using the text "\<" indicates that the following text must begin a word. For example, "\<book" finds "bookmark" but not "checkbook".
\xXXXX	A backslash followed by a lowercase x followed by a four-digit hexadecimal number (XXXX) finds the character whose Unicode code point is that hexadecimal number.
\	The backslash character followed by anything other than "n", "t", ">", "<", or "x" matches the character that follows literally. For example, "\M" finds "M". The primary purpose is to allow special wildcard characters to be found. For example, assume that I want to find any character preceded by a "+". The "+" is a special character, so I need to precede it with a "\". Use ".\+" to find any character preceding a "+" character.
&	The ampersand is used in the replace text to add the found characters. In Listing 35, the ampersand is used to surround all bold text with "{{" and "}}".
[abc123]	Match any character that is between square brackets. For example, "t[ex]+t" finds "text", "teet", and "txeeet"; to name a few examples of what it finds.
[a-e]	The minus sign is used to define a range when used inside of square brackets. For example, "[a-e]" matches characters between "a" and "e" and "[a-ex-z]" matches characters between "a" and "e" or "x" and "z".
[^a-e]	Placing a caret symbol inside square brackets will find anything but the specified characters. For example, "[^a-e]" finds any character that is not between "a" and "e".
|	Placing a vertical bar between two search strings matches either the text before the bar or the text after it. For example, "bob|jean" matches the string "bob" and also matches the string "jean".
{2}	Placing a number between curly brackets finds that many occurrences of the previous character. For example, "me{2}t" matches "meet", and "[0-9]{3}" matches any three-digit number. Note that "[0-9]{3}" will also find the first three digits of a number with more than three digits, unless "find whole words" is also specified.
{1,2}	Placing two numbers separated by a comma between curly brackets finds the preceding character a variable number of times. For example, "[0-9]{1,4}" finds any number that contains between one and four digits.
( )	Text placed within parentheses is treated as a reference. The text "\1" finds the first reference, "\2" finds the second reference, and so on. For example, "([0-9]{3})-[0-9]{2}-\1" finds "123-45-123" but not "123-45-678". Parentheses can also be used for grouping. For example, "(he|she|me)$" finds any paragraph that ends with "he", "she", or "me".
[:digit:]	Finds a single digit. Combine it with a quantifier to find longer numbers; for example, "[:digit:]+" finds any number with one or more digits.
[:space:]	Finds any white space, such as spaces and tabs.
[:print:]	Finds any printable characters.
[:cntrl:]	Finds any non-printing characters.
[:alnum:]	Finds any alphanumeric characters (numbers and text characters).
[:alpha:]	Finds any alphabetic characters, both uppercase and lowercase.
[:lower:]	Finds any lowercase characters if "Match case" is selected in the Options area.
[:upper:]	Finds any uppercase characters if "Match case" is selected in the Options area.
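Many of the constructs in Table 16 have close equivalents in Python's `re` module, which can be handy for experimenting with a pattern outside of OOo. The translations below are assumptions to verify against your OOo version, not a description of OOo's engine. OOo's "\<" and "\>" both become Python's "\b". "[:digit:]" becomes "[0-9]", because Python's `re` does not support POSIX character classes. The "&" in replacement text becomes "\g<0>". Finally, "^" and "$" need `re.MULTILINE` to apply per line, which is the closest analogue to OOo's per-paragraph behavior.

```python
import re

# "." matches any single character.
dot = re.findall(r"sh.rt", "shirt short shrt")
# "?" makes the previous character optional.
optional = re.findall(r"birds?", "bird birds")
# OOo "\>" (word end) and "\<" (word start) both become \b in Python.
word_end = [m.start() for m in re.finditer(r"book\b", "checkbook bookmark")]
word_start = [m.start() for m in re.finditer(r"\bbook", "checkbook bookmark")]
# Square brackets with "+": one or more of the listed characters.
brackets = re.findall(r"t[ex]+t", "text teet txeeet tt")
# "{2}": exactly two occurrences of the previous character.
exact = re.findall(r"me{2}t", "met meet")
# OOo "[:digit:][:digit:]*" (as used in Listing 33) becomes [0-9][0-9]*.
numbers = re.findall(r"Table [0-9][0-9]*", "Table 5 and Table 11")
# A back reference: \1 repeats whatever the first group matched.
backref = [m.group(0) for m in re.finditer(r"([0-9]{3})-[0-9]{2}-\1",
                                           "123-45-123 123-45-678")]
# Alternation plus "$"; re.MULTILINE makes "$" match at each line end.
alternation = re.findall(r"(?:he|she|me)$", "He said it was me\nnot she\nthey",
                         re.MULTILINE)
# OOo's "&" in the replacement text is \g<0> in Python.
wrapped = re.sub(r"hello", r"{{ \g<0> }}", "say hello")

if __name__ == "__main__":
    print(dot, optional, word_end, word_start, brackets)
    print(exact, numbers, backref, alternation, wrapped)
```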