The search process is directed by a search descriptor, which is able to search only the object that created it. In other words, you cannot use the same search descriptor to search multiple documents. The search descriptor specifies the search text and how the text is searched (see Table 14). The search descriptor is the most complicated component of searching.
Property | Description |
---|---|
SearchBackwards | If True, search the document backwards. |
SearchCaseSensitive | If True, the case of the letters affects the search. |
SearchWords | If True, only complete words are found. |
SearchRegularExpression | If True, the search string is treated as a regular expression. |
SearchStyles | If True, text is found based on applied style names, not on the text content. |
SearchSimilarity | If True, a "similarity search" is performed. |
SearchSimilarityRelax | If True, the properties SearchSimilarityRemove, SearchSimilarityAdd, and SearchSimilarityExchange are all used together. |
SearchSimilarityRemove | Short Integer specifying how many characters may be ignored in a match. |
SearchSimilarityAdd | Short Integer specifying how many characters may be added in a match. |
SearchSimilarityExchange | Short Integer specifying how many characters may be replaced in a match. |
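As a minimal sketch of how the similarity properties in the table might be combined, the following macro enables a similarity search for "hello" and reports how many ranges were found. The macro name and the chosen limits of one character each are illustrative assumptions; exactly which near matches are returned depends on the OOo version, so treat the result as something to verify experimentally.

```basic
Sub SimilaritySearchSketch
  REM Hypothetical example: the name and the limits below are assumptions.
  Dim oDescriptor   'The search descriptor
  Dim oFoundAll     'All found ranges

  oDescriptor = ThisComponent.createSearchDescriptor()
  With oDescriptor
    .SearchString = "hello"
    .SearchSimilarity = True          'Turn on the similarity search
    .SearchSimilarityRelax = True     'Combine the three limits below
    .SearchSimilarityRemove = 1       'Up to one character may be ignored
    .SearchSimilarityAdd = 1          'Up to one character may be added
    .SearchSimilarityExchange = 1     'Up to one character may be replaced
  End With

  oFoundAll = ThisComponent.findAll(oDescriptor)
  Print "Similar matches found: " & oFoundAll.getCount()
End Sub
```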
Although not included in Table 14, a search descriptor supports the string property SearchString, which represents the string to search. The XSearchDescriptor interface defines the methods getSearchString() and setSearchString() to get and set the property if you prefer to use a method rather than directly setting the property. The XSearchable interface defines the methods used for searching and creating the search descriptor (see Table 15).
Method | Description |
---|---|
createSearchDescriptor() | Create a new SearchDescriptor. |
findAll(XSearchDescriptor) | Return an XIndexAccess containing all occurrences. |
findFirst(XSearchDescriptor) | Starting from the beginning of the searchable object, return a text range containing the first found text. |
findNext(XTextRange, XSearchDescriptor) | Starting from the provided text range, return a text range containing the first found text. |
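The two ways of setting the search string described above can be sketched as follows. The macro name is hypothetical; the method calls are the ones named in the text and in the table of XSearchable methods.

```basic
Sub SearchStringAccessSketch
  REM Hypothetical example showing method access and property access.
  Dim oDescriptor   'The search descriptor
  Dim oFoundAll     'All found ranges

  oDescriptor = ThisComponent.createSearchDescriptor()

  REM Set the search text using the XSearchDescriptor method...
  oDescriptor.setSearchString("hello")
  REM ...and read it back, first with the method, then as a property.
  Print oDescriptor.getSearchString()
  Print oDescriptor.SearchString

  REM findAll returns an indexed container of text ranges.
  oFoundAll = ThisComponent.findAll(oDescriptor)
  Print "Occurrences found: " & oFoundAll.getCount()
End Sub
```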
The macro in Listing 30 is very simple; it sets the CharWeight character property of all occurrences of the text "hello" to com.sun.star.awt.FontWeight.BOLD. This works because text ranges support character and paragraph properties.
    Sub SimpleSearchExample
      Dim oDescriptor  'The search descriptor
      Dim oFound       'The found range

      oDescriptor = ThisComponent.createSearchDescriptor()
      With oDescriptor
        .SearchString = "hello"
        .SearchWords = True           'The attributes default to False
        .SearchCaseSensitive = False  'So setting one to False is redundant
      End With

      ' Find the first one
      oFound = ThisComponent.findFirst(oDescriptor)
      Do While Not IsNull(oFound)
        Print oFound.getString()
        oFound.CharWeight = com.sun.star.awt.FontWeight.BOLD
        oFound = ThisComponent.findNext(oFound.End, oDescriptor)
      Loop
    End Sub
The trick to searching a specified range of text is to notice that you can use any text range, including a text cursor, in the findNext routine. After each call to findNext(), check the end points of the find to see if the search went too far. You may, therefore, constrain a search to any text range. The primary purpose of the findFirst method is to obtain the initial text range for the findNext routine. You can use the selected text framework very easily to search a range of text (see Listing 31).
    Sub SearchSelectedWorker(oLCursor, oRCursor, oText, oDescriptor)
      If oText.compareRegionEnds(oLCursor, oRCursor) <= 0 Then Exit Sub
      oLCursor.goRight(0, False)
      Dim oFound
      REM There is no reason to perform a findFirst.
      oFound = ThisComponent.findNext(oLCursor, oDescriptor)
      Do While Not IsNull(oFound)
        REM See if we searched past the end
        If -1 = oText.compareRegionEnds(oFound, oRCursor) Then Exit Do
        Print oFound.getString()
        oFound = ThisComponent.findNext(oFound.End, oDescriptor)
      Loop
    End Sub
The text object cannot compare two regions unless they both belong to that text object. Text that resides in a different frame, section, or even a text table, uses a different text object than the main document text object. As an exercise, investigate what happens if the found text is in a different text object than the text object that contains oRCursor in Listing 31. Is the code in Listing 31 robust?
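One possible way to approach the exercise is to compare regions only when the found range belongs to the same text object as the cursors. The following is a sketch under that assumption, not a definitive fix: the routine name is hypothetical, and skipping matches that live in other text objects (rather than stopping or reporting them) is a design choice you may want to change.

```basic
Sub SearchSelectedWorkerGuarded(oLCursor, oRCursor, oText, oDescriptor)
  REM Hypothetical variant of the worker that guards the comparison.
  Dim oFound

  If oText.compareRegionEnds(oLCursor, oRCursor) <= 0 Then Exit Sub
  oLCursor.goRight(0, False)

  oFound = ThisComponent.findNext(oLCursor, oDescriptor)
  Do While Not IsNull(oFound)
    REM Compare only if the found range is in the same text object.
    If EqualUnoObjects(oFound.getText(), oText) Then
      REM See if we searched past the end
      If -1 = oText.compareRegionEnds(oFound, oRCursor) Then Exit Do
      Print oFound.getString()
    End If
    REM Matches in frames, sections, or table cells are skipped here.
    oFound = ThisComponent.findNext(oFound.End, oDescriptor)
  Loop
End Sub
```

EqualUnoObjects is the Basic runtime function for testing whether two variables refer to the same UNO object, and getText() returns the text object that contains a text range.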
It is significantly faster to search for all occurrences of the text at one time using the findAll() object method than to repeatedly call findNext(). Care must be used, however, when using all occurrences of the specified text. The macro in Listing 32 is an extreme example of code gone bad on purpose.
    Sub SimpleSearchAllExample
      Dim oDescriptor  'The search descriptor
      Dim oFound       'The found range
      Dim oFoundAll    'All found ranges
      Dim n%           'General index variable

      oDescriptor = ThisComponent.createSearchDescriptor()
      oDescriptor.SearchString = "helloxyzzy"
      oFoundAll = ThisComponent.findAll(oDescriptor)
      For n% = 0 To oFoundAll.getCount()-1
        oFound = oFoundAll.getByIndex(n%)
        Print oFound.getString()
        oFound.setString("hello" & n%)
      Next
    End Sub
The macro in Listing 32 obtains a list of text ranges that encapsulate the text "helloxyzzy". This text is then replaced with a shorter piece of text. In a perfect world, the occurrences of "helloxyzzy" would be replaced with "hello0", "hello1", "hello2", and so on. When each instance is replaced with the shorter text, the total length of the document changes. Remember that the text-range objects were all obtained before the first instance of text is modified. Although the text-range interface is clearly defined, the internal workings are not. I purposely created this example because I expected it to fail, and with no defined behavior my only option was to create a test. Experimentally, I observed that if multiple occurrences of "helloxyzzy" are contained in the same word, poor behavior results. I also observed that if all occurrences of "helloxyzzy" are contained in separate words, everything works great, and only occurrences of "helloxyzzy" are changed, leaving the surrounding text intact. I can but nod my head in approval at the brilliance of the programmers that allow this behavior, while remaining cautiously paranoid, expecting that code relying on this behavior will fail in the future.
Note | After observing the behavior produced by the macro in Listing 32, I expect similar behavior from text cursors as well. |
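A common defensive habit when modifying many ranges obtained in advance is to process them from the last to the first, so that each change happens after the ranges that have not yet been touched. The sketch below applies that idea to the same search; the macro name is hypothetical, and whether reverse order actually avoids the poor behavior described above is an untested assumption, because the internal workings of the text ranges are not defined.

```basic
Sub ReplaceAllFoundInReverse
  REM Hypothetical variant: walk the found ranges from last to first.
  Dim oDescriptor  'The search descriptor
  Dim oFoundAll    'All found ranges
  Dim oFound       'One found range
  Dim n As Long    'General index variable

  oDescriptor = ThisComponent.createSearchDescriptor()
  oDescriptor.SearchString = "helloxyzzy"
  oFoundAll = ThisComponent.findAll(oDescriptor)

  REM Step -1 visits the highest index first.
  For n = oFoundAll.getCount() - 1 To 0 Step -1
    oFound = oFoundAll.getByIndex(n)
    oFound.setString("hello" & n)
  Next
End Sub
```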
Investigating the behavior of the macro in Listing 32 is more than simple academics. I use a similar macro regularly in my own writing. For various reasons, the numbering used during the creation of this book was done manually rather than using the built-in numbering capabilities of OOo; in other words, I should have read Chapter 5 of Taming OpenOffice.org Writer 1.1 by Jean Hollis Weber. Manually inserting numbering is a problem when a table, figure, or code listing is deleted, inserted, or moved. For example, if I delete the first table in my document, all of my tables will be numbered incorrectly. I created a macro that verifies that the items are sequentially numbered starting with 1. Consider tables, for example. The table captions use the paragraph style "_table caption". Tables are numbered using the form "Table 1". The macro in Listing 33 verifies that the captions are sequentially numbered, and it renumbers them if required. When the text "Table #" is found in a paragraph using the "_table caption" paragraph style, it is assumed to be identifying a table. The macro in Listing 33 renumbers these based on their occurrence in the document. Each occurrence of "Table #" that is not in the "_table caption" paragraph style is assumed to be a reference to a table.
    Sub Reorder_Captions(Optional oDoc)
      REM If the first argument is missing, then use the current document.
      If IsMissing(oDoc) Then oDoc = ThisComponent

      Dim sFront As String      'Figure, Listing, or Table
      Dim sStyleType As String  'Style to search.
      Dim oSearchDescr          'Search descriptor used for searching
      Dim oAllFound             'All found occurrences
      Dim oFound                'One single occurrence
      Dim v()                   'Contains the caption numbers
      Dim x()                   'Work array for splitting words
      Dim n As Integer          'General index variable
      Dim i As Integer          'General index variable
      Dim FirstOutOfSync As Integer

      REM First, determine the type to search.
      sFront = InputBox( _
        "Reorder which figure type (Figure, Listing, Table, etc...):", _
        "Reorder Type", _
        "Listing")
      REM InputBox returns an empty string if the dialog is canceled.
      If Trim(sFront) = "" Then Exit Sub

      REM Set styles to "_listing caption", "_figure caption", "_table caption"
      sFront = Trim(sFront)
      sStyleType = "_" & LCase(sFront) & " caption"

      REM Search the document using a regular expression.
      REM [:digit:] searches for a digit. Placing the "*" after digit
      REM says to look for zero or more occurrences.
      REM This finds things such as "Table 5" and "Table 11"
      oSearchDescr = oDoc.createSearchDescriptor()
      With oSearchDescr
        .SearchString = sFront & " [:digit:][:digit:]*"
        .SearchRegularExpression = True
      End With
      oAllFound = oDoc.findAll(oSearchDescr)

      If oAllFound.getCount() > 0 Then          'Require at least one
        REM Note that when this is dimensioned that the dimension purposely
        REM starts with 1 rather than 0. The first caption is therefore
        REM stored at location 1.
        ReDim v(1 To oAllFound.getCount())      'Set to correct dimension
        For n = 0 To oAllFound.getCount()-1     'Look at each occurrence
          oFound = oAllFound.getByIndex(n)      'Obtain the occurrence
          If (oFound.ParaStyleName = sStyleType) Then 'Is this the correct style?
            REM Because the style is the correct type, this is a caption and
            REM not merely a reference to a caption. The found text is of the
            REM form: "Table 11"; no extra text is included.
            REM Split this text into words so that 11 is the second word.
            REM i is tracking the number of captions found. If i is 1, then
            REM the caption should also be 1.
            i = i + 1                           'Found another
            x() = Split(oFound.String)          'Split at the space
            v(i) = CInt(x(1))                   'Second word is the number
            REM Check if the captions match. The most interesting
            REM caption is the first one that does not match.
            If FirstOutOfSync = 0 AND v(i) <> i Then FirstOutOfSync = i
          End If
        Next
        REM Change the dimension of the array yet again, but this time
        REM make it smaller to include only the used elements.
        ReDim Preserve v(1 To i)
      End If

      If FirstOutOfSync = 0 Then
        Print "Nothing is out of sync"
        Exit Sub
      End If

      Dim OldFigureNum%
      Dim NewFigureNum%
      Dim s$
      REM Look at all found occurrences, this time putting the instances
      REM back in sync. There are difficulties if the same caption number is
      REM used twice. Out of order, however, is not a problem.
      For n = 0 To oAllFound.getCount()-1
        oFound = oAllFound.getByIndex(n)
        REM Split found text again. Remember that the second word is the number.
        REM If the caption number is before the first one that is out of sync
        REM then nothing needs to be done.
        x() = Split(oFound.String)
        OldFigureNum% = CInt(x(1))
        If OldFigureNum% >= FirstOutOfSync Then
          REM This one is probably out of sync. The index of the array
          REM indicates what the caption should be. Find out where this
          REM particular number exists in the array.
          NewFigureNum% = IndexInArray(v(), OldFigureNum%)
          If NewFigureNum% <> OldFigureNum% Then
            REM Track what was done so that it can be displayed to the user.
            s$ = s$ & "Change " & OldFigureNum% & " To " & NewFigureNum% & CHR$(10)
            REM This may change the length of the string.
            REM Hopefully it will not cause mayhem and destruction.
            oFound.String = sFront & " " & CStr(NewFigureNum%)
          End If
        End If
      Next
      MsgBox s$, 0, "Changes"
    End Sub

    Function IndexInArray(v(), x) As Integer
      IndexInArray = -1
      Dim i%
      For i% = LBound(v()) To UBound(v())
        If v(i%) = x Then
          IndexInArray = i%
          Exit For
        End If
      Next
    End Function
You can perform simple searching and replacing by searching and manually replacing each found occurrence with the replacement text. OOo also defines the XReplaceable interface, which adds the ability to replace all occurrences using one method. You must use an XReplaceDescriptor rather than an XSearchDescriptor, however. Replacing all occurrences of text is very simple (see Listing 34).
    oDescriptor = oDoc.createReplaceDescriptor()
    With oDescriptor
      .SearchString = "hello you"
      .ReplaceString = "hello me"
    End With
    oDoc.ReplaceAll(oDescriptor)
Note | The XReplaceable interface is derived from the XSearchable interface, and the XReplaceDescriptor is derived from the XSearchDescriptor interface. |
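The replaceAll() method of XReplaceable returns the number of replacements made, which is handy for reporting what happened. The following sketch wraps the same replacement in a complete macro; the macro name and the message text are illustrative assumptions.

```basic
Sub ReplaceAllWithCount
  REM Hypothetical example: report how many replacements were made.
  Dim oDescriptor        'The replace descriptor
  Dim nReplaced As Long  'Number of replacements reported by replaceAll

  oDescriptor = ThisComponent.createReplaceDescriptor()
  With oDescriptor
    .SearchString = "hello you"
    .ReplaceString = "hello me"
  End With

  REM replaceAll returns the number of occurrences that were replaced.
  nReplaced = ThisComponent.replaceAll(oDescriptor)
  MsgBox "Replaced " & nReplaced & " occurrence(s)", 0, "Replace All"
End Sub
```

Because the replace descriptor is derived from the search descriptor, the search properties such as SearchCaseSensitive and SearchWords can be set on it in the same way.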
While using the OOo GUI for searching and replacing, it is possible to search for and replace attributes as well as text. An inspection of the search-and-replace descriptors reveals the object methods setSearchAttributes() and setReplaceAttributes(). I discovered how to use these object methods when I found some code written by Alex Savitsky, whom I do not know, and Laurent Godard, whom I do.
The macro in Listing 35 finds all text that is in bold type, converts the text to regular type, and then surrounds the text with two sets of curly brackets. The conversion of attributes to text tags is frequently done while converting formatted text to regular ASCII text with no special formatting capabilities. While reading Listing 35, look for the following interesting techniques:
- To search for all text that is bold regardless of the content, you must use a regular expression. In OOo, the period matches any single character and the asterisk means "find zero or more occurrences of the previous character." Placed together, the regular expression ".*" matches any text. Regular expressions are required to find "any text" that is bold.
- While searching regular expressions, the ampersand character is replaced by the found text. In Listing 35, the replacement text "{{ & }}" causes the found text "hello" to become "{{ hello }}".
- Text that is set to bold using an applied style is found only while searching character attributes if SearchStyles is set to True. If the SearchStyles attribute is set to False, only text that has been directly set to bold will be found.
- To search for text with specific attributes, create an array of structures of type PropertyValue. There should be one entry in the array for each attribute you want to search. Set the property name to the name of the attribute to search, and the property value to the value for which to search. Although this is complicated to describe using words, it is clearly shown in Listing 35.
- You can set attribute values by specifying the replacement attributes in the same way that you set the search attributes.
    Sub ReplaceFormatting
      REM original code : Alex Savitsky
      REM modified by   : Laurent Godard
      REM modified by   : Andrew Pitonyak
      REM The purpose of this macro is to surround all BOLD elements with {{ }}
      REM and change the Bold attribute to NORMAL by using a regular expression.
      Dim oReplace
      Dim SrchAttributes(0) as new com.sun.star.beans.PropertyValue
      Dim ReplAttributes(0) as new com.sun.star.beans.PropertyValue

      oReplace = ThisComponent.createReplaceDescriptor()
      oReplace.SearchString = ".*"             'Regular expression. Match any text
      oReplace.ReplaceString = "{{ & }}"       'Note the & places the found text back
      oReplace.SearchRegularExpression = True  'Use regular expressions
      oReplace.searchStyles = True             'We want to search styles
      oReplace.searchAll = True                'Do the entire document

      REM This is the attribute to find
      SrchAttributes(0).Name = "CharWeight"
      SrchAttributes(0).Value = com.sun.star.awt.FontWeight.BOLD

      REM This is the attribute to replace it with
      ReplAttributes(0).Name = "CharWeight"
      ReplAttributes(0).Value = com.sun.star.awt.FontWeight.NORMAL

      REM Set the attributes in the replace descriptor
      oReplace.SetSearchAttributes(SrchAttributes())
      oReplace.SetReplaceAttributes(ReplAttributes())

      REM Now do the work!
      ThisComponent.replaceAll(oReplace)
    End Sub
Note | Regular expressions don't work in OOo prior to version 1.1. |
Table 16 lists the regular expressions that are supported as of OOo version 1.1.1.
Character | Description |
---|---|
. | A period represents any single character. The search term "sh.rt" finds both "shirt" and "short". |
* | An asterisk matches zero or more occurrences of the preceding character. Combined with a period, ".*" matches any number of characters. The search term "sh.*rt" finds "shrt", "shirt", "shiirt", "shioibaldawpclasdfa asdf asdfrt" and "short", to name a few things that it can find. |
^ | A caret represents the beginning of a paragraph. The search term "^Bob" only finds the word "Bob" if it is at the beginning of a paragraph. The search term "^." finds the first character in a paragraph. |
$ | A dollar sign represents the end of a paragraph. The search term "Bob$" only finds the word "Bob" if it is at the end of a paragraph. |
^$ | Search for an empty paragraph. This is listed here only because it is used so frequently. |
+ | A plus sign indicates that the preceding character must appear at least once. The plus sign also works with the wildcard character ".". For example, "t.+s" finds a section of text that starts with a "t" and ends with an "s". The longest possible text within the paragraph is always found. In other words, multiple words may be found, but the found text will always reside in the same paragraph. |
? | A question mark marks the previous character as optional; the character in front of the "?" may appear zero times or one time. For example, "birds?" finds both "bird" and "birds". |
\n | The text "\n" has two uses. When searching, this finds a hard line break inserted with Shift+Enter. In the replace field, this represents a paragraph break. You can, therefore, replace all hard line breaks with a paragraph break. |
\t | The text "\t" is used to find a tab. In the replace field, this adds a tab. |
\> | Using the text "\>" indicates that the preceding text must end a word. For example, "book\>" finds "checkbook" but not "bookmark". |
\< | Using the text "\<" indicates that the following text must begin a word. For example, "\<book" finds "bookmark" but not "checkbook". |
\xXXXX | A backslash followed by a lowercase x followed by a four-digit hexadecimal number (XXXX) finds the character whose Unicode value is the same as the four-digit hexadecimal number. |
\ | The backslash character followed by anything other than "n", "t", ">", "<", or "x" is used to specify the character that follows. For example, "\M" finds "M". The primary purpose is to allow special wild characters to be found. For example, assume that I wanted to find any character followed by a "+". Well, the "+" is a special character so I need to precede it with a "\". Use ".\+" to find any character preceding a "+" character. |
& | The ampersand is used in the replace text to add the found characters. In Listing 35, the ampersand is used to surround all bold text with "{{" and "}}". |
[abc123] | Match any character that is between square brackets. For example, "t[ex]+t" finds "text", "teet", and "txeeet", to name a few examples of what it finds. |
[a-e] | The minus sign is used to define a range when used inside of square brackets. For example, "[a-e]" matches characters between "a" and "e" and "[a-ex-z]" matches characters between "a" and "e" or "x" and "z". |
[^a-e] | Placing a caret symbol inside square brackets will find anything but the specified characters. For example, "[^a-e]" finds any character that is not between "a" and "e". |
\| | Placing a vertical bar between two search strings matches either the string before the bar or the string after it. For example, "bob\|jean" matches the string "bob" and also matches the string "jean". |
{2} | Placing a number between curly brackets finds that many occurrences of the previous character. For example, "me{2}t" matches "meet", and "[0-9]{3}" matches any three-digit number. Note that "[0-9]{3}" will also find the first three digits of a number with more than three digits, unless "find whole words" is also specified. |
{1,2} | Placing two numbers separated by a comma between curly brackets finds the preceding character a variable number of times. For example, "[0-9]{1,4}" finds any number that contains between one and four digits. |
( ) | Text placed within parentheses is treated as a reference. The text "\1" finds the first reference, "\2" finds the second reference, and so on. For example, "([0-9]{3})-[0-9]{2}-\1" finds "123-45-123" but not "123-45-678". Parentheses can also be used for grouping. For example, "(he\|she\|me)$" finds any paragraph that ends with "he", "she", or "me". |
[:digit:] | Finds a single digit. For example, "[:digit:]?" finds an optional single digit and "[:digit:]+" finds any number with one or more digits. |
[:space:] | Finds any white space, such as spaces and tabs. |
[:print:] | Finds any printable characters. |
[:cntrl:] | Finds any non-printing characters. |
[:alnum:] | Finds any alphanumeric characters (numbers and text characters). |
[:alpha:] | Finds any alphabetic characters, both uppercase and lowercase. |
[:lower:] | Finds any lowercase characters if "Match case" is selected in the Options area. |
[:upper:] | Finds any uppercase characters if "Match case" is selected in the Options area. |
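As a small illustration of using an expression from the table in a macro, the following sketch searches the current document for every run of three digits using "[0-9]{3}" and lists what it finds. The macro name and the message text are hypothetical; as the table notes, this expression also matches the first three digits of a longer number.

```basic
Sub FindThreeDigitRuns
  REM Hypothetical example: list every match of the expression "[0-9]{3}".
  Dim oDescriptor  'The search descriptor
  Dim oFoundAll    'All found ranges
  Dim n As Long    'General index variable
  Dim s As String  'Collects the found text for display

  oDescriptor = ThisComponent.createSearchDescriptor()
  With oDescriptor
    .SearchString = "[0-9]{3}"        'Exactly three digits in a row
    .SearchRegularExpression = True   'Treat the string as an expression
  End With

  oFoundAll = ThisComponent.findAll(oDescriptor)
  For n = 0 To oFoundAll.getCount() - 1
    s = s & oFoundAll.getByIndex(n).getString() & Chr$(10)
  Next
  MsgBox s, 0, "Found " & oFoundAll.getCount() & " match(es)"
End Sub
```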