Editing and Managing Documents Through XML


Under each of the headings within this section, you’ll find a different exercise for editing the document parts in the XML package. These exercises include editing text, formatting, and style definitions; replacing a picture; and removing comments from a document.

Note 

The purpose of these exercises is to familiarize you with the structure and rules of a ZIP package and how to work with the XML syntax in the document parts to manage and troubleshoot documents. Some of the specific tasks in these exercises are not tasks that you would likely use the ZIP package for on a daily basis when working with individual documents (such as editing text or deleting comments from a document), because doing this in Word is faster and easier than doing it in the ZIP package for just one file.

However, another benefit of being able to edit documents through the ZIP package is that developers can create automation to batch edit files without ever opening the source program. For example, a developer might create a program to remove comments from all files in a given folder. In that case, doing so through the ZIP package, without having to open the files in Word, greatly simplifies the automation.

Before You Begin Editing a Document Part

A couple of important points warrant noting before you begin editing Office Open XML Format ZIP packages.

  • Most document measurements that appear in a document part for an Excel file will appear as the measurement you enter. For example, a 16-point font appears as the number 16; a one-inch margin appears as the number 1. Though this might seem like stating the obvious, I mention this because point size measurements do not appear as set in the document parts of a PowerPoint or Word ZIP package.

    • In Word, point size measurement for font sizes is doubled in the document part. So, for example, a 12-point font appears as the number 24. All other point size measurements are multiplied by 20 (that is, they use a unit of measure known as a twip, which is 1/20th of a point or 1/1440th of an inch). So, 12-point spacing after a paragraph appears as 240 in the document.xml part (if it’s applied as direct formatting) or in the applicable custom style definition in styles.xml.

    • In PowerPoint, point size measurements are multiplied by 100. So, 12-point spacing after a paragraph or a 12-point font size applied to text would appear as the number 1200.

  • Built-in, default style definitions are not stored in the XML for a Word document. Only the definitions for user-defined styles as well as any built-in styles that have been customized are accessible in styles.xml. Learn more about styles.xml in the Style editing exercise that follows later in this section.
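The measurement rules in the preceding list reduce to a few multiplications. The following is a minimal Python sketch of that arithmetic; the function names are my own illustrations and are not part of any Office API.

```python
# Unit conversions for measurements stored in Office Open XML document parts.
# All function names here are hypothetical helpers, not an Office API.

TWIPS_PER_POINT = 20    # a twip is 1/20th of a point
TWIPS_PER_INCH = 1440   # 72 points per inch * 20 twips per point


def word_font_size(points: float) -> int:
    """Word stores font sizes doubled: 12 pt -> 24."""
    return round(points * 2)


def points_to_twips(points: float) -> int:
    """Word stores other point measurements in twips: 12 pt -> 240."""
    return round(points * TWIPS_PER_POINT)


def inches_to_twips(inches: float) -> int:
    """Word stores page margins in twips: 1 inch -> 1440."""
    return round(inches * TWIPS_PER_INCH)


def powerpoint_size(points: float) -> int:
    """PowerPoint stores point sizes multiplied by 100: 12 pt -> 1200."""
    return round(points * 100)


if __name__ == "__main__":
    print(word_font_size(12), points_to_twips(12),
          inches_to_twips(0.75), powerpoint_size(12))
```

Keeping the conversions in one place makes it harder to mix up the doubled font size with the twip values used for spacing and margins.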

Editing Text and Formatting

You can edit any document content or formatting directly in the ZIP package. In this sample exercise, we’ll walk through editing text, adding text, changing the settings in a paragraph style, and adding direct formatting to specified text.

Note 

To try the exercises in this section, you can either create your own sample document to work with or open one provided on this book’s CD. To create your own sample file, create a new Word document containing one line of text, such as This is my sample text. Then, create a custom paragraph style but don’t apply it to any text. To match the exercise in this section on editing styles, your custom style should include a 12-point Arial font, 12 points spacing after the paragraph, and the Theme Color Accent 1 for the font color. When you’ve finished this setup, save the file using the .docx format in a location where you can easily access it (such as the Windows desktop) and then close the file.

To use the sample file provided, find the file Text editing.docx in the sample files included on this book’s CD.

To edit the ZIP package, change the file extension for Text editing.docx (or your own sample file) from .docx to .zip. (Remember that you can do this by appending the .zip file extension if you like, rather than replacing the .docx extension, to save a bit of time when you’re ready to change the file extension back to .docx.) Once you’ve opened the ZIP package, give the following exercises a try.
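If you would rather script this step than rename the file by hand, the sketch below shows one way to copy a document part out of the package. It relies on the fact that a .docx file is itself a ZIP archive, so Python's standard zipfile module can open it without any change to the file extension. The function names are hypothetical.

```python
import zipfile


def list_parts(package_path: str) -> list[str]:
    """Return the names of all parts stored in the package."""
    with zipfile.ZipFile(package_path) as package:
        return package.namelist()


def read_part(package_path: str, part_name: str) -> bytes:
    """Return the raw bytes of one part, for example 'word/document.xml'."""
    with zipfile.ZipFile(package_path) as package:
        return package.read(part_name)
```

For example, read_part("Text editing.docx", "word/document.xml") would return the same XML that the exercises that follow open in Notepad, assuming the sample file is in the current folder.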

Edit Text and Settings in document.xml

To begin editing any document part, as mentioned at the start of this section, first copy it out of the ZIP package (for example, paste a copy on the Windows desktop). Do this with the document.xml file inside the word folder for your sample document, and then do the following.

  1. Open document.xml in Internet Explorer. If you’re using the sample document provided, the document content following the namespace definitions will look like the image that follows.

     <w:body>
       <w:p>
         <w:r>
           <w:t>This is my sample text.</w:t>
         </w:r>
       </w:p>
       <w:sectPr>
         <w:pgSz w:w="12240" w:h="15840" />
         <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0" />
         <w:cols w:space="720" />
         <w:docGrid w:linePitch="360" />
       </w:sectPr>
     </w:body>

    Notice the construction of the text. There is one paragraph of text, nested within the paired code <w:p></w:p>, followed by some document settings, including paper size and page margins. (If you’re using your own file for this exercise and you see additional codes in document.xml labeled rsidR and rsidPr, see the note at the end of “Reading a Markup Language” on page 1145 for information about those codes and why they appear.)

    Note 

    Most XML coding uses characters or abbreviations that are logical and easy to interpret by anyone who knows the program, such as <w:p> to refer to a Word paragraph, pgSz to refer to the size of the page, or pgMar to refer to page margins. In the comprehensive documentation for Office Open XML that you can learn more about in the last section of this chapter, you can find each and every one of these codes. However, you can see how, just using logic and what you know about the program, it’s very easy to decipher an XML document part without having to memorize the XML language details.

  2. Leave the document open in Internet Explorer, for easy reference, and open it in Notepad as well.

    You can save changes to the document in Notepad even while it’s open in Internet Explorer, and you can then refresh the Internet Explorer page to view your saved changes.

  3. Find the text This is my sample text. in Notepad, and then delete the word sample.

  4. Copy the codes for that entire paragraph, starting with <w:p> and ending with </w:p>. Then, paste what you’ve copied immediately after the existing </w:p> code.

    You’ve just added a second paragraph to your document. You can now change the text that appears between the <w:t> and </w:t> codes that denote the paragraph text. I chose to have that new paragraph read This is fun. I’m editing my Word document without opening Word.

  5. Change the left and right page margins to 0.75 inch each. Remember that you’ll need to calculate the values in twips for document.xml to understand the values you add. Because there are 72 points to an inch, three-quarters of an inch is 54 points. To convert that number to twips so that document.xml understands it, multiply the number by 20. (Or, since a twip is 1/1440th of an inch, multiply 1440 by 0.75.) So, you’ll enter 1080 as the left and the right margin values. Be sure to leave quotation marks and related codes intact when you change the numbers.

  6. Save document.xml and then close Notepad. In Internet Explorer, refresh the page and check your changes.

    Following is what the body portion of document.xml looks like in my file after these changes.

     <w:body>
       <w:p>
         <w:r>
           <w:t>This is my text.</w:t>
         </w:r>
       </w:p>
       <w:p>
         <w:r>
           <w:t>This is fun. I'm editing my Word document without opening Word.</w:t>
         </w:r>
       </w:p>
       <w:sectPr>
         <w:pgSz w:w="12240" w:h="15840" />
         <w:pgMar w:top="1440" w:right="1080" w:bottom="1440" w:left="1080" w:header="720" w:footer="720" w:gutter="0" />
         <w:cols w:space="720" />
         <w:docGrid w:linePitch="360" />
       </w:sectPr>
     </w:body>
     </w:document>

  7. When you’re happy with your changes, copy document.xml back into the ZIP package, overwriting the existing document.xml file.

  8. Open the document in Word.

    Because the next exercise also requires editing this ZIP package, save time by opening the .zip file in Word instead of changing the file extension back and forth. To do this, open Word and then press Ctrl+O for the Open file dialog box. Browse to the location of your ZIP package, change the file type list setting to All Files, and then select and open your ZIP package. As mentioned earlier, it will open like a regular Word document.

If you’re happy with the changes and additions to your text, and the changes to your page margins, continue to the next exercise.
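The same edits can be scripted. The following is a minimal sketch, not a definitive implementation, that uses Python's standard xml.etree.ElementTree module to duplicate the first paragraph (step 4) and set the left and right margins in twips (step 5). The function names are hypothetical, and the sketch assumes a simple document like the sample, with one run of text in the first paragraph. A real document.xml declares more namespaces than the w: namespace registered here, and ElementTree may rename the prefixes it doesn't know about, so test on a copy.

```python
import copy
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # keep the familiar w: prefix on output


def w(tag: str) -> str:
    """Build a namespace-qualified name for a w: element or attribute."""
    return f"{{{W_NS}}}{tag}"


def edit_document(xml_bytes: bytes, new_text: str, margin_inches: float) -> bytes:
    """Add a copy of the first paragraph and change the side margins."""
    root = ET.fromstring(xml_bytes)
    body = root.find(w("body"))
    first_par = body.find(w("p"))

    # Step 4: copy the first paragraph and place the copy right after it.
    new_par = copy.deepcopy(first_par)
    new_par.find(f".//{w('t')}").text = new_text
    body.insert(list(body).index(first_par) + 1, new_par)

    # Step 5: convert inches to twips (1440 per inch) for both side margins.
    twips = str(round(margin_inches * 1440))
    page_margins = body.find(f"{w('sectPr')}/{w('pgMar')}")
    page_margins.set(w("left"), twips)
    page_margins.set(w("right"), twips)

    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

Working on the parsed tree rather than on raw text means the quotation marks and related codes stay intact automatically, which is the part that's easiest to get wrong in Notepad.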

Add Formatting to Text in document.xml

In this exercise, you’ll add some direct formatting to one paragraph in the sample document Text editing.docx and then add direct formatting to just part of the second paragraph. To do this, if you still have the copy of document.xml that you edited in the last exercise, continue using that file. If not, copy document.xml out of the ZIP package again. Then, do the following.

  1. Open document.xml in both Internet Explorer (for reference) and Notepad (for editing).

  2. Add direct formatting of bold and italics to the first paragraph in the document. To do this, in Notepad, take the following steps.

    Place your insertion point immediately before the <w:t> code for the first paragraph of text and then type the following code.

     <w:rPr><w:b /><w:i /></w:rPr>

    As you might recognize from the first code sample in this chapter, you’ve just added bold and italic formatting to the first paragraph in the document.

  3. Save and close the file in Notepad. Then, return to the Internet Explorer window where document.xml is open, and refresh the page to check your changes. The paragraph you edited should look something like this:

     <w:p>
       <w:r>
         <w:rPr>
           <w:b />
           <w:i />
         </w:rPr>
         <w:t>This is my text.</w:t>
       </w:r>
     </w:p>

  4. When you’re happy with your edits, copy document.xml back into the ZIP package, overwriting the version that exists. Then, in Word, press Ctrl+O for the Open file dialog box, and then select and open the ZIP package as you did in the previous exercise.
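The steps above can also be sketched in Python. This example adds the <w:rPr><w:b /><w:i /></w:rPr> codes to every run in one paragraph, using the standard xml.etree.ElementTree module. The function name is hypothetical, and the sketch simply appends the bold and italic codes to any formatting that already exists, without checking the order that the schema expects, so treat it as an illustration for simple files like the sample.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # keep the w: prefix on output


def w(tag: str) -> str:
    """Build a namespace-qualified name for a w: element or attribute."""
    return f"{{{W_NS}}}{tag}"


def add_bold_italic(xml_bytes: bytes, paragraph_index: int = 0) -> bytes:
    """Apply bold and italic direct formatting to one paragraph's runs."""
    root = ET.fromstring(xml_bytes)
    paragraph = root.findall(f".//{w('p')}")[paragraph_index]
    for run in paragraph.findall(w("r")):
        run_props = run.find(w("rPr"))
        if run_props is None:
            # The run properties go first in the run, before the w:t text.
            run_props = ET.Element(w("rPr"))
            run.insert(0, run_props)
        for tag in ("b", "i"):
            if run_props.find(w(tag)) is None:
                ET.SubElement(run_props, w(tag))
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```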

When you confirm that you completed the preceding steps correctly, the next exercise is to add direct formatting (a 14-point font in this example) to just part of the second paragraph in the document. Close the document when you’re ready to continue, and then take the following steps.

  1. Open document.xml in both Internet Explorer and Notepad. (Use the most recent copy of document.xml that you updated for the preceding exercise, if you still have it available. Otherwise, copy the file out of the ZIP package again.)

    If you’re using Text editing.docx or a version of the same document that you created, the second paragraph contains the text This is fun. I’m editing my Word document without opening Word. The steps that follow will add a 14-point font size to just the first sentence in that paragraph.

  2. Since you’ll be adding formatting to just part of the paragraph, you first need to separate the parts of the paragraph that will have different formatting. To do this, take the following substeps.

    First, place your insertion point between the two sentences of the second paragraph. (If you’ve been following along with the preceding exercises, your insertion point will be after the period that follows the text This is fun.) Type the following code between those two sentences, and then type a space (so that a space separates the new codes and the sentence they precede).

     </w:t></w:r><w:r><w:t>

    The first two codes in the preceding structure end the text string and then end the run. The next two codes begin a new run, followed by a new text string. The space you added after the four new codes is the space that will appear between the two sentences of text in the document.

    Inside the new <w:t> code (the last of the four codes you just typed), add a space after the letter t (and before the closing angle bracket), followed by xml:space="preserve". That code should now look like the following.

     <w:t xml:space="preserve">

    This “preserve” statement tells the XML to preserve the space you added at the beginning of the second sentence, so that spacing between the separated sentences is retained. Following is the way the code for this paragraph should look at this point.

     <w:p>
       <w:r>
         <w:t>This is fun.</w:t>
       </w:r>
       <w:r>
         <w:t xml:space="preserve"> I'm editing my Word document without opening Word.</w:t>
       </w:r>
     </w:p>

  3. Now, it’s time to add the 14-point font formatting to just the first sentence in the paragraph. In Notepad, place your insertion point immediately before the <w:t> code that precedes the first sentence in the second paragraph (the one for which you intend to add the formatting). Then type the following.

     <w:rPr><w:sz w:val="28" /></w:rPr>

  4. In Notepad, save and close the document. Then, in Internet Explorer, refresh the page to check your code. Code for the paragraph being edited should now look like this:

     <w:p>
       <w:r>
         <w:rPr>
           <w:sz w:val="28" />
         </w:rPr>
         <w:t>This is fun.</w:t>
       </w:r>
       <w:r>
         <w:t xml:space="preserve"> I'm editing my Word document without opening Word.</w:t>
       </w:r>
     </w:p>

    Notice that you used 28 to represent 14-point font size, as discussed earlier in this section. The paired <w:rPr></w:rPr> codes that you also used in the preceding exercise are the codes inside which you store any unique formatting for the specified text. As with the preceding exercise, you could add additional lines of code to represent other font formatting after <w:sz w:val="28" />, such as the bold or italic codes you used earlier.

    When you’re happy with the code you see in Internet Explorer, copy document.xml back into the ZIP package and once again, from Word, open the ZIP package file to check your results. If your results are successful, close the file and continue on to the next exercise.
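As a rough sketch of the same technique in code, the following Python function splits one run into two at the end of a given sentence, marks the second run's text with xml:space="preserve" so that its leading space survives, and applies a font size (doubled, as Word expects) to the first run only. The function and parameter names are hypothetical, and the sketch assumes the paragraph holds a single run with no existing formatting, as in the sample file.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W_NS)  # keep the w: prefix on output


def w(tag: str) -> str:
    """Build a namespace-qualified name for a w: element or attribute."""
    return f"{{{W_NS}}}{tag}"


def format_first_sentence(xml_bytes: bytes, paragraph_index: int,
                          split_after: str, points: float) -> bytes:
    """Split a single-run paragraph after split_after and size the first part."""
    root = ET.fromstring(xml_bytes)
    paragraph = root.findall(f".//{w('p')}")[paragraph_index]
    run = paragraph.find(w("r"))
    text = run.find(w("t"))
    cut = text.text.index(split_after) + len(split_after)
    head, tail = text.text[:cut], text.text[cut:]

    # Second run: the rest of the text, with its leading space preserved.
    tail_run = ET.Element(w("r"))
    tail_text = ET.SubElement(tail_run, w("t"))
    tail_text.text = tail
    tail_text.set(XML_SPACE, "preserve")
    paragraph.insert(list(paragraph).index(run) + 1, tail_run)

    # First run: keep the first sentence and add the font size in half-points.
    text.text = head
    run_props = ET.Element(w("rPr"))
    ET.SubElement(run_props, w("sz")).set(w("val"), str(round(points * 2)))
    run.insert(0, run_props)

    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

Calling format_first_sentence(xml, 1, "This is fun.", 14) on the sample would write 28 as the size value for the first sentence of the second paragraph.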

Edit Custom Styles in styles.xml

This is the last exercise using the file Text editing.docx. In this exercise, you will edit a custom style that’s saved in this document. To begin, copy the file styles.xml out of the word folder in the ZIP package for the sample file.

Open the file styles.xml in Internet Explorer. This is a long file that contains a list of every built-in style available to the document, other than those that are used by built-in features not used in the active document. The list includes the style visibility and priority settings that correspond to the settings on the Recommend tab of the Manage Styles dialog box. For example, the styles Header and Footer are used by the document header and footer. If you’ve not yet accessed the header and footer layer in the document, those styles won’t appear in this styles.xml list.

However, only definitions of your own custom (user-defined) styles, or any built-in styles that have been customized, appear in styles.xml at all. The definitions of any built-in styles that are not customized do not appear anywhere in the ZIP package, because Word “knows” these settings, so it doesn’t need to record them in the file’s XML. If you see exceptions to this, they are most likely for styles new to Word 2007, because Word records these style definitions for use when the document is opened in an earlier version.

To begin this exercise, scroll to the bottom of styles.xml in Internet Explorer, where you’ll see the custom style named MyStyle, which I created in the sample document. If you’re using a similar sample document instead that you created by following the instructions at the start of this section, look for your own custom style name at the end of styles.xml. The code for the definition of MyStyle looks like this:

     <w:style w:type="paragraph" w:customStyle="1" w:styleId="MyStyle">
       <w:name w:val="MyStyle" />
       <w:basedOn w:val="Normal" />
       <w:qFormat />
       <w:pPr>
         <w:spacing w:after="240" />
       </w:pPr>
       <w:rPr>
         <w:rFonts w:ascii="Arial" w:hAnsi="Arial" />
         <w:color w:val="4F81BD" w:themeColor="accent1" />
         <w:sz w:val="24" />
       </w:rPr>
     </w:style>

Now, open styles.xml in Notepad. The following steps walk you through changing the paragraph spacing and the font included in this custom style.

  1. Scroll to the bottom of the document and find the definition for the custom style that you just reviewed in Internet Explorer.

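  2. Notice that the paragraph spacing is set to 12 points after (which is written as the value 240 in the XML). Change this setting to 6 points after the paragraph by replacing 240 with 120. Then, add 6 points before the paragraph as well, by adding a w:before setting inside the same <w:spacing /> code, immediately before or after the w:after setting. (Add the setting to the existing code rather than adding a second <w:spacing /> code to the style.) The code should then read as follows.

     <w:spacing w:before="120" w:after="120" />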

  3. Now, change the font. Notice that the font is listed twice, once in a code specifying ANSI text and once in a code specifying ASCII text. This is because Word styles can carry a separate font for those languages that don’t use Latin text. To avoid having a separate setting added to your style (unless you deliberately want one), change the font in both of those settings from Arial to Times New Roman. Be sure to leave the quotation marks and all other code syntax intact. (Note that, if you have multiple editing languages enabled, you may see additional font definitions in your documents as well.) Then, save and close the file.

  4. If you’d like to refresh the page in Internet Explorer to check your work, do so. Then, copy styles.xml back into the ZIP package, overwriting the existing version of the same file.

  5. You’re now done editing the ZIP package, so you can change the file extension back to .docx, and then double-click to open the file in Word. Check to confirm that the changes you made to the style look correct, either by checking the style definition in the Styles pane or by applying the style to an existing paragraph.
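The style edit in the steps above can be sketched in Python as well. This example finds a style by the value of its <w:name> code, sets the spacing before and after in twips, and changes both font settings. The function name is hypothetical, and the sketch assumes the style already contains <w:spacing> and <w:rFonts> codes, as the MyStyle sample does.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # keep the w: prefix on output


def w(tag: str) -> str:
    """Build a namespace-qualified name for a w: element or attribute."""
    return f"{{{W_NS}}}{tag}"


def edit_style(xml_bytes: bytes, style_name: str, before_points: float,
               after_points: float, font: str) -> bytes:
    """Change the paragraph spacing and font of one style in styles.xml."""
    root = ET.fromstring(xml_bytes)
    for style in root.findall(w("style")):
        name = style.find(w("name"))
        if name is not None and name.get(w("val")) == style_name:
            # Spacing is stored in twips: 20 per point.
            spacing = style.find(f"{w('pPr')}/{w('spacing')}")
            spacing.set(w("before"), str(round(before_points * 20)))
            spacing.set(w("after"), str(round(after_points * 20)))
            # Change both font settings so that they stay in step.
            fonts = style.find(f"{w('rPr')}/{w('rFonts')}")
            for attribute in ("ascii", "hAnsi"):
                fonts.set(w(attribute), font)
            break
    else:
        raise KeyError(f"style not found: {style_name}")
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

Because both spacing values are set on the one existing <w:spacing> code, the style never ends up with two competing spacing codes.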

Congratulations! Now that you can edit text and formatting for your documents through the ZIP package, try two other exercises that follow for editing document content through the XML. These exercises include programmatically changing a picture in a Word document and removing the comments from a Word document.

Editing Pictures

Note 

For the exercises titled “Editing Pictures” and “Removing a Document Part,” use the files Content edit.docx and Fearless logo.png that you can find in the sample files included on this book’s CD. The second of these is a logo image that you’ll use in the first of the two exercises.

If you prefer to create your own sample file, create a file that uses one picture in two places (such as in the body of the page and in a header) and add two comments to the document, using the Comments feature on the Review tab. For the picture, use a picture saved in the .png file format.

The sample file Content edit.docx is a simple starter document containing a few lines of text on two pages, as well as a placeholder picture for a logo on the front page and in the header of the second page (which is a new section). The file also includes two comments that will be used in the exercise that follows.

For this exercise, you’ll use the ZIP package to replace the placeholder logo file. First, however, open the file Content edit.docx in Word, so that you can take note of the picture (the logo placeholder) that appears in both the second section header and in the body text of the first page. Notice that the picture has different sizing in the two positions and that it has a picture style applied where it appears on the first page of the document. Once you’ve noted this, close the file and then, in Windows Explorer (or on the Windows desktop if that is where you’ve placed this file), change the file extension to .zip.

  1. Open the ZIP package, and then open the word folder. Notice that this folder contains a media folder. The picture that appears in the document is stored in this folder.

  2. Open the media folder. Notice that the picture is actually saved as a picture file. If you’re using the sample files provided, the file name in the media folder is image1.png, as you see in the following image.

    [Image: the media folder in the ZIP package, containing image1.png]

    In fact, any pictures pasted or inserted into your document are saved as complete picture files in the media folder within the ZIP package, which is one of my favorite timesavers for using the ZIP packages in daily document production work. If you need to use a picture from an Office Open XML document in another document or another location, and you don’t have the original picture file, simply copy the complete picture file out of the ZIP package and use or share it as needed.

  3. To replace the placeholder image with the logo image, just rename the file Fearless logo.png to image1.png. Then, copy it into the ZIP package, replacing the existing image1.png in that package.

  4. Open the file in Word.

    The placeholder logo has been replaced with the Fearless logo.png image in both locations, and all picture size, placement, and other formatting remain intact.
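To script the picture swap, keep in mind that Python's zipfile module can't overwrite a file inside an existing archive, so the usual approach is to write a new package and substitute the one part along the way. The following is a minimal sketch under that assumption; the function name is hypothetical, and the destination path must differ from the source path.

```python
import zipfile


def replace_part(src_path: str, dst_path: str,
                 part_name: str, new_data: bytes) -> None:
    """Copy a package to dst_path, swapping in new bytes for one part."""
    found = False
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename == part_name:
                data = new_data   # the replacement keeps the original name
                found = True
            else:
                data = src.read(item.filename)
            dst.writestr(item, data)
    if not found:
        raise KeyError(f"part not found: {part_name}")
```

For the exercise above, the new bytes would be the contents of Fearless logo.png and the part name would be word/media/image1.png. Because the part name doesn't change, no relationships need editing.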

Inside Out-Replace an Image File with a Different File Type 

In the “Editing Pictures” exercise, you replaced one .png file with another just by changing the file name and copying the new picture. But, what if you need to replace a .png picture with a .bmp or a .jpg file, or you don’t want to change the file name?

Changing the file name to match the existing image file allows you to replace the file without having to edit any relationships. However, you can replace the image file with another image of a different name and file type, as long as you edit the relationships and content types in the ZIP package accordingly.

To do this, open the _rels folder located in the word folder. In the case of the sample file, there are .rels files for both the document and the header. Copy both of these out of the ZIP package and then open each in Notepad. Find the reference to the file image1.png and change it to the name of your new image file, such as Northwind.tif. Then, copy the files back into the ZIP package, overwriting the existing files of the same names.

Note that the file Northwind.tif is available in the sample files on this book’s CD, if you’d like to try this for yourself.

Next, copy [Content_Types].xml out of the ZIP package and open that file in Notepad. Notice that there is a Default Extension definition near the top of the file for the .png format. You can either copy this entire definition string to add one for the .tif format, or (if there are no other .png images in your document) replace the two references to png in that string with tif, so that the string looks like this:

 <Default Extension="tif" ContentType="image/tif"/>

Save and close the file and copy it back into the ZIP package, overwriting the existing file of the same name. That’s it. Just change the file extension back to .docx and open the file in Word to see your new image. If you’ve used Northwind.tif, the image will look exactly like Fearless logo.png in the preceding exercise, because these files use the same logo.

You know that the exercise in this sidebar worked, however, because the file wouldn’t open and display the new image if you had not correctly revised the relationships and content type definitions.
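The two XML edits described in this sidebar can be sketched as follows. The first function points any relationship that targets the old image file at the new file name, and the second adds a Default Extension definition if one for that extension isn't already present. The function names are hypothetical, the sketch assumes the standard package namespaces for relationships and content types, and the new image file itself still needs to be copied into the media folder separately.

```python
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def _serialize(root: ET.Element, default_ns: str) -> bytes:
    # Register the part's namespace as the default so that the output
    # carries no prefixes, matching the way these parts are usually written.
    ET.register_namespace("", default_ns)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def retarget_image(rels_xml: bytes, old_name: str, new_name: str) -> bytes:
    """Point relationships that end in old_name at new_name instead."""
    root = ET.fromstring(rels_xml)
    for rel in root.findall(f"{{{REL_NS}}}Relationship"):
        target = rel.get("Target", "")
        if target.split("/")[-1] == old_name:
            folder = target[: len(target) - len(old_name)]
            rel.set("Target", folder + new_name)
    return _serialize(root, REL_NS)


def add_default_type(types_xml: bytes, extension: str,
                     content_type: str) -> bytes:
    """Add a Default Extension definition unless one already exists."""
    root = ET.fromstring(types_xml)
    defaults = root.findall(f"{{{CT_NS}}}Default")
    known = {d.get("Extension", "").lower() for d in defaults}
    if extension.lower() not in known:
        new_default = ET.Element(
            f"{{{CT_NS}}}Default",
            {"Extension": extension, "ContentType": content_type},
        )
        # Place the new definition after the existing Default entries.
        root.insert(len(defaults), new_default)
    return _serialize(root, CT_NS)
```

In the sample file, retarget_image would need to run on both .rels files (the one for the document and the one for the header), because the picture is used in both places.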

Removing a Document Part

In this exercise, you’ll remove all comments from the ZIP package. Doing this requires the following changes.

  • Delete the comments.xml document part that resides in the word folder.

  • Delete the relationship for the comments document part.

  • Remove the comment placeholders from the document.xml file.

Take the following steps to do this.

  1. Open the ZIP package for the sample file used in the preceding image exercise.

  2. In the word folder, select and delete comments.xml.

  3. In the word folder, open the _rels folder and copy the file document.xml.rels out of the ZIP package.

  4. Open document.xml.rels in Notepad and delete the relationship to the comments part. Be sure to delete the entire relationship and nothing else. Though the relationship ID might vary if you’re using a file other than the sample provided (indicated by the pound sign in the ID shown in the sample that follows), the content to delete should look like this.

     <Relationship Id="rId#" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments" Target="comments.xml"/>

  5. Save and close the file. Then, copy the file back into the ZIP package, overwriting the existing file of the same name.

  6. From the word folder, copy the file document.xml out of the ZIP package and then open that file in both Internet Explorer and Notepad.

  7. In Internet Explorer, look for two comment references. Each one should look very much like the following image.

     <w:r>
       <w:rPr>
         <w:rStyle w:val="CommentReference" />
         <w:rFonts w:asciiTheme="minorHAnsi" w:eastAsiaTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:cstheme="minorBidi" />
         <w:color w:val="auto" />
         <w:spacing w:val="0" />
         <w:kern w:val="0" />
       </w:rPr>
       <w:commentReference w:id="0" />
     </w:r>

    Notice that the reference begins with the open code <w:r> (the reference to a run, or a portion of related content) and ends with the matching end code </w:r>. Most of the content between is similar to code you’ve worked with before. Notice that everything except the next-to-last line of this sample is information about formatting for the comment. The next-to-last line of the sample is the comment placeholder itself.

    Just above the second comment reference, you’ll see two lines of code that read <w:commentRangeStart and <w:commentRangeEnd, separated by a few other lines, as you see in the following image.

     <w:commentRangeStart w:id="…" />
     <w:r>
       <w:lastRenderedPageBreak />
       <w:t>My document text will start here.</w:t>
     </w:r>
     <w:commentRangeEnd w:id="…" />

    These appear here because the second comment in this document was inserted with text selected, rather than being inserted at a blinking insertion point.

  8. In Notepad, delete both complete comment references, including all lines of code shown in the first image in the preceding step, as well as the comment range start and end lines of code shown in the second image in the preceding step. Do not delete the lines between the comment range start and end statement, as those are part of the document text. Save and close the file when done, and then refresh the page in Internet Explorer to check your work.

  9. Copy document.xml back into the ZIP package and change the file extension back to .docx. Open the file and you should see no sign of the two comments that previously existed.

Note 

The content types file in a document containing comments includes a content type definition for comments. However, it’s not necessary to delete a content type definition from this file when deleting a document part. As you noticed earlier in this chapter, when we first looked at the ZIP package content of a new, default Word file, several content types are included by default even if they’re not used in the document. It’s essential to remove the relationship for a deleted document part, but content types can remain without causing any problems.

Also note that, if you didn’t delete the comment placeholders from document.xml, the document still would have opened. The comment text would have been gone (you only need to delete the document part and its relationship to accomplish that), but the comment placeholders would remain.
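The three changes in this exercise are the kind of task a developer might automate for a whole folder of files. The following is a minimal sketch of that idea for one file, using only Python's standard library. All function names are hypothetical. It assumes the part names used by the sample file (word/comments.xml, word/document.xml, and word/_rels/document.xml.rels) and writes the result to a new file rather than changing the original. Real documents declare more namespaces than the two handled here, and ElementTree may rename the prefixes it doesn't know about, so try it on copies first.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
COMMENTS_TYPE = ("http://schemas.openxmlformats.org/officeDocument/2006/"
                 "relationships/comments")


def strip_comment_markup(document_xml: bytes) -> bytes:
    """Remove comment range markers and comment reference runs."""
    ET.register_namespace("w", W_NS)
    root = ET.fromstring(document_xml)
    markers = {f"{{{W_NS}}}commentRangeStart", f"{{{W_NS}}}commentRangeEnd"}
    run_tag = f"{{{W_NS}}}r"
    reference_tag = f"{{{W_NS}}}commentReference"
    for parent in list(root.iter()):
        for child in list(parent):
            is_reference_run = (child.tag == run_tag
                                and child.find(reference_tag) is not None)
            if child.tag in markers or is_reference_run:
                parent.remove(child)  # text between the markers is kept
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def strip_comment_relationship(rels_xml: bytes) -> bytes:
    """Remove the relationship that points to the comments part."""
    ET.register_namespace("", REL_NS)
    root = ET.fromstring(rels_xml)
    for rel in list(root):
        if rel.get("Type") == COMMENTS_TYPE:
            root.remove(rel)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def remove_comments(src_path: str, dst_path: str) -> None:
    """Write a copy of the package at src_path with all comments removed."""
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename == "word/comments.xml":
                continue  # drop the comments part itself
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                data = strip_comment_markup(data)
            elif item.filename == "word/_rels/document.xml.rels":
                data = strip_comment_relationship(data)
            dst.writestr(item, data)
```

As the preceding note explains, the content type definition for comments is left in place, because it causes no problems. To process a folder, you could call remove_comments once for each .docx file found there.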




2007 Microsoft Office System Inside Out
ISBN: 0735623244
EAN: 2147483647
Year: 2007
Pages: 299
