Compressing Office Files That Contain Unicode Text


Because Unicode generally uses more bytes per character than older encodings, Microsoft Office 2003 Editions files may be larger than the same files saved in earlier, non-Unicode versions of Office. However, Microsoft Office Word 2003 can automatically compress portions of files to reduce their size.

Office 2003 Editions store text in a form of Unicode called UTF-16, just as Office XP does. UTF-16 encodes each character in two bytes (or, very rarely, four bytes). Non-Unicode systems instead use a single byte per character or, for some Asian languages, a mixture of one and two bytes. Generally, Office 2003 Editions files with multilingual text are similar in size to Office 97, Office 2000, or Office XP files. However, Office 2003 Editions files may be 30 to 50 percent larger than files created in previous, non-Unicode versions of Office (Office 95 and earlier).
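
The size difference described above can be checked by encoding the same text in UTF-16 and in a legacy code page. The following Python sketch is only an illustration: the helper name encoded_sizes and the choice of code pages (Windows-1252 for Western European text, Shift-JIS for Japanese) are assumptions made for this example, not part of Office.

```python
from typing import Dict, Optional


def encoded_sizes(text: str) -> Dict[str, Optional[int]]:
    """Return the byte length of text in UTF-16 and in two legacy code pages.

    A value of None means the code page cannot represent the text at all.
    """
    # UTF-16 little-endian without a byte-order mark: 2 bytes per code unit.
    sizes: Dict[str, Optional[int]] = {"utf-16": len(text.encode("utf-16-le"))}
    for codec in ("cp1252", "shift_jis"):
        try:
            sizes[codec] = len(text.encode(codec))
        except UnicodeEncodeError:
            sizes[codec] = None
    return sizes


print(encoded_sizes("Resume"))      # Western text: 1 byte per character in cp1252
print(encoded_sizes("日本語"))       # Japanese text: 2 bytes per character in Shift-JIS
print(encoded_sizes("\U0001F600"))  # outside the Basic Multilingual Plane: 4 bytes in UTF-16
```

Western European text doubles in size under UTF-16, whereas text that already needed two bytes per character in its legacy code page stays about the same size.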

Note

If a file contains text from only English or Western European languages, there is little or no increase in file size because Office 2003 Editions applications can compress the text.

When Word 2003 users open and save an English or Western European file from a previous, non-Unicode version of Word (Word 95 or earlier), Word converts the contents to Unicode. The first time the file is saved, Word analyzes the file and notes regions that can be compressed, but the resulting file is temporarily twice the size of the original file. The next time the file is saved, Word performs the compression, and the file size returns to normal.
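
The doubling and the later return to normal size can be modeled with a short Python sketch. This is a simplified illustration and not Word's actual file format: the function name stored_size is hypothetical, the model assumes that any character below U+0100 can be stored in one byte when compressed, and it ignores the bookkeeping a real file needs to mark compressed regions.

```python
def stored_size(text: str, compress: bool) -> int:
    """Bytes needed to store text under a simplified model.

    Uncompressed: every character is stored as UTF-16.
    Compressed: characters below U+0100 take 1 byte; all others stay UTF-16.
    """
    total = 0
    for ch in text:
        if compress and ord(ch) < 0x100:
            total += 1  # fits in a single byte
        else:
            total += len(ch.encode("utf-16-le"))  # 2 bytes, or 4 for a surrogate pair
    return total


text = "Café menu"  # 9 characters, 9 bytes in a single-byte Western code page

print(stored_size(text, compress=False))  # first save: all text held as UTF-16
print(stored_size(text, compress=True))   # second save: compressible text stored in 1 byte each
```

Under this model the first save takes 18 bytes for 9 characters, and the second save brings the text back to 9 bytes. Mixed text such as "Tokyo 東京" compresses only partly, because the Japanese characters still need two bytes each.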

For Microsoft Office PowerPoint 2003 files, text is typically a small percentage of file size, so Unicode does not significantly increase file size.




Microsoft Office 2003 Resource Kit 2003
Microsoft Office 2003 Editions Resource Kit (Pro-Resource Kit)
ISBN: 0735618801
Year: 2004
Pages: 196
