Microsoft Office 2003 offers yet another translation opportunity. Office 2003 includes a feature called Research Services that enables you to perform various kinds of research from within an Office application. Not only can you perform the research without leaving your Office application, but the result of the research can also be pasted into your application in context. You can write your own Research Services in .NET and have Office use them just like one of its own. But this isn't what is of most interest to us with regard to machine translation. One of the built-in Research Services is the Translation service.
To see it in action, start an Office 2003 application, such as Word, and select Tools, Research.... A new pane called "Research" is added on the right side of the window. Drop down the combo box (which initially says "All Reference Books") and select Translation. In the "Search for:" text box, enter a phrase to translate. In the two combo boxes below, select the "From" language and the "To" language. Click the green arrow next to the "Search for:" text box; the text is then translated (see Figure 9.8).
Figure 9.8. Using the Microsoft Office 2003 Research Pane to Perform Translations (Translation Services Provided by WorldLingo.)
It is this translation facility that you can harness for your own automatic translation. If you search on MSDN (http://msdn.microsoft.com) or the Microsoft Office Web site (http://office.microsoft.com), you will find a fair amount of information on creating your own research services and integrating them into Office. However, you won't find any information on how to consume research services. The reason is that Microsoft expects the only consumer of Office 2003 Research Services to be Office 2003. However, all that you need to know is in this section. You might find it useful to download the Microsoft Office 2003 Research Services SDK, which contains some background information on the subject.
Most Office 2003 Research Services are simply Web services. As such, given the Web service's URL, its WSDL, and the format of its messages, we can use the Web service just like any other Web service. Figure 9.9 shows the list of Research Services that are contained within the Registry at HKEY_CURRENT_USER\Software\Microsoft\Office\11.0\Common\Research\Sources.
Figure 9.9. Microsoft Office 2003 Research Services Registry Entries
Research Services is an "install on demand" option, so you will need to use the translation facility once before the Registry is populated. A "source" is a provider of research information. Microsoft Office Online Services (shown in Figure 9.9) is an example of one such provider. From this entry, you can see the URL of the Web service (http://office.microsoft.com/Research/query.asmx). Providers provide services. If you expand the source's keys, you can see the list of services (see Figure 9.10).
Figure 9.10. Microsoft Office 2003 Services Registry Entries
This service is a translation service that translates from "English (U.S.)" to "French (France)".
The kind of service is specified in the CategoryID, which is 0x36120000 (907149312) for translation services (this is the REFERENCE_TRANSLATION constant in the Office 2003 Research Service SDK). Of particular interest here is the SourceData entry, which is in the following format:
    <FromLCID>/<ToLCID>/<ResultType>
In the entry in Figure 9.10, the FromLCID is 1033 (the locale ID for "English (U.S.)"), the ToLCID is 1036 (the locale ID for "French (France)"), and the ResultType is 4. The result type is "1" for keyword translators and "2" for whole-document translators; "4" is not documented but appears to be for keyword/sentence translators. For our purposes, we are interested in "1" and "4". With this information, you can read through the list of providers and collect the services that have a CategoryID of 0x36120000 and a SourceData with a result type of either 1 or 4.
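The filtering rule just described can be sketched as a small parser for the SourceData string. This is a minimal illustration, not the code supplied with the book: the class name TranslationSourceData and its members are hypothetical, and it assumes that you have already read the CategoryID and SourceData values from the Registry yourself.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: parses "<FromLCID>/<ToLCID>/<ResultType>".
public sealed class TranslationSourceData
{
    // CategoryID for translation services, as given in the text.
    public const int TranslationCategoryId = 0x36120000;

    public int FromLcid { get; }
    public int ToLcid { get; }
    public int ResultType { get; }

    private TranslationSourceData(int fromLcid, int toLcid, int resultType)
    {
        FromLcid = fromLcid;
        ToLcid = toLcid;
        ResultType = resultType;
    }

    // Result types 1 (keyword) and 4 (keyword/sentence) are the ones
    // the text treats as useful; 2 is whole-document translation.
    public bool IsKeywordOrSentence
    {
        get { return ResultType == 1 || ResultType == 4; }
    }

    // Returns false rather than throwing if the value is malformed.
    public static bool TryParse(string sourceData, out TranslationSourceData result)
    {
        result = null;
        if (string.IsNullOrEmpty(sourceData))
            return false;

        string[] parts = sourceData.Split('/');
        if (parts.Length != 3)
            return false;

        int from, to, type;
        if (!int.TryParse(parts[0], NumberStyles.None, CultureInfo.InvariantCulture, out from) ||
            !int.TryParse(parts[1], NumberStyles.None, CultureInfo.InvariantCulture, out to) ||
            !int.TryParse(parts[2], NumberStyles.None, CultureInfo.InvariantCulture, out type))
            return false;

        result = new TranslationSourceData(from, to, type);
        return true;
    }
}
```

A service would then be kept only if its CategoryID equals TranslationSourceData.TranslationCategoryId and the parsed value reports IsKeywordOrSentence.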
By default, three providers of translation services are included with Office 2003: the "internal:LocalTranslation" provider, Microsoft Office Online Services, and WorldLingo.
The "internal:LocalTranslation" provider is a set of Win32 DLLs and is not a Web service. You can find the DLLs in "%CommonProgramFiles%\Microsoft Shared\TRANSLAT". They are installed on demand, so they won't be present until you have translated English to/from French and/or English to/from Spanish. Because this provider is not a Web service and the functions are undocumented, I have chosen to ignore this provider.
At first sight, the Microsoft Office Online Services provider looks like a good source of machine translation. The URL in the Registry can be used as is in Visual Studio's ASP.NET Web Service Wizard to generate a Web service reference because the Web service returns the WSDL that describes the Web service. Unfortunately, the Web service itself suffers from two problems. First, the Web service is more of a translation dictionary than a keyword translator. For example, if you translate Stop into German, the result (after all the HTML formatting has been removed) is this:
Clearly, this is the kind of definition that you would expect to find in a dictionary, but it is virtually useless for machine translation. Second, it translates just single words; it cannot translate a sentence or a phrase. It is almost completely meaningless to translate words one by one and string them together, so these services have no use to us.
WorldLingo Translation Services
The third provider, WorldLingo, is the only viable option that is installed by default. The complete source code to use with this provider is included with this book. Because it is long, I focus only on the most important parts. The first problem in using the WorldLingo services is that the WorldLingo server doesn't expose the WSDL for the Web service. You can't simply put http://www.worldlingo.com/wl/msoffice11 into Visual Studio's ASP.NET Web Service wizard; the process needs to be a little lower level. Instead, you can use an HttpWebRequest object to send an HTTP request to the server and read the WebResponse object that is returned.
SendRequest sends a SOAP request to a URL:
    // Requires: using System.IO; using System.Net;
    protected string SendRequest(string url, string soapPacket)
    {
        HttpWebRequest httpWebRequest = (HttpWebRequest) WebRequest.Create(url);
        httpWebRequest.ContentType = "text/xml; charset=utf-8";
        httpWebRequest.Headers.Add("SOAPAction: urn:Microsoft.Search/Query");
        httpWebRequest.Method = "POST";
        httpWebRequest.ProtocolVersion = HttpVersion.Version10;
        // Write the SOAP packet to the request body; disposing the
        // writer flushes it and closes the request stream.
        using (Stream stream = httpWebRequest.GetRequestStream())
        using (StreamWriter streamWriter = new StreamWriter(stream))
        {
            streamWriter.Write(soapPacket);
        }
        // Read the entire response body and return it as a string.
        using (WebResponse webResponse = httpWebRequest.GetResponse())
        using (Stream responseStream = webResponse.GetResponseStream())
        using (StreamReader responseStreamReader = new StreamReader(responseStream))
        {
            return responseStreamReader.ReadToEnd();
        }
    }
It would be called something like this:
    string responsePacket = SendRequest(
        "http://www.worldlingo.com/wl/msoffice11", queryPacket);
The Web service has a method called Query that accepts a single parameter that is a string of XML. The XML contains the translation request, including the "from" language, the "to" language, and the text to be translated. The aforementioned Microsoft Office 2003 Research Services SDK describes the structure of this XML packet. At first sight, the Research Services Class Library (RSCL, also available from http://msdn.microsoft.com) includes QueryRequest and QueryResponse classes that might help. These classes are wrappers to build and read the XML used with the Query method. Unfortunately, they are designed for developers of Research Services, not for consumers; consequently, they enable you to read the query XML and to create the response XML. This doesn't help because we want to create the query XML and read the response XML.
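Because the Query method takes its XML as a single string parameter, the query XML has to be escaped and wrapped in a SOAP envelope before it is passed to SendRequest as the soapPacket. The following is a minimal sketch of that wrapping, not the code supplied with the book. The class name ResearchSoap and the method name BuildQueryEnvelope are hypothetical. The urn:Microsoft.Search namespace is inferred from the SOAPAction header shown above, and the parameter element name queryXml is an assumption that you should check against the SDK.

```csharp
using System.Xml.Linq;

// Hypothetical helper that wraps a query XML string in a SOAP 1.1 envelope.
public static class ResearchSoap
{
    private static readonly XNamespace Soap =
        "http://schemas.xmlsoap.org/soap/envelope/";

    // Assumption: inferred from the SOAPAction "urn:Microsoft.Search/Query".
    private static readonly XNamespace Search = "urn:Microsoft.Search";

    public static string BuildQueryEnvelope(string queryXml)
    {
        XElement envelope = new XElement(Soap + "Envelope",
            new XAttribute(XNamespace.Xmlns + "soap", Soap.NamespaceName),
            new XElement(Soap + "Body",
                new XElement(Search + "Query",
                    // Assumption: the parameter element is named queryXml.
                    // Passing a string makes XElement escape the markup,
                    // so the query travels as text, not as child elements.
                    new XElement(Search + "queryXml", queryXml))));

        return envelope.ToString(SaveOptions.DisableFormatting);
    }
}
```

Letting XElement do the escaping avoids hand-written replacements of < and &, which are easy to get wrong when the text to translate itself contains those characters.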
To create the query XML, I wrote a GetQueryXml method, which can be called something like this:
    GetQueryXml("The monkey is in the tree", service.Id, "(11.0.6360)")
We pass the string to translate, the GUID of the service that performs the translation, and a build number. The GUID of the service identifies the from/to language pair. GetQueryXml then builds the necessary XML using XmlTextWriter according to the schema defined in the SDK.
The return value of the SendRequest method is the response from the Web service. Again, this is an XML string using the QueryResponse schema defined in the SDK. The Response element of this XML contains the translated text. Unfortunately, this translated text is formatted for display in an Office application, so it contains HTML formatting that must be removed first. With this done, we have our translated text.
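Removing the HTML formatting can be sketched as follows. This is one simple approach under stated assumptions, not the code supplied with the book: the class name ResponseText and the method name StripHtml are hypothetical, and it assumes that the text has already been extracted from the Response element. It uses a regular expression rather than a real HTML parser, which is adequate only for simple display markup.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper that reduces display-oriented HTML to plain text.
public static class ResponseText
{
    public static string StripHtml(string html)
    {
        if (string.IsNullOrEmpty(html))
            return string.Empty;

        // Replace each tag with a space so that words separated only
        // by markup do not run together.
        string withoutTags = Regex.Replace(html, "<[^>]+>", " ");

        // Turn entities such as &amp; or &#39; back into characters.
        string decoded = WebUtility.HtmlDecode(withoutTags);

        // Collapse the whitespace left behind and trim the ends.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

Replacing tags with a space has a cost: markup that falls in the middle of a word splits that word in two. If the responses you receive do that, replace tags with an empty string instead and handle block-level tags separately.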