When employing Web mining techniques, researchers face the challenge of creating useful knowledge out of a sea of information. This has created fertile ground for research that seeks new ways to mine data more efficiently and effectively. One goal of this research is to identify ways to increase site usability. To do this, the site must stand out from the rest while at the same time providing information that is both relevant and valid.
For a site to stand out, Web developers must take appropriate steps to design a user-centered interface that is appealing. This task can be accomplished by considering the three factors that affect how a user perceives a Web site: content, Web page design, and overall site design. Web developers can address these three factors effectively by using past data (data mining) to design a Web site or cyber-community.
Data mining is the process of making inferences from past data. This study focuses on data mining techniques for the World Wide Web; data mining applied to the Web is called Web mining. Web mining is a data mining technique used to extract non-arbitrary information from the Web (Borges & Levene, 1999; Bucher, Baumgarten, Anand, Mulvenna, & Hughes, 1999; Chang, Healey, McHugh, & Wang, 2001). Researchers agree on three categories of Web mining: Web content mining, Web structure mining, and Web usage mining (Kosala & Blockeel, 2000; Srivastava, Cooley, Deshpande, & Tan, 2000). Although these are three distinct areas, their techniques can be used in isolation or interchangeably.
Web content mining is the discovery of useful information from the Web by examining the data contained in the Web site; Web structure mining is the study of the link topology; and Web usage mining is concerned with the navigational behavior of users. Since Web mining is based on the premises of data mining, similar techniques and approaches are applicable. This chapter focuses on Web structure (link topology) mining and on exploiting the hub and authority model.
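To make the hub and authority idea concrete, the following is a minimal sketch that assumes the standard iterative formulation often called HITS: a page's authority score is the sum of the hub scores of the pages linking to it, and its hub score is the sum of the authority scores of the pages it links to. The function name `hub_authority_scores`, the dictionary-based link graph, and the fixed iteration count are illustrative assumptions, not details taken from this chapter.

```python
from math import sqrt


def hub_authority_scores(links, iterations=50):
    """Estimate hub and authority scores for a small link graph.

    `links` maps each page to the list of pages it links to
    (hypothetical input format chosen for this sketch).
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {dst for targets in links.values() for dst in targets}
    hubs = {page: 1.0 for page in pages}
    auths = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of pages that link in.
        new_auths = {page: 0.0 for page in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auths[dst] += hubs[src]

        # Hub update: sum the authority scores of the pages linked to.
        new_hubs = {
            page: sum(new_auths[dst] for dst in links.get(page, []))
            for page in pages
        }

        # Normalize so the scores stay bounded across iterations.
        auth_norm = sqrt(sum(v * v for v in new_auths.values())) or 1.0
        hub_norm = sqrt(sum(v * v for v in new_hubs.values())) or 1.0
        auths = {page: v / auth_norm for page, v in new_auths.items()}
        hubs = {page: v / hub_norm for page, v in new_hubs.items()}

    return hubs, auths


# Hypothetical five-page site: two index-like pages point at content pages.
example_links = {
    "index": ["a", "b"],
    "directory": ["a", "b", "c"],
    "a": [],
    "b": [],
    "c": [],
}
example_hubs, example_auths = hub_authority_scores(example_links)
```

In this toy graph the page that links to the most well-cited pages ends up with the highest hub score, and the pages cited by both index-like pages outrank the page cited by only one. A production implementation would normally test for convergence rather than run a fixed number of iterations.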
Data mining techniques such as association rules, cluster analysis, classification rules, sequential patterns, and traversal patterns can be employed effectively in mining Web data. Association rules indicate a relationship among items. In Web mining, association rules are used to discover groups of URLs that frequently occur together (Mobasher, Dai, Luo, Sun, & Zhu, 2000). Similarly, sequence rules (traversal paths) discover not only events that occur together but also the order in which they occur (Spiliopoulou, 2000). These techniques allow the Web developer to design the site based on the most frequent traversal paths.
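The difference between co-occurrence and ordered traversal can be illustrated with a short sketch. It assumes that the Web log has already been split into per-user sessions, each an ordered list of URLs, and it covers only the support-counting step; deriving full rules with confidence values is omitted. The function names, the session data, and the `min_support` threshold are hypothetical and are not drawn from the cited work.

```python
from collections import Counter
from itertools import combinations


def frequent_url_pairs(sessions, min_support):
    """Unordered URL pairs appearing together in at least
    `min_support` (a fraction) of the sessions."""
    counts = Counter()
    for session in sessions:
        # Each pair is counted once per session, regardless of order.
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    total = len(sessions)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}


def frequent_traversals(sessions, min_support):
    """Ordered page-to-page steps (source, destination) taken in at least
    `min_support` (a fraction) of the sessions."""
    counts = Counter()
    for session in sessions:
        # Consecutive transitions, counted once per session.
        counts.update(set(zip(session, session[1:])))
    total = len(sessions)
    return {step: n / total for step, n in counts.items() if n / total >= min_support}


# Hypothetical sessions reconstructed from a Web log.
example_sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/products", "/home", "/cart"],
    ["/home", "/about"],
]
example_pairs = frequent_url_pairs(example_sessions, min_support=0.5)
example_steps = frequent_traversals(example_sessions, min_support=0.5)
```

With these sessions, `/home` and `/products` co-occur in three of four sessions, but the ordered step from `/home` to `/products` is taken in only two, which shows how an ordered pattern can be rarer than the corresponding co-occurrence.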
A review of current research reveals a number of interesting algorithms, although most were developed for e-commerce. Bucher et al. (1999) developed MiDAS (mining Internet data for associative sequences), a data mining algorithm that discovers sequential access patterns from Web logs in order to identify navigational patterns. Srikant and Yang (2001) proposed an algorithm that automatically discovers Web site pages whose location is different from where users would expect to find them. The latter is based on the assumption that the user will backtrack in order to find a page of interest.
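The backtracking assumption can be illustrated with a simplified heuristic. The sketch below is not the algorithm of Srikant and Yang (2001); it only flags pages that a user visited and then immediately left by returning to the previous page, on the assumption that such a page is a candidate for where the user expected to find something. The function name `find_backtrack_points` and the sample path are hypothetical.

```python
def find_backtrack_points(path):
    """Return (index, page) for every page in `path` that the user left
    by going straight back to the page visited just before it.

    Simplified heuristic for illustration only: a visit A -> B -> A
    marks B as a backtrack point.
    """
    points = []
    for i in range(1, len(path) - 1):
        # The page before and the page after are the same page.
        if path[i + 1] == path[i - 1]:
            points.append((i, path[i]))
    return points


# Hypothetical session: the user tries /support, returns, then looks elsewhere.
example_path = ["/home", "/support", "/home", "/products", "/manuals"]
example_points = find_backtrack_points(example_path)
```

Here `/support` is flagged because the user returned to `/home` right after visiting it. Aggregating such points over many sessions would suggest where users tend to look first, though a real analysis would also have to identify which page the user was ultimately seeking.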