4.4 Why Are Agents Important?
One of the most compelling uses for agent technology is information retrieval: the volume of information about individuals and companies on the Internet, and in the databases connected to it, has exploded. According to studies by Forrester Research and the Yankee Group, there are over 1 billion documents on the visible Web, with 7 million documents added daily. More important, the Web is becoming increasingly database driven, and the records in these databases cannot be indexed or retrieved by typical search engines. This is due in part to the rise of technologies such as XML and Active Server Pages (ASP), which deliver content from dynamic databases; conventional search engines omit this content simply because they cannot retrieve records from these databases.
These same studies indicate that this dynamic Web is 500 times larger than the visible Web of 1 billion pages. Agent technologies that support special scripting capabilities can adapt to different information types and thus retrieve much more information than conventional search engines. In other words, an agent can sense the type of data source it is dealing with and convert its search parameters into a query that the source can understand. These agents can also negotiate with and extract information from not just Web-connected databases but also local databases, intranets, extranets, and other proprietary networks.
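The idea of an agent that senses the type of source and rewrites one set of search parameters into that source's own query language can be sketched as below. This is a minimal illustration, not a description of any particular agent product: the locator schemes (`https`, `sqlite`), the source-type labels, and the function names are all hypothetical assumptions.

```python
from urllib.parse import urlencode, urlparse

# A minimal sketch of a query-translating agent. The source types, the
# locator schemes, and the default table name are hypothetical assumptions.

def sense_source_type(locator):
    """Guess the kind of information source from its locator's scheme."""
    scheme = urlparse(locator).scheme
    if scheme in ("http", "https"):
        return "web_form"
    if scheme == "sqlite":
        return "sql"
    raise ValueError(f"unrecognized source: {locator}")

def to_web_query(params, base_url):
    """Encode search parameters as a query string for a Web search form."""
    return f"{base_url}?{urlencode(sorted(params.items()))}"

def to_sql(params, table):
    """Build a parameterized SQL query; values are bound, not spliced in.
    Table and column names are assumed to come from trusted configuration."""
    columns = sorted(params)
    where = " AND ".join(f"{col} = ?" for col in columns)
    return f"SELECT * FROM {table} WHERE {where}", [params[col] for col in columns]

def adapt_query(params, locator, table="records"):
    """Sense the source type, then translate the same parameters for it."""
    kind = sense_source_type(locator)
    if kind == "web_form":
        return to_web_query(params, locator)
    return to_sql(params, table)

params = {"last_name": "Smith", "city": "Boston"}
print(adapt_query(params, "https://example.com/search"))
print(adapt_query(params, "sqlite:///records.db", table="persons"))
```

The same two parameters become `https://example.com/search?city=Boston&last_name=Smith` for the Web form and `('SELECT * FROM persons WHERE city = ? AND last_name = ?', ['Boston', 'Smith'])` for the database. Adding a new kind of source means adding one detection rule and one translator, which is the essence of the adaptability described above.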
Agents are needed to help analysts and investigators manage and exploit the tremendous amount of data they encounter in the course of their work. As we have discussed, agents are sophisticated programs that possess human-like attributes, such as the ability to work independently, communicate, coordinate, learn, and accumulate knowledge as they carry out their assigned tasks. When used in conjunction with other data mining technologies, such as those covered in subsequent chapters, agents can help investigators access, organize, and use current and relevant data for security deterrence, forensic analysis, and criminal detection.
As we have seen, agents are designed to operate in a particular environment, such as a closed network or the Internet; they can also be categorized by functionality, such as information retrieval, information filtering, or monitoring and alerting. They can further be classified by their core architecture: learning agents, for example, use internal neural networks to acquire knowledge as they work, or machine-learning algorithms to generate their own rules of behavior and action. For the most part, two major categories of agents lend themselves to investigative data mining applications: Internet (open sources) agents and intranet (secured sources) agents.
4.5 Open Sources Agents
These Internet agents provide search services over the Web. There are also server-specific agents that provide services, such as security, at the server level, and information-filtering agents that pass users only the information their security level permits. Still others are notification and special-services agents, and even mobile agents that execute specific tasks, such as sending special alerts to wireless devices. In short, Internet agents are computer programs that reside on servers and perform specific data detection, retrieval, and delivery tasks for designated users based on preset parameters, behaving very much like intelligent robots. In this context, intelligent agents can play an integral role in the overall process of investigative data mining.
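An information-filtering agent that releases material according to a user's security level can be reduced to a comparison between each item's classification and the user's clearance. The sketch below assumes a simple ordered scheme of levels and a "read at or below your clearance" rule; the level names, classes, and sample documents are hypothetical, and real systems often add compartments or need-to-know checks on top of this.

```python
from dataclasses import dataclass
from enum import IntEnum

# A minimal sketch of a security-level information filter. The level names
# and the "read at or below your clearance" rule are illustrative assumptions.

class Level(IntEnum):
    PUBLIC = 0
    RESTRICTED = 1
    CONFIDENTIAL = 2
    SECRET = 3

@dataclass(frozen=True)
class Document:
    title: str
    level: Level

@dataclass(frozen=True)
class User:
    name: str
    clearance: Level

def filter_for_user(docs, user):
    """Return only the documents whose level does not exceed the user's clearance."""
    return [doc for doc in docs if doc.level <= user.clearance]

feed = [
    Document("Press release", Level.PUBLIC),
    Document("Case notes", Level.CONFIDENTIAL),
    Document("Watch list", Level.SECRET),
]
analyst = User("analyst", Level.CONFIDENTIAL)
print([doc.title for doc in filter_for_user(feed, analyst)])
```

Run as written, the analyst receives `['Press release', 'Case notes']`; the item classified above the analyst's clearance is withheld. Using `IntEnum` keeps the levels ordered, so the filter is a single comparison.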
These Web robots use Boolean or vector-space strategies to follow links and retrieve documents, prioritizing them according to various methods and schemes. Search agents are the most widely used Web services: because they work through keyword query forms, they are easy to use and give the user an instant response in the form of a hierarchical list of information sources. Their indexing gives users instant access to a universe of information. In addition, some metasearch engines incorporate knowledge of where to look for information depending on the kind of data sought, such as individuals, phone numbers, physical and e-mail addresses, technical reports, public record filings, and foreign news stories in their native language.
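The difference between the Boolean and vector-space strategies can be seen on a toy collection. In the sketch below, Boolean retrieval returns only documents containing every query term, while the vector-space model scores each document by the cosine similarity between raw term-frequency vectors, so partial matches are ranked rather than discarded. The tokenizer, the sample documents, and the choice of plain term frequencies (rather than TF-IDF weighting) are simplifying assumptions.

```python
import math
import re
from collections import Counter

# A minimal sketch contrasting Boolean AND retrieval with vector-space
# ranking. Raw term frequencies are used; production engines usually
# apply TF-IDF or similar weighting.

def tokens(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def boolean_and(query, docs):
    """Boolean retrieval: ids of documents containing every query term."""
    terms = set(tokens(query))
    return [doc_id for doc_id, text in docs.items() if terms <= set(tokens(text))]

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def vector_space_rank(query, docs):
    """Vector-space retrieval: ids of documents with nonzero similarity, best first."""
    q = Counter(tokens(query))
    scores = {doc_id: cosine(q, Counter(tokens(text))) for doc_id, text in docs.items()}
    return sorted((d for d in scores if scores[d] > 0), key=lambda d: (-scores[d], d))

docs = {
    "d1": "fraud report on wire transfers",
    "d2": "wire transfers between offshore accounts",
    "d3": "annual company report",
}
print(boolean_and("wire fraud", docs))
print(vector_space_rank("wire fraud", docs))
```

For the query "wire fraud", Boolean retrieval returns only `['d1']`, whereas the vector-space ranking returns `['d1', 'd2']`: `d2` mentions only "wire" but still surfaces, ranked below the full match. This tolerance for partial matches is why ranked, vector-space style results suit exploratory investigative searches.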