Chapter 2: Investigative Data Warehousing


2.1 Relevant Data

One of the most difficult and frustrating phases of data mining is getting access to the right data. In government, there are always interagency issues and agreements to be sorted out, not to mention formats that need to be reconciled, all of which requires several meetings before arrangements can be made. In private industry, there are the issues of privacy and cost. These are some of the minor, but very real, obstacles that accompany most data mining projects. Of greater significance are the issues revolving around what data is required for the desired objective. In the aftermath of 9/11, however, a new sense of urgency has evolved: these obstacles pale in comparison to the consequences of failing to resolve data integration issues.

The value of any data mining model depends heavily on the quality of the data used to construct it. For this reason, it is critical to hold some creative discussions at the start of the project and to consider what data is available. Aside from the data that is internally available, thought should be given to what external data sources could provide valuable insight for the data mining analysis. In this chapter we will discuss the closed and open sources of data available both online and offline, and how to integrate and prepare the data prior to its analysis.

Data mining is about predicting behavior or profiling individuals; as such, it is critical to have access to timely and relevant information. Without it, the whole process is doomed to failure. For example, to construct an accurate link analysis chart of phone calls made by targeted suspects, it is critical to have access to the most current wireless toll records. Similarly, to construct predictive models for profiling fraudulent transactions or other criminal or terrorist activities, it is equally important to be able to build a centralized database, or to query multiple networks, containing relevant and current data. To construct a good fraud model, for example, it is critical to have an adequate sampling of all the types of illegal transactions that have been uncovered by, say, an insurance provider, an e-commerce site, or a wireless carrier.
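To make the phone-call example concrete, the following is a minimal sketch of how toll records might be reduced to the weighted links that a link analysis chart displays. The record layout (caller, callee, timestamp), the phone numbers, and the function names build_links and contacts_of are all assumptions made for illustration; real toll records vary by carrier and would need to be reconciled into some common layout first.

```python
from collections import Counter

# Hypothetical toll records: (caller, callee, timestamp) tuples.
# The numbers and dates are invented for illustration only.
toll_records = [
    ("555-0101", "555-0102", "2004-03-01T09:15"),
    ("555-0102", "555-0101", "2004-03-01T09:40"),
    ("555-0101", "555-0103", "2004-03-02T18:05"),
    ("555-0104", "555-0103", "2004-03-03T11:30"),
    ("555-0101", "555-0102", "2004-03-04T08:00"),
]


def build_links(records):
    """Count calls between each unordered pair of numbers.

    Direction is ignored, so A calling B and B calling A
    strengthen the same link.
    """
    links = Counter()
    for caller, callee, _timestamp in records:
        if caller == callee:
            continue  # skip self-calls; they add no link
        links[tuple(sorted((caller, callee)))] += 1
    return links


def contacts_of(links, number):
    """Return {other_number: call_count} for one number of interest."""
    result = {}
    for (a, b), count in links.items():
        if a == number:
            result[b] = count
        elif b == number:
            result[a] = count
    return result


links = build_links(toll_records)
for (a, b), count in sorted(links.items()):
    print(f"{a} <-> {b}: {count} call(s)")
```

Each pair with its call count corresponds to one line on the chart, and the count could set the line's thickness. Stale records would leave links missing or underweighted, which is why current data matters for this kind of analysis.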




Investigative Data Mining for Security and Criminal Detection
ISBN: 0750676132
Year: 2005
Pages: 232
Authors: Jesus Mena
