Before links can be constructed to yield valuable associations, analysts must have a complete understanding of the data they are working with. As with all data mining projects, extracting and preparing the data for analysis is commonly a major task. Transactional databases more often than not contain incomplete or inconsistent information, or multiple instances of the same entity, because they are designed and built for speed, not analysis. For example, a database of airline ticket purchases may hold different names or account numbers for the same person, or the same name for different individuals. To map associations correctly, it is first necessary to identify the right individuals in the database accurately. This process of disambiguating identification information and combining it into a single entity is the task of data consolidation and preparation, and it must take place before any link analysis can be undertaken. (This was discussed in the preceding chapter, under section 2.11 on data preparation.)
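The consolidation step can be illustrated with a minimal sketch. It assumes, purely for illustration, that ticket records carry hypothetical `account` and `passport` fields and that two records sharing either value belong to the same person; real projects would need domain-specific matching rules. Records are linked with a union-find structure, and names are deliberately not used for matching, since different individuals may share a name.

```python
from collections import defaultdict


def consolidate(records, keys=("account", "passport")):
    """Group record indices that share a value for any identifier in `keys`.

    The field names in `keys` are hypothetical; substitute whatever strong
    identifiers the database actually holds.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # (field, normalized value) -> first record index holding it
    for idx, rec in enumerate(records):
        for key in keys:
            value = rec.get(key)
            if not value:
                continue  # missing identifiers cannot link records
            ident = (key, value.strip().upper())
            if ident in first_seen:
                union(idx, first_seen[ident])
            else:
                first_seen[ident] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Made-up sample data: records 0-2 are one traveler under varying names and
# accounts; record 3 shares a name with record 0 but no identifier.
tickets = [
    {"name": "John Q. Smith", "account": "A100", "passport": "P555"},
    {"name": "J. Smith", "account": "A200", "passport": "P555"},
    {"name": "Jon Smith", "account": "A200", "passport": None},
    {"name": "John Q. Smith", "account": "A900", "passport": "P777"},
]
entities = consolidate(tickets)
print(entities)  # [[0, 1, 2], [3]]
```

Each inner list is one consolidated entity. The first three records are merged through a shared passport and a shared account, while the fourth stays separate despite the matching name.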
Applying domain-specific knowledge is a very important component of this task. An investigator or analyst must know the nuances of the databases that she or he is working with. In data mining it is critical that the analyst know what every field in a database or column in a spreadsheet contains and what each value represents. For example, in a database of financial transactions a single, specific family, gang, cell, group, unit, or company may represent the desired level of granularity for analysis. Knowing and working at the right level of granularity is a very important part of preparing the data for link analysis.
Representing the suspects who underlie the transactions, and who appear in the data under various identifiers, involves two operations: consolidation and disambiguation. Consolidation combines multiple transactions so that the activities of targeted entities can be evaluated. Disambiguation, by contrast, is a "merge-purge" process that corrects errors in the data (some of which may be intentional misrepresentations). Striking the proper balance between the two is the task of data preparation prior to analysis. Data mining is a mixture of science and art, much like cooking: It requires experience, and, like every meal, no two projects are ever exactly the same.
Profiling entities and discovering anomalies that may indicate fraud or other criminal activity often requires assembling transactions in order to uncover patterns of unique behavior. This essential data preparation must be completed before a link analysis chart is generated. Transactional data must be reorganized around the relevant entities; for example, deposits into a bank account should be organized by individual entity in order to look for structuring activity in a money-laundering scheme. Structuring is the practice of making multiple deposits, each under the $10,000 threshold that would trigger a report to the IRS.
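As a rough sketch of this reorganization, the following example groups deposits by entity and flags any entity whose sub-threshold deposits add up to more than $10,000 within a short period. The seven-day window, the two-deposit minimum, the function name, and the sample data are all illustrative assumptions, not rules drawn from any regulation or from a particular tool.

```python
from collections import defaultdict
from datetime import date, timedelta

REPORTING_THRESHOLD = 10_000  # dollar threshold named in the text


def flag_structuring(deposits, window_days=7, min_deposits=2):
    """Return {entity: largest windowed total} for suspicious entities.

    `deposits` is an iterable of (entity_id, date, amount) tuples.
    `window_days` and `min_deposits` are assumed tuning parameters.
    """
    by_entity = defaultdict(list)
    for entity, day, amount in deposits:
        if amount < REPORTING_THRESHOLD:  # keep only deposits under the threshold
            by_entity[entity].append((day, amount))

    flagged = {}
    for entity, rows in by_entity.items():
        rows.sort()  # chronological order
        start = 0
        running = 0
        for end, (day, amount) in enumerate(rows):
            running += amount
            # Drop deposits that fall outside the rolling window.
            while day - rows[start][0] > timedelta(days=window_days):
                running -= rows[start][1]
                start += 1
            count = end - start + 1
            if count >= min_deposits and running > REPORTING_THRESHOLD:
                flagged[entity] = max(flagged.get(entity, 0), running)
    return flagged


# Made-up sample data for three hypothetical accounts.
deposits = [
    ("acct-1", date(2003, 3, 3), 9500),
    ("acct-1", date(2003, 3, 4), 9000),
    ("acct-1", date(2003, 3, 6), 9800),
    ("acct-2", date(2003, 3, 3), 4000),
    ("acct-2", date(2003, 4, 20), 7000),
    ("acct-3", date(2003, 3, 5), 15000),
]
flagged = flag_structuring(deposits)
print(flagged)  # {'acct-1': 28300}
```

Only the first account is flagged: its three deposits are each under the threshold but total $28,300 within four days. The second account's deposits are weeks apart, and the third account's single deposit exceeds the threshold and would be reported anyway. A flag of this kind is a lead for an analyst to examine, not proof of wrongdoing.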