3.4 Using Link Analysis Networks
Before links can be constructed to yield valuable associations, analysts must have a complete understanding of the data they are working with. As with all data mining projects, extracting and preparing the data for analysis is commonly a major task. Transactional databases more often than not contain incomplete or inconsistent information, or multiple instances of the same entities, because they are designed and built for speed, not analysis. For example, a database of airline ticket purchases may have different names or account numbers for the same person, or different individuals with the same name. To map associations correctly, it is first necessary to identify accurately the right individuals in a database. This process of disambiguating identification information and combining it into a unique entity is the task of data consolidation and preparation, and it must take place before any link analysis can be undertaken. (This was discussed in the preceding chapter, under section 2.11 on data preparation.)
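The consolidation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the sample values, and the `consolidate` function are hypothetical, and the matching rule (merge any two records that share a name or an account number) is deliberately naive. As the text notes, different individuals can share a name, so a real project would require stronger evidence before merging.

```python
from collections import defaultdict

# Hypothetical ticket-purchase records: (record_id, name, account_number).
records = [
    (1, "John Q. Smith", "A100"),
    (2, "J. Smith", "A100"),       # same account, different spelling
    (3, "John Q. Smith", "B200"),  # same name, different account
    (4, "Mary Jones", "C300"),
]

def consolidate(records):
    """Group records that share a name or an account number into one entity."""
    parent = {rec_id: rec_id for rec_id, _, _ in records}

    def find(x):
        # Follow parent pointers to the representative record of x's group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_seen = {}  # identifier value -> first record id that carried it
    for rec_id, name, account in records:
        for key in (("name", name), ("account", account)):
            if key in first_seen:
                union(rec_id, first_seen[key])
            else:
                first_seen[key] = rec_id

    entities = defaultdict(list)
    for rec_id, _, _ in records:
        entities[find(rec_id)].append(rec_id)
    return sorted(entities.values())

entities = consolidate(records)
print(entities)  # [[1, 2, 3], [4]]
```

Records 1, 2, and 3 collapse into one entity because they are chained together by a shared account number and a shared name, while record 4 stands alone.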
Applying domain-specific knowledge is a very important component of this task. An investigator or analyst must know the nuances of the databases that she or he is working with. In data mining it is critical that the analyst know what every field in a database or column in a spreadsheet contains and what each value represents. For example, in a database of financial transactions a single, specific family, gang, cell, group, unit, or company may represent the desired level of granularity for analysis. Knowing and working at the right level of granularity is a very important part of preparing the data for link analysis.
Representing the suspects who underlie the transactions, and who are reflected in various identifiers in the data, involves two operations: consolidation and disambiguation. Consolidation combines records that belong together; for example, it is often necessary to consolidate multiple transactions in order to evaluate the activities of targeted entities. Disambiguation, on the other hand, is a "merge-purge" process for correcting errors in the data (which may be intentional misrepresentations). Striking the proper balance between the two is the task of data preparation prior to analysis. Data mining is a mixture of science and art, much like cooking: it requires experience, and, as with meals, no two projects are ever exactly the same.
Profiling entities and discovering anomalies that may indicate fraud or other criminal activity often requires assembling transactions in order to uncover patterns of unique behavior. This essential, initial activity of data preparation must be undertaken before a link analysis chart is generated. The data must be reorganized from its transactional format so that the relevant entities can be identified; for example, deposits into a bank account should be organized in terms of that individual entity in order to look for structuring activity in a money-laundering scheme. Structuring is the practice of making multiple deposits, each under the $10,000 threshold that triggers reporting to the IRS.
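As a rough illustration of reorganizing transactional data around an entity, the following Python sketch groups deposits by account and flags accounts with several sub-threshold deposits that together reach the reporting figure. The $10,000 threshold comes from the text; the seven-day window, the sample data, and the `flag_structuring` function are assumptions made for the example, not an established detection rule.

```python
from collections import defaultdict
from datetime import date, timedelta

REPORTING_THRESHOLD = 10_000  # dollar figure given in the text
WINDOW = timedelta(days=7)    # assumed look-ahead window, not from the text

# Hypothetical transactional data: (account_id, deposit_date, amount).
deposits = [
    ("acct-1", date(2002, 3, 1), 9_500),
    ("acct-1", date(2002, 3, 2), 9_000),
    ("acct-1", date(2002, 3, 4), 8_700),
    ("acct-2", date(2002, 3, 1), 2_000),
    ("acct-2", date(2002, 3, 20), 1_500),
    ("acct-3", date(2002, 3, 5), 12_000),  # over the threshold, so reported anyway
]

def flag_structuring(deposits):
    """Return {account: total} for accounts whose sub-threshold deposits
    within one window add up to the reporting threshold or more."""
    # Step 1: reorganize transaction rows by entity (here, the account).
    by_account = defaultdict(list)
    for account, day, amount in deposits:
        if amount < REPORTING_THRESHOLD:
            by_account[account].append((day, amount))

    # Step 2: scan each account's deposits for a suspicious cluster.
    flagged = {}
    for account, items in by_account.items():
        items.sort()
        for i, (start, _) in enumerate(items):
            window = [amt for day, amt in items[i:] if day - start <= WINDOW]
            if len(window) >= 2 and sum(window) >= REPORTING_THRESHOLD:
                flagged[account] = sum(window)
                break
    return flagged

print(flag_structuring(deposits))  # {'acct-1': 27200}
```

Only `acct-1` is flagged: its three deposits each stay under the threshold but total $27,200 within four days.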
3.5 Fighting Wireless Fraud with Link Analysis: A Case Study
Wireless fraud detection is by nature reactive, in that a carrier cannot detect fraud until it takes place. One way that carriers are reducing fraud is by identifying and stopping repeat offenders via link analysis. Using link analysis, fraud investigators gather and correlate subscriber information that can associate new customers with fraudulent activities. Various types of subscriber data can be used for this type of analysis. One method, known as dialed-digit analysis, scrutinizes the records of who is calling whom. A link analysis tool processes a large volume of call detail records (CDRs) to reveal correlations between records. The strategy is to identify fraud and potential suspects by association.
Using link analysis techniques, an investigator can track a new subscriber's 10 most frequently called numbers. For example, alarms are set to activate when the dialed numbers match those associated with accounts in ongoing or previous fraud cases. This enables the carrier to identify a previously banished criminal by his or her most frequently dialed numbers. Expanding the fraud circle of association by including the criminals' incoming calls is another way to perform a dialed-digit analysis. The system tracks not only the people the fraudulent phones dial, but also those who call the fraudulent phones.
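A minimal Python sketch of this kind of dialed-digit check follows. The CDR layout (caller, callee), the sample numbers, the watchlist, and the function names are all hypothetical; a carrier's actual records and alarm logic would be far richer.

```python
from collections import Counter

# Hypothetical call detail records: (caller, callee).
cdrs = [
    ("555-0100", "555-0001"),
    ("555-0100", "555-0001"),
    ("555-0100", "555-0002"),
    ("555-0100", "555-0003"),
    ("555-0200", "555-0004"),
    ("555-0009", "555-0100"),  # incoming call to the new subscriber
]

# Numbers tied to ongoing or previous fraud cases (illustrative values).
watchlist = {"555-0001", "555-0009"}

def frequent_contacts(cdrs, subscriber, n=10, include_incoming=False):
    """Return the subscriber's n most frequent contacts, most frequent first."""
    counts = Counter()
    for caller, callee in cdrs:
        if caller == subscriber:
            counts[callee] += 1
        elif include_incoming and callee == subscriber:
            counts[caller] += 1
    return [number for number, _ in counts.most_common(n)]

def dialed_digit_alarm(cdrs, subscriber, watchlist, n=10, include_incoming=False):
    """Return the watchlisted numbers among the subscriber's top contacts."""
    top = frequent_contacts(cdrs, subscriber, n, include_incoming)
    return sorted(set(top) & watchlist)

print(dialed_digit_alarm(cdrs, "555-0100", watchlist))
# ['555-0001']
print(dialed_digit_alarm(cdrs, "555-0100", watchlist, include_incoming=True))
# ['555-0001', '555-0009']
```

Setting `include_incoming=True` corresponds to the expanded circle of association described above, in which callers of the suspect phone are considered as well as the numbers it dials.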
Because analysts can visually represent the calling patterns, they can find numbers to investigate that may not have surfaced through other methods. In some cases, returning criminals may not call numbers that can be linked to past crimes, but they might call a number called by another fraudulent subscriber, which can alert the fraud analyst to possible criminal activity by association. Using link analysis, an analyst creates a web of who calls whom and represents those associations in a digital map, which makes it much easier for the analyst to recognize patterns of behavior that may not be evident through other, traditional analyses.
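The indirect association described here, where a new subscriber calls a number that a known fraudulent subscriber also calls, can be sketched without any graphing library. The subscriber labels, phone numbers, and the `shared_callee_links` function below are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical who-calls-whom pairs: (caller, callee).
calls = [
    ("fraud-A", "555-0001"),
    ("fraud-A", "555-0002"),
    ("new-1", "555-0002"),  # shares a callee with fraud-A
    ("new-1", "555-0007"),
    ("new-2", "555-0008"),  # no overlap with any fraudulent subscriber
]
fraudulent = {"fraud-A"}  # subscribers already tied to fraud

def shared_callee_links(calls, fraudulent):
    """Map each non-fraudulent caller to the numbers it shares with a
    known fraudulent subscriber."""
    # Index the web of calls by the number dialed.
    callers_of = defaultdict(set)
    for caller, callee in calls:
        callers_of[callee].add(caller)

    links = defaultdict(set)
    for callee, callers in callers_of.items():
        if callers & fraudulent:
            # Everyone else who dials this number is linked by association.
            for caller in callers - fraudulent:
                links[caller].add(callee)
    return {caller: sorted(numbers) for caller, numbers in links.items()}

print(shared_callee_links(calls, fraudulent))  # {'new-1': ['555-0002']}
```

A link of this kind is only a lead for the analyst to investigate, since many legitimate subscribers may call the same popular number.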
Another way that carriers use link analysis to detect potential bad accounts is by analyzing call-pattern usage information in combination with billing information. The objective here is to understand usage information better in the context of a subscriber's billing profile. For example, if a new customer subscribes to the least expensive plan a carrier offers but starts making 300 calls a day, the mismatch signals a problem. Link analysis is also used to detect potentially fraudulent new subscribers, primarily by looking at the credit-card information they provide and its association with other bad accounts. The analyst marks accounts that have used fraudulent credit cards before and looks for new accounts that use the same credit-card information.
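The credit-card association can be illustrated with a short Python sketch. The account records, the placeholder card tokens, and the `linked_by_card` function are hypothetical; in practice the comparison would be made on protected or tokenized card data rather than on plain values.

```python
# Hypothetical account records; "card" holds a placeholder token,
# not a real card number.
accounts = [
    {"id": "acct-17", "card": "card-A"},
    {"id": "acct-42", "card": "card-A"},  # reuses the card from acct-17
    {"id": "acct-43", "card": "card-B"},
]
bad_accounts = {"acct-17"}  # accounts already marked as fraudulent

def linked_by_card(accounts, bad_accounts):
    """Return ids of unmarked accounts that share a card with a bad account."""
    tainted_cards = {a["card"] for a in accounts if a["id"] in bad_accounts}
    return sorted(
        a["id"]
        for a in accounts
        if a["card"] in tainted_cards and a["id"] not in bad_accounts
    )

print(linked_by_card(accounts, bad_accounts))  # ['acct-42']
```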
Link analysis is also used to catch subscription fraud by detecting suspicious changes early in the life of a new account; for instance, a subscriber who changes the account address within the first week would raise suspicion. Sometimes fraudulent subscribers change the account address early to prevent the legitimate credit-card owner from receiving welcome information from the wireless carrier. A series of link analyses is also performed on other identity data, such as Social Security numbers and home telephone numbers, again with the objective of finding associations with previously tainted, bad accounts.
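The early-change check amounts to simple date arithmetic, as the following Python sketch suggests. The seven-day window reflects the "first week" mentioned above; the function name and the sample dates are assumptions for the example.

```python
from datetime import date, timedelta

EARLY_WINDOW = timedelta(days=7)  # "within the first week", per the text

def early_address_change(activated_on, address_changes):
    """Return True if any address change falls within the early window
    after the account was activated."""
    return any(
        timedelta(0) <= changed - activated_on <= EARLY_WINDOW
        for changed in address_changes
    )

# Hypothetical account activated on May 1, 2002.
print(early_address_change(date(2002, 5, 1), [date(2002, 5, 4)]))   # True
print(early_address_change(date(2002, 5, 1), [date(2002, 6, 15)]))  # False
```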