8.12 Extracting Data Samples


One of the first tasks is to aggregate an adequate sample of fraudulent and non-fraudulent accounts. As previously mentioned, it is critical to first create a fraud profile. This file should contain samples of online transactions covering as many product lines and categories as possible, across different dollar amounts and numbers of purchases, made by both legitimate and fraudulent shoppers. To create predictive models via data mining techniques, it is very important to have an adequate sample of observations for "training" a system, such as a neural network, to recognize the patterns of fraud. Additionally, a machine-learning analysis will be used to extract the features of fraudulent transactions. Typically, the transactional data variables collected and used for the modeling process include some of the following data items (see Table 8.4).

Table 8.4: Transactional Data Variables

Product Category

Number of Purchases

Vendor Name

Vendor ID number

Invoice number

Order date

Customer ID number

Billing address 1

Billing address 2

Phone number

SKU

Product name

Product price

Product quantity

Product description

Brand

File source

etc.
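
As an illustration of how such a sample might be assembled, the following is a minimal Python sketch, not a method prescribed by the text. It assumes each transaction carries a few of the Table 8.4 variables plus a label marking known fraud. The names Transaction, is_fraud, stratified_sample, and per_stratum are hypothetical and introduced only for this example. The function draws up to a fixed number of records from every combination of fraud label and product category, so that both fraudulent and legitimate shoppers are represented across product lines.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    # A subset of the Table 8.4 variables. The is_fraud label is an
    # assumption: it stands for whatever marks a known fraudulent order.
    invoice_number: str
    customer_id: str
    product_category: str
    product_price: float
    product_quantity: int
    is_fraud: bool


def stratified_sample(transactions, per_stratum, seed=0):
    """Draw up to per_stratum records from each (is_fraud, category) group."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable

    # Group the transactions by fraud label and product category.
    strata = defaultdict(list)
    for t in transactions:
        strata[(t.is_fraud, t.product_category)].append(t)

    # Sample each group separately; small groups are taken in full.
    sample = []
    for key in sorted(strata):
        group = strata[key]
        k = min(per_stratum, len(group))
        sample.extend(rng.sample(group, k))
    return sample
```

Calling stratified_sample(transactions, per_stratum=500) would return at most 500 records for each label and category pair. Sampling each group separately is one way to keep rare fraudulent cases from being swamped by the far more numerous legitimate ones. Dollar amount or number of purchases could be added to the grouping key in the same way.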




Investigative Data Mining for Security and Criminal Detection
ISBN: 0750676132
Year: 2005
Pages: 232
Authors: Jesus Mena
