One of the first tasks is to assemble an adequate sample of fraudulent and non-fraudulent accounts. As previously mentioned, it is critical to first create a fraud profile. This file should contain samples of on-line transactions covering as many different types and categories of product lines as possible, across a range of dollar amounts and numbers of purchases, made by both legitimate and fraudulent shoppers. To create predictive models via data mining techniques, it is very important to have an adequate sample of observations for "training" a system, such as a neural network, to recognize the patterns of fraud. Additionally, a machine-learning analysis will be used to extract the features of fraudulent transactions. Typically, the transactional data variables collected and used for the modeling process include some of the following data items (see Table 8.4).
Product Category | Number of Purchases | Vendor Name |
---|---|---|
Vendor ID number | Invoice number | Order date |
Customer ID number | Billing address 1 | Billing address 2 |
Phone number | SKU | Product name |
Product price | Product quantity | Product description |
Brand | File source | etc. |
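
As a rough illustration of the sampling step described above, the following Python sketch draws a fixed number of records from every combination of fraud label and product category, so that the training file covers both legitimate and fraudulent shoppers across product lines. The field names (`customer_id`, `product_category`, `product_price`, `product_quantity`) loosely follow Table 8.4; the `is_fraud` label, the toy data generator, and the function names are assumptions made for the example and are not part of any particular tool.

```python
import random
from collections import defaultdict


def stratified_sample(transactions, per_stratum, seed=0):
    """Draw up to `per_stratum` records from each (label, category) group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for t in transactions:
        # Group by fraud label and product category (both hypothetical fields).
        strata[(t["is_fraud"], t["product_category"])].append(t)

    sample = []
    for key in sorted(strata):
        rows = strata[key]
        # Take the whole group if it is smaller than the requested size.
        k = min(per_stratum, len(rows))
        sample.extend(rng.sample(rows, k))
    return sample


def make_toy_transactions(n=200, seed=1):
    """Generate made-up transactions; every tenth record is marked as fraud."""
    rng = random.Random(seed)
    categories = ["books", "electronics", "apparel"]
    rows = []
    for i in range(n):
        rows.append({
            "customer_id": f"C{i:04d}",
            "product_category": categories[i % 3],
            "product_price": round(rng.uniform(5, 500), 2),
            "product_quantity": rng.randint(1, 5),
            "is_fraud": i % 10 == 0,  # assumed label, for illustration only
        })
    return rows


transactions = make_toy_transactions()
sample = stratified_sample(transactions, per_stratum=5)

fraud_count = sum(1 for t in sample if t["is_fraud"])
print(len(sample), "records sampled,", fraud_count, "of them fraudulent")
```

Sampling per group rather than from the file as a whole keeps the rare fraudulent cases from being swamped by legitimate ones. In practice the group size, and whether to stratify on further variables such as dollar amount, would depend on how many confirmed fraud cases are available.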