Data Architecture


This section will detail the data sources both in SQL Server and in the format of each of the input files. First, create default locations to simulate your receiving area for the external files, and then you'll take a closer look at each of the input files that are part of the business requirements.

File Storage Location Setup

Create a base directory to store the file-based imports to this project. Throughout the case study, the base location will be referred to as C:\casestudy\ for file-based imports. In the base directory, create two sub-directories: ACH\ and LOCKBOX\. You will use these locations to store the files you'll create in the next few sections.

Bank ACH Payments

Customers make payments within their own banks or electronic payment systems to ABC Company through an automated clearinghouse. The automated clearinghouse bundles up all the payments for the day and sends one XML file through an encrypted VPN connection to an encrypted folder. The bank wires contain only a little bit of information at the transactional level. Each XML file does contain a header row with a unique ID that identifies the file transmission. The header also contains a total deposit amount and a transaction count that can be used to further verify the file transmission. Each transactional detail row represents a deposit and contains two date fields: the date the deposit item was received and the date the deposit item was posted to ABC Company's deposit account. Each payment contains the amount of the deposit and a free-form field that could contain the customer's name on a bank account, an e-mail address, or anything the customer adds to the wire. More commonly the description contains the name on the customer's bank account — which is often very different from the name in Company ABC's customer data. To make the sample ACH file, and for each file in this example, you'll need to re-create these files manually or download the files from this book's page at www.wrox.com. Download or create a file named c:\casestudy\ach\sampleach.xml that looks like the following:

 <BATCH> <HEADER><ID>AAS22119289</ID> <TOTALDEPOSIT>180553.00</TOTALDEPOSIT> <DEPOSITDATE>07/15/2005</DEPOSITDATE> <TOTALTRANS>6</TOTALTRANS> </HEADER> <DETAIL><AMOUNT>23318.00</AMOUNT> <DESC>Complete Enterprises</DESC> <RECEIVEDDATE>07/15/2005</RECEIVEDDATE> <POSTEDDATE>07/15/2005</POSTEDDATE></DETAIL> <DETAIL><AMOUNT>37054.00</AMOUNT> <DESC>Premier Sport</DESC> <RECEIVEDDATE>07/15/2005</RECEIVEDDATE> <POSTEDDATE>07/15/2005</POSTEDDATE></DETAIL> <DETAIL><AMOUNT>34953.00</AMOUNT> <DESC>Intl Sports Association</DESC> <RECEIVEDDATE>07/15/2005</RECEIVEDDATE> <POSTEDDATE>07/15/2005</POSTEDDATE></DETAIL> <DETAIL><AMOUNT>22660.00</AMOUNT> <DESC>Arthur Datum</DESC> <RECEIVEDDATE>07/15/2005</RECEIVEDDATE> <POSTEDDATE>07/15/2005</POSTEDDATE></DETAIL> <DETAIL><AMOUNT>24759.00</AMOUNT> <DESC>Northwind Traders</DESC> <RECEIVEDDATE>07/15/2005</RECEIVEDDATE> <POSTEDDATE>07/15/2005</POSTEDDATE></DETAIL> <DETAIL><AMOUNT>37809.00</AMOUNT> <DESC>Wood Fitness</DESC> <RECEIVEDDATE>07/15/2005</RECEIVEDDATE> <POSTEDDATE>07/15/2005</POSTEDDATE></DETAIL> </BATCH> 

Lockbox Files

Company ABC has starting using a lockbox service that their bank provides for a nominal fee. This service images all check and invoice stubs sent to a specific address that the bank monitors. The bank provides a data file containing the following data attributes for each deposit item: the amount, a reference number for the invoice, and an image key that can be used to review the images of the item online. The terms of the service dictate that if the bank can't determine the invoice number because of legibility issues, or if the invoice is not sent in with the deposit item, either a customer account number or a customer name may be used in place of the invoice number. Periodically during the day, the bank will batch a series of payments into one file containing a header that includes a batch number, the posted deposit date for all deposit items, and an expected total for the batch.

The structure of the file from the bank is as follows:

 HEADER: TYPE            1A          TYPE OF LINE H-HEADER POSTDATE        6A          DATE DEPOSIT POSTED FILLER          1A          SPACE(1) BATCHID         12A         UNIQUE BATCH NBR DETAIL (TYPE I): TYPE            1A          TYPE OF LINE I-INVOICE IMGID           10A         IMAGE LOOK UP ID (2-6 IS ID) DESC            25A         INVOICE OR DESC INFO DETAIL (TYPE C) TYPE            1A          TYPE OF LINE C-CHECK IMGID           10A         IMAGE LOOK UP ID (2-6 IS ID) DESC            8S 2        CHECK AMOUNT 

Download or create using the following code a file named c:\casestudy\lockbox\samplelockbox.txt to simulate the Lockbox transmission in this example:

 H080105 B1239-99Z-99 0058730760 I4001010003 181INTERNAT C4001010004   01844400 I4002020005 151METROSPOOO1 C4002020006   02331800 I4003030009 MAGIC CYCLES C4003030010   02697000 I4004040013 LINDELL C4004040014   02131800 I4005040017 151GMASKI0001 C4005040019   01938800 

PayPal or Direct Credits to Corporate Account

Company ABC has started a pilot program to allow customers to make payments using PayPal and other similar online electronic payment services. Customers like this option because it is easy to use. However, these payments are difficult to process for the Accounting group, because not all e-mail addresses have been collected for the customers and that is the most common description on the transactions. Accounting personnel have to do some research to determine who the customer is and to release the deposit to the payment systems. They would like to be able to input the e-mail address into the customer records and have that data used in future processing. Currently the accounting department uses a data synch process in their accounting software to download these transactions directly from a special bank account periodically during the day. This information is available through a read-only view in the database called [vCorpDirectAcctTrans]. Figure 19-5 shows the structure of this view.

image from book
Figure 19-5

Case Study Database Model

The case study database model (see Figure 19-6) is limited to only the information relevant to the case study. The core entities are the following:

  • Customer: An entity that utilizes products and services from Company ABC. To keep it simple, only the customer name, account number, and e-mail address attributes are represented in the table.

  • Invoice: A monthly listing of total billed products and services for each customer. Each invoice is represented by a unique invoice number. Invoice details are not shown in the case study data model for simplicity.

  • BankBatch: Any set of transactions from a bank or deposit institution that is combined. Auditable information expected for the exchange of monetary data is a major part of this entity. Files, or batches, of transactions must be validated in terms of the number of transaction lines and most importantly by amount. Care must be taken not to load a batch more than once. Recording the bank batch number or BankBatchNbr field and comparing incoming batches should allow you to keep this from happening.

  • BankBatchDetail: Each bank batch will be composed of many transactions that break down into essentially a check and an invoice. You could receive as much as both pieces of information or as little as none of this information. For auditing purposes, you should record exactly what you received from the input source. You'll also store in this table logically determined foreign keys for the customer and invoice dimension.

  • CustomerLookUp: This lookup table will be populated by your SSIS package and an external application. This will allow users to store matching information to identify customers for future processing. This will allow the data import processes to "learn" good matches from bad data.

image from book
Figure 19-6

Database Setup

To get started, you need to set up the database named CaseStudy. The database and all objects in it will become the basis for your solution to this business issue. Of course, as mentioned earlier, all input files and scripts are available from the www.wrox.com Web site. There are two ways to create a new database. Use the Microsoft SQL Management Studio to connect to a server or database engine of your choice. On the Databases node, right-click and select the pop-up menu option New Database. In the New Database editor, provide the Database name as CaseStudy. Press the OK button to accept the other defaults. The second easy option is to run the following SQL script in a new query editor:

 USE [master] GO CREATE DATABASE [CaseStudy] ON PRIMARY ( NAME = N'CaseStudy', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\CaseStudy.mdf' , SIZE = 3072KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )  LOG ON ( NAME = N'CaseStudy_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\ CaseStudy_log.ldf' , SIZE = 1024KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)  COLLATE Latin1_General_CI_AI GO EXEC dbo.sp_dbcmptlevel @dbname=N'CaseStudy', @new_cmptlevel=90 GO EXEC [CaseStudy].[dbo].[sp_fulltext_database] @action = 'disable' GO 

Customer

The customer table can be created in the Microsoft SQL Server Management Studio. Click on the New Query button in the toolbar to open a New Query window. Run the following SQL statement in the window.

 use casestudy GO CREATE TABLE [dbo].[Customer](   [CustomerID] [int] IDENTITY(1,1) NOT NULL,   [AccountNbr] [char](15) NOT NULL,   [Name] [varchar](50)    NOT NULL,   [Email] [varchar](50)   NULL,   [SendEmailConfirm] [bit] NOT NULL CONSTRAINT [DF_Customer_SendEmailConfirm] DEFAULT ((0)),  CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED (   [CustomerID] ASC ) ON [PRIMARY] ) ON [PRIMARY] 

To fill the table with potential customers, you'll need access to an AdventureWorks database. You'll use the vendor table to create your customers.

 use casestudy GO INSERT INTO CaseStudy..Customer(AccountNbr, [Name]) SELECT AccountNumber, [Name] as CustName FROM adventureworks.purchasing.vendor GO 

You'll also use the AdventureWorks data to create some e-mail addresses. Run the following SQL script to populate the e-mail field in the customer table.

 use casestudy GO UPDATE Customer SET Email = subqry.email,     SendEmailConfirm = case when len(subqry.email) > 0 then 1 else 0 end FROM Customer cust INNER JOIN (      SELECT v.vendorid as CustomerID, email =          case when len(replace(replace(replace(replace([name], '&', ''), ' ', ''),          '.', ''), ',', '')) > 15          and ascii(left(c.firstname, 1)) < 70 then              left(c.firstname, 1) + rtrim(c.lastname) + '@msn.com'          when len(replace(replace(replace(replace([name], '&', ''), ' ', ''), '.',          ''), ',', '')) < 15          and (len(rtrim(c.lastname)) + 1) < 16 then              left(c.firstname, 1) + rtrim(c.lastname) + '@' +              replace(replace(replace(replace([name], '&', ''), ' ', ''), '.', ''),                  ',', '') + '.com'              else NULL end      FROM adventureworks.purchasing.vendor v      INNER JOIN adventureworks.person.contact c      ON vendorid = contactid ) subQry ON cust.CustomerID = subQry.CustomerID 

Invoice

To create the invoice table, run the following SQL statement:

 USE [CaseStudy] GO CREATE TABLE [dbo].[Invoice](   [InvoiceID] [int] IDENTITY(1,1) NOT NULL,   [InvoiceNbr] [varchar](50) NOT NULL,   [CustomerID] [int] NOT NULL,   [TotalBilledAmt] [money] NOT NULL,   [BilledDate] [datetime] NOT NULL,   [PaidFlag] [smallint] NOT NULL CONSTRAINT [DF_Invoice_PaidFlag] DEFAULT ((0)),  CONSTRAINT [PK_Invoice] PRIMARY KEY CLUSTERED (   [InvoiceID] ASC ) ON [PRIMARY] ) ON [PRIMARY] GO ALTER TABLE [dbo].[Invoice]  WITH NOCHECK ADD CONSTRAINT [FK_Invoice_CustomerID] FOREIGN KEY([CustomerID]) REFERENCES [dbo].[Customer] ([CustomerID]) GO ALTER TABLE [dbo].[Invoice] CHECK CONSTRAINT [FK_Invoice_CustomerID] 

You will use the customer table to generate three months' worth of invoice data. In doing so, you are creating invoice numbers with the customer account number embedded in the invoice number. Companies commonly do this because it provides an extra piece of identification as a cross-check in an environment where there is very limited data. Use the following SQL script to create the invoice entries:

 INSERT INTO Invoice(InvoiceNbr, CustomerID, TotalBilledAmt, BilledDate, PaidFlag) SELECT  InvoiceNbr = '151' + Accountnbr,         CustomerID,         TotalBilledAmt = cast(131 * (ascii(left(name, 1)) + ascii(substring(name,                          2, 1))) as money),         BilledDate = '06/01/2005 00:00:00',         PaidFlag = 0 FROM customer UNION SELECT  InvoiceNbr = '181' + Accountnbr,         CustomerID,         TotalBilledAmt = case            when left(Accountnbr, 1) in ('A', 'B', 'C', 'D', 'E', 'F', 'G')            then cast(131 * (ascii(left(name, 1)) + ascii(substring(name, 2, 1)))                 as money)            else                cast(191 * (ascii(left(name, 1)) + ascii(substring(name, 2, 1)))                as money)            end,         BilledDate = '07/01/2005 00:00:00',         PaidFlag = 0 FROM customer UNION SELECT  InvoiceNbr = '212' + Accountnbr,         CustomerID,         TotalBilledAmt = case            when left(Accountnbr, 1) in ('A', 'F', 'G')            then cast(132 * (ascii(left(name, 1)) + ascii(substring(name, 2, 1)))                 as money)            else                 cast(155 * (ascii(left(name, 1)) + ascii(substring(name, 2, 1)))                 as money)            end,         BilledDate = '08/01/2005 00:00:00',         PaidFlag = 0 FROM customer GO UPDATE invoice set totalbilledamt = 18444.00 WHERE invoicenbr = '151INTERNAT0002' and totalbilledamt = 23973 

CustomerLookUp

The customer lookup table will be used to verify some repeated bad data that continues to be sent through the accounting feeds. A bad data string can be stored for each import type as long as the data has been resolved into an existing customer. The structure can be created using the following SQL script:

 USE [CaseStudy] GO CREATE TABLE [dbo].[CustomerLookUp](   [RawDataToMatch] [varchar](50) NOT NULL,   [ImportType] [char](10) NOT NULL,   [CustomerID] [int] NOT NULL,  CONSTRAINT [PK_CustomerLookUp] PRIMARY KEY CLUSTERED (   [RawDataToMatch] ASC,   [ImportType] ASC ) ON [PRIMARY] ) ON [PRIMARY] GO ALTER TABLE [dbo].[CustomerLookUp]  WITH NOCHECK ADD  CONSTRAINT [FK_CustomerLookUp_CustomerID] FOREIGN KEY([CustomerID]) REFERENCES [dbo].[Customer] ([CustomerID]) GO ALTER TABLE [dbo].[CustomerLookUp] CHECK CONSTRAINT [FK_CustomerLookUp_CustomerID] BankBatch 

The BankBatch table will not only store the summary data from the batch file but also store the file itself in the BatchFile field. This table can be created using the following SQL statement.

 USE [CaseStudy] GO CREATE TABLE [dbo].[BankBatch](   [BankBatchID] [int] NOT NULL,   [BankBatchNbr] [nvarchar](50) NULL,   [DepositDate] [datetime] NULL,   [ReceivedDate] [datetime] NULL,   [BalancedDate] [datetime] NULL,   [PostedDate] [datetime] NULL,   [BatchTotal] [money] NULL,   [BatchItems] [int] NULL,   [BatchItemsComplete] [int] NULL,   [FileBytes] [int] NULL,   [FullFilePath] [nchar](100) NULL,   [ImportType] [char](1) NULL,   [ErrMsg] [varchar](100) NULL,   [BatchFile] [ntext] NULL,  CONSTRAINT [PK_BankBatch] PRIMARY KEY CLUSTERED (   [BankBatchID] ASC ) ON [PRIMARY] ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY] 

BankBatchDetail

The detail to the BankBatch table can be created using the following SQL script:

 USE [CaseStudy] GO CREATE TABLE [dbo].[BankBatchDetail](   [BankBatchDtlID] [int] NOT NULL,   [BankBatchID] [int] NOT NULL,   [RawInvoiceNbr] [nvarchar](15) NULL,   [RawAccountNbr] [nvarchar](15) NULL,   [ReferenceData1] [nvarchar](15) NULL,   [ReferenceData2] [nvarchar](15) NULL,   [MatchedInvoiceID] [int] NULL,   [MatchedCustomerID] [int] NULL,   [MatchedDate] [datetime] NULL,  CONSTRAINT [PK_BankBatchDtlID] PRIMARY KEY CLUSTERED (   [BankBatchDtlID] ASC ) ON [PRIMARY] ) ON [PRIMARY] ALTER TABLE [dbo].[BankBatchDetail]  WITH NOCHECK ADD    CONSTRAINT [FK_BankBatchDetail_BankBatchID] FOREIGN KEY([BankBatchID]) REFERENCES [dbo].[BankBatch] ([BankBatchID]) GO ALTER TABLE [dbo].[BankBatchDetail] CHECK CONSTRAINT [FK_BankBatchDetail_BankBatchID] GO ALTER TABLE [dbo].[BankBatchDetail]  WITH CHECK ADD CONSTRAINT [FK_BankBatchDetail_CustomerID] FOREIGN KEY([MatchedCustomerID]) REFERENCES [dbo].[Customer] ([CustomerID]) GO ALTER TABLE [dbo].[BankBatchDetail]  WITH CHECK ADD CONSTRAINT [FK_BankBatchDetail_InvoiceID] FOREIGN KEY([MatchedInvoiceID]) REFERENCES [dbo].[Invoice] ([InvoiceID]) 

Corporate Ledger Data

To simulate a view into your Direct Credits to the Corporate Account, you need to create the GLAccountData structure and your view [vCorpDirectAcctTrans]. Run the following SQL to create the physical table:

 USE [CaseStudy] GO CREATE TABLE [dbo].[GLAccountData](   [TransID] [int] IDENTITY(1,1) NOT NULL,   [PostDate] [datetime] NULL,   [ProcessDate] [datetime] NULL,   [DepositAmount] [money] NULL,   [TransDesc] [varchar](50) NULL,   [GLAccount] [char](10) NULL,  CONSTRAINT [PK_GLAccountData] PRIMARY KEY CLUSTERED (   [TransID] ASC ) ON [PRIMARY] ) ON [PRIMARY] 

Run the following SQL to create the logical view to this data.

 USE [CaseStudy] GO CREATE VIEW dbo.vCorpDirectAcctTrans AS SELECT     TransID, PostDate, ProcessDate, DepositAmount, TransDesc FROM         dbo.GLAccountData 

Run this SQL batch to load the GLAccountData with some sample deposit transactions from the directpay customers:

 INSERT INTO GLACCOUNTDATA(postdate, processdate, depositamount, transdesc, glaccount) SELECT '08/09/05', '08/10/05', 22794.00, 'PAYPAL*MBlack@Marsh.com', 'BANK' UNION SELECT '08/09/05', '08/10/05', 21484.00, 'PAYPAL*JBrown@CapitalCycles.com',  'BANK' UNION SELECT '08/09/05', '08/10/05', 22008.00, 'PAYPAL*DBlanco@msn.com', 'BANK' UNION SELECT '08/09/05', '08/10/05', 22794.00, 'PAYPAL*CBooth@MagicCycle', 'BANK' UNION SELECT '08/09/05', '08/10/05', 22401.00, 'PAYPAL*ABaltazar@msn.com', 'BANK' 

ErrorDetail

Although there are some great new logging options in SSIS, there is still a need to log detailed errors that can occur at the column level when processing. This table will allow you to store that information, and by storing the Execution ID, you can later join the custom-logged error detail with the step-level error information logged during package execution.

 USE [CaseStudy] GO CREATE TABLE [dbo].[ErrorDetail](   [ExecutionID] [nchar](38) NOT NULL,   [ErrorEvent] [nchar](20) NULL,   [ErrorCode] [int] NULL,   [ErrorColumn] [int] NULL,   [ErrorDesc] [nvarchar](1048) NULL,   [ErrorDate] [datetime] NULL,   [RawData] [varchar](2048) NULL ) ON [PRIMARY] 

Usp_BankBatch_Add

This stored procedure will be used to add a new bank batch to the payment processing system. Run the script to add this procedure to CaseStudy:

 Use CaseStudy GO CREATE PROC usp_BankBatch_Add(   @BankBatchID int OUTPUT,   @BankBatchNbr nvarchar(50)=NULL,   @DepositDate datetime=NULL,   @ReceivedDate datetime=NULL,   @BatchTotal money=NULL,   @BatchItems int=NULL,   @FileBytes int=NULL,   @FullFilePath nvarchar(100)=NULL,   @ImportType char(10) ) AS   /*=======================================================    PROC: usp_BankBatch_Add    PURPOSE: To Add BankBatch Header Basic info             and to validate that the batch is new.    OUTPUT: Will return BankBatchID if new or 0 if exists    HISTORY: 08/01/05 Created    =======================================================*/   SET NOCOUNT ON   If @ReceivedDate is null     SET @ReceivedDate = getdate()   IF LEN(@BankBatchNbr) <= 1 OR Exists(Select top 1 *         FROM BankBatch         WHERE BankBatchNbr = @BankBatchNbr         AND ImportType = @ImportType)       BEGIN         SET @BANKBATCHID = 0         RETURN -1       END   ELSE       BEGIN         INSERT INTO BankBatch(BankBatchNbr, DepositDate, ReceivedDate,                               BatchTotal, BatchItems, FileBytes, FullFilePath,                               ImportType)         SELECT UPPER(@BankBatchNbr), @DepositDate, @ReceivedDate,                      @BatchTotal, @BatchItems, @FileBytes, UPPER(@FullFilePath),                      UPPER(@ImportType)         SET @BANKBATCHID = Scope_Identity()       END   SET NOCOUNT OFF GO 

usp_BankBatchDetail_Match

This stored procedure will be used to update a payment with a matching invoice or customer identification number relating back to the dimension tables. Run the script to add this procedure to CaseStudy:

 CREATE PROC dbo.usp_BankBatchDetail_Match(             @BankBatchDtlID int,             @InvoiceID int=NULL,             @CustomerID int=NULL) AS   /*===============================================    PROC: usp_BankBatchDetail_Match    PURPOSE: To update as paid an incoming payment             with matched invoice and customerid    HISTORY: 08/01/05 Created    ===============================================   */   SET NOCOUNT ON   --UPDATE BANKBATCH DETAIL WITH INVOICE AND CUSTOMERID   --NOTE: IF EITHER IS NULL THEN DON'T UPDATE   --MATCHED DATE. THIS WILL PUSH THE ITEM INTO A SUBJECT-   --MATTER-EXPERT'S QUEUE TO IDENTIFY.   UPDATE BankBatchDetail   SET MatchedInvoiceID = @InvoiceID,       MatchedCustomerID = @CustomerID,       MatchedDate = case when @InvoiceID is NULL or @CustomerID is NULL then NULL                     else getdate() end   WHERE BankBatchDtlID = @BankBatchDtlID   SET NOCOUNT OFF 

usp_BankBatch_Balance

This stored procedure is used to examine all payments in a batch and to mark the batch as complete when all payments have been identified with an invoice and a customer:

 GO CREATE PROC usp_BankBatch_Balance AS   /*======================================================    PROC: usp_BankBatch_Balance    PURPOSE: To update batchdetails when they are matched             Then keep BankBatches balanced by matching all             line items    =======================================================   */   UPDATE bankbatchdetail   SET MatchedDate = GetDate()   where (matchedinvoiceid is not null   and matchedcustomerid is not null)   and  (matchedinvoiceid <> 0   and matchedcustomerid <> 0)   UPDATE BANKBATCH   SET BatchItemsComplete = BatchItems - b.NotComplete   FROM BANKBATCH A   INNER JOIN (   select bankbatchid, count(*) as NotComplete   from bankbatchdetail   where   (matchedinvoiceid is null   OR matchedcustomerid is null   OR matcheddate is null)   group by bankbatchid   ) B   on A.BankBatchID = B.BankBatchID   UPDATE BankBatch   SET BalancedDate = getdate()   WHERE BalancedDate IS NULL   and BatchItems = BatchItemsComplete 



Professional SQL Server 2005 Integration Services
Wireless Java : Developing with Java 2, Micro Edition
ISBN: 189311550X
EAN: 2147483647
Year: 2006
Pages: 182

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net