Case Study Load Package


The import integration service will be the first of the two packages you will build for this project. To keep this from becoming a 100-step process, you'll break it up into several sections: Naming Conventions and Tips, Package Setup and File System Tasks, Lockbox Control Flow Processing, Lockbox File Validation, Lockbox Processing, ACH Control Flow Processing, ACH Validation, ACH Processing, Email Control Flow Processing, and Email Data Flow Processing. Each step will be explained in detail the first time, and as things become repetitive, you'll pick up some speed.

Naming Conventions and Tips

There's nothing like opening up a package that fails in production and seeing tasks named Execute SQL Task, Execute SQL Task1, and Execute SQL Task2. What do they do? On the other hand, a package can carry so much annotation that it becomes a nightmare to maintain. The right balance depends on your philosophy and your team, but for now, the following rules seem to make sense for your SSIS development process.

  • Name the package. Name it something other than package.dtsx.

  • Name packages with ETL verb extensions: <PACKAGE NAME>_Extract, <Package Name>_Transform, or <Package Name>_Load. The extension _Process seems to be explicit enough for those packages that don't fall into the other three categories.

  • Provide some brief annotation about what the package does, where it gets inputs and outputs, and what to do if it fails. Can it be rerun again? Is it part of a larger set of packages? Should it be restarted on checkpoints?

  • Add short descriptive words to SSIS tasks and transforms, but don't replace the name altogether. For example, change an Execute SQL task to Execute SQL Task to Retrieve Invoice ID. Use the Description field on the object to provide the detailed information. (You'll see this info in a tooltip when the mouse hovers over the object.) It is important to document, but completely changing the name of an Execute SQL task to Retrieve Invoice ID obscures the "how" that is embedded in the knowledge that the task is an Execute SQL task. You could, of course, learn to recognize each task by its icon, but the name should still give you the whole picture.

Both of these packages will be fairly large, so a few additional tips may be in order before you start:

  • Packages save themselves when you run them, so be aware that executing a package also commits your current changes to disk.

  • Packages don't save themselves as you are working, so save periodically as you work on these large development packages. There is a nice recovery feature that sometimes will save you — don't depend on it.

  • Data Viewers are your friends. They are like grid message boxes. Add them in temporarily to see what is in your transformation stream.

  • Default data types aren't your friend. If your tables don't use Unicode text fields, watch your settings when you are adding columns or source data.

  • If you are at a point where you want to experiment, stop and save the SSIS project directory for the package you are working on. Experiment with the copy until you figure out what is wrong. Go back to the original project folder, make your corrections, and continue.

  • Disable tasks or groups of tasks as you work through large packages to focus only on specific functional areas until they work. (To disable a task, right-click it and select Disable from the pop-up menu.)

Package Setup and File System Tasks

This load package will be set up to look in specific file directories for Lockbox and ACH files. External dependencies like file folders can be a headache during package deployment when you have to remember to have them set up in each environment. Instead of having to set up the locations outside the package, you are going to build in the ability of your package to get these paths from variables and build them as needed. You can then use configuration files to set up the package in each environment without any further intervention. However, you could still have some issues if the directories that you provide are not created, so you need to take this into consideration as you set up the precedence and control of flow in your package. It means adding a few extra steps, but it will allow your package to adjust during initial deployment and any future change to these file locations.

  1. Create a new SSIS project. Name the project CaseStudy_Load. When the project is built, go to the Solution Explorer and click on the Package.dtsx file. In the Property window, find the Name property and change the name from package.dtsx to casestudy_load.dtsx. Answer Yes to change the name of the package object as well.

  2. Because the File System tasks only allow the source and destination properties to be set to variables (not expressions derived from variable values), you are going to have to create a few variables. Use the Variables option on the SSIS menu to access the Variables editor and add the elements shown in the following table. (Note: Some variable values are intentionally left blank.)

    Variable Name | Scope | Data Type | Value
    ACHBASEFILEPATH | CaseStudy_Load | String | c:\casestudy\ach
    ACHCURRENTFILE | CaseStudy_Load | String | c:\casestudy\ach\sampleach.xml
    ACHERRORFILE | CaseStudy_Load | String | (blank)
    ACHERRORFILEPATH | CaseStudy_Load | String | (blank)
    ACHIMPORTTYPE | CaseStudy_Load | String | ACH
    ACHPROCESSEDFILE | CaseStudy_Load | String | (blank)
    ACHPROCESSEDFILEPATH | CaseStudy_Load | String | (blank)
    BANKBATCHID | CaseStudy_Load | Int32 | -999
    BANKBATCHNBR | CaseStudy_Load | String | (blank)
    BATCHITEMS | CaseStudy_Load | UInt64 | 0
    BATCHTOTAL | CaseStudy_Load | Double | 0
    DEPOSITDATE | CaseStudy_Load | DateTime | 12/30/1899
    FILEBYTES | CaseStudy_Load | Int64 | 0
    LBBASEFILEPATH | CaseStudy_Load | String | c:\casestudy\lockbox
    LBCURRENTFILE | CaseStudy_Load | String | c:\casestudy\lockbox\samplelockbox.txt
    LBERRORFILE | CaseStudy_Load | String | (blank)
    LBERRORFILEPATH | CaseStudy_Load | String | (blank)
    LBIMPORTTYPE | CaseStudy_Load | String | LOCKBOX
    LBPROCESSEDFILE | CaseStudy_Load | String | (blank)
    LBPROCESSEDFILEPATH | CaseStudy_Load | String | (blank)
    EMAILIMPORTTYPE | CaseStudy_Load | String | EMAIL
    EMAILCNT | CaseStudy_Load | Int32 | 0

  3. Add an OLE DB Connection to the Connection Managers to connect to the CaseStudy database. Name the connection CaseStudy.OLEDB.

  4. Add an ADO.NET Connection to the Connection Managers to connect to the CaseStudy database. Name the connection CaseStudy.ADO.NET.

  5. Add an SMTP Connection to the Connection Managers that connects to a viable SMTP mail server. Name the connection Mail Server.

  6. The variables with the string FILEPATH in the names, like @LBPROCESSEDFILEPATH, need to retrieve their values relative to the base file paths. The variable @LBPROCESSEDFILEPATH should be set up relative to the base Lockbox file path in a subdirectory called processed\. To do this, you'll use an expression for the value of the variable. Click on the variable in the Variables Editor. In the Property window, set the property EvaluateAsExpression to True. In the Expression property, add the expression to match Figure 19-7. The \\ is required as an escape sequence for the backslash in the expressions editor. Set the other three variables up to be evaluated as expressions the same way. Notice in the Variables Editor, and as shown in Figure 19-8, that the values change immediately.

    For Variable Name | Set Expression To
    LBERRORFILEPATH | @LBBASEFILEPATH + "\\error"
    ACHERRORFILEPATH | @ACHBASEFILEPATH + "\\error"
    ACHPROCESSEDFILEPATH | @ACHBASEFILEPATH + "\\processed"
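
    Following the same pattern, the expression for @LBPROCESSEDFILEPATH (the one shown in Figure 19-7) is the base Lockbox path plus the processed subdirectory:

     @LBBASEFILEPATH + "\\processed" 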

  7. The variables ending in the string FILE, like @LBERRORFILE, @LBPROCESSEDFILE, @ACHERRORFILE, and @ACHPROCESSEDFILE, need to hold a unique value that can be used to rename the file into its respective destination file path. Set these variables up to be evaluated using expressions similar to the following formula, substituting the matching FILEPATH variable for @LBERRORFILEPATH for each of the four variables. This formula will generate a name similar to 200508160552080160000000.txt. For the ACH files, use the same formula but end the string with an ".xml" extension.

     @LBERRORFILEPATH + "\\" + REPLACE(REPLACE(REPLACE(REPLACE((DT_WSTR, 50)GETUTCDATE(),"-","")," ", ""),".", ""),":", "") + (DT_WSTR, 50)@FILEBYTES + ".txt" 

    Note

    If you are manually typing the expression, note that it contains no hard returns; enter it as a single line.
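
    For example, following the formula above, the corresponding expression for the @ACHERRORFILE variable swaps in the ACH error path variable and the .xml extension:

     @ACHERRORFILEPATH + "\\" + REPLACE(REPLACE(REPLACE(REPLACE((DT_WSTR, 50)GETUTCDATE(),"-","")," ", ""),".", ""),":", "") + (DT_WSTR, 50)@FILEBYTES + ".xml" 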

  8. Add four File System tasks to the Control Flow design surface. Change the name and description properties to use the settings in the following table.

    Name | Description
    File System Task Folder LB Processed Folder | Ensures that the LB Processed Folder exists
    File System Task Folder LB Error Folder | Ensures that the LB Error Folder exists
    File System Task Folder ACH Processed Folder | Ensures that the ACH Processed Folder exists
    File System Task Folder ACH Error Folder | Ensures that the ACH Error Folder exists

  9. You want each of these four tasks to ensure that the directories needed for your load package exist. A File System task can perform the operation of creating a directory. One nifty thing that it will do when it creates a directory is create all the subdirectories in the hierarchy down to the last subdirectory. That is why you don't need two additional File System tasks to create the paths for the variables @LBBASEFILEPATH and @ACHBASEFILEPATH. You get these for free when you check for their subdirectories. For each of the four File System tasks, set the properties shown in the following table.

    Property | Setting
    Operation | Create Directory
    UseDirectoryIfExists | True
    IsSourcePathVariable | True
    SourceVariable | Choose the corresponding variable for each task. (Notice how easy this is when the task is named properly?)

  10. Connect the two lockbox File System tasks together by setting a precedence constraint between the File System Task Folder LB Processed Folder task and the File System Task Folder LB Error Folder task. Connect the two ACH File System tasks together the same way. Both precedence constraints should automatically be set to Success.

  11. Run the package to validate that this section has been set up correctly. You should see a file hierarchy created on your machine resembling Figure 19-9.

Figure 19-7

Figure 19-8

Figure 19-9

Executing these four File System tasks at the beginning of your load package helps ensure that the paths will exist for the tasks and transforms downstream. Almost all tasks and transforms that depend on paths or files will generate package failures if they are executed and the paths are invalid. Right now would be a good time to close and save this package and make a copy in another directory for insurance purposes. The CaseStudy_Load package at this point should look like Figure 19-10.

Figure 19-10

Lockbox Control Flow Processing

If all you had to do were load the Lockbox file, your task would be simple. You could build a Data Flow task to just split the file into header and detail rows and insert them. However, you need to validate that the file has not been previously loaded before you insert the payment information. If the batch has already been loaded, you need to divert the file to an error folder. The problem is that you don't know whether the file has already been processed until you open it up and process it. The other problem is that once you are inside a transformation, you lose the ability to alter the flow of control. In other words, you can't shift gears once you are inside the transformation and use precedence-type logic.

The strategy in this section is to retrieve a file name from the lockbox folder, to parse the header information from the file, and then to store the information in variables. If the batch number hasn't been processed, a [BankBatch] row will be inserted. Finally, you will reparse the Lockbox file for the detail records and insert them using the [BankBatchID] retrieved from the inserted [BankBatch] row. If everything works, you'll e-mail the bank and confirm the receipt of the file. If anything goes wrong, the file will get moved into the error folder.

Note

Make sure you have created and copied the samplelockbox.txt file to the base lockbox directory before starting this section.

  1. Drop a Sequence Container on the Control Flow design surface. Change the Name property to Sequence of Lockbox Processing. Connect the precedence from the last Lockbox File System task to the Sequence Container so that the Sequence Container is not executed unless the File System task completes successfully.

  2. Add a Foreach Loop container inside the Sequence Container. Change the Name property to For Each Lockbox File. The Foreach Loop is expecting a literal path to poll. You want the loop to rely on a variable, so you'll have to use an expression. This task object is a little confusing because there are actually two sets of expression collections: One set in the left tab is for the container; the second set appears only when the Collections tab is selected. The second set of expressions is the collection of properties for the Foreach Enumerator. It is this second set of expressions that you want to alter. Click on the ellipsis to the right of this Expressions collection.

  3. In the Expressions Editor, you won't find a property named Folder; the property you want is named Directory. Select the Directory property and set its value to the variable @LBBASEFILEPATH. Evaluate the expression to ensure that it matches the base lockbox path.

    Note

    @LBBASEFILEPATH and @[User::LBBASEFILEPATH] refer to the same variable, since LBBASEFILEPATH exists only in the User namespace.

  4. Set the Files property to *.txt. Leave the Retrieve File Name option set to Fully Qualified.

  5. To store the name of the file you are processing into a variable, click the Variable Mappings tab on the left side of the Foreach Loop container. Select the variable named LBCURRENTFILE to retrieve the value of the Foreach loop for each file found. Leave the index on the variable mapping set to zero (0). This represents the first position in a files collection or the file name returned by the loop. Press OK to complete this task.

  6. One of the things you have to save into the BankBatch data table is the file name and the number of bytes in the file. Now that you have the file name stored in the variable @LBCURRENTFILE, you can retrieve the file size using a Script task and some VB.NET code. Add a Script task to the Foreach Loop. Change the name to Script LB File Size into Variable.

  7. For the Script task to be able to read and write to variables, you have to pass them into the container. Provide the variable LBCURRENTFILE for the ReadOnlyVariables property. Provide the variable FILEBYTES as the ReadWriteVariables property. Note that when passing variables into the Script task, the @ sign should not be used.

  8. Click the Design Script button. This opens up a .NET development environment. Add an imports reference to the System.IO namespace and update the script to pull the file bytes from the file name provided in the DTS Variables collection:

     Imports System Imports System.Data Imports System.Math Imports Microsoft.SqlServer.Dts.Runtime Imports System.IO  '<--Added Input/Output library Public Class ScriptMain     Public Sub Main()         '**         'SCRIPT         'PURPOSE: To take file bytes and save to global variable         '==================================================================         Dim oFile As FileInfo         Dim lDefault As Int64         lDefault = 0         Try             oFile = New FileInfo(Dts.Variables("LBCURRENTFILE").Value.ToString)             Dts.Variables("FILEBYTES").Value = oFile.Length         Catch ex As Exception             Dts.Variables("FILEBYTES").Value = lDefault         End Try         Dts.TaskResult = Dts.Results.Success     End Sub End Class 

At this point, you would know the file name and file size. The Foreach Loop stored the file name into a variable. The Script task retrieved the file size and stored the data into the FILEBYTES variable. You still need to figure out whether you have seen this file before. A unique batch number by import type is embedded in the header of the file. There are a few ways to retrieve that information. One way is to use a Data Flow task to examine the file. You could also use a Script task to open up and examine the file header row, but the Data Flow task will provide the added advantage of failure upon encountering a bad format. You can then alter your Control Flow to push this file to the error folder.

  1. Add a Data Flow task. Connect the successful completion of the Script task to this task. Change the Name property to Data Flow Lockbox Validate File and Header Info. This task will parse out the batch header information into variables and then perform a lookup for a similar batch. An existing BankBatchID will be returned in the BankBatchID variable. You'll come back and configure the Data Flow in the next section, Lockbox File Validation. Disable the task for now.

  2. Add an Execute SQL task. This task will use a stored procedure, usp_BankBatch_Add, to add the parsed information in the Lockbox file as a row in the BankBatch table to represent a new batch file. You don't want this task or any other tasks connected to it to be executed unless the Data Flow task returned a no-match condition or a BankBatchID of zero (0). To create a conditional precedence with an expression, connect the successful completion of the Data Flow task to this task. Then right-click on the precedence connector and select Edit. Set up the Precedence Constraint editor to look like Figure 19-11.
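
    The precedence constraint settings in Figure 19-11 amount to requiring both a successful outcome and a zero batch ID. As a sketch (verify the exact dialog labels against the figure), the Precedence Constraint editor would be set roughly as follows:

     Evaluation operation:  Expression and Constraint
     Value:                 Success
     Expression:            @BANKBATCHID == 0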

  3. If the Data Flow task finds an existing BankBatchID, you also need to move the file into an error folder. Add a File System task and connect with similar precedence and constraint conditions — except change the expression to apply when the BankBatchID does not equal zero (0) OR if the Data Flow task fails. Set the Evaluation Operation to Expression OR Constraint. Set the Value to Failure and the Expression to @BANKBATCHID != 0. Select the Multiple constraint option of Logical OR.

  4. It seems that the operation you want to perform with the File System task is Move File, but because the @LBERRORFILE variable you created earlier already evaluates to a full file path with a unique file name, the better option is to rename the file. Update the default value of the variable @LBCURRENTFILE to c:\casestudy\lockbox\samplelockbox.txt and then set the File System task properties to the values shown in the following table.

    Property | Value
    IsDestinationPathVariable | True
    Destination Variable | User::LBERRORFILE
    OverwriteDestination | True
    Name | File System Task Error Folder
    Description | Moves bad files to an error folder
    Operation | Rename File
    IsSourcePathVariable | True
    SourceVariable | User::LBCURRENTFILE

    Note

    The File System task here will complain that the value for User::LBCURRENTFILE is empty if it doesn't have a default value.

  5. Finish up the Execute SQL task by setting the properties to match the table below. Then set up the Parameter Mapping page to look like Figure 19-12.

    Property | Value
    Name | Execute SQL task to add Bank Batch Hdr
    ConnectionType | ADO.NET
    Connection | CaseStudy.ADO.NET
    SQLStatement | EXEC usp_BankBatch_Add @BankBatchID OUTPUT, @BankBatchNbr, @DepositDate, @ReceivedDate, @BatchTotal, @BatchItems, @FileBytes, @FullFilePath, @ImportType
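
    As a rough guide to the Parameter Mapping page in Figure 19-12, each stored procedure parameter maps to one of the variables created earlier. The mapping shown for @ReceivedDate is an assumption here (a system date variable is one reasonable choice), so verify it against the figure:

     @BankBatchID (Output) -> User::BANKBATCHID
     @BankBatchNbr         -> User::BANKBATCHNBR
     @DepositDate          -> User::DEPOSITDATE
     @ReceivedDate         -> System::ContainerStartTime   (assumed)
     @BatchTotal           -> User::BATCHTOTAL
     @BatchItems           -> User::BATCHITEMS
     @FileBytes            -> User::FILEBYTES
     @FullFilePath         -> User::LBCURRENTFILE
     @ImportType           -> User::LBIMPORTTYPE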

  6. Add a second Data Flow task to the Foreach Loop. Connect the successful completion of the Execute SQL task to this task, and add an expression to that constraint that checks for a nonzero BankBatchID. The stored procedure will return, via an output parameter, either a zero (0), which indicates that the batch already exists, or a new BankBatchID. If the Execute SQL task fails, or if the BankBatchID comes back as zero, the Lockbox file should be moved to the error folder and you should move on to the next file. To handle that, connect the Execute SQL task to the error-folder File System task with a Failure constraint and an expression that looks for a zero BankBatchID value, and set the multiple-constraint condition to Logical OR so that if either occurs, the file moves to the error folder.

  7. Change the name property to Data Flow Lockbox Detail Data Load. You'll come back and configure the Data Flow in the section "Lockbox Processing." Disable the task for now.

  8. Add a Send Mail task to the Foreach Loop. Connect the successful completion of the Data Flow Lockbox Detail Data Load to this task. Set the SMTPConnection property of the task to the SMTP Mail Server connection. Fill in the To, From, and Subject properties. Create an expression for the MessageSource property similar to the following code:

     "COMPANY ABC,  123 Main St, Somewhere, FL 99999 received and successfully processed lockbox file -" +  @[User::BANKBATCHNBR] + "- for the amount of $" + (DT_WSTR, 15) @[User::BATCHTOTAL] + " on " + (DT_WSTR, 50) GETUTCDATE()  +".  If this is not correct please contact Minnie in accounts payable at 555-1111 ext 211 or minnie@companyabc.com " + "This is an automated response." 

  9. If the file is processed successfully, you need to move it to the "processed" folder. Add another File System task and connect it to the successful completion of the second Data Flow task. Set up this task just like the Error Folder File System task but point everything to the processed folder.

    Property | Value
    IsDestinationPathVariable | True
    Destination Variable | User::LBPROCESSEDFILE
    OverwriteDestination | True
    Name | File System Task Processed Folder
    Description | Moves completed files to the processed folder
    Operation | Rename File
    IsSourcePathVariable | True
    SourceVariable | User::LBCURRENTFILE

Figure 19-11

Figure 19-12

You now have the basic structure set up for the Lockbox control flow. You still need to go back and build your transforms. You'll get to that in the next steps. Right now would be a good time to close and save this package and make a copy in another directory for insurance purposes. The CaseStudy_Load package at this point should look like Figure 19-13.

Figure 19-13

Lockbox File Validation

In this section of the package, you are going to fill in the details of the Data Flow container. The strategy will be to open up the Lockbox file and retrieve information from the header to pass back to the control flow via variables. You'll use a Flat File connection to read the file, a conditional split transform to separate out the header and the check lines, derived columns to parse out the header line, and an aggregate count transform to count the check transactions. You'll use Script Component transforms to pull this information from the transformation stream and store it in your variables.

  1. Start by setting up a New Flat File Connection in the Connection Managers tab. Configure the properties in the following table on the connection.

    Property | Setting
    Name | Lockbox Flat File
    Description | Flat File for Lockbox processing
    File Name | c:\casestudy\lockbox\samplelockbox.txt
    Format | Ragged right
    Advanced:OutputColumnWidth | 80
    Advanced:DataType | string [DT_STR]
    Advanced:Name | line (case is important)

  2. The only problem with the previous step is that you had to set a file name to a literal to set up the connection. At runtime you want to retrieve the file name from your variable @LBCURRENTFILE. Save the Flat File Connection, and then access the expressions collection in the Properties tab. Use the expression editor to set the ConnectionString property to @LBCURRENTFILE.
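
    Written as an expression, the ConnectionString property contains nothing more than the variable itself:

     @[User::LBCURRENTFILE] 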

  3. To use the Flat File connection in your Data Flow, add a Flat File Source to the Data Flow design surface. Select the Flat File connection created in the previous step named Lockbox Flat File. Name this transform source Flat File Lockbox.

  4. One of the main purposes of this Data Flow is to perform an extraction of the header information and then perform a lookup on the batch number. You will use the Lookup transformation for this task, and one of the "gotchas" to using this task is that it is case-sensitive. Since at this point your source contains all the data in a single column, it makes sense to go ahead and run the data through a transform that can convert the data to uppercase in one step. Add a Character Map transform to the Data Flow. It should be connected to the output of the Flat File Source and be configured to perform an in-line change to uppercase with both input column and output alias as "line."

  5. The Lockbox file contains three types of data formats: header, invoice, and check. At this stage, you are trying to determine whether this batch has been previously processed, so you only need the information from the header and a count of the check transactions. To split the one-column flat file input, add a Conditional Split transform to the Data Flow. Set up the transform to use the leftmost character of the input stream to split the line into two outputs: Header and Check. The transform should look like Figure 19-14.
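
    Figure 19-14 shows the finished transform. As a sketch, assuming the record-type marker is the leftmost character of each line (the file uses "H" for header rows and "C" for check rows, as described in the Lockbox Processing section later in the chapter), the two output conditions would look something like this:

     Header:  SUBSTRING(line,1,1) == "H"
     Check:   SUBSTRING(line,1,1) == "C"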

  6. Add a Derived Column task to the Data Flow. Name it Derived Columns from Header. Connect it to the Header output of the Conditional Split. This task will allow you to easily split up the line into the pieces of data that it represents. With the Derived Column task, you also get the conversion utilities as an added benefit. With the import stream being a string type, this is where you have to think ahead on your conversions. To add a row to the BankBatch table, you are going to need to extract a Batch Number from this input stream. Notice that the BatchNbr field in the BankBatch table is a Unicode variable text field. If you parse the text string into the data type of [DT_WSTR], you will match the destination field. Paying attention to data types early will save you many headaches further into the package. Set up the derived columns to match Figure 19-15.

  7. Wait a minute! These explicit castings of string data could be disastrous. What if the bank provides some bad data in the Batch Total field? Good question. If you just left the default error handling in place, the package would fail. You don't want that to happen; you just want to reject the file. To do that, you need the Control Flow to recognize that something is wrong and divert the file to the error folder. Notice that we said Control Flow — not Data Flow. The precedence and constraint you set up between this Data Flow task and the Execute SQL task to add a Bank Batch header is set up to reject the file if it returns any nonzero bank batch ID. For your transform to set the variable, you'll need to divert the whole row. In fact, it would be a good idea to divert the output stream of the header row if you have any problems converting this data into proper types. Click on the Configure Error Output button and set all actions for error or truncations to divert and redirect the output to another output stream. See Figure 19-16 for the completed error output.

  8. There are many different options for handling errors in SSIS. In this Data Flow, if there is an error parsing the lockbox header, it is probably an invalid format-type error, so you want to be able to capture information about that error. To do this, add a Script Component task to make use of the error stream you created from the Derived Column task. Set it up as a transformation. The error stream currently contains your raw data, an error code, and a column number for each error generated. You can use the Script Component to add the error description to your stream, and then in another transform task you can log the information into your [ErrorDetail] table. Connect the error output of the Derived Column task to the Script Component task to capture the original input stream. Name this task Script Component Get Error Desc. Open the Script Transformation Editor and select all the input columns. Then, in the Inputs and Outputs tab, expand the Output0 collection and add an output column (not an output) named ErrorDesc. Set the type to [DT_WSTR] with a length of 1048. Open up the design environment for the Script Component. Change your ProcessInputRow event code to the following:

     Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
         'SCRIPT
         'PURPOSE: To retrieve the error description to write to error log
         '=====================================================================
         Row.ErrorDesc = ComponentMetaData.GetErrorDescription(Row.ErrorCode)
     End Sub

  9. Add a Derived Column transform and name it Derived Column System Variables. Along with the detailed error message, it will be helpful to add other information, like the ExecutionInstanceGUID, to log to your custom ErrorDetail table. The ExecutionInstanceGUID is a unique ID given to each run of an SSIS package that will allow you to combine your custom error logging with other package error logging to give you a complete picture of what occurred when a package failed. Create the Derived Columns shown in Figure 19-17.

  10. Add an OLE DB Destination to save this data into the ErrorDetail table. Name the transform OLE DB Destination Error Log. Set up the OLE DB connection and set the name of the table to ErrorDetail. Map the columns. Most input columns should match the destination columns in the table. Map the [line] column to the [RawData] column. Now you've handled the worst-case scenario of bad batch header data. Not only do you store the error of a bad conversion or batch header, but the flow of data will now stop at this transform. This leaves the BankBatchID variable at its default of -999, which will cause the Control Flow to divert the file to the error folder, which is just what you want.

  11. The other Data Flow path, back at the Derived Columns from Header transform, should contain data that was successfully converted to proper data types. You now have to determine whether the BatchNbr has been sent and received earlier by checking to see if it matches any existing BatchNbr in the BankBatch table by import type. Add a Lookup transformation and connect it to the default output of the Derived Columns from Header transform. Change the name to Lookup BankBatchID. Connect the CaseStudy.OLEDB connection. In the reference tab, instead of using the BankBatch table, specifically select the BankBatchID, BankBatchNbr, and ImportType columns using a SQL query. One of the many reasons to use a query in the Lookup transformation is that the lookup comparison is case-sensitive. You made sure that the batch number parsed from the input file is in uppercase. To ensure that you get matches to the stored data, use an UPPER function in the SQL statement.

     SELECT BANKBATCHID, UPPER(BANKBATCHNBR) AS BANKBATCHNBR, UPPER(IMPORTTYPE) AS IMPORTTYPE FROM BankBatch ORDER BY BANKBATCHNBR 

    In the Columns tab, connect the BatchNbr input column to the BankBatchNbr column. Connect the ImportType input column to the ImportType column. This is the equivalent of running a query against the BankBatch table looking for a row with a matching ImportType and BatchNbr for each row in the input stream. In the grid, add a lookup column, BankBatchID, by selecting that field in the lookup table. The result of the lookup will be either a NULL value or a retrieved BankBatchID. Since you expect the lookup to find no match when a new batch arrives, use the Configure Error Output button to set the error output for the lookup to Ignore Failure.

  12. At this point in the flow, you've accomplished the mission of determining whether the Lockbox file is formatted properly, and you have retrieved header-level information from the file that you need to get back to the Control Flow. Add a new Script Component task to the Data Flow and connect to the successful output stream of the Lookup task as a destination. Name this task Script Component to Store Variables. In this task, select the columns BankBatchID, BatchNbr, BatchTotal, ImportType, and DepositDate from the input stream. They will be added automatically as input columns and will be available in a row object. Add the matching variables to the ReadWriteVariable property: BANKBATCHID, BANKBATCHNBR, DEPOSITDATE, BATCHTOTAL. Remember, variables are case-sensitive and must be passed as a comma-delimited list.

  13. In the script component, this time you want to access the row object to retrieve the values that are in the input row stream. Since you are processing the header row, you'll have only one row to process. Accessing the row values is not a problem. However, saving the value to a package variable is not allowed when processing at a row level. You can only access package variables in the PostExecute event stub. To retrieve the values, use variables to capture the values in the row-level event, and then update the package variables in the PostExecute event. If you have a need to retrieve information from your Data Flow into variables as in this example, this technique will be really useful to you. To continue with this example, replace the Script Component script with the following code:

     'CASE STUDY 2005
     Imports System
     Imports System.Data
     Imports System.Math
     Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
     Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

     Public Class ScriptMain
         Inherits UserComponent
         Public LocalBankBatchID As Int32 = -999
         Public LocalBatchTotal As Double = 0
         Public LocalBankBatchNbr As String = ""
         Public LocalDepositDate As Date = #12/31/1899#

         Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
             'SCRIPT
             'PURPOSE: This sub will fire for each row processed.
             '         Since we only have one header row, it will
             '         fire only one time. Store values in local variables.
             '============================================================
             Try
                 If Row.DepositDate_IsNull = True Or _
                       Row.BatchTotal_IsNull = True Or _
                         Row.BatchTotal = 0 Then
                     'KEEP DEFAULTS
                 Else
                     LocalBankBatchNbr = Row.BatchNbr + ""
                     LocalBatchTotal = Row.BatchTotal
                     If Row.BANKBATCHID_IsNull Then
                         'No match found by the lookup: treat as a new batch
                         LocalBankBatchID = 0
                     Else
                         LocalBankBatchID = Row.BANKBATCHID
                     End If
                     LocalDepositDate = Row.DepositDate
                 End If
             Catch ex As Exception
                 'KEEP DEFAULTS
             End Try
         End Sub

         Public Overrides Sub PostExecute()
             'SCRIPT
             'PURPOSE: To set SSIS variables with values retrieved earlier
             '============================================================
             Try
                 'Attempt to accept the values
                 Variables.BANKBATCHNBR = LocalBankBatchNbr
                 Variables.BANKBATCHID = LocalBankBatchID
                 Variables.DEPOSITDATE = LocalDepositDate
                 Variables.BATCHTOTAL = LocalBatchTotal / CDbl(100)
             Catch ex As Exception
                 'If any failure occurs fail the file
                 Variables.BANKBATCHID = -999
                 Variables.BATCHTOTAL = 0
             End Try
             MyBase.PostExecute()
         End Sub
     End Class

  14. The last variable that you need to retrieve is the number of transactions in the lockbox batch. The Lockbox file has detail lines separated into invoice and check types. You are really only interested in the check lines. Add an Aggregate transformation and connect it to the Check output of the Conditional Split transform. Name the transform Aggregate Check Count. Select the line column from the input columns. Set the Output Alias to BatchItems. Set the operation to Count. This will count the number of checks and put that count into your Data Flow. Now you just need to save the count of the checks into a variable.

  15. Add a Script Component task to the Data Flow and attach it to the Aggregate output as a Destination. Change the name to Script Component to Capture BatchItems. Select the BatchItems column from the input stream. Add the variables BATCHITEMS and BANKBATCHID to the ReadWriteVariables property. A common issue at this point is that the Count operation of the Aggregate transform outputs an unsigned 64-bit integer (DT_UI8). That's why your BATCHITEMS variable was preset to UInt64. Add the following script to the task in the ProcessInputRow stub:

     ' CASE STUDY 2005
     Imports System
     Imports System.Data
     Imports System.Math
     Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
     Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

     Public Class ScriptMain
         Inherits UserComponent
         Public LocalBatchItems As UInt64

         Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
             '**
             'SCRIPT
             'PURPOSE: Attempt to save the value of batch items
             '         from aggregation to local variables b/c we
             '         can't set DTS Variables in this sub
             Try
                 If Row.BatchItems_IsNull = True Then
                     LocalBatchItems = 0
                 Else
                     LocalBatchItems = Row.BatchItems
                 End If
             Catch ex As Exception
                 LocalBatchItems = 0
             End Try
         End Sub

         Public Overrides Sub PostExecute()
             '**
             'SCRIPT
             'PURPOSE: Sets the value of DTS variables to
             '         local values
             Variables.BATCHITEMS = LocalBatchItems
             MyBase.PostExecute()
         End Sub
     End Class

Figure 19-14

Figure 19-15

Figure 19-16

Figure 19-17

This will complete the Lockbox Validation Data Flow — which should at this point look like Figure 19-18. Enable the Data Flow Lockbox Validate File and Header Info in the Control Flow and disable the Execute SQL task to run a test of the package. Remember that the only purpose of this Data Flow is to determine whether the file is properly formatted and whether it is indeed a new file to process. All it will do is open the file and parse the information from the file. Play around with the header of the file and put in invalid data such as '02/31/05'. You should see the transformation move through the error section of the Data Flow and store a row in the ErrorDetail table. The text file will then be moved into the error folder in the c:\casestudy\lockbox\ directory.

Figure 19-18

This Data Flow is by no means complete and ready for production. The batch lines making up the detail should also be validated for proper data types using the same techniques shown in this step. Essentially, the default BANKBATCHID value set before this set of transformations runs represents failure. If the transformation flows all the way to the Script Component and stores the batch header information, the file will be considered a success. If not, the file will be considered suspect and will be rejected. This should give you a good idea of what you can do without having to stage the data before processing it.

Lockbox Processing

Once you've validated your file, loading the detail data into the BankBatchDetail table will be rather simple. You have all the header-related information. The Execute SQL task will create a row in the BankBatch table to store the batch, and you'll store the primary key in the BANKBATCHID variable. You now just need to re-examine the text file and process the detail transactions. Your strategy will be to split the detail lines into two streams: one containing the individual payment invoice information and one containing the check lines from the batch file. After validating the contents, you will recombine the matching invoice and check rows into one. That will give you the ability to do a straight insert of one row per payment into the BankBatchDetail table.

Enable the Data Flow Lockbox Detail Data Load task in the Control Flow. Double-click it to expose the Data Flow design surface.

  1. You already have a file connection in the Connection Managers for the Lockbox Flat File. Add a Flat File Source onto the design surface and set it up to use the Lockbox Flat File connection. Name it Flat File Lockbox Source.

  2. Because Lookup transformations are case-sensitive, it is better to add a Character Map transform to convert the stream to uppercase while all the data is still in one column. Give the Character Map transform a descriptive name, and set the operation to Uppercase and the destination to In-place change.

  3. Add a Conditional Split transform similar to what you did earlier when validating the batch file. This time you'll split the file into each of its parts: the header, check, and invoice lines. Set up the transform to use the leftmost character of the input stream to split the line into three outputs: Header, Check, and Invoice, based on the characters "H," "C," and "I," respectively.

  4. Add two Derived Column transforms to the Data Flow. Attach one to the Checks output of the Conditional Split transform. Name it Derived Column Check. Attach the other to the Invoice output of the Conditional Split transform. Name it Derived Column Invoice. Don't worry about the header portion for the moment. Using the business specs, create the following new columns to parse out of each line type.

    Transform | Derived Column | Expression | Data Type
    Invoice | RawInvoiceNbr | TRIM(SUBSTRING(line,13,LEN(line)-13)) | [DT_STR] 50
    Invoice | MatchingKey | TRIM(SUBSTRING(line,2,4)) | [DT_STR] 50
    Invoice | ReferenceData1 | TRIM(SUBSTRING(line,2,10)) | [DT_STR] 50
    Check | PaymentAmount | ((DT_NUMERIC,11,2)SUBSTRING(line,15,8) / (DT_NUMERIC,11,2)100.00) | [DT_CY]
    Check | MatchingKey | TRIM(SUBSTRING(line,2,4)) | [DT_STR] 50
    Check | ReferenceData2 | TRIM(SUBSTRING(line,2,10)) | [DT_STR] 50

    Notice the use of the [DT_STR] data type, a null-terminated non-Unicode string, rather than the [DT_WSTR] Unicode data type. You are doing this because ultimately you are going to insert this data into the BankBatchDetail table, where the data types are VARCHAR. This is worth mentioning here because the default data types and lengths are inferred from the field you are using to design the package, and it can be annoying that your settings are overwritten if you change anything in the expression. However, if you don't get the data type right here, you'll need to add a Data Conversion transform to convert the data into a compatible format, or the SSIS validation engine will complain and the package may not validate. The other thing to notice here is the use of TRIM functions. In flat file data, the columns are tightly defined, but that doesn't mean the data within them lines up exactly to those columns. Use TRIM to remove leading and trailing spaces that would otherwise affect your lookup processes downstream.

  5. Now at this point you've got two output streams: one from the invoice lines and one from the check lines. You want to put them back together into one row. Any transformation you use is going to require that you sort these outputs and then find something in common to join them together. To put them together, you have to have some data that matches between the outputs. Luckily, the bank provides the matching information, and you parsed it out in the Derived Column tasks as the MatchingKey column, which both outputs share. Look at a two-line sample from the Lockbox file. Columns two through five (2-5) contain the string 4001, which is defined in your business specs as the lookup key that ties the transaction together. (The entire sequence 4001010003 refers to an image system lookup ID.)

     I4001010003 181INTERNAT
     C4001010004   01844400

    Add two new Sort transformations to the Data Flow and connect one to each Derived Column output. Select the field MatchingKey in both sorts and sort ascending. Select all columns for pass-through, except for the Line column. You will no longer use the line data, so there is no need to continue to drag this data through your transformation process. Now you are ready to merge the outputs.

  6. Add a Merge Join transformation and connect the two sorted outputs to it. In the editor, select the RawInvoiceNbr and ReferenceData1 columns from the Invoice sort stream. Select the PaymentAmount and ReferenceData2 columns from the Check sort stream. There is no need to bring the MatchingKey data forward, since that information is embedded in the ReferenceData fields. Make sure the join type is set to Inner Join.

  7. This stream is now one row per check and invoice combination. You are almost ready to insert the data into the BankBatchDetail table. All you need now is the Foreign Key for the BankBatch table. Earlier you stored this information in a global variable. To merge this into your stream, add a Derived Column task and add the variable BANKBATCHID to the stream. (You could have done this earlier in either the check or invoice processing steps as well.) You automatically get all the other fields in the Data Flow as pass-through.

  8. Add an OLE DB destination and connect to the CaseStudy.OLEDB connection. Connect the transform and select the table BankBatchDetail. Map the columns from the output to the BankBatchDetail table where the column names match.

  9. You still have one task left to do before closing out this Data Flow, and that is saving a snapshot of the file contents into the BankBatch row. Everything else you are doing in this Data Flow is saving data at the payment or detail level. Saving the entire file contents for audit purposes is a batch-level task. To do this, you'll need to create a separate stream that will use the Header portion of the Conditional stream you split off early in the Data Flow. Start by adding a Derived Column task connecting to the header-based output. Add the following columns to the stream.

    Derived Column | Expression | Data Type
    LBCurrentFile | @[User::LBCURRENTFILE] | [DT_WSTR] 100
    BankBatchID | @[User::BANKBATCHID] | [DT_I4]

  10. Add an Import Column transformation and connect it to this Header Derived Column transform. On the Input Columns tab, select the field that contains the file path in the stream: LBCurrentFile. Then go to the Input and Output Properties tab and expand the Import Column Input and Import Column Output nodes. Add an output column named FileStreamToStore to the Output Columns node. Set its DataType property to [DT_NTEXT]. The editor should look similar to Figure 19-19, but the Lineage IDs will be different. Note the LineageID of the new output column and set the FileDataColumnID property of the LBCurrentFile input column to that LineageID. In Figure 19-19, the ID would be 1080.

  11. Add an OLE DB Command transform and connect it to the output of the Import Column transform. Set the OLE DB connection to CaseStudy.OLEDB. Then set the SQL Command to Update BankBatch Set BatchFile = ? where BankBatchID = ? and press Refresh. In the Mappings tab, connect the FileStreamToStore column to the Destination Column Param_0, which is the [BatchFile] field in the BankBatch table. Connect the BankBatchID column to the Destination Column Param_1. Click Refresh and save.

Figure 19-19

This completes the Data Flow task. The task will parse and save the lockbox detail file into your BankBatchDetail data table. The entire Data Flow should look similar to Figure 19-20. Now would be a good time to save the package. If you've run the package up to this point, check to see that a lockbox sample file exists in the c:\casestudy\lockbox\ folder. Enable the Execute SQL task and run the package to watch it execute.

Figure 19-20

Note

To run the test file through multiple times, you'll need to reset the database by clearing out the BankBatch and BankBatchDetail tables between runs. Otherwise, in subsequent runs the package will fail upon finding that the file has been previously processed.
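
A minimal cleanup script, assuming BankBatchDetail carries a foreign key to BankBatch (which is why the parent table is cleared with DELETE rather than TRUNCATE here):

     TRUNCATE TABLE BankBatchDetail;
     DELETE FROM BankBatch;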

The remaining two processing options, ACH (which involves processing an XML file) and e-mail payments, are not required to be completed if you want to go straight on to building the Case Study Process package.

ACH Control Flow Processing

In the business specs, you have to process ACH files that represent the payment wire detail. The approach to this problem will closely resemble what you did for the Lockbox.

  1. Move the two File System tasks that you built to validate the ACH paths in the Package Setup and File System Tasks section of the Case Study down under the Lockbox Sequence Container, and connect them to it with a precedence type of Completion. You are using this condition because you don't want the ACH processing logic to be skipped just because the Lockbox has an error, but you also don't want the ACH process to take place until the Lockbox process completes. Minimize the Sequence Container to give yourself more room to work. The Control Flow surface should look similar to Figure 19-21.

  2. Add a Sequence Container named Sequence of ACH Processing. Connect it to the last File System task for the ACH Processed folder.

  3. Add a Script task to the Sequence Container to reset the variables. Add the following variables to the ReadWriteVariables property of the Script task: BANKBATCHID, BANKBATCHNBR, BATCHITEMS, BATCHTOTAL, DEPOSITDATE, FILEBYTES. Open the script design mode and replace the Sub Main with this code:

     'SCRIPT
     'PURPOSE: To reset variables used by each payment file process
     '================================================================
     Public Sub Main()
         Dts.Variables("BANKBATCHID").Value = System.Convert.ToInt32(-999)
         Dts.Variables("BANKBATCHNBR").Value = ""
         Dts.Variables("BATCHITEMS").Value = System.Convert.ToUInt64(0)
         Dts.Variables("BATCHTOTAL").Value = (0.0)
         Dts.Variables("DEPOSITDATE").Value = #12/31/1899#
         Dts.Variables("FILEBYTES").Value = System.Convert.ToInt64(0)
         Dts.TaskResult = Dts.Results.Success
     End Sub

  4. Add a Foreach Loop to loop through the XML files. Connect it to the Script task with a success precedence. Set this task up just like the Lockbox Foreach Loop, using expressions for the Directory property, but use the variable ACHBASEFILEPATH. In the Collections tab, set the folder up to start with the path c:\casestudy\ach\. Set the Files property to *.xml. Capture the file in the Foreach Loop in the variable ACHCURRENTFILE.

  5. Add a Script task to take the file name stored in the variable ACHCURRENTFILE and retrieve the file size into the variable FILEBYTES. This task is set up the same as the one for scripting the lockbox.

  6. Add a Data Flow task. Connect the successful completion of the Script task to this task. Change the Name property to Data Flow ACH Validate File and Header Info. This task will parse out the batch header information into variables and then perform a lookup for a similar batch. An existing BankBatchID will be returned in the BankBatchID variable. You'll come back and configure the Data Flow in the section "ACH Validate Data Flow." Disable the task for now.

  7. Add an Execute SQL task. Copy this task directly from the Lockbox Foreach Loop. This task will use a stored procedure, usp_BankBatch_Add, to add the parsed information from the ACH file as a row in the BankBatch table to represent a new batch file. You don't want this task, or any other tasks connected to it, to be executed unless the Data Flow task returns a no-match condition or a BankBatchID of zero (0). Create a successful conditional precedence with an expression between the Data Flow task and this task. The expression should confirm that the value of @BANKBATCHID == 0 is true. Change the mappings of the parameters @FullFilePath and @ImportType to [User::ACHCURRENTFILE] and [User::ACHIMPORTTYPE].

  8. If the Data Flow task finds an existing BankBatchID, you also need to move the file into an error folder. Add a File System task and connect it with similar precedence and constraint conditions, except change the expression to apply when the BankBatchID does not equal zero (0) OR the Data Flow task fails. Set the Evaluation Operation to Expression OR Constraint. Set the Value to Failure and the Expression to @BANKBATCHID != 0. Set the Multiple constraints option to Logical OR.

  9. Set up the File System task to match the same step for the lockbox — except for the properties in the following table.

    Property | Value
    Destination Variable | User::ACHERRORFILE
    SourceVariable | User::ACHCURRENTFILE

  10. Add a second Data Flow task to the Foreach Loop. Connect the successful completion of the Execute SQL task to this task, using a Success constraint plus an expression that checks for a nonzero BankBatchID. The stored procedure will return, via an output parameter, either a zero (0), which indicates that the batch already exists, or a new BankBatchID. If the Execute SQL task fails, or if the BankBatchID is returned as zero, the ACH file should be moved to the error folder and you should move on to the next file. Add a Failure constraint and an expression that looks for a zero BankBatchID value on the connection to the error-folder File System task, and set the condition to an OR so that if either condition is true, the file moves to the error folder.

  11. Change the name property on the Data flow task to Data Flow ACH Detail Data Load. You'll come back and configure the Data Flow in the section "Data ACH Detail Data Load" later in the chapter. Disable the task for now.

  12. Add a Send Mail task to the Foreach Loop. Connect the successful completion of the Data Flow ACH Detail Data Load to this task. Set the SMTPConnection property of the task to the SMTP Mail Server connection. Fill in the To, From, and Subject properties. Create an expression for the MessageSource property similar to the following code:

     "COMPANY ABC, 123 Main St, Somewhere, FL 99999  received and successfully processed ACH file -" +  @[User::BANKBATCHNBR] + "- for the amount of $" + (DT_WSTR, 15) @[User::BATCHTOTAL] + " on " + (DT_WSTR, 50) GETUTCDATE()  +".  If this is not correct please contact Minnie in accounts payable at 555-1111 ext 211 or minnie@companyabc.com " + "This is an automated response." 

  13. If the file is processed successfully, you need to move it to the "processed" folder. Add another File System task and connect it to the successful completion of the second Data Flow task. Set up this task just like the Error Folder File System task, but point everything to the processed folder. The properties in the following table are different.

    Property | Value
    Destination Variable | User::ACHPROCESSEDFILE
    SourceVariable | User::ACHCURRENTFILE

Figure 19-21

You now have the basic structure set up for the ACH Control Flow. You still need to go back and build your transforms. But you'll get to that in the next steps. Right now would be a good time to close and save this package and make a copy in another directory for insurance purposes. The CaseStudy_Load ACH Sequence container at this point should look like Figure 19-22.

Figure 19-22

To test the progress so far, disable the Execute SQL task so that a batch row won't be created. Disable the Lockbox Sequence Container, so it won't be run. Save, and then execute the package to ensure that everything so far is set up properly.

ACH Validation

In this section of the package, you are going to fill in the details for the ACH Data Flow container. The strategy will be to open up the ACH file, retrieve information from the header, and pass the information back to the Control Flow via variables. You'll use an XML data source combined with an XSD file that you'll create and edit to read the file. Since the data is structured and hierarchical, you don't have the parsing tasks that are associated with flat files. However, you can still have bad data in the structure, so you'll have to validate the file. You'll use a lookup on the header to look for matches by batch number, and a Script Component will pull this information from the transformation stream and send it back into your Control Flow.

  1. Start by enabling the Data Flow ACH Validate File and Header Info task. Double-click the task to enter the Data Flow.

  2. Add an XML Source to the Data Flow. In the XML Source editor, set the XML Location to the SampleAch.xml file that should be in the c:\casestudy\ach\ folder. You should immediately see the message shown in Figure 19-23. This message is acknowledging that an XML formatted file has been selected, but the task needs schema information from the XSD file to set up data types and sizes. Since you don't have an XSD file, you'll use a utility provided with this component to generate one.

  3. Provide the XML source with a path for the generated XSD, c:\casestudy\ach\ach.xsd, and then press the Generate XSD button to create the file. Unfortunately, the XSD generator is not perfect, so if you use this tool, validate the XSD file manually. Here's where error-handling strategy and design come into play. You could set up the XSD with all text fields, and the file would always parse successfully, but you would then have to type-check every field yourself. If you strongly type the XSD, the task could fail before you get a chance to make any programmatic decisions. Also note that the generated XSD is based only on the data available in the XML file; because the header has only one row, the generator has very little data from which to pick data types. That's why the XSD type designation for the element that feeds the BATCHITEMS variable is incorrect. Open the XSD in Notepad and change that type designation from xs:unsignedByte to xs:unsignedInt so that it matches the data type of your global BATCHITEMS variable.
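
    As a rough illustration only (the element name below is borrowed from the columns used later in this Data Flow; your generated ach.xsd may name and nest things differently), the edit looks something like this:

     <!-- before -->
     <xs:element minOccurs="0" name="TOTALTRANS" type="xs:unsignedByte" />
     <!-- after -->
     <xs:element minOccurs="0" name="TOTALTRANS" type="xs:unsignedInt" />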

  4. In the XML source component, go to the Error Output tab. For both header and detail output and for every column and every error type, set the action to Redirect Row. Since you are dealing with an ACH file, the effect of truncating numbers and dates is a big deal. You want the file to fail, and redirecting the output will allow you to record what went wrong and then end the current Data Flow task that exists solely to validate the incoming file.

  5. As with the Lockbox file, if you do get row errors, you want to gather as much information about them as possible to assist in troubleshooting. The XML Source has two error outputs, Header and Detail, so you'll have twice as much work to do. Just as you did to capture errors in the Lockbox Data Flow, add a Script Component configured as a transformation to each of the two error outputs from the XML Source. Select the ErrorCode and ErrorColumn columns from the input. Create a new output column named ErrorDesc of type Unicode string [DT_WSTR] and size 1048. Open the design environment for each Script Component and change the ProcessInputRow event code to the following:

     Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
         'SCRIPT
         'PURPOSE: To retrieve the error description to write to error log
         '=====================================================================
         Row.ErrorDesc = ComponentMetaData.GetErrorDescription(Row.ErrorCode)
     End Sub

  6. Add two Derived Column transforms with the following derived columns. Connect them to the output of the two Script component transformations.

    Derived Column    Expression                          Data Type
    ExecutionID       @[System::ExecutionInstanceGUID]    DT_WSTR 38
    ErrorEvent        "ACH"                               DT_WSTR 20
    ErrorDate         @[System::ContainerStartTime]       DT_DATE

    For the Detail output only, add the following derived column:

    RawData           (DT_STR, 1048, 1252) ErrorDesc      DT_STR 1048

  7. Add two OLE DB destination components and connect them to the outputs of the Derived Column transforms, mapping the fields to the ErrorDetail table exactly as you did for the Lockbox Data Flow. For the Detail output, map the converted [RawData] field to the [RawData] column; for the Header output, map the ID field to the [RawData] column. The error handling of the bad XML file should look like Figure 19-24.

  8. If the XML data is good, you want to perform a lookup on the batch number. Don't forget about the lookup being case-sensitive. Add a Character Map transform task and convert the Header output ID field (a batch number) to uppercase as an in-place change.

  9. You also need a value in your stream to allow a lookup on import type; batch numbers are guaranteed to be unique only within an import type, and that type is stored in the global variables. Add a Derived Column transform to add a column named ImportType to your output stream. Since the [ImportType] field in the BankBatch table is of type CHAR(10), add the derived column as a string [DT_STR] of size 10.
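
    As a sketch, the derived column expression is just a cast of the import-type value to fit the CHAR(10) column. This assumes the value is held in a package variable named User::IMPORTTYPE; if your package tracks it differently (for example, as the literal "ACH"), substitute that instead:

     (DT_STR, 10, 1252) @[User::IMPORTTYPE]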

  10. Add the Lookup transform to the Data Flow. Set the OLE DB connection to CaseStudy.OLEDB. Set the Lookup to the results of the following query:

     SELECT BANKBATCHID,
            UPPER(BANKBATCHNBR) AS BANKBATCHNBR,
            UPPER(IMPORTTYPE) AS IMPORTTYPE
     FROM BANKBATCH
     ORDER BY BANKBATCHNBR

    In the Columns tab, link the input column ID to the lookup column BankBatchNbr, and link the ImportType columns. Add BankBatchID as a lookup column with an output alias of BANKBATCHID. Because you expect this to be a new file with no match in the lookup, click the Configure Error Output button and set the Lookup Output to Ignore Failure.

  11. Add a Script Component as a destination to capture the successful end of your transformation Data Flow. Connect it to the Lookup output. Open the editor and select all the available input columns. Add the following global variables as ReadWriteVariables: BANKBATCHNBR, BANKBATCHID, BATCHTOTAL, BATCHITEMS, and DEPOSITDATE. Insert the following code to store the variables. The technique is simple: class-level variables capture the values from the single header row as it is processed, and PostExecute then copies those values into the package variables:

     'CASE STUDY 2005
     Imports System
     Imports System.Data
     Imports System.Math
     Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
     Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

     Public Class ScriptMain
         Inherits UserComponent

         Public LocalBankBatchID As Int32 = -999
         Public LocalBatchTotal As Double = 0
         Public LocalBankBatchNbr As String = ""
         Public LocalDepositDate As Date = #12/31/1899#
         Public LocalBatchItems As UInt64 = 0

         Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
             'SCRIPT
             'PURPOSE: This sub fires for each row processed. Since there is
             '         only one header row, it will fire only one time.
             '         Store the header values in the local variables.
             '============================================================
             Try
                 If Row.TOTALDEPOSIT_IsNull = True Or _
                    Row.TOTALDEPOSIT = 0.0 Or _
                    Row.TOTALTRANS_IsNull = True Or _
                    IsDate(Row.DEPOSITDATE) = False Then
                     'KEEP DEFAULTS
                 Else
                     LocalBatchItems = Row.TOTALTRANS
                     LocalBankBatchNbr = Row.ID + ""
                     LocalBatchTotal = System.Convert.ToDouble(Row.TOTALDEPOSIT)
                     If Row.BANKBATCHID_IsNull Then
                         LocalBankBatchID = CInt(0)
                     Else
                         LocalBankBatchID = System.Convert.ToInt32(Row.BANKBATCHID)
                     End If
                     LocalDepositDate = System.Convert.ToDateTime(Row.DEPOSITDATE)
                 End If
             Catch ex As Exception
                 'KEEP DEFAULTS
                 Variables.BANKBATCHID = -999
                 Variables.BATCHTOTAL = 0
             End Try
         End Sub

         Public Overrides Sub PostExecute()
             'SCRIPT
             'PURPOSE: To set the DTS variables with the values retrieved earlier
             '============================================================
             Try
                 'Attempt to accept the values
                 Variables.BANKBATCHNBR = LocalBankBatchNbr
                 Variables.BANKBATCHID = LocalBankBatchID
                 Variables.BATCHTOTAL = LocalBatchTotal
                 Variables.BATCHITEMS = LocalBatchItems
                 Variables.DEPOSITDATE = LocalDepositDate
             Catch ex As Exception
                 'If any failure occurs, fail the file
                 Variables.BANKBATCHID = -999
                 Variables.BATCHTOTAL = 0
             End Try
             MyBase.PostExecute()
         End Sub
     End Class

Figure 19-23

Figure 19-24

The ACH Validation Data Flow is now complete. The final Data Flow should look like Figure 19-25. Make sure you've still got the Lockbox Sequence container disabled, and then go ahead and run the package. Once you get it working properly, archive a copy, because you've got another Data Flow to build to import this ACH XML file. Play around with the XML file by adding bad data and malforming the structure of the file to see how the Data Flow handles it.
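
If you want a quick negative test, one option is to copy SampleAch.xml and corrupt a single value. The fragment below is purely illustrative; the element names are borrowed from the columns used in this Data Flow, and the structure of your actual file may differ:

     <!-- a non-numeric deposit total should be redirected to the error output -->
     <TOTALDEPOSIT>not-a-number</TOTALDEPOSIT>
     <TOTALTRANS>3</TOTALTRANS>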

Figure 19-25

ACH Processing

In many ways, this section mirrors the Data Flow for Lockbox Processing. Once you've validated the ACH XML file, the Control Flow will create a [BankBatch] row and start the process of importing the detail. You have all the header-related information just as you did for the Lockbox process, and since the file has already been validated, you can simply transform the data into the [BankBatchDetail] table.

  1. Enable the Data Flow task named Data Flow ACH Detail Data Load and drill down into its design surface. Add an XML Source and set it up exactly the same as you did for the Validation Data Flow. This time, however, you already have an XSD file, so just point the component to it. Leave the Error Output settings at Fail Component if an error is encountered while processing the file. You'll also leave the error-handling components out of this Data Flow, although in production you should add them back in.

    Note

    If you ran the package to test the ACH Validation section, you'll need to put a new ACH XML File back into c:\casestudy\ach\.

  2. This time you are concerned mainly with the detail portion of the XML data. You have the foreign key information stored in a variable, so you don't need to perform any lookups here, but the DESC field, which you are going to import into the RawInvoiceNbr field, will be used for a Lookup later in the CaseStudy_Processing package. Add a Character Map transform to convert the DESC field to uppercase and replace its current value in the stream.

  3. The only other thing you need is the foreign key stored in the variable BANKBATCHID. Add a Derived Column transform to add that variable to the current stream. In the same transform, add another column named RAWINVOICENBR based on the [DESC] field from the input columns, as a string [DT_STR] of length 50; choosing the string type here performs the conversion in the same step.
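
    A sketch of these two derived columns, using the variable and column names from this chapter:

     BANKBATCHID       @[User::BANKBATCHID]
     RAWINVOICENBR     (DT_STR, 50, 1252) [DESC]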

  4. Add an OLE DB destination and connect to the CaseStudy.OLEDB connection. Select the [BankBatchDetail] table and map the columns in the following table.

    Input Column      Destination Column
    BankBatchID       BankBatchID
    Amount            PaymentAmount
    RawInvoiceNbr     RawInvoiceNbr

  5. The final thing you need to do is save the entire XML file in the [BankBatch] table. You'll use exactly the same technique from the Lockbox process. Add a Derived Column transform and connect to the Header output of the XML file. Add columns for the variables BANKBATCHID and ACHCURRENTFILE. Make sure the ACHCURRENTFILE column is set to [DT_WSTR] 100. Refer back to the "Lockbox Processing" section to see an example of this transform.
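
    A sketch of the two header derived columns; the cast keeps ACHCURRENTFILE at [DT_WSTR] 100 so the Import Column transform can pick up the file path:

     BANKBATCHID       @[User::BANKBATCHID]
     ACHCURRENTFILE    (DT_WSTR, 100) @[User::ACHCURRENTFILE]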

  6. Add an Import Column transform and connect it to this Header Derived Column transform. On the Input Columns tab, select the field that contains the file path in the stream: ACHCURRENTFILE. Then go to the Input and Output Properties tab of the Advanced Editor and expand the Import Column Input and Import Column Output nodes. Add an output column to the Output Columns node named FileStreamToStore and set its DataType property to [DT_NTEXT]. The editor should look similar to Figure 19-19, although the Lineage IDs may be different. Note the LineageID of the new output column and set the FileDataColumnID property of the ACHCurrentFile input column to that LineageID.

  7. Add an OLE DB Destination to the output of the Header Derived Column transform. Set the OLE DB connection to CaseStudy.OLEDB, set the SQL command to Update BankBatch Set BatchFile = ? WHERE BANKBATCHID = ?, and click Refresh. In the Mappings tab, connect FileStreamToStore to the destination column Param_0, which is the [BatchFile] field in the BankBatch table, and connect BankBatchID to the destination column Param_1. Save the package.

The final Data Flow for ACH processing should look similar to Figure 19-26. Because the logic of this package is designed to process files only once, you may need to clear out the BankBatch and BankBatchDetail tables in order to test repeatedly. As an alternative, change the batch number in the input file to a new value. After you've gotten a successful run of this Data Flow, archive the package.
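
A minimal cleanup script for repeated test runs; this assumes BankBatchDetail carries a foreign key to BankBatch, so the detail rows are removed first:

     DELETE FROM BANKBATCHDETAIL
     GO
     DELETE FROM BANKBATCH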

Figure 19-26

E-Mail Payment Processing

The e-mail payment processing is interesting. The payment transactions are stored in a relational database, so you don't have data issues. You just need to check to see if there are any to process. You also have to make sure that you haven't picked the transaction up before. To be safe, you'll store the transactional primary key from the accounting system as your [ReferenceData1] field. You can then use this field in your extraction to keep from pulling a transaction more than once.

  1. Add another Sequence container to the Control Flow surface. Name the container Sequence of Email Payment Processing and connect with a completion constraint to the ACH Processing sequence container.

  2. Copy and paste the Script task named Script to Reset Variables Task from the ACH Payment Processing sequence container. (This task, as you recall, will reset the variables that you are sharing across processes.)

  3. Add an Execute SQL task named Execute SQL to check for Trans. This task will count the number of transactions in the accounting system not yet processed. The task will set the variables EMAILCNT and BATCHTOTAL equal to the number and total amounts of available transactions to work. Set up the properties using the following table.

    Property                  Setting
    ResultSet                 Single row
    ConnectionType            OLE DB
    Connection                CaseStudy.OLEDB
    SQLSourceType             Direct Input
    SQLStatement              (the query below)
    ResultSet: ResultName     0
    ResultSet: Variable       User::EMAILCNT
    ResultSet: ResultName     1
    ResultSet: Variable       User::BATCHTOTAL

     SELECT COUNT(*) AS TranCnt, SUM(DepositAmount) AS TotAmt
     FROM vCorpDirectAcctTrans Corp
     LEFT OUTER JOIN BANKBATCHDETAIL DTL
       ON CAST(CORP.TRANSID AS VARCHAR(50)) = DTL.REFERENCEDATA1
     WHERE DTL.REFERENCEDATA1 IS NULL

  4. Copy the Execute SQL task named Execute SQL Task to Add ACH Bank Batch Hdr from the ACH processing container into this sequence container. This task will create the batch header for your e-mail-based transactions. Between the two SQL tasks, add a precedence constraint combined with an expression so that the second SQL task will not execute if there are no items to be worked as e-mail payments. Set the expression to:

     @[User::EMAILCNT] > 0

  5. In the Add Batch Execute SQL task, you don't have all the variable values like you did for the text and XML files. Navigate to the Parameter Mapping tab and change some of these variables to match Figure 19-27.

  6. The last step is to add the Data Flow task and connect it to the Execute SQL Batch task. At the moment, the Email Control Flow tasks should resemble Figure 19-28. Continue on to the next section before testing this Control Flow.

Figure 19-27

Figure 19-28

E-Mail Data Flow Processing

If the package reaches the Email Data Flow, there must be some e-mail-based transactions in the accounting database, and the Execute SQL task will already have created a new row in the BankBatch table, with its BankBatchID stored in the BANKBATCHID variable. All you have to do is extract the data from the accounting view, add the foreign key to the data, and insert the rows into the [BankBatchDetail] table.

  1. Start by entering the design surface of the Data Flow Email Load task. Add an OLE DB Source to the Data Flow. Connect it to the CaseStudy.OLEDB connection and supply the following SQL command text for the extract:

     SELECT TransID, DepositAmount, TransDesc
     FROM vCorpDirectAcctTrans Corp
     LEFT OUTER JOIN BANKBATCHDETAIL DTL
       ON CAST(CORP.TRANSID AS VARCHAR(50)) = DTL.REFERENCEDATA1
     WHERE DTL.REFERENCEDATA1 IS NULL

  2. You also need to add that BankBatchID foreign key to your stream, so add a Derived Column Transform to add the BANKBATCHID variable to the stream. Connect the OLE DB Source and the Derived Column transforms.

  3. Look at the sample of TransDesc data being brought over in Figure 19-29. To get this data to match the e-mail addresses in the customer table, it would be better to strip out the PAYPAL* identifier. Since the BankBatchDetail table expects a varchar field of 50 characters and you are also watching out for case sensitivity, convert the type and case at the same time by adding an additional column named RawInvoiceNbr as a string [DT_STR] of 50 characters, and set the expression to the following:

     TRIM(UPPER(REPLACE(REPLACE(TransDesc," PAYPAL",""),"*",""))) 

  4. Add a last column to the Derived Column transform to also convert the TransID to a string value. Name the column TransIDtoString. Set the Data Type to [DT_STR] length 50, and set the Expression to the following:

     (DT_STR, 50, 1252)[TransID] 

  5. Add an OLE DB Destination task and connect it to the output of the Derived Column task. Set the connection to the CaseStudy.OLEDB connection. Set the table to [BankBatchDetail]. Map the fields in the Mapping tab to those shown in the following table.

    Input Field         Destination in [BankBatchDetail]
    DepositAmount       PaymentAmount
    TransDesc           ReferenceData2
    BankBatchID         BankBatchID
    RawInvoiceNbr       RawInvoiceNbr
    TransIDtoString     ReferenceData1

Figure 19-29

This completes the construction of the Data Flow for the e-mail load task — and for the CaseStudy_Load package as well. The Email Load Data Flow should look like Figure 19-30.

Figure 19-30

The final package, with everything enabled, should look like Figure 19-31.

Figure 19-31

A more reasonable view of the CaseStudy_Load package, with the sequence and Foreach containers collapsed, is shown in Figure 19-32.

Figure 19-32

Testing

Test the package by disabling all the sequence containers, then work your way through each of the tasks, enabling them as you go. Use the following SQL script to delete rows added to the database during repeated testing; leftover rows can change the flow of logic in the Control Flow sections:

 DELETE FROM BANKBATCHDETAIL
 GO
 DELETE FROM BANKBATCH
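
Between runs, a quick look at both tables confirms what each enabled section actually loaded; the column names here are the ones mapped earlier in the chapter:

     SELECT BankBatchID, BankBatchNbr, ImportType
     FROM BankBatch

     SELECT BankBatchID, COUNT(*) AS DetailRows, SUM(PaymentAmount) AS DetailTotal
     FROM BankBatchDetail
     GROUP BY BankBatchID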


