Extending SSIS to use libraries within the .NET Framework is very powerful. Imagine having a corporate encryption algorithm that you must adhere to. Rather than having to rewrite this in SSIS, you could reference the assembly that has already been written from within a Script Task or Component. The ability to access custom functionality provided through assemblies that are either built in-house or proprietary to an organization takes SSIS packages to the next level. Instead of rewriting custom code within a task that may already be encapsulated within an organization’s framework, using external assemblies provides an object-oriented approach by promoting code re-use. Code can now reside within an assembly or one location, and then be referenced from multiple tasks.
To create a new assembly, open Visual Studio 2005 and create a new Visual Basic .NET class library, as shown in Figure 2-5. You can also add this project to your existing SSIS solution to keep every dependency organized in the same container. A class library project is the type of the project to use when you need to create assemblies.
Figure 2-5: Creating a Visual Basic .NET class library
Note that there is no user interface for class library projects like there would be for Windows or web projects. Instead, you will see that a default class is created. For this example, change the class to HttpFileDownload.vb, and then combine and add the code from Listing 2-1.
Listing 2-1: Building an Assembly
Option Strict On Imports System.Net Namespace SSIS.Utilities Public Class HttpDownload Implements IDisposable Private webClient As WebClient Private m_FileLocation As String Public Property FileLocation() As String Get Return m_FileLocation End Get Set(ByVal value As String) m_FileLocation = value End Set End Property Private m_Url As String Public Property WebUrl() As String Get Return m_Url End Get Set(ByVal value As String) m_Url = value End Set End Property Public Sub New(ByVal HttpFileUrl As String, ByVal PathToDownload As String) FileLocation = PathToDownload WebUrl = HttpFileUrl End Sub Public Sub GetFile() webClient = New WebClient Try webClient.DownloadFile(m_Url, m_FileLocation) Catch WebEx As WebException Throw Catch ex As Exception Throw End Try End Sub Public Sub Dispose() Implements IDisposable.Dispose webClient.Dispose() GC.SuppressFinalize(Me) End Sub End Class End Namespace
The code example in Listing 2-1 shows how to build a simple assembly (in this case, using Visual Basic .NET 2005). This class is going to download a file over HTTP to the calling server. This will later be integrated into a Script Task.
Let’s take a quick look at the basic structure of the object. The first thing to notice is Option Strict On, which enforces strong typing of objects. Strong typing means that object assignments and comparisons must only be set to like objects, therefore reducing memory loss and increasing performance. Next are the two Import statements. The Import statement enables you to use objects from the .NET Runtime libraries (such as System.Net). After you import the object, you won’t have to fully qualify the object each time you reference it. The following namespace enables grouping of related classes. If you added an additional class to the namespace SSIS.Utilities, then both classes would be represented within that namespace.
The public HttpDownload class in Listing 2-1 implements the IDisposable interface, which does two things. First, it uses the implemented Dispose() method for releasing resources from memory that the custom object utilizes during runtime. In this case, the WebClient object is disposed of within the Dispose() method.
Within the Dispose() method, there is a call to the garbage collector’s SuppressFinalize() method. The garbage collector reclaims the memory, and by calling its SuppressFinalize method and passing in Me(HttpDownload Class), the collector knows that its objects have already been cleaned up through code, and, therefore, there is no reason for the garbage collector to call Object.Finalize to destroy its objects.
The HttpDownload class also has two properties. These properties can be set externally, or they can be passed into the constructor, when the object is instantiated. The constructor, New(), serves as the entry-point where the HttpFileUrl and PathToDownload parameters are passed in. The WebClient object then uses the properties within its DownloadFile() method. The HttpFileUrl property identifies the URL that a file will be downloaded from and the PathToDownLoad property indicates the path where on the local machine the file will be downloaded to.
With the class now created, integrate this into the Script Task, as discussed in the next section. All of the previous code can be downloaded in its compiled and non-compiled format at the Wrox web site (www.wrox.com).
Many sites make files publicly available via the Web, but not via FTP. Files formatted as comma-delimited (as well as many other formats) can be downloaded and then transformed through the Data Flow Task into valuable data. Adding this functionality is simple, especially if you have a developer who has already written the code for you and all you have to do is reference the assembly within an SSIS Script Task. If a developer hasn’t provided a class that you can reuse, you’ll have to re-create the class code inside the Script Task. Now that you know some of the basics of objects, just follow the next couple of steps and add the code yourself.
In the last section in Listing 2-1, there were two properties added: FileLocation (which is the path where the downloaded file will be saved) and WebUrl (which serves as the HTTP address that is the Web location from where the file will be downloaded). These properties are set when the object is created.
You can take advantage of the WebClient object to add the code that will get files from the Web and download them locally. In Listing 2-1, there was a line of code added that imports the System.Net namespace so that the code could have access to the WebClient object. The WebClient object can send or receive data over the Web by using Uniform Resource Identifiers (URIs), which represent web pages or other resources on the Web.
The method shown in the following code uses the WebClient to download a file from a Web site passed in as a parameter, and then stores the file locally within the file system based on the file location parameter, m_FileLocation, which is also passed. The WebException is thrown if there are any issues such as invalid credentials, or if the site cannot be found from where the file is attempted to be downloaded. An Exception object can also be thrown for any other exceptions that are caught. Only that stub code is provided here.
Public Sub GetFile() webClient = New WebClient Try webClient.DownloadFile(m_Url, m_FileLocation) Catch WebEx As WebException Throw Catch ex As Exception Throw End Try End Sub
Most .NET development environments allow references to external assemblies. This becomes very useful when you want to reuse the assembly over and over again, versus having to retype the same code. SSIS requires that these custom objects be placed within the Global Assembly Cache (GAC) on the machine that is executing the package to maximize performance of the pipeline and tasks. The GAC is a code cache where assemblies can be placed as a central repository. Even though the framework of SSIS does not provide the choice of placing custom assemblies outside of the GAC, you should be aware that, in most cases, it is better to reference assemblies from a local directory instead from the GAC.
Placing assemblies within the GAC allows them to be referenced by multiple applications on a server. However, it is much easier to manage each application’s assemblies within its working directory by giving the developer full control of versioning. When an assembly is referenced from the GAC, each application that references the assembly is affected when the assembly is updated. This might be a nice automated feature at first glance. However, remember that this means applications are being updated without being tested to ensure that the update did not break anything.
To add an assembly to the GAC, it must be given a strong name. When an assembly is signed with a strong name, it ensures that the assembly will be unique. An identical or updated assembly cannot be created by another person. This guarantees that the assembly came from the originator and has not been modified by anyone else. The first thing to do when strong-naming an assembly is to create a cryptographic key pair that will be placed in a file.
There are two ways to create a key file:
By using the properties settings of the Visual Studio 2005 project
By manually using the SN utility that comes within the .NET Framework
The SN utility can be found in C:\Program Files\Microsoft.NET\SDK\v2.0\Bin. However, you can also select Start All Programs Microsoft Visual Studio 2005 Visual Studio Tools Visual Studio 2005 Command Prompt. At the command prompt, type sn –k “C:\MyKeyFile.snk”. This will build your strong-name key file within the C:\ root, as shown in Figure 2-6.
Figure 2-6: Using the command prompt for the SN utility
When signing an assembly with a strong name through the properties of the class library project, a strong-name file must be specified. To do this, select Project Properties from the toolbar. When the properties window appears, select the Signing tab, then select the Sign the assembly check box, as shown in Figure 2-7. Choose a strong-name key file by using the drop-down menu and selecting Browse to find the key file created. If the key has not been created using the SN utility, it can be created by selecting New instead of Browse from the drop-down list.
Figure 2-7: Using the Signing tab in the properties window
As shown in Figure 2-8, Visual Studio 2005 makes it even easier to create new keys, as well as providing the option for password-protecting them.
Figure 2-8: Using Visual Studio 2005 to create a strong-name key
In the properties window, select the Compile tab and set the configuration to Active (Release). This optimizes the assembly for production by removing any debugging information. You should never send code to production without having the Release setting set or your code may run significantly slower. The project can now be built by selecting from the toolbar Build Build Solution. Take note of the build output path, because it will be needed when the assembly is added to the GAC.
The easiest way to add the assembly to the GAC is by copying the assembly to C:\WINDOWS\assembly. However, the Global Assembly Cache Tool (Gacutil.exe), which is found in C:\Program Files\ Microsoft.NET\SDK\v2.0\Bin, can be used as well. You can also access the tool’s path by clicking on Start All Programs Microsoft Visual Studio 2005 Visual Studio Tools Visual Studio 2005 Command Prompt. Keep in mind that your path may vary based on what version of .NET you are using. At the command prompt, type gacutil.exe /i “<assembly path>” to start the utility, as shown in Figure 2-9.
Figure 2-9: Launching gacutil.exe from the command prompt
The assembly must also be placed in the C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727 path (again, this final folder name may vary based on the version of .NET you have installed). This is where Visual Studio for Applications will look for references. When you have the Script Task in design mode, the assembly can be referenced by right-clicking References within the Project Explorer pane on the left, and then selecting Add Reference. You will then be presented with a list of available assemblies that can be referenced. You can add the reference by selecting the assembly’s name and clicking the Add button. The added assembly is now listed on the Add Reference window, as shown in Figure 2-10. Click OK to accept this addition and close the window.
Figure 2-10: Adding a reference
After you have added a reference, the following code shows how to call a custom assembly from a Script Task. Two Script Task variables are passed to the custom assembly, SSIS.Utilities.HttpDownload, through the object’s constructor. It’s important to have the two variables already set in the ReadOnlyVariables property of the Script Task prior to using the following code.
Imports System Imports System.Data Imports System.Math Imports System.Net Imports System.IO Imports System.Windows Public Class ScriptMain Public Sub Main() Dim strUrl As String = CStr(Dts.Variables("varHttp").Value) Dim strFile As String = CStr(Dts.Variables("varSaveFile").Value) Using DownloadFile As SSIS.Utilities.HttpDownload = New SSIS.Utilities.HttpDownload(strUrl, strFile) DownloadFile.GetFile() End Using Dts.TaskResult = Dts.Results.Success End Sub End Class
Before running this package, be sure to set the values of the two variables to something useful, or else the code will not know which URL to visit and where to save the resulting file. Set these variables by opening the Variables window again and manually typing both a valid URL into the value for the strUrl variable and a valid file name into the value for the strSaveFile variable. Run the package and, after a while, you should see the file created in the specified location.
Any time custom assemblies are used, it is important to test them before they are implemented. An easy way to test functionality of assemblies is to add a Windows project to the same solution that the class library project belongs to that created the custom object (see Figure 2-11). To add a window project, right-click the solution of the class library project and click Add New Project. Select Windows Application. Within the Windows project, right-click References Add References Projects to add the custom library project as a reference. All you need to add is a button to test the custom object. Note that the Windows project in the following code is in C#. This is a new feature within Visual Studio 2005 that enables you to add projects of different code types. The following code can be added to test the assembly.
using System; using System.Collections.Generic; using System.ComponentModel; using System.Data; using System.Drawing; using System.Text; using System.Windows.Forms; using System.Net; using System.IO; using SSIS.Utilities; using System.Configuration; namespace WindowsApplication1 { public partial class Form1 : Form { public Form1() { InitializeComponent(); } private void GetFromWebClient() { using (HttpDownload httpDL = new HttpDownload("http://localhost/CSVDownload/Delimited.txt", @"C:\Documents and Settings\Administrator\Desktop\YouDidit.txt")) { httpDL.GetFile(); } } private void button1_Click(object sender, EventArgs e) { try { GetFromWebClient(); } catch(Exception ex1) { MessageBox.Show(ex1.Message); } } } }
Figure 2-11: Testing the assembly as a Windows project
The method GetFromWebClient() instantiates the custom object, HttpDownload, and passes in a URL and file path. This method is then called from within the button1_Click event, which is called when the button on the form is clicked. Your button may be named differently, so change the name of this event to represent your name. Now that there are two projects, the Windows form project must be set up as the start-up project. To do this, right-click the Windows project and select Set as Startup Project. After the solution is run, if there are any problems, a message box will pop up with any errors that occurred.
When the code is functioning correctly, the assembly is ready to be referenced from within the Script Task.