The following category of transforms provides metadata about the data flow. They're useful for answering how fast or how many or from where type questions. They help you understand what kind of data is flowing through the data flow and how much of it. For example, you can spread these transforms throughout your data flow and log the information they create for later scrutiny or troubleshooting. They are very lightweight transforms and minimally impact the performance of the data flow.
AuditThe Audit transform allows you to create a new column on a given path that will contain system variable information. This is helpful for understanding where data came from and how it was added to a table. Table 20.18 provides a profile for this component.
Setting Up the Audit TransformTo set up the Audit transform, simply select the audit type and then input the name of the new column that will contain the audit information. Figure 20.29 shows the Audit Transformation Editor with two new columns created. Figure 20.29. The Audit Transformation Editor
Table 20.19 shows the system variables that the Audit transform might use to populate a new column and their description.
Percentage and Row SamplingThe Percentage and Row Sampling transforms provide a way to select subsets of source data. This is useful in a number of ways. For example, when profiling data, it isn't necessary to process the entire set of data, but rather a statistically significant subset. Using the Percentage Sampling transform, you can reduce the total number of rows that flow through your data profiling process, reducing the total processing time while still ensuring that the row selection is random and relevant. Table 20.20 provides a profile for these components.
Setting Up the Row and Percentage Sampling TransformsBoth these transforms are very similar and have very similar editors. The only difference is that the Row Sampling transform allows you to specify the number of rows you want to randomly select and the Percentage Sampling transform allows you to select a specified percentage of rows. Because both are so similar, this section only explains how to set up the Percentage Sampling transform. The Percentage Sampling transform has two outputs. One contains the dataset sample and the other contains the entire dataset sans the dataset sample. Set the Percentage of Rows settings to what you want to sample. You can name the outputs if you want. Only specify a random seed if you want to return the same rows in subsequent runs, which might be useful for testing purposes. If you leave the Random Seed setting unchecked, the transform picks a different random seed on each subsequent run and returns a different set of data. Figure 20.30 shows the Percentage Sampling Transformation Editor. Figure 20.30. The Percentage Sampling Transformation Editor
RowCountThe RowCount transform provides a way to track how many rows are flowing through a particular path. You can use this information in logs as diagnostic information or for sending in an email as a notification of the work a package has completed. Table 20.21 provides a profile for this component.
Setting Up the RowCount TransformThe RowCount transform is very simple to set up. Drop it on the design surface, drag an output from another transform, and specify a read/write variable to contain the rowcount. Figure 20.31 shows the Advanced Editor with the rowcount property correctly set. Figure 20.31. Setting the rowcount variable name with the Advanced Editor |