3.7. Transform Objects as They Pass Through the Pipeline

Functions remain one of the most versatile aspects of the MSH language. Their use in both command-line and pipeline situations makes them very flexible. However, there are times when we're creating a function solely for pipeline use, and it's more convenient to run a script block on each object in the pipeline than to have them all delivered in one large chunk in the $input variable. Let's say we're going through each line of a 400 MB daily IIS logfile to figure out the set of requested URLs that resulted in a 404 error message. Loading all 400 MB into memory probably isn't the best approach; we would do better to create a small script block that could match a 404 line in the log and record the requested URL into a variable.

To accommodate these cases, MSH offers a special type of function, called a filter, that is designed to be placed into the pipeline and used to inspect, modify, or augment data as it passes between processes.

Let's create a couple of filters to see how they work.

Functions and Filters

Given that functions and filters are defined with very similar syntax, have similar purposes, and can both operate in the pipeline, you'd be forgiven for asking, "Can't I just use a function for this?" The answer is yes: any purpose a filter serves can also be met with a function. Filters are convenient for situations in which the processing of individual pipeline objects is the key focus, and the filter can operate in isolation, without knowledge of what has come before or what will follow. Filters remove the need to loop over every item in the $input variable. Because a filter is run for each pipeline object rather than for the whole collection, filters can also show significant performance gains over functions when processing large quantities of data.


3.7.1. How Do I Do That?

We'll start by defining a very simple filter. Filters are defined in the same way as functions, but instead of using arguments or $input, we again make use of the special variable $_:

     MSH D:\MshScripts> filter double { $_ * 2 }

Let's take the new filter for a test drive. To begin, we'll use some simple values to seed the pipeline:

     MSH D:\MshScripts> 10 | double
     20
     MSH D:\MshScripts> @(1,2,3,4) | double
     2
     4
     6
     8
     MSH D:\MshScripts> @(1,2,3,4) | double | double
     4
     8
     12
     16
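
For comparison, here is a rough sketch of how the same doubling might be written as an ordinary function that loops over $input, as the sidebar describes. The name double-all is made up for this illustration and is not part of MSH:

     MSH D:\MshScripts> function double-all { foreach ($n in $input) { $n * 2 } }
     MSH D:\MshScripts> @(1,2,3,4) | double-all
     2
     4
     6
     8

The results match, but the function has to do its own looping, whereas the body of the filter only ever deals with the single object in $_.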

Let's stay with the number theme for one more example. In this case, we'll write a filter that expects a number and uses a loop to determine whether it's prime (i.e., whether any number other than 1 and itself divides into it evenly):

     MSH D:\MshScripts> filter test-prime {
     >>$limit = ($_/2)+1;
     >>for ($i=2; $i -lt $limit; $i++)
     >>{
     >>  # divisible by $i, so drop this object and return
     >>  if (($_ % $i) -eq 0) { return }
     >>}
     >>$_   # nothing divided into it, must be prime
     >>}
     >>
     MSH D:\MshScripts> @(1..100) | test-prime
     1
     2
     3
     5
     7
     ...
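
Because each filter simply passes objects along, filters can be chained just like cmdlets. As a small illustration, assuming the double filter defined earlier is still loaded in the session, the survivors of test-prime can be fed straight into it:

     MSH D:\MshScripts> @(1..20) | test-prime | double
     2
     4
     6
     10
     14
     22
     26
     34
     38

The leading 2 comes from 1, which test-prime as written lets through even though 1 is not conventionally counted as a prime.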

Fascinating, but how exactly are filters useful? For the next example, we'll use an MSH feature called notes. Notes can be used to attach a piece of data to an object as it passes through the pipeline. This note can then be accessed seamlessly by downstream cmdlets as if it were a property of the object.

Here, we'll create a filter that can recognize a few file extensions and attach a note to any FileInfo objects that pass through:

     MSH D:\MshScripts> filter add-friendlytype {
     >>switch ($_.Extension)
     >>{
     >>    ".msh" { $ftype = "MSH script" }
     >>    ".txt" { $ftype = "Regular text file" }
     >>    ".exe" { $ftype = "Executable file" }
     >>    default { $ftype = "Unknown" }
     >>}
     >>$note=new-object System.Management.Automation.MshNoteProperty "FriendlyType",$ftype
     >>$_.MshObject.Properties.Add($note)
     >>$_
     >>}
     >>
     MSH D:\MshScripts> get-childitem | add-friendlytype | format-table Name,FriendlyType

     Name                                    FriendlyType
     ----                                    ------------
     filter.msh                              MSH script
     fn.msh                                  MSH script
     winword.exe                             Executable file
     summary.txt                             Regular text file
     outline.hxs                             Unknown
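
Since the note behaves like a property, a downstream stage should be able to test it directly. The following is a sketch rather than a captured session; it assumes the same directory contents as the listing above and that where-object reads the note in the same way format-table does:

     MSH D:\MshScripts> get-childitem | add-friendlytype | where-object { $_.FriendlyType -eq "MSH script" } | format-table Name,FriendlyType

     Name                                    FriendlyType
     ----                                    ------------
     filter.msh                              MSH script
     fn.msh                                  MSH script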

3.7.2. What Just Happened?

Filters are a special type of function intended to be used in a pipeline. Unlike functions, they do not block the pipeline, as there is no need to wait for $input to fill up with all of the incoming objects. The special variable $_ is prepopulated with the current pipeline object. Also, the script inside the filter has no knowledge of whether any objects have passed through the filter before the current one, nor whether any are due to follow it. This simplicity makes the task of writing a filter easier because it forces you to focus only on what to do with the object at hand.

It is usual, although not required, for a filter to put an object into the pipeline as an output. A filter that swallows every object it sees probably isn't going to be all that useful, especially if downstream processes are expecting objects to work with. It's totally legitimate for a filter to drop some or most objects that don't meet certain criteria, as we saw with the test-prime case. There's some overlap with a couple of cmdlets we've already covered. For example, a filter can be used to replicate the functionality of the where-object cmdlet; in this case, it allows only those FileInfo objects that have a certain extension:

     filter where-executable { if ($_.Extension -eq ".exe") { $_ } }

In many simple cases, the where-object and foreach-object cmdlets are sufficient and eliminate the need to first define a filter and then include it in the pipeline sequence. However, there are no strict rules about when to use a filter and when to use where-object and foreach-object; for a given case, the where-object approach might be quicker, whereas in other cases, a filter could make a more compact and readable pipeline.
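
To make the comparison concrete, here is a sketch of the inline alternatives. The first line does the same work as the double filter from earlier using foreach-object; the second applies an extension test of the same shape as the filter above, here matching .txt files purely as an example:

     MSH D:\MshScripts> @(1,2,3,4) | foreach-object { $_ * 2 }
     2
     4
     6
     8
     MSH D:\MshScripts> get-childitem | where-object { $_.Extension -eq ".txt" }

For a one-off pipeline the inline script block saves a definition; once the same test shows up in several pipelines, a named filter may be easier to read and reuse.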

One other feature we caught a glimpse of here is the note. Notes are a handy companion to use within a filter because they allow us to annotate objects with extra information as they pass through the pipeline. The same objects come out the other end of the filter, and downstream stages in the pipeline carry on working, oblivious to the note's presence unless they specifically look up its value. The code for adding a new note looks a little cumbersome here, but when taken on its own is just two lines:

     $note = new-object System.Management.Automation.MshNoteProperty <note name>,<note value>
     <object to add note to>.MshObject.Properties.Add($note)

3.7.3. What About...

...That IIS logfile example? First of all, we'll use a new cmdlet, get-content, which reads a file and puts it into the pipeline line by line. Because some logfiles have comment lines, we'll pass each line through where-object to remove any lines that start with a pound sign (#). We'll also use a method of the String class, Split, to break out each space-separated part of the logfile line. When we're done, we'll be left with a hashtable containing the URLs that caused 404s and the number of times each URL appeared:

     MSH D:\MshScripts> $badUrls=@{}
     MSH D:\MshScripts> filter count-http404 {
     >>$parts = $_.Split(" ")
     >>if ($parts[10] -eq 404)
     >>{
     >>    $uri = $parts[4]
     >>    if ($badUrls.Contains($uri)) { $badUrls[$uri]++ }
     >>    else { $badUrls[$uri]=1 }
     >>}
     >>}
     >>
     MSH D:\MshScripts> get-content ex010101.log | where-object {$_ -notlike "#*"} | count-http404
     MSH D:\MshScripts> $badUrls

     Key                   Value
     ---                   -----
     /index.php              102
     /robots.txt            1054
     /cgi-bin/run.pl         320
     /cgi-bin/exec.pl        993
     /cmd.exe                821
     /command.com           3822

This filter is fairly lightweight and simple at this point. With some creative use of the other data in the $parts array, it shouldn't take much extra effort to report on other aspects of the logfiles. Another route for expansion would be to consider passing wildcard filenames (for example, get-content ex0504??.log | ...) to report on a whole month.
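
As one possible direction, the sketch below tallies requests by status code instead of by URL and runs over a month of logs. The names count-httpstatus and $statusCounts are invented for this illustration, and the code assumes, as count-http404 does, that the status code is the field at index 10 of each line; the position depends on how logging is configured on your server:

     MSH D:\MshScripts> $statusCounts=@{}
     MSH D:\MshScripts> filter count-httpstatus {
     >>$parts = $_.Split(" ")
     >>$status = $parts[10]   # assumed position of the status code
     >>if ($statusCounts.Contains($status)) { $statusCounts[$status]++ }
     >>else { $statusCounts[$status]=1 }
     >>}
     >>
     MSH D:\MshScripts> get-content ex0504??.log | where-object {$_ -notlike "#*"} | count-httpstatus
     MSH D:\MshScripts> $statusCounts

As with $badUrls, the hashtable is created before the pipeline runs, so reset it to @{} before counting a different set of files.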




Monad Jumpstart
Year: 2005
Pages: 117
