3.6. Capture Reusable Behavior in a Function

As we saw earlier, it's easy to start with several simple tasks and join them together to create increasingly complex scripts and pipelines. At some point, instead of relying on the recall buffer, it becomes convenient to store command sequences in scripts for later reuse. Functions serve a similar purpose by enabling us to collect one or more commands and group them together so that they can be run with a single command. What sets functions apart is that they can return a result based on the processing of a set of inputs; in effect, the function becomes a black box that encapsulates some frequently used logic.

Several of the built-in commands, such as clear-host, are in fact functions that perform tests, determine parameters, and invoke cmdlets based on their inputs. The new-item cmdlet, for example, is very flexible, yet its syntax can be cumbersome, so MSH provides shortcut functions around it for convenience. Even familiar commands such as C: and D: are functions that call the set-location cmdlet.
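As a sketch of how such a drive function might be defined (this is illustrative, not the exact built-in definition), a hypothetical E: function needs nothing more than a call to set-location:

     MSH D:\MshScripts> function E: { set-location E: }
     MSH D:\MshScripts> E:
     MSH E:\>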

We'll see how functions bring together the various topics we've covered in this chapter, including variables, conditional tests, and loops. Functions range from the very simple to the surprisingly complex, yet they invariably offer a way to arrange scripts neatly into logical sections, making them much easier to understand and maintain.

3.6.1. How Do I Do That?

For our first function, let's start with something simple that prints out a welcome message:

     MSH D:\MshScripts> function say-greeting { "Hello" }
     MSH D:\MshScripts> say-greeting
     Hello

The Black Box

Among their virtues, functions have one very strong value proposition: they neatly gather a piece of logic into a single place. Tasks well suited to shell scripts frequently involve some repetition of logic, whether that means extracting information from an object, storing data in a variable, or performing a calculation. Although each task may be simple, putting it into a pipeline or script alongside several others can quickly make that simplicity vanish. Functions provide some relief by allowing a little piece of script to be taken aside and given a name. The opportunity to get that part working in isolation can make writing the overall script faster and debugging it a far more manageable task.


Functions are only as useful as the information they have at their disposal. Now, let's see how we can pass arguments to a function that it can then use for decision making. We'll extend the say-greeting function to optionally take a person's name. The param keyword is used to call out the parameters that are expected to be passed to the function on the command line:

     MSH D:\MshScripts> function say-greeting {
     >>param($name)
     >>$message = "Hello, "
     >>if (! $name) { $message += "World" }
     >>else { $message += $name }
     >>$message
     >>}
     >>
     MSH D:\MshScripts> say-greeting
     Hello, World
     MSH D:\MshScripts> say-greeting Andy
     Hello, Andy

Functions coexist neatly with cmdlets and aliases. Earlier we saw how aliases provide an alternative name for commonly used cmdlets. Functions can take this a step further, either by encapsulating a small pipeline or by prepopulating certain arguments for a cmdlet:

     MSH D:\MshScripts> function get-ProcessByHandles {
     >>param($count = 200)
     >>get-process | where-object { $_.Handles -gt $count }
     >>}
     >>
     MSH D:\MshScripts> get-ProcessByHandles 400 | format-list

     ProcessName : CcmExec
     Id          : 1656

     ProcessName : csrss
     Id          : 464

     ProcessName : explorer
     Id          : 492
     ...

Suppose we want to place a function into a pipeline. Inside a function's script block, the special variable $input is available. When a function is placed in a pipeline, $input is populated with any incoming objects before the function is run. To continue the pipeline inside a function, we can pipe $input into another cmdlet:

     MSH D:\MshScripts> function get-properties {
     >>$input | get-member -MemberType Property
     >>}
     >>
     MSH D:\MshScripts> get-process | get-properties

         TypeName: System.Diagnostics.Process

     Name                       MemberType Definition
     ----                       ---------- ----------
     BasePriority               Property   Int32 BasePriority {get;}
     ExitCode                   Property   Int32 ExitCode {get;}
     HasExited                  Property   Boolean HasExited {get;}
     ExitTime                   Property   DateTime ExitTime {get;}
     ...

Functions become very powerful when used for processing and returning a result to the caller. For a function to return data, there is no need to explicitly call out "I want to return this value;" instead, the script block inside the function has the freedom to place any number of objects into the pipeline for downstream processes to use. In the next example, we'll see that by simply running a command that has output ($total), the function will generate a meaningful result.

Let's consider a simple function that adds together all command-line arguments. This time, we'll use the special variable $args, which is an array of all arguments passed to the function on the command line:

     MSH D:\MshScripts> function add {
     >>$total = $null
     >>foreach ($arg in $args) { $total += $arg }
     >>$total
     >>}
     >>
     MSH D:\MshScripts> add
     0
     MSH D:\MshScripts> add 1 2 3
     6
     MSH D:\MshScripts> $b = (add 11 12 13)
     MSH D:\MshScripts> $b
     36
     MSH D:\MshScripts> add "foo" "bar"
     foobar

As we've seen, functions can invoke cmdlets, maintain their own state, and return results. Putting all of these aspects together, let's create a useful function for reporting disk usage within a folder (and, optionally, its subfolders). The get-childitem cmdlet does the bulk of the work for us; we just walk through its results and add the sizes of any files found. When we're done, we'll put the total count into the pipeline and let MSH take care of the rest:

     MSH D:\MshScripts> function du {
     >>$bytes=0
     >>get-childitem -Recurse:($args[0] -eq "-r") | foreach { $bytes += $_.Length }
     >>$bytes
     >>}
     >>

Normally, we'll want to see the results in kilobytes rather than bytes, but there's no need to duplicate the same logic in another function. Instead, just wrap du into another function that will do this conversion automatically:

     MSH D:\MshScripts> function duk {
     >>(du $args)/1k
     >>}
     >>
     MSH D:\MshScripts> du
     9959698
     MSH D:\MshScripts> duk
     9726.267578125

3.6.2. What Just Happened?

Functions are one of the most useful elements of the command shell in that they provide an easy framework for reusing scripts and logic across a wide range of tasks. Unlike cmdlets, no strict naming convention is enforced, although it's always good practice to use self-descriptive names. Whenever possible, try to name functions in the same verb-noun format, especially if they serve purposes similar to already existing cmdlets.

Functions part company with aliases in their ability to take input, whether from the command line or when placed into a pipeline sequence. In most cases, rather than using the $args variable directly, the param keyword should be used to explicitly identify the parameters a function expects to receive. Using named parameters makes functions much more readable and easier to follow.
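As an illustration, compare this hypothetical pair of greeting functions; the param version documents itself and also allows the argument to be passed by name:

     MSH D:\MshScripts> function greet-positional { "Hello, " + $args[0] }
     MSH D:\MshScripts> function greet-named {
     >>param($name)
     >>"Hello, $name"
     >>}
     >>
     MSH D:\MshScripts> greet-named -name Andy
     Hello, Andy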

In one of the previous examples, the get-ProcessByHandles function gave one of its named parameters, $count, a default value: if no arguments had been passed to the function, $count would automatically have assumed a value of 200.
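In other words (a sketch, with output elided), the default applies only when the argument is omitted:

     MSH D:\MshScripts> get-ProcessByHandles        # $count takes its default, 200
     ...
     MSH D:\MshScripts> get-ProcessByHandles 400    # $count is 400
     ...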

In Example 3-7, the add-integers function shows how the data types of parameters can also be identified in the param section. MSH takes care of ensuring that all parameters are of the correct type (integer) before running the main body of the function.

Example 3-7. add-integers function
     function add-integers {
         param([int]$a, [int]$b)
         $a+$b
     }
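A brief hypothetical session shows the conversion at work: string arguments that look like numbers are coerced to integers before the body runs, so the values are summed rather than concatenated:

     MSH D:\MshScripts> add-integers 1 2
     3
     MSH D:\MshScripts> add-integers "4" "5"
     9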

When a function is used in a pipeline, any cmdlets or processes will first complete all of their tasks before the function is run. The function will wait, collecting all of the input coming through the pipeline, and will not start until everything else before it has finished. Functions are executed in this way to ensure that the special variable $input will always contain the complete set of pipeline objects available to the function. In just a moment, we'll look at a special type of function called a filter that is sometimes more suited to pipeline use.
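A small sketch makes this buffering visible; this hypothetical count-input function produces nothing until get-process has emitted every object (the count itself will vary from machine to machine):

     MSH D:\MshScripts> function count-input { @($input).Count }
     MSH D:\MshScripts> get-process | count-input
     58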

Not all functions are required to generate output. For those that do, the types of objects emitted by the function will depend on the purpose it serves. In our add example earlier, one result, a number, was returned at the end of the function. In other cases, for example, when a loop is involved, it may be more desirable to output data for each iteration of the loop. When multiple results are generated by a function, they'll be available in the pipeline for downstream processing in exactly the same order in which they were generated.
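For example, a hypothetical function like the following emits one value per loop iteration, and those values reach the pipeline in the same order:

     MSH D:\MshScripts> function get-squares {
     >>param($max)
     >>for ($i = 1; $i -le $max; $i++) { $i * $i }
     >>}
     >>
     MSH D:\MshScripts> get-squares 4
     1
     4
     9
     16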

3.6.3. What About...

...Seeing the functions you've defined? As we saw with variables, a built-in provider exposes a Function: drive that contains all of the functions defined in the current session.
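For example, get-childitem can list the drive's contents (output abbreviated here; the exact columns may differ):

     MSH D:\MshScripts> get-childitem Function:
     ...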

You set a variable in your function, but when you run the function, the variable is never set. What's going on? This is actually expected behavior and is another instance in which variable scoping comes into play. By default, variables set or defined within a function are available only within that function; when the function finishes, any variable assignments made inside it are discarded. This helps prevent one function from accidentally interfering with another. We'll look at scoping in more detail in Chapter 4.
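A quick sketch demonstrates the behavior: the variable exists while the function runs, but evaluating it afterward at the prompt yields nothing:

     MSH D:\MshScripts> function set-local { $insideOnly = "only visible in here" }
     MSH D:\MshScripts> set-local
     MSH D:\MshScripts> $insideOnly
     MSH D:\MshScripts>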

Although all of the examples here used interactive mode, it's perfectly valid to use functions inside script files. It might be a good idea to add the definitions of frequently used functions to your profile so they'll be available whenever you need them.
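For instance, a profile script might end with a block of frequently used definitions (a sketch; the name and location of the profile file depend on your setup):

     # Profile fragment: functions defined here are recreated
     # in every new shell session.
     function say-greeting {
         param($name)
         if (! $name) { "Hello, World" } else { "Hello, $name" }
     }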



