The line input operator <filehandle> reads either a single line from a stream in scalar context, or the entire contents of a stream in list context. Which method you should use depends on your need for efficiency, access to the lines read, and other factors such as syntactic convenience. The line-at-a-time method is the most efficient in terms of memory, and is as fast as "ordinary" alternatives. The implicit while (<>) form is equivalent in speed to the corresponding explicit code:
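The original listing is not reproduced here, so the following is only a minimal sketch of the two forms being compared. The sample data, the in-memory filehandles, and the variable names are assumptions made so that the example runs without an external file.

```perl
use strict;
use warnings;

# Hypothetical sample data; the last line is "0" with no trailing newline.
my $text = "alpha\nbeta\n0";

# Implicit form: each line is assigned to $_ and tested with defined.
open my $in1, '<', \$text or die "open failed: $!";
my $implicit = 0;
while (<$in1>) {
    $implicit++;
}
close $in1;

# Explicit form, spelling out the assignment and the defined test.
open my $in2, '<', \$text or die "open failed: $!";
my $explicit = 0;
while (defined(my $line = <$in2>)) {
    $explicit++;
}
close $in2;

print "implicit: $implicit, explicit: $explicit\n";   # implicit: 3, explicit: 3
```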
Note the use of the defined operator. This prevents the loop from missing a line if the very last line of a file is the single character "0" with no terminating newline. That is not a likely occurrence, but it can't hurt to be careful. You can use a similar syntax with a foreach loop to read the entire file into memory in a single operation:
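A sketch of the foreach form, under the same assumptions as before (hypothetical sample data read through an in-memory filehandle). Because foreach evaluates the operator in list context, every line is read before the first iteration begins.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for a real file.
my $text = "one\ntwo\nthree\n";
open my $fh, '<', \$text or die "open failed: $!";

my @seen;
foreach my $line (<$fh>) {    # list context: the whole stream is read up front
    chomp $line;
    push @seen, $line;
}
close $fh;

print scalar(@seen), " lines read\n";   # 3 lines read
```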
The all-at-once method uses more memory than the line-at-a-time method, but it is potentially faster. If all you want to do is step through the lines in a short file, it won't likely matter which method you use. All-at-once has its advantages when combined with operations like sorting:
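For instance, sorting needs every line in hand before it can produce any output, so list context is a natural fit. This is a minimal sketch with hypothetical sample data, not the original listing.

```perl
use strict;
use warnings;

# Hypothetical unsorted input.
my $text = "pear\napple\nmango\n";
open my $fh, '<', \$text or die "open failed: $!";

# The operator is in list context, so sort receives all lines at once.
my @sorted = sort(<$fh>);
close $fh;

print @sorted;   # apple, mango, pear, one per line
```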
All-at-once may be appropriate if you need access to more than one line at a time:
Read in a file all at once to manipulate more than one line at a time.
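As an illustration, the sketch below prints each matching line together with its neighbors. The task, the TARGET pattern, and the sample data are assumptions chosen for the example.

```perl
use strict;
use warnings;

# Hypothetical input; TARGET marks the line of interest.
my $text = "a\nb\nTARGET\nc\nd\n";
open my $fh, '<', \$text or die "open failed: $!";
my @lines = <$fh>;            # whole file in memory
close $fh;

my @context;
for my $i (0 .. $#lines) {
    next unless $lines[$i] =~ /TARGET/;
    # Clamp the window so matches on the first or last line still work.
    my $first = $i > 0       ? $i - 1 : 0;
    my $last  = $i < $#lines ? $i + 1 : $#lines;
    push @context, @lines[$first .. $last];
}

print @context;   # b, TARGET, c, one per line
```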
Many of these situations can still be handled with line-at-a-time input, although the code is definitely more complex:
Use a queue to manipulate more than one line at a time.
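One way to sketch the queue approach is a fixed three-line window that is shifted with a slice assignment as each new line arrives. The window size, the TARGET pattern, and the sample data are assumptions, and for brevity this version does not report a match on the very first or very last line.

```perl
use strict;
use warnings;

# Hypothetical input; TARGET marks the line of interest.
my $text = "a\nb\nTARGET\nc\nd\n";
open my $fh, '<', \$text or die "open failed: $!";

my @queue;      # holds at most three consecutive lines
my @context;
while (defined(my $line = <$fh>)) {
    if (@queue == 3) {
        @queue[0 .. 1] = @queue[1 .. 2];   # slice assignment shifts the window
        $queue[2] = $line;
    } else {
        push @queue, $line;                # still filling the window
    }
    # Once the window is full, the middle line has both neighbors available.
    push @context, @queue if @queue == 3 && $queue[1] =~ /TARGET/;
}
close $fh;

print @context;   # b, TARGET, c, one per line
```

Only three lines are held in memory at any time, regardless of the size of the input.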
Maintaining a queue of lines of text with slice assignments makes this slower than the equivalent all-at-once code, but this technique works for arbitrarily large input. The queue could also be implemented with an index variable rather than a slice assignment, which would result in more complex but faster-running code.
If your goal is simply to read a file into memory as quickly as possible, you might consider clearing the input record separator variable $/ (setting it to undef) and reading the entire file as a single string. This will read the contents of a file or stream much faster than either of the alternatives above:
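A minimal sketch of this approach follows. Localizing $/ inside a do block is one common way to limit the change to a single read; the sample data and variable names are assumptions.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for a real file.
my $text = "x\ny\nz\n";
open my $fh, '<', \$text or die "open failed: $!";

# local $/ leaves the separator undefined inside the block only,
# so a single read returns everything up to end of file.
my $contents = do { local $/; <$fh> };
close $fh;

print length($contents), " characters read\n";   # 6 characters read
```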
Finally, the read and sysread operators are useful for quickly scanning a file if line boundaries are of no importance:
Use read or sysread for maximum speed.
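As a sketch, the loop below counts newlines by reading fixed-size blocks with read. The 4096-byte block size and the sample data are assumptions. sysread takes the same arguments but bypasses Perl's buffering; it is not used here because the example reads from an in-memory filehandle rather than a real file.

```perl
use strict;
use warnings;

# Hypothetical input: 1000 short lines held in memory.
my $text = "line\n" x 1000;
open my $fh, '<', \$text or die "open failed: $!";

my $buffer   = '';
my $newlines = 0;
# read returns the number of bytes read, 0 at end of file, undef on error.
while (read($fh, $buffer, 4096)) {
    $newlines += ($buffer =~ tr/\n//);   # count newlines in this block
}
close $fh;

print "$newlines newlines\n";   # 1000 newlines
```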