Recipe 19.11. Reading Lines with Continuation Characters



Credit: Alex Martelli

Problem

You have a file that includes long logical lines split over two or more physical lines, with backslashes to indicate that a continuation line follows. You want to process a sequence of logical lines, "rejoining" those split lines.

Solution

As usual, our first idea for a problem involving sequences should be a generator:

    def logical_lines(physical_lines, joiner=''.join):
        logical_line = []
        for line in physical_lines:
            stripped = line.rstrip()
            if stripped.endswith('\\'):
                # a line which continues w/the next physical line
                logical_line.append(stripped[:-1])
            else:
                # a line which does not continue, end of logical line
                logical_line.append(line)
                yield joiner(logical_line)
                logical_line = []
        if logical_line:
            # end of sequence implies end of last logical line
            yield joiner(logical_line)

    if __name__ == '__main__':
        text = 'some\\\n', 'lines\\\n', 'get\n', 'joined\\\n', 'up\n'
        for line in text:
            print('P:', repr(line))
        for line in logical_lines(text, ' '.join):
            print('L:', repr(line))

When run as a main script, this code emits:

    P: 'some\\\n'
    P: 'lines\\\n'
    P: 'get\n'
    P: 'joined\\\n'
    P: 'up\n'
    L: 'some lines get\n'
    L: 'joined up\n'
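Because the generator accepts any iterable of lines, it can also be fed a file object directly. The following is a minimal sketch in Python 3, not part of the original recipe: io.StringIO stands in for a file opened with open(), the sample contents are invented for illustration, and logical_lines is repeated so that the snippet runs on its own.

```python
import io

def logical_lines(physical_lines, joiner=''.join):
    # Same logic as the Solution's generator, repeated here so the
    # sketch is self-contained.
    logical_line = []
    for line in physical_lines:
        stripped = line.rstrip()
        if stripped.endswith('\\'):
            logical_line.append(stripped[:-1])
        else:
            logical_line.append(line)
            yield joiner(logical_line)
            logical_line = []
    if logical_line:
        yield joiner(logical_line)

# io.StringIO is a stand-in for a real file; the contents are made up.
fake_file = io.StringIO('name = \\\n"value"\nplain line\ndangling \\\n')
result = list(logical_lines(fake_file))
for line in result:
    print(repr(line))
# 'name = "value"\n'
# 'plain line\n'
# 'dangling '
```

The last item shows the "end of sequence" branch at work: a continuation line with nothing after it is still yielded, minus its backslash and line end.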

Discussion

This problem is about sequence-bunching, just like Recipe 19.10. It is therefore not surprising that this recipe, like the previous one, is a generator (with an internal structure quite similar to that recipe's): today, in Python, sequences are often processed most simply and effectively by means of generators.

In this recipe, the generator can encompass just a small amount of generality without introducing extra complexity. Determining whether a line is a continuation line, and how to proceed when it is, is slightly too idiosyncratic to generalize in a simple and transparent way. I have therefore chosen to code that functionality inline, in the body of the logical_lines generator, rather than "factoring it out" into separate callables. Remember, generality is good, but simplicity is even more important. However, I have kept the simple and transparent generality obtained by passing the joiner function as an argument, and the snippet of code under the if __name__ == '__main__' test demonstrates how we may want to use that generality, for example, to join continuation lines with a space rather than with an empty string.
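To make the effect of the joiner argument concrete, the sketch below (Python 3, with invented sample data and logical_lines repeated so that it runs standalone) passes the same physical lines through the default ''.join and through ' '.join.

```python
def logical_lines(physical_lines, joiner=''.join):
    # Same logic as the Solution's generator, repeated here so the
    # sketch is self-contained.
    logical_line = []
    for line in physical_lines:
        stripped = line.rstrip()
        if stripped.endswith('\\'):
            logical_line.append(stripped[:-1])
        else:
            logical_line.append(line)
            yield joiner(logical_line)
            logical_line = []
    if logical_line:
        yield joiner(logical_line)

# Illustrative input: three physical lines forming one logical line.
physical = ['first\\\n', 'second\\\n', 'third\n']

tight = list(logical_lines(physical))              # default joiner: ''.join
spaced = list(logical_lines(physical, ' '.join))   # pieces separated by a space

print(tight)    # ['firstsecondthird\n']
print(spaced)   # ['first second third\n']
```

Any callable that takes a list of strings and returns a single string should work as the joiner, since the generator does nothing with it except call it on the accumulated pieces.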

If you are certain that the file you're processing is small enough to fit comfortably in your computer's memory, with room to spare for processing, and you don't need the feature (offered by the version of logical_lines shown in the Solution) of ignoring whitespace to the right of a terminating backslash, a plain function is simpler than the generator shown in this recipe's Solution:

    def logical_lines(physical_lines, joiner=''.join, separator=''):
        return joiner(physical_lines).replace('\\\n', separator).splitlines(True)

In this variant, we join all of the physical lines into one long string, replace the "canceled" line ends (line ends immediately preceded by a backslash) with nothing (or with whatever separator we're asked to use), and finally split the resulting long string back into lines, keeping the line ends (that's what the True argument to method splitlines is for). This approach is very different from the one in this recipe's Solution, but it may be worthwhile if physical_lines is small enough that you can afford the memory for it. I prefer the Solution's approach because giving semantic significance to trailing whitespace is a poor user-interface design choice.
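The difference in how the two versions treat whitespace after the backslash can be checked directly. In the sketch below (Python 3), the names logical_lines_gen and logical_lines_str are hypothetical, chosen only so that both versions can coexist in one script, and the sample lines are invented for illustration.

```python
def logical_lines_gen(physical_lines, joiner=''.join):
    # Generator version from the Solution (renamed for this comparison).
    logical_line = []
    for line in physical_lines:
        stripped = line.rstrip()
        if stripped.endswith('\\'):
            logical_line.append(stripped[:-1])
        else:
            logical_line.append(line)
            yield joiner(logical_line)
            logical_line = []
    if logical_line:
        yield joiner(logical_line)

def logical_lines_str(physical_lines, joiner=''.join, separator=''):
    # Plain-function variant from the Discussion (renamed for this comparison).
    return joiner(physical_lines).replace('\\\n', separator).splitlines(True)

clean = ['alpha \\\n', 'beta\n']       # backslash immediately before the line end
padded = ['alpha \\  \n', 'beta\n']    # two spaces between backslash and line end

gen_clean = list(logical_lines_gen(clean))
str_clean = logical_lines_str(clean)
gen_padded = list(logical_lines_gen(padded))
str_padded = logical_lines_str(padded)

print(gen_clean, str_clean)   # ['alpha beta\n'] ['alpha beta\n']
print(gen_padded)             # ['alpha beta\n']
print(str_padded)             # ['alpha \\  \n', 'beta\n']
```

With clean input the two agree; with stray spaces after the backslash, only the generator still rejoins the lines, while the string-based variant leaves them as two separate lines.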

See Also

Recipe 19.10; Perl Cookbook recipe 8.1; Chapter 1 for general issues about handling text; Chapter 2 for general issues about handling files.



Python Cookbook
ISBN: 0596007973
Year: 2004
Pages: 420
