Representing Unprintable Characters

Problem

You need to make reference to a control character, a strange UTF-8 character, or some other character that's not on your keyboard.

Solution

Ruby gives you a number of escaping mechanisms to refer to unprintable characters. By using one of these mechanisms within a double-quoted string, you can put any binary character into the string.

You can reference any binary character by encoding its octal representation into the format "\000", or its hexadecimal representation into the format "\x00".

	octal = "\000\001\010\020"
	octal.each_byte { |x| puts x }
	# 0
	# 1
	# 8
	# 16

	hexadecimal = "\x00\x01\x10\x20"
	hexadecimal.each_byte { |x| puts x }
	# 0
	# 1
	# 16
	# 32
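The two escape styles are interchangeable: any byte can be spelled either way. The following minimal sketch spells the same four bytes both ways and also builds them from plain integers with `Array#pack`. It assumes Ruby 2.0 or later, where `String#bytes` returns an Array. The variable names are illustrative only.

	# Octal (\ooo) and hexadecimal (\xHH) escapes can spell the same bytes.
	from_octal = "\000\001\010\020"
	from_hex   = "\x00\x01\x08\x10"

	puts from_octal == from_hex                # both hold bytes 0, 1, 8, 16
	p from_octal.bytes                         # => [0, 1, 8, 16]
	p [0, 1, 8, 16].pack('C*') == from_octal   # same bytes built from integers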

This makes it possible to represent UTF-8 characters even when you can't type them or display them in your terminal. Try running this program, and then opening the generated file smiley.html in your web browser:

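	open('smiley.html', 'wb') do |f|
	  # Declare UTF-8 so the browser decodes the next three bytes as one character
	  f << '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">'
	  f << "\xe2\x98\xBA"  # the UTF-8 encoding of a smiley face
	end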
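Ruby 1.9 and later also accept a `\u` escape that names the Unicode code point directly, so you do not have to work out the UTF-8 bytes yourself. The following sketch assumes Ruby 1.9 or later and a UTF-8 source file (the default since Ruby 2.0). It checks that `"\u263A"` (WHITE SMILING FACE) holds exactly the three bytes written above.

	# \u names a code point; Ruby produces the UTF-8 bytes for it
	smiley_escape = "\u263A"          # WHITE SMILING FACE
	smiley_bytes  = "\xE2\x98\xBA"    # the same character spelled byte by byte

	p smiley_escape.bytes             # => [226, 152, 186]
	p smiley_escape == smiley_bytes   # => true (assumes a UTF-8 source file)
	p smiley_escape.unpack('U*')      # => [9786], i.e. 0x263A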

The most common unprintable characters (such as newline) have special mnemonic aliases consisting of a backslash and a letter.

	"\a" == "\x07" # => true # ASCII 0x07 = BEL (Sound system bell)
	"\b" == "\x08" # => true # ASCII 0x08 = BS (Backspace)
	"\e" == "\x1b" # => true # ASCII 0x1B = ESC (Escape)
	"\f" == "\x0c" # => true # ASCII 0x0C = FF (Form feed)
	"\n" == "\x0a" # => true # ASCII 0x0A = LF (Newline/line feed)
	"\r" == "\x0d" # => true # ASCII 0x0D = CR (Carriage return)
	"\t" == "\x09" # => true # ASCII 0x09 = HT (Tab/horizontal tab)
	"\v" == "\x0b" # => true # ASCII 0x0B = VT (Vertical tab)

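To look up the byte values programmatically, you can keep the aliases in a table. The following sketch does that with a hash named `CONTROL_ALIASES`, which is an illustrative name and not part of Ruby. It prints each ASCII control-code name next to the byte its mnemonic produces.

	# Hypothetical lookup table: ASCII control-code name => mnemonic escape
	CONTROL_ALIASES = {
	  'BEL' => "\a", 'BS' => "\b", 'ESC' => "\e", 'FF' => "\f",
	  'LF'  => "\n", 'CR' => "\r", 'HT'  => "\t", 'VT' => "\v"
	}

	CONTROL_ALIASES.each do |name, char|
	  # ord gives the byte value of the single-character string
	  printf("%-3s 0x%02X\n", name, char.ord)
	end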
 

Discussion

Ruby stores a string as a sequence of bytes. It makes no difference whether those bytes are printable ASCII characters, binary characters, or a mix of the two.

When Ruby prints out a human-readable string representation of a binary character, it uses the character's three-digit octal escape (such as \020). Characters that have a backslash-letter mnemonic (such as \n) are printed as the mnemonic. Printable characters are output as their printable representation, even if another representation was used to create the string. (This is Ruby 1.8's behavior; Ruby 1.9 and later print \xHH or \uHHHH escapes instead of octal ones.)

	"\x10\x11\xfe\xff" # => "\020\021\376\377"
	"\x48\145\x6c\x6c\157\x0a" # => "Hello\n"

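The mnemonic and printable-character rules still hold in current Ruby, and you can see them by calling `inspect` directly. The following sketch assumes Ruby 1.9 or later. It shows a string built from mixed escapes coming back as plain text, and raw control bytes coming back as their mnemonics.

	# Printable bytes come back as characters, control bytes as mnemonics,
	# regardless of which escapes were used to build the string.
	p "\x48\145\x6c\x6c\157\x0a"     # prints "Hello\n"
	puts "\x07\x1b\x0a".inspect      # prints "\a\e\n"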
To avoid confusion with the mnemonic characters, a literal backslash in a string is represented by two backslashes. For instance, the two-character string consisting of a backslash and the 14th letter of the alphabet is represented as "\\n".

	"\\".size # => 1
	"\\" == "\x5c" # => true
	"\\n"[0] == ?\\ # => true
	"\\n"[1] == ?n # => true
	"\\n" =~ /\n/ # => nil
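Comparing the raw bytes makes the difference concrete. In this short sketch, `"\\n"` holds two bytes (92 for the backslash, 110 for `n`), while `"\n"` holds the single newline byte 10. The variable names are illustrative.

	escaped = "\\n"     # backslash followed by n: two characters
	newline = "\n"      # one newline character

	p escaped.bytes     # => [92, 110]
	p newline.bytes     # => [10]
	p escaped.size      # => 2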

Ruby also provides special shortcuts for representing keyboard sequences like Control-C. "\C-x" represents the sequence you get by holding down the Control key and hitting the x key, and "\M-x" represents the sequence you get by holding down the Alt (or Meta) key and hitting the x key:

	"\C-a\C-b\C-c" # => "\001\002\003"
	"\M-a\M-b\M-c" # => "\341\342\343"

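The arithmetic behind these shortcuts is simple for ASCII letters. Control keeps only the low five bits of the letter's code, and Meta sets the high bit (0x80). The following sketch uses two helpers, `ctrl_byte` and `meta_byte`. Both are hypothetical names written for illustration; Ruby does not provide them. The sketch checks them against the escapes above and only claims to cover ASCII letters.

	# Hypothetical helper: the byte produced by \C-<letter>
	def ctrl_byte(letter)
	  letter.ord & 0x1f
	end

	# Hypothetical helper: the byte produced by \M-<letter>
	def meta_byte(letter)
	  letter.ord | 0x80
	end

	p %w[a b c].map { |l| ctrl_byte(l) }   # => [1, 2, 3]
	p %w[a b c].map { |l| meta_byte(l) }   # => [225, 226, 227]
	p "\C-a\C-b\C-c".bytes                 # => [1, 2, 3]
	p "\M-a\M-b\M-c".bytes                 # => [225, 226, 227]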
Shorthand representations of binary characters can be used whenever Ruby expects a character. For instance, in Ruby 1.8 you can get the decimal byte number of a special character by prefixing it with ?. (In Ruby 1.9 and later, ? produces a one-character String instead, and you can read its byte with getbyte(0).) You can also use shorthand representations in regular expression character ranges.

	?\C-a # => 1
	?\M-z # => 250
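In Ruby 1.9 and later, the `?` literals above evaluate to strings, so you need one more call to get the byte number. The following sketch assumes a UTF-8 source file. It uses `getbyte(0)` for the Meta character because that byte (250) is not valid UTF-8 on its own, and `getbyte` reads the raw byte without decoding it.

	p ?\C-a.ord            # => 1
	p ?\M-z.getbyte(0)     # => 250
	p ?\C-a == "\C-a"      # => true; ?x is just a one-character string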

	contains_control_chars = /[\C-a-\C-^]/
	'Foobar' =~ contains_control_chars # => nil
	"Foo\C-zbar" =~ contains_control_chars # => 3

	contains_upper_chars = /[\x80-\xff]/
	'Foobar' =~ contains_upper_chars # => nil
	"Foo\212bar" =~ contains_upper_chars # => 3

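In Ruby 1.9 and later, a regexp such as `/[\x80-\xff]/` in a UTF-8 source file is typically rejected as an "invalid multibyte escape", because those bytes are not valid UTF-8 characters. This sketch shows one way around that, assuming a UTF-8 source file. It writes the control-character range with plain ASCII hex escapes, and it finds high bytes by scanning the string's bytes instead of using a regexp. The helper name `first_high_byte` is hypothetical.

	# Bytes 0x01..0x1E: the same range as \C-a..\C-^
	contains_control_chars = /[\x01-\x1e]/
	p 'Foobar' =~ contains_control_chars       # => nil
	p "Foo\C-zbar" =~ contains_control_chars   # => 3

	# Hypothetical helper: index of the first byte in 0x80..0xFF, or nil
	def first_high_byte(str)
	  str.bytes.index { |b| b >= 0x80 }
	end

	p first_high_byte('Foobar')                # => nil
	p first_high_byte("Foo\212bar")            # => 3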
Here's a sinister application that scans logged keystrokes for special characters:

	def snoop_on_keylog(input)
	  input.each_byte do |b|
	    # In Ruby 1.8, ?\C-c and friends are Integers, so they match b directly
	    case b
	    when ?\C-c; puts 'Control-C: stopped a process?'
	    when ?\C-z; puts 'Control-Z: suspended a process?'
	    when ?\n; puts 'Newline.'
	    when ?\M-x; puts 'Meta-x: using Emacs?'
	    end
	  end
	end

	snoop_on_keylog("ls -ltR\003emacsHello\012\370rot13-other-window\012\032")
	# Control-C: stopped a process?
	# Newline.
	# Meta-x: using Emacs?
	# Newline.
	# Control-Z: suspended a process?
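Under Ruby 1.9 and later, `?\C-c` is a String while `each_byte` yields Integers, so the `case` above would never match. The following sketch is a variant for those versions. It converts each shorthand to its byte value once, up front. It also collects the findings in an array instead of printing them, which makes the result easy to check. The constant names and `keylog_findings` are hypothetical, chosen for this illustration.

	# Byte values of the shorthand characters; getbyte(0) works even for
	# "\M-x", which is not valid UTF-8 on its own.
	CTRL_C  = "\C-c".getbyte(0)   # 3
	CTRL_Z  = "\C-z".getbyte(0)   # 26
	NEWLINE = "\n".getbyte(0)     # 10
	META_X  = "\M-x".getbyte(0)   # 248

	# Hypothetical variant of snoop_on_keylog that returns its findings
	def keylog_findings(input)
	  findings = []
	  input.each_byte do |b|
	    case b
	    when CTRL_C  then findings << 'Control-C: stopped a process?'
	    when CTRL_Z  then findings << 'Control-Z: suspended a process?'
	    when NEWLINE then findings << 'Newline.'
	    when META_X  then findings << 'Meta-x: using Emacs?'
	    end
	  end
	  findings
	end

	puts keylog_findings("ls -ltR\003emacsHello\012\370rot13-other-window\012\032")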

Special characters are interpreted only in strings delimited by double quotes or created with %{} or %Q{}. They are not interpreted (apart from \\ and \') in strings delimited by single quotes or created with %q{}. You can take advantage of this when you need to show escape sequences literally to the end user, or when you need to create a string containing a lot of backslashes.

	puts "foo\tbar"
	# foo     bar
	puts %{foo\tbar}
	# foo     bar
	puts %Q{foo\tbar}
	# foo     bar

	puts 'foo\tbar'
	# foo\tbar
	puts %q{foo\tbar}
	# foo\tbar

If you come to Ruby from Python, this feature can take advantage of you, making you wonder why the special characters in your single-quoted strings aren't treated as special. If you need to create a string with special characters and a lot of embedded double quotes, use the %{} construct.
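You can see the difference by comparing string lengths. The double-quoted `\t` collapses to one tab character, while the single-quoted version keeps both the backslash and the `t`. The sketch also shows `%{}` holding embedded double quotes and a working `\n` escape. The HTML snippet is made up for illustration.

	double = "foo\tbar"   # \t becomes a single tab character
	single = 'foo\tbar'   # backslash and t are kept as two characters
	p double.length       # => 7
	p single.length       # => 8

	# %{} interprets escapes like double quotes, but " needs no escaping
	link = %{<a href="index.html">Home</a>\n}
	p link.end_with?("\n")   # => true
	p link.count('"')        # => 2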

Ruby Cookbook (O'Reilly Cookbooks)
ISBN: 0596523696
Pages: 399