Section 6.1. Symbols | The Ruby Way, Second Edition: Solutions and Techniques in Ruby Programming (2nd Edition)

6.1. Symbols

A symbol in Ruby is an instance of the class Symbol. The syntax is simple in the typical case: a colon followed by an identifier.

A symbol is like a string in that it corresponds to a sequence of characters. It is unlike a string in that each symbol has only one instance (just as with a Fixnum). This has consequences for memory use and performance. For example, in the following code, the string "foo" is stored as three separate objects in memory, but the symbol :foo is stored as a single object (referenced more than once):

array = ["foo", "foo", "foo", :foo, :foo, :foo]
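One way to check this claim is to count distinct object IDs. The following is only a minimal sketch: the helper name distinct_objects is invented for illustration, and the strings are built with String.new so that the result does not depend on how a particular Ruby version treats string literals.

```ruby
# Count how many distinct objects a list refers to.
# (distinct_objects is a hypothetical helper, not part of Ruby.)
def distinct_objects(list)
  list.map { |item| item.object_id }.uniq.size
end

strings = Array.new(3) { String.new("foo") }   # three separate String objects
symbols = [:foo, :foo, :foo]                   # one Symbol, referenced three times

puts distinct_objects(strings)   # 3
puts distinct_objects(symbols)   # 1
```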

Some people are confused by the leading colon on a symbol name. There is no need for confusion; it's a simple matter of syntax. Strings, arrays, and hashes have both beginning and ending delimiters; a symbol has only a beginning delimiter. Think of it as a unary delimiter rather than a binary one. You may consider the syntax strange at first, but there is no mystery.

It's worth mentioning that in older versions of Ruby (prior to 1.6), a symbol constant was not a first-class object as such but was translated into a Fixnum and stored. This is still true internally; a symbol corresponds to a number and is stored as an immediate value. The number can be retrieved with to_i, but there is little need for it.

According to Jim Weirich, a symbol is "an object that has a name." Austin Ziegler prefers to say "an object that is a name." In any case, there is a one-to-one correspondence between symbols and names. What kinds of things do we need to apply names to? Such things as variables, methods, and arbitrary constants.

One common use of symbols is to represent the name of a variable or method. For example, we know that if we want to add a read/write attribute to a class, we can do it this way:

class SomeClass
  attr_accessor :whatever
end

This is equivalent to saying:

class SomeClass
  def whatever
    @whatever
  end
  def whatever=(val)
    @whatever = val
  end
end

In other words, the symbol :whatever tells the attr_accessor method that the "getter" and "setter" (as well as the instance variable) will all be given names corresponding to that symbol.
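To see that one symbol really does produce all three names, we can exercise the generated methods and then peek at the instance variable directly. This is a small sketch; the value 42 and the variable obj are arbitrary.

```ruby
class SomeClass
  attr_accessor :whatever
end

obj = SomeClass.new
obj.whatever = 42                            # calls the generated writer, whatever=
puts obj.whatever                            # 42
puts obj.instance_variable_get(:@whatever)   # 42
puts obj.respond_to?(:whatever=)             # true
```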

You might well ask why we couldn't use a string instead. As it happens, we could. Many or most core methods that expect symbols are content to get strings instead.

attr_reader :alpha
attr_reader "beta"   # This is also legal

In fact, a symbol is "like" a string in that it corresponds to a sequence of characters. This leads some people to say that "a symbol is just an immutable string." However, the Symbol class does not inherit from String, and the typical operations we might apply to a string are not necessarily applicable to symbols.

Another misunderstanding is to think that symbols necessarily correspond directly to identifiers. This leads some people to talk of "the symbol table" (as they would in referring to an assembled object program). But this is not really a useful concept; although symbols are certainly stored in a kind of table internally, Ruby does not expose the table as an entity we can access, and we as programmers don't care that it is there.

What is more, symbols need not even look like identifiers. Typically they do, whatever that means; but they can also contain punctuation if they are enclosed in quotes. These are also valid Ruby symbols:

sym1 = :"This is a symbol"
sym2 = :"This is, too!"
sym3 = :")(*&^%$"            # and even this

You could even use such symbols to define instance variables and methods, but then you would need such techniques as send and instance_variable_get to reference them. In general, such a thing is not recommended.
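As a rough illustration of the method case, here is a sketch using define_method and send. The class Oddity and the method name are made up for the example. Only a method is shown, since the interpreter checks instance variable names more strictly than method names.

```ruby
class Oddity   # hypothetical class, for illustration only
  # define_method accepts any symbol as a method name, even one with a space.
  define_method(:"say hello!") { "hello" }
end

odd = Oddity.new
puts odd.send(:"say hello!")          # hello
puts odd.respond_to?(:"say hello!")   # true
# There is no way to write this call in ordinary obj.method syntax.
```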

6.1.1. Symbols As Enumerations

Languages such as Pascal and later versions of C have the concept of an enumerated type. Ruby can't really have such a thing; there is no type checking anyhow. But symbols are frequently useful for their mnemonic value; we might represent directions as :north, :south, :east, and :west.

It might be a little clearer to store these as constants.

North, South, East, West = :north, :south, :east, :west

If these were strings rather than symbols, defining them as constants would save memory, but each symbol exists only once in object space anyhow. (Symbols, like Fixnums, are stored as immediate values.)
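Here is one way such constants might be used. The CLOCKWISE table and the turn_right method are assumptions invented for this sketch; they are not part of any library.

```ruby
North, South, East, West = :north, :south, :east, :west

# Hypothetical lookup table: the direction reached by a 90-degree clockwise turn.
CLOCKWISE = { North => East, East => South, South => West, West => North }

def turn_right(direction)
  CLOCKWISE.fetch(direction)   # raises KeyError for anything that is not a direction
end

puts turn_right(North).inspect   # :east
puts North.equal?(:north)        # true (the constant and the literal are one object)
```

Using fetch rather than [] gives us a crude substitute for the type checking that a real enumerated type would provide.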

6.1.2. Symbols As Metavalues

Frequently we use exceptions as a way of avoiding return codes. But if you prefer to use return codes, you can. The fact that Ruby's methods are not limited to a single return type makes it possible to pass back "out of band" values.

We frequently have need for such values. At one time, the ASCII NUL character was considered to be not a character at all. C has the idea of the NULL pointer, Pascal has the nil pointer, SQL has NULL, and so on. Ruby, of course, has nil.

The trouble with such metavalues is that they keep getting absorbed into the set of valid values. Everyone today considers NUL a true ASCII character. And in Ruby, nil isn't really a non-object; it can be stored and manipulated. Thus we have minor annoyances such as hash[key] returning nil; did it return nil because the key was not found, or because the key is really associated with a nil?
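The hash ambiguity can be made concrete. In this sketch the helper lookup and the marker :missing are our own inventions; key? and fetch are standard Hash methods.

```ruby
# Hypothetical helper: return the stored value, or :missing if the key is absent.
def lookup(hash, key)
  hash.fetch(key, :missing)
end

settings = { :color => nil }

puts settings[:color].inspect           # nil (key present, value is nil)
puts settings[:size].inspect            # nil (key absent)
puts settings.key?(:size)               # false
puts lookup(settings, :color).inspect   # nil
puts lookup(settings, :size).inspect    # :missing
```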

The point here is that symbols can sometimes be used as good metavalues. Imagine a method that somehow grabs a string from the network (perhaps via http or something similar). If we want, we can return nonstring values to indicate exceptional occurrences.

str = get_string
case str
  when String
    # Proceed normally
  when :eof
    # end of file, socket closed, whatever
  when :error
    # I/O or network error
  when :timeout
    # didn't get a reply
end

Is this really "better" or clearer than using exceptions? Not necessarily. But it is a technique to keep in mind, especially when you want to deal with conditions that may be "edge cases" but not necessarily errors.
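For experimentation, the case statement can be run against a stand-in for the network. Everything here is hypothetical: this get_string reads from an array passed as an argument rather than from a socket, a nil entry simulates a timeout, and describe merely reports which branch was taken.

```ruby
# Hypothetical stand-in for a network read: returns a String or a symbol.
def get_string(source)
  return :eof if source.empty?
  item = source.shift
  item.nil? ? :timeout : item   # a nil entry simulates a missing reply
end

def describe(result)
  case result
  when String   then "got #{result.inspect}"
  when :eof     then "end of input"
  when :error   then "I/O or network error"
  when :timeout then "no reply"
  end
end

queue = ["hello", nil]
3.times { puts describe(get_string(queue)) }
# got "hello"
# no reply
# end of input
```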

6.1.3. Symbols, Variables, and Methods

Probably the best known use of symbols is in defining attributes on a class:

class MyClass
  attr_reader :alpha, :beta
  attr_writer :gamma, :delta
  attr_accessor :epsilon
  # ...
end

Bear in mind that there is some code at work here. For example, attr_accessor uses the symbol name to determine the name of the instance variable and the reader and writer methods. That does not mean that there is always an exact correspondence between that symbol and that instance variable name. For example, if we use instance_variable_set, we have to specify the exact name of the variable, including the at-sign:

sym1 = :@foo
sym2 = :foo
instance_variable_set(sym1,"str")   # Works
instance_variable_set(sym2,"str")   # error

In short, a symbol passed into the attr family of methods is just an argument, and these methods create instance variables and methods as needed, based on the value of that symbol. (The writer has an equal sign appended to the end, and the instance variable name has an at-sign added to the front.) In other cases, the symbol must exactly correspond to the identifier it is referencing.
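The difference is easy to observe on an ordinary object. In this sketch the class Holder is a hypothetical empty container; the failing call raises a NameError, which we rescue so that the script keeps running.

```ruby
class Holder; end   # hypothetical empty class, used only as a container

holder = Holder.new
holder.instance_variable_set(:@foo, "str")    # works: the name includes the at-sign
puts holder.instance_variable_get(:@foo)      # str

begin
  holder.instance_variable_set(:foo, "str")   # no at-sign
rescue NameError
  puts "NameError: :foo is not a valid instance variable name"
end
```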

In most, if not all cases, methods that expect symbols can also take strings. The reverse is not necessarily true.
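A few quick experiments illustrate both halves of that statement. This is only a sketch with arbitrary sample data; the methods shown are ordinary core methods.

```ruby
puts "ruby".send("upcase")          # RUBY (method name given as a string)
puts "ruby".respond_to?("upcase")   # true

prices = { "apple" => 3 }
puts prices["apple"].inspect        # 3
puts prices[:apple].inspect         # nil (a symbol is not the same key as a string)

begin
  "abc" + :def                      # String#+ expects a string, not a symbol
rescue TypeError
  puts "TypeError"
end
```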

6.1.4. Converting to/from Symbols

Strings and symbols can be freely interconverted with the to_s and to_sym methods:

a = "foobar"
b = :foobar
a == b.to_s      # true
b == a.to_sym    # true
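A slightly longer round trip is sketched below, with arbitrary sample strings. Note that converting a string containing a space yields one of the quoted symbols described earlier.

```ruby
name = "foobar"
sym  = name.to_sym

puts sym.inspect                    # :foobar
puts sym.to_s                       # foobar
puts sym.equal?(:foobar)            # true (to_sym returns the one existing symbol)
puts "hello world".to_sym.inspect   # :"hello world"
```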

If you're doing metaprogramming, the following method might prove useful sometimes.

class Symbol
  def +(other)
    (self.to_s + other.to_s).to_sym
  end
end

The preceding method allows us to concatenate symbols (or append a string onto a symbol). The following is an example that uses it; this trivial piece of code accepts a symbol and tries to tell us whether it represents an accessor (that is, a reader and writer both exist):

class Object
  def accessor?(sym)
    return (self.respond_to?(sym) and self.respond_to?(sym+"="))
  end
end
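Here is how the two pieces might be tried out together. Both definitions are repeated in condensed form so that the sketch runs on its own; the class Point is a hypothetical example with one full accessor and one reader.

```ruby
class Symbol
  # Concatenate a symbol with another symbol or a string.
  def +(other)
    (to_s + other.to_s).to_sym
  end
end

class Object
  # True if both a reader and a writer exist for the given name.
  def accessor?(sym)
    respond_to?(sym) && respond_to?(sym + "=")
  end
end

class Point   # hypothetical example class
  attr_accessor :x
  attr_reader :y
end

pt = Point.new
puts pt.accessor?(:x)        # true
puts pt.accessor?(:y)        # false (reader only)
puts((:x + "=").inspect)     # :x=
```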

There is a clever usage of symbols that I'll mention here. When we do a map operation, sometimes a complex block may be attached. But in many cases, we are simply calling a method on each element of the array or collection:

list = words.map {|x| x.capitalize }

In such a case, it may seem we are doing a little too much punctuation for the benefit we're getting. Let's open the Symbol class and define a to_proc method. This ensures that any symbol can be coerced into a proc object. But what proc should we return? Obviously, one corresponding to the symbol itself in the context of the object; in other words, one that will send the symbol itself as a message to the object.

class Symbol
  def to_proc
    proc {|obj, *args| obj.send(self, *args) }
  end
end

This code, by the way, came from Gavin Sinclair's "Ruby Extensions" project. With this method in place, we can rewrite our original code fragment:

list = words.map(&:capitalize)

It's worth spending a minute understanding how this works. The map method ordinarily takes only a block (no other parameters). The ampersand notation allows us to pass a proc instead of an explicit attached block if we want. Because we use the ampersand on an object that isn't a proc, the interpreter tries to call to_proc on that object. The resulting proc takes the place of an explicit block so that map will call it repeatedly, once for each element in the array. Now, why does self make sense as the thing passed as a message to the array element? It's because a proc is a closure and therefore remembers the context in which it was created. At the time it was created, self referred to the symbol on which the to_proc was called.
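Each step of that explanation can be tried by hand. Ruby 1.9 and later (and 1.8.7) ship with Symbol#to_proc built in, so this sketch needs no definition of its own; the word list is arbitrary sample data.

```ruby
words = ["alpha", "beta", "gamma"]

capitalizer = :capitalize.to_proc   # a proc that sends :capitalize to its argument
puts capitalizer.call("delta")      # Delta

explicit  = words.map { |x| x.capitalize }
shorthand = words.map(&:capitalize)
puts shorthand.inspect              # ["Alpha", "Beta", "Gamma"]
puts explicit == shorthand          # true
```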