4.2. Data Types

The operation of a Python program hinges on the data it handles. All data values in Python are objects, and each object, or value, has a type. An object's type determines which operations the object supports, or, in other words, which operations you can perform on the data value. The type also determines the object's attributes and items (if any) and whether the object can be altered. An object that can be altered is known as a mutable object, while one that cannot be altered is an immutable object. I cover object attributes and items in detail in "Object attributes and items" on page 46.

The built-in type(obj) accepts any object as its argument and returns the type object that is the type of obj. Built-in function isinstance(obj, type) returns True if object obj has type type (or any subclass thereof); otherwise, it returns False.
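A minimal sketch of the difference between the two built-ins. The classes Base and Derived are hypothetical, defined only for illustration: type() reports the exact type, while isinstance() also accepts any superclass.

```python
class Base(object):          # hypothetical classes, for illustration only
    pass

class Derived(Base):
    pass

obj = Derived()
assert type(obj) is Derived          # type() returns the exact type object
assert type(obj) is not Base
assert isinstance(obj, Derived)      # isinstance() accepts the exact type...
assert isinstance(obj, Base)         # ...or any of its superclasses
assert not isinstance(obj, int)
```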

Python has built-in types for fundamental data types such as numbers, strings, tuples, lists, and dictionaries, as covered in the following sections. You can also create user-defined types, known as classes, as discussed in "Classes and Instances" on page 82.

4.2.1. Numbers

The built-in number objects in Python support integers (plain and long), floating-point numbers, and complex numbers. In Python 2.4, the standard library also offers decimal floating-point numbers, covered in "The decimal Module" on page 372. All numbers in Python are immutable objects, meaning that when you perform any operation on a number object, you always produce a new number object. Operations on numbers, also known as arithmetic operations, are covered in "Numeric Operations" on page 52.
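Because numbers are immutable, an operation that seems to "change" a number actually binds a name to a new number object. A short sketch:

```python
x = 5
y = x          # x and y now refer to the same int object
x += 1         # computes a new int object (6) and rebinds x to it
assert x == 6
assert y == 5  # the object y refers to was never altered
```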

Note that numeric literals do not include a sign: a leading + or -, if present, is a separate operator, as discussed in "Arithmetic Operations" on page 52.
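Since the sign is a separate unary operator, ordinary operator precedence applies to it. For instance, ** binds more tightly than unary minus, as this small sketch checks:

```python
# The minus is applied to the result of 2 ** 2, not to the literal 2
result = -2 ** 2
assert result == -4          # parsed as -(2 ** 2)
assert (-2) ** 2 == 4        # parentheses apply the minus first
```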

4.2.1.1. Integer numbers

Integer literals can be decimal, octal, or hexadecimal. A decimal literal is represented by a sequence of digits in which the first digit is nonzero. To denote an octal literal, use 0 followed by a sequence of octal digits (0 to 7). To indicate a hexadecimal literal, use 0x followed by a sequence of hexadecimal digits (0 to 9 and A to F, in either upper- or lowercase). For example:

 1, 23, 3493                  # Decimal integers
 01, 027, 06645               # Octal integers
 0x1, 0x17, 0xDA5             # Hexadecimal integers
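The three rows above spell the same three values, 1, 23, and 3493, in different bases. The sketch below checks this with int(text, base); it parses the octal digits from strings rather than writing 027-style literals, because Python 3 rejects the leading-zero octal form.

```python
decimal_values = [1, 23, 3493]
# int(text, 8) interprets the digit string as octal
octal_values = [int(s, 8) for s in ['1', '27', '6645']]
hex_values = [0x1, 0x17, 0xDA5]
assert octal_values == decimal_values
assert hex_values == decimal_values
```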

In practice, you don't need to worry about the distinction between plain and long integers in modern Python, since operating on plain integers produces results that are long integers when needed (i.e., when the result would not fit within the range of plain integers). However, you may choose to terminate any kind of integer literal with a letter L (or l) to explicitly denote a long integer. For instance:

 1L, 23L, 99999333493L        # Long decimal integers
 01L, 027L, 01351033136165L   # Long octal integers
 0x1L, 0x17L, 0x17486CBC75L   # Long hexadecimal integers

Use uppercase L here, not lowercase l, which might look like the digit 1. The difference between long and plain integers is one of implementation. A long integer has no predefined size limit; it may be as large as memory allows. A plain integer takes up just a few bytes of memory, and its minimum and maximum values are dictated by machine architecture. sys.maxint is the largest positive plain integer available, while -sys.maxint-1 is the most negative one. On 32-bit machines, sys.maxint is 2147483647.
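Each row of long literals above also denotes a single value, 99999333493, which exceeds the 32-bit sys.maxint. This sketch verifies that by parsing digit strings (the L suffix and leading-zero octal literals are not accepted by Python 3, so they are avoided here) and shows that integer results simply grow as needed instead of overflowing.

```python
value = 99999333493
# The octal and hexadecimal spellings from the example, parsed from strings
assert int('1351033136165', 8) == value
assert int('17486CBC75', 16) == value
assert value > 2147483647       # too big for a 32-bit plain integer

big = 2 ** 100                  # no overflow error; the result just grows
assert big == 1267650600228229401496703205376
```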

4.2.1.2. Floating-point numbers

A floating-point literal is represented by a sequence of decimal digits that includes a decimal point (.), an exponent part (an e or E, optionally followed by + or -, followed by one or more digits), or both. The leading character of a floating-point literal cannot be e or E; it may be any digit or a period (.). For example:

 0., 0.0, .0, 1., 1.0, 1e0, 1.e0, 1.0e0

A Python floating-point value corresponds to a C double and shares its limits of range and precision, typically 53 bits of precision on modern platforms. (Python offers no way to find out the exact range and precision of floating-point values on your platform.)
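A small sketch of both points, assuming the platform's C double is an IEEE 754 double (as on typical modern hardware): the literal spellings shown earlier are interchangeable, and with 53 bits of precision the integer 2**53 + 1 cannot be represented as a float.

```python
zeros = [0., 0.0, .0]
ones = [1., 1.0, 1e0, 1.e0, 1.0e0]
assert all(z == 0.0 for z in zeros)   # every spelling denotes zero
assert all(o == 1.0 for o in ones)    # every spelling denotes one

# Assumes IEEE 754 doubles: 2**53 + 1 rounds back to 2**53
limit = float(2 ** 53)
assert limit + 1.0 == limit
assert limit - 1.0 != limit           # 2**53 - 1 is still exact
```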

4.2.1.3. Complex numbers

A complex number is made up of two floating-point values, one each for the real and imaginary parts. You can access the parts of a complex object z as read-only attributes z.real and z.imag. You can specify an imaginary literal as a floating-point or decimal literal followed by a j or J:

 0j, 0.j, 0.0j, .0j, 1j, 1.j, 1.0j, 1e0j, 1.e0j, 1.0e0j

The j at the end of the literal indicates the square root of -1, as commonly used in electrical engineering (some other disciplines use i for this purpose, but Python has chosen j). There are no other complex literals. To denote any constant complex number, add or subtract a floating-point (or integer) literal and an imaginary one. For example, to denote the complex number that equals one, use expressions like 1+0j or 1.0+0.0j.
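A brief sketch of complex values built from a real and an imaginary literal. The attempt to assign to z.real catches both AttributeError and TypeError as a precaution, since the exact exception raised for a read-only attribute is an assumption that may vary across Python versions.

```python
z = 1.5 + 2j                  # float literal plus imaginary literal
assert z.real == 1.5 and z.imag == 2.0
assert 1j * 1j == -1          # j is the square root of -1
assert 1 + 0j == 1            # the complex number equal to one

try:
    z.real = 0.0              # real and imag are read-only
except (AttributeError, TypeError):   # catch both, to be safe across versions
    read_only = True
else:
    read_only = False
assert read_only
```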

4.2.2. Sequences

A sequence is an ordered container of items, indexed by nonnegative integers. Python provides built-in sequence types known as strings (plain and Unicode), tuples, and lists. Library and extension modules provide other sequence types, and you can write yet others yourself (as discussed in "Sequences" on page 109). You can manipulate sequences in a variety of ways, as discussed in "Sequence Operations" on page 53.

4.2.2.1. Iterables

A Python concept that generalizes the idea of "sequence" is that of iterables, covered in "The for Statement" on page 64 and "Iterators" on page 65. All sequences are iterable: whenever I say that you can use an iterable, you can, in particular, use a sequence (for example, a list).

Also, when I say that you can use an iterable, I mean, in general, a bounded iterable, which is an iterable that eventually stops yielding items. All sequences are bounded. Iterables, in general, can be unbounded, but if you try to use an unbounded iterable without special precautions, you could easily produce a program that never terminates, or one that exhausts all available memory.
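One common precaution with unbounded iterables is to take only a finite slice of them. The sketch below uses itertools.islice; the helper name first_n is hypothetical, introduced only for this example.

```python
import itertools

def first_n(iterable, n):
    """Hypothetical helper: return at most n items from any iterable."""
    return list(itertools.islice(iterable, n))

assert first_n([10, 20, 30], 2) == [10, 20]      # a bounded sequence
assert first_n('abc', 10) == ['a', 'b', 'c']     # stops when items run out
# itertools.count() never stops; islice bounds it, whereas calling
# list(itertools.count()) directly would never terminate.
assert first_n(itertools.count(), 5) == [0, 1, 2, 3, 4]
```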

4.2.2.2. Strings

A built-in string object (plain or Unicode) is a sequence of characters used to store and represent text-based information (plain strings are also sometimes used to store and represent arbitrary sequences of binary bytes). Strings in Python are immutable, meaning that when you perform an operation on strings, you always produce a new string object, rather than mutating an existing string. String objects provide many methods, as discussed in detail in "Methods of String Objects" on page 186.
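A short sketch of what immutability means in practice: string methods return new objects, and item assignment on a string raises TypeError.

```python
s = 'hello'
t = s.upper()            # returns a new string object
assert t == 'HELLO'
assert s == 'hello'      # the original is unchanged

try:
    s[0] = 'H'           # item assignment is not allowed on strings
except TypeError:
    mutated = False
else:
    mutated = True
assert not mutated
```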

A string literal can be quoted or triple-quoted. A quoted string is a sequence of zero or more characters enclosed in matching quotes, single (') or double ("). For example:

 'This is a literal string'
 "This is another string"

The two different kinds of quotes function identically; having both allows you to include one kind of quote inside a string specified with the other kind without needing to escape it with the backslash character (\):

 'I\'m a Python fanatic'           # a quote can be escaped
 "I'm a Python fanatic"            # this way is more readable

All other things being equal, using single quotes to denote string literals is a more common Python style. To have a string literal span multiple physical lines, you can use a backslash as the last character of a line to indicate that the next line is a continuation:

 "A not very long string\
 that spans two lines"             # comment not allowed on previous line

To make the string output on two lines, you can embed a newline in the string:

 "A not very long string\n\
 that prints on two lines"         # comment not allowed on previous line

A better approach is to use a triple-quoted string, which is enclosed by matching triplets of quote characters (''' or """):

 """An even bigger
 string that spans
 three lines"""                    # comments not allowed on previous lines

In a triple-quoted string literal, line breaks in the literal are preserved as newline characters in the resulting string object.
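A sketch contrasting the forms: a backslash at the end of a line joins the two physical lines without adding a newline, whereas a triple-quoted literal keeps its line break, producing the same string as an explicit \n escape.

```python
embedded = "first line\nsecond line"     # explicit \n escape
triple = """first line
second line"""                           # the line break itself is kept
assert embedded == triple

continued = "abc\
def"                                     # backslash-newline is dropped
assert continued == "abcdef"
```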

The only character that cannot be part of a triple-quoted string is an unescaped backslash, while a quoted string cannot contain unescaped backslashes, nor line ends, nor the quote character that encloses it. The backslash character starts an escape sequence, which lets you introduce any character in either kind of string. Python's string escape sequences are listed in Table 4-1.

Table 4-1. String escape sequences
Sequence	Meaning	ASCII/ISO code
`\<newline>`	End of line is ignored	None
`\\`	Backslash	`0x5c`
`\'`	Single quote	`0x27`
`\"`	Double quote	`0x22`
`\a`	Bell	`0x07`
`\b`	Backspace	`0x08`
`\f`	Form feed	`0x0c`
`\n`	Newline	`0x0a`
`\r`	Carriage return	`0x0d`
`\t`	Tab	`0x09`
`\v`	Vertical tab	`0x0b`
`\DDD`	Octal value `DDD`	As given
`\xXX`	Hexadecimal value `XX`	As given
`\other`	Any other character	`0x5c` + as given
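A few checks of entries from Table 4-1: each escape sequence yields a single character with the listed code, and the octal and hexadecimal forms can name the same character.

```python
assert len('\n') == 1              # an escape sequence is one character
assert ord('\t') == 0x09
assert ord('\\') == 0x5c
assert ord('\a') == 0x07
assert '\x41' == 'A'               # hexadecimal escape
assert '\101' == 'A'               # octal escape: 101 octal is 65
assert 'It\'s' == "It's"
```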

A variant of a string literal is a raw string. The syntax is the same as for quoted or triple-quoted string literals, except that an r or R immediately precedes the leading quote. In raw strings, escape sequences are not interpreted as in Table 4-1, but are literally copied into the string, including backslashes and newline characters. Raw string syntax is handy for strings that include many backslashes, as in regular expressions (see "Pattern-String Syntax" on page 201). A raw string cannot end with an odd number of backslashes; the last one would be taken as escaping the terminating quote.
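A minimal sketch of raw strings. The Windows-style path is a hypothetical value chosen because, without the r prefix, its \t and \n would be read as escapes; the regular expression uses the standard re module.

```python
import re

path = r'C:\temp\new'            # hypothetical Windows-style path
assert len(r'\n') == 2           # backslash and 'n', not a newline
assert path == 'C:\\temp\\new'   # same string with escaped backslashes
assert '\n' not in path

digits = re.match(r'\d+', '2024abc').group()   # \d reaches re untouched
assert digits == '2024'
```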

Unicode string literals have the same syntax as other string literals, with a u or U immediately before the leading quote. Unicode string literals can use \u followed by four hex digits to denote Unicode characters and can include the escape sequences listed in Table 4-1. Unicode literals can also include the escape sequence \N{name}, where name is a standard Unicode name, as listed at http://www.unicode.org/charts/. For example, \N{Copyright Sign} indicates a Unicode copyright sign character (©). Raw Unicode string literals start with ur, not ru. Note that raw strings are not a different type from ordinary strings: raw strings are just an alternative syntax for literals of the usual two string types, plain (a.k.a. byte strings) and Unicode.
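A short sketch showing that \u escapes and \N{name} escapes can denote the same character. It avoids the ur prefix, which Python 3 does not accept; the u prefix itself works in Python 2 and in Python 3.3 and later.

```python
sign1 = u'\u00a9'                  # \u plus four hex digits
sign2 = u'\N{COPYRIGHT SIGN}'      # the same character by its Unicode name
assert sign1 == sign2
assert len(sign1) == 1
assert ord(sign1) == 0xA9
```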

Multiple string literals of any kind (quoted, triple-quoted, raw, Unicode) can be adjacent, with optional whitespace in between. The compiler concatenates such adjacent string literals into a single string object. If any literal in the concatenation is Unicode, the whole result is Unicode. Writing a long string literal in this way lets you present it readably across multiple physical lines and gives you an opportunity to insert comments about parts of the string. For example:

 marypop = ('supercalifragilistic'  # Open paren -> logical line continues
            'expialidocious')       # Indentation ignored in continuation

The string assigned to marypop is a single word of 34 characters.

4.2.2.3. Tuples

A tuple is an immutable ordered sequence of items. The items of a tuple are arbitrary objects and may be of different types. To specify a tuple, use a series of expressions (the items of the tuple) separated by commas (,). You may optionally place a redundant comma after the last item. You may group tuple items within parentheses, but the parentheses are necessary only where the commas would otherwise have another meaning (e.g., in function calls), or to denote empty or nested tuples. A tuple with exactly two items is often known as a pair. To create a tuple of one item (often known as a singleton), add a comma to the end of the expression. To denote an empty tuple, use an empty pair of parentheses. Here are some tuples, all enclosed in the optional parentheses:

 (100, 200, 300)            # Tuple with three items
 (3.14,)                    # Tuple with one item
 ()                         # Empty tuple (parentheses NOT optional!)

You can also call the built-in type tuple to create a tuple. For example:

 tuple('wow')

This builds a tuple equal to:

 ('w', 'o', 'w')

tuple() without arguments creates and returns an empty tuple. When x is iterable, tuple(x) returns a tuple whose items are the same as the items in x.
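The trailing comma, not the parentheses, is what makes a one-item tuple. This sketch checks that, along with the tuple() calls described above.

```python
not_a_tuple = (3.14)      # parentheses alone just group an expression
singleton = (3.14,)       # the trailing comma is what makes a tuple
pair = 100, 200           # parentheses are optional here
assert type(not_a_tuple) is float
assert type(singleton) is tuple and len(singleton) == 1
assert pair == (100, 200)
assert tuple('wow') == ('w', 'o', 'w')
assert tuple() == ()
```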

4.2.2.4. Lists

A list is a mutable ordered sequence of items. The items of a list are arbitrary objects and may be of different types. To specify a list, use a series of expressions (the items of the list) separated by commas (,) and within brackets ([]). You may optionally place a redundant comma after the last item. To denote an empty list, use an empty pair of brackets. Here are some example lists:

 [42, 3.14, 'hello']        # List with three items
 [100]                      # List with one item
 []                         # Empty list

You can also call the built-in type list to create a list. For example:

 list('wow')

This builds a list equal to:

 ['w', 'o', 'w']

list() without arguments creates and returns an empty list. When x is iterable, list(x) creates and returns a new list whose items are the same as the items in x. You can also build lists with list comprehensions, as discussed in "List comprehensions" on page 67.
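Because list(x) builds a new list, it is also a simple way to copy one; since lists are mutable, changes to the copy leave the original alone, as this sketch shows.

```python
original = ['w', 'o', 'w']
duplicate = list(original)            # a new list holding the same items
duplicate.append('!')                 # lists are mutable...
assert original == ['w', 'o', 'w']    # ...and the original is untouched
assert duplicate == ['w', 'o', 'w', '!']
assert duplicate is not original
assert list() == []
```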

4.2.3. Sets

Python 2.4 introduces two built-in set types, set and frozenset, to represent arbitrarily ordered collections of unique items. These types are equivalent to classes Set and ImmutableSet found in standard library module sets, which also exists in Python 2.3. To ensure that your module uses the best available sets, in any release of Python from 2.3 onwards, place the following code at the start of your module:

 try:
     set
 except NameError:
     from sets import Set as set, ImmutableSet as frozenset

Items in a set may be of different types, but they must be hashable (see hash on page 162). Instances of type set are mutable, and therefore not hashable; instances of type frozenset are immutable and hashable. So you can't have a set whose items are sets, but you can have a set (or frozenset) whose items are frozensets. Sets and frozensets are not ordered.

To create a set, call the built-in type set with no argument (this means an empty set) or one argument that is iterable (this means a set whose items are the items of the iterable).
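A sketch of the hashability rule, assuming the built-in set and frozenset types (Python 2.4 and later): a set can contain frozensets, but trying to put a set inside a set raises TypeError.

```python
s = set('hello')                 # items of the iterable, duplicates dropped
assert s == set(['h', 'e', 'l', 'o'])
assert set() == set([])          # no argument: an empty set

inner = frozenset([1, 2])
outer = set([inner])             # frozensets are hashable, so this works
assert inner in outer

try:
    set([set([1, 2])])           # sets are mutable, hence unhashable
except TypeError:
    nested_ok = False
else:
    nested_ok = True
assert not nested_ok
```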

4.2.4. Dictionaries

A mapping is an arbitrary collection of objects indexed by nearly arbitrary values called keys. Mappings are mutable and, unlike sequences, are not ordered.

Python provides a single built-in mapping type, the dictionary type. Library and extension modules provide other mapping types, and you can write others yourself (as discussed in "Mappings" on page 110). Keys in a dictionary may be of different types, but they must be hashable (see hash on page 162). Values in a dictionary are arbitrary objects and may be of different types. An item in a dictionary is a key/value pair. You can think of a dictionary as an associative array (known in other languages as a "map," "hash table," or "hash").

To specify a dictionary, you can use a series of pairs of expressions (the pairs are the items of the dictionary) separated by commas (,) within braces ({}). You may optionally place a redundant comma after the last item. Each item in a dictionary is written as key:value, where key is an expression giving the item's key and value is an expression giving the item's value. If a key appears more than once in a dictionary literal, only one of the items with that key is kept in the resulting dictionary object; dictionaries do not allow duplicate keys. To denote an empty dictionary, use an empty pair of braces. Here are some dictionaries:

 {'x':42, 'y':3.14, 'z':7 }   # Dictionary with three items and string keys
 {1:2, 3:4 }                  # Dictionary with two items and integer keys
 {}                           # Empty dictionary

You can also call the built-in type dict to create a dictionary in a way that, while less concise, can sometimes be more readable. For example, the dictionaries in this last snippet can also, equivalently, be written as, respectively:

 dict(x=42, y=3.14, z=7)      # Dictionary with three items and string keys
 dict([[1, 2], [3, 4]])       # Dictionary with two items and integer keys
 dict()                       # Empty dictionary

dict() without arguments creates and returns an empty dictionary. When the argument x to dict is a mapping, dict returns a new dictionary object with the same keys and values as x. When x is iterable, the items in x must be pairs, and dict(x) returns a dictionary whose items (key/value pairs) are the same as the items in x. If a key appears more than once in x, only the last item with that key is kept in the resulting dictionary.

When you call dict, in addition to or instead of the positional argument x you may pass named arguments, each with the syntax name=value, where name is an identifier to use as an item's key and value is an expression giving the item's value. When you call dict and pass both a positional argument and one or more named arguments, if a key appears both in the positional argument and as a named argument, Python associates to that key the value given with the named argument (i.e., the named argument "wins").
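A sketch of the rules above: with an iterable of pairs, the last pair for a repeated key is kept; when a positional mapping and a named argument supply the same key, the named argument wins; and dict(mapping) returns a new, independent dictionary.

```python
from_pairs = dict([('a', 1), ('b', 2), ('a', 3)])
assert from_pairs == {'a': 3, 'b': 2}        # last 'a' pair is kept

merged = dict({'x': 1, 'y': 2}, y=20)        # mapping plus named argument
assert merged == {'x': 1, 'y': 20}           # the named argument wins

copied = dict(merged)                        # a new, independent dict
copied['z'] = 3
assert 'z' not in merged
```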

You can also create a dictionary by calling dict.fromkeys. The first argument is an iterable whose items become the keys of the dictionary; the second argument is the value that corresponds to each key (all keys initially have the same corresponding value). If you omit the second argument, the value corresponding to each key is None. For example:

 dict.fromkeys('hello', 2)   # same as {'h':2, 'e':2, 'l':2, 'o':2}
 dict.fromkeys([1, 2, 3])    # same as {1:None, 2:None, 3:None}
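"The same corresponding value" means the very same object. That matters when the value is mutable, as in this sketch with a list:

```python
d = dict.fromkeys(['a', 'b'], [])   # both keys map to the SAME list object
d['a'].append(1)
assert d['b'] == [1]                # the change shows up under 'b' too
assert d['a'] is d['b']
assert dict.fromkeys('ab') == {'a': None, 'b': None}
```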

4.2.5. None

The built-in None denotes a null object. None has no methods or other attributes. You can use None as a placeholder when you need a reference but you don't care what object you refer to, or when you need to indicate that no object is there. Functions return None as their result unless they have specific return statements coded to return other values.
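A small sketch: a function with no return statement returns None, and since None is a single object, the usual test is an identity check with is. The function log is hypothetical.

```python
def log(message):           # hypothetical function with no return statement
    message.upper()         # result computed and discarded

result = log('hi')
assert result is None       # functions return None by default

placeholder = None          # "no object here yet"
assert placeholder is None
```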

4.2.6. Callables

In Python, callable types are those whose instances support the function call operation (see "Calling Functions" on page 73). Functions are callable. Python provides several built-in functions (see "Built-in Functions" on page 158) and supports user-defined functions (see "The def Statement" on page 70). Generator functions are also callable (see "Generators" on page 78).

Types are also callable, as we already saw for the dict, list, and tuple built-in types. (See "Built-in Types" on page 154 for a complete list of built-in types.) As we'll discuss in "Python Classes" on page 82, class objects (user-defined types) are also callable. Calling a type normally creates and returns a new instance of that type.

Other callables include methods, which are functions bound to class attributes, and instances of classes that supply a special method named __call__.
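A sketch of the last kind of callable: an instance of a class that defines __call__. The class Adder is hypothetical. The built-in callable() reports whether an object can be called.

```python
class Adder(object):        # hypothetical example class
    def __init__(self, n):
        self.n = n

    def __call__(self, x):
        return x + self.n

add5 = Adder(5)             # calling the type creates an instance
assert add5(10) == 15       # calling the instance invokes __call__
assert callable(len) and callable(Adder) and callable(add5)
assert not callable(42)
```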

4.2.7. Boolean Values

Every data value in Python can be taken as a truth value: true or false. Any nonzero number or nonempty container (e.g., string, tuple, list, set, or dictionary) is true. 0 (of any numeric type), None, and empty containers are false. Be careful about using a floating-point number as a truth value: such use is equivalent to comparing the number for exact equality with zero, and floating-point numbers should almost never be compared for exact equality!
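A sketch of these truth rules, assuming IEEE 754 doubles for the floating-point part: a computation that is mathematically zero can leave a tiny nonzero residue, which counts as true, so a tolerance test is safer.

```python
falsy = [0, 0.0, 0j, None, '', (), [], {}]
truthy = [1, -0.5, 'a', (None,), [0], {'k': 0}]
assert not any(falsy)
assert all(truthy)

residue = 0.1 + 0.2 - 0.3     # mathematically zero...
assert residue                # ...but a tiny nonzero float, hence true
assert abs(residue) < 1e-9    # comparing with a tolerance is safer
```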

Built-in type bool is a subclass of int. The only two values of type bool are True and False, which have string representations of 'True' and 'False', but also numerical values of 1 and 0, respectively. Several built-in functions return bool results, as do comparison operators. You can call bool(x) with any x as the argument. The result is True if x is true and False if x is false. Good Python style is not to use such calls when they are redundant: always write if x:, never if bool(x):, if x==True:, if bool(x)==True:, and so on.
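A short sketch of bool's relationship to int, and of the recommended style of testing a value's truth directly:

```python
assert issubclass(bool, int)
assert True == 1 and False == 0
assert True + True == 2            # bools act as 1 and 0 in arithmetic
assert repr(True) == 'True' and str(False) == 'False'
assert bool([]) is False and bool('x') is True

items = ['spam']
if items:                          # idiomatic: no bool() call, no == True
    status = 'nonempty'
else:
    status = 'empty'
assert status == 'nonempty'
```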