
4.1. How Pictures are Encoded

Pictures (images, graphics) are an important part of any media communication. In this chapter, we discuss how pictures are represented on a computer (mostly as bitmap images, where each dot, or pixel, is represented separately) and how they can be manipulated.

Pictures are two-dimensional arrays of pixels (pixel is short for picture element). In this section, each of these terms will be described.

For our purposes, a picture is an image stored in a JPEG file. JPEG is an international standard for storing high-quality images in relatively little space. JPEG is a lossy compression format: the file is made smaller, but some of the original image data is thrown away in the process. Typically, though, what gets thrown away is detail that you don't see or don't notice anyway. For most purposes, a JPEG image works fine.



If we want to write programs to manipulate JPEG images we need to understand how they are stored and displayed. To do this we need to understand arrays, matrices, pixels, and color.

An array is a sequence of elements, each with an index number associated with it (Figure 4.1). The first element in an array is at index 0, the second at index 1, the third at index 2, and so on. The last element of the array will always be at the length of the array minus one. An array with five elements will have its last element at index 4.

Figure 4.1. A depiction of the first five elements in an array.


It may sound strange to say that the first element of an array is at index 0, but the index is based on the distance from the beginning of the array to the element. Since the first item of the array is at the beginning of the array, that distance is 0. Why is the index based on the distance? Array values are stored one after the other in memory. This makes it easy to find any element of the array: multiply the size of each element by the index and add the result to the address of the beginning of the array. If you are looking for the element at index 3 in an array, each element is 4 bytes long, and the array starts at memory location 26, then the element at index 3 is at location (26 + 3 * 4 = 26 + 12 = 38).
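We can check this address arithmetic with a short program (the starting address and element size are just the made-up numbers from the example above, not real memory addresses):

```java
public class ArrayAddress {
    public static void main(String[] args) {
        int baseAddress = 26; // hypothetical start of the array in memory
        int elementSize = 4;  // bytes per element
        int index = 3;        // which element we want

        // address of element = start of array + (index * size of each element)
        int address = baseAddress + index * elementSize;
        System.out.println(address); // prints 38
    }
}
```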

Every time you join a line (queue) of people, you are in something like an array. All you usually care about is how far you are from the front of the line. If you are at the front of the line, then that is index 0 (you are next). If you are the second one in line, then you are at index 1 (there is one person in front of you). If you are the third person in line, then you are at index 2 (there are two people in front of you).

Arrays are a great way to store lots of data of the same type. You wouldn't want to create a different variable for every pixel in a picture when there are hundreds of thousands of pixels in a picture. Instead you use an array of pixels. You still need a way to refer to a particular pixel, so we use an index for that. You can access elements of an array in Java using arrayName[index]. For example, to access the first element in an array variable named pixels use pixels[0]. To access the second element use pixels[1]. To access the third element use pixels[2]. You can get the number of items in an array using arrayName.length. So, to access the last element in the array use arrayName[arrayName.length - 1].

To declare an array in Java you specify the type and then use open and close square brackets followed by a name for the array.



> double[] grades;
> System.out.println(grades);
null


or you could have specified the square brackets after the variable name:

> double grades[];
> System.out.println(grades);
null


The above code declares an array of doubles with the name grades. Notice though that this just declared an object reference and set it to null. It didn't create the array. In Java you can create an array and specify the values for it at the same time:

> double[] gradeArray = {80, 90.5, 88, 92, 94.5};
> System.out.println(gradeArray.length);
5
> System.out.println(gradeArray[0]);
80.0
> System.out.println(gradeArray[4]);
94.5
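The rule that the last element is always at index length minus one can be checked the same way; here is a small sketch using the same grades:

```java
public class LastElement {
    public static void main(String[] args) {
        double[] gradeArray = {80, 90.5, 88, 92, 94.5};

        // The last element is always at index (length - 1),
        // no matter how long the array is.
        System.out.println(gradeArray[gradeArray.length - 1]); // prints 94.5
    }
}
```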


Making it Work Tip: Using Dot Notation for Public Fields

Notice that there are no parentheses following arrayName.length. This is because length is not a method but a public field (data). Public fields can be accessed using dot notation: objectName.fieldName. Methods always have parentheses after the method name, even if there are no input parameters, such as FileChooser.pickAFile().


A two-dimensional array is a matrix. A matrix is a collection of elements arranged in both a horizontal and vertical sequence. For one-dimensional arrays, you would talk about an element at index i, that is array[i]. For two-dimensional arrays, you can talk about an element at row r and column c, that is, matrix[r][c]. This is called row-major order.

Have you ever played the game Battleship™? If you have, then you had to specify both the row and column of your guess (B-3). This means row B and column 3 (Figure 4.2). Have you ever gone to a play? Usually your ticket has a row and seat number. These are both examples of row-major two-dimensional arrays.

Figure 4.2. The top-left corner of a Battleship guess board with a miss at B-3.


Another way to specify a location in a two-dimensional array is column-major order which specifies the column first and then the row: matrix[c][r]. This is how we normally talk about pictures by using an x for the horizontal location and a y for the vertical location such as matrix[x][y]. Picture data is represented as a column-major two-dimensional array.

Java actually creates multidimensional arrays as arrays of arrays. When you have a two-dimensional array, the first index is the location in the outer array, and the second is the location in the inner array. You can think of the outer array as either the rows or the columns. So Java isn't row-major or column-major, but you will create and work with your arrays in either row-major or column-major fashion (Figure 4.3). Just be sure to be consistent.
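Here is a small sketch of a two-dimensional array in Java (the values are made up for illustration). Notice that the outer index always comes first; whether you treat it as the row or the column is up to you:

```java
public class TwoDArrays {
    public static void main(String[] args) {
        // A 2D array in Java is an array of arrays: the outer array here
        // holds 2 inner arrays, each of length 3.
        int[][] matrix = {
            {15, 12, 13},  // you can treat the outer index as the row...
            {10,  7, 43}   // ...or as the column; just be consistent.
        };

        // Row-major access: matrix[row][column]
        System.out.println(matrix[1][2]);     // prints 43

        // The outer and inner lengths can be asked for separately.
        System.out.println(matrix.length);    // number of inner arrays: 2
        System.out.println(matrix[0].length); // length of each inner array: 3
    }
}
```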



Figure 4.3. Picturing a 2D array as row-major or column-major.


In Figure 4.4, you see an example matrix. Using column-major order for the coordinates (0, 0) (horizontal, vertical), you'll find the matrix element whose value is 15. The element at (1, 1) is 7, (2, 1) is 43, and (3, 1) is 23. We will often refer to these coordinates as (x, y) (horizontal, vertical).

Figure 4.4. An example matrix (two-dimensional array) of numbers.


What's stored at each element in the picture is a pixel. The word "pixel" is short for "picture element." It's literally a dot, and the overall picture is made up of lots of these dots. Have you ever taken a magnifying glass to view pictures in a newspaper or magazine, or to a television or even your own computer monitor? Figure 4.5 was generated by capturing as an image the top-left part of the DrJava window and then magnifying it 600%. It's made up of many, many dots. When you look at the picture in the magazine or on the television, it doesn't look like it's broken up into millions of discrete spots, but it is.



Figure 4.5. Upper-left corner of the DrJava window with a portion magnified 600%.


You can get a similar view of individual pixels using the picture explorer, which is discussed later in this chapter. The picture explorer allows you to zoom a picture up to 500% so that each individual pixel is visible (Figure 4.6).

Figure 4.6. Image shown in the picture explorer: 100% image on left and 500% on right (close-up of the branch over the mountain).


Our human sensory apparatus can't distinguish (without magnification or other special equipment) the small bits in the whole. Humans have low visual acuity: we don't see as much detail as, say, an eagle. We actually have more than one kind of vision system in use in our brain and our eyes. Our system for processing color is different from our system for processing black-and-white (or luminance). We actually pick up luminance detail better with the sides of our eyes than with the center of our eyes. That's an evolutionary advantage, because it allows you to pick out the sabertooth tiger sneaking up on you from the side.



The lack of resolution in human vision is what makes it possible to digitize pictures. Animals that perceive greater details than humans (e.g., eagles or cats) may actually see the individual pixels. We break up the picture into smaller elements (pixels), but there are enough of them and they are small enough that the picture doesn't look choppy when viewed from a normal viewing distance. If you can see the effects of the digitization (e.g., lines have sharp edges, you see little rectangles in some spots), we call that pixelization: the effect when the digitization process becomes obvious.

Picture encoding is actually more complex than sound encoding. A sound is inherently linear: it progresses forward in time. It can be represented using a one-dimensional array. A picture has two dimensions, a width and a height.

4.1.1. Color Representations

Visible light is continuous: visible light is any wavelength between 370 and 730 nanometers (0.00000037 and 0.00000073 meters). But our perception of light is limited by how our color sensors work. Our eyes have sensors that trigger (peak) around 425 nanometers (blue), 550 nanometers (green), and 560 nanometers (red). Our brain determines what color we "see" based on the feedback from these three sensors in our eyes. There are some animals with only two kinds of sensors, like dogs. Those animals still perceive color, but not the same colors nor in the same way as humans do. One of the interesting implications of our limited visual sensory apparatus is that we actually perceive two kinds of orange. There is a spectral orange: a particular wavelength that is natural orange. There is also a mixture of red and yellow that hits our color sensors just right so that we perceive it as the same orange.

Based on how we perceive color, as long as we encode what hits our three kinds of color sensors, we're recording our human perception of color. Thus, we can encode each pixel as a triplet of numbers. The first number represents the amount of red in the pixel. The second is the amount of green, and the third is the amount of blue. We can make up any human-visible color by combining red, green, and blue light (Figure 4.7). Combining all three gives us pure white. Turning off all three gives us black. We call this the RGB color model.
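Java's standard Color class (in the java.awt package) stores exactly this kind of RGB triplet; a small sketch:

```java
import java.awt.Color;

public class RGBDemo {
    public static void main(String[] args) {
        // Each color is a triplet of red, green, and blue components.
        Color red   = new Color(255, 0, 0);
        Color white = new Color(255, 255, 255); // all three on full = white
        Color black = new Color(0, 0, 0);       // all three off = black

        // The three components can be read back individually.
        System.out.println(red.getRed());   // prints 255
        System.out.println(red.getGreen()); // prints 0
        System.out.println(red.getBlue());  // prints 0
    }
}
```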

Figure 4.7. Merging red, green, and blue to make new colors.


There are other models for defining and encoding colors besides the RGB color model. There's the HSV color model, which encodes Hue, Saturation, and Value (sometimes also called the HSB color model, for Hue, Saturation, and Brightness). The nice thing about the HSV model is that some notions, like making a color "lighter" or "darker," map cleanly to it; e.g., you simply change the saturation (Figure 4.8). Another model is the CMYK color model, which encodes Cyan, Magenta, Yellow, and blacK ("B" could be confused with Blue). The CMYK model is what printers use: those are the inks they combine to make colors. However, four components mean more data to encode on a computer, so it's less popular for media computation. RGB is the most popular model on computers.
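Java's Color class also supports the HSB model directly, so you can move between the two models; a sketch (the hue value 0.08 is just an approximate orange, picked for illustration):

```java
import java.awt.Color;

public class HSBDemo {
    public static void main(String[] args) {
        // Hue, saturation, and brightness are each given as 0.0 to 1.0.
        Color orange = Color.getHSBColor(0.08f, 1.0f, 1.0f);

        // A less saturated version of the same hue looks more pastel.
        Color pale   = Color.getHSBColor(0.08f, 0.5f, 1.0f);

        // Going the other way: RGB components to HSB values.
        float[] hsb = Color.RGBtoHSB(255, 0, 0, null);
        System.out.println(hsb[1]); // pure red is fully saturated: prints 1.0
    }
}
```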



Figure 4.8. Picking colors using the HSB color model.


Each color component (sometimes called a channel) in a pixel is typically represented with a single byte, eight bits. Eight bits can represent 256 patterns (2^8): 00000000, 00000001, up through 11111111. We typically use these patterns to represent the values 0 to 255. Each pixel, then, uses 24 bits to represent colors. That means that there are 2^24 possible patterns of 0's and 1's in those 24 bits, so the standard encoding for color using the RGB model can represent 16,777,216 colors. We can actually perceive more than 16 million colors, but it turns out that it just doesn't matter. Humans have no technology that comes even close to being able to replicate the whole color space that we can see. We do have devices that can represent 16 million distinct colors, but those 16 million colors don't cover the entire space of color (nor luminance) that we can perceive. So, the 24-bit RGB model is adequate until technology advances.
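A short program can verify these counts and show how the three bytes are often packed into a single int (the red-green-blue layout shown here is a common convention, but a particular image format may order the channels differently):

```java
public class ColorBits {
    public static void main(String[] args) {
        // 8 bits per channel gives 256 values per channel...
        System.out.println(1 << 8);  // prints 256

        // ...and 24 bits per pixel gives over 16 million colors.
        System.out.println(1 << 24); // prints 16777216

        // The three bytes are commonly packed into one int:
        int r = 255, g = 30, b = 100;
        int rgb = (r << 16) | (g << 8) | b;

        // and unpacked again with shifts and masks:
        System.out.println((rgb >> 16) & 0xFF); // prints 255
        System.out.println((rgb >> 8) & 0xFF);  // prints 30
        System.out.println(rgb & 0xFF);         // prints 100
    }
}
```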



There are computer models that use more bits per pixel. For example, there are 32-bit models which use the extra 8 bits to represent transparency: how much of the color "below" the given image should be blended with this color? These additional 8 bits are sometimes called the alpha channel. There are other models that actually use more than 8 bits for the red, green, and blue channels, but they are uncommon.
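As a sketch of what the alpha channel is used for, here is one common blending rule: the result is a weighted average of the color on top and the color below, weighted by alpha. The component values and alpha here are made up for illustration.

```java
public class AlphaBlend {
    public static void main(String[] args) {
        int fgRed = 200;     // red component of the color on top
        int bgRed = 100;     // red component of the color below
        double alpha = 0.25; // 0.0 = fully transparent, 1.0 = fully opaque

        // result = alpha * foreground + (1 - alpha) * background
        int blendedRed = (int) (alpha * fgRed + (1 - alpha) * bgRed);
        System.out.println(blendedRed); // prints 125
    }
}
```

The same rule is applied to the green and blue components to blend a whole pixel.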

We actually perceive borders of objects, motion, and depth through a separate vision system. We perceive color through one system, and luminance (how light/dark things are) through another system. Luminance is not actually the amount of light, but our perception of the amount of light. We can measure the amount of light (e.g., the number of photons reflected off the color) and show that a red and a blue spot each are reflecting the same amount of light, but we'll perceive the blue as darker. Our sense of luminance is based on comparisons with the surroundings: the optical illusion in Figure 4.9 highlights how we perceive gray levels. The two end quarters are actually the same level of gray, but because the two mid quarters end in a sharp contrast of lightness and darkness, we perceive that one end is darker than the other.

Figure 4.9. The ends of this figure are the same colors of gray, but the middle two quarters contrast sharply so the left looks darker than the right.


Most tools for allowing users to pick out colors let the users specify the color as RGB components. The Macintosh offers RGB sliders in its basic color picker (Figure 4.10). The color chooser in Java offers a similar set of sliders (Figure 4.11).

Figure 4.10. The Macintosh OS X RGB color picker.




Figure 4.11. Picking a color using RGB sliders from Java.


As mentioned, a triplet of (0, 0, 0) (red, green, blue components) is black, and (255, 255, 255) is white. (255, 0, 0) is pure red, but (100, 0, 0) is red, too, just darker. (0, 100, 0) is a dark green, and (0, 0, 100) is a dark blue.

When the red component is the same as the green and as the blue, the resultant color is gray. (50, 50, 50) would be a fairly dark gray, and (150, 150, 150) is a lighter gray.

Figure 4.12 is a representation of pixel RGB triplets in a matrix representation. In column-major order, the pixel at (1, 0) has color (30, 30, 255), which means that it has a red value of 30, a green value of 30, and a blue value of 255: it's a mostly blue color, but not pure blue. The pixel at (2, 1) has full green but also some red and blue ((150, 255, 150)), so it's a fairly light green.

Figure 4.12. RGB triplets in a matrix representation.




Images on disk and even in computer memory are usually stored in some kind of compressed form. The amount of memory needed to represent every pixel of even small images is pretty large (Table 4.1). A fairly small image of 320 pixels wide by 240 pixels high, with 24 bits per pixel, takes up 230,400 bytes: that's roughly 230 kilobytes (a kilobyte is 1,000 bytes), or about 1/4 megabyte (a megabyte is a million bytes). A computer monitor with 1,024 pixels across and 768 pixels vertically, with 32 bits per pixel, takes up over 3 megabytes just to represent the screen.

Table 4.1. Number of bytes needed to store pixels at various sizes and formats

                 320x240          640x480            1,024x768
  24-bit color   230,400 bytes    921,600 bytes      2,359,296 bytes
  32-bit color   307,200 bytes    1,228,800 bytes    3,145,728 bytes
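The entries in Table 4.1 are just width times height times bytes per pixel, which a short program can verify:

```java
public class ImageMemory {
    public static void main(String[] args) {
        // Bytes needed = width * height * bytes per pixel
        // (24 bits per pixel is 3 bytes; 32 bits is 4 bytes).
        System.out.println(320 * 240 * 3);  // prints 230400
        System.out.println(640 * 480 * 3);  // prints 921600
        System.out.println(1024 * 768 * 4); // prints 3145728
    }
}
```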


Computer Science Idea: Kilobyte (kB) versus Kibibyte (KiB or K or KB)

The term kilobyte has caused problems because it has been interpreted differently by different groups. Computer scientists formerly used it to mean 2 to the 10th power, which is 1,024 bytes. Telecommunications engineers used it to mean 1,000 bytes. The International Electrotechnical Commission (IEC) decreed in 1998 that 1,024 bytes should be called a kibibyte (KiB) and 1,000 bytes a kilobyte. Similarly, a mebibyte is defined to be 2 raised to the 20th power (1,048,576 bytes), while a megabyte is 1,000,000 bytes (one million bytes). A gibibyte is defined to be 2 raised to the 30th power (1,073,741,824 bytes), and a gigabyte is defined to be 1,000,000,000 bytes (one billion bytes).
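These powers of two are easy to compute in Java with the left-shift operator, since shifting 1 left by n positions gives 2 to the nth power:

```java
public class ByteUnits {
    public static void main(String[] args) {
        System.out.println(1 << 10); // kibibyte: prints 1024
        System.out.println(1 << 20); // mebibyte: prints 1048576
        System.out.println(1 << 30); // gibibyte: prints 1073741824
    }
}
```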




Introduction to Computing & Programming with Java: A Multimedia Approach
Year: 2007
Pages: 191