Hack 54. Check Your iPod's Honesty

Find out how random your iPod's "random" shuffle really is.

Personalized song ratings in Apple's iTunes, the software that allows you to play songs on your iPod, let you quickly find your favorites and help the Party Shuffle feature play more of what you like most. The algorithm iTunes uses to pick what comes next in the playlist is meant to select randomly from your favorites. Is it really random, though?

After hearing one artist played over and over during a shuffled play of your entire music library in iTunes, you might think your player has a preference of its own. Apple, though, claims that iTunes's shuffle algorithm is completely random. The shuffle algorithm chooses songs without replacement. In other words, much like going through a shuffled deck of cards, you will hear each song only once until you have heard them all (or until you have stopped the player or selected a different playlist).

iTunes's Party Shuffle is a different matter. Its algorithm selects songs with replacement, meaning the entire library is reshuffled after each song is played (like reshuffling a deck of cards every time a card is drawn). The "Play higher rated songs more often" option does exactly what it says, but how much preference is given to higher-rated songs?
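The difference between the two modes can be sketched in a few lines of Python. This is only an illustration of sampling without and with replacement as described above, not Apple's actual code; the function names and the six-song library are made up for the example.

```python
import random


def shuffle_play(library, rng):
    """Shuffle: sample without replacement, like dealing through a shuffled deck."""
    order = list(library)   # copy so the caller's library is left untouched
    rng.shuffle(order)      # every song appears exactly once in the result
    return order


def party_shuffle_play(library, n_plays, rng):
    """Party Shuffle as described in the text: each pick is drawn from the full library."""
    return [rng.choice(library) for _ in range(n_plays)]


if __name__ == "__main__":
    rng = random.Random(54)  # fixed seed so repeated runs give the same lists
    library = [f"song {i}" for i in range(1, 7)]  # hypothetical six-song library
    print("Shuffle:      ", shuffle_play(library, rng))
    print("Party Shuffle:", party_shuffle_play(library, 6, rng))
```

In the first list every song shows up exactly once. In the second, repeats are possible, and become more likely as the number of plays grows.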

This hack originally appeared as an article on the OmniNerd web site at http://www.omninerd.com/.

Assessing iTunes's Selection Procedures

I wanted to test two different song selection options: Party Shuffle and "Play higher rated songs more often." I created a short playlist of six songs: one for each star rating plus one left unrated. The songs were from the same genre and artist, and each was trimmed to one second in duration.

I conducted my tests on iTunes 5. iTunes 6 has added a Smart Shuffle feature, which may decrease the chances of hearing songs from the same artist or album consecutively, but I haven't tested it yet.

After resetting the play count to zero, I hit Play and left my desk for the weekend. I ran the same songs twice: once selecting random (Party Shuffle) and once selecting both random and the "Play higher rated songs more often" option. Table 5-8 shows the play counts, as of Monday morning.

Table 5-8. Song selection distribution
Song rating	Random selection: times played	Random selection: percentage of total	Based on rating: times played	Based on rating: percentage of total
None	9,105	16.70 percent	2,052	3.9 percent
1	9,055	16.60 percent	6,238	11.8 percent
2	9,090	16.67 percent	8,125	15.4 percent
3	9,114	16.71 percent	10,020	18.9 percent
4	9,027	16.55 percent	12,158	23.0 percent
5	9,146	16.77 percent	14,293	27.0 percent
Total	54,537	100 percent	52,886	100 percent

The play counts in the random trial were very close to each other, as can be expected with a random selection. For the trial based on song ratings (rating-biased selection), the preference algorithm appears to be linear from 12 percent to 27 percent for the rated songs. Moving from the five-star rating downward, the share of plays declines by around 4 percentage points with each step down in rating, but the drop doubles from one star to unrated, with a fall of 8 percentage points. While one star might seem like the lowest rating, no rating proved the black sheep of the lot.

Your iPod assumes that if you haven't provided a rating for a song, you must want to hear it even less frequently than those songs to which you have assigned your lowest rating. This is a bit like choosing a movie with bad reviews over a movie that hasn't been reviewed.
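The play counts in Table 5-8 can be checked with a short script. The sketch below recomputes each rating's share of plays, the step between adjacent ratings in the rating-biased trial, and a chi-square statistic for the random trial against an even one-in-six split. The chi-square check is my addition rather than part of the original test, and the 5 percent critical value for 5 degrees of freedom (about 11.07) is hard-coded from a standard table.

```python
# Play counts copied from Table 5-8, in order: unrated, then 1 to 5 stars.
RANDOM_COUNTS = [9105, 9055, 9090, 9114, 9027, 9146]
RATED_COUNTS = [2052, 6238, 8125, 10020, 12158, 14293]

# Standard chi-square critical value, 5 degrees of freedom, 5 percent level.
CHI_SQUARE_CRITICAL_5DF = 11.07


def percentages(counts):
    """Return each count as a percentage of the total number of plays."""
    total = sum(counts)
    return [100 * c / total for c in counts]


def steps(counts):
    """Return the change in percentage share from each rating to the next."""
    pct = percentages(counts)
    return [b - a for a, b in zip(pct, pct[1:])]


def chi_square_uniform(counts):
    """Chi-square statistic comparing the counts with an even split."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


if __name__ == "__main__":
    print("Rating-biased steps:", [round(s, 1) for s in steps(RATED_COUNTS)])
    stat = chi_square_uniform(RANDOM_COUNTS)
    print(f"Chi-square for random trial: {stat:.2f}")
    print("Consistent with uniform:", stat < CHI_SQUARE_CRITICAL_5DF)
```

The statistic for the random trial comes out close to 1, far below the critical value, so these counts give no reason to doubt an even selection. The first step in the rating-biased trial (unrated to one star) is roughly twice the size of the others.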

Figure 5-2 shows the effects of different song selection options. You can judge the randomness of the true random selection option by seeing if those "Random" bars in the figure all seem the same height. The linear nature of the "Rating Biased" bars can be judged by imagining whether there are equal jumps in height as one moves from a rating of 1 to a rating of 5.

Figure 5-2. Patterns of song selection

Calculating the Statistics of the Selection Process

Changing the number of songs within each rating changes the probabilities for each song's selection. With multiple songs of each rating, the chance of a song with rating r coming up next in the ratings-biased Party Shuffle can be calculated using this expression:

$$P(\text{rating } r) = \frac{x_r P_r}{x_0 P_0 + x_1 P_1 + x_2 P_2 + x_3 P_3 + x_4 P_4 + x_5 P_5}$$

Subscripts in this expression indicate the song rating, with 0 standing for unrated songs. The chance of a song being chosen is based on x (number of songs with each rating) and P (the proportional weight assigned by the iTunes algorithm for each rating).

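With iTunes's preference probabilities for each rating determined from the weekend-long sampling run (0.039 for unrated songs up to 0.270 for five-star songs), here's the resulting expression:

$$P(\text{rating } r) = \frac{x_r P_r}{0.039\,x_0 + 0.118\,x_1 + 0.154\,x_2 + 0.189\,x_3 + 0.230\,x_4 + 0.270\,x_5}$$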

Although the higher-rated songs are given preference, you will not necessarily hear more five-star songs than songs of any other rating. Let's assume most people follow a normal distribution for their ratings [Hack #23], with the three-star rating being the most common. Table 5-9 displays a hypothetical iTunes library with this bell-shaped curve for the number of songs at each rating.

Table 5-9. Typical song rating distribution
Song rating	Number of songs
None	72
1	321
2	1,527
3	1,812
4	507
5	95

If I run these hypothetical numbers through our frequency equations, I get a distribution that looks like Figure 5-3.
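The calculation is easy to reproduce. The sketch below multiplies each rating's song count from Table 5-9 by the weight measured in the weekend run (the rating-biased percentages in Table 5-8, written as fractions) and normalizes the results. The dictionary and function names are mine, chosen for illustration.

```python
# Weights measured in the rating-biased trial (Table 5-8), as fractions.
WEIGHTS = {"none": 0.039, "1": 0.118, "2": 0.154, "3": 0.189, "4": 0.230, "5": 0.270}

# Hypothetical library from Table 5-9: number of songs at each rating.
SONG_COUNTS = {"none": 72, "1": 321, "2": 1527, "3": 1812, "4": 507, "5": 95}


def rating_probabilities(song_counts, weights):
    """Chance that the next song carries each rating: x_r * P_r over the sum of all x_i * P_i."""
    total = sum(song_counts[r] * weights[r] for r in song_counts)
    return {r: song_counts[r] * weights[r] / total for r in song_counts}


if __name__ == "__main__":
    for rating, p in rating_probabilities(SONG_COUNTS, WEIGHTS).items():
        print(f"{rating:>4}: {p:6.1%}")
```

With these numbers, three-star songs account for roughly 45 percent of picks and five-star songs for only about 3 percent, simply because there are so few five-star songs in the library.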

Figure 5-3. Probability distribution of song selection

As you can see in Figure 5-3, the chance of a song with a particular rating coming up next in the playlist is largely determined by the number of songs with that rating. The iTunes preference for higher-rated songs and dislike for lower-rated songs only slightly raises or lowers the probability determined first from the song count.

These chances of hearing a song with a certain rating can be applied to find the chances of hearing a particular song. If we remove the song count from the numerator in the song selection expression, we can calculate the chance of a certain specific song, not just the rating, coming up next:

$$P(\text{specific song with rating } r) = \frac{P_r}{x_0 P_0 + x_1 P_1 + x_2 P_2 + x_3 P_3 + x_4 P_4 + x_5 P_5}$$
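As a worked example, here is the per-song chance for the hypothetical library in Table 5-9, using the weights measured in the weekend run. The figures are rounded, and they assume the measured weights carry over unchanged to a library of this size.

```latex
\begin{align*}
\sum_i x_i P_i &= 72(0.039) + 321(0.118) + 1527(0.154) + 1812(0.189) + 507(0.230) + 95(0.270) \approx 760.6 \\
P(\text{one particular three-star song}) &= \frac{0.189}{760.6} \approx 0.00025 \quad \text{(about 1 in 4,000)} \\
P(\text{one particular five-star song}) &= \frac{0.270}{760.6} \approx 0.00036 \quad \text{(about 1 in 2,800)}
\end{align*}
```

Any single five-star song is therefore about 1.4 times as likely to come up next as any single three-star song, even though three-star songs as a group are heard far more often.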

Explaining Statistical Surprises

About a month after running these tests, I noticed my iTunes Party Shuffle at work played the same song two times in a row. This was the first time I had noticed a consecutive repeat, and I checked the playlist. Not only did I find Nirvana's "Territorial Pissings" listed twice in a row, but A.F.I.'s "Death of Seasons" was listed twice in a row three tracks later.

I use the "Play higher rated songs more often" option, but these were each middle-of-the-road 3-star songs, and my song library has nearly 4,000 songs. The odds might seem outrageous at first, but you have to realize just how many songs you hear throughout a workday. If I average 10 hours at work each day and average a 3½-minute song duration, odds say I should hear a consecutive repeat in less than a month.
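A rough version of that estimate can be sketched as follows. It assumes every song in a 4,000-song library is equally likely on each pick, which ignores the rating weights, so treat the result as a ballpark figure rather than the exact odds for my library. The function names are made up for the example.

```python
def songs_per_day(hours_per_day, minutes_per_song):
    """Number of songs played in a workday of continuous listening."""
    return hours_per_day * 60 / minutes_per_song


def expected_days_to_repeat(library_size, plays_per_day):
    """Average wait for a back-to-back repeat when each pick is uniform over the library.

    Each new pick matches the previous one with probability 1/library_size,
    so the average wait is library_size picks.
    """
    return library_size / plays_per_day


def prob_repeat_within(days, library_size, plays_per_day):
    """Chance of at least one back-to-back repeat within the given number of days."""
    picks = days * plays_per_day
    return 1 - (1 - 1 / library_size) ** picks


if __name__ == "__main__":
    per_day = songs_per_day(hours_per_day=10, minutes_per_song=3.5)
    print(f"Songs per day: {per_day:.0f}")
    print(f"Average days until a repeat: {expected_days_to_repeat(4000, per_day):.1f}")
    print(f"Chance within 30 workdays: {prob_repeat_within(30, 4000, per_day):.0%}")
```

Under these assumptions the average wait is a little over three weeks of workdays, which is in the same range as the estimate above.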

Many claim to still see patterns as iTunes rambles through their music collection, but the majority of these patterns are simply multiple songs from the same artist. Think of it this way: if you have 2,000 songs and 40 of them are from the same artist, there is always about a 2 percent chance of hearing one of them next with random play. Right after one of their songs finishes, odds show a 50 percent chance a song by the same artist will play again within the next 35 songs and a 64 percent chance they will be played again within the next 50 songs. This can be calculated following this equation, where p is the chance that any one pick is by that artist (0.02 here) and n is the number of songs played:

$$P(\text{same artist within } n \text{ songs}) = 1 - (1 - p)^n$$

As we have seen in other hacks, a low likelihood event (such as our 2 percent chance of repeating an artist) becomes a highly likely event after just a few opportunities [Hack #46].
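The artist figures quoted above are quick to verify. This sketch assumes each pick is an independent draw from the whole library, which matches random play with replacement; the function name is mine.

```python
def prob_artist_within(n_songs, artist_songs, library_size):
    """Chance of hearing at least one song by the artist in the next n_songs picks."""
    p = artist_songs / library_size   # chance that any single pick is by the artist
    return 1 - (1 - p) ** n_songs     # complement of missing the artist every time


if __name__ == "__main__":
    for n in (1, 35, 50):
        chance = prob_artist_within(n, artist_songs=40, library_size=2000)
        print(f"Within {n:>2} songs: {chance:.0%}")
```

The 2 percent chance on a single pick climbs past 50 percent by the 35th song and to about 64 percent by the 50th.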

It's simply the mind's tendency to find a pattern that makes you think iTunes has a preference.