An exercise for the student
I'm a creature of habit. There are a lot of areas in which I don't like trying new things, especially if I've already found something that I already really like. This is particularly true when it comes to music; in fact, I pretty much dislike every song I hear, the first time I hear it. (There have been some exceptions, but then sometimes I got sick of those songs really easily.)
As you might imagine, this makes finding new music to listen to rather tricky. However, I've discovered that if I buy an album which has a few songs on it that I already know and like, I can generally suffer through listening to the rest of them until I actually start liking them, as well.
I used to buy a new CD ever time I had to make a cross country trip, and then I'd put the CD on shuffle until I'd heard all of the songs I liked. (Then I'd typically take it out and put in an old favorite.)
Which brings us to our exercise.
Given a CD with m songs I like on it and n total tracks (where we assume that n ≥ m), how many songs will I have to listen to on average before I hear all of the songs that I like? (Remember that the CD player is on shuffle, not random, so each track will play exactly once before any of them repeat.)
I once derived a general solution to this, which I'll add as a comment, but I don't understand why it works, only that it does. (For low values of m and n, at least. Katya's Last Theorem, anyone?)
17 Comments:
(n+1)m/(m+1)
Hey, it's a non-weaksauce version of the drawing socks out of a drawer problem. Cool.
I'm not sure how to solve it mathematically, though I see a few promising leads Googling 'probability "how many draws"', particularly Discrete Probability. I'm about halfway through a Python script to rigorously confirm your formula empirically, though.
(Thanks for posting this. We're kindred spirits in riddling.)
Heh. Yeah, I guessed you'd be one of the people to take the challenge. I'm wondring if CPM will show up, too.
Also, the first time I derived this, it was without pen and paper because I was driving across Nebraska at the time, so I had to keep everything my head. It took a LONG time because I kept forgetting things. (But then, there's not much else to do when you're driving across Nebraska . . .)
Oh man I feel like I used to know how to do this at one point. Bawb I'm curious how your empirical study goes...
Posted the Katya's enigma script.
I'm getting close to working the formal math out for "random" with Wikipedia: Summation, though "shuffle" is still beyond me.
The probability of drawing one of n favorites from m songs is n/m. The expected number of draws required for the first song is thus m/n. Next, you want one of n-1 songs, still out of m. This will take an average of m/(n-1) tracks. The total expected track count is thus:
Σ(i from 0 to n-1) of
(m/(n-i))
m * Σ(i from 0 to n-1) of
(1/(n-i))
m * Σ(i from 1-2n to -n) of
(1/i)
I'll have to look at it more tomorrow. (I'll probably have to break it up into (i from 0 to 1-2n) and (i from 0 to -n) and use the i^p formula.)
From the script, the formula without replacements appears to approach nm/(n+1) for high m. I got pretty close for nm/(n+1) + 2n/m, but it's not quite right yet.
Proof:
No matter how they are mixed together, there are m! ways the liked songs can be ordered, and (n-m)! ways the disliked songs can be ordered.
The number of ways that will result in only m songs being heard is m!*(n-m)!*C(m-1,0).
The number of ways that will result in only m+1 songs being heard is m!*(n-m)!*C(m,1).
The number of ways that will result in only m+2 songs being heard is m!*(n-m)!*C(m+1,2).
The number of ways that will result in only m+3 songs being heard is m!*(n-m)!*C(m+2,3).
And so on.
A few notes:
C(x,y) denotes the choose function, giving the number of possible ways to choose y objects out of a group of x. C(x,y)=x!/(y!*(x-y)!). For example, in this case, C(m+1,2) represents the number of ways that 2 disliked songs can appear in the first m+1 songs.
Since hearing exactly m+2 songs requires that 2 disliked songs be distributed among the first m+1 songs, the coefficient C(m+1,2) gives the number of ways that this can happen.
C(m-1,0) is included mostly to make the math work; it is equal to 1.
Now, since there are n! total ways to order n songs, the probability that each song count will occur is the number of ways to achieve that many songs divided by n!. For example, the probability that m+2 songs will be heard is m!*(n-m)!*C(m+1,2)/n!.
Using these probabilities, we can find the average number of songs that are played. The average is m*p(m) + (m+1)*p(m+1) + (m+2)*p(m+2) + ...
Here, p(x) denotes the probability of achieving a total song count of x songs.
Mathematically, this gives:
m*m!*(n-m)!*C(m-1,0)/n!
+ (m+1)*m!*(n-m)!*C(m,1)/n!
+ (m+2)*m!*(n-m)!*C(m+1,2)/n!
+ (m+3)*m!*(n-m)!*C(m+2,3)/n!
+ ...
Now, using a lot of manipulation and substitution (this may be hard to follow in this format, so writing it out by hand may be easier; "sum" represents sigma notation):
m!*(n-m)!/n! * sum[i=0 to n-m]{(m+i)*C(m-1+i,i)}
= m!*(n-m)!/n! * sum[i=0 to n-m]{((m+i)*(m+i-1)!)/(i!*(m-1)!)}
(by the formula for the choose function)
= m!*(n-m)!/n! * sum[i=0 to n-m]{(m+i)!/(i!*(m-1)!)}
= m!*(n-m)!/(n!*(m-1)!) * sum[i=0 to n-m]{(m+i)!/i!}
(taking the (m-1)! term out of the sum since it does not depend on i)
= m!*(n-m)!/(n!*(m-1)!) * m! * sum[i=0 to n-m]{(m+i)!/(i!*m!)}
(multiplying and dividing by m!)
= m!*(n-m)!/(n!*(m-1)!) * m! * sum[i=0 to n-m]{C(m+i,i)}
(using the formula for the choose function again)
= m!*(n-m)!/(n!*(m-1)!) * m! * C(n+1,n-m)
(using a combinatorial identity known as parallel summation; a proof is here)
= m!*(n-m)!/(n!*(m-1)!) * m! * (n+1)!/((m+1)!*(n-m)!)
(using the formula for the choose function again)
= m!*(n-m)!/(n!*(m-1)!) * (n+1)!/((m+1)*(n-m)!)
= (m!*(n-m)!*(n+1)!)/(n!*(m-1)!*(n-m)!*(m+1))
= (m!*(n+1)!)/(n!*(m-1)!*(m+1))
= m*(n+1)/(m+1)
There is more likely a more elegant way to prove this. But, this works (and hopefully at least someone can follow it).
Wow! Thanks Quandary (and everyone). Clearly I need to be posting more math problems on my blog. :)
Wait, he got (n+1)m/(m+1) too? That predicts that it'll only take two songs on average for the one favorable song to be played out of an arbitrarily long playlist, doesn't it? Am I misunderstanding the problem?
bawb - You've got n and m switched, I think. If there's only one favorite song, the equation becomes
(n+1)/2
So the average amount number of tracks you'll have to listen to is the total number of tracks, plus one, divided by two. That value tends toward infinity for arbitrarily large playlists (which should be reassuring).
Oh, man, I was mixing up m and n. I figured out that mn/(n+1) was always about n/(n+1) off, meaning (mn+n)/(n+1), meaning n(m+1)/(n+1), meaning . . . aw, crap.
Well, that was fun.
Heh. Well, at least your faith in math and common sense is restored.
.
Unrelated: are you really reading På lerfötter as På lerfötter?
Th. - And not "Feet of Clay"? Yes.
So the non-mathematical person wonders: why not just buy individual songs on iTunes and make a CD that only contains songs you like...
Jenn - Because if I listen to the songs long enough, I might learn to like more of them. I.e., the first time I go through a CD, I might only like two songs on it, but by the fifth time I hear it, I might really like four or five. I figure it's good for me to have to listen to popular music, once in a while. :)
Post a Comment
<< Home