"First of all, just to say that this is really serious stuff in terms of what was done."
"The probability that McCartney wrote it was .018"
"In situations like this, you'd better believe the math because it's much more reliable than people's recollections."
The probability was .018 under their model. This doesn't mean that it is the true probability. Naive Bayes probabilities are typically not very reliable [1]. I have not read the paper, but his confident wording makes me question how believable this is.
[1] Ensembles of models, like random forests, tend to give more reliable probabilities.
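To illustrate the reliability point, here is a toy sketch (not the paper's model; all numbers are made up). Naive Bayes multiplies per-feature likelihoods as if features were independent, so with many correlated features the posterior gets pushed toward 0 or 1 even when each individual feature is only weakly informative:

```python
import math

def nb_posterior(feature_likelihood_ratios, prior=0.5):
    """Posterior P(class A) from per-feature likelihood ratios P(x|A)/P(x|B),
    combined naively in log-odds space (the naive Bayes assumption)."""
    log_odds = math.log(prior / (1 - prior))
    for lr in feature_likelihood_ratios:
        log_odds += math.log(lr)
    odds = math.exp(log_odds)
    return odds / (1 + odds)

# One weak feature barely moves the posterior...
print(round(nb_posterior([1.2]), 3))        # 0.545
# ...but 149 weak features treated as independent saturate it.
print(round(nb_posterior([1.2] * 149), 3))  # 1.0
```

This is why an extreme number like .018 can come out of a model even when the evidence per feature is modest: the independence assumption compounds it.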
I could not find a paper (I was interested in how they constructed the dataset), only an extended abstract of the talk [1].
It seems the dataset consisted of 70 songs. Since they don't specify the class distribution, there is no way to tell whether 80% accuracy is good or worse than baseline guessing.
An average of 35 samples per class is a serious few-shot constraint, which makes resorting to interpretable and simple Bayesian analysis a sane step. Note that the dataset is grossly underdetermined too: 70 samples against 149 features, which can cause problems for more complex algorithms.
I think we have to reconstruct this article from behind the pop-sci glasses this was presented with.
> this is really serious stuff in terms of what was done
Sure, it is no joke paper, nor meandering about music theory without any hard proofs.
> "The probability that McCartney wrote it was .018"
This probably meant to say: the model predicts a 0.018 probability, but that is too careful for a pop-sci article. We can then question the validity of the model.
> you'd better believe the math because it's much more reliable than people's recollections
"Ha-ha! People did a lot of drugs in the 60s!" nothing more...
> And 10 years later, here we are talking about the discovery.
Also cute, in that it makes it sound like one of the authors spent 10 years working on this very basic analysis.
Meta: This article would have been published 10 years ago too (since the finding is interesting from a pop-sci view), but I doubt they would even have dared to describe the maths behind the publication. Now we are in a techno-fetish era and write about the number of layers, GPU hours, BoW, and value networks, just to fool the reader into getting a glimpse of modern AI. Meanwhile, the methods in this article could be implemented with one hour of downloading MIDI files and another hour of implementing Graham's 2002 essay "A Plan for Spam" [2].
Needlessly negative/critical, but I could not edit my post. Apologies, I really liked this effort. I dig stylometry, that's why I change my username so much.
Their future research may be really cool, tracing popular chord progressions all through pop history.
>Sure, it is no joke paper, nor meandering about music theory without any hard proofs.
Well, that is much like commenting without having read the paper, only the article.
>"Ha-ha! People did a lot of drugs in the 60s!" nothing more...
Points to a common fact: people's memories from their drug-fueled periods are hazy, and a historical fact: the Beatles did a lot of drugs during the mid-to-late sixties.
> The probability was .018 under their model. This doesn't mean that this is the true probability.
Well of course different models will give you different probabilities. But given that this is a past event, talking about "true probability" is a bit weird. The only "true" probabilities are 0 and 1. There's nothing nondeterministic about the past. Someone wrote the song.
That's not how Bayesian statistics works. In this framework (on which statistical learning and information theory are based), a probability does not require nondeterminism, just incomplete information; a probability is defined with respect to a certain amount/type of information. (Actually, information is defined in terms of things that change probabilities, and probability is the more "core" concept.)
So for example, if someone rolls a six-sided die, hides it from you, and asks what the probability of a "6" is, under Bayesian theory it's 1/6; if they then tell you that the number rolled was odd, the probability of a "6" drops to zero, even though nothing physically changed, purely because you now have more information.
This is generally a much more useful and intuitive definition of probability than to say "the probability of it being a 6 is either 0 or 1, I just don't know yet".
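The die example above can be written out directly; a minimal sketch of probability-as-information:

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}
p_six = Fraction(1, len(outcomes))  # 1/6 before any information

# New information: "the number rolled was odd" rules out 2, 4, 6.
consistent = {o for o in outcomes if o % 2 == 1}
p_six_given_odd = Fraction(int(6 in consistent), len(consistent))

print(p_six)            # 1/6
print(p_six_given_odd)  # 0
```

Nothing physical changed between the two lines; only the information available did.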
The problem I have with the Bayesian definition of probability is that it isn't a definition, is it? Since probability is a ratio, you need to define two amounts. So what does the 1 in 1/6 actually mean? And the 6? What does it mean that someone is 1/6 certain of something? How does one measure certainty? The more I think about it the more I'm convinced Bayesian probabilities are a flawed concept.
>"The problem I have with the Bayesian definition of probability is that it isn't a definition, is it?"
You'll have to clarify what exactly your problem with the concept is here.
>"So what does the 1 in 1/6 actually mean? And the 6?"
It means there are six possible outcomes, one of which is of special interest. You have no reason to think any one outcome is more likely than another.
>"What does it mean that someone is 1/6 certain of something?"
You would take any bet on it with better than 6-to-1 odds. If you have to decide between a 1/6 opportunity and a 1/60 opportunity for the same reward, you should choose the first. You can use such probabilities to perform a cost-benefit analysis among various possible actions with various costs and rewards.
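That cost-benefit point can be sketched with made-up numbers (the $60 reward and $5 ticket price are purely illustrative):

```python
def expected_value(p_win, reward, cost):
    """Expected net gain of a gamble: win probability times reward, minus cost."""
    return p_win * reward - cost

# Same $60 reward, same $5 ticket price, different probabilities.
ev_a = expected_value(1 / 6, 60, 5)    # 5.0  -> worth taking
ev_b = expected_value(1 / 60, 60, 5)   # -4.0 -> not worth taking
print(ev_a, ev_b)
```

The numeric probability is exactly what lets you rank the two gambles rather than just shrug at both.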
>"How does one measure certainty?"
I assume you are using "certainty" as a synonym for "probability", in which case one way is to ask people to bet.
>"The more I think about it the more I'm convinced Bayesian probabilities are a flawed concept."
Ok, but your objections here don't seem to have much thought/experience behind them.
I'm not sure what you mean by "isn't a definition." Mathematically, it's extremely well defined: as a ratio of the subjective likelihood of one possibility to the subjective likelihoods of all possibilities. The exact physical interpretation can vary, from decision-theoretic (at what odds would it be worth making a bet?) to information-theoretic (how many bits does it take to encode that it's this possibility rather than any other?), and so on. Generally, as with quantum mechanics, while there are many interpretations of the underlying reality corresponding to the theory, in practice it is impossible to interpret the world accurately (correlation/independence, inference/learning, etc.) with the previous (in this case frequentist) theories.
We agree that a probability is a ratio, right? In order to make sense of what a ratio means, we must know what the numerator and the denominator mean. From what I gather, Bayesians say probability is a degree of certainty, so a subjective belief. Okay, but in their calculations, mathematically, it's still a ratio. So what are the terms in this ratio? This is what I'm saying. They don't say it. If you say that these terms are "subjective likelihoods", then you have to provide a definition for this, because it isn't clear at all what a "likelihood" is.
No, probability is a normalized, countably additive measure defined over a σ-algebra of subsets of an abstract space (see Kolmogorov).
Probability is a completely mathematical construct which is devoid of empirical content. We have used it, to varying degrees of success, in various situations, by assuming various things which make varying degrees of sense, but it is important to keep in mind that all of that follows from a handful of axioms, none of which states that probability is a ratio.
Neither the classical nor the Bayesian attempt to give probability empirical meaning makes sense in these types of one-off situations, though.
Either John wrote the song or Paul did. Therefore P(Paul wrote it) is either one or zero. It cannot be anything else. We do not know which, and we are not going to find out by counting or doing arithmetic.
After listening to the song one more time, and reading the article again, I've updated my beliefs that this senior lecturer in Stats fancies himself some kind of cool guy and is trying to make a name for himself by immersing himself in the study of women's beach volleyball [1] and the Beatles.
Also, it sounds to me like this is something Paul would have written.
Now, can you please explain what it means to calculate that P(Paul wrote it) = 0.018? Note that I am not asking about how the calculations work.
> By your logic, every proposition is either true or false, and therefore the concept of probability is useless.
That would be a nonsensical interpretation of my logic. In cases where it is possible to conceive of an experiment being repeated in the future, a "degree of belief" interpretation is appealing. If the answer were going to be revealed objectively and someone were taking bets, sure, fine, I'll go along with that. But, don't make an appeal to the Dutch book argument in a unique one-off case where no one has anything at stake other than free publicity which they have chosen to pursue by counting some stuff and dividing those counts by some numbers and stuff.
> In cases where it is possible to conceive of an experiment being repeated in the future
That's just the thing: Bayesian probability is emphatically not about an experiment being repeated in the future. That's the frequentist interpretation of probability.
> But, don't make an appeal to the Dutch book argument in a unique one-off case where no one has anything at stake other than free publicity which they have chosen to pursue by counting some stuff and dividing those counts by some numbers and stuff.
???
The Dutch book argument shows that rational people must have subjective probabilities which behave according to the axioms of probability. I don't see the relevance of your statement here at all, particularly the bits about "anything at stake" and "free publicity".
And what are you trying to say by the phrase "counting some stuff and dividing those counts by some numbers and stuff"?
> Bayesian probability is emphatically not about an experiment being repeated in the future. That's the frequentist interpretation of probability.
I am saying something different, but this seems like a futile discussion. What does it mean to have P(Paul wrote it) equal to some number in (0,1)? Let's call that number q. Explain what q means in words.
If the truth were ever going to be revealed, or if I could conceive of the experiment being repeated in the future, I could explain it as "I would be willing to pay up to $q to buy an asset that pays $1 if Paul wrote it and $0 otherwise."
> And what are you trying to say by the phrase "counting some stuff and dividing those counts by some numbers and stuff"?
I mean it is very hard for me to take a person seriously who is doing a whole bunch of interviews without even a working paper somewhere.
It seems that if you use some arithmetic, people are inclined to accept what you did without questioning whether it makes sense to apply such models in this case.
> The Dutch book argument shows that rational people ...
You need to do better than just quoting passages from Wikipedia if you want to understand what that means.
You just provided a valid interpretation for q. So what’s the problem here? I don’t understand why you’re arguing that q must be 0 or 1, which is nonsense.
I don’t object to your comments about the absence of a working paper.
> You need to do better than just quoting passages from Wikipedia if you want to understand what that means.
You’re accusing me of not understanding the Dutch book argument. Do you have any reason for this, or is it just a baseless accusation?
> You just provided a valid interpretation for q. So what’s the problem here?
Isn't it obvious that interpretation does not apply in this situation?
> You’re accusing me of not understanding the Dutch book argument.
It is not an accusation. It is a statement.
The Dutch book argument is basically a "no arbitrage" or "no free lunch" argument. Isn't it obvious then that it cannot be used to justify anything where no money is at stake and no bet regarding the outcome can ever be resolved?
PS: This will be my last comment in this thread because I get the distinct feeling that I am talking to Eliza instead of a human being.
> Isn't it obvious that interpretation does not apply in this situation?
Why not?
> It is not an accusation. It is a statement.
It is an accusation. It is also a false statement.
> Isn't it obvious then that it cannot be used to justify anything where no money is at stake and no bet regarding the outcome can ever be resolved?
This is so ridiculous I don't know where to begin. It's like saying the expected value of a dice roll doesn't exist because no roll ever equals 3.5, or because the average of any number of trials is not necessarily 3.5. It's claiming a counterfactual is false because its antecedent is false.
> PS: This will be my last comment in this thread because I get the distinct feeling that I am talking to Eliza instead of a human being.
You're welcome to run away with your tail between your legs after making baseless accusations and misunderstanding the concept of probability.
> Bayesian probability is emphatically not about an experiment being repeated in the future. That's the frequentist interpretation of probability.
As I mentioned elsethread, you misunderstand what I am saying. But, just to help you reason this out, answer me this: Where do you think the data to update your beliefs come from?
Mathematically, probability is a function over the domain of values of a random variable, where the sum (for discrete variables) or integral (for continuous ones) over the entire domain is equal to 1. Division happens for some operations on random variables (for example, conditional probabilities), but multiplication and addition are generally more common.
In this case, its value at 6 happens to be 1/6, because there are six possible values and, for a fair die, all have equal probability.
Bayesian probability is always probability within a model. For things like dice, it's pretty uncontroversial to say the only sensible model is one where the a priori probability of the die landing on each face is equal. For questions like "who wrote the song?", it is much less clear what the proper model should be.
Perhaps the most general model is that of algorithmic probability (the probability of a string being output when inputting random strings into a Universal Turing Machine): https://en.wikipedia.org/wiki/Algorithmic_probability. See also Solomonoff induction and the universal semimeasure.
Sorry, but I really have no patience for these purely "philosophical" discussions of probability. I don't see a single equation on that page, but I can assure you that to compute a probability you are assuming that some form of the likelihood, i.e. the way you calculate P(D|H), applies to your problem.
> but I can assure you that to compute a probability you are assuming some form of the likelihood, ie the way you calculate P(D|H), applies to your problem.
You should really take the time to read the Scholarpedia article (or above sources) on algorithmic probability and Solomonoff induction. It’s one of the most beautiful and elegant examples of mathematically formalizing perennial philosophical intuitions.
The reason I brought up algorithmic probability is because there is a precise mathematical sense in which it is “optimal”. Again, I encourage you to read about universal semimeasures.
>">Sorry, but I really have no patience for these purely "philosophical" discussions of probability.
>I’m not sure what you mean by this. Can you elaborate?"
There is nothing in that link that I can apply to a problem. It is all too general and abstract.
What I wanted to see is any example of algorithmic probability being applied to a real world problem. I am certain that to calculate the probability they need to define/derive a model of the phenomenon being modeled to use as the likelihood.
Also, I don't have easy access to those books, nor do I want to get them and flip through looking for what you are talking about, but if I did I am sure I would find that the probability calculation requires a model.
> There is nothing in that link that I can apply to a problem. It is all too general and abstract.
Algorithmic probability is a framework for general prediction, so it has applications in reinforcement learning (see AIXI), grammar discovery in linguistics, text compression, continuation of integer sequences, etc. See http://www.scholarpedia.org/article/Applications_of_algorith....
> I am certain that to calculate the probability they need to define/derive a model of the phenomenon being modeled to use as the likelihood... I am sure I would find that the probability calculation requires a model
Yes, and that model is a universal computer, as stated in the articles I linked to.
>"Yes, and that model is a universal computer, as stated in the articles I linked to."
This basically says "the model is anything at all". From the first example in the first link, I see that the probability in that case is conditional on the binomial model correctly describing the phenomenon (equation 8). This is what I expected was going on.
> This basically says "the model is anything at all".
No, no, no! It says the model is powerful enough to compute any computable function. This is where the optimality of universal measures comes from.
> From the first example in the first link, I see the probability in that case is conditional on the binomial model correctly describing the phenomenon (equation 8).
Equation 8 is presenting the binomial model as an example that is different from the universal machine model.
>"No, no, no! It says the model is powerful enough to compute any computable function. This is where the optimality of universal measures comes from."
Then the model must be whatever function it is computing/approximating for that case. Otherwise I can't make sense of what you keep saying. You must be using "model" in a sense that is so general as to have no meaning to me.
>"Equation 8 is presenting the binomial model as an example that is different from the universal machine model."
Ok, where is an actual probability calculated from the "universal machine model"? That is all I want to see at this point (which was the original topic). I want to see some real-world phenomenon for which this procedure calculates a probability, so I can tell what it means. Not a general proof or something like that; some real data or a simple data-generating process.
Sure. GGP was claiming that it's only 0.018 "under this model", GP claimed no models mattered because this is a past event and probability is undefined, and I (parent comment) said that it does make sense to talk about probability in this context with some model.
In this case, a good model would probably include things like "a songwriter tends to write things similar to their other works", "this is the set of works written by Paul McCartney", "this is the set of works written by John Lennon", "these are the works they did together at varying degrees of cooperation", and "this is how a creative process works". All very hard to quantify and program, but possible with expert help.
Under the Bayesian definition, probability is a degree of belief. Probabilities belong “inside” an agent, not somewhere “out there” in the world.
For a justification of this, see the Dutch Book Argument: “The main point of the Dutch Book Argument is to show that rational people must have subjective probabilities for random events, and that these probabilities must satisfy the standard axioms of probability.”
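A minimal sketch of how a Dutch book works (the 0.7 and 0.5 below are made-up incoherent beliefs, not anyone's actual numbers): if your stated probabilities for two mutually exclusive, exhaustive propositions sum to more than 1, a bookie can sell you both bets at your own prices and pocket a guaranteed profit.

```python
# Your stated beliefs about who wrote the song (incoherent: they sum to 1.2).
p_lennon, p_mccartney = 0.7, 0.5

# You pay $p for a ticket worth $1 if the proposition turns out true.
cost = p_lennon + p_mccartney  # you pay $1.20 total for both tickets
payout = 1.0                   # exactly one proposition is true, so one ticket pays

print(round(cost - payout, 2))  # 0.2 guaranteed loss, whoever wrote the song
```

Only beliefs obeying the probability axioms (here, summing to 1) avoid this guaranteed loss; that is the Dutch book justification in miniature.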
I wish I were as confident in anything as these guys are in the predictive accuracy of their simple model, with an out-of-sample accuracy of 80% [1] on limited and questionable training data.
[1] Accuracy is probably a misleading performance metric here as well
I think it's very possible that when you collaborate, you mimic the other person's speech or writing patterns. Especially since these guys worked together for so long, you'd think they could finish each other's verses.
The other issue with this kind of analysis is that it assumes people are like robots, but sometimes artists just decide to do something radically different from any of their prior work.
Most people with a certain level of obsession about The Beatles would certainly recognize the song as Lennon's, though with some amount of input from McCartney that could range from no involvement to half-and-half.
It's too conjunct to be Paul's, but that doesn't mean he didn't give any input.
I question this analysis because it seems like an easy target -- the song obviously has Lennon's fingerprint on it (a melody hovering around just a few notes with thick harmonies), and the headline of "study reveals new insight into Beatles songwriting" is too juicy for my liking.
What this article seems to also be missing is that there are actually two versions of the song. I'd say that the final recording is indisputably Lennon, but the original is very much McCartney.
The final version's lyrics are abstract and have the interplay of dark and light. The original version is very concrete and almost ballad-like, very much McCartney's style (it even mentions Penny Lane).
What this kind of approach misses is that shared writing credits don't necessarily mean writing together at the same time. While the two have been known to go off and write a piece together, I think there's a decent argument to be made that McCartney sketched out the idea for "In My Life" and Lennon refined it.
Both John and Paul agree that Paul wrote at least some of the music, and yet this article says that John wrote the whole thing. I believe John and Paul.
Thanks. I wish I had an extension that redirects to text-only NPR automatically (their button doesn't redirect to the actual article, and I'm assuming they broke their own site on purpose). One of these days I might actually sit down and start making Firefox extensions, especially if I can put together a basic template for quick one-off single-purpose GDPR fixers and such.
Speaking from experience, it isn't too hard. The MDN WebExtensions documentation is a great place to start, and with very little effort you can write an extension that works in Chrome off the same codebase.
For something as simple as this, writing a separate webext is probably overkill.
There are already extensions that let you automatically make custom redirects. (Or you could abuse HTTPS Everywhere for this purpose.)
Or you could make a bookmarklet. Or make a user script for Greasemonkey (or a similar extension). For example, I'm using this to fix the "Decline" button:
I’m too lazy to read the paper right now but I’m curious: if we can’t trust the memories of Paul & John, how did they train the model on the 70 other songs in the first place?
> Mathematics professor Jason Brown spent 10 years working with statistics to solve the magical mystery.
> The three co-authors of this paper — there was someone called Mark Glickman who was a statistician at Harvard. He's also a classical pianist. Another person, another Harvard professor of engineering, called Ryan Song. And the third person was a Dalhousie University mathematician called Jason Brown.
It took three people ten years to do this? Also all the reporting here is awful. This is nothing like a proof.
Lennon and McCartney took equal credit for every song, regardless of who actually wrote what. In some cases, it’s come out who the real songwriter was. In this case, there was disagreement.
John Lennon and Paul McCartney split credit for all the songs they wrote during the Beatles years, even if a song was primarily or totally written by just one of them. People like to speculate about who wrote what, and after the Beatles broke up both of them would sometimes say who did, but they didn't always agree.
"The Two Johns" is usually used for the band They Might Be Giants, who are John Linnell and John Flansburgh (plus a technically rotating backup band, but usually Danny Weiskopf, Dan Miller and Marty Beller).
There isn't much significance on the Beatles side of it. On the computer/algorithm/bayesian/tech/math side it might be moderately significant to some.
On the Beatles side, this is one of the few Lennon/McCartney songs where there was any recorded disagreement as to who contributed what. The results of this don't convince me, and given that one of the two involved is no longer here, it won't ever get settled.
Edit, FYI: When I posted this comment, the link title repeated the story's title "Math Proves John Lennon Wrote 'In My Life'".
More like "arithmetic suggests" ... There is no mathematical proof here.
See also [1]:
The model assumes correlated multinomial counts for the bags-of-words as a function of authorship which is then inverted using Bayes rule. Out-of-sample classification accuracy for songs with known authorship was 80%. We demonstrate the results to songs during the study period with unknown authorship.
Radio interviews with the presenting author are listed on his web site[2].
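The quoted approach (multinomial bag-of-words counts per author, inverted with Bayes' rule) can be sketched roughly like this. This is a toy illustration with fabricated token lists, not the authors' implementation or data:

```python
import math
from collections import Counter

def train(songs_by_author):
    """songs_by_author: {author: [list of token lists]} -> smoothed log-probs."""
    vocab = {t for songs in songs_by_author.values() for s in songs for t in s}
    model = {}
    for author, songs in songs_by_author.items():
        counts = Counter(t for s in songs for t in s)
        total = sum(counts.values())
        # Laplace smoothing so an unseen token doesn't zero out a posterior.
        model[author] = {t: math.log((counts[t] + 1) / (total + len(vocab)))
                         for t in vocab}
    return model

def posterior(model, tokens, priors):
    """Invert per-author likelihoods with Bayes' rule, normalized over authors."""
    log_post = {a: math.log(priors[a]) + sum(model[a].get(t, 0.0) for t in tokens)
                for a in model}
    m = max(log_post.values())
    unnorm = {a: math.exp(lp - m) for a, lp in log_post.items()}
    z = sum(unnorm.values())
    return {a: p / z for a, p in unnorm.items()}

# Fabricated toy "songs" (token lists), purely for illustration.
corpus = {"lennon":    [["mind", "dream", "mind"], ["dream", "sky"]],
          "mccartney": [["love", "road", "love"], ["road", "home"]]}
model = train(corpus)
print(posterior(model, ["dream", "mind"], {"lennon": 0.5, "mccartney": 0.5}))
```

The real model adds correlation between counts, but this shows where a number like .018 comes from: it is a normalized ratio of likelihoods under the fitted model, not a measured frequency.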
My question is: was the number of samples relatively small? Both authors made much more music than just their Beatles output, and if the later songs (from after the Beatles split) were also (or only!) used for training, then I could imagine such training being sufficient. But if only the Beatles songs were used, I suspect it might not be enough to be completely sure. Moreover, I can imagine that both contributed to some songs in various phases, which could confuse the algorithm.
"The probability that McCartney wrote it was .018"
"In situations like this, you'd better believe the math because it's much more reliable than people's recollections."
The probability was .018 under their model. This doesn't mean that this is the true probability. Naive Bayes probabilities are typically not very reliable [1]. I have not read the paper, but I think his confident wording makes me question how believable this is.
[1] Ensembles of models are better, like random forest.