Wednesday, September 11, 2013

Objections to Bayesian Statistics: Lars Syll pulls a fast one on his readers

Since my original post on Keynes, Bayes, and the law, Lars Syll has posted five subsequent entries on his blog about Bayesianism, so by frequency alone it's fair to infer that the subject is close to his heart. The general problem is that, when expressed in his own words, his objections are baseless (e.g. saying here that "The Bayesian rule of thumb is to simply assume that all outcomes are equally likely"), whereas when he quotes from others it's impossible to know what argument he is trying to make. But apart from this, as I wrote in the comment section of his blog, this last post of his is, in my opinion, irredeemably misleading.

Syll borrows the title for the blog post from a 2008 article by Andrew Gelman and proceeds to quote very strong criticisms of Bayesian inference. The thing is, these are NOT Gelman's criticisms, but rather those of a hypothetical anti-Bayesian created by him to voice the objections. There are several passages in the article where this is clearly explained by Gelman, but all of them were purposefully omitted by Syll. Someone reading the blog post and not the article would naturally assume that these are criticisms that Gelman is raising himself (much like the April Fools' joke on which the article is based). Worse, Syll makes no reference whatsoever to the follow-up article where Gelman presents a spirited defence of Bayesian methods.

Syll says that his purpose was to quote from an eminent statistician (Andrew Gelman) who "realized that these are strong arguments [against Bayesianism] to be taken seriously—and ultimately accepted in some settings and refuted in others." That is fine, but why do so in a way that implies that said statistician is trying to attack Bayesian inference, when in fact he is defending it?

Finally, in response to my comment Syll says: "A quote is — yes — a quote. Nothing more, nothing less." Oh yeah? Well start with this quote:

“Here follows the list of objections from a hypothetical or paradigmatic non-Bayesian:
Bayesian inference is a coherent mathematical theory but I don’t trust it in scientific applications.” Andrew Gelman
and compare it with:

“Bayesian inference is a coherent mathematical theory but I don’t trust it in scientific applications.” Andrew Gelman
Same thing right? I don't think so.

Sunday, September 8, 2013

Keynes, Bayes, and the law

This is a combined response to a post by John Kay on math and story telling, followed by another by Lars Syll on probabilistic reductionism and a final one by Philip Pilkington on multiple versions of probabilities. Since I read the posts in reverse order, I'll structure my response in the same way.

So starting with Pilkington, to say that there are alternative probabilities, one preferred by trained statisticians and another adopted by lawyers and judges, is akin to saying that there are alternative versions of chemistry, one suitable for the laboratory and another, more subtle and full of nuances, adopted by the refined minds of cooks and winemakers, honed by hundreds or even thousands of years of experience. Clear nonsense, of course: the fact that a cook or winemaker uses tradition, taste, and rules of thumb does not change the underlying chemistry. Same with probabilities and the courtroom.

Before leaving Pilkington's post behind, let me just observe that he seems to be under the impression that confidence intervals and such are the domain of Bayesian statistics, whereas arguments based on the "degree of belief" are something else altogether. But anyone with a basic understanding of both confidence intervals and Bayesian statistics knows that nothing could be farther from the truth, as explained in this very clear post by normaldeviate, where one can find the statement that "Bayesian inference is the Analysis of Beliefs" - as simple as that.

But Pilkington can be (partially) excused by the fact that he's getting his definitions from Lars Syll, who doesn't score any better in understanding or explaining Bayesianism. Syll gives the following example:

Say you have come to learn (based on own experience and tons of data) that the probability of you becoming unemployed in Sweden is 10%. Having moved to another country (where you have no own experience and no data) you have no information on unemployment and a fortiori nothing to help you construct any probability estimate on. A Bayesian would, however, argue that you would have to assign probabilities to the mutually exclusive alternative outcomes and that these have to add up to 1, if you are rational. That is, in this case – and based on symmetry – a rational individual would have to assign probability 10% to becoming unemployed and 90% of becoming employed.

While it is certainly true that a Bayesian would argue that you have to assign probabilities to the mutually exclusive events and that these have to add up to 1, no Bayesian would EVER say, based on symmetry or whatever, that a rational individual would have to assign probability 10% to becoming unemployed and 90% to becoming employed. A Bayesian could not care less how someone comes up with their priors. All a Bayesian says is that the priors need to add up to one and subsequently be revised in the face of experience according to Bayes' theorem. In this example, an assignment of 10% and 90% is just as rational as the exact opposite, namely 90% and 10%. What matters is that these priors eventually get corrected by new evidence. The only effect of a bad prior is that the correction takes slightly longer and requires a bit more evidence, that's all (for the technically minded, I should say that the only priors that don't get corrected by evidence are those that assign 0% to one event and 100% to another - no amount of new evidence can change these). Although trivial, this point is important for understanding Syll's rejection of Bayesianism. For example, in this other post he explains why he's a Keynesian and not a Bayesian in terms of a "paradox" created by "the principle of insufficient reason", which is yet another way to select a prior and has precious little to do with Bayesianism.
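To see how quickly evidence swamps a bad prior, here is a minimal sketch (all numbers invented for illustration): two hypotheses about the unemployment rate, opposite priors, and the same observed data.

```python
def bayes_update(prior_h1, data, p1=0.1, p2=0.9):
    """Posterior probability of H1 ("the unemployment rate is p1")
    versus H2 ("the rate is p2") after observing data (1 = unemployed,
    0 = employed), applying Bayes' theorem one observation at a time."""
    post = prior_h1
    for x in data:
        like1 = p1 if x == 1 else 1 - p1
        like2 = p2 if x == 1 else 1 - p2
        num = post * like1
        post = num / (num + (1 - post) * like2)
    return post

# Ten observed periods with a single unemployment spell, i.e. data
# consistent with H1 (rate 10%).
data = [1] + [0] * 9

print(bayes_update(0.9, data))  # good prior: posterior near 1
print(bayes_update(0.1, data))  # "bad" prior: also pulled near 1
print(bayes_update(0.0, data))  # dogmatic 0% prior: stays at 0 forever
```

After only ten observations both the 10% and the 90% priors end up assigning essentially all posterior probability to the correct hypothesis; only the dogmatic prior of exactly 0 is immune to the evidence.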

Next, moving on to Kay and his person-hit-by-a-bus example: evidently no court should find Company A liable simply because it has more buses than Company B, but absent any other information, this is a pretty decent way to form a prior. Another one is to come up with a narrative about the person and the bus. But in either case, the court should look at further evidence and recalculate its belief that a bus from Company A actually hit the person. For example, it could hear testimony from eyewitnesses or look at video footage and use Bayes' theorem to find the posterior probabilities, which would then enter the "balance of probabilities" to lead to a decision. A court that finds Company A liable purely based on a story without looking at evidence is just as stupid as one that bases its decision on the number of buses from each company.
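For concreteness, here is a toy version of the bus calculation (the numbers are mine, not Kay's): the prior comes from the bus counts, and an eyewitness of known reliability provides the evidence.

```python
def posterior_company_a(prior_a, witness_says_a, reliability):
    """P(the bus was Company A's | witness testimony), via Bayes' theorem.
    reliability = probability the witness identifies the company correctly."""
    like_a = reliability if witness_says_a else 1 - reliability
    like_b = (1 - reliability) if witness_says_a else reliability
    num = prior_a * like_a
    return num / (num + (1 - prior_a) * like_b)

# Hypothetical numbers: Company A runs 85% of the buses in town (the
# prior), and a witness who is right 80% of the time says it was B.
p = posterior_company_a(0.85, witness_says_a=False, reliability=0.80)
print(round(p, 3))  # ~0.586: still more likely A, but far from certain
```

The prior from bus counts alone would put 85% on Company A; the testimony drags it down to about 59%, which is exactly the kind of revision the court should perform before weighing the balance of probabilities.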

But the key passage in Kay's piece is:

Such narrative reasoning is the most effective means humans have developed of handling complex and ill-defined problems. A court can rarely establish a complete account of the probabilities of the events on which it is required to adjudicate. Similarly, an individual cannot know how career and relationships will evolve. A business must be steered into a future of unknown and unknowable dimensions. 
So while probabilistic thinking is indispensable when dealing with recurrent events or histories that repeat themselves, it often fails when we try to apply it to idiosyncratic events and open-ended problems. We cope with these situations by telling stories, and we base decisions on their persuasiveness. Not because we are stupid, but because experience has told us it is the best way to cope. That is why novels sell better than statistics texts.

So let me address this, lest I be accused of not having understood the criticism. There are two fundamental misunderstandings here. The first has to do with what it means to give a "complete account of the probabilities of the events", and the second with the idea that probabilistic thinking involves some form of definite knowledge about the future, which has "unknown and unknowable dimensions".

Now if by a "complete account of the probabilities of the events" one means "to assign probabilities to the mutually exclusive alternative outcomes and that these have to add up to 1", then as we have seen this is exactly what Bayes requires. But notice that "complete account" here simply means slicing up all the mutually exclusive events that one is interested in (the technical term is a partition), and this can be as simple as two events (say hit by a bus from Company A or from Company B, or being employed or unemployed in Norway). It does NOT mean a complete account of all the complex and ill-defined phenomena that led a person to be hit by a bus, or found in a cafe in Oslo with a diminishing amount of money in their pocket. Once the events of interest are identified, ANY method for assigning priors is fair game. This could be a narrative, or historical data, or an agent-based model for people, buses, and firms.

Finally, about the notion that probabilistic thinking requires strong assumptions about the future: one often hears that because economics (or law, or politics, or baseball) is not ergodic, past experience is no guide to the future, and therefore there's no hope in using probability. As I said elsewhere, Bayes' theorem is a way to update beliefs (expressed as probabilities) in the face of new information, and as such

could not care less if the prior probabilities change because they are time-dependent, the world changed, or you were too stupid to assign them to begin with. It is only a narrow frequentist view of prediction that requires ergodicity (and a host of other assumptions like asymptotic normality of errors) to be applicable.

A few related exclamations to conclude:

Mathematical models are stories like any other!

Non-ergodicity is the friend of good modellers and story tellers alike!

So is irreducible uncertainty!

Think probabilistically! Estimate nothing!


Wednesday, August 28, 2013

Accounting identities for the Keen model

This is long overdue, but now that Steve Keen is visiting the Fields Institute again I thought I should revisit the topic of aggregate demand, income, and debt in his models. Loyal readers will recall that this was the subject of a somewhat heated exchange on this and other blogs last year. 

So here is the full balance sheet/transaction/flow of funds table for one of Keen's models, together with the implied accounting relationships, as well as my take on the "demand = income + change in debt" statement and its variants. 

As I mentioned in one of the old posts, the point is to disaggregate the firms and households from the banking sector, so that endogenous money creation can play a significant role in the story.

In my view, accounting identities that lump the entire private sector together (i.e. firms + households + banks) somehow obfuscate the role of endogenous money and end up putting undue emphasis on the only other relevant observation, namely that the private sector surplus should equal the government deficit.
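As a toy illustration of the bookkeeping (invented numbers, not Keen's actual table): once banks are treated as a separate sector, spending in a period can exceed the income earned in that period by exactly the net new debt.

```python
# Invented one-period flows, for illustration only.
income = 100.0        # wages + profits earned by households and firms
new_loans = 15.0      # gross new bank credit extended this period
repayments = 5.0      # principal repaid to banks this period
change_in_debt = new_loans - repayments

# The "demand = income + change in debt" statement as bookkeeping:
demand = income + change_in_debt
print(demand)  # 110.0: expenditure exceeds income by the net new credit
```

Aggregating banks back into the private sector hides exactly this wedge between spending and income, which is the point of keeping them disaggregated.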

Wednesday, August 21, 2013

Small-brain economics: Pilkington's strong prior against mathematics

You know you have made it in heterodox economics when someone claims that you have a plan to destroy post-Keynesianism as we know it – muhahahaha. Hyperbole aside, this is essentially what one Philip Pilkington thinks that I’m doing, as he explains in this rant, itself a spin-off of a discussion that started at the INET YSI Facebook page and went somewhat astray (the thread containing it has since been closed by the moderator of the page).

Pilkington, a journalist-cum-research assistant currently working on his dissertation at Kingston University, frames the discussion around two alleged sins that I have committed, namely, (i) not knowing what I’m talking about and (ii) mistaking model for reality and making grandiose claims.

As evidence of the first sin, he offers this comment of mine from the Facebook discussion (emphasis added here, you’ll see why in a second):

OK, this ergodicity nonsense gets thrown around a lot, so I should comment on it. You only need a process (time series, system, whatever) to be ergodic if you are trying to make estimates of properties of a given probability distribution based on past data. The idea is that enough observations through time (the so called time-averages) give you information about properties of the probability distribution over the sample space (so called ensemble averages). So for example you observe a stock price long enough and get better and better estimates of its moments (mean, variance, kurtosis, etc). Presumably you then use these estimates in whatever formula you came up with (Black-Scholes or whatever) to compute something else about the future (say the price of an option). The same story holds for almost all mainstream econometric models: postulate some relationship, use historical time series to estimate the parameters, plug the parameters into the relationship and spill out a prediction/forecast.

Of course none of this works if the process you are studying is non-ergodic, because the time averages will NOT be reliable estimates of the probability distribution. So the whole thing goes up in flames and people like Paul Davidson go around repeating “non-ergodic, non-ergodic” ad infinitum. The thing is, none of this is necessary if you take a Bayes’s theorem view of prediction/forecast. You start by assigning prior probabilities to models (even models that have nothing to do with each other, like an IS/LM model and a DSGE model with their respective parameters), make predictions/forecasts based on these prior probabilities, and then update them when new information becomes available. Voila, no need for ergodicity. Bayesian statistics could not care less if the prior probabilities change because they are time-dependent, the world changed, or you were too stupid to assign them to begin with. It is only a narrow frequentist view of prediction that requires ergodicity (and a host of other assumptions like asymptotic normality of errors) to be applicable. Unfortunately, that’s what’s used by most econometricians. But it doesn’t need to be like that. My friend Chris Rogers from Cambridge has a t-shirt that illustrates this point. It says: “Estimate Nothing!”. I think I’ll order a bunch and distribute to my students.

Pilkington then goes on to say:

It is not clear that Grasselli’s approach here can be used in any meaningful way in empirical work. What we are concerned with as economists is trying to make predictions about the future.
These range from the likely effects of policy, to the moves in markets worldwide. What Grasselli is interested in here is the robustness of his model. He wants to engage in schoolyard posturing saying “my model is better than your model because it made better predictions”.

Wait, what? Exactly which part of “make predictions/forecasts based on these prior probabilities, and then update them when new information becomes available” is not clear? Never mind that I’d give a pound of my own flesh for this to be the Grasselli approach (it’s actually the Bayesian approach), the sole purpose of it is to make precise predictions and then update them based on new evidence, so it’s baffling that Pilkington has difficulty understanding how it can be used in empirical work. Not to mention the glaring contradiction of saying in one breath that
“What we are concerned with as economists is trying to make predictions about the future” and admonishing me in the next for allegedly claiming that “my model is better than your model because it made better predictions”. So you want to make predictions, but somehow don’t think that a model that makes better predictions is better than one that made worse predictions. Give me a minute to collect my brains from across the room…

Back to my comment on the “ergodicity nonsense”, the key point was that it is the frequentist approach to statistics that forces one to make estimates of parameters based on past time series, and this requires a lot of assumptions, including ergodicity. In Bayesian statistics, the modeler is free (in fact encouraged) to come up with her own priors, based on a combination of past experience, theoretical understanding, and personal judgment. Fisher, the father of the frequentist approach, wanted to ban any subjectivity from statistics, advocating instead (I’m paraphrasing here) that one should “Estimate everything!”. By contrast, Bayesians will tell you that you should estimate when you can, but supplement it with whatever else you like. To illustrate how historical estimates are not only misleading (for example when the underlying process is non-ergodic) but also unnecessary in Bayesian statistics, Chris Rogers has the mantra “Estimate nothing!”. But nowhere does it say “Predict nothing!”. On the contrary, once again, the nexus “make predictions based on probabilities--compare with reality--change the probabilities” is what the approach is all about. So on the topic of advice for t-shirt making, Pilkington should wear one that says “I ought to read the paragraphs I quote” in front, followed by “and try to avoid self-contradictions” on the back.
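The time-average point is easy to demonstrate with a simulation (a generic illustration, not any particular economic series): for an i.i.d. process the time average of a single path recovers the ensemble mean, while for a random walk it does not.

```python
import random

random.seed(0)

def time_average(path):
    return sum(path) / len(path)

# Ergodic case: i.i.d. draws with mean 5. One long path is enough to
# recover the ensemble mean.
ergodic_path = [random.gauss(5, 1) for _ in range(10_000)]

# Non-ergodic case: a random walk. The ensemble mean is 0 at every
# horizon, but each individual path wanders off on its own, so the
# time average of a single path is an unreliable estimate.
walk, x = [], 0.0
for _ in range(10_000):
    x += random.gauss(0, 1)
    walk.append(x)

print(time_average(ergodic_path))  # close to 5
print(time_average(walk))          # path-dependent, tells you little
```

In the frequentist workflow the second estimate would be plugged into a model as if it were informative; in the Bayesian workflow it is simply one more input you are free to discount.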

Moving on, as evidence for the second sin, Pilkington quotes another long comment of mine with the “clearest explanation” (his words, but I agree!) of what I’m doing, namely:

I’m not comparing models, I’m comparing systems within the same model. Say System 1 has only one locally stable equilibrium, whereas System 2 has two (a good one and a bad one). Which one has more systemic risk? There’s your first measure. Now say for System 2 you have two sets of initial conditions: one well inside the basin of attraction for the good equilibrium (say low debt) and another very close to the boundary of the basin of attraction (say with high debt). Which set of initial conditions poses higher systemic risk? There’s your second measure. Finally, you are monitoring a parameter that is known to be associated with a bifurcation, say the size of the government response when employment is low, and the government needs to decide between two stimulus packages, one above and one below the bifurcation threshold. Which policy leads to higher systemic risk? There’s your third measure.
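The second of these measures can be mimicked in a toy one-dimensional system (my own illustration, far simpler than the actual models): two stable equilibria separated by a basin boundary, with initial conditions either safely inside the good basin or dangerously close to its edge.

```python
def simulate(x0, f, dt=0.01, steps=5000):
    """Euler-integrate dx/dt = f(x) from x0 and return the final state."""
    x = x0
    for _ in range(steps):
        x += dt * f(x)
    return x

# Toy system: equilibria at 0, 1, and 2; the outer two are stable and
# x = 1 is the boundary between the "bad" basin (-> 0) and the "good"
# basin (-> 2).
f = lambda x: -x * (x - 1) * (x - 2)

safe = simulate(1.8, f)             # well inside the good basin -> 2
fragile = simulate(1.02, f)         # also ends at 2, but barely inside...
shocked = simulate(1.02 - 0.05, f)  # ...so a small shock tips it to 0

print(round(safe), round(fragile), round(shocked))
```

Both 1.8 and 1.02 converge to the good equilibrium, but only the second is one small shock away from the bad one: the distance to the basin boundary is the risk measure.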

He then goes on to paraphrase it in a much more clumsy way (question: if someone already gave you the clearest explanation about something, why should you explain again? Just to make it worse?):

What Grasselli is doing here is creating a model in which he can simulate various scenarios to see which one produces high-risk and which will produce low-risk environments within said model. But is this really “measuring systemic risk”? I don’t think that it is. To say that it is a means to measure systemic risk would be like me saying that I found a way to measure the size of God and then when incredulous people came around to my house to see my technique they would find a computer simulation I had created of what I think to be God in which I could measure Him/Her/It.

Now I don’t know what the God example is all about, but Pilkington seems to think that the only way to measure something is to go out with an instrument (a ruler, for example) and take a measurement. The problem is that risk, almost by definition, is a property of future events, and you cannot take a measurement in the future. ALL you can do is create a model of the future and then “measure” the risk of something within the model. As Lady Gaga would say, “oh there ain’t no other way”. For example, when you drive along the Pacific Coast Highway and read a sign on the side of the road that says “the risk of forest fire today is high”, all it means is that someone has a model (based on previous data, the theory of fire propagation, simulations, and judgment) that takes as inputs the measurements of observed quantities (temperature, humidity, etc.) and calculates probabilities of scenarios in which a forest fire arises. As time goes by and the future turns into the present, you observe the actual occurrence of forest fires and see how well the model performs according to the accuracy of its predictions, at which point you update the model (or a combination of models) based on, you guessed it, Bayes’s theorem.
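That last step, updating a combination of models, is just Bayes’s theorem applied to model weights. A minimal sketch, with hypothetical models and data:

```python
def update_weights(weights, probs, outcome):
    """One Bayes-rule update of the model weights given an outcome.
    weights: prior P(model m); probs: each model's predicted probability
    of the event; outcome: 1 if the event happened, 0 otherwise."""
    likes = [p if outcome else 1 - p for p in probs]
    z = sum(w * l for w, l in zip(weights, likes))
    return [w * l / z for w, l in zip(weights, likes)]

# Hypothetical forecasters: model A says fires are likely on hot, dry
# days (p = 0.7); model B says they are rare (p = 0.2). Start agnostic.
weights = [0.5, 0.5]
for outcome in [1, 1, 0, 1, 1]:   # fires occurred on 4 of 5 such days
    weights = update_weights(weights, [0.7, 0.2], outcome)

print([round(w, 3) for w in weights])  # nearly all weight on model A
```

The model that predicted better earns nearly all the posterior weight, which is the precise sense in which prediction accuracy settles the “my model is better than your model” question.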

So that’s it for the accusation of mistaking model for reality. But still on the second fundamental sin according to Pilkington, recall that its second part consists of making “grandiose claims about what they have achieved or will potentially achieve that ring hollow when scrutinized”, against which his advice is to “tone down the claims they are making lest they embarrass the Post-Keynesian community at large”. This is all fine, but it sounds a bit rich coming from someone who published a piece titled Teleology and Market Equilibrium: Manifesto for a General Theory of Prices, which upon scrutiny contains some common platitudes about neoclassical denials of empirical evidence, followed by this claim:

My goal is to lay out a general theory of prices in the same way Keynes laid out a general theory of employment and output. This will provide a framework in which the neoclassical case of downward-sloping demand curves and upward-sloping supply curves is a highly unlikely special case. With such a framework we can then approach particular cases as they arise in a properly empirical manner. In doing this I hope to be able to introduce the Keynesian theory to pricing; and with that, I think, the neoclassical doctrines will be utterly destroyed and a full, coherent alternative will be available. Fingers crossed!

So instead of an actual general theory of prices, Pilkington’s “manifesto” states his goal to be as great as Keynes and utterly destroy the neoclassical doctrines. Hear the low tone!

Which brings me to Keynes’s advice on the use of mathematics that Pilkington also quotes in his rant. Again, if you read the quote carefully you see that Keynes warns against “symbolic pseudo-mathematical methods” and complains that “Too large a proportion of recent ‘mathematical’ economics are mere concoctions”. Notice the prefix pseudo and the inverted commas around the word mathematics in the original quote, which suggests that Keynes’s peeve was not with mathematics itself, but with the “imprecise…initial assumptions they rest on”. In particular, Keynes singles out methods that “expressly assume strict independence between the factors involved”, which is admittedly a very stupid assumption, but in no way necessary for the application of (true, i.e not pseudo) mathematical methods. For example, NONE of the models I work with assume independence between factors, on the contrary, they highlight the complex and surprisingly rich interdependencies, as well as their consequences.

I conclude in meta fashion with a Bayesian framing of this discussion itself. Pilkington has a very strong prior that my mathematical methods are useless and he’s honest enough to say so: “I heard about the work Grasselli and others were doing at the Fields Institute some time ago and I was instantly skeptical”. He bases this prior on a well-documented post-Keynesian intellectual tradition, as well as a slightly misguided notion of what constitutes “giant formal models” (my models are actually pretty easy low-dimensional dynamical systems, but if you are Bart Simpson then I guess anything looks like a giant formal model). By contrast, I have a very strong prior that my models are useful, based on a similarly well-documented intellectual tradition of applications of mathematics in other areas of study. Pilkington concedes that he might be wrong and promises unreserved praise if that turns out to be true. Likewise, I might be wrong, in which case I’ll abandon the models and do something else. In either case, we’ll both be traveling the same Road to Wisdom, as in the exceptional poem by Piet Hein:

Well, it's plain
and simple to express.
Err and err and err again,
but less and less and less.

Nate Silver uses this poem as inspiration for the title of the chapter in his book explaining the Bayesian approach, which he ends with a meta statement of his own: “Bayes’s theorem predicts that Bayesians will win”.

Estimate nothing!

Wednesday, April 17, 2013

My take on Reinhart-Rogoff

I have not blogged for a while, but this is too important to ignore, mostly because for the past two years or so I have been supervising different groups of undergraduate students on a data-driven project based on reproducing and extending the results in the Reinhart-Rogoff book.

So let me start with a few observations about the book:

- the dataset used in the book is not readily available, despite what the authors say. What one can download from their website is secondary data prepared by the authors, for example the full set of crisis dates for each country. On the other hand, the book does provide a more or less complete list of sources, from which we were able to download something like 95% of the primary data used by the authors (e.g. from places like the IMF, the World Bank, the Maddison Project, etc).

- the book is full of small errors, like figures that do not quite match what their captions say, or numerical results that turned out to be slightly wrong when we tried to reproduce them from the primary data, but overall we didn't find any major errors and agreed with almost all of their conclusions. Moreover, we were able to successfully implement the signals approach described towards the end of the book for currency, banking, and stock market crises (the latter not mentioned in the book!).

- in other words, the book is both solid and a useful launchpad for further research, if only a little sloppy.

Now, the story is very different (pun intended!) regarding their 2010 paper. Right from the beginning I thought that the possibility of reverse causation alone was enough reason not to take the results too seriously, so I didn't even bother to try to reproduce them.

But alas, HAP (Herndon, Ash, and Pollin) have done the work and found that even the numerical results presented in the paper were significantly wrong, not to mention the conclusions. And the response from Carmen Reinhart is even more appalling than the paper itself. Basically she claims that HAP also find a negative correlation between debt and growth, so what's the big deal?

Of course the big deal is that their original paper implied the existence of a hard threshold at 90% debt-to-GDP beyond which the slowdown in growth was very rapid, indicating some kind of nonlinear amplifying effects that would likely plunge the country into a state of crisis. But as it turns out, there's nothing special about the 90% mark, with the relationship being approximately linear all the way through, and therefore very manageable.

In any case, I wish they had never written the 2010 paper (and perhaps by now they wish the same), if only because it's probably going to drag their good and useful book (along with their reputation in general) through the well-deserved mud where the paper finds itself at the moment.