Wednesday, September 11, 2013

Objections to Bayesian Statistics: Lars Syll pulls a fast one on his readers

Since my original post on Keynes, Bayes, and the law, Lars Syll has posted 5 subsequent entries on his blog about Bayesianism, so by frequency alone it's fair to infer that the subject is close to his heart. The general problem is that, when expressed in his own words, his objections are baseless (e.g. saying here that "The Bayesian rule of thumb is to simply assume that all outcomes are equally likely"), whereas when he quotes from others it's impossible to know what argument he is trying to make. But apart from this, as I wrote in the comment section of his blog, this last post of his is, in my opinion, irredeemably misleading.

Syll borrows the title for the blog post from a 2008 article by Andrew Gelman and proceeds to quote very strong criticisms of Bayesian inference. The thing is, these are NOT Gelman's criticisms, but rather those of a hypothetical anti-Bayesian created by him to voice the objections. There are several passages in the article where Gelman explains this clearly, but all of them were purposefully omitted by Syll. Someone reading the blog post and not the article would naturally assume that these are criticisms that Gelman is raising himself (much like the April Fools' joke on which the article is based). Worse, Syll makes no reference whatsoever to the follow-up article where Gelman presents a spirited defence of Bayesian methods.

Syll says that his purpose was to quote from an eminent statistician (Andrew Gelman) who "realized that these are strong arguments [against Bayesianism] to be taken seriously—and ultimately accepted in some settings and refuted in others." That is fine, but why do so in a way that implies that said statistician is trying to attack Bayesian inference, when in fact he is defending it?

Finally, in response to my comment Syll says: "A quote is — yes — a quote. Nothing more, nothing less." Oh yeah? Well, start with this quote:

“Here follows the list of objections from a hypothetical or paradigmatic non-Bayesian:
Bayesian inference is a coherent mathematical theory but I don’t trust it in scientific applications.” Andrew Gelman
and compare it with:

“Bayesian inference is a coherent mathematical theory but I don’t trust it in scientific applications.” Andrew Gelman
Same thing, right? I don't think so.

Sunday, September 8, 2013

Keynes, Bayes, and the law

This is a combined response to a post by John Kay on math and storytelling, followed by another by Lars Syll on probabilistic reductionism, and a final one by Philip Pilkington on multiple versions of probabilities. Since I read the posts in reverse order, I'll structure my response in the same way.

So starting with Pilkington, to say that there are alternative probabilities, one preferred by trained statisticians and another adopted by lawyers and judges, is akin to saying that there are alternative versions of chemistry, one suitable for the laboratory and another, more subtle and full of nuances, adopted by the refined minds of cooks and winemakers, honed by hundreds or even thousands of years of experience. Clear nonsense, of course: the fact that a cook or winemaker uses tradition, taste, and rules of thumb does not change the underlying chemistry. Same with probabilities and the courtroom.

Before leaving Pilkington's post behind, let me just observe that he seems to be under the impression that confidence intervals and such are the domain of Bayesian statistics, whereas arguments based on the "degree of belief" are something else altogether. But anyone with a basic understanding of both confidence intervals and Bayesian statistics knows that nothing could be farther from the truth, as explained in this very clear post by normaldeviate, where one can find the statement that "Bayesian inference is the Analysis of Beliefs" - as simple as that.

But Pilkington can be (partially) excused by the fact that he's getting his definitions from Lars Syll, who doesn't score any better in understanding or explaining Bayesianism. Syll gives the following example:

Say you have come to learn (based on own experience and tons of data) that the probability of you becoming unemployed in Sweden is 10%. Having moved to another country (where you have no own experience and no data) you have no information on unemployment and a fortiori nothing to help you construct any probability estimate on. A Bayesian would, however, argue that you would have to assign probabilities to the mutually exclusive alternative outcomes and that these have to add up to 1, if you are rational. That is, in this case – and based on symmetry – a rational individual would have to assign probability 10% to becoming unemployed and 90% of becoming employed.

While it is certainly true that a Bayesian would argue that you have to assign probabilities to the mutually exclusive events and that these have to add up to 1, no Bayesian would EVER say, based on symmetry or anything else, that a rational individual would have to assign probability 10% to becoming unemployed and 90% to becoming employed. A Bayesian could not care less how someone comes up with their priors. All a Bayesian says is that the priors need to add up to one and subsequently be revised in the face of experience according to Bayes's theorem. In this example, an assignment of 10% and 90% is just as rational as the exact opposite, namely 90% and 10%. What matters is that these priors eventually get corrected by new evidence. The only effect of a bad prior is that the correction takes slightly longer and requires a bit more evidence, that's all (for the technically minded, I should say that the only priors that don't get corrected by evidence are those that assign 0% to one event and 100% to another - no amount of new evidence can change these). Although trivial, this point is important for understanding Syll's rejection of Bayesianism. For example, in this other post he explains why he's a Keynesian and not a Bayesian in terms of a "paradox" created by "the principle of insufficient reason", which is yet another way to select a prior and has precious little to do with Bayesianism.
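To see how quickly a "bad" prior gets corrected, here is a minimal sketch using a conjugate Beta-Binomial model; the numbers are made up for illustration. Two agents start from opposite priors about the probability of unemployment and observe the same data:

```python
import random

# Two agents with opposite Beta priors on the probability p of being
# unemployed: agent 1's prior Beta(1, 9) has mean 0.10, agent 2's "bad"
# prior Beta(9, 1) has mean 0.90. Both see the same evidence.
random.seed(0)
true_p = 0.10
data = [random.random() < true_p for _ in range(500)]  # True = unemployed

a1, b1 = 1.0, 9.0
a2, b2 = 9.0, 1.0
for x in data:
    a1, b1 = a1 + x, b1 + (1 - x)  # conjugate Bayesian update
    a2, b2 = a2 + x, b2 + (1 - x)

post1 = a1 / (a1 + b1)  # posterior mean of agent 1
post2 = a2 / (a2 + b2)  # posterior mean of agent 2
print(post1, post2)     # both end up close to the true 10%
```

After 500 observations the two posterior means differ by exactly 8/510, less than two percentage points: the bad prior simply took a bit more evidence to correct, which is the point made above.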

Next, moving on to Kay and his person-hit-by-a-bus example: evidently no court should find Company A liable simply because it has more buses than Company B, but absent any other information, this is a pretty decent way to form a prior. Another is to come up with a narrative about the person and the bus. But in either case, the court should look at further evidence and recalculate its belief that a bus from Company A actually hit the person. For example, it could hear testimony from eyewitnesses or look at video footage and use Bayes's theorem to find the posterior probabilities, which would then enter the "balance of probabilities" to lead to a decision. A court that finds Company A liable purely based on a story without looking at evidence is just as stupid as one that bases its decision on the number of buses from each company.
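For concreteness, here is the bus example as a two-line Bayes calculation, with hypothetical numbers for the market shares and the witness's reliability:

```python
# Hypothetical numbers: Company A operates 80% of the buses in town (the
# prior), and an eyewitness identifies the bus company correctly 75% of
# the time. The witness testifies that the bus was from Company B.
prior_A, prior_B = 0.80, 0.20
p_says_B_given_B = 0.75   # witness is right
p_says_B_given_A = 0.25   # witness is wrong

# Bayes's theorem: P(Company B | witness says B)
evidence = prior_B * p_says_B_given_B + prior_A * p_says_B_given_A
posterior_B = prior_B * p_says_B_given_B / evidence
print(round(posterior_B, 3))  # 0.429
```

Despite the testimony, it remains more likely (posterior about 57%) that the bus belonged to Company A: the evidence shifted the prior without overwhelming it, which is exactly what should feed into the "balance of probabilities".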

But the key passage in Kay's piece is:

Such narrative reasoning is the most effective means humans have developed of handling complex and ill-defined problems. A court can rarely establish a complete account of the probabilities of the events on which it is required to adjudicate. Similarly, an individual cannot know how career and relationships will evolve. A business must be steered into a future of unknown and unknowable dimensions. 
So while probabilistic thinking is indispensable when dealing with recurrent events or histories that repeat themselves, it often fails when we try to apply it to idiosyncratic events and open-ended problems. We cope with these situations by telling stories, and we base decisions on their persuasiveness. Not because we are stupid, but because experience has told us it is the best way to cope. That is why novels sell better than statistics texts.

So let me address this lest I receive hand-waving accusations that I have not understood the criticism. There are two fundamental misunderstandings here. The first has to do with what it means to give a "complete account of the probabilities of the events"; the second is the idea that probabilistic thinking involves some form of definite knowledge about the future, which has "unknown and unknowable dimensions".

Now if by a "complete account of the probabilities of the events" one means "to assign probabilities to the mutually exclusive alternative outcomes and that these have to add up to 1", then as we have seen this is exactly what Bayes requires. But notice that "complete account" here simply means slicing up all the mutually exclusive events that one is interested in (the technical term is a partition), and this can be as simple as two events (say hit by a bus from Company A or from Company B, or being employed or unemployed in Norway). It does NOT mean a complete account of all the complex and ill-defined phenomena that led a person to be hit by a bus, or to be found in a cafe in Oslo with a diminishing amount of money in their pocket. Once the events of interest are identified, ANY method for assigning priors is fair game. This could be a narrative, or historical data, or an agent-based model for people, buses, and firms.

Finally, about the notion that probabilistic thinking requires strong assumptions about the future: one often hears that because economics (or law, or politics, or baseball) is not ergodic, past experience is no guide to the future, and therefore there's no hope in using probability. As I said elsewhere, Bayes's theorem is a way to update beliefs (expressed as probabilities) in the face of new information, and as such

could not care less if the prior probabilities change because they are time-dependent, the world changed, or you were too stupid to assign them to begin with. It is only a narrow frequentist view of prediction that requires ergodicity (and a host of other assumptions like asymptotic normality of errors) to be applicable.

A few related exclamations to conclude:

Mathematical models are stories like any other!

Non-ergodicity is the friend of good modellers and story tellers alike!

So is irreducible uncertainty!

Think probabilistically! Estimate nothing!


Wednesday, August 28, 2013

Accounting identities for the Keen model

This is long overdue, but now that Steve Keen is visiting the Fields Institute again I thought I should revisit the topic of aggregate demand, income, and debt in his models. Loyal readers will recall that this was the subject of a somewhat heated exchange on this and other blogs last year. 

So here is the full balance sheet/transaction/flow of funds table for one of Keen's models, together with the implied accounting relationships, as well as my take on the "demand = income + change in debt" statement and its variants. 

As I had mentioned in one of the old posts, the point is to disaggregate the firms and households from the banking sector, so that endogenous money creation can play a significant role in the story. 

In my view, accounting identities that lump the entire private sector together (i.e. firms + households + banks) somehow obfuscate the role of endogenous money and end up putting undue emphasis on the only other relevant observation, namely that the private sector surplus should equal the government deficit. 

Wednesday, August 21, 2013

Small-brain economics: Pilkington's strong prior against mathematics

You know you have made it in heterodox economics when someone claims that you have a plan to destroy post-Keynesianism as we know it – muhahahaha. Hyperbole aside, this is essentially what one Philip Pilkington thinks that I’m doing, as he explains in this rant, itself a spin-off of a discussion that started at the INET YSI Facebook page and went somewhat astray (the thread containing it has since been closed by the moderator of the page).

Pilkington, a journalist-cum-research assistant currently working on his dissertation at Kingston University, frames the discussion around two alleged sins that I have committed, namely, (i) not knowing what I’m talking about and (ii) mistaking models for reality and making grandiose claims.

As evidence of the first sin, he offers this comment of mine from the Facebook discussion (emphasis added here, you’ll see why in a second):

OK, this ergodicity nonsense gets thrown around a lot, so I should comment on it. You only need a process (time series, system, whatever) to be ergodic if you are trying to make estimates of properties of a given probability distribution based on past data. The idea is that enough observations through time (the so called time-averages) give you information about properties of the probability distribution over the sample space (so called ensemble averages). So for example you observe a stock price long enough and get better and better estimates of its moments (mean, variance, kurtosis, etc). Presumably you then use these estimates in whatever formula you came up with (Black-Scholes or whatever) to compute something else about the future (say the price of an option). The same story holds for almost all mainstream econometric models: postulate some relationship, use historical time series to estimate the parameters, plug the parameters into the relationship and spill out a prediction/forecast.

Of course none of this works if the process you are studying is non-ergodic, because the time averages will NOT be reliable estimates of the probability distribution. So the whole thing goes up in flames and people like Paul Davidson go around repeating “non-ergodic, non-ergodic” ad infinitum. The thing is, none of this is necessary if you take a Bayes’s theorem view of prediction/forecast. You start by assigning prior probabilities to models (even models that have nothing to do with each other, like an IS/LM model and a DSGE model with their respective parameters), make predictions/forecasts based on these prior probabilities, and then update them when new information becomes available. Voila, no need for ergodicity. Bayesian statistics could not care less if the prior probabilities change because they are time-dependent, the world changed, or you were too stupid to assign them to begin with. It is only a narrow frequentist view of prediction that requires ergodicity (and a host of other assumptions like asymptotic normality of errors) to be applicable. Unfortunately, that’s what’s used by most econometricians. But it doesn’t need to be like that. My friend Chris Rogers from Cambridge has a t-shirt that illustrates this point. It says: “Estimate Nothing!”. I think I’ll order a bunch and distribute them to my students.
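The "assign prior probabilities to models, then update" recipe described in the comment above can be sketched in a few lines. The two toy "models" and all the numbers below are invented for illustration; real candidates would be full forecasting models like the IS/LM and DSGE examples mentioned:

```python
# Two hypothetical "models" for a binary outcome: model_1 predicts the
# event with probability 0.7, model_2 with probability 0.3. We start
# agnostic (50/50) and update the model probabilities by Bayes's theorem
# as observations arrive.
models = {"model_1": 0.7, "model_2": 0.3}
prior = {"model_1": 0.5, "model_2": 0.5}   # initial model probabilities

observations = [1, 1, 0, 1, 1]  # the event occurred 4 times out of 5

for y in observations:
    # likelihood of the observation under each model
    like = {m: p if y == 1 else 1 - p for m, p in models.items()}
    z = sum(prior[m] * like[m] for m in models)          # normalisation
    prior = {m: prior[m] * like[m] / z for m in models}  # Bayes update

print(prior)  # model_1 now carries most of the probability mass
```

Note that nothing here requires ergodicity or even that the models resemble each other: the posterior model probabilities simply track which one predicts better as the data come in.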

Pilkington then goes on to say:

It is not clear that Grasselli’s approach here can be used in any meaningful way in empirical work. What we are concerned with as economists is trying to make predictions about the future.
These range from the likely effects of policy, to the moves in markets worldwide. What Grasselli is interested in here is the robustness of his model. He wants to engage in schoolyard posturing saying “my model is better than your model because it made better predictions”.

Wait, what? Exactly which part of “make predictions/forecasts based on these prior probabilities, and then update them when new information becomes available” is not clear? Never mind that I’d give a pound of my own flesh for this to be the Grasselli approach (it’s actually the Bayesian approach), the sole purpose of it is to make precise predictions and then update them based on new evidence, so it’s baffling that Pilkington has difficulties understanding how it can be used in empirical work. Not to mention the glaring contradiction of saying in one breath that
“What we are concerned with as economists is trying to make predictions about the future” and admonishing me in the next for allegedly claiming that “my model is better than your model because it made better predictions”. So you want to make predictions, but somehow don’t think that a model that makes better predictions is better than one that made worse predictions. Give me a minute to collect my brains from across the room…

Back to my comment on the “ergodicity nonsense”: the key point was that it is the frequentist approach to statistics that forces one to make estimates based on past time series, and this requires a lot of assumptions, including ergodicity. In Bayesian statistics, the modeler is free (in fact encouraged) to come up with her own priors, based on a combination of past experience, theoretical understanding, and personal judgment. Fisher, the father of the frequentist approach, wanted to ban any subjectivity from statistics, advocating instead (I’m paraphrasing here) that one should “Estimate everything!”. By contrast, Bayesians will tell you that you should estimate when you can, but supplement it with whatever else you like. To illustrate how historical estimates are not only misleading (for example when the underlying process is non-ergodic) but also unnecessary in Bayesian statistics, Chris Rogers has the mantra “Estimate nothing!”. But nowhere does it say “Predict nothing!”. On the contrary, once again, the nexus “make predictions based on probabilities--compare with reality--change the probabilities” is what the approach is all about. So on the topic of advice for t-shirt making, Pilkington should wear one that says “I ought to read the paragraphs I quote” on the front, followed by “and try to avoid self-contradictions” on the back.

Moving on, as evidence for the second sin, Pilkington quotes another long comment of mine with the “clearest explanation” (his words, but I agree!) of what I’m doing, namely:

I’m not comparing models, I’m comparing systems within the same model. Say System 1 has only one locally stable equilibrium, whereas System 2 has two (a good one and a bad one). Which one has more systemic risk? There’s your first measure. Now say for System 2 you have two sets of initial conditions: one well inside the basin of attraction for the good equilibrium (say low debt) and another very close to the boundary of the basin of attraction (say with high debt). Which set of initial conditions poses higher systemic risk? There’s your second measure. Finally, you are monitoring a parameter that is known to be associated with a bifurcation, say the size of the government response when employment is low, and the government needs to decide between two stimulus packages, one above and one below the bifurcation threshold. Which policy leads to higher systemic risk? There’s your third measure.
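For readers who want to see the basin-of-attraction point in the simplest possible setting, here is a toy one-dimensional system (not one of my actual models, which are low-dimensional but richer than this): it has two stable equilibria separated by an unstable one, so two nearby initial conditions on opposite sides of the basin boundary end up in completely different places.

```python
# Toy system dx/dt = -x(x - 0.5)(x - 1): stable equilibria at x = 0 (the
# "good" one, say low debt) and x = 1 (the "bad" one), with the basin
# boundary at the unstable equilibrium x = 0.5. Invented for illustration.
def f(x):
    return -x * (x - 0.5) * (x - 1.0)

def long_run(x0, dt=0.01, steps=5000):
    """Integrate forward in time and return the long-run state."""
    x = x0
    for _ in range(steps):
        x += dt * f(x)  # forward Euler step
    return x

print(long_run(0.49))  # starts inside the good basin -> ends near 0
print(long_run(0.51))  # starts just across the boundary -> ends near 1
```

The two initial conditions differ by only 0.02 and yet converge to different equilibria, which is why proximity to the basin boundary is a sensible measure of systemic risk.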

He then goes on to paraphrase it in a much more clumsy way (question: if someone already gave you the clearest explanation about something, why should you explain again? Just to make it worse?):

What Grasselli is doing here is creating a model in which he can simulate various scenarios to see which one produces high-risk and which will produce low-risk environments within said model. But is this really “measuring systemic risk”? I don’t think that it is. To say that it is a means to measure systemic risk would be like me saying that I found a way to measure the size of God and then when incredulous people came around to my house to see my technique they would find a computer simulation I had created of what I think to be God in which I could measure Him/Her/It.

Now I don’t know what the God example is all about, but Pilkington seems to think that the only way to measure something is to go out with an instrument (a ruler, for example) and take a measurement. The problem is that risk, almost by definition, is a property of future events, and you cannot take a measurement in the future. ALL you can do is create a model of the future and then “measure” the risk of something within the model. As Lady Gaga would say, “oh there ain’t no other way”. For example, when you drive along the Pacific Coast Highway and read a sign on the side of the road that says “the risk of forest fire today is high”, all it means is that someone has a model (based on previous data, the theory of fire propagation, simulations, and judgment) that takes as inputs measurements of observed quantities (temperature, humidity, etc) and calculates the probabilities of scenarios in which a forest fire arises. As time goes by and the future turns into the present, you observe the actual occurrence of forest fires and see how well the model performs according to the accuracy of its predictions, at which point you update the model (or a combination of models) based on, you guessed it, Bayes’s theorem.

So that’s it for the accusation of mistaking models for reality. But still on the second fundamental sin according to Pilkington, recall that its second part consists of making “grandiose claims about what they have achieved or will potentially achieve that ring hollow when scrutinized”, against which his advice is to “tone down the claims they are making lest they embarrass the Post-Keynesian community at large”. This is all fine, but it sounds a bit rich coming from someone who published a piece titled Teleology and Market Equilibrium: Manifesto for a General Theory of Prices, which upon scrutiny contains some common platitudes about neoclassical denials of empirical evidence, followed by this claim:

My goal is to lay out a general theory of prices in the same way Keynes laid out a general theory of employment and output. This will provide a framework in which the neoclassical case of downward-sloping demand curves and upward-sloping supply curves is a highly unlikely special case. With such a framework we can then approach particular cases as they arise in a properly empirical manner. In doing this I hope to be able to introduce the Keynesian theory to pricing; and with that, I think, the neoclassical doctrines will be utterly destroyed and a full, coherent alternative will be available. Fingers crossed!

So instead of an actual general theory of prices, Pilkington’s “manifesto” states his goal to be as great as Keynes and utterly destroy the neoclassical doctrines. Hear the low tone!

Which brings me to Keynes’s advice on the use of mathematics that Pilkington also quotes in his rant. Again, if you read the quote carefully you see that Keynes warns against “symbolic pseudo-mathematical methods” and complains that “Too large a proportion of recent ‘mathematical’ economics are mere concoctions”. Notice the prefix pseudo and the inverted commas around the word mathematics in the original quote, which suggest that Keynes’s peeve was not with mathematics itself, but with the “imprecise…initial assumptions they rest on”. In particular, Keynes singles out methods that “expressly assume strict independence between the factors involved”, which is admittedly a very stupid assumption, but in no way necessary for the application of (true, i.e. not pseudo) mathematical methods. For example, NONE of the models I work with assume independence between factors; on the contrary, they highlight the complex and surprisingly rich interdependencies, as well as their consequences.

I conclude in meta fashion with a Bayesian framing of this discussion itself. Pilkington has a very strong prior that my mathematical methods are useless and he’s honest enough to say so: “I heard about the work Grasselli and others were doing at the Fields Institute some time ago and I was instantly skeptical”. He bases this prior on a well-documented post-Keynesian intellectual tradition, as well as a slightly misguided notion of what constitutes “giant formal models” (my models are actually pretty simple low-dimensional dynamical systems, but if you are Bart Simpson then I guess anything looks like a giant formal model). By contrast, I have a very strong prior that my models are useful, based on a similarly well-documented intellectual tradition of applications of mathematics in other areas of study. Pilkington concedes that he might be wrong and promises unreserved praise if that turns out to be true. Likewise, I might be wrong, in which case I’ll abandon the models and do something else. In either case, we’ll both be traveling the same Road to Wisdom, as in the exceptional poem by Piet Hein:

Well, it's plain
and simple to express.
Err and err and err again,
but less and less and less.

Nate Silver uses this poem as inspiration for the title of the chapter in his book explaining the Bayesian approach, which he ends with a meta statement of his own: “Bayes’s theorem predicts that Bayesians will win”.

Estimate nothing!

Wednesday, April 17, 2013

My take on Reinhart-Rogoff

I have not blogged for a while, but this is too important to ignore, mostly because for the past two years or so I have been supervising different groups of undergraduate students on a data-driven project based on reproducing and extending the results in the Reinhart-Rogoff book.

So let me start with a few observations about the book:

- the dataset used in the book is not readily available, despite what the authors say. What one can download from their website is secondary data prepared by the authors, for example the full set of crisis dates for each country. On the other hand, the book does provide a more or less complete list of sources, from which we were able to download something like 95% of the primary data used by the authors (e.g. from places like the IMF, the World Bank, the Maddison project, etc).

- the book is full of small errors, like figures that do not quite match their captions, or numerical results that turned out to be slightly wrong when we tried to reproduce them from the primary data, but overall we didn't find any major errors and agreed with almost all of their conclusions. Moreover, we were able to successfully implement the signals approach described towards the end of the book for currency, banking, and stock market crises (the last of which is not mentioned in the book!).

- in other words, the book is both solid and a useful launchpad for further research, if only a little sloppy.

Now, the story is very different (pun intended!) regarding their 2010 paper. Right from the beginning I thought that the possibility of reverse causation alone was enough reason not to take the results too seriously, so I didn't even bother to try to reproduce them.

But alas, Herndon, Ash, and Pollin (HAP) have done the work and found that even the numerical results presented in the paper were significantly wrong, not to mention the conclusions. And the response from Carmen Reinhart is even more appalling than the article itself. Basically she claims that HAP also find a negative correlation between debt and growth, so what's the big deal?

Of course the big deal is that their original paper implied the existence of a hard threshold at 90% debt-to-GDP beyond which the slowdown in growth was very rapid, indicating some kind of nonlinear amplifying effects that would likely plunge the country into a state of crisis. But as it turns out, there's nothing special about the 90% mark, with the relationship being approximately linear all the way through, and therefore very manageable.

In any case, I wish they had never written the 2010 paper (and perhaps by now they wish the same), if only because it's probably going to drag their good and useful book (along with their reputation in general) through the well-deserved mud where they find themselves at the moment.

Monday, October 15, 2012

And now for something completely different: Ngo Bao Chau at the Fields Institute

I'm taking a break from all economics/finance/accounting-related activities (blogging included) tonight to attend the opening ceremony for the inaugural Fields Medal Symposium, which celebrates the work of a recent Fields medalist each year, right here at the Fields Institute.

This year's symposium is dedicated to Ngo Bao Chau and you can watch his public lecture from 7:00pm onwards by following the link provided in the website above (his lecture will actually start a bit later, after the ceremonial speeches by a string of dignitaries, but if you tune in at 7:00 you can catch a glimpse of yours truly, sitting beside Ngo on the second row of the theatre and looking totally starstruck).

Saturday, October 13, 2012

Of course it's a model, duh! A final post on income, expenditure, and endogenous money

Many comments on the different threads related to Ramanan's critique of my paper with Steve Keen amount to saying that if we were trying to write down some kind of model for a perceived phenomenon (in this case the role of private debt in macroeconomics), then it would be ok, but because we violated an accounting identity (or more) in the process, oh boy, we have been very very naughty indeed.

The thing is, we never claimed to be doing any accounting, let alone violating it. Accounting is about recording stuff during a given period (a year, a month, a day, but NOT an instant, since you need to wait for stuff to happen to record it) and in the only part in the paper where we mention any recording (Appendix, page 24, last paragraph of the paper) we say that "recorded expenditure and income over a finite period (t2 − t1 ), such as those found in NIPA tables, necessarily agree".

So I'll say this again on a separate line and in capitals for emphasis (with some superlatives in brackets, as commenters like):

RECORDED EXPENDITURE AND RECORDED INCOME OVER A FINITE PERIOD (any finite period, however short or long!) NECESSARILY (that is, always!) AGREE.

Now suppose you read income statements for an economy month after month, year after year, and wonder why recorded spending (= recorded income !!) for the different periods happens to be different. You might think it has something to do with the Mayan calendar, or with the incidence of flu during that period, or maybe that it's completely random. If you are an economist you might want to explain it with a DSGE model that ignores private debt. Heck, you might even write down a regression model that includes the change in private debt in one period as an explanatory variable for the spending (= income !!) to be recorded over the next period, as one commenter suggests. Or if you are Steve Keen you write down a model using differential equations, because they happen to be tractable and cool and predict many properties that sort of look like what goes on in real life. But none of that is accounting - all of it is modelling.

Everything else we wrote in the paper was with the view of explaining why the heck recorded spending (= recorded income !!) changes from year to year. If along the way we wrote stuff down that looked like a violation of an accounting identity, then I profusely apologize for it (in fact I already bought a whip to punish myself) and pinky-promise never to do it again. So will the accounting police chill out and move on? Unless you actually care about the model, in which case please read on.

As far as the model goes, what we are trying to capture is Minsky's assertion that "for real aggregate demand to be increasing, . . . it is necessary that current spending plans, summed over all sectors, be greater than current received income and that some market technique exist by which aggregate spending in excess of aggregate anticipated income can be financed."

So our Y_E represents "current spending plans" (per unit of time) and our Y_I represents "current received income" (per unit of time). Equation 1.5 in the paper is the key behavioural assumption linking investment to the change in debt, and is a schematic representation of the mechanism that both Steve and I have in the back of our minds, what I call the "Keen model" described in this paper, where investment (the rate of change of capital) is a function of current net profits, but can exceed profits in times of boom and therefore be financed by debt.
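The mechanism can be sketched schematically in a few lines. To be clear, the functional forms and parameter values below are invented for illustration and are NOT the structural equations of the paper:

```python
# Schematic sketch of the debt-financing mechanism: investment is a
# function of current net profits but exceeds them in a boom, with the
# excess financed by new debt. All parameters are made up.
def simulate(T=10.0, dt=0.01):
    Y, D = 100.0, 30.0        # output and private debt
    r, nu = 0.03, 3.0         # interest rate, capital-to-output ratio
    steps = int(T / dt)
    for _ in range(steps):
        profits = 0.3 * Y - r * D         # profits net of interest payments
        investment = 1.5 * profits        # boom: invest more than profits
        D += dt * (investment - profits)  # the excess shows up as new debt
        Y += dt * investment / nu         # new capital raises output
    return Y, D

Y, D = simulate()
print(Y, D)  # output and debt grow together: debt finances the gap
```

Even in this crude sketch, aggregate investment persistently exceeds profits and the gap is exactly matched by the growth in debt, which is the Minskyan point of the quote above.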

All of this is pure modelling: in reality nobody looks at a differential equation before spending. The true test of the model is to see if it predicts the right behaviour for the key variables (employment rate, wage share, output, level of private debt, etc) over time, once the parameters of the several structural equations are calibrated using historical data (which includes income and flow of funds statements over many periods).

As a final word, notice that nowhere in the paper is either Y_E or Y_I meant to represent recorded expenditure or income over a period (which, again, are necessarily equal !!). Both are modelling abstractions of what goes on in the economy and could include stuff like the Mayan calendar and the incidence of flu, but happen to depend on the level of private debt.