A Pandemonium of Confusions: Kay and Marsh on Tiebout

In a recent issue of this journal Adrian Kay and Alex Marsh consider the literature on Charles Tiebout’s model as an example of the ‘public choice research programme’. They argue that the evidence suggests Tiebout has been falsified in favour of other models of residential mobility. They suggest that the fact there is so much literature on Tiebout shows that formal methodology does not allow models to be falsified but protects them, and imply that therefore the Tiebout model has no lessons for local public goods provision. Their entire article is suffused with confusions about almost every aspect of the topic that they discuss, and these confusions interweave and overlap in self-supporting pandemonium. In this short comment I try to bring some clarity and sense to the issue of what the Tiebout model predicts and how well those predictions are supported.

the tactics Peter John, Thanos Mergoupis and I adopted in examining the Tiebout model of efficient service provision.
Popper uses the term 'theory' somewhat casually, or at least in different senses, but his official view seems to be that a theory is any set of logically consistent universal statements that (together with singular statements which apply to a specific event, the 'initial conditions') purport to explain something, and an empirical theory is one from which new empirical hypotheses (or 'basic statements') can be derived. 4 The theories that are pertinent here are those purporting to explain aspects of the physical world. The empirical content of a theory is defined in terms of what it excludes. 5 The hypothesis that 'people voluntarily move house for all sorts of reasons' is a 'theory' about geographical mobility, but it does not exclude as much as the hypothesis that 'people move house only for fiscal reasons'. It does not exclude as much since the latter can be falsified by finding that there is at least one person or household who has voluntarily moved for reasons that are not fiscal. The first hypothesis might be almost unfalsifiable and in that sense not scientific. The latter theory is more falsifiable because there is more evidence that could refute it, and so it has more empirical content. It has more empirical content even though both theories purport to explain precisely the same physical reality: the geographical mobility of households.
If the latter theory passes its tests each time (when we examine household mobility we do indeed find each time that when people move it is always for fiscal reasons), then the theory is corroborated each time. Popper is adamant that this does not increase the probability that the theory is true. That is because he believes that induction cannot be justified, and each discovery that a household moves for fiscal reasons cannot logically (deductively) demonstrate that all households do (or, for example, that all households in the future will always move for fiscal reasons). 6 Nevertheless, each test corroborates the theory. However, corroboration is not the same as verification. What is important here for Popper is that any finding that is logically derivable from an hypothesis confirms it. Thus discovering that cars do not geographically move for fiscal reasons confirms that households do, since it follows from all households moving for fiscal reasons that anything not moving for fiscal reasons is not a household. 7 However, examining why cars move will never falsify a theory about fiscal geographical mobility. Hence such findings cannot corroborate the theory either. In that sense, falsifiability and corroboration are converse concepts. In another sense they are not. Each test which a theory passes corroborates it, only one need falsify it, but the degree of corroboration is not straightforwardly increased each time a theory passes a falsifiability test. Some theories have more empirical content because they are more falsifiable even though they purport to explain precisely the same physical events, but also because some purport to explain a greater number of physical events. More general theories have greater empirical content than less general ones.
Theories purporting to explain all human behaviour are more general than ones purporting to explain only geographical household mobility, although any theories about the latter might be derived from the former, more general theory. 8 However, Popper does say that passing tests corroborates theories, and increasing corroboration does increase their verisimilitude or 'truthlikeness'. He says we can calculate this degree of corroboration or verisimilitude using the probability calculus, but the degree is a 'probability but not in the sense of the probability calculus'. 9 Looking at his formula for degree of corroboration across two theories, it seems that the relative degree of corroboration is a relationship between how much a theory can be falsified and the degree to which it has passed those tests. 10 In relative terms corroboration is the odds a theory is correct, relative to how much it could be falsified. However, he is adamant that this is not the (relative) probability that a theory is true. Indeed he repeatedly claims that the more falsifiable a theory, the more likely it is to be actually false, and as a consequence, as our scientific knowledge grows, the more probable it is that it is false. 11 (If we do not think we know much we are less likely, overall, to be wrong.) The greater the corroboration of a theory, the greater its verisimilitude: we get closer to the truth the more our (non-falsified) theories explain. 12
Kay and Marsh are sceptical about economic theory and use the Tiebout model as an example of how economics refuses to accept that theories are falsified by the evidence. They utilise some arguments from Imre Lakatos to suggest that theories can always be saved by ad hoc assumptions, or what Lakatos calls 'auxiliary hypotheses'.
Lakatos's favourite examples are drawn from astronomy: adding extra circles can always save the Ptolemaic planetary system, and assuming that there must be another planet saved its Newtonian replacement when careful astronomical measurements of the known planets did not fit predictions drawn from Newton's theory.
We might point out that the two cases are different in important ways. Adding extra circles saves Ptolemy, but only metaphysically. It does not provide a means which is open to empirical refutation if we allow circles to continually be added to the model, since the circles themselves are not empirically observable. However, in the case of Newton the latter provides a falsifiable prediction, one which could be, and was, corroborated. We might always find that Newton does not quite get the planetary orbits correct, but it remains a scientific theory while we can look for other planets or material orbiting the sun. We should stick with it, according to Popper, until a better theory comes along, and one did, of course. 13 We might say of economic theory that, to the extent new assumptions do not produce new predictions, then economics resembles the Ptolemaic ad hocery and, to the extent it adds new predictions, it provides scientific explanation. I think we can probably choose examples from economics that fit one or the other. 14 In other words, Popper's demarcation criterion cuts through different economic models. What we learn from the Newtonian account is that we do not discard a theory simply because some predictions derived from it have been falsified. Importantly, however, it means that we modify theories (or what I would prefer to call models) even as the evidence comes in. We might also add that we should not discard economics despite some suspicious ad hocery in some examples, at least not until we have a better method, that is, one that has greater corroboration. I do not see that method yet.
It also follows from these reflections that we can modify a model, either by some internal change in its formal framework or through some new empirical predictions, thus saving the original formal framework intact. 15 Lakatos and others might see something fishy about such an enterprise. I do not. I believe that is the way science progresses and how it should progress. We modify models as we test them and we need to examine each modification on its own merits and in relation to the merits of rival models. Core elements that are 'saved' according to Lakatos might just be those elements that are most supported, or support more, that is, have the greatest empirical content. Now remember that models that are contained within broader theories will have lower empirical content. Thus a model from which we draw the hypothesis H1: 'households move in response to fiscal conditions' might be part of a broader theory from which we can draw the broader hypothesis H0: 'consumers respond to price signals'. If it were shown that consumers do not respond to price signals, then even without any direct evidence about fiscal mobility we are less likely to believe the narrower model, but the fact that H1 is falsified by some direct test does not necessarily falsify H0. In that sense H0 is closer to the 'core'. Households might not move in response to fiscal conditions because they turn out not to be price signals, for example. Why it is claimed they are not price signals might affect our attitudes to the theory. If we simply redefine 'price signal' to exclude local tax-service packages, for example, then a dose of Lakatosian scepticism about the scientific veracity of the claim might be in order. If they turn out not to be price signals because fiscal conditions are simply too noisy (that is, households are shown not to have information about fiscal conditions so cannot respond to them), then we can react to Lakatos in good teen fashion: 'so what's your problem?'. 16
Moreover, in that latter response we generate some new predictions, namely, that consumers react to price signals in direct relationship to how noisy they are. In order to test these new predictions we need to measure 'noise' in a different way from how we measure 'consumer reaction', and that might not be easy, but I believe that the problems we face in devising empirical tests constitute the major part of the fun and beauty of social scientific research.
A further reason we need to be careful when claiming that a model is falsified occurs as a result of the nature of economic models. Equilibrium analysis, for example, predicts a specific outcome. Generally in the social sciences we use such equilibrium analyses to find the forces that operate in the direction we predict, though we do not expect the equilibrium ever to be reached because other forces also act. This is a problem for social scientific analysis. Unlike the natural sciences, there are few point predictions in the social sciences. 17 It is because we have few point predictions in the social sciences that we only ape the natural sciences and remain the poor partner. However, it behoves those who criticise the social sciences in this manner to lay out the new paradigm for how we are to achieve point predictions. Until we are in a position to start doing that, we have to make do with the models and statistical analyses we have at our disposal. To that extent, finding forces that are predicted in a given equilibrium analysis is enough for us to say that the model is corroborated. Where Kay and Marsh are correct is that such tests do not allow us to choose between models which produce the same predictions or hypotheses about those forces. However, I am not sure what this good point is supposed to tell us about the veracity of the 'Tiebout model', since I am not sure what other models Kay and Marsh have in mind that our evidence cannot help to distinguish from Tiebout. Too many comments that Kay and Marsh make show they do not understand Popper's falsifiability account. For example, there is nothing wrong with suggesting that conclusions are tentative and disputable and yet the array of evidence suggests the Tiebout model contains important truths for urban politics. 18 One reason is that the degree of explicit corroboration given by testing a particular model might be quite low, but is still higher than any other single rival. 
There might not be much evidence for a model, but what there is tends to corroborate it, and we have to remember that sometimes a test might seem to falsify a model, but it might be that the test is flawed: perhaps the statistical model is weak, or confounding factors have not been taken into account, or there is some query about the quality of the data used, and so on. Indeed, a great deal of dispute in the empirical social sciences is about the quality of the data and the methods used to analyse the data. Popper suggests we should back theories that explain most, even when some of their predictions have apparently been falsified. We remain sceptical about them, but we back them, unless there is a rival and better theory, better in the sense of being better corroborated (it explains more and has failed fewer tests). In relation to the specific quotation from Dowding et al., however, what we meant is that for each study purporting to test Tiebout the conclusions must remain tentative and disputable (because of methodological problems and the fact that often studies find only consistent, not truly corroborative, evidence) but that overall the 'Tiebout family of models' provides an array of evidence that urban scholars should note. I stand by that statement. 19 Again, contrary to Kay and Marsh's claim, Popper has never argued that we eliminate theories 'that are demonstrably false and choose between the remaining, unfalsified theories'. 20 He claims, time and again, that we remain sceptical about all theories but we back those that seem to explain most even if some hypotheses drawn from them have been falsified. 21 (Their verisimilitude is higher, even if we know, overall, that they are false.) A more general reason for suspecting Kay and Marsh do not fully grasp Popper's falsifiability criterion is their discussion of the evidence for the Tiebout model, to which we now turn.
Indeed we might think from their evidence that the Tiebout model has been falsified and that better corroborated models exist, which are preferable to it. However, as we shall see, that is not the case.

Tiebout: the evidence
Rather than leaving Popper behind, I will begin this section with one final comment about Popper's account of scientific explanation: he argues that scientific theories are developed in order to provide solutions to problems. So, when examining a scientific theory one must keep in mind the problem the theory was designed to solve. 22 The Tiebout model was designed to solve the problem of efficient service delivery. Paul A. Samuelson suggests that there is market failure for collective goods and so government needs to intervene. 23 However, how could the government decide what was an efficient level of service? Voting is a crude measure since we vote for parties or candidates over a whole range of issues; surveys can (and do) help, but individuals do not always have a good judgement when simply asked questions (they might suffer from 'fiscal illusion') and they might have reasons for giving misleading answers. Tiebout suggests a solution for local collective goods. If in metropolitan areas there are competing local governments, they could provide different packages of taxes and services and households could locate in different jurisdictions, thus providing a signal over the packages they desired. Thus 'voting with the feet' would provide market signals to local governments.
Thus when considering whether the Tiebout model has been falsified, we must remember of what it is a model. 24 Kay and Marsh suggest that since the residential mobility literature from Peter H. Rossi's work onwards does not find fiscal mobility an important factor in geographical mobility, we should drop the Tiebout model in favour of Rossi's or other residential mobility models. 25 They say 'formal models should not enjoy a privileged status simply because they are part of a research programme . . . and considered as one explanation of residential mobility to be compared with other alternatives in the literature', 26 but Tiebout and Rossi are not alternatives! The Tiebout model is not a model of residential mobility! It is a model of efficient service provision. No-one using Tiebout thinks it explains why people move. Rather, they argue that if fiscal factors impinge on moving decisions, those movements can provide signals about services to local authorities.
There are three factors to consider when assessing the Tiebout model in relation to efficient local services. First, does fiscal mobility occur? Second, does it occur in a manner that can provide signals to local government? Third, could it occur in a manner that can provide signals to local government? With colleagues I have argued that the first exists. I believe it is possible that the third could exist, but am sceptical about the second in the major conurbations of the UK. Kay and Marsh's inability to recognise this simple argument leads them to misunderstand and thus misrepresent the work they discuss. Let us first consider the first part.
In our 1994 survey article, we argue that there is evidence that is consistent with fiscal mobility. 27 However, at least some of that evidence is consistent with rival explanations of the revealed preference evidence. We then conducted surveys to see if people explicitly take into account fiscal factors in their moving decisions. 28 We were sceptical that they would take into account fiscal factors and so had a research design under the best conditions for people to take into account such factors. We reasoned that if fiscal factors do not enter into mobility decisions in our four London boroughs with varying reputations for good services and large differences in the high-profile poll tax, then people would not take them into account anywhere else. In other words, if there was a lot of information about fiscal factors, but people did not take those factors into account when moving, then Tiebout was falsified, and not simply because the information was not available.
However, we found substantial evidence that people claimed to take into account fiscal factors in their moving decisions. Twenty per cent of people claimed tax was very important in their locational decisions, with a further 20 per cent claiming it was fairly important. Furthermore, these movers were Tiebout-rational; that is, they were moving to jurisdictions with better tax-service reputations. Over 70 per cent of people who had moved from high poll tax areas to low poll tax areas, or between low poll tax boroughs, claimed tax was important in their locational decisions. Similar figures were found between boroughs with a good reputation for local services. We suggested that these numbers were high enough to send the relevant signals to local authorities. There are no models, as far as I can establish, as to what proportion of consumers needs to be informed in order to generate market efficiency. 29 I imagine that the figure would depend heavily on the conditions themselves. A relatively small number might generate an efficient market if there was herd behaviour and the herd followed those who have and process the information. A larger number would be required to send signals if most people chose products randomly, and how efficient the market would be would depend upon the degree of competition between producers (how many there are and the chances for takeovers). However, with a large number of people behaving randomly the market might only work for the informed consumers, with a poor market for the rest. Again, however, one might expect that if the informed consumers did better, then that information would seep through the market and so the others would follow. Another condition would be time: how long it would take for the non-informed consumers to follow the informed ones.
Note that the discussion over how many fiscal movers are 'enough' is not concerned with how many fiscal movers are required in order to demonstrate that there are indeed fiscal movers. If you find X per cent of people take into account fiscal factors in their moving decisions then you have found X per cent, and that is that. If you have a regression and find Y per cent of the variance in the dependent variable ('where move to') is explained by fiscal factors, then you have found Y per cent of the variance is explained by fiscal factors. It is not a question of 'falsifying' Tiebout and 'corroborating' life-cycle accounts if you find the other factors in Rossi's regression explain more of the variance in the dependent variable. That is simply not the point of such regressions. Fiscal mobility occurs, and the reason why Rossi and much of the subsequent residential mobility literature do not find it is that they do not ask questions about fiscal factors. You will find that if you do not include a variable on the right-hand side of an equation, you will not find that variable explaining variation on the left-hand side! 30
The third element is the interesting aspect for the Tiebout model properly understood. How many fiscal movers would you get if the conditions were as ideal as practically possible to fit with his conditions? Would that be enough to generate a market? Finally, would that market provide enough efficiency to prefer that way of organising local government to more centralised systems? I do not believe there is an easy answer to those questions and thus they are worth continuing to debate, but I side with larger jurisdictions for reasons given below.
Now I want to include some self-criticism here. At the time we ran our original surveys, I believed that it was necessary to demonstrate conscious fiscal moving in order to underpin aggregate-data evidence of fiscal moving. 31 I no longer think so.
If we find aggregate-data evidence of fiscal moving, then there is fiscal migration even if people are unaware that they are taking into account fiscal factors. In later work with Thanos Mergoupis, we were able to tie objective factors (such as acreage of parks) to subjective evaluation of satisfaction, even though these objective factors were not mentioned by survey respondents (though they might mention items such as 'nice neighbourhood'). 32 We believe we can predict satisfaction level from objective factors better than the respondents could themselves, since we have a better judgement of the basis of their preference. I now believe stated-preference evidence is best used to interpret rather than underpin revealed-preference evidence.
Because of their confusion over what the Tiebout model actually is, and therefore our arguments about it, Kay and Marsh are confused over some of the conclusions reached by Dowding and Mergoupis. Using both revealed- and stated-preference evidence we conclude that there is some fiscal mobility in the UK, but that same evidence leads us to suggest that people would derive greater satisfaction if the average size of jurisdiction was larger. That latter claim does not in any sense affect the evidence that fiscal mobility occurs in the UK (though its effects are minor compared to other factors affecting geographical mobility); nor the claim that under other environmental conditions fiscal mobility might occur much more. It does suggest that those other environmental conditions might make people worse off because there are other benefits to larger jurisdictions that far outweigh any benefits from noisy price signals that fiscal mobility currently engenders. However, this does not 'reverse' the 'standard Tiebout logic', as Kay and Marsh suggest, 33 any more than sitting on a log and paddling it upstream 'reverses' the logic that a log will float downstream. It merely shows that other factors might overwhelm the effects hypothesised in Tiebout. 34

Should we Tieboutise the local state?
Under Tiebout's pure assumptions, all geographical mobility would signal satisfaction with services. As the assumptions are relaxed, any such signal will become noisy. That is all we need to get from the model. The normative constitutional question is whether we could ever design our local state so that the signal would be clean enough for local authorities to respond to. There is not an obvious answer to this question. Our evidence from four London boroughs during the poll tax era suggests that we might. However, that on its own is hardly enough to recommend reorganising local jurisdictions and tax-service provision! There are other issues that might lead us not to want to design a local state along Tiebout lines. It is for that reason that Tiebout studies will continue.
Suggesting that consumers respond to market signals is one thing. To suggest, as the first theorem of welfare economics proclaims, that this is efficient is something else, and to proclaim that we should celebrate such efficiency (as a good thing) is a third claim. Joseph Stiglitz has long argued that small imperfections in information transmission can have large effects on whether a market is Pareto efficient. 35 We cannot be assured that any market is efficient. We tend to assume that those markets where information is most transferable (for simple private goods) are most efficient and those where information is most problematic (complex private and collective goods) are least. Even the simplest private goods markets attract government regulation, as even here information acquisition can be problematic if only because of the human (indeed primate) propensity to hide and provide false information for individual advantage, and information can be easily misunderstood. British parents pore over the scores that schools receive in the league tables, blissfully unaware that apart from in, maybe, the top 10 and perhaps bottom 15 per cent relative to the other 75 per cent, all the differences noted in those scores have no statistical significance whatsoever.
Should efficiency always be applauded? There is a large body of ethical literature suggesting otherwise. When the rich and poor exchange together in a perfect market they might both gain, but the rich will remain rich and the poor will remain (relatively) poor. If that were not so, the market would not be efficient. Egalitarians will argue that this shows efficiency is not to be simply applauded. 36 Thus, even if Tiebout were corroborated to the satisfaction of Kay and Marsh, it would still need to be argued that we should design our institutions in order to take account of those signals. (I have maintained that the attempt to do so in Britain in the 1980s failed miserably. 37) Tieboutising the state would be likely to lead to a situation where the rich congregate in some jurisdictions because they have less need of services and so avoid local tax to pay for them. The poor congregate where services are provided but have to pay higher taxes. 38 In other words, in a Tiebout world local taxation is likely to be highly regressive. As an egalitarian I do not like that. 39

Should we forget all about Tiebout?
The motivation for Kay and Marsh writing about Tiebout is an attack upon rational choice theory. They see the Tiebout model as part of the 'protective belt' of the theory. I have criticised their understanding of Popper and, as someone who finds Lakatos's work woefully under-contended, I find it odd I should criticise their understanding there too. However, the Tiebout model could not possibly be part of the protective belt of the rational choice research programme. In Lakatos's work (as far as I can establish, though his writings are less than perfectly clear), the core is made up of theoretical and conceptual apparatus. 40 It is the formal framework that produces empirical hypotheses. The protective belt consists of 'auxiliary hypotheses' that are brought in to save the hard core when the hypotheses drawn from it are falsified. Thus measurement of the known planets falsified the predictions from the Newtonian hard core, but Newtonian theory was 'saved' by the auxiliary hypothesis that there is another planet. How does this work with economics? How is the Tiebout model an auxiliary hypothesis brought in to save the core elements of rational choice theory? What are these core elements exactly, the rationality assumptions? Utility maximisation might be there together with (a weak) interpretation of a self-interest assumption, so perhaps would the hypothesis that consumers respond to price signals. How does the Tiebout model protect these?
A closer analogy to Lakatos would be that a formal version of Tiebout would be part of the hard core (as part of the broad theory of economics, perhaps using the rationality assumption and utility maximisation) and the auxiliary hypothesis saving it might be that people do not react to fiscal conditions because of lack of information, or that local governments do not efficiently react to household movement because those price signals are noisy. These sorts of move make the Tiebout framework look rockier. As I argued above, the Tiebout model itself could be dropped as a solution to efficient service provision if the problem was that the price signals are too noisy to be interpreted, without threatening the rational choice edifice at all. 41 Falsifying Tiebout simply is not a problem for rational choice theory. Indeed, many of those who fight over what we can conclude from Tiebout are academics who work well within the economic methodological paradigm. The reason they continue to study Tiebout is that demonstrating that such signals could never be clear enough is really difficult: falsifying the hypothesis that 'fiscal mobility can never provide signals to local governments about their tax-service policies' is hard, and how that interacts with the supply side even harder. That is why Tiebout will continue to be written about, not because dropping it threatens the rational choice edifice.
Kay and Marsh think too much is written about Tiebout, but, to paraphrase Deirdre McCloskey, how much is too much? Despite their pretence at a Popperian approach, Kay and Marsh do not compare writing on Tiebout with writing on other models in urban politics. I have done a casual search on Google Scholar on a few urban theories I could easily name by a simple title. I inputted them into Google Scholar by the phrase on the left-hand side of Table 1, and we see the number of hits on the right-hand side. Now, of course, 'hits' does not equate to articles about the subject, simply to references. 42 Of course, in some cases, such as 'World Cities', we cannot be sure that we are getting references to the general theory that there is something special to be explained in global terms about world cities. However, for other entries, such as the first five listed in Table 1, the number of hits should be reasonably comparable. The first thing that strikes me about the order is that, with the exception of 'post-Fordism' (a theory that arose and fell in a short time, around a decade), the more references there are, the less empirically examinable are the ideas. Items 2 and 3 are formally specifiable and there is a host of quantitative as well as qualitative discussion. Advocacy coalitions are less formalised, but there are numerous quantitative analyses based on the Sabatier models. Urban regimes have not been formalised and quantitative evidence is lacking. 43 There are formal network theories, but the vast majority of the work is based on casual theorising and qualitative evidence. Similarly, there are formal articles on power, and these might mention community power, but the vast majority here is non-formal theory and qualitative research.
I do not want to comment on whether there is too much written on any of these subjects. I have contributed articles and books on all but World Cities and

See The Logic of Scientific Discovery, pp. 59-60, 84-6. He uses hypothesis and theory interchangeably, but also hypothesis and prediction interchangeably (a prediction is the effect, the initial conditions the cause, but initial conditions together with universal statements ('theories') give us hypotheses). He also says predictions are derived from theories (p. 33). A theory does not have to be formalised, and certainly he considers that we have folk theories that still deserve the name 'theory' and that from birth we observe the world theoretically (pp. 39, 423). See also Karl Popper, Objective Knowledge: An Evolutionary Approach (Clarendon Press, 1972), ch. 7. There is no such thing as theory-free observation (see footnote 6 below).
5. He discusses this in many places, but it is most succinctly and explicitly discussed in Conjectures and Refutations, pp. 217-20, 385-7.
6. Popper's 'solution' to the problem of induction is that there is no solution, and hence we do not, as a matter of fact, use induction. Rather, he argues, we always have theories about the way the world is, and they are corroborated or falsified as we observe the world. Thus we have a theory about the sun rising each morning, corroborated as it rises each morning and falsified when it does not (for example in the Arctic Winter). Sophisticated theories will tell us why the sun does not rise each morning in the Arctic and Antarctic Winters, but does everywhere else. Popper sees this as a major reason for preferring his approach; Bayesian inference does not suffer from Hempel's paradox.
8. Does a theory that is more general but less falsifiable have more or less content than a less general one which is more falsifiable? According to Popper, it has less content. So an unfalsifiable theory about all human behaviour is less general than a falsifiable one purporting to explain only geographical mobility (in which case the