Does linguistic explanation presuppose linguistic description?

I argue that the following two assumptions are incorrect: (i) The properties of the innate Universal Grammar can be discovered by comparing language systems, and (ii) functional explanation of language structure presupposes a “correct”, i.e. cognitively realistic, description. Thus, there are two ways in which linguistic explanation does not presuppose linguistic description. The generative program of building cross-linguistic generalizations into the hypothesized Universal Grammar cannot succeed because the actually observed generalizations are typically one-way implications or implicational scales, and because they typically have exceptions. The cross-linguistic generalizations are much more plausibly due to functional factors. I distinguish sharply between “phenomenological description” (which makes no claims about mental reality) and “cognitively realistic description”, and I show that for functional explanation, phenomenological description is sufficient.


Introduction
Although it may seem obvious that linguistic explanation necessarily presupposes linguistic description, I will argue in this paper that there are two important respects in which this is not the case. Of course, some kind of description is an indispensable prerequisite for any kind of explanation, but there are different kinds of description and different kinds of explanation. My point here is that for two pairs of kinds of description and explanation, it is not the case, contrary to widespread assumptions among linguists, that the latter presupposes the former.
Specifically, I will claim that
i. linguistic explanation that appeals to the genetically fixed ("innate") language-specific properties of the human cognitive system (often referred to as "Universal Grammar") does not presuppose any kind of thorough, systematic description of human language; and that
ii. linguistic explanation that appeals to the regularities of language use ("functional explanation") does not presuppose a description that is intended to be cognitively real.
These are two rather different claims which are held together only at a rather abstract level. However, both are perhaps equally surprising for many linguists, so I treat them together here. Before getting to these two claims in §3 and §4, I will discuss what I see as some of the main goals of theoretical linguistic research, comparing them with analogous research goals in biology and chemistry.

Goals of theoretical linguistics
I take "theoretical linguistics" as being opposed to "applied linguistics" (cf. Lyons 1981: 35), so that all kinds of non-applied linguistics fall in its scope, including language-particular description. 1 There are many different goals pursued by theoretical linguists, e.g. understanding the process of language acquisition, or understanding the spread of linguistic innovations through a community. Here I want to focus just on the goals of what is sometimes called "core linguistics". I distinguish four different goals in this area:
i. language-particular phenomenological description, resulting in (fragments of) descriptive grammars;
ii. language-particular cognitively realistic description, resulting in "cognitive grammars" (or "generative grammars");
iii. description of the "cognitive code" for language, i.e. the elements of the human cognitive apparatus that are involved in building up (= acquiring) a cognitive grammar (the cognitive code is also called "Universal Grammar");
iv. explanation of restrictions on attested grammatical systems, i.e. the explanation of grammatical universals.
The difference between the first two goals is that while descriptive grammars claim to present a complete account of the grammatical regularities, only cognitive grammars claim to mirror the mental grammars internalized by speakers. 2 This more ambitious goal of formulating cognitively realistic descriptions is shared both by Chomskyan generative linguists and by linguists of the Cognitive Linguistics school. In practice the main differences between the two kinds of description are (i) that descriptive grammars tend to use widely understood concepts and terms, while cognitive/generative grammars tend to use highly specific terminology and notation, and (ii) that descriptive grammars are often content with formulating rules that speakers must possess, while cognitive/generative grammars often try to go beyond these and formulate more general, more abstract rules that are then attributed to speakers' knowledge of their language. 3
As a simple example of (ii), consider the Present-Tense inflection of three Latin verb classes (only the singular forms are given here):
(1)      a-conjugation   e-conjugation   Ø-conjugation
    1sg  laudō           habeō           agō
    2sg  laudās          habēs           agis
    3sg  laudat          habet           agit
Any complete descriptive grammar must minimally contain these three patterns, because they represent productive patterns in Latin. However, linguists immediately see the similarities between the three inflection classes, and a typical generative or cognitive grammar will try to relate them to each other, e.g. by saying that the abstract stems are laudā-, habē-, and ag-, that the suffixes are -ō, -s and -t, and that morphophonological rules delete ā and shorten ē before ō, shorten both ā and ē before -t, and insert i between g and s/t. It seems to be a widespread assumption that speakers extract as many generalizations from the data as they can detect, and that linguists should follow them in formulating their hypotheses about mental grammars. (This assumption will be questioned below, §4.4.)
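The abstract-stem analysis of the Latin forms described above can be made concrete with a small sketch. It is only an illustration of how such morphophonological rules would derive the surface forms; the function name `inflect` and the way the rules are ordered are assumptions made for this example, not part of any published analysis.

```python
import unicodedata

def nfc(s: str) -> str:
    """Normalize to composed Unicode so that long vowels are single characters."""
    return unicodedata.normalize("NFC", s)

# Long vowels used in the abstract stems and suffixes.
LONG_A, LONG_E, LONG_O = nfc("ā"), nfc("ē"), nfc("ō")

def inflect(stem: str, suffix: str) -> str:
    """Derive a surface form from an abstract stem and a suffix (hypothetical helper)."""
    stem, suffix = nfc(stem), nfc(suffix)
    if suffix == LONG_O:
        if stem.endswith(LONG_A):      # delete long a before long o
            stem = stem[:-1]
        elif stem.endswith(LONG_E):    # shorten long e before long o
            stem = stem[:-1] + "e"
    elif suffix == "t":
        if stem.endswith(LONG_A):      # shorten long a before -t
            stem = stem[:-1] + "a"
        elif stem.endswith(LONG_E):    # shorten long e before -t
            stem = stem[:-1] + "e"
    if stem.endswith("g") and suffix in ("s", "t"):
        stem += "i"                    # insert i between g and s/t
    return stem + suffix

STEMS = [nfc("laudā"), nfc("habē"), "ag"]
SUFFIXES = [LONG_O, "s", "t"]

# One row of singular forms (1sg, 2sg, 3sg) per abstract stem.
forms = [[inflect(stem, sfx) for sfx in SUFFIXES] for stem in STEMS]
for stem, row in zip(STEMS, forms):
    print(stem + "-", row)
```

A purely phenomenological description would simply list the nine forms. The sketch shows what the more abstract analysis adds: three stems, three suffixes and a handful of rules from which the same nine forms follow.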
The third goal, what I call "description of the cognitive code for language", is at first sight the most controversial one among linguists. While this goal (often called "characterization of the nature of Universal Grammar") is seen as the central goal of theoretical linguistics by Chomskyan generative grammarians, many non-Chomskyans deny that there are any grammar-specific components of the human cognitive apparatus (e.g. Tomasello 1995; cf. also Fischer, this special issue). However, it is clear that the nature of human cognition is relevant for our hypotheses about cognitive grammars, so the notion "cognitive code" can be understood more widely as referring to those properties of human cognition that make grammar possible (see Wunderlich, this special issue, §1, for a very similar characterization of Universal Grammar). The traditional Chomskyan view is that the cognitive code in this sense is domain-specific, while non-Chomskyans prefer to see it as domain-general, being responsible also for non-linguistic cognitive capabilities.
The fourth goal has also given rise to major controversies, because very different proposals have been advanced for explaining grammatical universals. Generative linguists have often argued that the explanation for universals can be derived directly from hypotheses about the cognitive code (= Universal Grammar), and that conversely empirical observations about universals can help constrain hypotheses about the cognitive code. By contrast, typologically oriented functionalists have argued that grammatical universals can be explained on the basis of properties of language use. This issue will be the main focus of §3.
In view of these controversies in linguistics, it seems useful to compare the four major goals of theoretical linguistics to analogous goals in other sciences. Some of my arguments below (see especially §3.3) will be based on analogies with other disciplines. I restrict myself to biology and chemistry. Parallels between linguistics and biology have often been drawn at least since August Schleicher (cf. recently Lass 1997; Haspelmath 1999b; Nettle 1999; Croft 2000), and the parallel with chemistry is due to Baker (2001). The parallels are summarized in Table 1. The unit of analysis that is compared to a language (or grammar) is a species in biology and a compound in chemistry. The first column of the table contains an abstract characterization of the goals.
Parallel to descriptive grammars in linguistics, we have zoological and botanical descriptions in biology and phenomenological description of a chemical compound. The latter is not a prestigious part of the theoretical chemist's job (though it is crucial in applied chemistry), but at least in traditional biology the phenotypical description of a newly discovered plant or animal species was considered an important task of the field biologist, making field biology and field linguistics quite parallel (the difference being that linguists cannot easily deposit a specimen in a museum). 4 At a higher level of abstraction, chemists are interested in the molecular structure of a compound that is ultimately responsible for its phenomenological properties, and biologists are interested in the genome of a species that gives rise to the phenotype in a process of ontogenetic development. Similarly, linguists would like to know what the mental reality is behind the grammatical patterns that can be observed in speakers' utterances. Cognitive grammars "underlie" speech in much the same way as the genome "underlies" an organism and the molecule "underlies" a chemical compound.
Next, all three disciplines are interested in the basic building blocks that are used by the underlying system: atoms making up molecules, the genetic code for the genome of a species, and the "cognitive code" for a mental grammar. The basic building blocks put certain restrictions on possible underlying systems: There are only a little over 100 different types of atoms which can combine in limited ways, thus constraining the kinds of possible molecules (and thus compounds); there are only four different "letters" of the genetic "alphabet" (or twenty different amino acids coded by them), thus constraining the kinds of possible genomes (and hence species); and presumably the cognitive code also shows its limitations, thus constraining the possible kinds of mental grammars (and thus languages). Now if we want to explain the properties of the basic building blocks, we have to move to a different scientific discipline: Chemists have to go to nuclear physics to learn about the nature of atoms, biologists have to go to biochemistry to learn about the nature of DNA, and cognitive scientists have to go to biology (neurology, genetics) to learn about the nature of the cognitive apparatus.
However, there is also a different mode of deeper explanation, both in biology and in linguistics. Biologists explain the properties of organisms by an evolutionary process of adaptation to the environment, and similarly linguists can explain many properties of grammars through a diachronic process of functional adaptation (Haspelmath 1999b; Nettle 1999). Biological organisms live in many different kinds of environments, and their diversity is in part explained in this way. Grammatical systems, by contrast, "live" in very similar kinds of environments; human "needs" for grammar are largely invariant across populations, and cultural differences have only a limited impact on grammars (e.g. in the area of polite pronouns). Thus, functional explanations are mostly confined to universal properties of grammars in linguistics, but otherwise the similarities between evolutionary explanation in biology and functional explanation in linguistics are very strong. (I am not aware of an analogy to evolutionary/functional explanation in chemistry.)
We are now ready to discuss the two major controversial claims of this paper.

The search for Universal Grammar does not presuppose linguistic description
For Chomskyan generative linguistics, the characterization of the cognitive code ("Universal Grammar") is the ultimate explanatory goal. The general consensus seems to be that Universal Grammar is explanatory in two different ways. On the one hand, UG explains observed universals of grammatical structure:
The next task is to explain why the facts are the way they are, facts of the sort we have reviewed, for example [e.g. binding phenomena, M. H.]. This task of explanation leads to inquiry into the language faculty. A theory of the language faculty is sometimes called universal grammar… Universal grammar provides a genuine explanation of observed phenomena. From its principles we can deduce that the phenomena must be of a certain character, given the initial data that the language faculty used to achieve its current state. (Chomsky 1988: 61-62)
On the other hand, UG explains the fact that language acquisition is possible, despite the "poverty of the stimulus". The above quote continues as follows:
To the extent that we can construct a theory of universal grammar, we have a solution to Plato's problem [i.e. the question how we can know so much despite the poverty of our evidence, M. H.] in this domain. (Chomsky 1988: 62)
Chomsky's choice of a grand term like "Plato's problem" suggests that he regards this second explanatory role of UG as more important. Hoekstra and Kooij (1988: 45) are quite explicit about this:
[T]he explanation of so-called language universals constitutes only a derivative goal of generative theory. The primary explanandum is the uniformity of acquisition of a rich and structured grammar on the basis of varied, degenerate, random and non-structured experience… This situation contrasts sharply with the one found in [functionalist theories]. The explananda for these theories are the language universals themselves. (1988: 45)
Thus, UG is conceived of as a very important type of explanation in Chomskyan linguistics.
In this section I argue that UG cannot be discovered on the basis of linguistic description (either cross-linguistic or language-particular), and that it cannot serve as an explanans for observed universals of language structure.

From comparative grammar to Universal Grammar?
Now how do we arrive at hypotheses about the nature of UG? Haegeman (1994: 18) summarizes a view that was widespread in the 1980s and 1990s (and is perhaps still widespread): [B]y simply looking at English and only that, the generative linguist cannot hope to achieve his goal [of formulating the principles and parameters of UG]. All he can do is write a grammar of English that is observationally and descriptively adequate but he will not be able to provide a model of the knowledge of the native speaker and how it is attained. The generativist will have to compare English with other languages to discover to what extent the properties he has identified are universal and to what extent they are language-specific choices determined by universal grammar … Work in generative linguistics is therefore by definition comparative.
Generative work in the comparative-grammar tradition arrives at hypotheses about UG by examining a range of phenomena both within and across languages, formulating higher-level language-internal and cross-linguistic generalizations, and then building these generalizations into the model of UG. That is, the nature of UG is claimed to be such that the generalizations fall out automatically from the innate cognitive code. When a situation is encountered where some non-occurring structures could just as easily be described by the current descriptive framework (= the current view of UG) as the occurring structures, this is taken as indication that the descriptive framework is too powerful and needs to be made more restrictive. In this sense, one can say that description and explanation coincide in generative linguistics (whereas they are sharply distinguished in functional linguistics; cf. Dryer 1999, forthcoming). Let us look at a few simple examples.

Syntax: The X-bar schema
In the generative framework of the 1960s, it was theoretically possible not only to have phrase-structure rules such as (2a-c) which actually occur, but also rules such as (2d-e), which apparently do not occur in any language.
(2)
To make the framework more restrictive, Chomsky (1970) and Jackendoff (1977) proposed that Universal Grammar includes an X-bar schema (such as "XP → Y [X′ X ZP]") which restricts the possible phrase structures to those which consist of a head X plus a complement ZP and a specifier Y. The fact that only structures like (2a-c) occur now falls out from the theory of UG. Moreover, the X-bar schema captures the behavioral parallels between the projections of different categories (e.g. America invaded Iraq and America's invasion of Iraq), and it may allow us to derive some of the best-known word-order universals of Greenberg (1963):
We assume that ordering relations are determined by a few parameter settings. Thus in English, a right-branching language, all heads precede their complements, while in Japanese, a left-branching language, all heads follow their complements; the order is determined by one setting of the head parameter. (Chomsky and Lasnik 1993: 518)

Morphology: Lexicon and syntax as two separate components
Greenberg (1963, universal 28) had observed that derivational affixes always come between the root and inflectional affixes when both inflection and derivation occur on the same side of the root. Anderson (1992) proposed a model of the architecture of Universal Grammar from which this generalization falls out: If the lexicon and syntax are two separate components of grammar, and derivation is part of the lexicon, while inflection is part of the syntax, and if rules of the syntactic component, applying after lexical rules, can only add material peripherally, then Greenberg's generalization follows from the model of UG.

Phonology: Innate markedness constraints of Optimality Theory
Chomsky and Halle (1968: Ch. 9) had observed that the machinery used throughout their book on English phonology could also be used to describe all kinds of nonoccurring or highly unusual phonological patterns. They felt that they were therefore missing significant generalizations and proposed a markedness theory as part of UG to complement their earlier proposals. A more recent and more successful version of this markedness theory is the markedness constraints of Optimality Theory. For example, Kager (1999: 40-43) discusses the phenomenon of final devoicing, as found in Dutch, where the underlying form /bed/ 'bed' is pronounced [bet]. This could be described by a 1960s-style rule "[obstruent] → [−voice] / __$" (= an obstruent is unvoiced in syllable coda position), but that framework would also allow formulating a non-occurring rule like "[obstruent] → [−voice] / $__" (= an obstruent is unvoiced in syllable onset position). In Optimality Theory, a markedness constraint *Voiced-Coda is proposed, which may be ranked below the faithfulness constraint Ident-IO(voice) (which favors the preservation of underlying voice contrasts), as in English, where /bed/ surfaces as [bed]. Alternatively, *Voiced-Coda may be ranked higher than Ident-IO(voice), so that a Dutch-type language results. The impossibility of a language with only initial devoicing follows from the fact that there is no constraint *Voiced-Onset in the model of UG. In this way, OT's descriptive apparatus simultaneously explains crosslinguistic generalizations.
I believe that none of these proposals are promising hypotheses about UG, and that they do not help explain cross-linguistic patterns, as I will argue in the next sub-section.
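To make the logic of the Optimality Theory proposal concrete, the ranking argument about final devoicing can be sketched in a few lines. This is a deliberately simplified toy, not Kager's formalization: words are plain strings, the first segment counts as the onset and the last as the coda, and the helper names (`candidates`, `winner`) are hypothetical. Strict constraint domination is modelled by comparing violation tuples lexicographically.

```python
from itertools import permutations

# Voiced obstruents and their voiceless counterparts (toy inventory, an assumption).
DEVOICE = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s"}

def voiced_coda(inp: str, out: str) -> int:
    """*Voiced-Coda: one violation if the final (coda) obstruent is voiced."""
    return 1 if out[-1] in DEVOICE else 0

def ident_voice(inp: str, out: str) -> int:
    """Ident-IO(voice): one violation per segment whose voicing was changed."""
    return sum(1 for i, o in zip(inp, out) if i != o)

def candidates(inp: str) -> list:
    """Outputs that differ from the input at most in onset and coda voicing."""
    first = {inp[0], DEVOICE.get(inp[0], inp[0])}
    last = {inp[-1], DEVOICE.get(inp[-1], inp[-1])}
    return sorted(f + inp[1:-1] + l for f in first for l in last)

def winner(inp: str, ranking) -> str:
    """Pick the candidate whose violation profile is best under the ranking."""
    return min(candidates(inp), key=lambda out: tuple(c(inp, out) for c in ranking))

dutch_type = winner("bed", [voiced_coda, ident_voice])    # markedness outranks faithfulness
english_type = winner("bed", [ident_voice, voiced_coda])  # faithfulness outranks markedness

# Factorial typology: the winners under every possible ranking of the two constraints.
typology = {winner("bed", r) for r in permutations([voiced_coda, ident_voice])}
print(dutch_type, english_type, sorted(typology))
```

Because the constraint set contains no *Voiced-Onset, no ranking selects a candidate with initial devoicing only ("ped" in this toy), which is how the model derives the gap. The sketch also makes the circularity worry visible: the typology is whatever the stipulated constraint inventory says it is.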

Cross-linguistic evidence does not tell us about the cognitive code (= UG)
That typological evidence cannot be used in building hypotheses about UG, contrary to the views summarized in §3.1, has already been argued in some detail by Newmeyer (1998b) (see also Newmeyer, this special issue). Newmeyer's main arguments are: (i) Some robust typological generalizations, such as the correlation between verb-final order and wh-in situ order, do not fall out from any proposal about UG; (ii) the D-structure of generative syntax is not a good predictor of word-order correlations; (iii) the predictions of the famous null-subject parameter have not held up to closer scrutiny (see also Newmeyer 1998a: 357-358); (iv) simpler grammars are not necessarily more common than more complex grammars, e.g. grammars with preposition-stranding; (v) typologically rare patterns are not in general acquired later than frequent patterns; (vi) the Greenbergian word-order correlations are best explained by a processing theory such as Hawkins's (1994) theory.
Here I would like to add three more arguments that lead me to the same conclusion.

Universals as one-way implications
A principles-and-parameters model is good at explaining two-way implications. If there is a head parameter, as suggested by Chomsky and Lasnik (1993) (see §3.1.1), it predicts that there should be exactly two types of languages: head-final languages (like Japanese) and head-initial languages (like English). Thus, Greenberg's universal 2 (prepositional languages have noun-possessor order, postpositional languages have possessor-noun order) can be easily made to follow from categorial uniformity (i.e. X-bar theory) and a head parameter. However, in practice the observed cross-linguistic generalizations are mostly one-way implications, as illustrated by the examples in (3).
(3) Some typical cross-linguistic generalizations
a. If a language has VO order, the relative clause follows the head noun (but not the converse: if a language has OV order, the relative clause precedes the head noun) (Dryer 1991: 455).
b. If a language has case-marking for inanimate direct-object NPs, it also has case-marking for animate direct-object NPs (but not the converse) (Comrie 1989: Ch. 6).
c. If a language has a plural form for inanimate nouns, it also has a plural form for animate nouns (but not the converse) (Corbett 2000).
d. If a language uses a reflexive pronoun with typically self-directed actions ('wash (oneself)', 'defend oneself'), then it also uses a reflexive pronoun with typically other-directed actions ('attack', 'criticize') (but not the converse) (König and Siemund 1999).
e. If a wh-phrase can be extracted from a subordinate clause, then it can also be extracted from a verb phrase (but not the converse) (Hawkins 1999: 263).
f. If a language has a syllable-final voicing contrast, then it has a syllable-initial voicing contrast (but not the converse) (Kager 1999: 40-43).
The fact that robustly attested universals are mostly of the one-way implicational type means that they can also be conceived of in terms of universal preferences (Vennemann 1983): Postnominal relative clauses are universally preferred, animate plurals are preferred, reflexive pronouns are preferred for typically other-directed actions, syllable-initial voicing is preferred, and so on. In a model that just consists of rigid principles and variable parameters, such patterns cannot be accounted for.
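The difference between what a single binary parameter predicts and what a one-way implication allows can be spelled out by enumerating the logically possible language types. The following is a minimal sketch using generalization (3a); the encoding of a language type as a pair of booleans is an assumption made purely for illustration.

```python
from itertools import product

# A language type is a pair (has_VO_order, relative_clause_follows_noun).
all_types = list(product([True, False], repeat=2))

# A single two-valued parameter links both properties: only two types are predicted.
two_way = [t for t in all_types if t[0] == t[1]]

# The one-way implication "VO -> postnominal relative clause" excludes only one type.
one_way = [t for t in all_types if (not t[0]) or t[1]]

# The type excluded by the one-way implication: VO order with prenominal relatives.
excluded = [t for t in all_types if t not in one_way]

print("parameter predicts:", two_way)
print("implication allows:", one_way)
print("implication excludes:", excluded)
```

The type that the implication allows but the parameter excludes is OV order with postnominal relative clauses, encoded here as `(False, True)`. A model consisting only of rigid principles and two-valued parameters has no natural place for this third type.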
And conversely, such patterns do not yield evidence for principles of UG, unless one adopts a very different model of UG, in which the principles are not rigid but are themselves conceived of as preferences, as in much work under the heading of Optimality Theory. As was mentioned in §3.1.3, the constraint *Voiced-Coda explains the one-way implication in (3f) if no corresponding constraint *Voiced-Onset exists. Similarly, one might propose the constraints *RelNoun, *InanimAcc, *InanimPlural, *SelfdirectedReflpron, and *ClausalTrace to account for (3a-e), and in fact the OT literature shows many markedness constraints of this type. According to McCarthy (2002: 15), "the real primary evidence for markedness constraints is the correctness of the typologies they predict". Thus, this mode of explanation of observed universals is even more blatantly circular than the Chomskyan principles-and-parameters model, where there are usually other considerations apart from cross-linguistic distributions that also play a major role in positing principles of UG. Moreover, the resulting model of the cognitive code contains hundreds or thousands of highly specific innate principles (= constraints), many of which have a fairly obvious explanation in terms of general constraints on language use. To some extent, the OT literature itself mentions these functional explanations and cites them in support of the assumed constraints. For instance, Kager (1999: 5) states that "phonological markedness is ultimately grounded in factors outside of the grammatical system proper", and Aissen (2003) relates her OT account of differential object marking to economy and iconicity (see also Haspelmath 1999b: 183-184). To the extent that good system-external explanations for the constraints are available, the standard OT model is weakened.
An OT model with innate markedness constraints may be attractive from a narrow linguistic point of view because it allows language-particular description and cross-linguistic explanation with the same set of tools, but from a broader cognitive perspective it is very implausible.
It is not just functionally oriented linguists who have pointed out that crosslinguistic generalizations of the type in (3) are best explained functionally and do not provide evidence for UG. Hale and Reiss (2000: 162), in a very antifunctionalist paper, write (for phonology): [M]any of the so-called phonological universals (often discussed under the rubric of markedness) are in fact epiphenomena deriving from the interaction of extragrammatical factors like acoustic salience and the nature of language change… Phonology [i.e. a theory of UG in this domain, M. H.] is not and should not be grounded in phonetics since the facts that phonetic grounding is meant to explain can be derived without reference to phonology.

Universals as preference scales
Many implicational universals of the type in (3) are special cases of larger implicational scales, such as those in (4).
(4)
a. Constituent order for languages with prepositions: RelN > GenN > AdjN > DemN (Hawkins 1983: 75ff.).
b. Case-marking on direct objects: inanimate > animal > human common NP > proper NP > 3rd person pronoun > 1st/2nd person pronoun (Silverstein 1976; Comrie 1989: Ch. 6).
c. Plural marking on nouns: mass noun > discrete inanimate > animal > human > kin term > pronoun (Smith-Stark 1974; Corbett 2000).
d. Extraction site for wh-movement: S in NP > S > VP (Hawkins 1999: 263).
e. Voicing contrast: word-final > syllable-final > syllable-initial.
Scalar phenomena immediately suggest an explanation in terms of gradient extralinguistic concepts like economy, frequency, perceptual/articulatory difficulty, and so on. Thus, the scale in (4e) is presumably due to the increasing difficulty of maintaining a voice contrast in syllable-initial position (where it is easiest), syllable-final position, and word-final position. Similarly, the further left a direct object is on the scale in (4b), the easier it is to predict its object role, so that case-marking is increasingly redundant. And as Hawkins (1994) shows, the shorter a prenominal constituent is in a prepositional language, the less processing difficulty it causes, which explains the implicational scale in (4a).
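What an implicational scale rules in and out can be checked mechanically. The sketch below takes the case-marking scale in (4b) and reads it in line with (3b): if a category is case-marked, every category to its right on the scale is case-marked as well. That reading, the function name `obeys_scale`, and the representation of a language as a set of marked categories are assumptions made for illustration.

```python
from itertools import product

SCALE = [
    "inanimate",
    "animal",
    "human common NP",
    "proper NP",
    "3rd person pronoun",
    "1st/2nd person pronoun",
]

def obeys_scale(marked: set) -> bool:
    """True if marking, once it starts, continues to the right end of the scale."""
    flags = [category in marked for category in SCALE]
    # For each adjacent pair: marking on the left member implies marking on the right.
    return all((not left) or right for left, right in zip(flags, flags[1:]))

# Enumerate every logically possible marking pattern (2 ** 6 = 64 of them).
patterns = [
    {cat for cat, on in zip(SCALE, bits) if on}
    for bits in product([False, True], repeat=len(SCALE))
]
allowed = [p for p in patterns if obeys_scale(p)]

print(len(patterns), "logically possible patterns;", len(allowed), "obey the scale")
```

Under this reading a scale with n positions admits only n + 1 of the 2^n logically possible patterns, each defined by a single cut-off point. A gradient notion such as predictability of the object role yields exactly this kind of cut-off structure, whereas a set of independent binary parameters does not.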
These scalar universals have always been felt to be irrelevant to principles-and-parameters models of UG, but more recently they have been discussed in the context of Optimality Theory. Thus, Aissen (2003) proposes a fixed constraint hierarchy ("*Obj/Human » *Obj/Animate » *Obj/Inanimate") that allows the implicational scale in (4b) to fall out from her model of UG. But as in the case of the constraints mentioned in §3.2.1, this constraint hierarchy is very implausible as a component of UG. Attributing it to UG is apparently motivated exclusively by the desire to make as many phenomena as possible fall under the scope of UG. 5

Universals typically have exceptions
According to Chomsky (1988: 62), "the principles of universal grammar are exceptionless", but we know that many of the observed cross-linguistic generalizations have exceptions. Greenberg (1963) was aware of exceptions to some of his universals, and he weakened his statements by the qualification "almost always", or "with overwhelmingly greater than chance frequency". In the meantime, further research has uncovered exceptions to most of the universals that for Greenberg were still exceptionless, and none of the generalizations in (3) or (4) is likely to be exceptionless. So should we say that universals with exceptions are ignored, and only those relatively few universals for which no exceptions have been found are taken as significant, providing evidence for Universal Grammar? This would not be wise, because, as noted by Comrie (1989:20), we will never know whether we simply have not discovered the exceptions yet. Some generalizations have many exceptions (perhaps 20% of the cases), others have few (say, 2-3%), and yet others have very few (say, 0.01%), and so on (see also Dryer 1997). Thus, on purely statistical grounds, there is every reason to believe that there are also generalizations with exceptions that we could only observe if there existed six billion languages in the world.
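The statistical point can be illustrated with a back-of-the-envelope calculation. The sketch rests on assumptions that are not in the argument itself: that the world has roughly 6,000 languages (a commonly cited round figure, used here only as a placeholder), that every one of them has been examined, and that each language is independently an exception with the same probability. Under these idealizations the chance of finding no exception at all is (1 − rate)^N.

```python
N_LANGUAGES = 6000  # assumed round figure for the number of languages, not a measurement

def prob_no_exception(rate: float, n_languages: int = N_LANGUAGES) -> float:
    """Probability that none of n independent languages is an exception."""
    return (1.0 - rate) ** n_languages

def expected_exceptions(rate: float, n_languages: int = N_LANGUAGES) -> float:
    """Expected number of exceptional languages among n languages."""
    return rate * n_languages

# Exception rates loosely following the text: 20%, 2.5%, 0.01%,
# plus one far rarer case (one in a billion) added for illustration.
for rate in (0.20, 0.025, 0.0001, 1e-9):
    print(
        f"rate={rate:g}  expected exceptions={expected_exceptions(rate):g}  "
        f"P(no exception observed)={prob_no_exception(rate):.4f}"
    )
```

With an exception rate of 0.01%, fewer than one exceptional language is expected among the assumed 6,000, so the generalization would look exceptionless more often than not. With a rate of one in a billion, an exception would almost certainly never be observed. Apparent exceptionlessness is therefore weak evidence for a principle that is exceptionless in the strict sense.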
The same conclusion is drawn by the antifunctionalists Hale and Reiss (2000: 162), for phonology:
It is not surprising that even among their proponents, markedness "universals" are usually stated as "tendencies". If our goal as generative linguists is to define the set of computationally possible human grammars [i.e. those allowed by UG, M. H.], "universal tendencies" are irrelevant to that enterprise.
This echoes Newmeyer's (1998b: 191) conclusion, for the domain of syntax:
The task of explaining the most robust typological generalizations, the Greenbergian correlations, falls not to UG, but to the theory of language processing. In short, it is the task of grammatical theory [= UG theory, M. H.] to characterize the notion possible human language, but not the notion probable human language. In this sense, then, typology is indeed irrelevant to grammatical theory.
The generative linguists Hale and Reiss and Newmeyer are thus in agreement that the role of the generative enterprise in accounting for the limits of linguistic diversity is much smaller than is typically assumed. Wunderlich (this special issue, §3) concurs: "UG is less restrictive than is often thought". In practice, language structure is primarily constrained by functional factors, not by Universal Grammar.

Possible languages and possible organisms
Clearly, in the vast space of possible human languages, only a small part is populated by actual languages, namely that part which contains languages that are usable. There is little doubt that the set of computationally possible languages includes languages with only monosyllabic roots and only disyllabic affixes; languages with accusative case-marking of only indefinite inanimate objects; languages with eight labial and sixteen dorsal, but no coronal consonants; and so on. Such languages could be acquired and used, but they would not be very user-friendly, and they would undergo change very soon if they were created artificially in some kind of experiment. This is completely analogous to the vast space of possible organisms. Presumably, the structure of the genetic code readily allows for three-legged mammals, trees that shed their leaves in the spring, or herbivorous spiders. The reason why we don't find such things among the existing species is well known: they would have no chance of surviving. 6 We do not even need experiments involving genetic engineering to be sure of this, because nature itself occasionally creates monsters whose sad fate we can observe.
Of course, there are presumably also some restrictions on possible organisms which are due to the genetic code, and likewise, it seems plausible that there are some restrictions on possible grammars which are due to the cognitive code. For instance, it could be that no language can have a rule that inserts an affix after the third segment of a word ("grammars don't count"), or a rule that requires certain constructions to be pronounced faster than others ("grammars make use of pitch and intensity, but not speed of pronunciation"). Such rules may simply be unlearnable in an absolute sense. But the comparative study of attested languages does not help us much to find restrictions of this kind if they also have a plausible functional explanation. More generally, it does not help us much in identifying the cognitive code for language. 7 Analogously, the comparative study of plant and animal species does not help us in identifying the genetic code in biology. Comparative botany and zoology were sophisticated, well-developed disciplines before genetics even began to exist. And Darwinian evolutionary theory was originally built on comparative botany and zoology, not on genetics. The discoveries of 20th century genetics mostly confirmed what evolutionary biology had discovered in the 19th century. Similarly, once we know more about the cognitive code for language, I expect it to confirm what functionalist linguists have discovered on the basis of comparative linguistics.
So I conclude that the empirical study of cross-linguistic similarities does not help us in identifying the cognitive code that underlies our cognitive abilities to acquire and use language. The cognitive code evidently allows vastly more than is actually attested, and cross-linguistic generalizations can be explained by general constraints on language use. 8 From this perspective, it is odd to refer to UG as a "bottleneck" through which innovations in language use must pass (Wunderlich, this special issue, §1). The real bottleneck is language use itself (cf. Kirby 1999: 36, Kirby et al., this special issue, §4).

What kind of evidence can give us insights into the cognitive code?
There are of course many other ways in which one could try to get insights into the nature of the cognitive code. The most direct way would be to study the neurons and read the cognitive code off of them directly, somewhat like modern genetics can look at chromosomes at the molecular level, sequence DNA strings and identify the genes on them. But neurology is apparently much more difficult than molecular genetics, so this direct method does not give detailed results yet.
The study of the genetic code did not begin at the molecular level with DNA sequencing, but at the level of the organism, using a range of simple but ingenious experiments with closely related organisms (Gregor Mendel's experiments with the progeny of different varieties of pea plants, which led him to formulate the first theory of heredity). I would like to suggest that unusual experiments of this kind hold some promise for the study of the cognitive code. However, while ordinary psycholinguistic or neurolinguistic experiments with mature speakers may give us insights about their language-particular mental representations, they do not tell us much about the cognitive code in general.
What we really need to test the outer limits of UG is experiments on the acquisition of very unlikely or (apparently) impossible languages. For ethical and practical reasons, it is virtually impossible to create an artificial language, use it in the environment of a young child and see whether the child acquires it. And yet this is the kind of experiment that would give the clearest results. So it is worth looking at situations that approach this "ideal" experimental setup to some extent:
i. The natural acquisition of an artificial language like Esperanto: There has long been a sizable community of Esperanto speakers, and some have acquired Esperanto natively because it is used as the main language at home (see Versteegh 1993). To the extent that Esperanto has structural properties that are not found in any natural human languages, we can study the language of Esperanto native speakers and see whether these speakers have problems in acquiring them. (I do not know whether such studies have been carried out.) Similarly, it may be possible to derive insights from languages which were once only used in written form and acquired through instruction in the classroom but then became spoken vernaculars, as happened most famously with Modern Hebrew. See Weiß (this special issue) for related discussion.
ii. Artificial acquisition experiments with adult subjects: Bybee and Newman (1995) created fragments of artificial languages and exposed adults to them, letting them "acquire" these languages as second languages. They found that a language with systematic stem changes is not more difficult to acquire than a language with affixation, and they claimed that the comparative rarity of stem changes has to do with the likelihood of certain diachronic changes, not with their synchronically dispreferred status. See also Smith et al. (1993) for somewhat more sophisticated experiments with a single highly skilled speaker.
iii. Language games (also known as "ludlings"): These are special speech registers involving rule-governed phonological manipulations of ordinary speech, such as, for example, "insert a k into every syllable", or "say every word backwards". They are often used fluently by speakers, and they show that the cognitive possibilities are apparently much greater than the patterns that are attested in ordinary languages (see also the quotation from Anderson 1999 in note 8). An example is the Indonesian language game Warasa, which in each word replaces the first onset of the final foot and anything preceding it with war. Compare the following sentence, recorded from spontaneous speech (Gil 2002; the second line gives the ordinary Indonesian equivalents):
(5) Warengak warabu warengkau warumbuk waranges ang.
    (Bengak labu engkau tumbuk n-anges ang.)
    lie lie you hit ag-cry fut
    [Conversation amongst friends deteriorates into argument]
    'Liar, I'm going to beat you until you cry.'
No known ordinary language has processes of this kind (at least not applying to every word in an utterance), but the existence of such language games shows that this is not because the cognitive code does not allow us to learn and use them.
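The Warasa manipulation can be made concrete with a rough sketch. It rests on assumptions that go beyond the text: the final foot is approximated as the last two syllables, syllable nuclei are approximated by runs of orthographic vowels, monosyllables are left unchanged (as ang is in the example), and the function name `warasa` is hypothetical. It is meant only to show how simple the rule is to state, not to serve as an analysis of Indonesian prosody.

```python
import re

VOWELS = "aeiou"  # assumption: vowel letters stand in for syllable nuclei


def warasa(word: str) -> str:
    """Replace everything before the penultimate nucleus with 'war'."""
    w = word.lower()
    # Adjacent vowel letters (e.g. 'au') count as one nucleus: a simplification.
    nuclei = list(re.finditer(f"[{VOWELS}]+", w))
    if len(nuclei) < 2:
        # Assumption: monosyllabic words are left untouched.
        return w
    # Drop the onset of the (approximated) final foot and anything before it.
    return "war" + w[nuclei[-2].start():]


ordinary = "bengak labu engkau tumbuk nanges ang"
print(" ".join(warasa(w) for w in ordinary.split()))
# prints: warengak warabu warengkau warumbuk waranges ang
```

Under these simplifications the sketch reproduces the word forms of the recorded sentence; words longer than two syllables would need a real account of foot structure.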
Another conceivable source of insights into the cognitive code would be unlearnable patterns in adult languages. It is often claimed that some patterns cannot be learned on the basis of positive evidence ("poverty of the stimulus", see the discussion in Fischer, this special issue), but we still know very little about what can and what cannot be acquired on the basis of positive evidence. As Hawkins (1988: 7-8) pointed out, there are also language-particular facts that seem difficult to acquire without negative evidence (e.g. the English contrast between *Harry is possible to come and Harry is likely to come). Culicover (1999), too, stresses the large number of language-particular idiosyncrasies that every child acquires effortlessly and points out that a highly general mechanism such as the Chomskyan UG does not seem to be of much help here.
Be that as it may, all these diverse approaches to understanding the cognitive code do not depend on a thorough, systematic description of languages (recall that this was my first claim of §1). Rather, the nature of UG needs to be studied on the basis of other kinds of system-external evidence (which of course does presuppose some kind of superficial description, but not the sort of thorough, systematic description that linguists typically spend much effort on).
[...] Comrie 1989; Hawkins 1994, 1999; Haspelmath 1999a; Croft 2003) do not presuppose cognitively realistic descriptions of languages, but can make do with phenomenological descriptions (using basic linguistic theory, cf. Dryer forthcoming). This is of course what we find in practice: Functional-typological linguists draw their data from reference grammars and generalize over them to formulate universals, which are then explained with reference to grammar-external factors. This is similar to adaptive explanations in biology, which do not presuppose knowledge of the genome of a species, but can be based on phenomenological descriptions of organisms and their habitat.
This approach has been criticized by generative linguists on the grounds that only detailed analyses of particular languages can meaningfully be used in crosslinguistic comparison. For example, Coopmans (1983) (in his review of Comrie 1981) maintains that observations about surface word order cannot be used to argue against a particular X-bar theory, because only specific, thorough grammatical analyses that are incompatible with a proposal about UG can be used to refute such a proposal. Newmeyer (1998a) goes even further in demanding that functional explanations should be based on "formal analysis" even if they are not presented as being incompatible with hypotheses about UG: 9
[F]ormal analysis of language is a logical and temporal prerequisite to language typology. That is, if one's goal is to describe and explain the typological distribution of linguistic elements, then one's first task should be to develop a formal theory. (Newmeyer 1998a: 337)
I would agree with Newmeyer if he accepted phenomenological descriptions (of the kind typically found in reference grammars) as constituting "formal analyses" in his sense. They are surely "formal" in that every satisfactory reference grammar will make use of grammatical notions such as affix, case, agreement, valence, indirect object; virtually everybody agrees that grammars cannot be described using exclusively semantic or pragmatic notions (like agent, focus, coreference, recipient), i.e. in practice virtually everybody assumes the "autonomy of syntax" in Newmeyer's (1998a: 25-55) sense. 10 The point that I want to emphasize here is that for the purposes of discovering empirical universals (and explaining them in functional terms), it is sufficient to have phenomenological descriptions that are agnostic about what the speakers' mental patterns are. We do not need "cognitive" or "generative" grammars that are "descriptively adequate". "Observational adequacy" is sufficient. In other words, a descriptive grammar must contain all the information that a second-language learner (or perhaps a robot) would need to learn to speak the language correctly, but it need not be a model of the knowledge of the native speaker.
Thus, most of the issues that have divided the different descriptive frameworks of formal linguistics and that have been at the center of attention for many linguists are simply irrelevant for functional explanations. In the next subsection, we will see a few examples illustrating this general point.
[...] Keating et al. 1983).
The functional explanation for this presumably refers to the phonetic difficulty of maintaining voicing distinctions in final position.
This explanation is independent of the type of description:
- whether we assume an abstract underlying form /bed/ that is
  - either transformed to a surface form by applying a sequence of rules (of the type [+obstr] → [−voice]/__$), as in Chomsky and Halle (1968),
  - or used as the input for the generation of candidates, from which the optimal output form is selected, as in Optimality Theory (Kager 1999; McCarthy 2002),
- or whether we assume no abstract underlying form, so that all alternating stems have to be listed separately.
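The observational equivalence of such descriptions can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the small table of voiced/voiceless pairs, the restriction of the rule to word-final position (the rule type cited from Chomsky and Halle refers to a syllable boundary), the one-item lexicon, and the function names. The rule-based and the listing description give the same surface form for /bed/, so a functional explanation of final devoicing has no need to choose between them.

```python
# Hypothetical voiced/voiceless obstruent pairs, for illustration only.
DEVOICE = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s"}


def by_rule(underlying: str) -> str:
    """Description 1: derive the surface form from an abstract underlying form.

    Simplification: the rule applies only to the last segment of the word.
    """
    if underlying and underlying[-1] in DEVOICE:
        return underlying[:-1] + DEVOICE[underlying[-1]]
    return underlying


# Description 2: no underlying form; alternating stems are listed one by one.
# The single entry is an assumed toy lexicon.
LISTED_FINAL_FORMS = {"bed": "bet"}


def by_listing(stem: str) -> str:
    """Look up the listed final-position form, defaulting to the stem itself."""
    return LISTED_FINAL_FORMS.get(stem, stem)


print(by_rule("bed"), by_listing("bed"))  # prints: bet bet
```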

Inflection and derivation
This was discussed in §3.1.2. The basic observation is that derivational affixes always come between the root and inflectional affixes when both inflection and derivation occur on the same side of the root. A functional explanation for this generalization appeals to the meaning differences between inflectional and derivational affixes: There is "a 'diagrammatic' relation between the meanings and their expression" (Bybee 1985: 35), such that the "closer" (more relevant) the meaning of a grammatical morpheme is to the meaning of the lexeme, the closer the expression unit will occur to the stem.
This explanation is independent of the type of description:
- whether the inflectional and the derivational components are strictly separate (as in Anderson 1992),
- or whether inflection and derivation are assigned to the same component obeying the same kinds of general principles (as in Lieber 1992).

Differential case-marking
This was discussed in §3.2.1-2 (see 3b and 4b). The generalization is that the higher the object referent is on the animacy scale, the more likely it is that the direct object is case-marked. A functional explanation for this is that the more animate a referent is, the less likely it is to occur as a direct object, and it is the particularly unlikely grammatical constellations that need overt coding (cf. Comrie 1989: Ch. 6).
This explanation is independent of the type of description:
- whether object-case marking is achieved by a set of separate rules, as in Relational Grammar (cf. Blake 1990),
- or whether object-case marking is achieved by specifier-head agreement with an Agreement node, as in some versions of the Chomskyan framework,
- or whether a set of Optimality Theoretic constraints is employed (as in Aissen 2003).

Extraction of interrogative pronouns
This was mentioned in §3.2.1-2 (see 3e and 4d). The relevant generalization here is that the more deeply embedded the gap is, the less likely the extraction is (S in NP > S > VP; Hawkins 1999: 263). A functional explanation for this is that constructions with more deeply embedded gaps have larger "Filler-Gap Domains" and are hence more difficult to process (Hawkins 1999).
This explanation is independent of the type of description:
- whether extraction constructions are described by an underlying structure with the interrogative pronoun in its expected position, which is transformed by a movement operation (restricted by subjacency and bounding nodes),
- or whether the interrogative pronoun is base-generated in initial position and related to the gap by a more elaborate feature system (as in Gazdar et al. 1985).

Word-order preferences
There is a very strong preference for agents to precede patients in simple transitive clauses (cf. Greenberg 1963: 77, Universal 1). A functional explanation of this is that agents are typically thematic, and more thematic information tends to precede less thematic information (see Tomlin 1986).
This explanation is independent of the type of description:
- whether constituency or dependency is assumed to be the major organizing principle of syntax, and if constituency is assumed,
  - whether a completely flat structure, lacking a VP, is assumed ([S NP_ag V NP_pat]),
  - or whether a clause structure with a VP is assumed ([S NP_ag [VP V NP_pat]]).

Article-possessor complementarity
In Haspelmath (1999a), I discussed the phenomenon that languages sometimes require the definite article to be omitted in the presence of a possessor (cf. English *Robert's the bag/*the Robert's bag).
My functional explanation of this was that the definite article is somewhat redundant in this construction, because possessed noun phrases are significantly more likely to be definite than non-possessed noun phrases.
This explanation is independent of the type of description:
- whether a determiner position is assumed that can be filled only once, either by the definite article or by the possessor (as in Bloomfield 1933: 203; Givón 1993: 255; McCawley 1998: 400, among many others, for English),
- or whether it is not assumed that there is such a determiner position, and that the grammar has to include a separate statement to the effect that the definite article must be omitted from possessed noun phrases (as in pre-structuralist descriptions, as well as Abney 1987: 271, and much subsequent work in the Chomskyan tradition).
Thus, as Dryer (1999: §2) points out (note that Dryer uses the terms "descriptive framework" and "metalanguage" interchangeably):
[W]e do indeed need to describe languages, and describing them entails having some sort of metalanguage, but it does not particularly matter what the metalanguage is. There may be practical considerations, such as choosing a mode of description that is user-friendly, but on the whole the choice of metalanguage is devoid of theoretical implications.
A reviewer observes that it should not be a criterion for the scientific value of an approach that it avoids making choices in the cases of §4.2.1-6, especially since the competing descriptions do not all make the same predictions. The latter observation is probably correct, though (as the reviewer also recognizes) the full range of predictions of a particular description is rarely explored. Typically linguists argue for a particular description primarily on conceptual grounds (see §4.4 below), not because it accounts better for all the data. This introduces a strong element of subjectivity into linguistic description, and for this reason I have to disagree with the reviewer: It is indeed a sign of the scientific value of an approach if it avoids subjective decisions and stays out of debates that are hardly resolvable by empirical considerations.
In the preceding two subsections I have contrasted my approach mostly with the Chomskyan approach, but of course many functional linguists, too, claim that their descriptions are cognitively real (e.g. work in the cognitive grammar tradition of Langacker 1987: 91). What I have said about generative approaches mostly also applies to these functionalist approaches: Their descriptive proposals presuppose a dangerous number of subjective decisions, and it is a virtue of the approach favored here that it depends neither on particular generative nor on particular functionalist descriptive frameworks.

Cross-linguistic generalizations are not premature
We saw in §4.1 that Newmeyer (1998a) asserts the need for "formal analysis" to precede cross-linguistic generalization and functional explanation. And clearly he is not content with phenomenological descriptions of the sort found in reference grammars:
[T]he only question is how much formal analysis is a prerequisite [to functional analysis]. I will suggest that the answer is a great deal more than many functionally oriented linguists would acknowledge. To read the literature of the functional-typological approach, one gets the impression that the task of identifying the grammatical elements in a particular language is considered to be fairly trivial. (Newmeyer 1998a: 337-338)
In the last sentence of this passage, Newmeyer seems to confuse two things: on the one hand, the definition of language-particular grammatical classes, which many reference grammars devote considerable attention to (and which by contrast is typically considered trivial by generative linguists), and on the other hand, the definition of categories for cross-linguistic comparison. The latter must be based on meaning (cf. Croft 2003: 6-12), so the detailed formal analysis found in reference grammars is not directly relevant to it. For instance, distinguishing adjectives and verbs in a particular language may require detailed discussion of mood forms, relativization strategies and comparative constructions, but a cross-linguistic study of (say) property word syntax only needs a ("fairly trivial") semantic characterization of its subject matter.
Most of the analytical effort in generative grammar is in fact not devoted to the identification of language-particular categories, but to the identification of categories attributed to Universal Grammar. And this is, of course, extremely difficult:
Assigning category membership is often no easy task… Is Inflection the head of the category Sentence, thus transforming the latter into a[n] Inflection Phrase? … Is every Noun Phrase dominated by a Determiner Phrase? … There are no settled answers to these questions. Given the fact that we are unsure precisely what the inventory of categories for any language is, it is clearly premature to make sweeping claims about their semantic or discourse roots. Yet much functionalist-based typological work does just that. (Newmeyer 1998a: 338)
The idea that Infl is the head of IP (= S′), or that noun phrases are really DPs, did not come from the study of particular languages, but from certain speculative considerations about what the categories of UG might be. 11 As we saw in §3, it is clearly premature to make sweeping claims like these about UG, so it is not surprising that consensus about such matters is generally reached only through authority. But it is not premature to provide phenomenological descriptions of particular languages, and to formulate cross-linguistic generalizations on their basis.

What kind of evidence can be used for cognitively realistic descriptions?
It is fortunate that we do not need cognitively realistic descriptions for functional explanations, because such descriptions are extremely difficult to come by. How would we choose between two competing descriptions of a phenomenon for an individual language? To take a concrete example: How do we choose between the determiner-position analysis of English article-possessor complementarity and the alternative analysis that operates without a determiner concept? Both descriptions are "observationally adequate", but which one is more "descriptively adequate", i.e. which one reflects better the generalizations that speakers make? How do we know whether English speakers make use of a determiner concept? 12 Two general guiding principles that formal linguists use to make the choice are: (i) Choose the more economical or elegant description over the less economical/ elegant description, and (ii) choose the description that fits better with your favorite view of Universal Grammar. The determiner-position analysis was first proposed for English by Bloomfield (1933: 203), who was among the most influential authors in disseminating the idea that descriptions are more highly valued if they are economical or elegant. It was adopted by generative grammarians, until for unrelated reasons a view of UG became prevalent which did not allow the determiner-position analysis anymore (cf. Abney 1987 and subsequent work in the Chomskyan framework, where the determiner and the possessor are seen as occupying two different positions). 13 Unfortunately, both these principles are unlikely to lead to success. 
The first principle (favoring economy/elegance) is of little help because we do not know whether speakers prefer the most economical or elegant description, and even if we knew that they do, we would not know exactly what they want to economize on primarily (for instance, whether they want to economize on components of the grammar, on analytical concepts, on individual rules, or on items listed in the lexicon), and exactly what appears elegant to them. (In actual fact, there are many indications that speakers are more concerned with processing efficiency than with elegance of the system.) The second principle (favoring conformity with UG) is of little help because, as we saw in §3, cross-linguistic description (or indeed detailed language-particular description) does not help us in discovering UG, and the other sources of evidence have not yielded much information yet, so that we know almost nothing about UG at this point.
It seems that as in the case of the search for UG (cf. §4), we have to look beyond the evidence provided by language description, and consider evidence from psycholinguistics, neurolinguistics, and language change (i.e. "external evidence", or "substantive evidence"). The relevance of evidence from these sources for cognitive grammars and the cognitive code has often been acknowledged by linguists, but what they have typically had in mind is that external evidence can be used in addition to evidence from language description (and in practice, the evidence from language description has played a much more significant role). What I am saying here is that external evidence is the only type of evidence that can give us some hints about how to choose between two different observationally adequate descriptions.

Conclusion
To summarize, I have made the following claims in this paper:
- cross-linguistic data cannot be used to argue for (or against) a model of UG;
- conversely, a model of UG cannot be invoked to explain cross-linguistic generalizations;
- a model of the cognitive code requires evidence from domains other than language description;
- cross-linguistic generalizations are best explained by system-external constraints on language use, i.e. functionally;
- cognitively realistic description of individual languages is not a necessary prerequisite for functional explanation of universals;
- a model of a speaker's knowledge of a language cannot be based on a description of the language but requires evidence from domains other than language description.
Thus, we see that pure language description can only give us phenomenological descriptions and phenomenological universals, and that it does not help us much with cognitively realistic description and the cognitive code. This may seem like a somewhat pessimistic conclusion, because it reduces the role of "pure" linguistics in addressing the theoretical goals of Table 1 above. However, "pure" linguistics will not become unemployed anytime soon. Even if half the world's languages become extinct by the end of this century, there will still be three thousand languages left to be described, and plenty of cross-linguistic generalizations (and their functional explanations) remain to be discovered or tested. And those who mostly care about what is in our head before language acquisition (i.e. the cognitive code, or Universal Grammar) or after language acquisition (i.e. the cognitive grammar) will have plenty of other sources of evidence to tap.
Notes
* I am grateful to Martina Penke, Anette Rosenbach, and Helmut Weiß for detailed comments on an earlier version of this paper, as well as to an audience at the Max Planck Institute for Evolutionary Anthropology.

4.
True, just as biologists can come home from a field trip with a specimen of a new species, field linguists can collect specimens of speech and deposit tapes and transcriptions in a linguistic archive. But of course the ultimate goal is the description of the type, not the specimen token, and in linguistics the "type" (i.e. the grammar) cannot easily be reconstructed on the basis of specimens of speech, especially if they consist of only a few hours (or less) of speech. Complete grammatical description also requires experimentation (i.e. elicitation).

5.
Aissen does not actually say that she conceives of her constraints and constraint hierarchies as being part of the innate cognitive code; and by prominently invoking the notions of economy and iconicity, she invites the inference that she thinks of her model as a kind of formalization of the functional explanations of Silverstein (1976) and Comrie (1989), not as a contribution to the theory of UG. If that is the right interpretation, then Aissen's work is irrelevant to the present concerns. A recent paper that makes use of similar concepts but adopts an explicitly functionalist point of view, minimizing the role of innate factors, is Jäger (2003).
6. Not surprisingly, linguists of Chomskyan persuasion often point out that even in biology, there may be other, nonadaptive factors that explain certain properties of organisms, such as Thompson's (1961) principles of biological forms (e.g. Lightfoot 1999: 237; see Newmeyer 1998c for critical discussion). In this line of thinking, a reviewer suggests that the nonexistence of three-legged mammals might be due to a general symmetry preference. One should not dismiss such a possibility out of hand, but it is hardly an accident that almost all moving organisms show symmetrical bodies, while stationary organisms need not be symmetrical (flowers often have three, five or seven petals). Apparently symmetrical bodies make movement easier (note also that cars and airplanes are usually symmetrical, whereas houses are often asymmetrical).

7.
Of course, there is one (rather trivial) sense in which cross-linguistic research gives us information about the cognitive code: If we find a language with a certain surprising property (e.g. a manner adverb agreeing in gender with the object, as in Tsakhur), then we know that the cognitive code must allow such a language. Data from language description can thus give us a lower bound on what the cognitive code can do, but not an upper bound.

8.
Here is another quotation from a well-known generative linguist who agrees with this conclusion: …the scope of the language faculty cannot be derived even from an exhaustive enumeration of the properties of existing languages, because these contingent facts result from the interaction of the language faculty with a variety of other factors, including the mechanism of historical change. To see that what is natural cannot be limited to what occurs in nature, consider the range of systems we find in spontaneously developed language games, as surveyed by Bagemihl (1988)… …the underlying faculty is rather richer than we might have imagined even on the basis of the most comprehensive survey of actual, observable languages… …observations about preferences, tendencies, and which of a range of structural possibilities speakers will tend to use in a given situation are largely irrelevant to an understanding of what those possibilities are. (Anderson 1999: 121)
9. Note that since §3 argued that arguments for UG cannot be derived from typological evidence, it is implied that arguments against UG cannot be derived from typological evidence either.

10.
Functionalists often describe their stance as differing from formalists in rejecting the Chomskyan autonomy thesis, but by this they generally mean that they reject the idea that language use should play no role in the explanation of language form, not that they reject autonomy in Newmeyer's sense (i.e. that purely formal, non-semantic, non-functional concepts are systematically needed in the description of language form). See Haspelmath (2000) for more discussion of Newmeyer's autonomy notion.

11.
A reviewer objects that in the development of these ideas, data analysis and theoretical considerations went hand in hand. However, a close reading of Chomsky (1986) (the source of the "IP" idea) and Abney (1987) (the source of the "DP" idea) clearly shows that conceptual elegance was the main motivation, in particular the desire to fit all phrases into a uniform X-bar schema. In Abney's (1987) crucial section II.3 ("The DP analysis", pp. 54-88), the first twenty pages are entirely free of data, i.e. they consist of speculative considerations about what the categories of Universal Grammar might be.

12.
Moreover, how do we know that all speakers of English make the same generalizations? It could be that for whatever reason, some speakers make use of a determiner concept in their mental grammars, while other speakers do not.
13. Abney (1987) proposed that the determiner occupies a head position. The possessor cannot be in this position because it can be phrasal (as in the girl's bike), and heads cannot be phrasal.