Pre-established categories don't exist: Consequences for language description and typology

Abstract
Structural categories of grammar (such as clitic, affix, compound, adjective, pronoun, dative, subject, passive, diphthong, coronal) have to be posited by linguists and by children during acquisition. This would be easier if they simply had to choose from a list of pre-established categories. However, existing proposals for what such a list might be are still heavily based on the Latin and English grammatical tradition. Thus, descriptive linguists still have no choice but to adopt the Boasian approach of positing special language-particular categories for each language. Theorists often resist it, but the crosslinguistic evidence is not converging on a smallish set of possibly innate categories. On the contrary, almost every newly described language presents us with some “crazy” new category that hardly fits existing taxonomies. Although there is thus no good evidence for pre-established categories, linguists still often engage in category-assignment controversies such as “Is the Tagalog ang-phrase a subject or a topic?”, “Is German er a pronoun or a determiner?”, “Are Mandarin Chinese property words adjectives or verbs?”, or “Is the Romanian definite article a clitic or a suffix?”


Structural categories and typology
In this programmatic paper, I argue that the non-existence of pre-established categories of language structure has profound consequences for both language description and language typology.
All linguistic description, which is the prerequisite for typological comparison, is based on structural categories of language form such as the following: clitic, affix, compound, phrase, adjective, verb, relative pronoun, complementizer, instrumental case, subject, unaccusative, passive, diphthong, coronal. It is evident that such categories are necessary and cannot be dispensed with: Children must acquire them in order to use language productively and creatively, and linguists must posit them in their descriptions in order to arrive at satisfactory accounts of the language system. This is not to say that linguists must use the same categories that children use, but in any event some categories are necessary to capture the productivity and regularity of the language system. Typological research has made great progress in expanding our knowledge of what categories exist in the world's languages and how they pattern, starting with von Humboldt's (1827) study of the dual category.
Many other recent (and not-so-recent) typological contributions on particular categories could be cited, e.g. on the passive (von der Gabelentz 1860, Siewierska 1984, 2005), the accusative (Henkelmann 2006), or the adjective (Wetzer 1996). There are also studies that look at entire category systems such as number (Corbett 2000), voice (Brus 1992), or parts of speech (Hengeveld 1992).
The notion of category could be broadened to include also constructions (which are just complex syntactic categories with specific properties), such as ditransitive constructions (Haspelmath 2005), non-verbal predicative constructions (Stassen 1997), relative clauses (Lehmann 1984), or noun phrases (Rijkhoff 2002), as well as phonological categories such as velar nasals (Anderson 2005) or stress (Goedemans & van der Hulst 2005). At a more abstract level, general notions such as sentence, clause, phrase, word, clitic, affix, stem, root, inflection, syllable, diphthong, consonant must also be regarded as (highly abstract) categories of language form, though it is less obvious that typology has made significant contributions to understanding them.
But how do we identify the categories we need? Here the serious problems begin, to which we now turn.

Pre-established categories
Which are the right categories for a given language? This is a potential problem both for children acquiring a language and for linguists wishing to describe one. If they could choose them from a list of pre-established (or a priori) categories, this would make life easier for both of them. For children, there would have to be an innate list of categories that they have access to during language acquisition. For descriptive linguists, there would have to be a list that contains the pre-established categories that general linguists have figured out in some way. These would not necessarily have to be innate, but they would have to be universal in the sense that a descriptive linguist can be sure that the categories needed for describing his/her language are on the list.
Of course, in the Chomskyan school the idea that a set of innate categories (or more elementary innate features) exists is very widespread, and generative linguists have explicitly set for themselves the goal of discovering these categories (or features). But also linguists who do not follow this Chomskyan assumption of innate substantive universals often act as if there were a set of pre-established categories. This could be (i) because they have not thought about the issue (which has not been widely discussed in linguistics), or (ii) because they confuse universal functions with language-particular or universal categories, or (iii) because they think that all languages for some reason end up with the same categories, even though they do not start out with them.
This latter possibility of course needs to be taken very seriously. But in any event, no linguist would seriously claim that we know what the supposedly universal categories are, neither the innate ones (if the Chomskyans are right that they exist), nor the secondarily universal ones (if universal categories exist but are not innate). The generative linguists have in fact not made a strong effort to identify universal categories or features (except perhaps in phonology, where the work of Jakobson, Fant & Halle 1952 has been followed up by some more recent work on phonological features). In syntax, generativists have simply assumed that the traditional European categories are universal, at least in practice (there is very little theoretical discussion, an exception being Baker 2003). And also in nongenerative circles, to the extent that the same category labels are used across languages, this is typically due to the fact that grammatical concepts are carried over from one language to another. But this generally happens in a haphazard way that mostly reflects the contingencies of the history of linguistics (strong influence of powerful languages like Latin, or English, or languages described by influential linguists such as Dyirbal, and so on).
Thus, descriptive linguists still have no choice but to adopt the Boasian approach of positing special language-particular categories for each language, unless they do not mind Anglocentric or Dyirbalocentric descriptions that give a distorted picture of their language.

Structural categories are language-particular
Since the early 20th century, linguists have become aware that the categories of language structure are language-particular. This realization had two prominent sources, only one of which can be said to be an achievement of (a kind of) typology. First, in the innovative movement initiated by de Saussure (1916), linguists emphasized the paradigmatic relationships of categories within a system, so that they found the Russian dative to have a different nature from the German dative (Jakobson 1936), and the Latin t to have a different nature from the Greek t (Trubetzkoy 1939), due to the different paradigmatic oppositions in which they stand. But second, philosophically inclined North American fieldworkers such as Boas (1911) and his heirs found the categories in their languages to diverge so radically from the Standard Average European languages that they taught their students not to make any assumptions about the categories in terms of which the language should be described.
Both groups of linguists later came to be known as structuralists, and some modern typologists still trace their way of thinking to the first group (e.g. Lazard 2005:4-5) or to the second group (e.g. Matthew S. Dryer). 1 But the generativist attitude of leaving aside the structuralist concerns with language-particular definition and justification also had a strong impact on language typology as practiced between the 1960s and 1990s, where problems with the assumption of universal a priori categories were not widely discussed. The need to define categories in language-particular terms has come to be emphasized again only more recently (Lazard 1992, Dryer 1997, Croft 2000), especially for syntactic functions such as subject and word classes such as adjective. These authors note that there are no necessary and sufficient properties that could identify such categories across languages, and that the formal criteria used for identifying them are themselves language-particular and hence not generally applicable. Moreover, it is clear that the cross-linguistic evidence is not converging on a smallish set of universal (possibly innate) categories. On the contrary, almost every newly described language presents us with some "crazy" new category that hardly fits existing taxonomies, e.g.
- two adjective-like parts of speech in Japanese, one of which is a little more like verbs (but clearly distinct), the other one a little more like nouns (but again clearly distinct)
- an "affective case" for the experiencer of only a handful of verbs in Andic languages like Godoberi (Kibrik 1996) in the Eastern Caucasus
- a distinction between "weak" and "strong" adjective declension in German and a few closely related languages
- an Ablative Absolute construction in Latin, consisting of a subject phrase and an agreeing participial verb, both in an oblique case, to render a type of adverbial subordination
- the locative predicator in Mina (Chadic; a category "hitherto not observed in other languages"; Frajzyngier & Johnston 2005)
- the prioritive applicative in Hakha Lai (Tibeto-Burman), 'do something before somebody' (Peterson 2007:20)
So not only are similar categories in two languages never identical, but languages also often exhibit categories that are not even particularly similar to categories in other languages. The situation is completely parallel in phonology. Pierrehumbert (2001:137) notes that "it is not possible to point to a single case in which analogous phonemes in two different languages display exactly the same phonetic targets and the same pattern of contextual variation." According to Port & Leary (2005:940), "there are many sounds that are isolates, that is, sounds that have been found in only one or a very small set of languages." And Mielke (to appear) argues in great detail that the cross-linguistic evidence shows that phonological distinctive features are not universal, but language-particular entities.
The same skepticism as for particular categories also seems justified for category-systems such as number, case, voice, tense and aspect. Whether a particular category (e.g. the English Perfect) belongs to one category system or another (e.g. to the tense system or to the aspect system) is a question that is often unanswerable, except by convention.  explicitly argues that category-systems such as tense and aspect are irrelevant to understanding linguistic categories, and that these terms are useful only to help linguists organize their work. And likewise, constructions (such as ditransitives or relative clauses) cannot be identified with each other across languages, but are language-particular (Croft 2001).
I will therefore not assume here that pre-established categories (or category-systems) exist, and ask what consequences follow if they do not exist. If it turns out that they exist after all, then what follows will be irrelevant. 2 However, as long as the evidence for them is so weak, it seems safer to adopt the non-aprioristic approach of this paper.
While among typologists the belief that grammatical categories are language-particular and pre-established categories do not exist is now widely shared, this is still news to generative grammarians (who do not seem to even have addressed the issue). And the claim that the language-particular nature of categories extends to formal categories such as affix, clitic and compound does not seem to have been widely considered anywhere.

Controversial category assignments
Although, as we saw, there is no good evidence for pre-established categories, linguists still often ask questions such as the following:
(v) Is the Tagalog ang-phrase a subject or a topic? (Schachter 1976)
(vi) Is German er a pronoun or a determiner? (Vater 2000)
(vii) Is the English I/me and she/her contrast a contrast of case? (Hudson 1995)
(viii) Is English that in relative clauses a pronoun or a complementizer? (van der Auwera 1985)
(ix) Is the English adverbial -ly an inflectional or a derivational suffix?
(x) Are the two types of intransitive verbs in Jalonke (Mande) unaccusatives and unergatives? (or are they something else?) (Lüpke 2006)
(xi) Are French subject clitics (je, tu, il…) pronouns or agreement markers? (De Cat 2005)
(xii) Is the German dative a structural case or an inherent case? (Wegener 1991, Woolford 2006)
Such category-assignment controversies seem to presuppose that pre-established categories exist, because they do not make much sense without such a presupposition. In fact, some of them make no sense at all without it, e.g. the question whether the Tagalog ang-phrase is a subject or a topic. Others could be made sense of if we interpreted the categories as language-particular categories. Thus, we could sensibly ask whether English -like is more like English stems or more like English suffixes, and hence whether it should be classed with the former or the latter. However, if categories are not given to us a priori, another obvious possibility is that English -like belongs to a third English-specific class (or even that it belongs to no larger class at all, being an item sui generis). But the literature is full of category-assignment controversies that do not even consider the possibility of positing novel or unconventional categories. So in any event the assumption of pre-established categories seems to be deeply ingrained in linguistics.

Consequences for language description
An important consequence of the non-existence of pre-established categories for language description is that category assignment controversies like those just seen are pointless. There is usually no way to resolve them, because no universally applicable necessary and sufficient criteria for defining a priori categories can be given. Instead of fitting observed phenomena into the mould of currently popular categories, the linguist's job should be to describe the phenomena in as much detail as possible, using as few presuppositions as possible. Language describers have to create language-particular structural categories for their language, rather than being able to "take them off the shelf". This means that they have both more freedom and more work than is often thought.
Does this mean that these categories should be given totally opaque names, such as "the -fu-form", or "class 34"? Opaque names of course have the advantage of avoiding unwanted connotations with Latin or English grammar, and opaque names have for this reason been widely used by practitioners of Tagmemics and other American structuralists.
However, I would not recommend the use of opaque names to descriptive linguists. Opaque names may be justified by theoretical considerations, but they are not practical because they are very hard to remember. The best solution is to use familiar terms for mnemonic reasons, but to capitalize them (following Comrie 1976, Croft 2001) in order to emphasize that the categories are "proper names" (e.g. the German Perfect, the Tagalog Subject, the Japanese Inflected Adjective, the Indonesian Passive, the English Ditransitive Construction).
The linguistics literature is of course full of category-assignment controversies, but on closer inspection it typically turns out that language-particular descriptions with different category assignments are merely notational variants, i.e. they do not differ substantively in terms of the generalizations that they embody. They are often motivated less by the desire to describe the language in a complete way than by the desire to describe it in terms of categories that have been proposed within some prestigious theory, or by the desire to show that one notational variant is better than another given some criteria such as elegance or descriptive economy. Such discussions seem to distract descriptive linguists from their more urgent business, that of describing languages in a way that is as complete as possible. Even the best-known languages have been described only very partially so far.

Substance-based comparison
The most important consequence of the non-existence of pre-established categories for language typology is that cross-linguistic comparison cannot be category-based, but must be substance-based, because substance (unlike categories) is universal.
In phonology, this means that comparison must be phonetically based; in morphosyntax, it means that comparison must be semantically based.
This has been widely recognized in the Greenbergian functional-typological approach (e.g. Croft 2003), though it is often hidden by the widely practiced terminology: Typologists who study word order often talk about "noun-genitive" order, "verb-object" order, and so on. To understand what these typologists are doing, it is crucial to be aware that all these terms refer to semantically defined entities. This is very explicit, for instance, in Matthew Dryer's work (e.g. Dryer 1995:1062-1063, Dryer 2005), and already Greenberg (1963 [1966]) said that he used semantic criteria to define subject, object, and similar notions. And in phonology, Maddieson (1984:160-163) is very clear that he used phonetic criteria for comparing segments in the world's languages.

Problems for typology
From the above considerations it is clear that categories which cannot be semantically defined are extremely difficult to compare across languages. Examples of such categories are "adposition" (e.g. in comparison to "serial verb"), or "case" (e.g. in comparison to "adposition"). Interestingly, it is with categories of these types that the maps of the World Atlas of Language Structures show the greatest discrepancies: For example, "case" is defined differently in Iggesen (2005) and Baerman & Brown (2005), and "adposition" is defined differently in Bakker (2005) and Dryer (2005).
Similarly, the Grammatical Relations scale (Keenan & Comrie 1977) for accessibility to relativization, one of the best-known typological generalizations, is extremely problematic, because grammatical relations are language-particular (Dryer 1997) and cannot be easily compared across languages. Thus, it seems to me that typologists need to rethink some of their best-known results in view of the realization that linguistic categories are not pre-established.
Perhaps even more disturbingly, since the categories "sentence" (see Mithun 2003) and "word" (see Hildebrandt & Bickel 2005) are language-particular, too, not even the major grammatical divisions "syntax" and "morphology" (which are generally defined in terms of 'sentence' and 'word') are pre-established and universal. Thus, it does not really make sense to ask whether a particular notion is expressed morphologically or syntactically in a given language.
Much of this has been widely recognized by descriptive linguists and typologists, but it seems that the full consequences of radical language-particularity have yet to be digested by linguists. Maybe language description and language comparison will turn out to be much more difficult than has been thought so far.

Are categories totally different across languages?
Of course, there are many similarities between categories across languages, and this fact often leads to the temptation of equating language-particular categories with each other. The English Passive and the Japanese Passive share many properties, and Russian Suffixes are similar to Arabic Suffixes in many ways. These similarities are hardly accidental, and it is an important task of linguistics to find out how far they go and how they might be explained. However, it is important to realize that similarities do not imply identity: It is very hard to find categories that have fully identical properties in two languages, unless these languages are very closely related. Crosslinguistic similarities of categories are often best expressed in the form of implicational scales or semantic maps (see Croft 2001, Haspelmath 2003). In order to find generalizations of this sort, one has to start with the awareness that each language may have totally new categories.

Are category-assignment controversies a total waste of time?
It is true that by asking the wrong questions (about assignment to supposedly pre-established categories), linguists are often prompted to look for further evidence that they might otherwise have overlooked. In this way, category-assignment controversies have not been totally sterile: many of them have led to clarifications and new insights as positive side effects, even though it has not been possible to resolve them.

Is semantics universal?
For morphosyntactic comparison to be possible, we must hold the meaning constant; at least this must be universal. But there is ample evidence that meaning, too, is conventional and varies across languages. One cannot simply presuppose that for a typology of possessive constructions, all languages conveniently have a meaning of "possession" whose diverse modes of formal expression the typologist can study.
The question of semantic universals is the most difficult to answer, and it cannot be excluded that we will ultimately find out that meanings cannot really be compared across languages either, thus making cross-linguistic comparison in this domain impossible.
But this is unlikely, because experience shows that people can understand each other across linguistic boundaries with some effort. Translation is generally possible, even if not always straightforward. Notice that for the purposes of typological comparison we do not need identity of strictly linguistic meanings. All we need is some level of meaning at which meanings are commensurable. Thus, it is not necessary for both Polish and Igbo to have the same semantic category of "possession" in order to be able to compare possessive constructions in these two languages. All we need is the possibility to translate low-level notions like 'my father's house' into both languages. We can define the semantic relation between 'I' and 'father', and between 'father' and 'house' as "possessive". It seems that in almost all languages that have been described so far, the grammatical form of the equivalent of 'my father's house' is used for a broader range of relations (e.g. also for 'my mother's bike', 'my sister's husband', 'my husband's hair', 'my daughter's teacher', or even 'my life's purpose', 'the planet's orbit', 'the guest's seat', etc.). Languages differ considerably in this regard, making comparison more complex, but as long as there is translatability of simple concepts, comparison should be possible. 3

Conclusions
Let me summarize the main claims of this paper: Working without the assumption that pre-established categories exist implies a fairly substantial reorientation of the work of both descriptive and typological linguistics. In descriptive linguistics, this reorientation has a long history going back at least to Boas and de Saussure, but despite their influence, the idea that structural categories of language form are given to us in advance has kept reasserting itself. By shedding the assumption of a priori categories descriptive linguists can avoid getting into category-assignment controversies and can concentrate on refining their descriptions. Typologists must realize that they cannot base their comparisons on formal categories, and need to resort to semantic-pragmatic or phonetic substance as a foundation of their classifications and generalizations.
It seems to me that by and large, common practice among descriptive linguists who are likely to be readers of this journal reflects the postulated theoretical stance of this paper. That is, linguists whose business is to describe smaller, less well-known languages that are not taught widely in schools or universities are generally careful to define the categories they use and do not assume that they can take them off some shelf. By contrast, linguists who engage in descriptive work on the "big rich languages" (some of which even have specialized journals devoted to their study) often seem to get embroiled in category-assignment controversies, and they seem to find the notion of innate universal categories much less problematic than descriptive linguists of smaller languages. (Is this perhaps because the most popular assumptions about what the universal categories are are primarily informed by the bigger languages?) It also seems to me that by and large, common practice among typologists who are likely to be readers of this journal reflects my postulates, so again, I am perhaps telling the news to the wrong audience. However, my personal experience tells me that while typologists are generally doing the right things, they are not necessarily aware of the generality of the claim that categories are language-particular. Especially with regard to categories such as affix, clitic, word, and clause, even functionally oriented typologists who would never posit a universal subject category seem to persist in the assumption of universality. And there are some typologists who emphasize the non-universality of some categories, only to replace them by a set of other universal categories (e.g. Van Valin 1977, who rejects the universality of subjects but posits the universal syntactic categories Actor and Undergoer, which are only partially semantically defined).
Finally, generative typologists normally define their universal categories in semantic terms, like functionally oriented typologists, but they are not explicit about it, and sometimes they even claim, contrary to fact, that their definitions are based on formal properties of language (Newmeyer 1998:337-343).
Thus, I hope that this paper will after all make a contribution to helping linguists to meet what I see as a major challenge (still, a century after de Saussure and Boas): Recognizing the full implications, for both language description and typology, of the realization that structural categories of language are language-particular, and we cannot take pre-established, a priori categories for granted.
readily translated from one language to the next. The typological comparison of such particles is a much more formidable challenge than the comparison of simple grammar and lexicon.