The heterogeneity of European Higher Education Institutions: a configurational approach

ABSTRACT Classifications are a basic tool for research, as they allow the diversity of objects to be summarized in a number of categories that fits the cognitive abilities of the human mind. Their relevance for higher education is emphasized by the differentiation of institutional profiles. Yet, unlike in the US, there is currently no classification of European Higher Education Institutions (HEIs). This paper fills this gap by developing a classification of European HEIs, which focuses on differences in activity profiles and subject scope. To this aim, it uses data from an enriched version of the European Tertiary Education Register on a sample of more than 2000 HEIs in a large number of European countries. The classification comprises six classes that occupy distinct positions in a configuration space defined by two dimensions, i.e. research vs. educational orientation and subject specialization. Ex-post analysis shows that classes are identifiable and can be attributed meaningful labels; the class of research universities comprises most European HEIs competing in international rankings, while a class of generalist HEIs with lower research orientation that cuts across the traditional distinction between universities and Universities of Applied Sciences can be distinguished. Furthermore, three classes of specialist HEIs can be identified. The classification provides a meaningful representation of European higher education that is more fine-grained than the distinction between university and non-university sectors while remaining parsimonious. Finally, we show how national categories map to the classification, displaying its potential to compare differences in national institutional settings across Europe.


Introduction
Higher Education Institutions (HEIs) have been described 'as different as chalk and cheese' (Huisman 2000), in terms of institutional mandate and mission, mix of activities (Huisman et al. 2015) and identity (Paradeise and Thoenig 2013). On the one hand, the blurring of borders between the so-called university and non-university sectors (Morphew and Huisman 2002) has made a representation of the structure of higher education increasingly difficult (Kyvik 2004). On the other hand, international rankings have promoted a unidimensional view along a vertical reputational hierarchy (Sauder and Espeland 2009), spreading the norm of the research-oriented university as the reference for policymakers and stakeholders (Lepori, Geuna, and Mira 2019).
Developing classifications of HEIs has been proposed as a way to represent HEI heterogeneity in a parsimonious way (Brint 2013) without reducing it to a single number (Moodie 2009). Unlike rankings, classifications focus on 'essential' differences between HEI groups by recognizing the diversity of institutional profiles (Jalote, Jain, and Sopory 2020) and are, therefore, a prerequisite for sensible rankings (Marginson and Van der Wende 2007). Classifications have also been used to control for (unobserved) heterogeneity in other analyses, such as tuition fees and faculty salaries (Shin and Toutkoushian 2011). By providing recognition to different institutional profiles, classifications are also important policy and communication tools (Borden and McCormick 2020) and lay the ground for the development of tailored policies, for example in performance-based funding, as dimensions of performance are systematically associated with institutional types (Ramsden 1999).
While in the US the Carnegie classification established a blueprint for a widely accepted classification (Brint 2013), HEI classifications in Europe have largely focused on institutional categories such as universities and colleges (Borden and McCormick 2020), which, however, are not comparable across countries even if similar labels are used. For instance, former UK polytechnics are now recognized as universities (Fulton 1996), while their counterparts in Germany, the Netherlands and Finland are still distinct. Furthermore, blurring of borders between categories has taken place, with non-university HEIs in some countries developing a sizeable research activity (Switzerland) and even acquiring the right to award the PhD (Norway, Ireland; Kyvik and Lepori 2010). Therefore, the value of classifications based (only) on such categories is increasingly questionable. Finally, existing classifications focus on the research vs. education distinction, while disregarding the so-called third mission (Gulbrandsen and Slipersaeter 2007) and differentiation along subject profiles between generalist and specialist HEIs (Van Vught 2009).
In this context, the goal of this paper is to develop a classification of European HEIs, which focuses on differences in activity profiles (education vs. research vs. third mission; Huisman et al. 2015) and subject scope (Van Vught 2009). We test the classification empirically on data by using latent class clustering (Muthén 2004; Vermunt and Magidson 2002) applied to more than 2000 HEIs in a large number of European countries, drawn from an enriched version of the European Tertiary Education Register (ETER; Lepori et al. 2015a).
Further, to understand its added value, we analyze the classification from three perspectives. First, we describe classes in terms of their characteristics and of their labelling (as expressed by their names). Second, we look at how classes are positioned in terms of activity profile and subject composition and how this differs from the distinction between research and educational HEIs, as well as from the image provided by international rankings. Third, we analyze how national categories map to the European-level classification and how this reveals structural differences between national systems.
Finally, we highlight the advances gained by the classification and how it could be mobilized for the analysis of higher education systems in Europe, and we discuss refinements and extensions.

Configurational approaches and classifications
Classifications are a basic tool for research and decision-making (Moodie 2009), which allow summarizing the diversity of objects in a number of categories (indicatively between five and ten) that fits the cognitive abilities of the human mind (Brint 2013).
Theoretically, classifications are grounded on the assumption that organizational characteristics are not randomly distributed, but there are interdependencies that generate a small number of configurations (Fiss 2007). Hence, the relationship between observable variables reveals underlying structures within organizations and in their interaction with the environment.
The literature has suggested different mechanisms leading to the selection of configurations (Ruef and Nag 2015), distinguishing between internal processes, such as organizational forms (Hannan and Freeman 1989) and identities (Albert and Whetten 1985), and interdependencies between the organization and the environment, such as resource niches (McPherson 1983). More recent approaches emphasize the importance of evaluation and categorization by audiences (Glynn and Navis 2013) and how classifications are powerful instruments to shape markets and define competitive relationships (Cattani, Porac, and Thomas 2017).
Two approaches to classifying can be distinguished (Meyer, Tsui, and Hinings 1993). On the one hand, 'ideal types' are derived a priori from conceptual reasoning as 'a unique combination of the organizational attributes that are believed to determine the relevant outcomes' (Doty, Glick, and Huber 1993). On the other hand, taxonomies are constructed inductively from data by grouping observations through statistical methods (Drazin and Van de Ven 1985). In practice, the two approaches need to be combined: a parsimonious classification requires a priori choices on the distinctive dimensions, which also match audiences' perceptions; however, its validity also depends on some correspondence with the observed data (Brint 2013).

Classifications in higher education and their functions
In higher education, the emergence of classifications is related to the expansion and differentiation in the post-war period, with the growth in the number of students and the integration within higher education of institutions that did not correspond to the traditional university model (Brint 2013), such as vocational institutions (Kyvik 2004; Moodie 2009).
This need was prominent in the US given the size of the system and the lower importance of legally defined categories. Based on the experience of the California Master Plan, the Carnegie classification was issued in 1972. To create order in US higher education, it made strong a priori choices by relying on the level of degrees awarded as the main classificatory criterion (McCormick and Zhao 2005), while purposefully disregarding other criteria such as institutional control (public vs. private).
The simplicity of the Carnegie classification accounts for its lasting success, but it also triggered research on more fine-grained classifications, for example concerning doctoral universities (Harmon et al. 2019). Some empirical classifications have also been proposed. Brint, Riddle, and Hanneman (2006) focused on resource dependency, status and adaptive capacity to classify 4-year colleges and doctoral universities using cluster analysis; their classification had a good fit with the aspiration sets of HEI presidents, but a limited fit with (politically driven) classifications such as the Carnegie. Similarly, Ruef and Nag (2015) focused on institutional control, the HEIs' resource niche and institutional mission; their classification provides for a fine-grained distinction of colleges, based on subjects taught, while doctoral universities are grouped together. The take-home message of these studies is the need to combine a priori approaches (to develop parsimonious classifications) with empirical data (to avoid arbitrary choices).
These works triggered the development of classifications in other world regions (see Moodie 2009), such as Australia (Ramsden 1999), Korea (Shin 2009) and India (Jalote, Jain, and Sopory 2020). They illustrate the value of classifications to analyze institutional performance, to provide meaningful representations of large systems and to identify 'research universities' competing in international rankings.
While some classifications have been proposed for individual countries, such as the UK (Tight 1996), there is little tradition of Europe-wide classifications. International comparisons focused on structural differences between national systems (Kyvik 2004), as well as on legally defined categories, such as the so-called binary divide between universities and professionally oriented HEIs (Huisman and Kaiser 2000). However, with European integration processes, HEIs are increasingly competing for students and resources in a European space (Maassen and Olsen 2007), in which national categories are less relevant, while research-based rankings do not reflect the diversity of European higher education (Marginson and Van der Wende 2007; Hazelkorn 2009).
The first attempt at classifying European HEIs was based on a conceptualization as multi-input and multi-output organizations and, therefore, on indicators measuring inputs (staff and revenues), and outputs in education (degrees), research (publications, PhD degrees) and third mission (patents). Using these data, Schubert et al. (2014) identified two clusters, composed of research and educational institutions (universities and colleges) and of smaller, mostly private, teaching-only institutions, respectively. Using ETER data, Lepori, Geuna, and Veglio (2017) also proposed a taxonomy based on the mix of activities (research vs. education) and subject scope (generalists vs. specialists). A multi-dimensional framework for profiling HEIs was also developed by the U-Multirank project around five dimensions (educational profile; student profile; research involvement; international orientation; regional engagement; Van Vught 2009). While no stable classification has emerged so far, these attempts converge on some relevant dimensions to classify European HEIs (Huisman et al. 2015).

Designing a classification of European HEIs
The literature review suggests a design process involving different steps and interactions (Figure 1).
The first step is understanding the processes, which ground the selection of a small number of configurations. Relying on the literature on organizational forms (Ruef 2000), we consider categorization by audiences as the core selection process (Hannan, Polos, and Carroll 2007), which is associated with cognitive limitations and the need to identify 'distinctive' categories (Glynn and Navis 2013). The recognition of legitimate categories is a powerful force to shape organizations in a field through isomorphic processes (DiMaggio and Powell 1983) and the willingness of audiences to provide resources (Cattani, Porac, and Thomas 2017) and, therefore, has practical effects on activities and resources as well.
Accordingly, the second step is the identification of the relevant distinctions as perceived by audiences, such as policy-makers, students, and scholars in the field. In that respect, if the goal is to provide a broad classification of HEIs, identified as institutions delivering at least a bachelor degree, 1 the literature suggests two dimensions around which to construct a classification (Van Vught 2009;Huisman et al. 2015).
The first dimension refers to the distinctive activity for which the HEI is recognized. While most HEIs are multifunction, some HEIs are recognized as 'research universities' (Mohrman, Ma, and Baker 2008), others as 'educational institutions' and others as focused on third mission and societal engagement (the 'entrepreneurial university'; Etzkowitz 2004). Membership of these categories might be associated with institutional position (such as the right of awarding the PhD), labelling ('excellence university') or research reputation as expressed in international rankings (Hazelkorn 2009). The divide between research and educational HEIs is also at the core of European higher education but, lacking a clear delineation, it has been largely equated with a distinction between 'universities' and 'colleges' that no longer fits empirical reality (Huisman and Kaiser 2000).
The second dimension refers to the subject domains covered by HEIs' activities (Clark 1995). Empirical studies show that almost all HEI characteristics are dependent on their subject composition, including costs per student (Johnes, Johnes, and Thanassoulis 2008), efficiency (Sarrico et al. 2009) and scientific production (Piro, Aksnes, and Rørstad 2013). Subject composition is also relevant in terms of identity: European higher education has a long tradition of HEIs identified by their subject, such as Technical Universities, Art Schools, Business Schools (Lepori, Baschung, and Probst 2010).
The third step is the translation of these distinctions in terms of HEIs' observable characteristics, which are considered as markers of deeper cognitive dimensions. While coherence with the theoretical framework is important, data should also be available for most of the HEIs to be classified, the characteristic should be measurable and, finally, continuous measures are preferred as they allow observing partial membership to classes (Ruef and Nag 2015).
The fourth step is a statistical analysis to identify classes as characterized by similar observed characteristics and to attribute HEIs to classes.
The fifth step is the interpretation of the empirically derived classes as distinct and identifiable in a meaningful way. To this aim, we resort to statistical analysis to identify distinctive characteristics of classes (for example, through discriminant analysis) and we look at the HEIs' names and institutional categories, as they are informative on how HEIs are recognized. This step might reveal that some classes are not well defined or that some cases are not correctly classified, triggering refinement processes in terms of indicators and model selection until the result is deemed satisfactory.

Data
The analysis is based on a version of the European Tertiary Education Register (ETER) provided by the RISIS project, which has been enriched with data on scientific publications from the Web of Science version at the University of Leiden, on European projects from the EUPRO database at the Austrian Institute of Technology and on patents from the PATSTAT version at IFRIS in Paris. The 2014 edition includes 2830 HEIs in 37 European countries and provides an extensive coverage of HEIs delivering at least a bachelor degree (Lepori et al. 2015a). Due to missing data, our final sample includes 2034 observations in 28 European countries. Missing cases are mostly by country and, therefore, do not affect the coverage of national systems with the exception of France, where only HEIs under the responsibility of the education ministry are covered.
While ETER has limitations in terms of the type of data available and of comparability, it still represents a unique source on European HEIs by the breadth of the coverage and the effort to achieve standardization of data. It has been extensively used for studies of European higher education (see, for example, Seeber, Meoli, and Cattaneo 2020). Most ETER variables also follow international statistical definitions (UOE 2013). In that respect, it responds best to the needs of a broad classification of European HEIs, which compares with some US classifications such as the Carnegie (Lepori, Geuna, and Mira 2019). We further discuss gaps and how they might be filled in the conclusion section.

Variables
The selection of variables draws on the identified dimensions, as well as on the indicators proposed by the literature (see particularly Huisman et al. 2015) and adopted in previous classifications. At the same time, we take into account issues of data availability and measurability.
First, we select indicators for the three distinctive activities, i.e. research, education and third mission. We resort to intensity indicators in order to avoid a too strong dependency on organizational size.
For research, we include three indicators, i.e. the number of scientific publications in the Web of Science (based on the definitions adopted for the Leiden ranking), the number of PhD students and the number of European framework program projects, all normalized by academic staff. The first indicator closely corresponds to international visibility in international rankings, the second provides for a broader measure of research activity (particularly relevant in social sciences and humanities), while European projects also include applied research activities (Lepori et al. 2015b).
For education, we resort to the number of students at the diploma, bachelor and master level (levels 5-7 of the International Standard Classification of Education), normalized by academic staff. To characterize the educational profile, we include the share of master students among total students, which allows distinguishing HEIs focused on bachelor education (Van Vught 2009).
We use the number of priority patents normalized by academic staff as an indicator of third-mission activities. Admittedly, this is a narrow and technology-oriented measure; unfortunately, other measures, such as private funds acquired, are not available for most of the sample. Data refer to counts for the period 2010-2014 to reduce volatility.
Second, we include indicators to characterize the subject domains covered by HEIs, based on the breakdown of student numbers by the EUROSTAT Fields of Education and Training (ISCED-F 2013). To distinguish between generalist and specialist HEIs, we compute the concentration of students by domain, while we include indicators on the relative specialization in social sciences and humanities, respectively in natural sciences and engineering.
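The two subject-scope indicators can be illustrated with a minimal sketch. It assumes the concentration index is the Herfindahl sum of squared student shares across subject fields and that relative specialization divides each HEI's share by the unweighted mean share in the sample; the function names and the toy figures are hypothetical and are not taken from ETER.

```python
def herfindahl(students_by_field):
    """Sum of squared shares of students across subject fields."""
    total = sum(students_by_field)
    if total == 0:
        raise ValueError("HEI has no students")
    return sum((n / total) ** 2 for n in students_by_field)


def relative_specialization(shares):
    """Each HEI's share in a field group divided by the sample mean share.

    Assumption: 'average share in the whole sample' is read here as the
    unweighted mean over HEIs; a student-weighted average is also plausible.
    """
    mean_share = sum(shares) / len(shares)
    return [s / mean_share for s in shares]


# Hypothetical HEIs with student counts in ten subject fields.
specialist = [500, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # all students in one field
generalist = [100] * 10                          # students spread evenly

print(herfindahl(specialist))   # upper bound of the index
print(herfindahl(generalist))   # close to the lower bound with ten fields

# Hypothetical shares of students in social sciences and humanities.
print(relative_specialization([0.2, 0.4, 0.6]))
```

Values above 1 in the second indicator mark HEIs that are more specialized in the field group than the sample average, values below 1 mark the opposite.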
The scale and scope of HEIs' activities is strongly influenced by size and we expect different organizational structures and strategies for large vs. small organizations (Koshal and Koshal 1999). Furthermore, size is associated with reputation and market position (Lepori, Geuna, and Mira 2019) and with legitimacy and willingness of audiences to provide resources. Hence, we include the number of academic staff in Full-Time Equivalents as a measure of organizational size; it includes permanent staff such as professors, as well as researchers and doctoral students to the extent they are hired by the HEI. Staff is a more comparable measure of size than revenues and has a better availability in ETER. 2 Finally, we include two variables characterizing the institutional position of HEIs (Ruef and Nag 2015): institutional control (public vs. private) 3 and the research mandate, measured through the right of awarding the PhD (Table 1).
In the ex-post analysis, we use revenues per student (in euros Purchasing Power Parities) to characterize HEI's position in the resource space (Jongbloed and Lepori 2015). For interpreting classes, we further mobilize descriptive information provided by ETER, specifically the HEI name in the national language and the (official) English translation, the foundation year, the national institutional category.

Table 1
Dimension | Variable | Definition
Subject scope | Subject concentration | The index is computed as the sum of the squares of the share of bachelor and master students in each of the ten subject fields of educational statistics (Herfindahl concentration index). It ranges between 1 (all students in a single field) and 0.1 (students equally distributed between fields).
Subject scope | Relative specialization in social sciences and humanities | Share of bachelor and master students in the corresponding fields normalized by the average share in the whole sample.
Subject scope | Relative specialization in natural sciences and engineering | Share of bachelor and master students in the corresponding fields normalized by the average share in the whole sample.
Resources | Academic staff in Full-Time Equivalent | Educational and research personnel, including PhD students.
Institutional variables | Institutional control | Dummy, 0 if the institution is under public control or is mostly financed by the state, 1 if it is private and mostly funded by private sources.
Institutional variables | Research mandate | Dummy, 1 if the HEI has the legal right to award the PhD, 0 otherwise.
All variables refer to the year 2014 when not otherwise remarked.

Methods
To attribute HEIs to classes, we use latent class clustering (Muthén 2004). The model fits the distribution of a set of observed variables conditional on the observations belonging to non-observed (latent) classes; compared with other clustering methods, latent class clustering is more flexible, as it can incorporate assumptions on distributions (Vermunt and Magidson 2002). More precisely, given a sample of HEIs, the model represents the observed characteristics y (the set of observed variables) as a mixture of distributions conditional on the probability of belonging to a latent class. Our model assumes Gaussian distributions (conditional on HEIs belonging to classes); accordingly, we log-transform academic staff and square-root-transform all intensity variables to reduce skewness. The model parameters are class-specific means and variances of distributions, as well as (class-specific) covariances between variables.
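For orientation, a finite Gaussian mixture of the kind described in this paragraph is commonly written as below. The notation is an assumption made here for illustration and is not taken from the source: $K$ is the number of latent classes, $\pi_k$ the probability of belonging to class $k$, and $\mu_k$, $\Sigma_k$ the class-specific means and covariance matrix of the observed variables $y_i$ of HEI $i$.

```latex
f(y_i \mid \theta) \;=\; \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(y_i \mid \mu_k, \Sigma_k\right),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1 .
```

With a diagonal $\Sigma_k$, the observed variables are independent within each class, which corresponds to the conditional independence assumption of a diagonal covariance structure.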
The probability of belonging to a class is contingent on two exogenous variables, where u_i is a normally distributed random variable. Accordingly, it is assumed these variables affect the probability of an HEI belonging to a class.
The model computes the distribution functions for each variable and the posterior probability for each HEI to belong to a class and searches iteratively for the solution, which best represents the observed data, maximizing the model fit. Given that the likelihood function might have local maxima, models have been run with a set of 100 initial conditions to ensure robustness. To interpret results and to analyze characteristics of classes, each case is finally assigned to the class with the highest probability.
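The posterior probabilities and the final assignment step can be sketched as follows. This is only an illustration under stated assumptions: it takes the class proportions, means and variances as given (it does not perform the iterative estimation), assumes Gaussian distributions with diagonal covariance, and uses hypothetical function names and toy parameter values.

```python
import math


def diag_gaussian_logpdf(y, means, variances):
    """Log density of y under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(y, means, variances)
    )


def posterior_probabilities(y, weights, class_means, class_variances):
    """Posterior probability of each class for one observation (Bayes' rule)."""
    log_joint = [
        math.log(w) + diag_gaussian_logpdf(y, m, v)
        for w, m, v in zip(weights, class_means, class_variances)
    ]
    peak = max(log_joint)  # subtract the maximum for numerical stability
    unnorm = [math.exp(value - peak) for value in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def modal_class(posteriors):
    """Index of the class with the highest posterior probability."""
    return max(range(len(posteriors)), key=lambda k: posteriors[k])


# Hypothetical two-class model on two transformed indicators.
weights = [0.5, 0.5]
class_means = [[0.0, 0.0], [3.0, 3.0]]
class_variances = [[1.0, 1.0], [1.0, 1.0]]

post = posterior_probabilities([2.8, 3.1], weights, class_means, class_variances)
print(post, modal_class(post))
```

In a full estimation, these posterior probabilities would be recomputed at each iteration and the parameters updated until the likelihood stops improving, with the whole procedure repeated from several starting values.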

Model selection and diagnostics
Model selection is a core issue in Latent Class Models. While a number of fit statistics are available (Nylund, Asparouhov, and Muthén 2007), the final model selection must also be based on ex-post diagnostics and on the substantive interpretability of the classes.
We followed the strategy suggested by Masyn (2013): first, testing a broad range of models and selecting the best candidates based on fit statistics; second, comparing the selected models in terms of fit and of interpretability of classes; third, analyzing the final model in terms of the homogeneity of classes and their separation, as well as of the substantive interpretation of classes.
Our baseline model includes a class-varying diagonal covariance matrix, as we expect that variances differ by class and indicators; this model requires that the observed variables are conditionally independent given class membership and is a reasonable compromise in terms of complexity. The analysis shows that the model fit improves up to 6 classes, then further slightly improves for the 8- and 9-class models. 4 As alternative models, we allow free covariance for the variables that are highly correlated, i.e. the three research variables and the three subject mix variables. 5 However, the model fit does not improve as compared with the diagonal covariance structure, i.e. the additional complexity does not pay off in terms of fitting the data.
The two most promising model candidates, i.e. the 6- and 8-class models, have been replicated with a set of different starting conditions. This comparison showed that the (best) 8-class model provides a more consistent delineation, with two exceptions. First, one class, including only 32 HEIs, is very heterogeneous, including some distance universities, graduate schools and specialized HEIs. We interpreted this class as a residual of HEIs that cannot be classified and we dropped it from the analysis. Second, two classes display large overlap; accordingly, we decided to merge them into a single class, leading to a final classification comprising six classes.
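How a fit statistic trades off fit against complexity can be sketched with the Bayesian Information Criterion. The choice of BIC is an assumption made here for illustration, since the text does not name the statistics used; the parameter count ignores the covariates on class membership, and the log-likelihood values are invented toy numbers, not results from the paper.

```python
import math


def n_params_diagonal(n_classes, n_indicators):
    """Free parameters of a Gaussian mixture with class-varying diagonal covariance."""
    means = n_classes * n_indicators
    variances = n_classes * n_indicators
    proportions = n_classes - 1  # class proportions sum to one
    return means + variances + proportions


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; lower values indicate a better trade-off."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Hypothetical log-likelihoods for models with 2, 3 and 4 classes,
# fitted on 500 observations and 4 indicators (toy values).
toy_log_likelihoods = {2: -1500.0, 3: -1420.0, 4: -1415.0}
scores = {
    k: bic(ll, n_params_diagonal(k, 4), 500)
    for k, ll in toy_log_likelihoods.items()
}
best = min(scores, key=scores.get)
print(scores, best)
```

In this toy case the gain in likelihood from a fourth class does not compensate for the additional parameters, which mirrors the reasoning in the text that additional complexity should pay off in terms of fitting the data.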
As a measure of classification uncertainty, we compute the average class posterior probability p_ik over all cases classified in class k (Masyn 2013). In the final model, this statistic is above 0.9 for all classes, while for only 25 HEIs the highest class probability is below 0.70. Accordingly, almost all HEIs are classified unambiguously in a single class.
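The classification-uncertainty diagnostic can be sketched as below, assuming a matrix of posterior probabilities with one row per HEI and one column per class. The function names and the toy matrix are hypothetical; the 0.70 threshold is the one mentioned in the text.

```python
def average_posterior_by_class(posteriors):
    """Mean posterior probability of the assigned class, per class."""
    sums, counts = {}, {}
    for row in posteriors:
        k = max(range(len(row)), key=lambda j: row[j])  # modal assignment
        sums[k] = sums.get(k, 0.0) + row[k]
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


def ambiguous_cases(posteriors, threshold=0.70):
    """Indices of cases whose highest class probability is below the threshold."""
    return [i for i, row in enumerate(posteriors) if max(row) < threshold]


# Hypothetical posterior probabilities for five HEIs and three classes.
toy_posteriors = [
    [0.95, 0.03, 0.02],
    [0.90, 0.05, 0.05],
    [0.10, 0.85, 0.05],
    [0.40, 0.35, 0.25],
    [0.02, 0.03, 0.95],
]

print(average_posterior_by_class(toy_posteriors))
print(ambiguous_cases(toy_posteriors))
```

A single poorly classified case, such as the fourth row here, can pull down the class average noticeably in small classes, which is why the count of cases below the threshold is a useful complement.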
To assess class homogeneity and distinctiveness, we also compute averages and standard deviation of characterizing variables by classes and we use boxplots to investigate the overlap between classes.

Describing classes and their labelling
As Table 2 shows, the final classification is well balanced in terms of the number of HEIs by class and there are systematic differences in the characteristics of each class. Three classes are essentially composed of PhD-awarding HEIs and two of non-PhD-awarding HEIs, while class 4 is mixed. In terms of research, classes 1 and 2 display high scores for all indicators, while class 6 has no research output, the remaining classes displaying different orientations in terms of research and transfer. Consistently with our framework, the subject scope is the second dimension of distinction: two classes (1 & 4) are composed of generalist HEIs, two (5 & 6) are characterized by an orientation towards social sciences and humanities, and two (2 & 3) by an orientation towards natural and technical sciences.
Class 1 (Research universities) is composed of PhD-awarding HEIs that have a large research output and cover most subject fields. This class includes the European top-ranked international universities, such as Cambridge and Oxford, as well as the largest universities in terms of enrolments (Rome, Madrid) and middle-size universities with a sizeable research output (Basel, Twente). This class enrolls 4 out of 10 bachelor and master students in European higher education and could, therefore, be best described as HEIs corresponding to the Humboldtian model of research and educational universities, similarly to their US counterparts; the high research intensity is, however, the distinctive characteristic of this class; with this understanding, we keep the research university label, which is well established in the literature (Clark 1995). In terms of names, HEIs are consistently named as 'university' without any qualification except the location; only 11 HEIs include 'science' or 'technology' in their name, while three institutes of technology in Ireland (Cork, Waterford and Dublin, amalgamated in TU Dublin in 2019) also belong to this class. Therefore, this class can be consistently characterized with the label 'research university' in terms of characteristics and public recognition.
Class 2 (Science and technology-oriented HEIs) is composed of PhD-awarding HEIs oriented towards natural and technical sciences, such as the Polytechnic of Milan, and to a lesser extent medical sciences, such as Karolinska. These HEIs have a similar research intensity as class 1, but a much larger patent intensity, as associated with their subject specialization. While originally created as technical universities (as indicated by their names), they expanded their subject scope to most natural sciences and, in some cases, also to (bio)medicine, like ETH Zurich and TU Munich. The combination of higher research intensity and subject specialization is associated with the highest level of resources per student. Over 80% of HEIs in this class bear a name associated with their specialization (mostly 'technology'), about 60% are labelled as 'universities', while others bear names such as Institute of Technology, Polytechnic, etc.
Note to Table 2: Median and half of the interquartile range (IQR/2). IQR is the distance between 1st and 3rd quartile and is a measure of the variation of characteristics around the median.

STUDIES IN HIGHER EDUCATION

Class 3 (Applied sciences HEIs) is composed of HEIs without the right to award a PhD with an orientation towards natural and technical sciences, as reflected by the share of students in these domains. These HEIs do not bear the word 'university' in their national language name. The main groups are German (69 HEIs), Austrian (8) and Swiss Fachhochschulen (7), as well as Portuguese Polytechnics (10). While research output is low, this class is characterized by a sizeable patenting activity, suggesting HEIs are engaged in applied R&D. It, therefore, reflects a country-specific orientation of the non-university sector towards applied sciences (Jongbloed, Enders, and Salerno 2008).
Class 4 (Generalist HEIs) is composed of middle-size HEIs, which are multidisciplinary, but enrolling most of the students in social sciences and humanities. Research intensity is lower than in classes 1 & 2 and highly variable. 2/3 of these HEIs bear in some form the name 'university', while very few are qualified by their subject. Within this class, we identify regional universities (Messina and Macerata in Italy, Klagenfurt in Austria), SSH oriented universities in large agglomerations such as Paris (Pantheon-Assas University), UK 'new universities' and HEIs recently acquiring university status in Nordic countries (Mid Sweden University), as well as Universities of Applied Sciences in Finland, Germany and Norway. As compared with groups 1 & 2, these can be characterized as educational providers with some research activity, largely as newcomers in terms of age (three-quarters have been founded after WWII) and, frequently, peripheral in geographical terms. This class cuts across the traditional distinction between universities (PhD awarding) and Universities of Applied Sciences and, therefore, identifies the specific cases where boundaries between these two institutional types are becoming blurred.
Class 5 (SSH specialized HEIs) is composed of small and specialized institutions in social sciences and humanities, such as academies of arts and music, with a high intensity in PhD education. Only about one-third are named 'university', other frequent names being 'academy' and 'high school'; the most distinguishing feature of their name is the highly specific subject label, such as 'music', 'arts', 'theology', 'education'. These are niche players, frequently with a strong reputation in their domain, including some of the oldest schools of arts in Europe.
Class 6 (Educational HEIs) includes non-PhD awarding institutions, whose distinguishing feature is to have no research and technology output. Half of the HEIs have been founded after 1995 and half of them are private. Many HEIs are specialized, including teacher education institutions, music colleges, colleges of economics and of public administration, but this group also includes some larger multidisciplinary HEIs, such as the Dutch Hogescholen. We characterize them as educational providers, most of them specialized in specific niches.
All in all, this analysis shows that classes are different in terms of their characteristics and are also identified by the specificity of their names, indicating that they are also recognized by audiences as distinct.

European higher education as represented by classes
To analyze the positioning of classes, we resort to discriminant analysis, a statistical technique to identify which combinations of dimensions distinguish groups of cases (McLachlan 2004). Consistent with our assumptions, the analysis identifies two main factors, whose loadings allow for a simple interpretation: factor 1 (75% of the variance) is associated with research intensity (as well as HEI size), whereas factor 2 (14% of the variance) is associated with the orientation towards natural sciences and technological production vs towards social sciences and humanities. Figure 2 shows that, while there is some overlap, the six classes occupy distinct positions in the configuration space. First, two groups, 'research universities' and 'educational HEIs', can be singled out. However, an intermediate group of HEIs oriented towards education, but having some research activity, can also be identified, which represents the main area of blurring between 'universities' (mostly the new ones founded after WWII) and HEIs founded with an educational mandate that have developed some research activity ('Universities of Applied Sciences'). With more than one-quarter of the students at the bachelor and master level, this class is a significant presence in European higher education.
Second, the classification highlights the importance of the second dimension, i.e. the subject domain covered by the HEI. We have identified three groups of specialized HEIs: science and technology-oriented research-intensive HEIs (a broader scope than technical sciences), applied sciences HEIs and, finally, specialized schools in arts, music and theology, which award the PhD and, frequently, enjoy a high status in their field.
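To illustrate how discriminant analysis finds the combination of dimensions that best separates groups, the following is a minimal sketch of Fisher's linear discriminant for two classes and two variables. It is a deliberate simplification of the multi-class analysis reported here. All function names and data points are hypothetical and are not taken from ETER; the two variables merely stand in for research intensity and the share of students in natural and technical sciences.

```python
# Illustrative two-class Fisher discriminant in pure Python.
# ASSUMPTION: all data below are hypothetical, not values from ETER,
# and the paper's own analysis covers six classes, not two.

def mean_vector(rows):
    """Column-wise mean of a list of 2-tuples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter_matrix(rows, mean):
    """Sum of outer products of deviations from the class mean (2x2)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    """Direction w = Sw^-1 (mean_a - mean_b) that best separates the classes."""
    m_a, m_b = mean_vector(class_a), mean_vector(class_b)
    s_a, s_b = scatter_matrix(class_a, m_a), scatter_matrix(class_b, m_b)
    # Pooled within-class scatter matrix.
    sw = [[s_a[i][j] + s_b[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m_a[0] - m_b[0], m_a[1] - m_b[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(row, w):
    """Discriminant score of one case along direction w."""
    return row[0] * w[0] + row[1] * w[1]

# Hypothetical HEIs: (research intensity, share of students in natural/technical sciences)
research_like = [(0.9, 0.5), (0.8, 0.4), (0.85, 0.6), (0.95, 0.45)]
educational_like = [(0.1, 0.5), (0.2, 0.4), (0.15, 0.6), (0.05, 0.45)]

w = fisher_direction(research_like, educational_like)
scores_a = [project(r, w) for r in research_like]
scores_b = [project(r, w) for r in educational_like]
```

In this toy example the two groups differ only in the first variable, so the discriminant direction loads almost entirely on it, which mirrors how a factor can be interpreted through its loadings.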
As expected, research universities and science and technology-oriented HEIs account for the lion's share of scientific publications; science and technology-oriented HEIs alone account for half of all patents filed by HEIs in Europe. Accounting for more than 40% of the students enrolled, the role of research universities is highly relevant also in education (Figure 3). However, 40% of master students and more than half of the bachelor students are enrolled in classes that are not characterized by their research orientation. Accordingly, the statement that there is no differentiation between research and educational-oriented HEIs is an outcome of considering HEIs labelled as universities as a synonym of research universities; the more selective view produced by our classification shows that, also in Europe, bachelor and master education takes place largely outside research universities.
In terms of resources, research universities indeed receive more funding per student, but, in the aggregate, the distribution broadly follows student numbers: while enrolling 41% of bachelor and 47% of master students, research universities account for 51% of academic staff and 54% of revenues. This confirms that the main difference with the US is the lack of concentration of resources in research universities independent from students (Lepori, Geuna, and Mira 2019).
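The claim that resources broadly follow student numbers can be checked with simple arithmetic on the shares quoted above. The sketch below divides each resource share by each student share; the 'concentration ratio' is an illustrative measure introduced here, not an indicator used in the paper, and a value of 1.0 would mean strictly proportional allocation.

```python
# Shares of research universities as reported in the text (percent of all HEIs).
shares = {
    "bachelor_students": 41,
    "master_students": 47,
    "academic_staff": 51,
    "revenues": 54,
}

def concentration_ratio(resource_share, student_share):
    """Resource share divided by student share (hypothetical helper).

    1.0 means resources are exactly proportional to students.
    """
    return resource_share / student_share

revenue_vs_bachelor = concentration_ratio(shares["revenues"], shares["bachelor_students"])
revenue_vs_master = concentration_ratio(shares["revenues"], shares["master_students"])
staff_vs_bachelor = concentration_ratio(shares["academic_staff"], shares["bachelor_students"])
staff_vs_master = concentration_ratio(shares["academic_staff"], shares["master_students"])

for name, value in [
    ("revenues / bachelor students", revenue_vs_bachelor),
    ("revenues / master students", revenue_vs_master),
    ("staff / bachelor students", staff_vs_bachelor),
    ("staff / master students", staff_vs_master),
]:
    print(f"{name}: {value:.2f}")
```

All four ratios fall between roughly 1.1 and 1.3, which is consistent with research universities receiving somewhat more resources per student without a strong concentration.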
Finally, it is enlightening to compare with the US Carnegie classification and with the ARWU Shanghai ranking. Most HEIs in the 'research intensive' and 'science and technology-oriented' classes satisfy the Carnegie criterion for 'doctoral universities', i.e. at least 20 PhD degrees in the reference year (Table 3), but the same also applies to half of the generalist HEIs. Indeed, the Carnegie delineation of doctoral universities is broad and attempts have been made to single out a smaller class of universities with high research intensity (Harmon et al. 2019).
Our classification, therefore, responds to the quest for a more restrictive delineation of research universities, while still including most HEIs competing in international rankings. As shown by Table 3, the research universities class includes 140 out of 183 HEIs in the top-500 of the ARWU ranking and 22 out of 29 in the top-100. Noticeably, few science and technology-oriented HEIs, despite their large research output, are present in the ranking, most likely because of the absence of (bio)medical sciences.
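The coverage of the ARWU ranking by the research universities class follows directly from the counts quoted above. A minimal worked calculation, using only those four numbers (the function name is a hypothetical helper):

```python
def coverage_percent(in_class, in_ranking):
    """Percentage of ranked European HEIs that fall in the research universities class."""
    return 100.0 * in_class / in_ranking

# Counts as reported in the text (Table 3).
top500 = coverage_percent(140, 183)
top100 = coverage_percent(22, 29)

print(f"top-500 coverage: {top500:.1f}%")
print(f"top-100 coverage: {top100:.1f}%")
```

Both shares are close to three-quarters, so the class captures a similar proportion of ranked HEIs at both cut-offs.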

Classes and national categories
A rationale for a data-driven classification of European HEIs is the blurring between categories such as university and non-university sectors and the lack of comparability of national categories due to different system structures (Kyvik 2004). Accordingly, it is enlightening to compare the classification with the national HEI categories as reported in ETER.
The first level of comparison is provided by the distinction between 'universities', 'universities of applied sciences' (UAS) and 'other HEIs' provided by ETER and based on the traditional notion of a binary divide in higher education (Huisman and Kaiser 2000), as recognized by national legally defined categories.
Our classification confirms that this distinction has become questionable; while four classes still correspond to the divide between 'university' and 'non-university' sectors, two classes mix the two sectors (see Table 4). Specifically, the generalist HEI class cuts across the typological distinction between universities and Universities of Applied Sciences. A country-level analysis also reveals different levels of blurring: in Norway, more than half of the UAS (University Colleges) are classified among the generalist HEIs, while in the Netherlands UAS (Hogescholen) are mostly among educational HEIs. By providing a common baseline, the classification therefore allows national systems to be compared in terms of the extent of blurring between sectors. A more fine-grained analysis based on national categories, as provided by education ministries, reveals other interesting patterns (see Table 5). First, research universities (class 1) are consistently categorized as universities in all countries (both in national language and in English). However, the university category also characterizes generalist HEIs (class 4) in Italy, Spain and the UK; in Germany the categorization is mixed (half universities, half Fachhochschulen), whereas in Finland and in Sweden these HEIs belong to other categories (Ammattikorkeakoulu and Högskola, respectively). HEIs categorized at national level as universities are present also in other classes, particularly the specialist ones. In other words, some countries, such as Spain and the UK, have a broader extension of the 'university' category than others, such as Germany.
Second, there is variation in national categorization of science and technology-oriented HEIs, some of them being categorized as 'universities', others as 'technical universities', 'polytechnics' (Italy, Switzerland), as well as 'engineering schools' like in France. The extent to which science and technology-oriented HEIs are considered as universities or are set in a specific category, therefore, differs. The same mix of national categories is found in the specialized HEIs' class, which are categorized as universities in Bulgaria, art schools (Kunstfachhochschulen) in Germany and University colleges in the UK.
Third, to add to the terminological ambiguity, the English translation frequently departs from national categories, in most cases by adding the word 'university' to the category; this applies, for example, to pedagogical schools in Germany and in Switzerland, which are named 'Universities of Teacher Education' in English, but are consistently classified as educational HEIs in our classification.
In other words, while national classifications are relevant to understand the history of national higher education systems and the struggles for recognition, they can hardly be used to construct a coherent classification at the European level given their incomparability and ambiguity (particularly between national language and English categories).
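The comparison between national categories and classes made in this section amounts to a cross-tabulation by country. The following sketch shows one way to compute, for a given country and national category, the share of HEIs that fall in a given class. The records are entirely hypothetical placeholders, not ETER data, and the function names are illustrative.

```python
from collections import Counter, defaultdict

# ASSUMPTION: hypothetical records, NOT ETER data.
# Each record is (country, national category, class in the classification).
records = [
    ("NO", "UAS", "generalist"),
    ("NO", "UAS", "generalist"),
    ("NO", "UAS", "educational"),
    ("NL", "UAS", "educational"),
    ("NL", "UAS", "educational"),
    ("NL", "UAS", "generalist"),
    ("NL", "university", "research"),
]

def crosstab(rows):
    """Count HEIs per class within each (country, national category) pair."""
    table = defaultdict(Counter)
    for country, category, cls in rows:
        table[(country, category)][cls] += 1
    return table

def share_in_class(table, country, category, cls):
    """Share of HEIs of a national category that fall in a given class."""
    counts = table[(country, category)]
    total = sum(counts.values())
    return counts[cls] / total if total else 0.0

table = crosstab(records)
no_share = share_in_class(table, "NO", "UAS", "generalist")
nl_share = share_in_class(table, "NL", "UAS", "generalist")
print(f"NO UAS in generalist class: {no_share:.2f}")
print(f"NL UAS in generalist class: {nl_share:.2f}")
```

Comparing such shares across countries gives a simple, common measure of how far a national category extends into a given class.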

Conclusion
In this paper, we have developed a comprehensive and cross-country classification of European HEIs, which is based on the actual HEI characteristics, rather than on distinctions such as between 'universities' and 'colleges' or on national categories, which are hardly comparable. The classification follows the insight of the Carnegie classification for a priori choices on the relevant distinctions (McCormick and Zhao 2005), but validates them using empirical data (Brint 2013). Moreover, the framework takes into account the importance of the distinction between generalist and specialist HEIs, which has a long tradition in the European context. With six classes, the classification provides for a reasonable balance between parsimony and detail. Ex-post analysis shows that classes can be described in terms of their characteristics, but also of the names borne by HEIs, and can be labelled consistently. Hence, the classification responds to a first important criterion, i.e. to be narrable in a meaningful way. Moreover, the classification provides for a delineation of 'research universities', which is more selective than the one by the Carnegie classification, while still comprising most European HEIs that feature in international rankings.
Further, the empirical analysis shows that the classification represents European higher education in terms of two distinctions, i.e. the one between research and education and the one concerning HEIs' subject composition. Our results move beyond the traditional divide between (research-oriented) 'universities' and educational HEIs to a more fine-grained understanding. A group of research-intensive universities, largely present in international rankings, can still be identified alongside a large number of educational HEIs with almost no research activity. However, we were also able to identify a large class of generalist HEIs with some research activity that cuts across the traditional distinction between universities and Universities of Applied Sciences. Its significance is underlined by the fact that this class enrolls more than one-quarter of all students at the bachelor and master levels. While moving beyond binary distinctions, these results avoid the notion that everything has become blurred, by identifying a specific area of overlap between research and educational HEIs. As for the subject composition, the classification singles out three groups of specialist HEIs with different characteristics in terms of their subject, but also of activities and, therefore, allows for a more fine-grained understanding of the role of specialized HEIs in European higher education (and across countries).
Finally, we have shown that the classification relates in a meaningful, but complex way to nationally defined categories. Some cross-country regularities emerge, particularly concerning the categorization of research universities, but also different extensions of the 'university' category beyond research universities and to specialist HEIs. Therefore, contrasting national categorizations with the classification sheds light on national specificities.
These remarks highlight the potential for future uses of the classification. On the one hand, it would allow for comparisons of more homogeneous groups of HEIs, either in descriptive analysis or as dummy variables in statistical analyses of HEI characteristics and efficiency; this is a superior approach to restricting the analysis to HEIs labelled as 'universities' or awarding the PhD. On the other hand, it becomes possible to compare national systems not only on legally defined categories, but also by comparing groups of HEIs that are similar in their characteristics; differences in categorization would then reveal variation in higher education structures across countries. Finally, providing an understandable representation of European higher education that recognizes the diversity of functions and specializations is an important communicative and political task to which such a classification might contribute.
Our results also highlight a number of directions for future work. On the one hand, there are notable gaps in the indicators at the European level, the most important ones being a broader coverage of the third mission (beyond technology production) and the inclusion of internationalization indicators (for both research and education). As the example of the Carnegie classification shows, the challenge in this respect will be to add dimensions, while keeping the original simplicity of the classification. On the other hand, since the theoretical grounding of the classification lies in audiences' representations, it would be interesting to collect these representations directly, as already done in the US (Brint 2013). Finally, there is room for more fine-grained classification of some classes, particularly research universities and specialist HEIs.