Ethnic discrimination in hiring decisions: a meta-analysis of correspondence tests 1990–2015

ABSTRACT For almost 50 years, field experiments have been used to study ethnic and racial discrimination in hiring decisions, consistently reporting high rates of discrimination against minority applicants, including immigrants, irrespective of time, location, or minority groups tested. While Peter A. Riach and Judith Rich [2002. “Field Experiments of Discrimination in the Market Place.” The Economic Journal 112 (483): F480–F518] and Judith Rich [2014. “What Do Field Experiments of Discrimination in Markets Tell Us? A Meta Analysis of Studies Conducted since 2000.” In Discussion Paper Series. Bonn: IZA] provide systematic reviews of existing field experiments, no study has yet undertaken a meta-analysis of the findings these experiments report. In this article, we present a meta-analysis of 738 correspondence tests in 43 separate studies conducted in OECD countries between 1990 and 2015. In addition to summarising research findings, we focus on groups of specific tests to ascertain the robustness of findings, emphasising differences across countries, gender, and economic contexts. Moreover, we examine patterns of discrimination by drawing on the fact that the groups considered in correspondence tests and the contexts of testing vary to some extent. We focus on first- and second-generation immigrants, differences between specific minority groups, the implementation of EU directives, and the length of job application packs.


Introduction
Whenever members of one minority group are less likely to obtain paid work, or do so under unfavourable conditions, some people are quick to shout 'discrimination'. Social scientists tend to be more cautious and highlight that there are many reasons other than discrimination why one group may be more likely than others to obtain paid work (Pager 2007). To rule out these alternative explanations, field experiments were devised in the UK in the 1960s, allowing researchers to draw inferences about racial discrimination in hiring decisions (Daniel 1968; Jowell and Prescott-Clarke 1970). Fifty years after the first British Race Relations Act of 1965, which prohibited racial discrimination in public places, interest in discrimination and hiring decisions remains high. Indeed, in recent years numerous studies using field experiments have been carried out to test whether discrimination in terms of race, ethnicity, immigration background, or other minority statuses remains a problem (Riach and Rich 2002; Bendick 2007; Pager 2007; Pager and Shepherd 2008; Rich 2010, 2014). Field experiments, using either in-person audit tests or written correspondence tests, offer strong evidence of discriminatory behaviour in the labour market. Since discrimination in hiring decisions usually cannot be observed directly, researchers resort to fictitious candidates with equivalent and thus interchangeable qualifications. One employer is presented with two substantially identical job applications; the only difference is the characteristic of interest, namely the ethnic or racial group of the applicant. The result is a controlled experiment on discrimination in hiring decisions in a real-world setting. It can plausibly be argued that differences in call-back rates of equally qualified minority and majority candidates can be attributed to discrimination (Midtbøen and Rogstad 2012; Jackson and Cox 2013), especially in correspondence tests, where the experimental manipulation can be controlled more tightly.
Studies employing correspondence tests find systematic evidence of discrimination in hiring decisions. At first sight, there are no apparent differences across time, location, and minority groups tested. These findings suggest that while overt racial and ethnic discrimination is no longer practised as much as it was in the past (consider racial segregation in the USA), ethnic and racial discrimination remains a common phenomenon, albeit a more subtle and covert one: 'Today, it is harder to assess the degree to which everyday experiences and opportunities may be shaped by ongoing forms of discrimination' (Pager and Shepherd 2008, 6; see also Arrow 1998). In most countries under study, anti-discrimination legislation prohibits ethnic and racial discrimination in hiring, but field experiments highlight that current legislation seems to be ineffective and that discrimination remains commonplace. Indeed, field experiments make it possible to quantify the degree of discrimination that members of ethnic and racial minorities face when applying for jobs.
As is common with experiments, however, a single audit or correspondence test is unable to explain why discrimination occurs. To overcome this limitation, studies increasingly resort to finer distinctions of carefully chosen groups, or seek other methods. In this article we draw inferences from various studies by contrasting comparable groups from different correspondence tests in a meta-analysis.

Theory and background
Racial or ethnic discrimination can be defined in various ways, often depending on the research question and scientific tradition of the study. For this meta-analysis, we use the US National Research Council's definition, which focuses on 'differential treatment on the basis of race that disadvantages a racial group and treatment on the basis of inadequately justified factors other than race that disadvantages a racial group (differential effect)' (Blank, Dabady, and Citro 2004, 39, italics in original), thus also covering groups such as immigrants. This definition is similar to the one used in the European Union's Directive 2000/43/EC, commonly known as the 'Race Directive', which differentiates between direct and indirect discrimination (Art. 2) and prohibits both forms.
Given that racial and ethnic discrimination are outlawed in many jurisdictions and have thus become hidden, questions of how to measure discrimination have taken centre stage in recent years (Quillian 2006). We refer readers to Veenman (2010) for a thorough review of the approaches used: statistical analysis of observational data, behavioural research, attitudes research, and victim research. The field experiments focused on in this article are a form of behavioural research.
The literature offers different explanations of why discrimination occurs in hiring processes. A classic distinction is that between taste-based discrimination and statistical discrimination. Taste-based discrimination describes the situation where the employer has racial or ethnic preferences (Becker 1957). This includes xenophobia and racism, but also personal preferences of other kinds; the employer will discriminate against a group irrespective of any other information he or she has about the applicants. Because of these preferences, the employer is willing to pay a higher price to hire a person who matches the desired racial or ethnic profile. Put differently, employers do not act in a purely profit-maximising manner, but 'an avoidance of the psychic cost of contact with the "wrong" race [ … ] takes precedence' (Riach and Rich 1991, 247). Following this logic, employers without racial preferences have a competitive advantage, which in the long run should lead to the elimination of racial discrimination in the marketplace.
By contrast, statistical discrimination describes the situation where members of a specific group are discriminated against because the employer is lacking information (Phelps 1972;Arrow 1973).
The employer who seeks to maximize expected profit will discriminate against blacks or women if he believes them to be less qualified, reliable, long-term, etc. on the average than whites and men, respectively, and if the cost of gaining information about the individual applicants is excessive. (Phelps 1972, 659)
This is a characteristic of the hiring process, in which the employer will never be able to obtain all the information about a candidate, or in which obtaining such information is too costly. The employer will thus rely on signals and other cues from the application and CV (Pager 2007). Ethnic minority status may be such a signal that members of a particular group are less skilled or otherwise unsuited (or, in some cases, more skilled, harder working, and so on). Drawing on stereotypes, hearsay, or previous experience with a small number of group members, the employer discounts the applicant because of his or her ethnicity: ethnicity acts as a proxy for unobserved information. With more information, the employer would not discriminate against the minority candidate. As a consequence of statistical discrimination, an employer will not always succeed in hiring the most qualified applicant. If hiring decisions are taken on a regular basis, the employer may regard statistical discrimination as an acceptable trade-off between the effort needed to obtain more information about an applicant and the recruitment of a productive employee (Bursell 2007).
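The mechanism described above can be illustrated with a stylised signal-extraction model in the spirit of Phelps (1972). This is a minimal sketch under assumptions that are ours rather than the studies': productivity is normally distributed around a group mean the employer believes in, the application provides a noisy signal of it, and all numbers are hypothetical. The employer's expected productivity is then a weighted average of the signal and the believed group mean, so two applicants with identical CVs are rated differently when the employer holds different beliefs about their groups.

```python
def expected_productivity(signal: float, group_mean: float,
                          prior_var: float, noise_var: float) -> float:
    """Employer's posterior mean of productivity q after seeing a signal s = q + e.

    Assumes (hypothetically) q ~ N(group_mean, prior_var) and e ~ N(0, noise_var).
    The signal is shrunk towards the believed group mean.
    """
    weight = prior_var / (prior_var + noise_var)  # how much the CV is trusted
    return weight * signal + (1 - weight) * group_mean


# Identical CVs (same signal), but different beliefs about the groups' means.
SIGNAL = 100.0
BELIEVED_MEANS = {"majority": 100.0, "minority": 90.0}  # hypothetical beliefs
PRIOR_VAR = 100.0

for noise_var in (100.0, 10.0):  # thin vs detailed application pack
    majority = expected_productivity(SIGNAL, BELIEVED_MEANS["majority"], PRIOR_VAR, noise_var)
    minority = expected_productivity(SIGNAL, BELIEVED_MEANS["minority"], PRIOR_VAR, noise_var)
    print(f"noise variance {noise_var:>5}: gap {majority - minority:.2f}")
# noise variance 100.0: gap 5.00
# noise variance  10.0: gap 0.91
```

Lowering the noise variance (a more informative application) shifts weight onto the individual signal and shrinks the gap, which is the logic behind expectation E4 below. The sketch says nothing about taste-based discrimination, which more information would not remove.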
Besides these predominant economic theories of discrimination, other explanations analyse discriminatory treatment of minority groups more generally. Many researchers have become more cautious, preferring terms such as 'ethnic penalty' to describe differential treatment on the basis of race and ethnicity, simply because neither the act of discrimination nor the intention to discriminate is observed. For instance, Heath and Cheung (2006) highlight that certain differentials between ethnic minority groups and majority groups in the labour market cannot be explained by age, education, or country of origin. Human capital theory, by contrast, focuses on factors such as age, education, work experience, or health. The theory highlights the often lower human capital of members of minority groups compared to their majority competitors to explain their disadvantaged position in the labour market (e.g. Andriessen, Dagevos, and Iedema 2008). It is argued that members of ethnic minority groups are on average less educated, are unfamiliar with host-country institutions, are not fluent in the language, or lack relevant networks for job searching. Differences in economic outcomes persist, however, when human capital differences are controlled for (Blommaert, Coenders, and van Tubergen 2014).
Theories of social dominance highlight that groups are not only distinguished but also ranked according to their social position and negative stereotypes connected with these groups, resulting in status hierarchies. Men tend to be 'ranked' higher than women, and natives are usually ranked higher than immigrants (Andriessen et al. 2010). Closely related to this theory, the notion of ethnic hierarchies is often discussed in the Dutch context, where Moroccans are consistently 'ranked' at the bottom and Surinamese immigrants are regarded more favourably (Andriessen et al. 2012). To some extent, ethnic hierarchies draw on cultural distance, where groups perceived as 'more different' tend to have less status and thus rank lower in the hierarchy. Cultural distance can reflect social distance (Parrillo and Donoghue 2005), but it frequently draws on visible markers like skin colour and dress as signals of cultural distance (Fetzer 2013). Ethnic hierarchies may play a role in taste-based discrimination and statistical discrimination, and they serve to remind us that discrimination in the hiring process is not a binary decision: the hiring decision may be context-dependent and depend on the other applicants for the same job.
The literature further highlights factors such as the size and composition of the minority population, the economic situation and outlook, policies, media reporting, and attitudes in the population. The way minorities are presented in the media and how they are politicised in public debates is likely to play an important role (Klingeren et al. 2014; van der Brug et al. 2015). The mediatised debate provides and reinforces stereotypes that can be used as shortcuts in statistical discrimination. At the same time, employers gain additional knowledge about different minority groups when immigration is politicised, making them less likely to (have to) resort to shortcuts. Taste-based discrimination may also be affected by the public debate and attitudes in the population (Pettigrew and Tropp 2006; Pecoraro and Ruedin 2015). Because of 'in-group loyalty' and 'out-group rejection', it can similarly be expected that applicants from one's in-group are more likely to be invited for a job interview (cf. Ford 2015).

Expectations
Based on the theories outlined above, we have identified four expectations. Many other expectations could be stated, given the numerous variables potentially related to patterns of discrimination, but in this article we focus on those related to taste-based and statistical discrimination:
E1: According to statistical discrimination theory, employers are expected to react to signals such as education completed in the country under study. Similarly, children of immigrants (second-generation immigrants) tend to have more social ties in the country under study. Employers are thus likely to perceive them more positively, with generation serving as a signal of civic integration. It can therefore be expected that discrimination is lower for second-generation than for first-generation immigrants. More generally, the more established an immigrant group is in a country, the more information can be expected to be available, translating into lower rates of statistical discrimination.
E2: Taking taste-based discrimination seriously, ethnic and status hierarchies suggest that more distant and more visible minority groups are discriminated against more than other groups. Ostensible difference serves as a reason to discriminate, and this includes the degree to which a particular minority group is established in a country: immigrant groups associated with guest-worker programmes or colonial ties tend to be more established and are expected to face less discrimination than newly arrived groups.
E3: Two EU directives adopted in 2000 were designed to reduce discrimination. Irrespective of the effectiveness of the ensuing policies, it can be assumed that awareness of discrimination in hiring and in the labour market increased as a result of the political and public debates at the time. Hence discrimination is likely to be lower after 2000 than before.
E4: Depending on the country, job applications require different details. If statistical discrimination prevails, discrimination can be expected to be lower in countries where more details, such as diplomas or transcripts, are the norm in job applications. In these contexts employers have less need to resort to mechanisms that can result in statistical discrimination (Weichselbaumer 2015b). More detailed application packages are widespread in German-speaking countries. It can therefore be expected that discrimination rates are lower in German-speaking countries than in other European countries.

Methods and data
Correspondence tests are well suited to identifying discrimination in hiring, especially because they are able to minimise other influences (Bendick and Nunes 2012; Midtbøen and Rogstad 2012; Jackson and Cox 2013). In correspondence tests, researchers apply in writing for actual positions at real companies, and thus capture real hiring decisions. They are much easier to implement than in-person audits and allow more control over the application process. Correspondence tests can be repeated in relatively large numbers, especially now that electronic applications are commonplace, and enable researchers to apply for a wider variety of jobs with different skill levels. They therefore allow some conclusions about discrimination in the hiring process.
However, there are limits to correspondence tests. First, they usually rely exclusively on the applicant's name to convey information about race or ethnicity: stereotypical ethnic names may lead to different responses than lesser-known names from the same group, some ethnic names may be misattributed to other ethnic groups, and names may have connotations of class or socio-economic status the researcher is unaware of (Bertrand and Mullainathan 2004; Pager 2007). These are confounding effects beyond the control of the researcher.¹ Second, correspondence tests are only suited to occupations where written applications are the norm. This excludes many entry-level and unskilled jobs where applications are typically made in person. Third, correspondence tests can only be used for publicly announced jobs and exclude vacancies filled informally or internally. Fourth, since they rely on deception to obtain results, correspondence tests also face ethical challenges and, in some cases, legal constraints. Today researchers take ethical questions increasingly seriously and obtain formal ethical clearance.
By design, correspondence tests only cover the first step of the hiring process, the invitation to an interview, so it is impossible to observe the behaviour of employers later on, as in-person audit studies can. The second step is not unimportant, but estimates suggest that the first step may account for as much as 90% of the discrimination measured (Riach and Rich 2002).
In this article, we use meta-analysis to summarise existing research in a systematic manner, drawing on the fact that all correspondence tests are conducted in a similar fashion (Weichselbaumer and Winter-Ebmer 2005). Meta-analyses use statistics to combine the reported findings across studies, offering a quantitative means to synthesise research with less reliance on the subjective assessment of the reviewing authors (Wolf 1986;Petticrew and Roberts 2006). We will benefit from the fact that correspondence tests have been carried out for different kinds of groups and subgroups to draw inferences about taste-based and statistical discrimination where possible.
We carried out systematic searches using Web of Knowledge and Google Scholar, limiting the search to ethnic and racial discrimination in hiring and correspondence tests, which includes discrimination against immigrant groups. We chose not to include in-person audit studies, as written correspondence tests have become the dominant method in recent years. To increase comparability, we further narrowed the focus to correspondence studies conducted between 1990 and 2015 in countries belonging to the Organisation for Economic Co-operation and Development (OECD). The following keywords were used: 'discrimination', 'correspondence test', 'ethnic discrimination', 'racial discrimination', 'discrimination in hiring', 'discrimination AND labour market', 'discrimination AND field experiment', and 'discrimination AND employment'. We also relied on the often extensive bibliographies provided in the literature, especially in the systematic reviews conducted by Riach and Rich (2002) and Rich (2014). Furthermore, we carefully checked the bibliography of every correspondence study and broadened our search from there. We were able to include studies published in English, French, German, and Dutch.
We note that there is no standard for reporting the results of correspondence tests, and a wide variety of approaches are found (see supplementary material S4). Many studies report discrimination using relative call-back rates as the sole measure; other studies focus on net discrimination rates. Often only absolute numbers or only percentages are presented; we recalculated the absolute numbers wherever possible because this allows the calculation of corresponding call-back rates and odds ratios, drawing on four categories: 'positive treatment minority', 'negative treatment minority', 'positive treatment majority', and 'negative treatment majority'. This vocabulary reflects the fact that meta-analyses are more established in the medical sciences (Petticrew and Roberts 2006). Majority applicants constitute our control group, while minority applicants are considered the treatment group. In studies that combine in-person audit tests with correspondence tests, we singled out the results from the written correspondence tests and included them in our database; we see no mechanism by which a subsequent in-person test could affect the preceding correspondence test. Generally speaking, the level of detail provided in the studies is often incomplete, and for that reason we often rely on relative call-back rates to maximise the number of cases considered (see supplementary material S6-S10 for odds ratios).
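The four categories map directly onto the two effect measures used in this article. The sketch below, with hypothetical class and method names and hypothetical counts, shows how call-back rates, the relative call-back ratio, and the odds ratio can be computed from them. It treats minority and majority applications as two independent groups, which ignores the pairing of applications sent to the same employer, and it adds the usual large-sample (Woolf) standard error of the log odds ratio, since meta-analytic weights require one.

```python
import math
from dataclasses import dataclass


@dataclass
class CorrespondenceTest:
    """Call-back counts for one (sub)group, using the four categories above."""
    pos_minority: int  # minority applicant invited ('positive treatment minority')
    neg_minority: int  # minority applicant not invited
    pos_majority: int  # majority applicant invited
    neg_majority: int  # majority applicant not invited

    def callback_rate_minority(self) -> float:
        return self.pos_minority / (self.pos_minority + self.neg_minority)

    def callback_rate_majority(self) -> float:
        return self.pos_majority / (self.pos_majority + self.neg_majority)

    def relative_callback_ratio(self) -> float:
        # Majority rate over minority rate: values above 1 favour the majority.
        return self.callback_rate_majority() / self.callback_rate_minority()

    def odds_ratio(self) -> float:
        # Minority odds over majority odds: values below 1 disadvantage the minority.
        # Zero cells raise ZeroDivisionError; a continuity correction (e.g. adding
        # 0.5 to every cell) is a common workaround, not applied here.
        return (self.pos_minority / self.neg_minority) / (self.pos_majority / self.neg_majority)

    def log_odds_ratio_se(self) -> float:
        # Woolf's large-sample standard error of log(OR).
        return math.sqrt(1 / self.pos_minority + 1 / self.neg_minority
                         + 1 / self.pos_majority + 1 / self.neg_majority)


# Hypothetical subgroup: 1,000 applications per group.
example = CorrespondenceTest(pos_minority=100, neg_minority=900,
                             pos_majority=150, neg_majority=850)
print(round(example.relative_callback_ratio(), 2))  # 1.5
print(round(example.odds_ratio(), 2))               # 0.63
```

The two measures point in opposite directions by construction: a relative call-back ratio above 1 and an odds ratio below 1 both indicate a disadvantage for the minority applicant.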

Data and variables
The present article includes data from 43 studies conducted in 18 countries, covering over 20 minority groups.² In Table 1, each study contributes one data point. For most analyses, each study can be broken down into several subgroups, namely specific minority or immigrant groups, depending on the level of detail provided in the articles. We treat Akintola (2011) as two separate studies because it covers both Canada and Sweden. There are 738 subgroups in total, and to some extent each can be treated as an independent experiment, given that hiring decisions were made by different employers and are thus unlikely to influence each other. While a study may aggregate discrimination rates across, say, Serbian and Turkish applicants, a subgroup is more specific, such as Chinese men applying to be cooks. At the subgroup level, we gain variance in otherwise relatively homogeneous setups. This variance is used as a test of robustness for the overall meta-analysis, but also to test expectations related to the nature of discrimination. The supplementary material includes considerations of publication bias (S12).
The variable of interest in this article is discrimination in hiring decisions. Two measures are available: relative call-back ratios and odds ratios. Relative call-back ratios compare how often a majority applicant is called back for an interview (control) to how often a minority applicant is called back (treatment); values above 1 indicate that majority applicants are favoured. The call-back ratio is available for most subgroups. Odds ratios instead compare the odds of being invited for a job interview (the probability of a call-back divided by the probability of no call-back) of minority applicants to those of majority applicants; values below 1 indicate that minority applicants are disadvantaged. By necessity, we accept that definitions of race and ethnicity vary across studies. For comparisons across specific minority groups it was necessary to reclassify some of these groups, for example by including 'Swedes of Middle Eastern origin' in the category 'Arabs and people of Middle Eastern origin'. These coding decisions are documented in the supplementary material (S2 and S3).

Discrimination across studies
As a first step, we present a meta-analysis of all studies. Figure 1 shows a forest plot, on a log scale, of the odds ratios from a random-effects model for the studies that report enough data to calculate odds ratios. With the exception of Bendick et al. (1991) in the USA,³ who used CVs with enhanced credentials for Latino applicants but not for Anglo applicants, most studies found significant evidence of discrimination against minority applicants. Also notable are Akintola (2011), who found only little discrimination against minority applicants in Canada, and Decker et al. (2015), who reported very low rates of discrimination against black applicants in their study in the USA. These are among the few studies whose confidence intervals (roughly two standard errors either side of the estimate) cross the line at 1, indicating that the interpretation of 'no discrimination' cannot be ruled out. Across all studies for which sufficient details are available to calculate odds ratios, the pooled odds ratio is 0.51, indicated by the diamond at the bottom of the figure: minority applicants have 49% lower odds of being invited for an interview than equally qualified majority candidates. Given that each study covers several subgroups, the result of a model on subgroups is of equal interest: the odds ratio in this case is 0.60, of the same order of magnitude (supplementary material S6). Some studies report insufficient details to calculate odds ratios, so a comparison of relative call-back rates is necessary to cover more studies. Figure 2 shows the relative call-back rates reported in the studies, ranging from Bendick et al. (1991) in the USA to Cédiey and Foroni (2007) in France, where the highest relative call-back rates were measured. The mean relative call-back rate is 1.55 at the study level (indicated with a straight black line in the figure) and 1.75 at the subgroup level. The median values are 1.44 for studies and 1.49 at the subgroup level.
This means that minority applicants have to send around 50% more applications to be invited for a job interview.
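The source does not state which random-effects estimator produced the pooled odds ratio. As an illustration of the general technique only, the sketch below uses one common choice, the DerSimonian-Laird estimator, to combine study-level log odds ratios and their standard errors into a pooled odds ratio with a 95% confidence interval; the function name and the inputs in the usage line are hypothetical.

```python
import math


def random_effects_pool(log_ors, ses):
    """DerSimonian-Laird random-effects pooling of log odds ratios.

    Returns (pooled odds ratio, (lower, upper) 95% CI, tau^2).
    """
    k = len(log_ors)
    w = [1 / se**2 for se in ses]  # fixed-effect (inverse-variance) weights
    y_fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - y_fixed) ** 2 for wi, y in zip(w, log_ors))  # Cochran's Q
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0  # between-study variance
    w_re = [1 / (se**2 + tau2) for se in ses]  # random-effects weights
    y_re = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    ci = (math.exp(y_re - 1.96 * se_re), math.exp(y_re + 1.96 * se_re))
    return math.exp(y_re), ci, tau2


# Hypothetical studies: odds ratios and standard errors of their logs.
pooled, ci, tau2 = random_effects_pool([math.log(0.45), math.log(0.60), math.log(0.95)],
                                       [0.10, 0.15, 0.20])
print(f"pooled OR {pooled:.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}, tau^2 {tau2:.3f}")
```

Pooling happens on the log scale because log odds ratios are approximately normal and symmetric around 0; the third hypothetical study has its own 95% interval (about 0.64 to 1.41) spanning 1, the situation in which 'no discrimination' cannot be ruled out for that study.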
When interpreting these numbers, however, it must be borne in mind that the ethnic groups studied in correspondence tests are rarely chosen at random: often researchers suspect discrimination against specific groups, or they examine the most salient minority groups, usually groups considered 'different' or with historical ties to the country, which are not necessarily the largest minority groups in society. This may mean a focus on visible minority groups while immigrants from other European countries are ignored. The reported rates of discrimination may thus overestimate the overall extent of discrimination.
In a second step, we test the robustness of the meta-analysis by examining specific subgroups. For instance, a comparison of European and North American correspondence studies indicates that minority applicants may face more discrimination in Europe than in the USA and Canada, as far as it is possible to compare these groups (consider, for example, the tradition of strong anti-discrimination legislation). Discrimination occurs on both sides of the Atlantic, irrespective of whether we consider racial discrimination in North America or ethnic discrimination in Europe. At the subgroup level, the relative call-back rate is 1.84 in Europe and 1.69 in the USA and Canada. These results, however, do not take into account that in-person audits are still prevalent in the USA and often report high rates of discrimination (Pager 2007). The reported differences should therefore be interpreted with caution.
The second dimension we focus on is gender. Stereotypes and media images of immigrant women tend to be less extreme than those of immigrant men (Bovenkerk 1992; Andriessen et al. 2012). This may lead to immigrant women being perceived as better integrated into and less threatening to society than immigrant men, and thus to lower discrimination against women. The opposite expectation can be drawn from status hierarchies, where men tend to be ranked 'higher' (Andriessen et al. 2010). Indeed, women seem to fare slightly worse than men (relative call-back rate of 1.74 for women and 1.63 for men). However, these small differences are not statistically significant (p > .1) and may be related to the particular occupations and positions chosen in the correspondence tests, where gender stereotypes of 'typical' male or female jobs may influence the results. Substantively, there is no indication of systematic gender differences on a large scale.
A third dimension in which studies may differ systematically is the economic context. During times of economic boom and labour shortage, employers are likely to take more risks when hiring. It can be assumed that this affects discrimination rates: employers become more likely to 'give a candidate a chance', irrespective of past experience with other members of the same group or prevailing stereotypes. It can therefore be expected that discrimination is lower in times of low unemployment and high GDP growth (Baert et al. 2013). By contrast, Carlsson, Fumarco, and Rooth (2015) showed that in Sweden ethnic discrimination increases when the labour market improves. Focusing our analysis on GDP growth and unemployment rates, we find no systematic association between the economic situation and ethnic discrimination in hiring. While a higher level of discrimination can be observed at times of high unemployment (mean relative call-back rates of 2.03 at high and 1.50 at low unemployment), the differences disappear when median call-back rates are considered (supplementary material S11). Looking at the correlation between unemployment rates and call-back rates, there is no clear association (r = −0.05, p > .1). Similarly, the correlation between annual GDP growth rates and call-back rates is not significant (r = 0.04, p > .1). Taken together, there is no evidence that rates of discrimination vary with the national economic situation, although the relevant level of analysis may be occupation- and region-specific and thus unattainable in this analysis.
Rather than looking at individual factors in isolation, the supplementary material also includes a multivariate meta-regression that examines the influence of different factors jointly (S13). The skill level is of particular interest: the regression coefficient for high skills is positive (0.28, p < .05), while the regression coefficient for low skills is negative (−0.16, p < .05). The substantive patterns reported in this section remain unchanged when controlling for gender or for whether first- or second-generation applicants are considered, suggesting that the reported findings are robust.
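The supplementary regressions are not reproduced here. As a sketch of the general technique only, the function below fits a fixed-effect meta-regression: inverse-variance weighted least squares of subgroup effect sizes on moderator dummies, here hypothetical high- and low-skill indicators with medium skill as the reference category. The outcome scale (log relative call-back ratio), the weights, the data, and the absence of a between-study variance term are our assumptions; a random-effects meta-regression would add an estimate of tau² to each sampling variance.

```python
import numpy as np


def meta_regression(effects, ses, moderators):
    """Fixed-effect meta-regression by inverse-variance weighted least squares.

    effects    : effect size per subgroup (e.g. log relative call-back ratio)
    ses        : standard error of each effect size
    moderators : 2-D array-like, one row per subgroup, no intercept column
    Returns the coefficient vector, intercept first.
    """
    y = np.asarray(effects, dtype=float)
    weights = 1.0 / np.asarray(ses, dtype=float) ** 2
    design = np.column_stack([np.ones(len(y)), np.asarray(moderators, dtype=float)])
    root_w = np.sqrt(weights)
    # Scaling each row by sqrt(weight) turns WLS into ordinary least squares.
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return coef


# Hypothetical subgroups: (high_skill, low_skill) dummies; medium skill is the reference.
skills = [(1, 0), (1, 0), (0, 1), (0, 1), (0, 0), (0, 0)]
log_ratios = [0.55, 0.70, 0.25, 0.35, 0.40, 0.45]
std_errors = [0.10, 0.20, 0.10, 0.30, 0.20, 0.15]
print(meta_regression(log_ratios, std_errors, skills))
```

Each coefficient is read relative to the reference category: a positive high-skill coefficient means larger call-back gaps for high-skilled positions than for medium-skilled ones, holding the other moderators constant.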

Taste-based and statistical discrimination
Having established that ethnic discrimination in hiring exists across contexts in a fairly robust manner, we now make use of the variation across studies. First, we focus on the difference between first- and second-generation immigrants. While some studies explicitly state whether their candidates belong to the first or second generation, most studies only mention that applicants were schooled in the country where the testing was conducted. We treat these minority applicants as second-generation immigrants. As summarised in Table 2, the mean relative call-back rate at the subgroup level is 1.93 for first-generation immigrants and 1.71 for second-generation immigrants. There is no clear pattern across studies, and no evidence that discrimination is generally lower for the second generation in substantive terms. In the multivariate models presented in the supplementary material (S13), the coefficient for the second generation is negative (−0.35, p < .05).
The minority groups selected for testing have become more diverse in recent years, but some groups are included frequently, especially in European correspondence tests. By focusing on specific ethnic groups, we are able to minimise the influence of unobserved variables on call-back rates. We focus on the ethnic groups most commonly studied: Arabs and people of Middle Eastern origin; Chinese; Indians, Pakistani, and Bangladeshi; and Turks.⁴ The results in Table 3 reveal a clear hierarchy of minority groups: discrimination is highest for Arabs and people of Middle Eastern origin, followed by Chinese, then Indians, Pakistani, and Bangladeshi; it is lowest for Turks. Similar patterns are reported in individual studies that included more than one minority group. For instance, in Austria Serbs face the lowest relative call-back rate (1.31), followed by Chinese (1.37), Turks (1.46), and Nigerians (1.98) (Weichselbaumer 2015b; see also McGinnity and Lunn 2011; Booth, Leigh, and Varganova 2012). Multivariate regression analysis in the supplementary material suggests that these differences are robust to differences in skill levels (S13). Taken together, the results suggest clear ethnic hierarchies, but hierarchies that are specific to a place and probably to a time.
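Comparisons across minority groups of this kind involve an aggregation step: subgroups are recoded into broad categories (see supplementary material S2 and S3) and then summarised by category. The sketch below illustrates that step with hypothetical group labels and ratios. Using the median rather than the mean is our choice, made because a few extreme subgroups can dominate a mean, and reporting the number of subgroups alongside shows how thin some categories are.

```python
import statistics
from collections import defaultdict


def ethnic_hierarchy(records):
    """Rank minority groups by the median relative call-back ratio of their subgroups.

    records : iterable of (group, relative_callback_ratio) pairs
    Returns (group, median ratio, number of subgroups), least discrimination first.
    """
    by_group = defaultdict(list)
    for group, ratio in records:
        by_group[group].append(ratio)
    summary = [(group, statistics.median(ratios), len(ratios))
               for group, ratios in by_group.items()]
    return sorted(summary, key=lambda item: item[1])


# Hypothetical subgroup-level records, already recoded into broad categories.
records = [("Group A", 1.3), ("Group A", 1.5), ("Group B", 1.6),
           ("Group C", 2.1), ("Group C", 1.9), ("Group B", 1.8)]
for group, median, n in ethnic_hierarchy(records):
    print(f"{group:<8} median {median:.2f} (n={n})")
```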
As the issue of racial and ethnic discrimination appeared on the European political agenda at the end of the last century, two EU directives were adopted in record time (Directive 2000/43/EC and 2000/78/EC). Table 4 presents the discrimination rates in the European Union (thus excluding Switzerland and Norway) before and after the adoption of these directives in order to examine the impact of these anti-discrimination policies. Interestingly, the reported level of discrimination has increased since the adoption of the EU directives, with the relative call-back rate rising from 1.36 to 1.96. The observed increase probably reflects the groups included in the correspondence tests and may be due to the fact that most European studies were conducted after the adoption of the directives, but there is certainly no evidence that the EU directives led to a direct reduction in discrimination. Beyond the EU directives, the level of discrimination in German-speaking countries is of particular interest because it allows direct inferences about statistical discrimination. German-speaking countries are known for their extensive application packs, requiring detailed documentation about job candidates. To be considered a serious applicant in German-speaking countries, it is customary to compile an application package that contains not only a cover letter and a CV, but also at least a photograph and school reports for entry-level positions such as apprenticeships, or university transcripts and diplomas as well as reference letters from former employers for people with more experience (Kaas and Manger 2012; Schneider, Yemane, and Weinmann 2014; Weichselbaumer 2015b). This detailed information gives employers more knowledge about candidates than in other contexts, and is thus likely to reduce statistical discrimination.
The results in Table 5 suggest that this is the case, with levels of discrimination lower in German-speaking countries than elsewhere. Multivariate regression analysis in the supplementary material shows that this difference is robust and not merely a reflection of the skill levels tested (S13). The implications are twofold. On the one hand, the difference suggests that statistical discrimination does play a role, something that could be addressed with more information or different application packs. On the other hand, call-back rates in German-speaking countries still indicate a high degree of discrimination even where application packs are more substantial, so statistical discrimination cannot be the only factor explaining discriminatory behaviour in hiring decisions. The remaining gap points to preferences and attitudes (taste-based discrimination), for which remedies are less obvious.
Several correspondence-test studies suggest that discrimination is higher among private employers and that minority applicants have better chances of being invited to a job interview by public employers (e.g. Wood et al. 2009; Eid 2012; Midtbøen 2014). Our analysis supports this: the mean relative call-back rate is 1.65 for private employers and 1.19 for the public sector, indicating less discrimination by public employers. However, the number of studies included is relatively small, and further research is needed to confirm this relationship. Public employers bear a special responsibility and are often bound by specific procedures to ensure equal opportunities in hiring (e.g. the use of standardised application forms; Wood et al. 2009).
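Each of the comparisons above (minority groups, before and after the EU directives, German-speaking countries versus elsewhere, private versus public employers) contrasts average relative call-back rates across subsets of tests. The following is a minimal Python sketch of that step, assuming an unweighted arithmetic mean per subset; the article does not spell out its aggregation here, and the records and the function names `mean_rcr_by_group` and `rank_groups` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-test records: (context label, relative call-back rate).
# Labels and values are invented for illustration only; they are NOT
# the tests or results analysed in this article.
TOY_TESTS = [
    ("private", 2.0),
    ("private", 1.5),
    ("private", 1.0),
    ("public", 1.25),
    ("public", 1.0),
]


def mean_rcr_by_group(records):
    """Return the unweighted mean relative call-back rate per group label.

    Assumption: every test counts equally. A full meta-analysis might
    instead weight tests by sample size or pool ratios on a log scale.
    """
    groups = defaultdict(list)
    for label, rcr in records:
        groups[label].append(rcr)
    return {label: mean(values) for label, values in groups.items()}


def rank_groups(rates):
    """Order labels from most to least discrimination (highest ratio first)."""
    return sorted(rates, key=rates.get, reverse=True)


if __name__ == "__main__":
    means = mean_rcr_by_group(TOY_TESTS)
    print(means)               # {'private': 1.5, 'public': 1.125}
    print(rank_groups(means))  # ['private', 'public']
```

Because a relative call-back rate above 1 means the majority candidate is favoured, ranking by the highest ratio orders groups from most to least discriminated against. Applied to the Austrian ratios reported by Weichselbaumer (2015b), `rank_groups` returns Nigerians, Turks, Chinese, Serbs.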

Discussion
Across OECD countries, members of ethnic and racial minority groups face discrimination in the hiring process. Most studies report discrimination against minority groups, and across studies minority candidates have 49% lower odds of being invited to a job interview than their majority competitors. Expressed as relative call-back rates, members of minority groups need to send around three applications for every two applications a member of the majority group sends in order to be called back for an interview. These patterns of discrimination are relatively robust across countries and economic situations. That discrimination remains prevalent in every country where testing has been conducted, despite the adoption of anti-discrimination legislation, leaves much room for future research, especially concerning the underlying reasons for discrimination and how the reported differentials arise.
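The two summary figures in the preceding paragraph express the same gap in different units. A minimal Python sketch shows how each is computed from call-back counts, using hypothetical counts (30 of 100 majority applications and 20 of 100 minority applications receiving a call-back) rather than data from the article; the function names are illustrative.

```python
def relative_callback_rate(maj_callbacks, maj_apps, min_callbacks, min_apps):
    """Majority call-back rate divided by minority call-back rate.

    Written as one cross-product ratio to avoid rounding in the
    intermediate rates.
    """
    return (maj_callbacks * min_apps) / (min_callbacks * maj_apps)


def applications_needed(rcr, majority_applications):
    """Applications a minority candidate needs to expect as many call-backs
    as a majority candidate obtains from `majority_applications`."""
    return rcr * majority_applications


def callback_odds_ratio(maj_callbacks, maj_apps, min_callbacks, min_apps):
    """Odds of a minority call-back divided by the odds of a majority call-back."""
    return (min_callbacks * (maj_apps - maj_callbacks)) / (
        maj_callbacks * (min_apps - min_callbacks)
    )


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    rcr = relative_callback_rate(30, 100, 20, 100)
    print(rcr)                           # 1.5
    print(applications_needed(rcr, 2))   # 3.0: three applications for every two
    print(round(callback_odds_ratio(30, 100, 20, 100), 3))  # 0.583
```

The two measures do not translate into each other one-for-one: in this toy example a relative call-back rate of 1.5 corresponds to roughly 42% lower odds, and the correspondence shifts with the baseline call-back rate.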
For instance, many more correspondence tests focus on male candidates than on female candidates, partly because of the ILO studies of the 1990s (Bovenkerk et al. 1995; Goldberg, Mourinho, and Kulke 1995; de Prada et al. 1995; Arrijn, Feld, and Nayer 1998). Recent Scandinavian studies (e.g. Arai, Bursell, and Nekby 2011; Bursell 2014) suggest that men with foreign names are less likely than women with foreign names to be invited for a job interview. It is unclear whether women are perceived as less qualified and thus considered for lower-quality work, or whether men are discriminated against because they are perceived as more threatening (Bovenkerk 1992). While across studies there appear to be no systematic differences between discrimination against minority men and against minority women, further research in this area is warranted to identify the relevant mechanisms, especially because most existing studies were not designed to test the hypothesised gender differences.
There is no systematic difference between the relative call-back rates for first- and second-generation applicants. Second-generation candidates hold local qualifications, so employers have no need to use ethnicity as a signal of unobserved qualifications; that these candidates are nonetheless treated like first-generation applicants suggests that taste-based discrimination dominates. As Carlsson (2010, 272) highlighted, 'the factor driving discrimination seems to be ethnicity per se'. In this case, as Heath and Cheung (2006) emphasise, disadvantage is unlikely to disappear between generations. There is some evidence that levels of discrimination decrease over time, but the lack of a clear substantive difference between first- and second-generation candidates is problematic inasmuch as many immigrant integration policies in Western Europe are premised on a meritocratic society, in which qualifications and language skills should allow for equal chances. The same applies to the EU directives, which do not appear to have directly reduced discriminatory hiring practices. More research is needed to understand why these policies fail to make a dent in discrimination in hiring, including consideration of indirect and lagged effects.
Further evidence for taste-based discrimination comes from the fact that different minority groups fare differently in hiring decisions. Research is necessary to make sense of these ethnic hierarchies, because correspondence tests often contrast more established minority groups with more recent arrivals, an approach recommended by Bovenkerk (1992). As a result, studies may confound different mechanisms. Nonetheless, the individual studies offer numerous explanations, ranging from ethnic hierarchies to social distance between the minority groups tested and the majority (e.g. Andriessen et al. 2010). While these explanations all point towards status hierarchies, the differences across countries and time indicate that these hierarchies are neither universal nor purely based on skin colour. Research linking discriminatory behaviour towards particular immigrant or minority groups with attitudes towards these groups would help clarify which characteristics of minority candidates lead to discrimination. For instance, most studies on Arab applicants have been conducted in Scandinavia after 2006, at a time when attitudes towards Arabs had become negative, stereotypes portrayed them as threatening, and Islamophobia was widespread (Dolezal, Helbling, and Hutter 2012; Berkhout and Ruedin Forthcoming; Helbling 2014). While our focus has been on discrimination by employers, (anticipated) discrimination by colleagues and/or customers might also play an important role (Baert and De Pauw 2014).
Evidence that statistical discrimination plays a role comes from German-speaking countries, where more extensive application material is the norm, and from public sector employers, where non-discriminatory hiring practices are often explicitly sought. Discrimination is higher in the private sector and in countries without the extensive application packs commonplace in German-speaking countries: with more information, there is less room for statistical discrimination. Results by Weichselbaumer (2015b) highlight, however, that simply providing more information is no cure for discrimination: the photograph required in German-speaking application packs seems to be used to systematically discriminate against applicants wearing headscarves (i.e. taste-based discrimination). The situation is somewhat different in the public sector, where more careful selection of candidates with regard to diversity may play a role (possibly a deliberate, demonstrative action to advance a political agenda), aspects perhaps less valued in the private sector, where efficiency and productivity may override other concerns. Moreover, standardised application procedures are more widespread in the public sector (Wood et al. 2009). Introducing standardised procedures, requiring more detailed application packs, or otherwise increasing the information employers receive (for example, by officially vetting foreign qualifications) are readily actionable measures.

Conclusion
This article provided a meta-analysis of ethnic discrimination in hiring decisions, showing that such discrimination has remained widespread across OECD countries over the last 25 years. Correspondence tests clearly indicate that discrimination against ethnic and racial minority groups in hiring decisions is still commonplace: equivalent minority candidates need to send around 50% more applications than majority candidates to be invited for an interview. In a second step, we used the variation across studies to draw inferences, as far as possible, about the presence of taste-based and statistical discrimination. There are many indications that taste-based discrimination remains dominant, although in some instances there is evidence that statistical discrimination also plays a role. This matters because the two forms of discrimination require different interventions: more extensive and standardised procedures seem to reduce statistical discrimination, albeit at the cost of adding bureaucracy, while raising awareness may help reduce taste-based discrimination.
It lies in the nature of a meta-analysis that no detailed examination of discrimination can be provided. We identified much scope for further research, particularly with regard to identifying the underlying mechanisms that lead to discriminatory practices: how discrimination actually takes place. Carefully designed correspondence tests may play a role here, and differences in response rates across minority groups merit further examination, given that these differences seem to follow patterns, albeit complex patterns that seem to depend on time and place. Insights from work on attitudes towards foreigners and minority groups, and from other related research, are likely to help explain why discrimination differs and which groups are likely to be discriminated against. With discrimination found across countries and time, there is, so to speak, plenty of research material out there. What is needed are studies that go beyond showing that ethnic discrimination in hiring exists, to identifying the exact mechanisms and how more equitable hiring can be achieved, unless we want to keep wasting talent.

Notes
1. For a detailed discussion of unobservable characteristics in field experiments we refer readers to Heckman and Siegelman (1993), Heckman (1998), and Neumark (2012).
2. See supplementary material S1 for a complete list of studies. Studies included in the analysis but not cited elsewhere in the text: Allasino et al. 2006; Arceo-Gómez and Campos-Vázquez 2013; Attström 2007; Dechief and Oreopoulos 2012; Drydakis 2012; Drydakis and Vlassis 2010; Duguet et al. 2010; Esmail and Everington 1993, 1997; Fibbi, Kaya, and Piguet 2003; Gaddis 2014; Jacquemet and Yannelis 2012; Lodder, McFarland, and White 2003; Nunley et al. 2014; Oreopoulos 2011; and Widner and Chicoine 2011. We also note the studies by Duguet et al. (2015), Agerström et al. (2012), and Adida, Laitin, and Valfort (2010), but their measurements are not comparable to the other studies. In the study by Weichselbaumer (2015a) we did not include the manipulations with headscarves, to maintain comparability across studies. Also not considered in the analysis were studies using unsolicited applications; see Diekmann, Jann, and Näf (2014) and Ariel et al. (2015) for recent examples.
3. We only looked at the part of the study where applications were sent by mail. While response rates were higher for Latino applicants, the differences were not statistically significant (Bendick et al. 1991, 8).
4. Many US studies focus on Hispanics, but most of them are audit studies rather than correspondence tests.