How Much Has the North Atlantic Ocean Overturning Circulation Changed in the Last 50 Years?

Volume transports from six ocean reanalyses are compared with four sets of in situ observations: across the Greenland–Scotland ridge (GSR), in the Labrador Sea boundary current, in the deep western boundary current at 43°N, and in the Atlantic meridional overturning circulation (AMOC) at 26°N in the North Atlantic. The higher-resolution reanalyses (on the order of 1/4° × 1/4°) are better at reproducing the circulation pattern in the subpolar gyre than those with lower resolution (on the order of 1°). Simple Ocean Data Assimilation (SODA) and Estimating the Circulation and Climate of the Ocean (ECCO)–Jet Propulsion Laboratory (JPL) produce transports at 26°N that are close to those observed [17 Sv (1 Sv ≡ 10⁶ m³ s⁻¹)]. ECCO, version 2, and SODA produce northward transports across the GSR (observed transport of 8.2 Sv) that are 22% and 29% too big, respectively. By contrast, the low-resolution reanalyses have transports that are either too small [by 31% for ECCO-JPL and 49% for Ocean Reanalysis, system 3 (ORA-S3)] or much too large [Decadal Prediction System (DePreSys)]. SODA had the best simulations of mixed layer depth and with two coarse grid long-term reanalyses (DePreSys and ORA-S3) is used to examine changes in North Atlantic circulation from 1960 to 2008. Its results suggest that the AMOC increased by about 20% at 26°N while transport across the GSR hardly altered. The other (less reliable) long-term reanalyses also had small changes across the GSR but changes of +10% and −20%, respectively, at 26°N. Thus, it appears that changes in the overturning circulation at 26°N are decoupled from the flow across the GSR. It is recommended that transport observations should not be assimilated in ocean reanalyses but used for validation instead.


1. Introduction
The overturning circulation of the North Atlantic carries warm surface water to high latitudes where the transfer of heat into the atmosphere warms northern Europe and provides the buoyancy imbalance to drive the overturning. Early global ocean circulation models suggested that a rapid decrease in salinity in the North Atlantic could cause a catastrophic shutdown of the Atlantic meridional overturning circulation (AMOC) (e.g., Rahmstorf 1995; Vellinga and Wood 2002). More recent assessments suggest that the circulation will most likely slow down, but not collapse, during the twenty-first century (e.g., Meehl et al. 2007). In response to these concerns, a number of programs for monitoring the overturning circulation in the North Atlantic have been established in recent years. In this study we focus on the output from three of these programs: Thermohaline Overturning-at Risk? (THOR), North Atlantic Climate (NACLIM; European projects to monitor the exchange across the Greenland-Scotland ridge) superseding THOR, and Rapid Climate Change (RAPID; a United Kingdom and United States led program to monitor the strength of the AMOC at 26°N).
Some of the early results from these programs were quite alarming. Measurements in the Arctic water flowing through the Faroe Bank Channel between 1996 and 1999 were used to infer that the transport had declined by at least 20% since the 1950s (Hansen et al. 2001). Subsequently, Bryden et al. (2005), using five repeat sections from 1957 to 2004 across the North Atlantic at about 25°N, reported that the AMOC had slowed by about 30% in that time. However, this result has been confounded by more recent measurements from the RAPID array at 26°N that have shown that both intra-annual (Cunningham et al. 2007; Rayner et al. 2011; Wunsch and Heimbach 2006) and interannual (McCarthy et al. 2012) variability in the AMOC are substantial. In addition, a longer set of observations supported by an ensemble hindcast from a finescale ocean model showed that transport through the Faroe Bank Channel had been largely steady since the 1950s. These results show that estimates of the changes in ocean circulation from a limited set of observations are likely to be unreliable.
One way to overcome the problem is to use ocean reanalysis, a technique that synthesizes ocean models with historical observations of the ocean and hindcasts of atmospheric forcing, to produce estimates of the ocean state. Balmaseda et al. (2007), using output from the European Centre for Medium-Range Weather Forecasts (ECMWF) Ocean Reanalysis, system 3 (ORA-S3), found a decline in the AMOC of about 4% decade⁻¹ between 1959 and 2008. By contrast, Wang et al. (2010) found a small increase of 2 Sv (1 Sv ≡ 10⁶ m³ s⁻¹) in the German contribution to Estimating the Circulation and Climate of the Ocean (GECCO) ocean reanalysis, and the National Centers for Environmental Prediction (NCEP) ocean reanalysis produced an increase in the AMOC from 1980 to 1995 followed by a reduction from 1995 to 2008 (Huang et al. 2012). Thus, despite their sophistication, even ocean reanalyses can produce conflicting results.
To date, most intercomparisons of reanalyses have looked at the ocean state rather than ocean transports. A review of nine ocean analyses (including two that were not reanalyses per se) found a mean increase in the global heat content of the upper ocean of 0.24 W m⁻² between 1960 and 1992, but that in the North Atlantic the subtropical gyre had warmed while the subpolar gyre (SPG) had cooled (Carton and Santorelli 2008). Lee et al. (2009) reviewed the temperature and salinity signals between 1992 and 2002 from eight reanalyses and found a relatively small spread in the temperature of the upper 300 m of the North Atlantic. More recently, investigations of the transport in the AMOC found that most ocean reanalyses appear to reproduce it fairly well (Munoz et al. 2011; Haines et al. 2013).
This study takes this analysis further and asks whether the results of Munoz et al. (2011) hold for circulation in the rest of the North Atlantic and for the AMOC as a whole. The objective is to establish confidence in six different ocean reanalyses (see Table 1) in order to determine what change has occurred in the strength of the AMOC over the last 50 years. All the reanalyses assimilate historical temperature, salinity, and altimeter data but not transport, which thus becomes an independent measure of skill. Transport also provides a useful test since it effectively examines the balance of forces within a model. While a realistic transport does not guarantee that the right forces are in balance, an unrealistic one is a matter for concern. Since decadelong observation sets of transport in the North Atlantic now exist, it is timely to test the reanalyses against them.
A major effort has been made in recent years to monitor and measure the strength of the AMOC in the subtropical gyre at 26°N, where the maximum northward oceanic heat transport occurs (e.g., Cunningham et al. 2007). The observation set from the RAPID array has been used for validation by most reanalyses and is also used here. The AMOC is believed to be driven in part by deep-water formation at high latitudes (e.g., Broecker 1991; Rahmstorf 1995; Zickfeld et al. 2007), so it is reasonable to expect an ocean reanalysis to reproduce a sufficiently accurate heat transport farther north, in particular in the regions of deep convection that are found in the Labrador Sea (e.g., Dickson et al. 2008) and the Nordic seas (e.g., Karstensen et al. 2005). Thus, a necessary (if not sufficient) requirement for confidence in a reanalysis is that it should also represent with a reasonable degree of accuracy the mean transports from the high-latitude regions of buoyancy forcing. For this reason, we investigate the skill of reproducing transport in the SPG as well as in the subtropical gyre.
Compared to the simple structure of the AMOC, which at 26°N can be represented with a 2D meridional section of zonal average velocities, the circulation of the SPG in the northern North Atlantic is convoluted and merits a detailed examination in section 3. The ability of the reanalyses to reproduce this circulation qualitatively is also discussed here. Quantified comparisons of simulated transports and mixed layer depths against observations are given in sections 4 and 5, respectively, and from these comparisons an objective system is used to rank the reanalyses in section 6, which makes it possible in section 7 to assess the likely quality of long-running hindcasts of the AMOC. Finally, the results of this work are brought together in section 8. The paper follows with a description of the reanalyses, while the observation sets are described as they appear in other sections.

2. Ocean reanalyses
Ocean reanalyses have been developed by several groups under the auspices of the Global Ocean Data Assimilation System (GODAS; for an overview, see Lee et al. 2009). A subset of six published ocean reanalyses has been selected to maximize the diversity of their base ocean models, horizontal and vertical resolution, and assimilation methods (see Table 1 for details). Most of the products (or outputs) were obtained from the Climate Variability and Predictability (CLIVAR) ocean synthesis directory (online at http://icdc.zmaw.de/easy_init_ocean.html) apart from those for the Decadal Prediction System (DePreSys), which were made available by the Met Office. Crucially, none of the reanalyses assimilates transports (Table 1), but almost all assimilate salinity, temperature, and altimeter data from standard database sources such as the World Ocean Database. Horizontal resolution ranges from 18 km to 1.25° (up to 125 km) and invariably the vertical resolution is very fine near the surface (typically 10 m) but becomes much coarser by middepths, with some models having layer thicknesses of over 700 m. This choice of vertical resolution seems to reflect the requirements of these reanalyses, which often focus on the air-sea interface to the possible detriment of the bottom boundary currents.
The objectives of the reanalyses were wide ranging, from forecasting El Niño (NCEP and ORA-S3, although the latter also claims skill with the AMOC and the SPG) to simply reproducing the ocean mean state [Simple Ocean Data Assimilation (SODA) and Estimating the Circulation and Climate of the Ocean (ECCO) products]. Uniquely, DePreSys is based on a coupled climate model [the Met Office Hadley Centre Coupled Model, version 3 (HadCM3)] that has been adapted to assimilate ocean observations with the aim of improving decadal climate projections. Further information can be found from the references cited in Table 1.
The ocean models used in the reanalyses have biases that are likely to be reflected in their transports. In addition, since the amount of assimilated data has increased enormously in time the manifestation of bias in the reanalyses is likely to be time dependent. We have no way of knowing the true impact on trends of the shortage of historical data because the reanalyses are not perfect tools, but we can test their ability to describe the present known transports and circulation patterns before using them to quantify changes in the AMOC.
3. Circulation of the subpolar gyre

a. Circulation pathways
The SPG is a cyclonic circulation of the surface and intermediate water masses that is centered in the Labrador Sea and Irminger Basin but reaches across to the eastern side of the northern North Atlantic and interacts with the circulation in the Nordic seas to the north of the Greenland-Scotland ridge (GSR; Figs. 1 and 2; e.g., Olsen and Schmith 2007). The forcing of the system comprises a mean cyclonic wind stress over the area coupled to a combination of deep convection in the Nordic and Labrador Seas (e.g., Bacon 1998; Hakkinen and Rhines 2009), which is probably augmented by convection in the Irminger Basin (Pickart et al. 2003) and entrainment on the southern flanks of the GSR.
The bottom-following outflows from the Denmark Strait and Faroe Bank Channel transport cold dense water cyclonically around the edge of the Labrador Sea and discharge as the deep western boundary current (DWBC; Fischer et al. 2010). Water that is convected to intermediate depth either recirculates through the Irminger Basin or is lost to the east (Yashayaev and Dickson 2008). Surface water that is advected northward across the GSR by the meridional density gradient is a combination of cool SPG water from the west and warm subtropical gyre water from the south (e.g., Hansen et al. 2010). It replaces the water that sinks in the Nordic seas and which outflows through gaps in the GSR as a precursor to the DWBC. As this cold water sinks on the southern flanks of the GSR, it at least doubles its volume by entraining warm surface water from the south (e.g., Hansen and Østerhus 2000). Thus, the overturning circulation immediately to the south of the GSR is about twice the size of the thermohaline exchange across it. It is through this complex region of forcing that the overturning water of the AMOC makes its way south.

b. Midwater drifter observations of the circulation
Streamlines of the mean flow fields of intermediate-depth water in the SPG (Lavender et al. 2005) were derived from over 200 neutrally buoyant floats deployed to drift at depths of between 400 and 1500 m between 1994 and 2002, although the majority of floats were only operational between 1995 and 1997 (Fig. 2). The floats surfaced at regular intervals to make a CTD profile and report position, from which an objective analysis of the dynamic height and flow field at 700 m was derived. Although the internal consistency of the flow field was confirmed, no specific accuracy was reported. The floats reveal the pathway of Iceland-Scotland overflow water as it flows westward along the bottom from the Faroe Bank Channel around the Reykjanes Ridge to join the Denmark Strait overflow water that circulates cyclonically around the edge of the Labrador Sea and on toward the Grand Banks. The subpolar front shows up as a pathway for drifters traveling eastward at about 52°N and then northeastward toward the Iceland Basin.

c. Qualitative evaluation of the reanalyses in the SPG
The overall pattern of circulation in the SPG at 700 m provides a good qualitative basis for examining the reanalyses. Velocities from all six of them at the nearest depth to 700 m were time averaged from 1995 to 1997 and area averaged to a resolution of 2° × 1° (about 100 × 100 km² at 60°N) to give equivalent simulated velocities to the Lavender et al. (2005) analysis (Fig. 3). Around Greenland, three of the lower-resolution reanalyses [NCEP, ORA-S3, and ECCO-Jet Propulsion Laboratory (JPL)] have weak circulations, and the boundary current produced by DePreSys appears to be driven by excessively strong currents between Iceland and Greenland. Apart from NCEP, the lower-resolution models have very weak cyclonic circulation in the Labrador Sea.
By contrast, the deep circulation patterns produced by the finer-scale SODA and ECCO phase 2 (ECCO2) reanalyses are both qualitatively and quantitatively similar to that observed.

4. Direct transport estimates

a. In situ observations
Historical estimates of ocean transport have mainly used the geostrophic method in which internal dynamic heights and hence vertical shear across a section are determined from conductivity-temperature-depth (CTD) observations. There are weaknesses in this method and in the last couple of decades such determinations have been augmented with direct measurements using in situ recording current meters. These measurements have been converted into transport time series and provided by the data originators who have the expertise and local knowledge required to undertake such conversions. Although there are many sources of error and bias (e.g., the spacing between moorings may be too large) not all originators quote accuracies for their calculations.
Since 2004, continuous measurements of the internal meridional transports at the 26°N RAPID array (red line R in Fig. 1) have been made using a combination of basin-scale geostrophy and moored instruments deployed either side of the Atlantic, with an allowance made for the wind-driven surface currents (e.g., Bryden et al. 2009; Cunningham et al. 2007). One of the principal limitations of the geostrophic method is that it does not determine the background barotropic current, and furthermore in the shallow (and barotropic) slope regions it is entirely inappropriate. At 26°N these problems are resolved by assuming zero net meridional transport across the section and deploying current meters on the western slope of the ocean. The array gives a mean transport for the period 2004-12 of 17 Sv, which is decreasing at a rate of about 0.55 Sv yr⁻¹ (Smeed et al. 2013).
Direct measurement of the exchange across the shallow waters of the Greenland-Scotland ridge (Fig. 1) depends almost exclusively on the use of current meters. The current flowing around the northwest slope of Iceland has been measured since 1994 using five single point recording current meters (RCMs) deployed at three mooring sites (red line A in Fig. 1; mean of 0.9 Sv; Jonsson and Valdimarsson 2005). The inflow across the Iceland-Faroe Ridge has been inferred from four acoustic Doppler current profilers (ADCPs) deployed over the northern slope of the Faroes (line B in Fig. 1) since 1997 (mean of 4.6 Sv; Hansen et al. 2003). The inflow through the Faroe-Shetland Channel has been computed since 1994 from altimetry observations of sea surface height that are calibrated against five long-term moored ADCP records deployed across the channel (Berx et al. 2013). Its mean net transport (2.7 Sv) comprises a surface inflow of 3.5 Sv on the Shetland side that is countered by an outflow of 0.8 Sv on the Faroese side. In all three sections the estimates of mass, heat, and salt transport are supported by regular, if occasional, CTD sections. These continuous measurements of the surface transport across the GSR form the longest direct record of transport in the overturning circulation of the North Atlantic. Over the full period of observation to 2012, the mean and interannual standard deviations in the surface transports were 8.2 ± 0.6 Sv.
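The branch means quoted above can be checked against the ridge total with simple arithmetic. The following Python sketch uses only the figures given in the text; the variable names are illustrative and not taken from any of the cited datasets.

```python
# Mean surface transports across the Greenland-Scotland ridge, in Sv,
# using the branch values quoted in the text.
north_iceland = 0.9      # line A, northwest slope of Iceland
iceland_faroe = 4.6      # line B, Iceland-Faroe Ridge
shetland_inflow = 3.5    # Faroe-Shetland Channel, Shetland side
faroese_outflow = 0.8    # Faroe-Shetland Channel, Faroese side

# Net inflow through the Faroe-Shetland Channel.
faroe_shetland_net = shetland_inflow - faroese_outflow

# Total surface inflow across the ridge is the sum of the three branches.
gsr_total = north_iceland + iceland_faroe + faroe_shetland_net

print(f"Faroe-Shetland net: {faroe_shetland_net:.1f} Sv")
print(f"GSR total: {gsr_total:.1f} Sv")
```

The sum of the three branches reproduces the 8.2 Sv mean used as the observational benchmark for the ridge.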
The deep recirculating cyclonic boundary current in the Labrador Sea has been measured since 1997 by the Institute for Marine Science-Research Center for Marine Geosciences (IFM-GEOMAR) with a mix of RCMs and ADCPs and (initially) five moorings that were gradually reduced to one from 2003 (Fischer et al. 2004, 2010). Its mean transport through the line D to the surface in the 3 yr up to 1999 was 39 Sv (Fig. 1 and Table 2). There follows a gap in the database to 2009 after which IFM-GEOMAR only report the transport in the outflow below about 1000 m. For consistency with elsewhere in this paper, these later results have been ignored.
Transport in the DWBC across and east of the eastern United States and Canadian shelf edge was monitored from September 2008 to August 2009 using an array of bottom pressure recorders and temperature and salinity moorings in the RAPID-funded Western Atlantic Variability Experiment (WAVE) array. More datasets may become available in due course. The observations were made along line W (Fig. 1). The mean DWBC transport was 18 Sv (Table 2), close to the AMOC transport at 26°N.

b. Evaluation methodology
Care has been given to the method of evaluating transport through a section. Ocean reanalyses report the strength of the MOC by using zonal averages across the Atlantic at different depths to produce a two-dimensional slice of the overturning streamfunction (see, e.g., Munoz et al. 2011). The weakness of this approach is that the streamfunction decays sharply near the GSR where the MOC transforms from a vertical to a quasi-horizontal circulation as the surface inflow on its eastern side is complemented by the surface outflow on its western side. In such circumstances zonal averages of transport tend to zero. This problem has been overcome by assuming that the inflow is formed from the sum of the values at all grid points where transport has the same sign. This method thus allows large recirculations to count as inflows. It forms an upper bound on the reanalyzed transports because, where they are weak and there is a strong permanent eddy or substantial grid noise, small-scale recirculations may produce a spurious positive transport.
The location of the line along the GSR was determined by great circle arcs between five points (58°N, 5°W; 62°N, 7°W; 65°N, 14°W; 65°N, 25°W; and 69°N, 32°W), which pass along the ridge in the six reanalyses (magenta lines in Fig. 3). Each great circle arc was approximated by a number of points each about 5 km apart (depending on the grid size and length of arc). Reanalysis data were first annually averaged. At each grid point and depth level (labeled i and k, respectively), the transport $S_{ik}^{+}$ normal to the line was computed as $S_{ik}^{+} = (V_{ik}\cos\theta_i - U_{ik}\sin\theta_i)\,x_i T_{ik}$ when the right-hand side was positive and as $S_{ik}^{+} = 0$ otherwise. The terms $V_{ik}$ and $U_{ik}$ are the annually averaged meridional and zonal velocity components taken from the nearest point (along a great circle) on the reanalysis grid, $\theta_i$ is the angle that the great circle arc makes with lines of latitude at point i, $x_i$ is the length of the arc (approximately 5 km for points on the arc and half this for the end points), and $T_{ik}$ is the vertical thickness of each cell. The total northward flow across the line $S^{+}$ is then $S^{+} = \sum_i \sum_k S_{ik}^{+}$.
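The positive-only summation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the paper: the function name, array layout, and example values are hypothetical, and the velocities are assumed to have already been sampled at the section points.

```python
import numpy as np

def positive_normal_transport(v, u, theta, dx, dz):
    """Sum of the positive normal transports across a section, in Sv.

    v, u  : (n_points, n_levels) annual-mean meridional and zonal velocity (m/s)
    theta : (n_points,) angle between the section and lines of latitude (rad)
    dx    : (n_points,) arc length represented by each point (m)
    dz    : (n_points, n_levels) cell thickness (m)
    """
    # Velocity component normal to the section at every point and level.
    normal = v * np.cos(theta)[:, None] - u * np.sin(theta)[:, None]
    # Transport through each cell in m^3/s.
    s = normal * dx[:, None] * dz
    # Keep cells with positive transport only; the rest count as zero.
    s_plus = np.where(s > 0.0, s, 0.0)
    return s_plus.sum() / 1.0e6  # 1 Sv = 1e6 m^3/s

# Hypothetical two-point, one-level section with equal and opposite flow.
v = np.array([[0.1], [-0.1]])
u = np.zeros_like(v)
theta = np.zeros(2)
dx = np.array([5000.0, 5000.0])
dz = np.full((2, 1), 100.0)

net_sv = (v * dx[:, None] * dz).sum() / 1.0e6
plus_sv = positive_normal_transport(v, u, theta, dx, dz)
print(net_sv, plus_sv)
```

In this toy case the net transport cancels while the positive-only sum does not, which is why the method recovers an inflow where a zonal average tends to zero, and also why it acts as an upper bound when recirculations are present.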
Transports in the SW corner of the Labrador Sea (line D; Fig. 1) were computed using a similar method with great circle arcs computed between 51.25°N, 58.2°W and 55.25°N, 46.25°W. The section matches the mooring array located there but has been extended because the currents (and hence the total boundary transports) in the reanalyses are often wider than those observed. At both 26° (line R) and 43°N (line W) the zonal average streamfunction between 80°W and 0° was used to compute transports. Values were taken from the streamfunction at those latitudes closest to 26° and 43°N, respectively. Both the mean and variability (standard deviation) of the annual mean transports of the six reanalyses have been computed (Table 2). The finite nature of the grids of the ocean reanalyses meant that the lines could not coincide exactly with the locations of the observation sets, so additional transports were also computed north and south of them. This was done by adjusting the latitude coordinates of all the points in the great circle arcs and streamfunctions by 0.5°, with the exception of the end point at northwest Scotland. The latitude within ±0.5° of the central line at which the transports were closest to the observed values was used in all further analysis.
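The choice among the central line and its two shifted copies amounts to picking the offset whose mean transport lies nearest the observed value. A minimal Python sketch of that selection follows; the function name and the example transports are hypothetical and are not values from Table 2.

```python
def closest_offset(transport_by_offset, observed):
    """Return the latitude offset (degrees) whose transport is nearest `observed`."""
    return min(transport_by_offset,
               key=lambda offset: abs(transport_by_offset[offset] - observed))

# Hypothetical mean transports (Sv) on lines shifted by -0.5, 0, and +0.5 degrees.
example = {-0.5: 13.8, 0.0: 14.6, 0.5: 15.9}
best = closest_offset(example, observed=17.0)
print(best)
```

Because the offset is chosen to minimize the mismatch, the comparison is deliberately generous to each reanalysis.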

c. Evaluation of the reanalyzed transports
If the observed transports are accurate, then a credible reanalysis should be able to reproduce the observed mean and variability of the transport across critical sections with sufficient accuracy. However, it is difficult to test the variability or trends in transport directly since some of the observation sets have only been in place for a few years and even the longest (across the GSR) is much shorter in duration than the multidecadal scales of variability in the Atlantic. So, before considering variability, the mean transports from the reanalyses are compared against the observations. Those that compare poorly will be less valuable because they are more likely to have the wrong balance of accelerating and decelerating forces in the AMOC even if they are satisfactory in other respects.
The northward ocean heat transport across the GSR, which is driven by advection, is proportional to temperature in degrees Celsius given that the balancing return flow is at about 08C. Errors in the reanalyzed temperatures are likely to be small because they are directly assimilated into the reanalyses. Thus, a large transport error will dominate any heat transport error across the ridge. Where this happens, the conditions that lead to forcing by deep convection in the Nordic seas will necessarily be determined from an incorrect implementation of at least one of the other relevant processes and confidence that the reanalysis can produce accurate hindcasts (and forecasts) will be weakened. For this reason, an important test is that the transport across the GSR is reasonably accurate.
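The argument that transport error dominates heat transport error can be illustrated with the standard advective estimate, heat transport = density × specific heat × volume transport × (inflow temperature − return temperature). The Python sketch below is illustrative only: the density, specific heat, and the 8°C inflow temperature are assumed typical values rather than figures from this paper, while the 8.2 Sv transport and the 29% overestimate are taken from the text.

```python
RHO = 1027.0   # seawater density, kg m^-3 (assumed typical value)
CP = 3985.0    # specific heat of seawater, J kg^-1 K^-1 (assumed typical value)

def heat_transport_pw(transport_sv, inflow_temp_c, return_temp_c=0.0):
    """Advective heat transport (PW) for a volume transport given in Sv."""
    watts = RHO * CP * transport_sv * 1.0e6 * (inflow_temp_c - return_temp_c)
    return watts / 1.0e15

reference = heat_transport_pw(8.2, 8.0)           # 8.0 degC inflow is hypothetical
too_strong = heat_transport_pw(8.2 * 1.29, 8.0)   # transport 29% too large
too_warm = heat_transport_pw(8.2, 8.2)            # inflow 0.2 degC too warm (hypothetical)

print(f"transport error effect:   {too_strong / reference - 1.0:+.1%}")
print(f"temperature error effect: {too_warm / reference - 1.0:+.1%}")
```

With a return flow near 0°C the heat transport scales linearly with both factors, so a transport error of tens of percent outweighs a temperature error of a few tenths of a degree.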

1) THE NORTHWARD TRANSPORT OF THE AMOC AT 26°N
The 1961-2000 average transport at 25°N from the slightly different set of ocean reanalyses (Munoz et al. 2011) ranged from 10 to 21 Sv with a cluster around 14 Sv. In the present calculations the equivalent transport at 26°N is similar (15 Sv) but is also smaller than the 17 Sv observed there more recently (Table 2). Of our six reanalyses, three (ORA-S3, ECCO-JPL, and SODA) are within about 10% of the observed transports while ECCO2 and NCEP (after shifting the line north by 0.5°) are within 25%, but the DePreSys transports are much weaker. In general, the magnitude of the overturning transport in the reanalyses does not change much within ±0.5° of latitude.

2) TRANSPORTS ACROSS THE GSR
Somewhat surprisingly, given the relative accuracy at 26°N, there are large differences between the reanalyzed and the observed transports in both their mean and their variability across the GSR (Fig. 4 and Table 2). No reanalysis comes within 10% of the observed mean and only ECCO2 is within 25%. SODA is a little larger than the 10.3-Sv 25% upper limit (at −0.5°) and ORA-S3 and ECCO-JPL are less than the 6.1-Sv 25% lower limit (also at −0.5°). DePreSys has a ridge derived from HadCM3 (Roberts and Wood 1997) that is unrealistically deep near Greenland, which may explain why it is the only reanalysis to have a transport that is much too large (by a factor of 3). The NCEP reanalysis is ignored here because it is cut off at 65°N, which is too close to the ridge.
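The 10% and 25% limits used above follow directly from the observed mean of 8.2 Sv. The Python sketch below reproduces the band edges and classifies the two high-resolution reanalyses using the percentage errors quoted in the abstract (22% for ECCO2 and 29% for SODA); the function names are illustrative.

```python
def tolerance_band(observed, fraction):
    """Lower and upper limits for a given fractional tolerance."""
    return observed * (1.0 - fraction), observed * (1.0 + fraction)

def classify(simulated, observed):
    """Place a simulated mean transport in the 10% / 25% accuracy classes."""
    error = abs(simulated - observed) / observed
    if error <= 0.10:
        return "within 10%"
    if error <= 0.25:
        return "within 25%"
    return "outside 25%"

OBSERVED_GSR = 8.2  # Sv, observed mean surface transport across the ridge

low, high = tolerance_band(OBSERVED_GSR, 0.25)
print(f"25% band: {low:.2f} to {high:.2f} Sv")

ecco2 = classify(OBSERVED_GSR * 1.22, OBSERVED_GSR)  # 22% too big
soda = classify(OBSERVED_GSR * 1.29, OBSERVED_GSR)   # 29% too big
print(ecco2, "|", soda)
```

The band edges of 6.15 and 10.25 Sv correspond to the 6.1- and 10.3-Sv limits quoted in the text after rounding.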
The interannual variability in the remaining five reanalyses is between 0.2 and 0.5 Sv, which is somewhat less than the 0.7 Sv observed (Fig. 4 and Table 2). The reason for these relatively small variabilities is not entirely clear, although one cause could be error in the observations due to mesoscale activity: for example, the annual mean transport in the Faroe-Shetland Channel is accurate to ±0.2 Sv (Sherwin et al. 2008). The internal Rossby radius in the vicinity of the ridge is on the order of 10 km, much smaller than all model grids except ECCO2, so mesoscale variability should be smaller in the reanalyzed transports.

3) THE BOUNDARY CURRENT IN THE LABRADOR SEA
The 1997-99 average of the directly measured southward boundary transport through line D in Fig. 1 (39 Sv; Table 2) has been extended with additional data that include more recent current meter and CTD observations that put it at 37 Sv (Fischer et al. 2010). Of this the average southward export of the SPG from the Labrador Sea is estimated to be about 17 Sv, which leaves about 20 Sv to recirculate within the gyre. The observed variability between 1997 and 1999 of 3 Sv is based on too short a dataset for meaningful comparison with the reanalyses. However, it is encouraging to see that SODA, DePreSys, and NCEP all show an overall reduction in southward transport that is similar to that observed (Fig. 4).
The mean outflows from the Labrador Sea in DePreSys, SODA, and ECCO2 are all close to that observed, and the mean outflow from NCEP is within 18% (Table 2 and Fig. 3). By contrast, the transports from ORA-S3 and ECCO-JPL are much too small.

4) THE DEEP WESTERN BOUNDARY CURRENT AT 43°N
Some care is needed in evaluating the reanalyses at 43°N as the observational record (18 Sv) is based on a single year. Nevertheless, two reanalyses (ORA-S3 and ECCO-JPL) have transports that are within 10% of that observed while all the remainder are within 25% of it. Year-on-year variability is around 1 Sv in all but ORA-S3, where it is somewhat larger (2 Sv). None of the reanalyses show great sensitivity to adjusting the line by ±0.5° latitude.

5. Mixed layer depths
The two high-resolution analyses may have better simulations of the observed transports than the low-resolution ones because they are better at assimilating temperature and salinity. While it is beyond our scope to undertake a forensic investigation of this possibility, a relatively simple approach is to investigate the high-latitude winter mixed layer depth (MLD), which, although not a direct measure of the meridional pressure gradient or the rate of deep-water formation, may be a convenient proxy for them.
High-latitude deep-water formation can be summarized as the result of a competition in the surface that is mainly between overturning due to winter convective cooling and summer stratification due to freshwater inflow (e.g., Rahmstorf 1995). However, the importance of deep-water formation to the strength of the AMOC remains a matter of conjecture. A survey of 12 leading climate scientists placed it third of the physical oceanic processes controlling the AMOC (behind heat fluxes and diapycnal mixing) and less important than atmospheric freshwater transport (Zickfeld et al. 2007).  Fischer et al. (2010) found that the weakening of deep convection, and the decreased vertical extent of the winter mixed layer in the Labrador Sea, has not changed the strength of the DWBC. Although there was a weakening of deep convection in the Nordic seas in the 1990s, the strength of the overturning across the GSR has remained almost constant over the last 15 years, probably a result of increased outflow from the Arctic (Karstensen et al. 2005).

a. Methodology
The observed MLD was taken from the mean climatology between 1900 and 1992 derived by Monterey and Levitus (1997, hereafter ML97), who define MLD to be the depth at which potential temperature is 0.5°C cooler than at the surface. ML97 capped their MLD values at 1000 m.
To simplify the analysis, area-averaged MLDs were computed for four regions: three subregions, namely (i) the open-ocean SPG, (ii) the Labrador Sea (LS), and (iii) the Nordic seas (NS), together with (iv) the northern North Atlantic as a whole, which is a combination of the three subregions (see Fig. 5 and Table 3). The split between the Nordic seas and the SPG is reasonably well defined by the line of the ridge. In contrast, the split between the Labrador Sea and the SPG is somewhat arbitrary but was chosen to put the observed maximum MLD south of Cape Farewell (Greenland) into the Labrador Sea region.
For the purpose of comparison with ML97 the reanalysis potential temperature data for the period 1994-2007 were conservatively area averaged to the 1° × 1° grid of ML97 and monthly average (10-day average for ECCO2) temperature profiles were computed. The MLDs from these profiles were defined using linear interpolation, in an analogous manner to ML97, and capped at 1000 m. They were then averaged over January-March (JFM) to give a mean winter MLD. The monthly averages of the long-term changes were computed with the 1000-m capping restriction relaxed. They were then averaged to give uncapped MLDs that are generally deeper than their capped counterparts (Table 3). However, the two methods of calculation are not commutative and in ORA-S3 the capped depths are less than the uncapped ones. Time series were computed from the monthly means for each of the four regions and then converted to anomalies relative to the 1994-2007 average.
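The MLD criterion described above (the depth at which potential temperature is 0.5°C cooler than at the surface, found by linear interpolation and optionally capped at 1000 m) can be sketched in Python as follows. This is a minimal illustration, not the code used by ML97 or in this study: the function name and example profiles are hypothetical, the shallowest level is assumed to stand in for the surface, and a profile that never meets the criterion is assumed to be mixed to its deepest level.

```python
import numpy as np

def mixed_layer_depth(depth, theta, delta=0.5, cap=1000.0):
    """Depth (m) at which potential temperature first falls `delta` degC
    below its value at the shallowest level, by linear interpolation.
    The result is limited to `cap`; pass cap=None for the uncapped variant."""
    depth = np.asarray(depth, dtype=float)
    theta = np.asarray(theta, dtype=float)
    target = theta[0] - delta
    below = np.nonzero(theta <= target)[0]
    if below.size == 0:
        mld = depth[-1]  # criterion never met: assume mixing to the deepest level
    else:
        k = below[0]     # first level at or below the target temperature (k >= 1)
        frac = (theta[k - 1] - target) / (theta[k - 1] - theta[k])
        mld = depth[k - 1] + frac * (depth[k] - depth[k - 1])
    return float(mld) if cap is None else float(min(mld, cap))

# Hypothetical shallow profile: the criterion is met between 100 and 200 m.
shallow = mixed_layer_depth([5, 50, 100, 200], [6.0, 5.9, 5.7, 5.0])

# Hypothetical deep-convection profile: capped and uncapped values differ.
deep_depths, deep_theta = [5, 1500, 3000], [3.0, 2.9, 2.0]
capped = mixed_layer_depth(deep_depths, deep_theta)
uncapped = mixed_layer_depth(deep_depths, deep_theta, cap=None)
print(shallow, capped, uncapped)
```

Capping only alters profiles with deep convection, which is consistent with the cap mattering most in the Labrador Sea.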

b. Evaluation of the reanalyzed winter MLDs with capped depths
The spatial distribution of the ML97 MLDs and the capped MLDs from the three reanalyses that have the longest datasets (SODA, DePreSys, and ORA-S3) are shown in Fig. 5. ML97 (Fig. 5a) has a small region with a very deep mixed layer in the Labrador Sea, a broad region where the MLD is in the range of 400-700 m in the SPG (close to the GSR and to the west of Ireland), and a deep layer in the central Nordic seas and off the coast of Norway. ORA-S3 (Fig. 5b) compares poorly with ML97, with no deep layer in the Labrador Sea, too shallow a one near the GSR, and a small region with a deep layer in the Nordic seas. By contrast, DePreSys does much better (Fig. 5c), missing some of the deep mixed layer off the coast of Norway and being somewhat shallower in the Labrador Sea than ML97. SODA also has a comparable, although more widespread, distribution to ML97 (Fig. 5d) but with a deeper layer in the Labrador Sea and some detailed differences in the Nordic seas.
The area mean MLDs in DePreSys, SODA, and ECCO2 (Table 3) are close to the ML97 climatology across the three smaller regions as well as the northern North Atlantic as a whole. Without deep convection in the Labrador Sea ORA-S3 is somewhat too shallow across the northern North Atlantic. ECCO-JPL compares well with ML97 in the Labrador Sea but is deeper in the Nordic seas and much too deep in the SPG. Finally, NCEP's capped MLD is too deep across the SPG and in the Labrador Sea.
In the two most accurate MLD reanalyses (from SODA and ECCO2) the difference between the capped and uncapped MLDs is least in the SPG (<20 m; Table 3), where much of the mixing is driven by the entrainment of cold water flowing over relatively shallow isobaths across the GSR. In the Nordic seas, capping reduces the reanalyzed MLDs by about 30 m but in the Labrador Sea, where deep winter convection is significant, its impact is quite large and the uncapped MLDs are deeper by between 78 (DePreSys) and 149 m (SODA).

c. Uncapped winter MLDs
There is a tendency among the reanalyses for transport across the GSR to increase with uncapped MLDs in the Nordic seas (Tables 2 and 3). However, this correlation is weak so that DePreSys has a very large transport compared to its MLD while the reverse is true for ECCO-JPL. This poor correlation may highlight the difficulties that the different models face in representing the shallow water of the GSR. However, it is also very difficult to see any relationship between the average MLDs in the North Atlantic and transports at 26°N, where such constraints are less severe. (If anything, transport is inversely related to MLD.) Thus, while MLD is a relatively simple metric to compute, it does not appear to relate directly to the size of the reanalyzed circulation of the higher latitudes of the North Atlantic. The apparent variation in the MLD that is required to drive the AMOC probably reflects an inherent truth that the transport is determined by a delicate balance between a number of different accelerating and decelerating forces, the analysis of which is beyond the scope of the present investigation.

An intercomparison of the reanalyses
The short duration of many of the transport time series compared with the time scales of ocean forcing, plus the complication of seasonal and mesoscale variability, creates a problem for the validation of the reanalyses using observations at just one latitude. It is only when the mean transports across the GSR are combined with those at 26°N and elsewhere that the increase in the number of datasets makes an objective distinction between the different reanalyses in the northern North Atlantic possible.
A simple and objective ranking has been adopted by quantifying the ability of the reanalyses to reproduce the magnitude of the observed mean transport at each of the four sections, along with the capped MLDs. Arbitrary limits of ±10% and ±25% in the differences between reanalyzed data and its equivalent observation appear to be sufficient to differentiate their overall skill. No specific claim is made for these limits except that they reflect the lower and upper estimates of observational uncertainty and are sufficiently sensitive to differentiate the relative accuracy of the reanalysis products. To clarify this discussion, the qualifier ''good'' is used if a reanalyzed transport is within 10% of the observation, ''poor'' is used if it falls outside ±25%, and ''reasonable'' is used otherwise.
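The scoring rule above can be written as a short function. This is a minimal sketch; the function name and the treatment of values lying exactly on a limit (counted as inside it) are assumptions made for illustration. The example values are the SODA transport just south of the GSR (10.6 Sv) and the observed GSR transport (8.2 Sv) quoted in this paper.

```python
def classify(reanalyzed, observed, good_limit=0.10, poor_limit=0.25):
    """Return 'good', 'reasonable', or 'poor' from the fractional
    difference between a reanalyzed value and its observation."""
    fractional_error = abs(reanalyzed - observed) / abs(observed)
    if fractional_error <= good_limit:
        return "good"
    if fractional_error <= poor_limit:
        return "reasonable"
    return "poor"


# SODA near the GSR: (10.6 - 8.2) / 8.2 is about 29%, beyond the 25% limit.
soda_gsr_score = classify(10.6, 8.2)
```

The same function applies unchanged to the capped MLDs, since only the fractional difference from the observed or climatological value is used.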
On this basis both high-resolution reanalyses perform fairly well. The transport in ECCO2 is good across the GSR and reasonable elsewhere while its MLDs are all either good or reasonable. SODA has two good transports (Labrador Sea and 26°N) and one reasonable transport (43°N), and all its MLDs are good. Its transport 0.5° south of the GSR is only 5% too large to be classified as reasonable, and a visual inspection of the section revealed high wavenumber oscillations so that the transport quoted in Table 2 (10.6 Sv) may well be an upper bound. Given the shortness of the records and that they have reasonable patterns of circulation in the SPG, we conclude that the products of both high-resolution reanalyses are broadly consistent with the true transports in the northern North Atlantic.
All low-resolution reanalyses perform relatively poorly. ECCO-JPL is the best, with good transports at 43° and 26°N, but it is poor in both the Labrador Sea and GSR as well as in three of its MLDs. The transport in DePreSys is good in the Labrador Sea and at 26°N, reasonable at 43°N, but poor (much too strong) at the GSR. (By contrast, most of its MLDs are good.) NCEP is reasonable at 43° and 26°N but is poor in the Labrador Sea, where the flow is too weak. Unfortunately, NCEP could not be evaluated across the GSR. ORA-S3 is good at 43°N and, although it is reasonable at 26°N, it is poor, because of weak flows, in the SPG and across the GSR. Furthermore, its MLD across the northern North Atlantic is poor. None of the low-resolution reanalyses provide very convincing circulation pathways in the SPG.
These results suggest that sufficiently high resolution (e.g., at least 1/3° × 1/3°) is a necessary (but not sufficient) condition for an ocean reanalysis to simulate the large-scale features of the mean Atlantic transport. Although none of the reanalyses are good everywhere, SODA and ECCO2 seem best suited to examine historical changes in the amplitude of the AMOC since they are able to reproduce the observed circulation paths in the SPG as well as the magnitude of the critical transports at 26°N. Ideally, both should have slightly smaller flows over the GSR (SODA particularly so). In addition, their interannual transport variability and MLDs are similar to those observed. Ultimately, we recognize that, even though they stand out as being more skilful than the others, future reanalyses and longer transport observation datasets may show this impression to be misplaced.

Variations in the strength of the Atlantic circulation and MLD since 1960
The aim of this work is to use independent oceanic indices from reanalysis products to draw conclusions about decadal changes in the strength of the AMOC in order to determine whether the circulation is changing in strength. This section focuses on changes in the transports and, to a lesser extent, MLDs from the three reanalyses that have made hindcasts since at least 1960 (SODA, DePreSys, and ORA-S3). None of the others extend back that far. Less weight is given to the results from DePreSys and, in particular, ORA-S3 in light of their apparent weaknesses in reproducing the observed transport of the northern North Atlantic compared to SODA.

a. Transport trends and variability
The observational record across the GSR (between 1994 and 2007) is long enough to allow a reasonable comparison to be made of all the reanalyses (except NCEP). All the reanalyzed transports are consistent with the small observed decrease in magnitude (−0.3 Sv; Table 2).
It is more difficult to make significance estimates of the reanalysis trends for the longer period from 1960 to 2007 because to do so requires an estimate of variability on multidecadal time scales. At 26°N the situation is complicated by the recent rapid decline in the strength of the AMOC (Smeed et al. 2013), which has occurred since the end of the long-term reanalyses. For systems such as the ocean, where there is significant autocorrelation, estimations of this variability are normally done using state-of-the-art climate models (e.g., Hegerl et al. 2007). Future publications will aim to make such estimates. Here the reanalyzed changes are assessed against a simple white noise model with magnitude determined from the trend residuals. Any change in a reanalyzed transport is not significant if its trend can be explained by random interannual variability.
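A white noise test of this kind can be sketched as follows. The sketch fits a least squares trend, estimates the noise variance from the residuals, and compares the trend with its standard error. The function name and the factor of 1.96 (roughly a two-sided 5% level) are assumptions made for illustration; the confidence level actually used is not stated in the text.

```python
import math


def trend_with_white_noise_test(years, values, z=1.96):
    """Least squares trend, its standard error under a white noise
    model for the residuals, and a significance flag.

    z is an assumed threshold (about the two-sided 5% level)."""
    n = len(years)
    t_mean = sum(years) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in years)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(years, values))
    slope = sxy / sxx
    intercept = v_mean - slope * t_mean
    # The noise variance is taken from the residuals about the trend line.
    residuals = [v - (intercept + slope * t) for t, v in zip(years, values)]
    noise_variance = sum(r * r for r in residuals) / (n - 2)
    slope_error = math.sqrt(noise_variance / sxx)
    return slope, slope_error, abs(slope) > z * slope_error


# Synthetic annual series (not reanalysis output): a 0.07 Sv per year
# trend plus an alternating interannual signal of amplitude 0.5 Sv.
years = list(range(1960, 2008))
transport = [15.0 + 0.07 * i + 0.5 * (-1) ** i for i in range(len(years))]
slope, slope_error, significant = trend_with_white_noise_test(years, transport)
```

Multiplying the slope and its error by 10 converts them from Sv per year to Sv per decade. Because the residuals are treated as uncorrelated, this test tends to overstate significance when the series is autocorrelated.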
The three long-term reanalyses agree that between 1 January 1960 and 31 December 2007 there was very little change in the GSR inflow (Table 4), which is also  (Table 4 and Fig. 6). At the same time the DWBC at line D in Fig. 1 increased by about 0.5 Sv decade⁻¹. These results taken together suggest that the AMOC increase in SODA has been driven by changes in the Labrador Sea, which then propagate through the Atlantic. DePreSys also shows a small increase in transport at both 26° and 43°N, although the trend is only significant at 43°N. By contrast, ORA-S3 has a statistically significant decrease at 26°N of 0.8 ± 0.2 Sv decade⁻¹ (close to the 0.7 Sv decade⁻¹ for a slightly different period in Balmaseda et al. 2007) and also at 43°N. The increase in the AMOC at 26°N reported by the more reliable reanalyses is consistent with other reported reanalysis studies (e.g., 0.5 Sv decade⁻¹ from GECCO; Wang et al. 2010; Huang et al. 2012) but conflicts with ORA-S3 and the ensemble hindcast of Olsen and Schmith (2007) (−0.4 ± 0.5 Sv decade⁻¹ since 1947). There appears to be considerable year-on-year variability in transports at all the sections (Fig. 6). With the exception of the Labrador Sea, SODA and DePreSys generally agree on the timing of the fluctuations (which are larger in SODA) while ORA-S3 has a very different pattern of variability, especially prior to 1980.

b. MLD trends and variability
The three long-term reanalyses all find that the winter MLDs in the convectively driven Nordic and Labrador Seas have deepened since 1960 (Table 5)   (where entrainment may be as important as convection) SODA and DePreSys produce very little change, while ORA-S3 has a stronger shallowing (−9 ± 4 m decade⁻¹). Over the northern North Atlantic, the best estimates suggest that in the last 50 years the MLD has deepened by between 25 and 50 m, but only DePreSys shows trends (all deepening) in MLD that are statistically significant. The MLD interannual time series also show considerable year-on-year variability superimposed on decadal variability (Fig. 7). In SODA the variability in the Labrador Sea MLD appears to be characterized by deeper MLDs in the 1970s and 1980s and shallower ones in the 1960s, 1990s, and 2000s. The Nordic seas show no systematic decadal variability while the pattern in the SPG is the mirror of the Labrador Sea with deeper MLDs in the 1960s and 1970s.
As with transport, MLD variability in DePreSys and SODA correlates in the Nordic seas and SPG but not in the Labrador Sea, where the mismatch is so great that it masks the overall variability in the northern North Atlantic. By contrast, MLD variability in ORA-S3 is generally uncorrelated with the other reanalyses.

c. Is there a relationship between MLD and transport?
If there were a simple relationship between transport and winter MLD then all reanalyses would find that winter MLD variability at high latitudes was correlated with the transports across those lines, even if they disagreed about timing and magnitude of such changes. This was tested by computing correlations between time series of MLD averaged over the four regions discussed above and the four transport lines. Correlations are statistically significant, against a white noise null hypothesis, when they are larger than 0.3. Both SODA and ORA-S3 had significant correlations between the winter MLD in the subpolar gyre and the Labrador Sea flow (not shown). However, these were of opposite sign, suggesting that any such link is dependent on the reanalysis being used. DePreSys had several significant correlations between average MLD and transport at high latitudes. However, none of these appeared in the other reanalyses, suggesting that there is no consistent pattern among the reanalyses either. Given this result, it appears that changes in winter MLD in the reanalyses are unlikely to be the primary drivers of the AMOC changes.
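The correlation test described above can be sketched as follows. The function names, the assumption of 48 annual values (1960 to 2007), and the normal approximation for the two-sided 5% critical value are all choices made for this illustration and are not taken from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def critical_r(n, z=1.96):
    """Approximate magnitude of r needed for significance when both
    series are white noise, using r = z / sqrt(n - 2 + z**2).

    z = 1.96 is an assumed normal approximation to the t statistic."""
    return z / math.sqrt(n - 2 + z * z)


# With 48 annual values the threshold is roughly 0.28, in line with
# the value of 0.3 quoted in the text.
threshold = critical_r(48)
```

A correlation between a regional MLD series and a transport series would then be flagged when `abs(pearson_r(mld, transport))` exceeds `threshold`. As with the trend test, autocorrelation in either series reduces the effective number of independent values and raises the threshold.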

Conclusions
This investigation has broken new ground by using transports to assess the skill of six ocean reanalyses to reproduce the circulation in the North Atlantic and comparing them with observed time series datasets at four key sections or boundaries. It has also compared the circulation patterns in the SPG with a synthesis of drifter observations and mixed layer depths with climatology. It has gone further than other comparisons, which in general have focused on comparing state variables, by using the objective scoring scheme in section 6 to rank the reanalyses. We maintain that it is proper to undertake such a ranking exercise because it will help the authors of reanalyses to improve their products. We hope that others will adopt this or a similar ranking scheme.
SODA was the best reanalysis with which to assess the history of the circulation in the North Atlantic from 1960 to 2007. Its small trend in GSR transport is insignificant, but elsewhere its trends are significant despite considerable interannual variability. Our analysis indicates that over the 48 years to 2007 the AMOC at 26°N has increased at a rate of 0.7 Sv decade⁻¹, or about 20% (3.5 Sv) in total. Circulations in the Labrador Sea and at 43°N also appear to have increased, although against a background of considerable variability that in the Labrador Sea includes a multidecadal signal. The other two (less reliable) long-term reanalyses agree that there has been little change in the transport across the GSR, but the AMOC in DePreSys shows a small increase (1.0 Sv at 26°N) while ORA-S3 has a larger decrease (4.5 Sv).
Changes in MLD are strongest in the Labrador Sea and weaker elsewhere in the northern North Atlantic. It would thus be tempting to take a simplistic view and say that in SODA the changes in the AMOC are being driven by commensurate changes in deep-water formation in the Labrador Sea. However, examination of the time series of the MLD and outflow from this region (Figs. 6b and 7b) suggests this is not the case, at least on decadal time scales, since the outflow did not respond consistently to the shallowing of the mixed layer in the 1970s and 1980s.
It is recognized that the transport will also be driven by changes in large-scale wind stress patterns such as those implicit in the North Atlantic Oscillation (e.g., Hurrell and Deser 2009). However, it is beyond the scope of this investigation to examine the relationships between ocean circulation and atmospheric forcing, which are likely to be quite noisy because of the chaotic nature of the atmosphere and ocean system.
Finally, the better performance of both of the high-resolution reanalyses suggests that they are capable of becoming more skilful, particularly with finer resolution and better parameterization of processes. However, one can never be sure how suited they are for historical periods when few direct observations of the ocean state existed. It is for this reason that existing time series observations of ocean transport need to be continued, particularly at the GSR, line D, and line R in Fig. 1. It is also recommended that reanalyses do not attempt to assimilate transport so that there remains a sensitive independent metric by which they can be judged.