Achieving realistic Arctic-midlatitude teleconnections in a climate model through stochastic process representation

The extent to which interannual variability in Arctic sea ice influences the midlatitude circulation has been extensively debated. While observational data supports the existence of a teleconnection between November sea ice in the BarentsKara region and the subsequent winter circulation, climate models do not consistently reproduce such a link, with only very weak inter-model consensus. We show, using the EC-Earth3 climate model, that while a deterministic ensemble of coupled simulations shows no evidence of such a teleconnection, the inclusion of stochastic parameterizations to the ocean and sea ice 5 component of EC-Earth3 results in the emergence of a robust teleconnection comparable in magnitude to that observed. We show that this can be accounted for entirely by an improved ice-ocean-atmosphere coupling due to the stochastic perturbations. In particular, the inconsistent signal in existing climate model studies may be due to model biases in surface coupling, with stochastic parameterizations being one possible remedy.

Hydrology Tiled ECMWF Scheme of Surface Exchanges over Land (H-TESSEL) (Balsamo et al. (2009)). The ocean model component is represented by the NEMO model version 3.6 (Madec and the NEMO team (2016)) which includes the LIM3 sea ice model (Vancoppenolle et al. (2012)). The atmospheric model is run at a spectral resolution of T255, which corresponds 95 to an approximate grid spacing of 80km at the equator, with 91 vertical layers. Note that this model produces a reasonable looking quasi-biennial oscillation ). NEMO is run at a resolution of around 1 • with 75 vertical layers. Note that, as discussed in Haarsma et al. (2020), the original NEMO configuration for EC-Earth3P produced an Atlantic Meridional Overturning Circulation (AMOC) with unrealistically low values: the configuration used in these experiments corresponds to the p2 configuration discussed in ibid, and therefore does have a realistic AMOC. We also note that the EC-Earth3P The stochastic simulation OCE on the other hand has three stochastic ocean schemes (Juricke et al. ( , 2018) and one 105 stochastic sea ice scheme (Juricke et al. (2013); ; ) switched on.
The stochastic ocean schemes are based on perturbations to 1. the classical Gent-McWilliams parametrization for eddy induced advection (Gent and Mcwilliams (1990)) used in coarse resolution, non eddy-resolving ocean simulations (henceforth StoGM); 110 2. the enhanced vertical diffusion parametrization which is used for unstable stratification (henceforth StoDV); 3. the turbulent kinetic energy (TKE) parametrization through which the amplitude of vertical diffusivity and viscosity are obtained (henceforth StoTKE); For the deterministic control simulation, all above mentioned parametrization schemes are used in their default, non-stochastic form.

115
The stochastic ocean schemes have been explained in detail and tested in long ocean-only simulations by Juricke et al. (2017) and have also been tested in coupled seasonal forecasts by Juricke et al. (2018). These studies showed that the StoGM and StoTKE schemes in particular can have a considerable impact on near surface variability in regions of strong horizontal gradients (StoGM) or strong atmosphere-ocean interactions (StoTKE). The StoGM scheme showed the strongest impact on variability in western boundary currents such as the Kuroshio and the Southern Ocean, as it varies the effective amplitude of 120 eddy induced temperature and salinity advection. The StoTKE scheme on the other hand had a pronounced impact on variability and mean state in the tropics and also in mid-latitudes, as it can affect the response of the mixed layer to atmospheric forcing. The StoDV scheme showed only a very limited response in these previous studies, as its variations only matter in areas of deep convection in the high latitudes and even there only at times of strong and deep convective activity. However, the effect of these schemes has so far not been tested in long (multidecadal) coupled simulations. 125 contributors when it comes to sea ice thickness distributions (Juricke et al., 2013). The StoSIS scheme simulates the variations and uncertainties in the local ice strength and can lead to either faster or slower ice convergence, where the non-linearity in the process leads to a stronger response for weak ice compared to strong ice (Juricke et al., 2013). Consequentially, ice tends to move faster under stochastic ice strength perturbations until the effect is balanced by thicker sea ice that also acts to strengthen the resistance towards plastic deformation. Furthermore, ridging tends to be stronger with StoSIS, especially along coastlines 140 if the ice is moved towards the coast (Juricke et al., 2013). However, due to the strongly coupled system consisting of sea ice, atmosphere and ocean in the high latitudes, the climatological response both with respect to mean changes in the sea ice as well as surface flux variability varies between uncoupled (large increase in sea ice volume) and coupled (balancing increase of thick ice vs decrease of thinner ice) simulations . The sensitive response of StoSIS and, consequently, sea ice dynamics to the atmospheric coupling is one of the main foci of this study. 145

Model simulations and observational data
For each of the two configurations CTRL and OCE, described in the previous section, three ensemble members were generated according to the hist-1950 experimental protocol (Haarsma et al. (2020)). Each member is therefore initialised on January 1st 1950, and run with observed anthropogenic forcing until January 1st 2015: these simulations thereby span 65 years. Differ-150 ent ensemble members are created by slightly perturbing the upper air temperatures at 5 randomly chosen grid points: the atmospheric variability of the members are found to be effectively uncorrelated after just a few months.
To estimate the impact of mean state biases, we will also consider simulations with prescribed SSTs and sea-ice. A set of three deterministic ensemble members were generated in the same way by initial condition perturbation: each member then simulates the period 1950-2015. This ensemble, and its experimental configuration, will be referred to as AMIP. Note that in 155 accordance with the HighResMIP Protocol, the prescribed forcing uses daily data, as opposed to the more common monthly forcing. This in principle allows the simulations to simulate more sub-seasonal variability.
In order to further assess the statistical significance of the teleconnection, we make use of additional deterministic coupled simulations using the slightly earlier version EC-Earth3.1, as generated for the Climate SPHINX Project ).
The data we make use of consists of three ensemble members spanning the period 1900-2015 using historical forcings. Finally, for observational data, we make use of the ERA5 data set (Hersbach et al. (2020)). The OSI450 sea ice data set ((Lavergne et al., 2019)) is also used in Section 4.1 to compare the teleconnection region identified in ERA5.

Methods
The NAO is defined, for each data set, as the leading principal component (PC) of daily, detrended geopotential height anomalies at 500hPa (zg500). The sign of the principal component is imposed to make the corresponding empirical orthogonal 165 function match the standard NAO pattern. Because we will be making use of daily data, a daily climatology is directly fitted to the data and subtracted from the daily PC. All available data is used for each data set, though no meaningful differences were found if this procedure was carried out using, e.g., only data from 1980 onwards.
Timeseries of daily sea-ice concentration (siconc) anomalies in a given region is computed by averaging over grid points, detrending and removing a directly fitted daily climatology. As with the NAO, this is done using all available data, before 170 restricting to specific time periods. Two regions will be considered in this paper: Barents-Kara (BK), defined by the box 67N-80N, 30E-75E, and Barents-Greenland (BG), defined by the box 65N-82N, 5E-60E.
Correlations are computed as Pearson correlation coefficients. In plots showing correlations at multiple grid points, statistical significance at the 95% confidence interval is computed using the standard error. In plots showing changes to the mean state at grid points, significance is estimated with a two-tailed t-test, with no equal variance being assumed. In plots showing changes 175 in the standard deviation at grid points, significance is estimated using the Bartlett test for SSTs, and the Levene test for sea ice. When computing correlations between sea ice and NAO timeseries (as in Table 11), a null hypothesis of each timeseries as an AR1 process is assumed. By fitting such a process to each timeseries and simulating 10,000 random draws, we obtain confidence intervals for the correlations expected by chance. For ERA5 and the individual ensemble members of CTRL and OCE, which have a sample size of 35 (covering the 35 complete DJF seasons between January 1980 and December 2015), a 180 95% confidence interval is approximately given by ±0.35. For the concatenated timeseries of CTRL and OCE, which have a larger sample size of 35 · 3 = 105, the 95% confidence interval is approximately ±0.2, with a 99% interval given by ±0.24.
Taking ensemble means does not make sense for free-running models, so instead, when evaluating differences in time series between the CTRL and OCE ensembles (of the NAO, sea ice, or some grid point), the time series of each ensemble member are typically concatenated back to back prior to comparison. Note that in a standard forecasting context, where one is comparing 185 an ensemble of forecasts x i for i = 1, . . . , N , against a fixed observational time series y, the correlation between the ensemble mean and y equals the correlation between the time series of the concatenated members x i and the time series obtained by concatenating y with itself N times. Therefore, the concatenated time series can be thought of as a natural extension of the ensemble mean which makes sense even when the 'observed' time series y is no longer fixed (as is the case when comparing CTRL and OCE).

190
Finally, in this paper we will generally focus on the 35 winters between 1980 and 2015, where observational estimates are particularly trustworthy. Teleconnections in model data will be considered over earlier time periods as a test of significance and consistency. of the stochastic ocean schemes in longer coupled simulations with EC-Earth has not previously been documented in the literature.  Figure 1 shows the differences in the mean and standard deviation of sea ice concentration (siconc) for CTRL versus ERA5, as well as the impact on these differences in OCE, across 35 November months from the period 01- 01-1980 to 31-12-2015; note that data is only drawn from the 35 winters spanning a full November-February season. A significant bias around the 200 Greenland and Barents seas can be seen in the CTRL experiments, with the model producing too much sea ice with excessive interannual variability. The OCE model considerably reduces the mean state bias in the Barents sea, with only a limited impact on the Greenland sea bias. The bias in the standard deviation is also reduced, both in the Barents and Greenland seas, though the changes are not significant at every gridpoint. Similarly, the CTRL bias in the Labrador sea is also reduced with OCE, both with respect to the mean and the variability, but these changes do not show up as significant. On the other hand, OCE appears to have a neutral or negative impact on the biases in the Kara and Bering seas. All these impacts were found to be qualitatively similar when considering the DJF period instead.
The general tendency for the StoSIS to reduced sea ice concentration along the ice edge due to stronger ice motion towards the coast of Greenland and the Canadian Arctic Archipelago has also been observed in coupled climate simulations with the AWI-CM model by . This result is therefore consistent with their physical explanation of accelerated  The changes in the mean and standard deviation in both model simulations with respect to ERA5 suggests that the general patterns of interannual sea ice variability are not the same in the reanalysis and the models. Figure 2 shows the leading empirical orthogonal function (EOF) of November sea ice for ERA5 and the model data, demonstrating that this is indeed the case, with all three data sets exhibiting visibly different EOFs. Of particular relevance to the topic of this paper is the fact that while both 215 ERA5 and OCE show a lot of variability in the Barents-Kara region, the CTRL model has most of its variability concentrated in the Bering sea. This will be discussed further in the next section. showed, using shorter simulations, that the stochastic ocean schemes improve the variability there even at depth. The reduced mean state bias in the North Atlantic is a new result: CTRL exhibits the common model bias of a cold North Atlantic (Wang et al. (2014)), which is alleviated in OCE. This bias reduction comes hand in hand with a bias reduction of the North Atlantic jet, as measured using zonal winds at 850hPa: see Figure B1 in Appendix B. The close link between North Atlantic SST biases and the jet have been examined in, e.g., Keeley et al. (2012). Somewhat counter-intuitively, the stochastic schemes generally 225 decrease the interannual variability of SSTs. In effect, a small increase in variability is found in the Gulf Stream and Kuroshio regions at all timescales and seasons, with a general decrease elsewhere, especially the equatorial Pacific. The changes in the North Atlantic and Pacific will be revisited again in later sections. Examination of other variables support an overall conclusion that the stochastic sea ice and ocean schemes are having only a moderate impact on the climate mean state, and do not, for example, alter the net surface global energy (not shown). It is 230 likely that this limited impact is due to the fact that the 1 • ocean does not permit eddies and is strongly damped by viscosity.
Consequently, the stochastic ocean schemes, which are primarily attempting to perturb the variability of turbulent processes, cannot achieve much without making the perturbations extreme in magnitude . This viewpoint is supported by the fact that in separate experiments using a 0.25 • ocean, the impact of the same schemes on SST variability was much greater (not shown). The main exception to this is in the Arctic and regions such as the North Atlantic, where, firstly, the sea 235 ice perturbations play an active role, and, secondly, where the interplay between Gulf stream variability, atmosphere-ocean coupling and vertical mixing in the ocean is large enough for the ocean perturbations to have a bigger non-linear mean impact.

Identification of the centres of action
We examine the impact of stochasticity on the teleconnections between Arctic sea ice in November and the winter NAO.

240
Studies in the literature often do this by first creating a timeseries of sea ice anomalies for a chosen region, such as Barents-Kara, and then regressing this against atmospheric variables (e.g., zg500 or sea level pressure) at each gridpoint, or against an NAO timeseries (e.g., Blackport et al. (2019); Koenigk and Brodeau (2017)). However, as shown in the previous section (c.f. particularly the Barents sea, acts as a transition zone between the open seas in the south and the permanently ice-covered north (Deser et al. (2000); Vinje (2001); Koenigk et al. (2009)). The strong variability here is therefore largely reflecting variations in where the sea ice edge ends up extending to each winter, with the primary driver being the extent to which anomalous winds transport sea ice into the region from the central Arctic (see, e.g., Koenigk et al. (2009) for a comprehensive overview). The strong temperature gradients between the relatively warm ocean and the cold atmosphere aloft mean that these variations in the 250 extent of the sea ice edge are associated with heat fluxes anomalies reaching as high as 500W m −2 (ibid). Because it is these heat flux anomalies that are hypothesised to influence the circulation, as opposed to the sea ice per se, we strongly argue that an examination of Arctic-midlatitude teleconnections using free-running climate model data needs to take into account the fact that models will inevitably drift to their own preferred sea ice edge with their own leading modes of variability. Because of the possible dependency on the mean state (Deser et al. (2007); Strong and Magnusdottir (2010)), model drift in its atmospheric 255 component may also play a role here, due to interactions with the North Atlantic storm track.
To do this, rather than prescribe the sea ice region of interest up front, we specify the NAO as the fixed phenomenon of interest, and then correlate the winter NAO index with (detrended) November sea ice anomalies at every gridpoint of the Arctic. Spatially coherent regions of significant correlations that coincide with the peaks of the sea ice EOF patterns are taken as evidence of a potential teleconnection, which we can then examine further. The result of this process is seen in Figure 4 The Barents sea, and to a lesser extent, the Greenland sea, correspond to regions of high EOF values. Note that, due to the large sample size of 105 years, the correlations are found to be significantly different from 0 at every gridpoint outside the zero contour for CTRL and OCE. The question of statistical significance is addressed in depth in the next section. The reason the 270 signal appears to be bigger and covering a larger region when using the full time-period is likely due to the sea ice loss that occurs between 1950 and 2015 (not shown). Both models lose a considerable amount of sea ice in the Greenland, Barents and Kara seas, with the OCE model losing somewhat less in Barents and Greenland and somewhat more in Kara. The loss of sea ice in the Greenland sea in particular is associated with a permanent retreat of the sea ice edge, and a subsequent loss of any interannual variability there. From these observations we draw the following conclusions. For observational data, the Barents-Kara region clearly stands out as the region where a potential teleconnection link with the NAO may occur. In the OCE model, the centre of action appears shifted west to the Barents-Greenland region, with the importance of Greenland diminishing over time as the sea ice retreats 11 https://doi.org/10.5194/wcd-2021-61 Preprint.  Table 1. Correlations between November sea ice anomalies in the BK (resp. BG) region for ERA5 (resp. EC-Earth3 experiments) and the DJF NAO mean, over the period 1980-2015. The "Raw correlations" row contains values estimated using the actual model and reanalysis data, while the "LIM correlations" row contains estimates from the Linear Inverse Model reconstructions (cf Section 5??). Subscript labels CTRLn and OCEn (N=35) denote ensemble members 1-3, while the entry for CTRL and OCE (N=105) uses the concatenated timeseries.
Entries that are significant (p<0.05) are marked in bold. Significance is measured against a null hypothesis of uncorrelated, random AR1 processes.
due to global warming. In the CTRL model, the only notable signal appears to be from the Bering sea. Because this is consistent with the Bering sea being a region of peak variability in CTRL, it is possible that this region should be viewed as the equivalent 280 of 'Barents-Kara' for the deterministic model. While some studies, such as Koenigk et al. (2016), have examined the impact of sea ice from other regions in the Arctic, this is not common, with the majority of studies focusing attention on either the Arctic as a whole or Barents-Kara in particular. In order to keep our analysis aligned with existing studies, and because of both the remoteness of the Bering sea relative to Barents-Kara sea and the lack of such a signal from the Bering sea in observational data, we will not pursue the role of the Bering sea further.

285
For the remainder of this paper, we will therefore be examining and comparing the teleconnections between the winter NAO and the mean November sea ice anomaly across the following regions: 1. For ERA5: Barents-Kara (BK), defined by the box 67N-80N, 30E-75E.
To allow for ease of language, we will sometimes omit the abbreviations BK and BG and simply refer informally to 'the 290 teleconnection', 'the ice-NAO teleconnection', or 'the sea ice', with the implicit understanding that this should be understood in a context-dependent sense.

Assessment of statistical significance
The "Raw correlations" row in Table 1 summarises the correlations between the November sea ice and the winter NAO over the period 1980-2015. To assess the statistical significance of the correlations, a null hypothesis of each timeseries as an AR(1) 295 process is assumed. By fitting such a process to each timeseries and simulating 10,000 random draws, we obtain confidence intervals for the correlations expected by chance. For ERA5 and the individual ensemble members of CTRL and OCE, which have a sample size of 35, a 95% confidence interval is approximately given by ±0.35. For the concatenated timeseries of CTRL and OCE, which have a larger sample size of 35 · 3 = 105, the 95% confidence interval is approximately ±0.2; a 99% interval is given by ±0.24. Correlations significant to the 95% threshold are shown in bold in Table 1.

300
It can be seen that ERA5 exhibits a significant positive correlation of ≈ 0.39, consistent with that reported by many other studies using a variety of techniques and observational datasets. While every ensemble member of OCE shows a positive correlation, only one is significantly different from 0. However, after concatenation, OCE has a correlation of 0.29, which is even significant at the 99% threshold. By contrast, and as expected from Figure 4, neither individual CTRL members, nor the concatenated data set, shows significant correlations, with all showing a small negative correlation. 305 Figure 5. In (a), timeseries of the correlations between November sea ice and the DJF NAO across successive 30-year chunks 1950-1980, 1951-1981, . . . 1985-2015; for the three CTRL members (red), and the three OCE members (blue). In As discussed in the introduction, there remains considerable uncertainty in the literature concerning the robustness of the teleconnection. In particular, it has been suggested that the observed correlation is in fact just a spurious signal arising from decadal variability in the coupled ocean-atmosphere dynamics (Koenigk and Brodeau (2017)). Therefore, while the raw correlations of Table 1 already support a conclusion that the stochastic ocean and sea ice schemes have, at least partially, restored a teleconnection in observations that was missing in the deterministic model, it is appropriate to assess the robustness of this  1950-1980, 1951-1981, . . . , 1985-2015, for each ensemble member. Note that values inferred from the last data points of this plot will differ somewhat from those of Table 1 due to the shorter 30-year time-periods being considered.
While all the simulations appear to be initialised into a period of positive correlations, the correlations are systematically smaller for CTRL, and after only 5 years all three CTRL members appear to have drifted to a state with no robust teleconnection.

315
None of the CTRL members achieve a correlation as high as that of ERA5 across any 30-year time-period, where we note that the correlations in ERA5 vary roughly from 0.37 to 0.41 across the 30-year chunks between 1980 and 2015. The decadal variability in the CTRL correlations also suggests that the slightly negative correlations found in this recent time-period is mostly coincidental, consistent with the lack of statistical significance of these correlations. On the other hand, all three OCE members maintain positive correlations across every chunk, and frequently attain values comparable to ERA5. This implies 320 that the highly significant correlation obtained when concatenating all three members is not just a result of a lucky choice of time-period, but a signal that can be consistently located across the entire simulation period: for each 30-year chunk, the concatenated ensemble members of OCE are found to produce statistically significant correlations.
To further assess the likelihood that the behaviour seen in the OCE experiments is just chance, we can compare the distri- the likelihood of all of them consistently attaining correlations exceeding 0.2 by chance is therefore less than 0.13%. In reality, the assumption of independence is unlikely to be strictly true, since the initial ocean and sea ice state may bias the experiments towards certain decadal trends. However, since the CTRL members fail to manifest similar behaviour, such a dependence between members is likely to be weak, and either way, even if our crude estimate of 0.13% is off by an order of magnitude, the conclusion would still hold that the behaviour of OCE is unlikely to be purely coincidental. We note that the same conclusion 335 was found to hold if we boosted the sample size of deterministic correlations by adding in those observed in the period 1980-2010 across the CMIP6 ensemble (not shown). Indeed, the distributions from CTRL, SPHINX and CMIP6 all appear to be equivalent, ranging between ±0.3 with a mean of 0.
We interpret this analysis as supporting the assertion that the teleconnection is significantly stronger in the OCE experiments relative to CTRL, and the remainder of the paper will take this assertion as the starting point for further analysis. In the 340 following sections, the goal of our analysis is to understand in more detail how the teleconnection has changed and why. We therefore conclude that the AMIP ensemble does not exhibit an ice-NAO teleconnection. The same conclusion holds when using the BG region instead.

Reconstructing the teleconnection from daily time scale coupling
The fact that prescribing the ocean and sea ice forcing does not result in the deterministic model exhibiting a significant teleconnection implies that a crucial role is being played by the dynamic coupling of the atmosphere and the ocean/ice. The 355 potential importance of coupling to maintain the circulation response associated with Arctic sea ice anomalies was already emphasised in Strong and Magnusdottir (2011). To gain additional insight into this, we examine the temporal evolution of the teleconnection. This is done in Figure 6, which shows the regression coefficients of November BK (resp. BG) sea ice anomalies against monthly zg500 anomalies in November, December and DJF in ERA5 (resp. CTRL and OCE).
The initial November response is fairly similar in all three data sets, with a tripole pattern formed by a low over the Kara sea, 360 a high near Greenland and a low near the eastern coast of North America. In the following months, both ERA5 and OCE see this tripole evolve into a larger low centred on Greenland, with a broad high across the North Atlantic and Europe; in DJF this pattern clearly corresponds to a positive NAO for both. In CTRL, on the other hand, the same evolution never takes place, and seasonal evolution simply consists of a slow dissipation of the initial November response. Because the initial response projects weakly onto the negative NAO, the correlation between November BG sea ice and the winter NAO is often slightly negative, 365 as seen in Figure 5(a). Similar plots showing the initial heatflux anomalies (Fig B3) and the temperature anomaly evolution ( Figure B4) corroborate this story, with a similar initial response that evolves more realistically over time in OCE. Note that qualitatively similar results are found if we use the BK region for the CTRL experiments.
The ability of OCE to evolve the same initial response better than CTRL might be plausibly attributed to changes in persistence, either of the NAO or the sea ice. However, since the atmospheric component is identical in OCE and CTRL, there is no 370 obvious mechanism for how the persistence of the NAO might change, and the lack of a teleconnection when prescribing the sea ice rules out the persistence timescales of the sea ice alone from explaining the change. Figure B5 confirms that the daily autocorrelation of the NAO has not changed notably between OCE and CTRL; both OCE and CTRL have much higher sea ice persistence than ERA5, but again, the difference between OCE and CTRL is small. By contrast, Figure 7 shows a marked dif- ference between OCE and CTRL in the cross-correlation. While there is no clear difference between observations and models 375 in the cross-correlation when the NAO leads the sea ice, the OCE model appears to be considerably more realistic when the sea ice leads the NAO, with CTRL showing virtually no impact of sea ice on the NAO whatsoever. Note that the opposite signs of the cross-correlation based on whether the ice or the NAO is leading suggests a natural physical interpretation of the coupling.
A positive sea ice anomaly in the BK or BG region (i.e., an extension of the sea ice edge) leads to a reduced local heatflux into the atmosphere, which, via a combination of Rossby wave forcing and changes to the meridional temperature gradient, 380 force the positive phase of the NAO. This corresponds to a northward shift of the eddy-driven jet (Woollings et al. (2010)), which would lead to anomalous wind stress in the BK (or BG) region and a consequent reduction of the initial positive sea ice anomaly. The importance of wind forcing was emphasised in Koenigk et al. (2009). The more northerly jet may also lead to shifts in the sea ice more broadly which are potentially important to support a realistic evolution of the initial atmospheric response.

385
In order to test if the change in daily coupling between the ice and the NAO can account for the changes in the teleconnection, we model the system using the following pair of coupled ordinary differential equations: Here NAO is just the daily NAO index, and ICE is the daily sea ice anomaly in either BK or BG depending on the data set.

390
The coefficients a and d are capturing the presence of autocorrelation, while the coefficients b and c capture the presence of coupling; the ξ terms in both equations represent the residual forcing on both quantities, and are assumed to be random Gaussian processes with no temporal autocorrelation and a mean of 0. This model (hereafter referred to as the LIM), has been used extensively in the literature to capture coupled variables in climate data, such as atmosphere-ocean coupling, and the coefficients and noise terms can be estimated using Linear Inverse Modeling (see, e.g. Penland and Sardeshmukh (1995);  CTRL (resp. OCE) data is marked in red (resp. blue), with crosses used for ensemble members and plus signs for the concatenated data.
ERA5 is shown with a black cross. Daily data covering November-February, 1980-2015, is used. Figure 8 shows the estimated LIM coefficients for ERA5 (black), CTRL (red) and OCE (blue), where daily data spanning  Table 1. Comparison with the first row suggests the LIM has, to a large extent, reproduced the observed correlations for all the data sets: plots of the reconstructions are shown in Figure B6 of the SI. In other words, the seasonal timescale teleconnection appears to be explainable using only information about daily timescale coupling and initial conditions.
Having thus confirmed the LIM's ability to reproduce the teleconnection to good accuracy, we may with some confidence ex-410 amine the variations in the coefficients seen in Figure 8. Two obvious features stand out. Firstly, all EC-Earth3 data experiments have a d coefficient which is too small (Figure 8d), which corresponds precisely to their overly high sea ice autocorrelation ( Figure B5). Secondly, the coupling coefficients b and c are closer to ERA5 in the OCE experiments compared to CTRL, especially for the b coefficient. The larger b coefficient in particular corresponds to the increased daily lag-correlation between sea ice and the NAO seen in Figure 7. As a final note, little difference was found in the estimated magnitude of the noise terms 415 ξ NAO and ξ ICE across CTRL and OCE, though the latter was found to be smaller in magnitude compared to ERA5. This is consistent with the overly high autocorrelation in modelled sea ice ( Figure B5).
To conclude, analysis of the LIM model suggests that the teleconnection (or lack thereof) in each data set considered can be accounted for from daily timescale coupling alone, and that the strength of the coupling is quantitatively stronger in OCE than in CTRL. If one collapses the coupling coefficient b to 0 (not shown), the LIM fails to reconstruct the observed correlations for 420 OCE and ERA5, which supports the hypothesis that this coupling is a crucial component of the teleconnection.

Alternative Hypotheses
We now consider three alternative hypotheses for why the observed teleconnection may have changed: unforced internal atmospheric variability, changes in tropical forcing, and changes in the atmosphere-ocean coupling in the North Atlantic.

425
It has been argued that the observed teleconnection may, to a large extent, be driven by (possibly chaotic) atmospheric variability. To assess if this is playing a role here, we will use the framework of Blackport et al. (2019). For each November between 1980 and 2015, the signs of the average ocean heatflux (latent+sensible) and sea ice anomalies across the month are examined; denote these by S f lux and S ice respectively. Note the heatflux sign-convention is taken such that a positive sign corresponds to a net flux up into the atmosphere. Physical reasoning justifies the following classification: November months 430 with {S f lux > 0, S ice < 0}, or {S f lux < 0, S ice > 0} are classified as ones where the sea ice is forcing the atmosphere. Similarly, November months with {S f lux < 0, S ice < 0}, or {S f lux > 0, S ice > 0} are classified as ones where the atmosphere is forcing the sea ice. By restricting to these two disjoint subsets of years, one can assess which class of winters contribute the most to the teleconnection. Figure 9. Regression coefficients between November BK (resp. BG) sea ice anomalies and DJF zg500 anomalies for ERA5 (resp. CTRL and OCE). In (a) all years, (d) years where the ice drives the atmosphere and (g) years where the atmosphere drives the ice. In (b), (e) and (h): the same but for CTRL. In (c), (f) and (i): the same but for OCE. The value of C in the headers is the correlation between the sea ice timeseries and the DJF NAO index when restricted to those years. The value P is the proportion of years falling into the given category for that data set.
The period used is 1980-2015. Stipling highlights gridpoints where the associated correlation coefficient is statistically significant (p < 0.05).
The result of this process for ERA5, CTRL and OCE is shown in Figure 9. The first column ((a), (d) and (g) and (i)) shows the result for OCE. As in ERA5, both subsets project positively onto the NAO, but the entirety of the overall teleconnection appears to be accounted for by years where the sea ice is driving the atmosphere. In these years, OCE shows a correlation of almost 0.5 between November BG sea ice and the DJF NAO, primarily driven by the considerable negative pressure anomaly extending across Greenland. We note that the proportion of years spent in each of the two subsets is almost identical across all three data sets, implying that the results have not been contaminated by any spurious, non-linear effects 445 (e.g., due to changes in the mean sea ice cover).
The fact that the teleconnection in OCE arises almost entirely from forcing of the sea ice on the atmosphere is consistent with the results of both the LIM and lag-correlation analysis, which suggested that the primary difference between CTRL and OCE was in the daily timescale forcing of sea ice on the NAO. We take this as strong evidence for the hypothesis that changes in the teleconnection are driven by the local impact of stochasticity on the surface coupling, as opposed to random, atmospheric 450 variability unrelated to the stochastic schemes. Because in ERA5 the signal is dominated by years where the atmosphere drives the sea ice, and the opposite happens in OCE, it is possible that OCE may be getting 'the right answer for the wrong reasons', and its sea ice forcing is actually unrealistically strong. However, since the daily timescale forcing in OCE closely matches that in ERA5, this is not obvious. It may rather be that the difference seen between the two subsets of years in ERA5 is partially due to chance, and/or that there is some additional source of predictable atmospheric forcing which is present in ERA5 but not 455 in OCE (e.g., from the stratosphere).

The role of tropical forcing
The results of the previous Section 6.1 already indicate a limited role for external atmospheric forcing, including that originating in the tropics. However, it has been argued that the sea ice induced anomalous wave may require favourable atmospheric conditions, such as a favourable storm track, in order to grow, and such conditions may be influenced by the tropics. Addition-460 ally, it is possible that the sea ice state in November is influenced by some preceding or concurrent tropical forcing. In order to check for such potential links, we correlated October (resp. November) gridpoint SSTs with the BK/BG sea ice timeseries (resp. DJF NAO index) for each data set. The results are shown in Figure 9.
While ERA5 shows some signs of a weak link between October ENSO and the November sea ice, this is not present in either CTRL or OCE. The only consistent signal between all three data sets is the expected local link between SSTs and sea 465 ice. In terms of the DJF NAO, ERA5 and OCE both share a signal around the BK/BG region, consistent with the sea ice teleconnection, but otherwise have no common signals. CTRL, on the other hand, exhibits a significant link between the DJF NAO and tropical SSTs covering the ENSO region and extending out to the Indian Ocean. This link is not present either in ERA5 or OCE. The fact that deterministic models may sometimes exhibit dynamics that are too strongly driven by ENSO has been noted previously, and stochastic physics schemes have been shown to alleviate this (Strømmen et al. (2017)). The removal of this unrealistic signal in OCE appears to be another incarnation of this. While it may seem plausible that this excessive ENSO signal is contributing to an obscuring of an underlying ice-NAO teleconnection in CTRL, we found that the AMIP simulations do not show such an ENSO-NAO link (not shown), and yet still do not exhibit an ice-NAO teleconnection.
So, while the removal of this signal in OCE is a clear improvement, it appears to be mostly unrelated to the ice-NAO link. Finally, we considered the possibility that the coupling between the atmosphere and the ocean in, e.g., the North Atlantic, has improved, which might be expected to influence NAO variability. To assess possible changes in atmosphere-ocean coupling, we followed a common methodology (Frankignoul et al. (1998); Von Storch (2000)) of correlating monthly SST anomalies with monthly heatflux anomalies at every gridpoint. The result is shown in Figure 11. Figure 11 (b) shows that the CTRL model has several common biases in the coupling, including the tropics and North Atlantic. But from (c) it appears as if OCE is having 480 virtually no impact on these biases. This is in contrast to the impact on the sea ice coupling, as measured using the LIM, where the stochastic schemes lead to a coupling magnitude comparable to ERA5. It is possible that there may be more substantial changes in the coupling occurring on timescales shorter (or longer) than the monthly timescale, but Figure 10 suggests that North Atlantic SSTs have little to no impact on the NAO either way. Therefore, we consider it unlikely that changes to North Atlantic SST variability is making a notable contribution to the altered teleconnection.

Discussion and Conclusions
We briefly summarise the results of our analysis.
-The inclusion of stochasticity to the ocean and sea ice components of EC-Earth3, as evaluated with the OCE ensemble experiments, leads to the emergence of a statistically significant teleconnection between November Barents-Greenland sea ice and the DJF NAO, comparable to that observed with Barents-Kara sea ice in ERA5. No such teleconnection is 490 present in the deterministic EC-Earth3 (the CTRL ensemble; Table 1).
-The shift in the region of interest from Barents-Kara to Barents-Greenland in OCE is consistent with its differing sea ice EOF: in ERA5 the November variability is concentrated in Barents-Kara, while in OCE it is concentrated in Barents-Greenland ( Figure 2).
-Comparison with a large sample of deterministic EC-Earth3 simulations suggests that the odds of generating three 495 random OCE ensemble members which, by chance, all show a consistent ice-NAO teleconnection, is very low ( Figure   5).
-An ensemble of deterministic simulations with prescribed sea ice and SSTs is still not able to manifest an ice-NAO teleconnection, implying that the relevant model biases are related to coupling at the surface.
-Analysis using cross-correlations and a simple LIM model suggests that the teleconnections (or lack thereof) in ERA5,

500
CTRL and OCE can all be reconstructed using estimates of daily timescale ice-atmosphere coupling. OCE is found to significantly improve this daily coupling up to a level comparable to that estimated in ERA5 (Figures 7 and 8).
-Splitting the data into years where the ice drives the atmosphere and years where the atmosphere drives the ice (as in Blackport et al. (2019)) shows that the teleconnection in OCE is primarily accounted for by years when the ice drives the atmosphere, suggesting that the change is not simply due to random, internal atmospheric variability (Figure 9).

505
the Gulf Stream and Kuroshyo regions, neither of which clearly influence the winter NAO in CTRL or OCE. Similarly, we do not find any clear evidence that changes in tropical SST forcing is having an impact on the ice-NAO teleconnection ( Figures 10 and 11).
These results have important implications for the study of Arctic-midlatitude teleconnections, by highlighting the impor-510 tance of realistic sea ice-ocean-atmosphere coupling. While there is a wealth of literature on atmosphere-ocean coupling, less attention appears to have been paid to coupling involving also the sea ice, with a few notable exceptions, such as Strong and Magnusdottir (2011). Our work supports their conclusion that realistic coupling may be crucial to obtain a realistic teleconnection, and furthermore makes it clear that one cannot expect, a priori, that a given climate model does have realistic coupling.
It is therefore possible that some of the inconsistency across different models (Blackport and Screen (2021)), and within long 515 simulations of a single model (Koenigk and Brodeau (2017)), is due to varying model biases in sea ice-ocean-atmosphere coupling. It is also possible that the signal-to-noise paradox in seasonal forecasting is partially due to such biases. The importance of coupling also implies that considering simulations with prescribed boundary conditions, the obvious 'remedy' to the complexities arising from coupled models all having distinct modes of sea ice variability, may not suffice, as the loss of coupling may counteract any gains from a perfect sea ice state. We note that Blackport and Screen (2021) do in fact find that simulations 520 using prescribed boundary conditions exhibit weaker teleconnections than simulations using coupled models.
While the most direct interpretation of our analysis seems to be that stochasticity has altered the surface coupling, we cannot fully rule out the role of mean state changes. This is due to the potential sensitivity of the sea ice signal to a favorable storm track, which implies that getting the right signal may depend on the combined sea ice, ocean, and atmospheric mean state.
As such, the lack of a teleconnection in the AMIP experiments does not conclusively rule out that the mean state changes in 525 OCE (Section 3) are crucial. The analysis of lagged correlations and the LIM model lend support to the role of coupling, but it might be that the mean state of OCE is simply better able to propagate a signal which is otherwise identical to CTRL, and that the changes seen in Figures 7 and 8 are simply reflecting this. Untangling this would likely require more careful analysis of shorter, targeted experiments.
If the stochastic schemes are, in fact, improving the coupling between sea ice, ocean, and the atmosphere, what is the 530 precise mechanism which makes the perturbations have this impact? Here too, more careful analysis would be required which goes beyond the scope of this paper. However, a simple conceptual hypothesis might be the following. The evolution of the deterministic sea ice, including how it responds to atmospheric forcing, may be overly 'regular'. Years where the sea ice forces the NAO may depend sensitively not just on the initial sea ice anomaly, but also on how the sea ice then adjusts due to the initial atmospheric response. The deterministic model may simply be confidently doing these adjustments wrong, with the role 535 of stochasticity being to force the model to not always do the same thing. We note that this idea of stochasticity acting as 'damage mitigation' against over-confident deterministic models is ubiquitous in weather forecasting (Palmer et al. (2009);Berner et al. (2017)). That the stochastic sea ice scheme can lead to very different mean responses for Arctic sea ice due to the coupling response with the atmosphere has been shown by Juricke et al. (2018) with dedicated full and one-way coupling experiments.

540
Our analysis has some important limitations. When it comes to assessing the robustness of the teleconnection in OCE, there is no substitute for a larger ensemble size, and we cannot exclude the possibility that the experiments we considered are biased in some way, e.g. due to the initial conditions predisposing OCE towards certain patterns of decadal variability. Another limitation is that the stochastic schemes considered have only been tested in the context of these teleconnections using a single climate model, and may not produce similar effects in other models. However, the consistency across a range of different types 545 of analysis lends confidence to the hypothesis that the improved teleconnection is not just due to chance. In particular, both the LIM analysis and the methodology of Blackport et al. (2019) provide separate lines of evidence for the change being a result of local sea ice processes. An obvious additional point is that the schemes were introduced precisely to improve variability and coupling (Juricke et al. (2013); Juricke and Jung (2014); Juricke et al. ( , 2017), so there is good reason a priori to expect such changes to manifest in experimental data.

550
Our results, which suggest that the inclusion of a stochastic component in the sea ice and ocean can alleviate model biases, both in coupling, the mean state and variability, adds to an increasingly large body of work showing that stochasticity can be beneficial in models across all timescales. It is especially noteworthy, that although changes to the mean and variability of the model due to stochasticity in this study may, in some regions, be rather moderate, other important physical mechanisms in the climate system, such as teleconnections, might be better represented with the stochastic schemes. It is likely that the 555 stochastic sea ice scheme is the dominant factor for changes to the sea ice-NAO teleconnection seen here, but we note that the study Juricke et al. (2018) has shown that the stochastic ocean schemes can also lead to improved skill over North America in seasonal forecasts, a teleconnection effect they relate to improved ocean-atmosphere coupling. While the use of stochasticity in the atmospheric component is becoming more widespread, its inclusion in other components of the model is still novel. The potential benefits of representing uncertainty in all major components of a climate model was first raised in Palmer (2012), and 560 our paper adds further weight to this view.
NOTE FOR REVIEWERS/EDITORS: The Zenodo data is currently 'Closed Access', but this would be changed to 'Open Access' upon the acceptance of the manuscript. The data can be made available to reviewers prior to this upon request.
Appendix A: Linear Inverse Modelling

565
A system of coupled ordinary differential equations, defined by equations (1) and (2), is used to describe coupled ice-NAO interactions on daily time-scales. The parameters are fitted using standard methods of linear inverse modelling. Detailed descriptions of how to do this in full generality can be found in [45] and [44], but to aid the reader we briefly outline the computational steps. Let B = a b c d be the coefficient matrix, and let x be the vector made up of the daily time-series of the NAO and ICE. To estimate B in terms of daily time-scale coupling, we computed the lag-0 covariance matrix C 0 and the lag-1 covariance matrix C 1 of x. The matrix B was then estimated as the matrix logarithm B = log(C 1 C −1 0 ). The noise terms can then be estimated by computing the eigenvectors and values of the 'noise matrix' Q = −1 · (BC 0 + C 0 B T ).
Once the parameters have been fitted in this way for a given data set, a reconstruction of a daily DJF NAO time series can then be created, for a given year, by initializing the coupled system with the NAO and ICE anomalies of November 1st of that year and integrating the system forward in time. We generated, in this way, a perfect deterministic reconstruction of the period 1980-575 2015 for each data set by suppressing the noise terms: this corresponds to the ensemble mean reconstruction over an infinitely large ensemble. To create the time series of Figure B6 a scaling factor was deduced by creating 1000 explicit simulations (with the noise terms included) and measuring the average standard deviation across these stochastic reconstructions; the deterministic reconstruction was then re-scaled to have this standard deviation. The timeseries thus obtained amounts to what the ensemble mean would look like if it were realized as a stochastic instance. Note that the correlations computed in Table 1 580 are obviously insensitive to such a re-scaling.

Appendix B
We include some Figures left out of the main text. Figure B1. Mean zonal winds at 850hPa (ua850) over the DJF season. In (a) CTRL minus ERA5, (b) OCE minus CTRL. Stipling highlights gridpoints where the change is statistically significant (p < 0.05); in (a) most points outside the zero contour are significant so stipling is not included for visual ease. The period covered is 1980-2015.