The Arctic Oscillation (AO) describes a seesaw pattern of variations in atmospheric mass over the polar cap. It is by now well established that the AO pattern is in part determined by the state of the stratosphere. In particular, sudden stratospheric warmings (SSWs) are known to nudge the tropospheric circulation toward a more negative phase of the AO, which is associated with a more equatorward-shifted jet and enhanced likelihood for blocking and cold air outbreaks in mid-latitudes. SSWs are also thought to contribute to the occurrence of extreme AO events. However, statistically robust results about such extremes are difficult to obtain from observations or meteorological (re-)analyses due to the limited sample size of SSW events in the observational record (roughly six SSWs per decade). Here we exploit a large set of extended-range ensemble forecasts within the subseasonal-to-seasonal (S2S) framework to obtain an improved characterization of the modulation of AO extremes due to stratosphere–troposphere coupling. Specifically, we greatly boost the sample size of stratospheric events by using potential SSWs (p-SSWs), i.e., SSWs that are predicted to occur in individual forecast ensemble members regardless of whether they actually occurred in the real atmosphere. For example, the S2S ensemble of the European Centre for Medium-Range Weather Forecasts gives us a total of 6101 p-SSW events for the period 1997–2021.

A standard lag-composite analysis around these p-SSWs validates our approach; i.e., the associated composite evolution of stratosphere–troposphere
coupling matches the known evolution based on reanalysis data around real SSW events. Our statistical analyses further reveal that following p-SSWs,
relative to climatology, (1) persistently negative AO states (

Day-to-day variability in the northern extratropical hemispheric-scale circulation during winter is dominated by the so-called Northern Annular Mode

Although a single index cannot represent the entire extratropical weather, it indicates tendencies towards certain weather patterns, which in turn can
also have strong local effects. AO values that deviate considerably from 0 (the climatological mean) are especially rare, by construction, and can often be associated with strong

The AO can also be influenced by “external” weather patterns, and one prominent teleconnection exists between the AO and the stratospheric polar
vortex. The latter describes a strong westerly wind band around 60

Consistent with the local implications of a negative AO index, SSWs can for example lead to cold spells in northern Europe and increased storminess
over southern Europe

In order to allow for analyses of larger event sample sizes, past studies have used, for example, idealized model simulations

The analysis is thus based on the assumption that the forecast models simulate the observed variability in the AO sufficiently well, including its
modulation due to stratospheric variability. High-top models, in particular, show realistic stratosphere–troposphere coupling

We compute statistical measures that combine conditional and base rate probabilities for stratospheric and AO extreme (co-)occurrences in
order to address the following research questions:

By how much is the probability of persistently positive or negative AO phases increased following stratospheric polar vortex extremes?

By how much is the probability of subsequent AO extremes increased following stratospheric polar vortex extremes?

What fraction of AO extremes may be attributable to preceding stratospheric polar vortex extremes?

To illustrate which AO extremes are classified as “attributable”, consider the following scenarios where a stratospheric event is followed by an AO
extreme:
in relation to the AO extreme, the stratospheric extreme may

(a) represent a necessary and sufficient cause;

(b) represent one among multiple contributory causes;

(c) be caused by a confounding factor, which also causes the AO extreme;

(d) not be causal.

In scenario (a), the AO extreme is attributable to the preceding stratospheric event, whereas it is not attributable in scenario (d). In scenario (b),
disentanglement of different contributory factors is difficult. Each involved process can but does not need to be also a necessary cause. (Consider
for example a situation where an AO extreme would have occurred also without a preceding stratospheric extreme, but the stratospheric extreme resulted
in a stronger or earlier manifestation.) In this study, we aim neither to disentangle the multiple involved pathways (a–c) nor to provide a
rigorous quantification of causality (which is itself ambiguous in a complex system). Instead, we estimate how many AO extremes may be attributable to
the stratospheric extreme, which refers to the fraction that would have statistically not occurred without the stratospheric event. Importantly,
scenario (c) shows that “without the stratospheric event” also requires removing any confounding factors. The analysis follows an observational
approach (which is based on post hoc computation of conditional probabilities) rather than a counterfactual approach (which is based on active interventions in the system;

The paper is organized as follows: Sect.

The subseasonal-to-seasonal (S2S) prediction project

We choose to use European Centre for Medium-Range Weather Forecasts (ECMWF) and UK Met Office (UKMO) forecasts for this study as these models best meet the above-listed requirements. Importantly, both models have been
demonstrated in previous studies to have a realistic representation of stratosphere–troposphere coupling

For the decision on which initialization dates to use for the analyses, a trade-off has to be made between having as large a sample as possible and the fact that the forecast models are updated about every 1–3 years. Since late 2016, the ECMWF model (CY43R1) has been running at a higher horizontal resolution. Therefore, to avoid mixing forecasts before and after 2016, forecasts from winter 2017/18 up to and including 2020/21 are analyzed. Note that a minor model version change occurred in 2019, after which initial conditions for the hindcasts are obtained from ERA5 instead of ERA-Interim. However, we do not expect this to be a major limitation for our analyses as we are mostly interested in the overall statistical behavior of stratosphere–troposphere coupling, as opposed to the performance of individual forecasts.

We focus on northern winter dynamics by analyzing forecasts initialized between mid-November (16 November) and end of February (22 February). For the four winter seasons, the ECMWF model thus features 114 real-time ensemble forecasts of 51 members each and 2280 ensemble hindcasts of 11 members each. This results in a total of 30 894 individual model runs, all of which we refer to as “forecasts”
for simplicity. For consistency, UKMO forecasts are used from the same initialization period, leading to 9795 forecasts available for this model. A
summary of the key specifications of the forecasts is given in Table

Dataset specifications.
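The ECMWF forecast counts quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch; the variable and function names are illustrative and not taken from the study.

```python
# Counts quoted in the text for the ECMWF model (winters 2017/18 to 2020/21).
N_REALTIME_ENSEMBLES = 114    # real-time ensemble forecasts
REALTIME_MEMBERS = 51         # members per real-time ensemble
N_HINDCAST_ENSEMBLES = 2280   # ensemble hindcasts
HINDCAST_MEMBERS = 11         # members per hindcast ensemble


def total_runs(n_rt, m_rt, n_hc, m_hc):
    """Total number of individual model runs (real-time plus hindcast)."""
    return n_rt * m_rt + n_hc * m_hc


total = total_runs(N_REALTIME_ENSEMBLES, REALTIME_MEMBERS,
                   N_HINDCAST_ENSEMBLES, HINDCAST_MEMBERS)
# Number of hindcast ensembles associated with each real-time ensemble.
hindcasts_per_realtime = N_HINDCAST_ENSEMBLES // N_REALTIME_ENSEMBLES
print(total, hindcasts_per_realtime)
```

The ratio of hindcast to real-time ensembles is consistent with hindcasts covering the past 20 years for each real-time initialization.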

Each of the forecasts from the total set of 30 894 ECMWF forecasts provides a 47

For both datasets, ECMWF and UKMO, all individual forecast runs are treated equally and independently. This assumption is violated especially for
forecasts belonging to the same ensemble. In fact, at initialization time these forecasts agree

Furthermore, it is ensured that for each identified event both negative and positive lags can be considered. Due to the finite maximum lead time of
each forecast, this demand is generally limited. For a predicted event that occurs early in the forecast (but after 10

Extreme events are defined that refer to exceptional anomalies in the stratospheric and tropospheric circulation, respectively. As a measure of the strength of the stratospheric polar vortex we use the zonally averaged zonal wind at 10

We define potential sudden stratospheric warmings (p-SSWs) as days when u60 transitions from positive to negative; i.e., the polar vortex breaks down. As
explained above, we do not include p-SSWs predicted within the first 10
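The event definition can be illustrated with a short sketch. It assumes that `u60` is a daily time series from a single forecast, that a forecast is discarded unless its first `min_westerly_days` days are all westerly, and that only the first wind reversal per forecast is recorded. The function name and the single-event restriction are assumptions made for illustration; any further criteria used in the study are not reproduced here.

```python
import numpy as np


def find_p_ssw(u60, min_westerly_days=10):
    """Return the index of the first day with u60 < 0, or None.

    The forecast is rejected (None) if any of the first
    `min_westerly_days` days is not westerly (u60 <= 0), so a detected
    event always marks a transition from positive to negative u60.
    """
    u60 = np.asarray(u60, dtype=float)
    if u60.size <= min_westerly_days:
        return None
    if np.any(u60[:min_westerly_days] <= 0):
        return None  # vortex not westerly throughout the initial period
    negative_days = np.flatnonzero(u60 < 0)
    return int(negative_days[0]) if negative_days.size else None


example = [20.0] * 12 + [-1.0, -3.0, 2.0]
print(find_p_ssw(example))
```

Because the initial period is required to be westerly, the first negative day found is necessarily preceded by a positive day.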

Moreover, the analyses were repeated with a modified event definition, which we call

Distribution of analyzed p-SSWs in ECMWF forecasts. Absolute event counts

In Fig.

The largest number of events is identified in the winter season 2017/18, which also includes the most forecasts (real-time 2017/18 plus hindcasts related to initializations from 2018/19 to 2020/21). Different factors lead to a highly varying number of events between the different years. These include internal dynamic variability; a slightly varying number of underlying forecasts, due to the real-time/hindcast prediction setup; and the varying number of events per winter due to the evolution of the polar vortex of the real atmosphere in the respective winter, which determines the initial conditions of the forecasts.

A forecast that is initialized with a strong polar vortex tends to maintain a strong polar vortex and produces fewer SSWs compared to a forecast with
an initially weak polar vortex. Moreover, forecasts that do not start with 10 consecutive days of positive u60 are discarded by default. Thus, if the
polar vortex in the real atmosphere is already easterly at the initialization time or is predicted to become easterly within the first 10 d, such
forecasts will not contribute any events to the analysis. This can be illustrated by the example of the 2009 SSW

This also seems consistent with the interpretation of this event as falling under the category of self-induced resonance, which requires conditions (e.g., vortex geometry) to be “just right”

Based on the average number of 226 events per day of lead time in the ECMWF model (cf. Fig.

While the rate of events per forecast day fluctuates only weakly in the ECMWF model, it moderately increases with lead time in the UKMO model
(Fig. S1, bottom left panel). One might expect this to be due to the longer maximum lead time of the UKMO model (

Consistent with reanalyses

Past literature has identified stratosphere–troposphere coupling not only following SSWs, but also following strong polar vortex events

In the troposphere, we define extreme events based on the Arctic Oscillation index (short: AO; equivalent to the Northern Annular Mode index at
1000

We define tropospheric extreme events as the first day when the AO falls below a certain negative threshold (e.g.,
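As a minimal sketch of this definition, the first threshold crossing in a daily AO series can be located as follows. The function name, the optional search window, and the symmetric treatment of positive thresholds are assumptions for illustration.

```python
import numpy as np


def first_ao_extreme(ao, threshold, start=0, stop=None):
    """Index of the first day on which the AO index crosses `threshold`.

    For a negative threshold the first day with AO < threshold is
    returned; for a positive threshold the first day with AO > threshold.
    The search is restricted to ao[start:stop]. Returns None if the
    threshold is never crossed.
    """
    ao = np.asarray(ao, dtype=float)
    window = ao[start:stop]
    if threshold < 0:
        hits = np.flatnonzero(window < threshold)
    else:
        hits = np.flatnonzero(window > threshold)
    return int(hits[0]) + start if hits.size else None


ao_series = [0.2, -1.0, -3.4, -3.1, 0.5, 3.2]
print(first_ao_extreme(ao_series, threshold=-3.0))
```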

In this study, conditional probabilities are computed to estimate the modulated likelihood of AO extremes under the presence or absence of preceding
stratospheric extremes. For example, we expect the probability of at least one

In climate attribution science “exposure” may be thought of as “under the influence of anthropogenic climate change”, whereas lack of exposure
(the condition in the denominator) may be thought of as “without the influence of climate change” (e.g., based on pre-industrial control
climate). In our case of stratosphere–troposphere coupling exposure may be thought of as “given that a stratospheric extreme occurred”. However,
lack of exposure has to be evaluated with care. For example, assume that a given day

Definitions for (conditional) predicted SSW and AO events. Subscript wt is short for “within time

One way to circumvent the above-discussed issue of conditioning onto “unexposed” is to swap the conditioning. That is, we may condition onto the
occurrence of an AO extreme and evaluate the probability that a given preceding time period showed at least 1 d with stratospheric extreme
occurrence; in this case the AO extreme is considered to be “exposed”. Likewise, if the preceding time period shows no occurrence of stratospheric
extreme, the AO extreme is considered to be “unexposed”. Using Bayes' theorem this allows us to estimate the fraction of attributable risk (FAR) of
AO extremes to a preceding stratospheric extreme. FAR quantifies the reduction in the fraction (0 to 1) of AO extremes without preceding
stratospheric events (and without any confounding factors; see discussion in Sect.
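The swapped conditioning can be sketched numerically. The example below assumes the standard epidemiological definitions of attributable risk among the exposed and among the population; the exact expressions used in this study are not reproduced here, and the function name and input probabilities are hypothetical.

```python
import math


def attributable_risk(p_ssw_given_ao, p_ssw):
    """Fractions of attributable risk from swapped conditioning.

    p_ssw_given_ao: probability that an AO extreme was preceded by at
        least one stratospheric event within the chosen time window.
    p_ssw: base rate of at least one stratospheric event in a window
        of the same length.

    With Bayes' theorem,
        P(AO | SSW)    = p_ssw_given_ao * P(AO) / p_ssw
        P(AO | no SSW) = (1 - p_ssw_given_ao) * P(AO) / (1 - p_ssw),
    so the base rate P(AO) cancels in both ratios below.
    """
    ratio_exposed = p_ssw_given_ao / p_ssw                 # P(AO|SSW) / P(AO)
    ratio_unexposed = (1.0 - p_ssw_given_ao) / (1.0 - p_ssw)  # P(AO|no SSW) / P(AO)
    far_exposed = 1.0 - ratio_unexposed / ratio_exposed
    far_population = 1.0 - ratio_unexposed
    return far_exposed, far_population


# Hypothetical input values, chosen only to illustrate the arithmetic.
far_exp, far_pop = attributable_risk(p_ssw_given_ao=0.5, p_ssw=0.2)
print(far_exp, far_pop)
```

In this formulation the population value equals the value among the exposed multiplied by the fraction of AO extremes that were exposed.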

Relative probability increase and attributable risk among the exposed and among the population all quantify, from different perspectives, the increased
likelihood of AO extremes following stratospheric events. Mathematical definitions of how they are derived from base rate and conditional probabilities
are introduced in the respective sections. We provide an overview table here about the event definitions that are used
(Table

To provide a baseline for our more detailed statistical analyses in the following sections, we first evaluate the general behavior of
stratosphere–troposphere coupling based on p-SSW events in the S2S data. To do so we focus on the lag-composite evolution of the AO index relative to
p-SSWs compared to real-atmospheric SSWs from ERA5. In addition, we show the NAM index at 200

Lagged composite evolution of u60

Figure

By construction, u60 transitions from westerly to easterly at lag 0. Anomalies of u60 are slightly positive ahead of

Overall, the results are in agreement with ERA5 and previous literature, and especially the evolution of u60 is remarkably similar. The negative NAM
response at 200 and 1000
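A lag composite of the kind discussed here can be sketched as follows. The sketch assumes one daily series and one event day per forecast, and it treats lags that fall outside a forecast as missing. The function name and data layout are illustrative, not the study's implementation.

```python
import numpy as np


def lag_composite(series_list, event_days, max_lag):
    """Composite-mean evolution relative to the event day (lag 0).

    series_list: one 1-D daily series per forecast.
    event_days: index of the event day within each series.
    Returns (lags, composite mean, number of contributing forecasts).
    """
    lags = np.arange(-max_lag, max_lag + 1)
    stacked = np.full((len(series_list), lags.size), np.nan)
    for k, (series, day) in enumerate(zip(series_list, event_days)):
        series = np.asarray(series, dtype=float)
        idx = day + lags
        valid = (idx >= 0) & (idx < series.size)  # lags inside the forecast
        stacked[k, valid] = series[idx[valid]]
    counts = np.sum(~np.isnan(stacked), axis=0)
    sums = np.nansum(stacked, axis=0)
    composite = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
    return lags, composite, counts


lags, comp, n = lag_composite([[0, 1, 2, 3, 4], [10, 10, 10]], [2, 0], max_lag=1)
print(lags, comp, n)
```

Returning the number of contributing forecasts per lag makes the effect of the finite maximum lead time on the composite explicit.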

In the following, we exploit the larger available sample size of p-SSW events to diagnose and estimate whether the shift in the average AO index
towards negative values is caused by (1) more persistent negative AO phases and/or (2) an increased probability of

Figure

Histogram of the duration of negative AO periods, quantified by the number of consecutive days of AO

In addition, the diagnostic is presented for periods following p-SSWs. Here, the AO index is analyzed between lag day

Sampling uncertainties turn out to be negligible within 95 % confidence intervals. A similar analysis based on UKMO data shows very good quantitative agreement (not shown), which further confirms the robustness of the results.
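The duration diagnostic rests on counting consecutive days below a threshold. A minimal sketch of that run-length computation is given below; the function name is illustrative, and runs cut off by the start or end of a series are counted with their truncated length, which is an assumption of this sketch.

```python
import numpy as np


def negative_run_lengths(ao, threshold=0.0):
    """Lengths of all runs of consecutive days with AO < threshold."""
    below = np.asarray(ao, dtype=float) < threshold
    # Pad with False so that runs touching either end are closed.
    padded = np.concatenate(([False], below, [False]))
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)
    ends = np.flatnonzero(changes == -1)
    return ends - starts


runs = negative_run_lengths([1, -1, -2, 1, -1, 1, -3, -3, -3])
print(runs)
```

A histogram of the returned lengths, or the fraction of runs exceeding a given duration, then follows directly.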

Daily probabilities of AO

It is known that SSWs shift the subsequent AO distribution (see Fig.

In addition, the overall daily probabilities of AO

The fraction of events in the p-SSW composite that have negative AO values fluctuates around

Extremely negative AO values in the dataset appear with a climatological probability that is similar to what would be expected for a (one-sided)
3

Note that we have standardized the AO in ERA5 such that the inter-annual standard deviation is 1, similar to the deseasonalization
that is applied to the S2S forecasts. The lower baseline probabilities are consistent with a non-zero kurtosis of the AO distribution in ERA5 of

An altered probability of extreme AO events may be of higher socio-economic relevance than a small shift in the mean. However, the absolute daily probabilities of extremely negative AO events are still small even though the relative increase given the p-SSWs is indeed considerable. In practice, the relevant question might not be how much the probability increases on only 1 specific day following a p-SSW. It may be more relevant to quantify the increased risk for an extreme AO within a given time period following a p-SSW.
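Such a window-based risk can be estimated by counting the samples in which the AO index crosses the threshold at least once. The sketch below handles negative thresholds only and assumes that the relative increase is measured against a reference probability as (conditional − reference) / reference; both the function names and this form of the relative increase are assumptions, not the study's stated definitions.

```python
import numpy as np


def prob_at_least_one(ao_windows, threshold):
    """Fraction of windows with at least one day of AO < threshold.

    ao_windows: 2-D array of shape (n_samples, window_length), e.g. the
    AO index on the days following each event.
    """
    ao_windows = np.asarray(ao_windows, dtype=float)
    return float(np.mean(np.any(ao_windows < threshold, axis=1)))


def relative_probability_increase(p_conditional, p_reference):
    """Relative change of a probability with respect to a reference."""
    return p_conditional / p_reference - 1.0


# Hypothetical toy data: windows following events and reference windows.
after_events = [[-3.5, 0.0], [0.0, 0.0], [-1.0, -3.2], [0.0, 1.0]]
reference = [[0.0, 0.0], [0.0, -3.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
p_cond = prob_at_least_one(after_events, threshold=-3.0)
p_ref = prob_at_least_one(reference, threshold=-3.0)
print(p_cond, p_ref, relative_probability_increase(p_cond, p_ref))
```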

Probabilities of at least one

Figure

We choose 10

Clearly, all probabilities increase with

Generally, all probabilities appear approximately linear in

Based on the presented probabilities, the probability increase of at least one AO event within time

A relative probability

Probability increase (in percent) for at least one negative (positive) p-AO extreme below (above) the threshold following p-SSWs within a certain period

The relative probability increase of AO events around 0 (e.g., at least 1 d below/above 0) is very small as these events are already almost
certain, even in the climatological reference. Both models show a gradual increase in relative probability of more negative AO thresholds (e.g.,

The last section focused on given p-SSWs and subsequent statistical signatures in AO extremes within a period

In this section, we aim to evaluate the alternative question: how many

Probabilities of at least 1 d u60

The basis of the evaluation in this section is that instead of conditioning on the occurrence of a SSW, we condition on the occurrence of an
AO extreme. This allows the classification of all AO events according to whether they were or were not exposed to a preceding SSW within a time
window

Figure

We can use the estimated probabilities

First we define the FAR among the exposed

This quantifies the fraction of SSW–

Inserting these expressions we obtain for

This expression involves

Our estimates of

The attributable risk may also be evaluated for

Figure

As in Fig.

The previous sections revealed that SSWs increase the probability of subsequent

The composite-mean evolution of p-SPVs (Fig.

Following the same methodology as for p-SSWs, we use the large event sample sizes to quantify the statistical relation between p-SPVs and subsequent
AO

Figure

As in Fig.

Consistent with the positive shift in the AO distribution following SPVs, the risk gradually increases for positive AO extremes, whereas it gradually
decreases for negative AO extremes. For extreme thresholds of up to 2 standard deviations, the relative probability change appears to be of similar
magnitude compared to periods following SSWs (

As in Fig.

Figure

Finally, the fraction of all

More detailed analyses that apply the diagnostics presented in Figs.

Our results, based on a large number of extended-range ensemble forecasts, provide further evidence for stratospheric modulation of large-scale
weather patterns near the surface, broadly consistent with previous results

By how much is the probability of persistently positive or negative AO phases increased following stratospheric polar vortex extremes?

Climatologically, 38 % of negative AO phases (days with consecutive AO

Following p-SPVs, the probability of positive AO phases that last longer than 7

By how much is the probability of subsequent AO extremes increased following stratospheric polar vortex extremes?

Following p-SSWs, the probability of subsequent negative AO extremes increases, whereas it decreases for positive AO extremes. For instance,

Following p-SPVs, the probability of

What fraction of AO extremes may be attributable to preceding stratospheric polar vortex extremes?

About 50 % (ECMWF) to 60 % (UKMO) of

While our stratospheric-event definitions are based on absolute thresholds of the zonal-mean zonal wind, the tropospheric response is quantified via standardized anomalies of averaged geopotential. The construction of an appropriate corresponding climatology is crucial, in particular for the analysis of extreme events. However, it is also not unambiguous. Standardized anomalies are computed by normalizing differences from a population mean with the population standard deviation (taking into account seasonal variations). As the population is usually finite, any additional data point may change the population mean and will change the population standard deviation, resulting in a small adjustment of all previous (standardized) data points. On the one hand, the effect is negligible in the limit of a large population. On the other hand, it is generally larger when the additional data point is an outlier with respect to the previous distribution. For this study, S2S forecasts were deseasonalized using the available hindcasts. The assumption is that these hindcasts sufficiently sample different kinds of variability, such that (a) extreme events that occurred in individual years do not significantly distort the population distribution and thereby also the population mean and standard deviation and that (b) the constructed population is robust across different initialization dates (e.g., a given event that is equally predicted at two different lead times corresponds to the same standardized event in both model integrations).
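The sensitivity of standardized anomalies to a single additional data point can be demonstrated with a toy population. The values below are hypothetical and serve only to show that an outlier shifts the previously standardized values more than a typical value does.

```python
import numpy as np


def standardize(x):
    """Standardized anomalies with respect to the population mean and std."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


base = np.array([-1.0, 0.0, 1.0, 2.0, -2.0])
z_before = standardize(base)

# Re-standardize after adding one typical value and one outlier,
# then compare the values of the original five data points.
z_typical = standardize(np.append(base, 0.5))[:base.size]
z_outlier = standardize(np.append(base, 8.0))[:base.size]

shift_typical = np.max(np.abs(z_typical - z_before))
shift_outlier = np.max(np.abs(z_outlier - z_before))
print(shift_typical, shift_outlier)
```

With a much larger population, both shifts would shrink, in line with the large-population limit mentioned above.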

Do the analyses of modulated probabilities allow conclusions about causal links between stratospheric and tropospheric circulation extremes?

A definition of (probabilistic) causality is provided by

This definition relies on counterfactual dependence; i.e., if there had not been the cause, then there would not have been the effect (and if there had been the cause, then there would have been the effect).

In the atmosphere, such controlled situations can usually only be simulated using numerical model experiments. In this study, a post hoc analysis of an existing dataset is presented. No interventions are performed, and therefore, no strict causal relations can be inferred following the provided definition. Instead, conditional probabilities are computed.

Our knowledge of coupled stratosphere–troposphere dynamics suggests that a causal connection does in principle exist. This connection manifests in observed conditional probabilities, which may, however, also be modulated by further possibly involved pathways.

It is important to keep in mind that the coupling is, in general, mutual, and causality works in both directions (even though, as always, some cause has to precede the effect).

First, conditional probabilities may in practice overestimate the (direct) causal link between stratospheric and AO extremes due to the existence of
confounding factors (see scenario c listed in the introduction). For example, the Madden–Julian Oscillation (MJO) may lead to a modified risk of
AO extremes

However, even if common drivers can be neglected, the statistical nature of the inferred fraction of attributable risk can only quantify an

Despite these caveats, conditional probabilities may provide useful insights. Converting them into statistical metrics such as RPI and FAR may facilitate a practically relevant interpretation. For example, the RPI of AO extremes due to the prior occurrence of a stratospheric extreme quantifies the value of the stratospheric state as a predictor of subsequent AO extremes, which may be of practical use regardless of its underlying causal nature. Furthermore, FAR provides an estimate of how many fewer AO extremes would statistically be expected without preceding stratospheric events, keeping in mind that “without a preceding stratospheric event” would also require removing confounding factors.

How should the observed differences between the ECMWF and UKMO model be interpreted? Overall, our analyses show that the probability modulations of
AO extremes up to about 2 standard deviations given preceding stratospheric extremes are similar between the ECMWF and the UKMO model. AO extremes
of 3 standard deviations, i.e., AO

In general, we note that any two different imperfect models will likely always reveal quantitative differences in the analysis of extreme events for a sufficiently strict extreme threshold. In the present study, we find such differences, e.g., for the relative risk, at a threshold of around 3 standard deviations. It is possible that more data are needed to conclusively attribute the differences to particular dynamical processes. Nevertheless, we argue that our analyses, even at a threshold of 3 standard deviations and given the associated uncertainties, are able to provide insightful quantitative estimates, especially as no obvious a priori estimate exists, even for the order of magnitude of the investigated probability metrics.

In addition to the particular points already mentioned, future work should address the question of how much of the predicted surface impact following
predicted stratospheric extremes, i.e., following p-SSWs and p-SPVs, can be explained by the AO. Lastly, we conclude that the analysis of

In addition to real-time forecasts, all S2S forecasting systems also create hindcasts (or “reforecasts”), which allow the construction of the
respective model's climatology. In the following, we describe the procedure

Based on the ECMWF article

Schematic workflow for the computation of a climatology for a S2S forecast model, based on hindcasts. Gray planes illustrate that forecasts belong to the same hindcast year, where the axis from left to right denotes time.

1. Compute the ensemble mean of the hindcasts (Fig.

2. Compute the inter-annual mean of the hindcast ensemble means. In case of the ECMWF forecasts for example, the
hindcasts cover the past 20 years (see Fig.

3. Select all (inter-annually averaged) hindcasts that start within

4. Average the hindcasts obtained in step 3 such that the forecast valid times match (e.g., average forecasts for 22 February, 23 February, … as opposed to matching forecast lead times, e.g., forecasts with lead time

5. Apply, to the resulting time series, a 7

Due to the

Anomalies are obtained by subtracting the climatology from the raw field. Standardized anomalies can be computed by dividing the anomalies by a
climatology standard deviation, which is computed similarly to the climatological mean, but where
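A strongly simplified sketch of the anomaly computation is given below. It uses the hindcasts of a single initialization date only and therefore omits the selection of neighboring initialization dates and the matching of valid times. The smoothing window is a free parameter of the sketch, and computing the standard deviation across all hindcast years and members is an assumption; neither is taken from the study.

```python
import numpy as np


def running_mean(x, window):
    """Centered running mean with edge padding; `window` must be odd."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    padded = np.pad(np.asarray(x, dtype=float), window // 2, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")


def standardized_anomaly(forecast, hindcasts, window):
    """Standardized anomaly of one forecast relative to hindcasts.

    forecast: 1-D array (days).
    hindcasts: 3-D array (hindcast years, ensemble members, days).
    """
    hindcasts = np.asarray(hindcasts, dtype=float)
    ens_mean = hindcasts.mean(axis=1)                 # ensemble mean per year
    clim_mean = running_mean(ens_mean.mean(axis=0), window)  # inter-annual mean, smoothed
    # Assumption: spread across all years and members defines the std.
    clim_std = running_mean(hindcasts.std(axis=(0, 1)), window)
    return (np.asarray(forecast, dtype=float) - clim_mean) / clim_std


# Toy hindcasts: 2 years x 2 members x 5 days, constant in time.
toy = np.array([[[1.0] * 5, [3.0] * 5], [[5.0] * 5, [7.0] * 5]])
z = standardized_anomaly(np.full(5, 4.0 + np.sqrt(5.0)), toy, window=3)
print(z)
```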

The presented deseasonalization procedure comes with several implications, for example,

the climatologies for real-time forecasts and for hindcasts are always based only on hindcasts;

by computing anomalies from a climatology, model errors that are a function of the season are mitigated;

by computing anomalies from a climatology, model errors that are a function of the forecast lead time (“model drift”) are not mitigated because the climatology averages information that stems from different forecast lead times (see step 4);

in case of the ECMWF model, 9 hindcast ensembles/4-week window

From observations, the annual probability of SSWs can be derived by dividing the number of winters with SSWs by the total number of winters. In the S2S model framework, however, it is less straightforward to compute the frequency of SSWs per winter, as the maximum lead time is shorter than a winter period and many forecasts overlap. It is reasonable to tie a 0 % SSW probability to the case where not a single ensemble member in any of the forecasts predicts a SSW. The 100 % upper boundary is less clear: should the probability be 100 % if all ensemble members in all forecasts show a SSW? In that case, a longer maximum lead time would result in a higher SSW probability even for the same model. Should the probability be 100 % if there is at least one ensemble forecast in a winter where all members show a SSW? Again, the result would depend on the ensemble size, i.e., the technical setup, not solely on the model physics.

In this study, we compute a proxy for the model's seasonal SSW probability based on the number of SSWs per forecast day, as described in the following.

For each winter season

Note that the computed probabilities

The probability quantifies SSW occurrences beyond 10

The SSW probability becomes 0 % if there are no ensemble members that predict SSWs at any time beyond 10

Seasonality is not explicitly resolved in the calculations but assumed to average out when enough forecasts are sampled.

Estimating a seasonal SSW probability proxy based on daily SSW probabilities. Colored points show the computed seasonal probability proxy for different winter seasons as applied to the ECMWF forecasts.

Forecasts from the S2S archive can be found at

The supplement related to this article is available online at:

JS performed the analyses under the guidance of TB. JS wrote the first draft of the paper. Both authors contributed to the interpretation of the results and improved the paper.

The contact author has declared that neither of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors thank Inna Polichtchouk for fruitful discussion on deseasonalization of S2S data. Jonas Spaeth appreciates the valuable scientific exchange within Waves to Weather's early career scientist program. This work is based on S2S data. S2S is a joint initiative of the World Weather Research Programme (WWRP) and the World Climate Research Programme (WCRP). The original S2S database is hosted at ECMWF as an extension of the TIGGE database. Finally, we thank Sandro Lubis and the second, anonymous reviewer for their constructive comments that helped to improve the paper.

This research has been supported by the Deutsche Forschungsgemeinschaft (DFG; grant no. SFB/TRR165, “Waves to Weather”).

This paper was edited by Nili Harnik and reviewed by Sandro Lubis and one anonymous referee.