<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">WCD</journal-id><journal-title-group>
    <journal-title>Weather and Climate Dynamics</journal-title>
    <abbrev-journal-title abbrev-type="publisher">WCD</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Weather Clim. Dynam.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">2698-4016</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/wcd-7-767-2026</article-id><title-group><article-title>A spread-versus-error framework to reliably quantify the potential for subseasonal windows of forecast opportunity</article-title><alt-title>Spread-versus-error framework for windows of opportunity</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Rupp</surname><given-names>Philip</given-names></name>
          
        <ext-link>https://orcid.org/0000-0001-7833-1748</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1 aff2">
          <name><surname>Spaeth</surname><given-names>Jonas</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="yes" rid="aff1 aff3">
          <name><surname>Birner</surname><given-names>Thomas</given-names></name>
          <email>thomas.birner@lmu.de</email>
        <ext-link>https://orcid.org/0000-0002-2966-3428</ext-link></contrib>
        <aff id="aff1"><label>1</label><institution>Meteorological Institute Munich, Ludwig-Maximilians University (LMU), Munich, Germany</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Research Department, European Centre for Medium-Range Weather Forecasts (ECMWF), Bonn, Germany</institution>
        </aff>
        <aff id="aff3"><label>3</label><institution>Deutsches Zentrum für Luft- und Raumfahrt (DLR), Institut für Physik der Atmosphäre, Oberpfaffenhofen, Germany</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Thomas Birner (thomas.birner@lmu.de)</corresp></author-notes><pub-date><day>12</day><month>May</month><year>2026</year></pub-date>
      
      <volume>7</volume>
      <issue>2</issue>
      <fpage>767</fpage><lpage>785</lpage>
      <history>
        <date date-type="received"><day>4</day><month>October</month><year>2025</year></date>
        <date date-type="rev-request"><day>23</day><month>October</month><year>2025</year></date>
        <date date-type="rev-recd"><day>26</day><month>April</month><year>2026</year></date>
        <date date-type="accepted"><day>29</day><month>April</month><year>2026</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2026 Philip Rupp et al.</copyright-statement>
        <copyright-year>2026</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://wcd.copernicus.org/articles/7/767/2026/wcd-7-767-2026.html">This article is available from https://wcd.copernicus.org/articles/7/767/2026/wcd-7-767-2026.html</self-uri><self-uri xlink:href="https://wcd.copernicus.org/articles/7/767/2026/wcd-7-767-2026.pdf">The full text article is available as a PDF file from https://wcd.copernicus.org/articles/7/767/2026/wcd-7-767-2026.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e111">Mid-latitude forecast skill at subseasonal timescales often depends on “windows of opportunity” that may be opened by slowly varying modes such as ENSO, the MJO or stratospheric variability. Most previous work has focused on the predictability of ensemble-mean states, with less attention paid to the reliability of such forecasts and how it relates to ensemble spread, which directly reflects intrinsic forecast uncertainty. Here, we introduce a spread-versus-error framework based on the Spread-Reliability Slope (<inline-formula><mml:math id="M1" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula>) to quantify whether fluctuations in ensemble spread provide reliable information about variations in forecast error. Using ECMWF S2S forecasts and ERA5 reanalysis data, aided by idealised toy-model experiments, we show that spread reliability is controlled by at least three intertwined factors: (1) sampling error, (2) the magnitude of physically driven spread variability and (3) model fidelity in representing that variability. Regions such as northern Europe, the mid-east Pacific, and the tropical west Pacific exhibit robustly high <inline-formula><mml:math id="M2" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> values (i.e. reliable spread fluctuations) for 50-member ensembles, consistent with robust spread modulation by slowly varying teleconnections. In contrast, areas like eastern Canada show very low <inline-formula><mml:math id="M3" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> (little or no spread reliability), even for 100-member ensembles, reflecting limited low-frequency modulation of forecast uncertainty. 
We further demonstrate two practical implications: (i) a simple variance rescaling yields a post-processed “corrected spread” that enforces reliability and may help to bridge ensemble output with user needs; and (ii) time averaging effectively boosts ensemble size, allowing even 10-member ensembles to achieve reliability of spread fluctuations comparable to larger ensembles. Finally, we discuss possible links to the signal-to-noise paradox and emphasize that adequate representation of ensemble spread variability is crucial for exploiting subseasonal windows of opportunity.</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>Deutsche Forschungsgemeinschaft</funding-source>
<award-id>SFB/TRR 165</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d2e159">Atmospheric predictability at subseasonal timescales (about 2 weeks to 2 months) varies strongly with region and season. At these timescales, probabilistic (i.e. ensemble) forecasts are vital, as uncertainty is generally large and forecast skill may arise from both an ensemble-mean signature and reduced ensemble spread. Previous work has identified that during certain periods, and over particular regions, intrinsic forecast uncertainty can be anomalously low, enabling higher forecast skill at these longer leadtimes <xref ref-type="bibr" rid="bib1.bibx14" id="paren.1"/>. Such periods with reduced uncertainty are often referred to as “windows of forecast opportunity”. These windows typically arise due to the influence of slowly varying atmospheric modes <xref ref-type="bibr" rid="bib1.bibx26" id="paren.2"/> that can be considered to exert a quasi-external influence on the region of interest. For northern hemispheric mid-latitude forecasts these slowly varying modes include those of tropical origin, such as the El Niño-Southern Oscillation <xref ref-type="bibr" rid="bib1.bibx9" id="paren.3"><named-content content-type="pre">ENSO; </named-content></xref> and the Madden-Julian Oscillation <xref ref-type="bibr" rid="bib1.bibx1" id="paren.4"><named-content content-type="pre">MJO; </named-content></xref>, and those of upper-atmospheric origin, such as stratospheric polar vortex variability <xref ref-type="bibr" rid="bib1.bibx2" id="paren.5"/>. Importantly, windows of opportunity can only occur if intrinsic forecast uncertainty varies substantially across different forecast situations. Regions with little variability in forecast uncertainty cannot exhibit such windows, because uncertainty remains close to its climatological value at all times. We refer to such situations as having a low potential for windows of opportunity. 
The “potential for windows of opportunity” therefore refers to the variability of ensemble spread, that is, the capacity of the atmosphere to occasionally enter low-uncertainty states. Most studies investigating windows of opportunity analyse changes in the skill of the ensemble-mean forecast <xref ref-type="bibr" rid="bib1.bibx26 bib1.bibx16" id="paren.6"><named-content content-type="pre">e.g.</named-content></xref>. However, probabilistic forecasts aim to accurately capture not only the mean of the probability distribution (the prediction of the actual value), but also higher moments such as the ensemble spread (the prediction of the uncertainty). Quantitative analyses of this intrinsic forecast uncertainty, as measured explicitly by the ensemble spread or variance of a forecast, are comparatively rare, and our study attempts to help fill this gap.</p>
      <p id="d2e187">Previous work established that this forecast uncertainty can be highly flow-dependent <xref ref-type="bibr" rid="bib1.bibx23" id="paren.7"/>. Hence, mid-latitude tropospheric forecast spread may indeed be substantially modulated by slowly varying tropical or stratospheric teleconnections. Specifically, <xref ref-type="bibr" rid="bib1.bibx22" id="text.8"/> recently demonstrated that the northern European subseasonal forecast uncertainty of 1000 hPa geopotential height (z1000), as directly reflected in ensemble spread, is reduced after breakdowns of the stratospheric polar vortex due to equatorward shifts of the North Atlantic storm track. Such slowly varying teleconnections and associated intrinsic predictability changes can have significant non-local impacts on ensemble spread as well. For instance, <xref ref-type="bibr" rid="bib1.bibx22" id="text.9"/> also showed how ENSO can modulate ensemble spread in the northern Pacific. Identifying and quantifying the potential of the atmosphere to exhibit windows of opportunity might therefore be beneficial for both improving forecasts and gaining insights into the underlying dynamics of the system.</p>

      <fig id="F1" specific-use="star"><label>Figure 1</label><caption><p id="d2e201">Intrinsic variability of subseasonal z1000 forecasts in the northern hemisphere for 50-member ensembles initialized in DJF (see Sect. <xref ref-type="sec" rid="Ch1.S2"/> for information on the dataset). Contour lines show DJF climatological mean spread [in m<sup>2</sup>] computed from daily ensemble variance values averaged over subseasonal leadtimes (14 to 46 d). Note that computing the figure based on weekly averaged data leads to overall similar large-scale structures. The shading shows relative variability in subseasonal spread, measured as the standard deviation of the ensemble variance across all DJF forecast initializations and subseasonal leadtimes, normalised by the climatological average variance. Areas with large relative spread variability suggest that spread computed from individual forecasts provides added value over a fixed spread climatology, potentially identifying windows of opportunity.</p></caption>
        <graphic xlink:href="https://wcd.copernicus.org/articles/7/767/2026/wcd-7-767-2026-f01.png"/>

      </fig>

      <p id="d2e222">Figure <xref ref-type="fig" rid="F1"/> illustrates the climatology of ensemble spread in subseasonal winter-time forecasts of z1000 over the northern hemisphere. In addition to the spread itself, it highlights regions exhibiting substantial variability in spread (i.e., areas occasionally associated with relatively low or high spread). Such areas with large relative spread variability suggest that ensemble spread computed from individual forecasts provides added value over a fixed spread climatology. Only in such regions can forecast uncertainty occasionally be anomalously low, implying a potential for windows of forecast opportunity to exist. In contrast, regions with little spread variability lack this potential, as forecast uncertainty remains close to its climatological value. Prominent regions of high relative spread variability include the northern Euro-Atlantic sector, the subtropical Atlantic, and the tropical and subtropical Pacific.</p>
      <p id="d2e227">In the extratropics, enhanced z1000 variability often appears on the flanks of the climatological maxima in spread over the North Pacific and North Atlantic. This pattern is consistent with meridional shifts of the storm tracks and associated jets, such that locations on the flanks alternately lie within or outside the high-spread belt, while the core persistently contains high spread and therefore exhibits smaller relative variability. Such flank variability can be further amplified by remote influences via atmospheric teleconnections, for example through stratospheric downward influence or tropical-extratropical coupling. At the same time, several highlighted regions, particularly in the tropical Pacific, likely reflect more local signatures of slowly varying modes such as ENSO or the MJO. Some of these slow modes and teleconnections have been explicitly linked to enhanced forecast skill at subseasonal to seasonal timescales <xref ref-type="bibr" rid="bib1.bibx9 bib1.bibx1" id="paren.10"><named-content content-type="pre">e.g.,</named-content></xref>. Given this significant spatial and temporal variability in ensemble spread, an important question arises: Are these variations in ensemble spread within a model forecast reliable indicators of the uncertainty within the physical system?</p>
      <p id="d2e235">Ensemble forecasting is a widely used approach to quantify forecast uncertainty and exploit windows of opportunity. The fundamental assumption of ensemble forecasting is that ensemble members and observations are statistically exchangeable, i.e., indistinguishable. From this assumption, it follows that the spread of the ensemble (i.e. the variability across members) reflects the intrinsic forecast uncertainty or, in other words, the expected error of the ensemble mean. Specifically, in an ideal, infinitely large, and statistically reliable ensemble system, the ensemble's variance across members should, on average, equal the squared error of the ensemble-mean forecast <xref ref-type="bibr" rid="bib1.bibx5" id="paren.11"/>. If this is the case, we would consider the spread to be a reliable estimate of the forecast error.</p>
      <p id="d2e241">However, real-world ensemble systems rarely achieve perfect reliability. While comprehensive diagnostics of spread reliability are limited at subseasonal lead times, ensemble spread has been shown to lack reliability at other lead times. At short to medium lead times (up to 2 weeks), model forecasts typically show under-dispersion, where the actual forecast error exceeds that predicted by the ensemble spread <xref ref-type="bibr" rid="bib1.bibx11" id="paren.12"><named-content content-type="pre">e.g.</named-content></xref>. At seasonal to decadal lead times, on the other hand, atmospheric ensemble forecasts seem to be, on average, relatively reliable in their prediction of uncertainty <xref ref-type="bibr" rid="bib1.bibx29" id="paren.13"/>. However, while the average ensemble spread might be a good indicator of forecast error, misrepresentations of short-term fluctuations in spread may still exist. In the following, we investigate whether such misrepresentations can limit the ability of ensembles to identify genuine windows of opportunity, since periods with low predicted uncertainty might underestimate actual errors. Ensuring that variability in ensemble spread realistically reflects variability in forecast error is therefore critical for reliable decision support.</p>
      <p id="d2e252">Numerous studies have examined how the spread-error relationship varies regionally and among forecast variables, especially during Northern Hemisphere winter (DJF). Results indicate that regional variations in ensemble reliability are linked to the intrinsic variability of the ensemble spread itself. For example, <xref ref-type="bibr" rid="bib1.bibx19" id="text.14"/> analysed different uncertainty metrics for forecasts over Europe, finding generally reliable spread-error relationships. However, most previous analyses, including that of <xref ref-type="bibr" rid="bib1.bibx19" id="text.15"/>, have been restricted to medium-range forecasts (up to approximately two weeks). At these shorter leadtimes, forecast dynamics strongly depend on specific synoptic conditions, with ensemble spread variations largely reflecting the error growth dynamics initiated by initial condition perturbations <xref ref-type="bibr" rid="bib1.bibx20 bib1.bibx21" id="paren.16"/>, rather than mostly describing the intrinsic predictability of the underlying physical system, which is the focus of this study.</p>
      <p id="d2e264">Reliability diagrams have also been widely used in other contexts. One common approach involves comparing forecasted versus observed probabilities of specific weather events <xref ref-type="bibr" rid="bib1.bibx4 bib1.bibx28" id="paren.17"/>. Such reliability diagrams typically assess the overall calibration of ensemble predictions but do not explicitly focus on the spatio-temporal variability of ensemble spread as done in this study. Another frequent application is the evaluation of initial perturbation schemes during short to medium-range forecasts. These analyses typically use reliability diagrams of spread and error primarily to identify under- or over-dispersion at early forecast stages, subsequently guiding adjustments to initial perturbation magnitudes <xref ref-type="bibr" rid="bib1.bibx12 bib1.bibx6" id="paren.18"/>. In contrast, the framework applied in this study emphasises the intrinsic atmospheric uncertainty and flow-dependent variations in ensemble spread and associated forecast error specifically at subseasonal lead times rather than the average or climatological level of forecast skill (although it can also highlight model errors in representing these processes).</p>
      <p id="d2e274">Throughout this paper, we distinguish between windows of forecast opportunity and the potential for windows of forecast opportunity. A window of forecast opportunity refers to a specific forecast situation characterised by anomalously low intrinsic forecast uncertainty, as reflected by reduced ensemble spread. The potential for windows of opportunity refers to the variability of forecast uncertainty across different forecast situations. Only regions or variables with substantial spread variability can occasionally realise low-spread states and thus exhibit windows of opportunity. However, only if the ensemble spread of a forecast is a reliable measure of forecast error do these windows actually represent opportunities with reduced forecast error.</p>
      <p id="d2e277">The structure of the present manuscript is as follows: Sect. <xref ref-type="sec" rid="Ch1.S2"/> describes the datasets and ensemble configurations. Section <xref ref-type="sec" rid="Ch1.S3"/> introduces the spread-error reliability metric and identifies potential processes modifying reliability. Section <xref ref-type="sec" rid="Ch1.S4"/> uses an idealized toy model to isolate whether and how these processes affect reliability curves. Section <xref ref-type="sec" rid="Ch1.S5"/> then applies the reliability diagnostics to operational subseasonal forecasts and assesses regional spread forecast skill. Finally, Sect. <xref ref-type="sec" rid="Ch1.S6"/> summarizes the main findings and their implications for atmospheric dynamics and model development.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Data and numerical models</title>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Re-analysis data</title>
      <p id="d2e305">We use the ERA5 re-analysis dataset <xref ref-type="bibr" rid="bib1.bibx7" id="paren.19"/> of the European Centre for Medium-Range Weather Forecasts (ECMWF) as the representation of the atmospheric state between 1 December 2015 and 30 April 2025. In particular, we use daily snapshots of geopotential height at 1000 hPa (z1000) at 00:00 UTC and daily mean 2-metre temperature (t2m) computed from 3-hourly data. All outputs are analysed on a 2.5° <inline-formula><mml:math id="M5" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 2.5° regular grid covering the entire northern hemisphere.</p>
</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Subseasonal ensemble forecasts</title>
      <p id="d2e327">This study uses ensemble forecasts provided by ECMWF as part of the S2S Prediction Project <xref ref-type="bibr" rid="bib1.bibx27" id="paren.20"/>. In particular, we use real-time forecasts initialised during boreal winter months December to February for the period from late 2015 to early 2025. The inherent horizontal model resolution is roughly 30 km, but we analyse outputs for z1000 and t2m on the same 2.5° grid as used for the re-analysis (see Sect. <xref ref-type="sec" rid="Ch1.S2.SS1"/>). Model output is given for 46 d after initialisation, although most of our analyses will focus on subseasonal leadtimes of 14 to 46 d. Consistent with the ERA5-data, we analyse daily instantaneous snapshots of z1000 and daily mean t2m.</p>
      <p id="d2e335">Forecasts are initialised twice a week (Monday and Thursday) as 50-member ensembles before 27 June 2023 and daily as 100-member ensembles afterwards. Note that in this study we do not use the “unperturbed control member” of each forecast, as it is not strictly statistically indistinguishable from the perturbed members.</p>
      <p id="d2e338">To study the effect of ensemble size on estimated forecast uncertainty, we create a set of smaller ensembles by subsampling the original 100-member and 50-member ensembles. The subsampling is done by simply splitting, for example, the 100-member ensembles into two 50-member ensembles (by taking members 1 to 50 and 51 to 100, respectively). We use the same approach to subsample the 100-member and 50-member ensembles into 10 and 5 ensembles with 10 members each, respectively. The combined set of original and subsampled ensembles gives us a total of 181 forecasts with 100 members, 568 forecasts with 50 members and 2840 forecasts with 10 members. Note that these sets of forecasts combine different model versions, as the operationally used S2S model gets updated regularly. The subsampling approach mixes, for example, “original” 50-member ensembles using version CY47R2 with “subsampled” 50-member ensembles using version CY49R1. However, for the purposes of this study we assume the representation of atmospheric uncertainty to vary little between different model versions, as previous studies indicate that ensemble spread characteristics remain broadly consistent across model designs. For example, <xref ref-type="bibr" rid="bib1.bibx13" id="text.21"/> reported only small changes (a few percent) in subseasonal forecast spread when changing the stochastic perturbation scheme in the IFS model. In addition, a qualitative comparison with an independent CNRM ensemble (see model details below) yields similar large-scale patterns in spread reliability, further supporting this assumption (see Sect. <xref ref-type="sec" rid="Ch1.S6"/>).</p>
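<p>The member-splitting described above can be sketched in a few lines. This is a minimal illustration: the function and variable names, and the array layout (members along the first axis), are our own choices, not part of the S2S data model.</p>

```python
import numpy as np

def split_ensemble(members: np.ndarray, size: int) -> list:
    """Split an (M, ...) array of ensemble members into disjoint
    sub-ensembles of `size` members each (e.g. 100 -> 2 x 50 or 10 x 10)."""
    m = members.shape[0]
    if m % size != 0:
        raise ValueError("sub-ensemble size must divide the ensemble size")
    return [members[i:i + size] for i in range(0, m, size)]

# Illustrative 100-member ensemble of values over 46 lead days.
ens100 = np.random.default_rng(0).normal(size=(100, 46))
halves = split_ensemble(ens100, 50)   # members 1-50 and 51-100
tens = split_ensemble(ens100, 10)     # ten 10-member ensembles
print(len(halves), len(tens))
```

<p>Splitting into contiguous, disjoint blocks (rather than random resampling) keeps the sub-ensembles statistically independent of one another, which matches the description in the text.</p>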
      <p id="d2e346">As the representation of model spread in this study, we use the unbiased sample variance over ensemble members at a given time. Forecast errors are given as the squared error, i.e., the squared difference between the ensemble mean and the corresponding re-analysis value. The main focus of this study is then the relation of spread and squared error. Using sufficiently large sample averages, these two metrics should be equal to each other if the model accurately represents the inherent uncertainty of atmospheric evolution (as discussed in Sects. <xref ref-type="sec" rid="Ch1.S1"/> and <xref ref-type="sec" rid="Ch1.S4"/>). Note that, in contrast to some other studies, we will not analyse the relationship of spread and mean squared error given by temporal averages, but rather as averages over different atmospheric states associated with a given uncertainty. We do so by averaging over groups of forecast ensembles and/or time steps with the same ensemble variance. Forecast situations with low spread are then expected to also show, on average, low squared error.</p>
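<p>In code, the two metrics compare as follows (a minimal sketch; the convention that members lie along the first axis is an assumption of this illustration):</p>

```python
import numpy as np

def ensemble_spread(members: np.ndarray) -> np.ndarray:
    # Unbiased sample variance across ensemble members (first axis).
    return members.var(axis=0, ddof=1)

def squared_error(members: np.ndarray, analysis: np.ndarray) -> np.ndarray:
    # Squared difference between the ensemble mean and the verifying
    # (re-)analysis value.
    return (members.mean(axis=0) - analysis) ** 2
```

<p>Note the <code>ddof=1</code> argument, which selects the unbiased variance estimator; NumPy's default (<code>ddof=0</code>) would systematically underestimate the spread of small ensembles.</p>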
      <p id="d2e354">However, for finite ensemble sizes, sampling errors can lead to deviations between the estimated ensemble spread and the underlying forecast uncertainty. In individual forecast situations, this can result in cases where relatively low estimated spread is associated with comparatively large forecast error (or the other way round), not because spread is an inherently poor predictor, but because the spread is mis-estimated due to sampling variability. This will in general weaken the expected linear relationship of spread and error and reduce the slope of spread-error curves. What constitutes a “small” ensemble size in this context is relative and depends on the strength of flow-dependent variability and the number of available forecast cases; while 50-member ensembles are large by operational standards, sampling effects are nevertheless present and become increasingly important for smaller ensembles. These aspects are further discussed in Sects. <xref ref-type="sec" rid="Ch1.S3"/> and <xref ref-type="sec" rid="Ch1.S4"/>.</p>
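<p>This flattening of the spread-error slope by sampling error can be demonstrated with a simple perfect-model Monte Carlo experiment. This is our own minimal construction for illustration (not the toy model of Sect. 4); the lognormal distribution of the case-to-case true variance is an arbitrary choice.</p>

```python
import numpy as np
rng = np.random.default_rng(1)

def toy_srs(M: int, n_cases: int = 20000, n_bins: int = 10) -> float:
    """Slope of the binned spread-vs-error relation for ensemble size M in a
    perfect-model setting where the true uncertainty varies case by case."""
    true_var = rng.lognormal(mean=0.0, sigma=0.5, size=n_cases)
    members = rng.normal(0.0, np.sqrt(true_var)[:, None], size=(n_cases, M))
    truth = rng.normal(0.0, np.sqrt(true_var))          # "observed" outcome
    spread = members.var(axis=1, ddof=1)                # estimated spread
    error = (members.mean(axis=1) - truth) ** 2 * M / (M + 1)  # unbiased error
    order = np.argsort(spread)                          # equal-count bins
    x = [spread[i].mean() for i in np.array_split(order, n_bins)]
    y = [error[i].mean() for i in np.array_split(order, n_bins)]
    return np.polyfit(x, y, 1)[0]

# Smaller ensembles yield a flatter slope even though the model is perfect,
# purely because the estimated spread is a noisier proxy for true uncertainty.
print(toy_srs(10), toy_srs(100))
```

<p>The effect is a form of regression dilution: binning on a noisily estimated variance mixes cases with different true uncertainty into the same bin, pulling bin-mean errors toward the climatological value.</p>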
      <p id="d2e361">In addition to the IFS ensemble forecasts, we use a second, independent ensemble dataset produced with the CNRM subseasonal forecast system for a qualitative comparison of spread reliability patterns (see Sect. <xref ref-type="sec" rid="Ch1.S6"/>). The CNRM data are used solely to assess the robustness of large-scale features identified in the IFS and are not analysed in the same level of detail. We analyse CNRM daily model data output on a 0.5° grid for DJF initialisations between late 2016 and early 2025, with one initialisation per week. For CNRM, geopotential height is available only on the 925 hPa surface (z925), necessarily resulting in deviations from results based on the IFS z1000 data, which we assume to be sufficiently small for our comparisons. CNRM ensembles have 50 members before December 2020, and 24 members afterwards. The 50-member ensembles are sub-sampled, as explained above, to further obtain 24- and 10-member ensembles.</p>
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Methodology of reliability analyses</title>
      <p id="d2e375">In the present study, we analyse fluctuations in the spread of subseasonal ensemble forecasts to investigate the flow-dependent variability of intrinsic forecast uncertainty. For a reliable model, those fluctuations in ensemble spread are expected to translate into corresponding fluctuations in mean squared error. We emphasise that our framework focuses on spread variability and the associated potential for windows of forecast opportunity, whereas other measures of forecast skill or accuracy may depend on additional factors beyond the scope of this study. In particular, we present a framework to investigate the potential of the atmosphere to develop prolonged periods of reduced spread, i.e., potential windows of opportunity. We then use this framework to quantify the ability of the model to accurately predict such periods given limited ensemble sizes, model misrepresentation and other common sources of errors. Note that our framework can, in principle, also be used to identify periods with anomalously high ensemble spread (which one might analogously refer to as “walls of adversity”).</p>

      <fig id="F2" specific-use="star"><label>Figure 2</label><caption><p id="d2e380">Visualisation of ensemble spread and spread-error relation at an example point in northern Europe (60° N/15° E) based on 50-member ensembles. <bold>(a)</bold> Evolution of spread in terms of ensemble variance as function of leadtime. Shown is the DJF climatological mean with shading indicating 1 standard deviation and an example forecast initialised on 20 February 2020. Vertical dashed line indicates 14 d of leadtime. <bold>(b)</bold> Spread-error relation with blue dots showing z1000 variance and squared error for every subseasonal leadtime (days 14–46) and all 568 ensembles in the dataset. Orange crosses show the means of 10 bins in variance direction, spaced to contain equal number of points each. The dashed line shows a linear fit through the bin means, with slope indicated in the top right. Black solid line shows the <inline-formula><mml:math id="M6" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> line for reference.</p></caption>
        <graphic xlink:href="https://wcd.copernicus.org/articles/7/767/2026/wcd-7-767-2026-f02.png"/>

      </fig>

      <p id="d2e407">Figure <xref ref-type="fig" rid="F2"/>a illustrates how the ensemble spread evolves with leadtime in subseasonal z1000 forecasts for a selected point in northern Europe. Shown are the “climatological” spread evolution as averaged over all forecasts in DJF, as well as one specific example of the forecast initialised on 20 February 2020. For short leadtimes (up to about day 14), the climatological spread increases substantially until it converges to a roughly constant value at subseasonal leadtimes (after 14 d), marking the transition from the rapid initial error-growth phase to a quasi-saturated spread level. The spread of the example forecast essentially follows the climatological evolution, but shows a large degree of day-to-day variability around it. In the following, we therefore focus on leadtimes 14–46 d. Restricting the analysis to this window avoids a trivial spread-error relationship arising purely from their common dependence on lead time. A sensitivity analysis using leadtimes 28–46 showed overall consistent large-scale patterns in <inline-formula><mml:math id="M7" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> (not shown), indicating that residual lead-time trends do not dominate our results.</p>
      <p id="d2e425">In general, the spread of an ensemble forecast should be a measure of the forecast uncertainty (cf. Sect. <xref ref-type="sec" rid="Ch1.S1"/>). We can therefore assess the reliability of predicted fluctuations in spread by comparing them to the associated forecast errors. Note that, following <xref ref-type="bibr" rid="bib1.bibx5" id="text.22"/>, the expected squared error of the ensemble mean equals the ensemble variance only when scaled by a factor of <inline-formula><mml:math id="M8" display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mo>/</mml:mo><mml:mo>(</mml:mo><mml:mi>M</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> for ensemble size <inline-formula><mml:math id="M9" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula>. This factor essentially results from a reduction in degrees of freedom when computing the error based on an empirically estimated ensemble mean. In this study we consistently apply this “unbiasing” correction when computing the error. However, for simplicity, we will refer to this unbiased error as error (without the prefix “unbiased”) throughout the paper.</p>
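<p>The unbiasing correction is a one-line rescaling, which can be verified numerically. The code below is a sketch with illustrative names; it checks that, for exchangeable draws, the corrected squared error averages to the true variance.</p>

```python
import numpy as np
rng = np.random.default_rng(2)

def unbiased_squared_error(members: np.ndarray, truth: np.ndarray) -> np.ndarray:
    # Scale the squared ensemble-mean error by M / (M + 1) so that, for a
    # reliable M-member ensemble, its expectation equals the ensemble variance.
    M = members.shape[-1]
    return (members.mean(axis=-1) - truth) ** 2 * M / (M + 1)

# For members and truth drawn from the same N(0, 1) distribution, the raw
# squared error averages to (M + 1) / M, while the corrected error averages
# to the true variance of 1.
M, n = 5, 200_000
members = rng.normal(size=(n, M))
truth = rng.normal(size=n)
print(unbiased_squared_error(members, truth).mean())   # close to 1.0
```

<p>The correction matters most for small ensembles: for <inline-formula><mml:math display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mo>=</mml:mo><mml:mn>5</mml:mn></mml:mrow></mml:math></inline-formula> the raw error overestimates the variance by 20 %, whereas for 100 members the factor is a 1 % effect.</p>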
      <p id="d2e460">Figure <xref ref-type="fig" rid="F2"/>b shows how the z1000 squared error behaves in relation to the z1000 ensemble variance for the point in northern Europe. Plotted are the squared error and spread at every subseasonal leadtime (days 14–46) within all 50-member forecasts in our dataset (see Sect. <xref ref-type="sec" rid="Ch1.S2.SS2"/>); we hence capture fluctuations on daily to inter-annual time scales.</p>
      <p id="d2e467">The forecast errors shown in Fig. <xref ref-type="fig" rid="F2"/> are computed based on a single observed evolution of the atmosphere and hence the resulting spread-error scatter plot shows a highly dispersed distribution. For a given value of ensemble variance, and under the assumption that the ensemble members are normally distributed and the model is perfect, the corresponding squared error follows a <inline-formula><mml:math id="M10" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="italic">χ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>-distribution <xref ref-type="bibr" rid="bib1.bibx30" id="paren.23"><named-content content-type="pre">e.g.</named-content></xref>. For a reliable forecast, the mean of that error distribution should equal the associated variance value.</p>
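This distributional property can be checked numerically. The sketch below assumes a perfect model with a known true spread (a simplification; names are ours) and verifies that the normalised squared error behaves like a chi-squared variable with one degree of freedom:

```python
import numpy as np

rng = np.random.default_rng(1)
M, trials, sigma = 50, 200_000, 2.0

members = rng.normal(0.0, sigma, size=(trials, M))
obs = rng.normal(0.0, sigma, size=trials)

# With a perfect model, ens_mean - obs is Gaussian with variance
# (1 + 1/M) * sigma^2, so the normalised squared error is chi-squared
# distributed with one degree of freedom
norm_sq_err = (members.mean(axis=1) - obs) ** 2 / ((1 + 1 / M) * sigma**2)

print(norm_sq_err.mean())           # mean of a chi2(1) variable is 1
print((norm_sq_err <= 1.0).mean())  # P(chi2(1) <= 1) = P(|Z| <= 1) ≈ 0.683
```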
      <p id="d2e488">Indeed, when binning cases with similar spread in Fig. <xref ref-type="fig" rid="F2"/>, the bin-means collapse onto an almost linear relationship in good agreement with the <inline-formula><mml:math id="M11" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> curve. Here we divide the distribution into 10 bins with an equal number of samples in each. To quantify the agreement with the <inline-formula><mml:math id="M12" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> line, we fit a linear function through the bin-means and extract the slope of that fit. We refer to this slope as the Spread-Reliability Slope (<inline-formula><mml:math id="M13" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula>), which represents the reliability of fluctuations in the spread of a forecast at a given location. The spread-error relationship diagnosed here is inherently statistical and holds only on average, so substantial scatter between individual spread-error pairs is expected. In addition, extreme spread values are comparatively rare, so that the raw scatter may visually suggest relatively low errors at the highest spread values; this reflects sampling imbalance and the skewness of the error distribution rather than a breakdown of the monotonic increase of mean error with spread.
If the spread were perfectly reliable, we would have <inline-formula><mml:math id="M14" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, as all bin-means would collapse onto the <inline-formula><mml:math id="M15" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> line and the ensemble variance would perfectly represent the average forecast uncertainty. Any deviation of the <inline-formula><mml:math id="M16" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> from 1 indicates a lack of reliability of the spread. In the case shown, we have <inline-formula><mml:math id="M17" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.62</mml:mn></mml:mrow></mml:math></inline-formula> and hence we consider the fluctuations in spread of the forecast fairly reliable (further comparison values will be discussed below).</p>
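The binning-and-fitting procedure behind the SRS can be sketched in a few lines. This is a minimal illustration with synthetic Gaussian forecasts whose true spread varies from case to case, not the operational data; all names and parameter choices are ours:

```python
import numpy as np

rng = np.random.default_rng(42)
M, cases = 100, 20_000

# Flow-dependent uncertainty: the true spread varies from case to case
sigma = rng.uniform(0.7, 1.3, size=cases)
members = rng.normal(0.0, sigma[:, None], size=(cases, M))
obs = rng.normal(0.0, sigma)

spread = members.var(axis=1, ddof=1)                   # ensemble variance
err = (members.mean(axis=1) - obs) ** 2 * M / (M + 1)  # unbiased squared error

# Ten bins with an equal number of samples, ordered by spread
order = np.argsort(spread)
bin_spread = [spread[b].mean() for b in np.array_split(order, 10)]
bin_err = [err[b].mean() for b in np.array_split(order, 10)]

# Slope of the linear fit of bin-mean error against bin-mean spread
srs = np.polyfit(bin_spread, bin_err, 1)[0]
# In this finite-ensemble setting the slope lands near, but somewhat below,
# one: sampling noise in the estimated spread flattens the diagnosed slope
print(srs)
```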
      <p id="d2e586">The spread-error scatter in Fig. <xref ref-type="fig" rid="F2"/>b is based on daily values. For subseasonal forecasting, time-averaged quantities such as weekly means are often more closely aligned with the low-frequency modes that provide predictability. However, daily ensemble spread is occasionally available and inspected in ensemble forecasts or operational products, for example in the form of spaghetti plots or ensemble evolution diagrams. Further, using daily values provides a useful baseline diagnostic: it allows us to assess whether day-to-day spread fluctuations in the ensemble output already contain meaningful information about forecast error. In addition, daily values provide substantially better sampling of the spread-error distribution than time-averaged quantities.</p>
      <p id="d2e591">While daily values can exhibit substantial pointwise scatter, the <inline-formula><mml:math id="M18" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> is derived from averages over many spread-error pairs and is therefore not sensitive to the noise of individual points. Daily spread and error values within a given forecast are temporally autocorrelated, which reduces the effective number of statistically independent samples. As discussed in Sect. 5, the <inline-formula><mml:math id="M19" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> is primarily controlled by differences between forecasts rather than day-to-day fluctuations within a forecast, so temporal autocorrelation mainly affects sampling uncertainty rather than the existence of a systematic spread-error relationship.</p>
      <p id="d2e619">Computing the <inline-formula><mml:math id="M20" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> metric from long-term time-averaged data (e.g. weekly means) suppresses high-frequency, potentially spurious variability but reduces the number of available spread-error pairs. We therefore view daily and time-averaged analyses as complementary. As shown in Sect. 5, <inline-formula><mml:math id="M21" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> patterns based on time-averaged spread remain qualitatively consistent and are often enhanced, indicating that the main conclusions are robust while also highlighting the benefit of temporal aggregation.</p>
      <p id="d2e646">We can now apply the same methodology to analyse the reliability of forecast spread at different locations by extracting the <inline-formula><mml:math id="M22" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> metric of the associated ensemble forecasts. Figure <xref ref-type="fig" rid="F3"/> shows that the reliability of subseasonal forecast spread varies substantially across the northern hemisphere. For example, northern Europe has <inline-formula><mml:math id="M23" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> values robustly exceeding 0.6, while the <inline-formula><mml:math id="M24" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> in parts of eastern Canada is statistically not different from zero. In fact, several pronounced regions of high <inline-formula><mml:math id="M25" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> can clearly be identified, e.g., in the northern Atlantic, the mid-east Pacific, the tropical west Pacific or the Gulf of Mexico. Over the North Atlantic, the spatial structure exhibits a pronounced north-south contrast within the sector associated with the North Atlantic Oscillation (NAO), with comparatively higher <inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> values over the northern centre of action than over the southern centre. As discussed in Sect. <xref ref-type="sec" rid="Ch1.S6"/>, the relatively high reliability of spread in these regions is likely due to a strong influence of slowly varying modes of atmospheric variability driven by different teleconnections.</p>
      <p id="d2e714">Regions of high <inline-formula><mml:math id="M27" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> mark locations where ensemble spread fluctuations reliably track fluctuations in intrinsic forecast uncertainty and associated forecast errors, indicating that slowly varying teleconnections can modulate forecast uncertainty and thereby create the potential for windows of forecast opportunity. Our framework is not designed to assess the average or climatological level of forecast skill in a given region. Instead, it diagnoses whether forecast error varies in a flow-dependent manner and whether such variations are reliably captured by ensemble spread. In such regions, periods of reduced spread correspond to reduced forecast error, while other periods exhibit increased error, implying a potential for both enhanced and degraded forecast skill rather than uniformly high skill.</p>

      <fig id="F3" specific-use="star"><label>Figure 3</label><caption><p id="d2e731"><inline-formula><mml:math id="M28" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> metric computed as the slope of the linear fit to the z1000 spread-error curve at each point in the northern hemisphere. Shown are results for 50-member ensembles. Hatched areas indicate regions where the <inline-formula><mml:math id="M29" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> metric (derived from the regression of bin-mean error on spread) is not statistically different from zero at the 99 % confidence level. Crosses indicate two example locations (eastern Canada at 55° N<inline-formula><mml:math id="M30" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>70° W and northern Europe at 60° N<inline-formula><mml:math id="M31" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>15° E) further analysed below.</p></caption>
        <graphic xlink:href="https://wcd.copernicus.org/articles/7/767/2026/wcd-7-767-2026-f03.png"/>

      </fig>

      <p id="d2e777">For the remainder of this study, we investigate the mechanisms and processes that can lead to the pronounced spatial structures in <inline-formula><mml:math id="M32" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> shown in Fig. <xref ref-type="fig" rid="F3"/>. We use a combination of toy-model experiments and comprehensive perfect-model approaches to examine how intrinsic variability and sampling-related misrepresentation of ensemble spread influence the slope of the spread-error relationship.</p>
      <p id="d2e795">Our framework does not require the ensemble mean to be perfectly represented. Instead, it focuses on how fluctuations in ensemble spread relate to fluctuations in forecast error when diagnostics are constructed from averages over many forecast cases. For finite ensemble sizes, individual forecasts can exhibit sampling uncertainty in both estimated spread and associated forecast error. However, because the <inline-formula><mml:math id="M33" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> is derived from averages over many forecasts within variance bins, sampling-induced uncertainties in forecast error are largely random across cases and therefore mostly cancel out in the bin mean. Deviations of the <inline-formula><mml:math id="M34" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> from unity instead arise when sampling effects and other limitations reduce the contrast between low- and high-uncertainty forecast situations. These mechanisms are further illustrated and discussed using toy-model experiments in Sect. <xref ref-type="sec" rid="Ch1.S4.SS2"/>. The potential role of ensemble-mean biases and other model deficiencies is discussed separately in Sect. <xref ref-type="sec" rid="Ch1.S6"/>.</p>
      <p id="d2e826">We identified four major mechanisms that can modify the reliability of a spread forecast and lead to <inline-formula><mml:math id="M35" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi><mml:mo>≠</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, although other mechanisms might exist: <list list-type="bullet"><list-item>
      <p id="d2e847"><italic>Sampling error</italic>: random misrepresentation of ensemble spread due to small ensemble sizes</p></list-item><list-item>
      <p id="d2e853"><italic>Natural variability</italic>: modification of the sampling error effect due to variability of spread in the physical system</p></list-item><list-item>
      <p id="d2e859"><italic>Model error</italic>: misrepresentation of the variability of spread in the model</p></list-item><list-item>
      <p id="d2e865"><italic>Under-representation of states</italic>: insufficient sampling of initial conditions producing forecasts with a given spread value</p></list-item></list> In Sect. <xref ref-type="sec" rid="Ch1.S4"/>, we start by using a statistical toy model to isolate each of these four mechanisms and study their specific effects on spread-error curves and the <inline-formula><mml:math id="M36" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> individually. We then analyse these mechanisms and their impacts on spread reliability in operational forecast systems in Sect. <xref ref-type="sec" rid="Ch1.S5"/>. This two-step approach enables us to develop an intuitive understanding of the individual processes before quantifying their effects in more complex, real-world forecast settings, where the mechanisms typically interact and are challenging to disentangle.</p>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Reliability curves studied in a toy model</title>
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>Details of the toy model</title>
      <p id="d2e902">In this section, we seek to develop an intuitive understanding of different mechanisms that can modify the reliability of spread in a forecast in terms of the slope of associated spread-error curves (i.e., the <inline-formula><mml:math id="M37" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> metric). To study how individual mechanisms can affect the <inline-formula><mml:math id="M38" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula>, we use a statistical toy model that generates synthetic forecast-observation pairs with controlled properties.</p>
      <p id="d2e929">We start by generating <inline-formula><mml:math id="M39" display="inline"><mml:mi>C</mml:mi></mml:math></inline-formula> ensemble forecast cases, divided equally into five groups. Each forecast comprises <inline-formula><mml:math id="M40" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula> ensemble members. The forecast value <inline-formula><mml:math id="M41" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">m</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> (the subscript <inline-formula><mml:math id="M42" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula> denotes “forecast”) for case “c” <inline-formula><mml:math id="M43" display="inline"><mml:mrow><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mi>C</mml:mi><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula> and ensemble member “m” <inline-formula><mml:math id="M44" display="inline"><mml:mrow><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mi>M</mml:mi><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula> is randomly sampled from a normal distribution with zero mean and a standard deviation <inline-formula><mml:math id="M45" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, i.e., <inline-formula><mml:math id="M46" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi 
mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">m</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub><mml:mo>∼</mml:mo><mml:mi mathvariant="script">N</mml:mi><mml:mo>(</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. Here, we can vary <inline-formula><mml:math id="M47" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> for different forecast cases, which allows us to mimic natural variability in the spread of the underlying physical system. Situations where <inline-formula><mml:math id="M48" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is small, for example, represent periods with low intrinsic uncertainty (i.e. windows of opportunity). 
By varying other parameters (like <inline-formula><mml:math id="M49" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M50" display="inline"><mml:mi>C</mml:mi></mml:math></inline-formula> or <inline-formula><mml:math id="M51" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>) we can sample other experimental setups that mimic different characteristics of the forecasting model or underlying physical system. The values used for the different parameters are given below.</p>
      <p id="d2e1141">For each forecast, we then generate an observation <inline-formula><mml:math id="M52" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> (the subscript “O” denotes “observation”) by sampling from a normal distribution with zero mean and a standard deviation <inline-formula><mml:math id="M53" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, i.e., <inline-formula><mml:math id="M54" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub><mml:mo>∼</mml:mo><mml:mi mathvariant="script">N</mml:mi><mml:mo>(</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>.</p>
      <p id="d2e1220">Given the ensemble members <inline-formula><mml:math id="M55" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">m</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> and observations <inline-formula><mml:math id="M56" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> corresponding to forecast case “c”, the ensemble mean is then computed as

            <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M57" display="block"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>M</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:munderover><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">m</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          the ensemble variance (spread) as

            <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M58" display="block"><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:mi>M</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:munderover><mml:msup><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">m</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          and the squared error (SE) with respect to the observation as

            <disp-formula id="Ch1.E3" content-type="numbered"><label>3</label><mml:math id="M59" display="block"><mml:mrow><mml:msub><mml:mi mathvariant="normal">SE</mml:mi><mml:mi mathvariant="normal">c</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msup><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
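Equations (1)–(3) can be written out directly as a small helper (a minimal sketch; the function name is ours):

```python
import numpy as np

def toy_statistics(members, obs):
    """Ensemble mean (Eq. 1), ensemble variance (Eq. 2), squared error (Eq. 3)."""
    ens_mean = members.mean()          # Eq. (1): average over the M members
    ens_var = members.var(ddof=1)      # Eq. (2): 1/(M - 1) normalisation
    sq_err = (ens_mean - obs) ** 2     # Eq. (3): error of the ensemble mean
    return ens_mean, ens_var, sq_err

# Worked example: three members {1, 2, 3} verified against an observation of 1
mean, var, se = toy_statistics(np.array([1.0, 2.0, 3.0]), obs=1.0)
print(mean, var, se)  # 2.0 1.0 1.0
```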
      <p id="d2e1438">Note that for a given observation-forecast pair, the ensemble variance <inline-formula><mml:math id="M60" display="inline"><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> provides an unbiased estimate of the underlying forecast variance <inline-formula><mml:math id="M61" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula>. The sampling uncertainty of this estimate, however, decreases with increasing ensemble size.</p>
      <p id="d2e1477">Our toy-model strategy comprises a reference experiment and a set of perturbation experiments, in which we vary individual parameters to simulate changes to the model or the physical system and to isolate their effect on the resulting spread-error curve. The reference case is designed to represent an “ideal case”, associated with a spread-error slope of exactly one (<inline-formula><mml:math id="M62" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>). The parameter setup for this reference case is given in Table <xref ref-type="table" rid="T1"/>a.</p>
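The contrast between such configurations can be made concrete with the following sketch, which re-implements two setups resembling rows (a) and (b) of Table 1 under the stated Gaussian assumptions. This is our own minimal illustration, not the authors' code; function and variable names are ours:

```python
import numpy as np

def run_experiment(M, C, sigma_levels, seed=0):
    """Generate C forecast-observation pairs (C/5 per sigma level), return the SRS."""
    rng = np.random.default_rng(seed)
    sigma = np.repeat(sigma_levels, C // len(sigma_levels))
    members = rng.normal(0.0, sigma[:, None], size=(C, M))
    obs = rng.normal(0.0, sigma)
    spread = members.var(axis=1, ddof=1)
    err = (members.mean(axis=1) - obs) ** 2 * M / (M + 1)
    order = np.argsort(spread)                 # 10 equal-count bins by spread
    xs = [spread[b].mean() for b in np.array_split(order, 10)]
    ys = [err[b].mean() for b in np.array_split(order, 10)]
    return np.polyfit(xs, ys, 1)[0]

levels = np.array([0.7, 0.85, 1.0, 1.15, 1.3])
srs_ref = run_experiment(M=100, C=3000, sigma_levels=levels)   # like setup (a)
srs_small = run_experiment(M=10, C=3000, sigma_levels=levels)  # like setup (b)
# A smaller ensemble estimates its own spread more noisily, which
# flattens the diagnosed spread-error slope
print(srs_small < srs_ref)
```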

<table-wrap id="T1" specific-use="star"><label>Table 1</label><caption><p id="d2e1501">Overview of toy model experiment configurations, with the top row describing the reference experiment. Here, <inline-formula><mml:math id="M63" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula> denotes the ensemble size, <inline-formula><mml:math id="M64" display="inline"><mml:mi>C</mml:mi></mml:math></inline-formula> the number of forecast-observation pairs, <inline-formula><mml:math id="M65" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">O</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> the standard deviation of the observations, and <inline-formula><mml:math id="M66" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">F</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> the standard deviation of the forecast ensemble members. The two experiments mimicking few verification dates (g/h) are initialised with different random seeds, but use otherwise equal parameters. Parameters deviating from our “standard set-up” (labelled “Ideal spread-error relation”) are shown in bold, for easier identification of the sensitivity studied in the respective experiments.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="5">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:colspec colnum="5" colname="col5" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Experiment</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M67" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M68" display="inline"><mml:mi>C</mml:mi></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M69" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>O</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5"><inline-formula><mml:math id="M70" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>F</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">(a) Ideal spread-error relation</oasis:entry>
         <oasis:entry colname="col2">100</oasis:entry>
         <oasis:entry colname="col3">3000</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M71" display="inline"><mml:mrow><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">0.7</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0.85</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1.15</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1.3</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5">as <inline-formula><mml:math id="M72" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>O</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">(b) Mimic small ens. size</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M73" display="inline"><mml:mn mathvariant="bold">10</mml:mn></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">3000</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M74" display="inline"><mml:mrow><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">0.7</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0.85</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1.15</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1.3</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5">as <inline-formula><mml:math id="M75" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>O</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">(c) Mimic little variability in observed variance</oasis:entry>
         <oasis:entry colname="col2">100</oasis:entry>
         <oasis:entry colname="col3">3000</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M76" display="inline"><mml:mrow><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="bold">0.925</mml:mn><mml:mo mathvariant="bold">,</mml:mo><mml:mn mathvariant="bold">0.9625</mml:mn><mml:mo mathvariant="bold">,</mml:mo><mml:mn mathvariant="bold">1</mml:mn><mml:mo mathvariant="bold">,</mml:mo><mml:mn mathvariant="bold">1.0375</mml:mn><mml:mo mathvariant="bold">,</mml:mo><mml:mn mathvariant="bold">1.075</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5">as <inline-formula><mml:math id="M77" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>O</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">(d) Mimic large variability in observed variance</oasis:entry>
         <oasis:entry colname="col2">100</oasis:entry>
         <oasis:entry colname="col3">3000</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M78" display="inline"><mml:mrow><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="bold">0.475</mml:mn><mml:mo mathvariant="bold">,</mml:mo><mml:mn mathvariant="bold">0.7375</mml:mn><mml:mo mathvariant="bold">,</mml:mo><mml:mn mathvariant="bold">1</mml:mn><mml:mo mathvariant="bold">,</mml:mo><mml:mn mathvariant="bold">1.2625</mml:mn><mml:mo mathvariant="bold">,</mml:mo><mml:mn mathvariant="bold">1.525</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5">as <inline-formula><mml:math id="M79" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>O</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">(e) Mimic model error (too little variability)</oasis:entry>
         <oasis:entry colname="col2">100</oasis:entry>
         <oasis:entry colname="col3">3000</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M80" display="inline"><mml:mrow><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">0.7</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0.85</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1.15</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1.3</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5"><inline-formula><mml:math id="M81" display="inline"><mml:mrow><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="bold">0.925</mml:mn><mml:mo mathvariant="bold">,</mml:mo><mml:mn mathvariant="bold">0.9625</mml:mn><mml:mo mathvariant="bold">,</mml:mo><mml:mn mathvariant="bold">1</mml:mn><mml:mo mathvariant="bold">,</mml:mo><mml:mn mathvariant="bold">1.0375</mml:mn><mml:mo mathvariant="bold">,</mml:mo><mml:mn mathvariant="bold">1.075</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">(f) Mimic model error (too large variability)</oasis:entry>
         <oasis:entry colname="col2">100</oasis:entry>
         <oasis:entry colname="col3">3000</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M82" display="inline"><mml:mrow><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">0.7</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0.85</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1.15</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1.3</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5"><inline-formula><mml:math id="M83" display="inline"><mml:mrow><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="bold">0.475</mml:mn><mml:mo mathvariant="bold">,</mml:mo><mml:mn mathvariant="bold">0.7375</mml:mn><mml:mo mathvariant="bold">,</mml:mo><mml:mn mathvariant="bold">1</mml:mn><mml:mo mathvariant="bold">,</mml:mo><mml:mn mathvariant="bold">1.2625</mml:mn><mml:mo mathvariant="bold">,</mml:mo><mml:mn mathvariant="bold">1.525</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">(g/h) Mimic few verification dates</oasis:entry>
         <oasis:entry colname="col2">100</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M84" display="inline"><mml:mn mathvariant="bold">60</mml:mn></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M85" display="inline"><mml:mrow><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">0.7</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0.85</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1.15</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1.3</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5">as <inline-formula><mml:math id="M86" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>O</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e2029">Here, we choose a large ensemble size of <inline-formula><mml:math id="M87" display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">100</mml:mn></mml:mrow></mml:math></inline-formula>, which corresponds to the ensemble size of the latest operational subseasonal forecast system at ECMWF. Lower values of <inline-formula><mml:math id="M88" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula> then model smaller ensembles. We further choose a case sample size of <inline-formula><mml:math id="M89" display="inline"><mml:mrow><mml:mi>C</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">3000</mml:mn></mml:mrow></mml:math></inline-formula>, considerably larger than the number of 100-member forecasts analysed in this study (which is 181 for 100-member ensembles; see Sect. <xref ref-type="sec" rid="Ch1.S2.SS2"/>). Modifying <inline-formula><mml:math id="M90" display="inline"><mml:mi>C</mml:mi></mml:math></inline-formula> allows us to model the effect of this reduced sample size of cases. Within the reference experiment, we then vary the variability of observed spread <inline-formula><mml:math id="M91" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> by choosing values from the set <inline-formula><mml:math id="M92" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">0.7</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0.85</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1.0</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1.15</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1.3</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>. 
 Specifically, the first <inline-formula><mml:math id="M93" display="inline"><mml:mrow><mml:mi>C</mml:mi><mml:mo>/</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula> forecasts use <inline-formula><mml:math id="M94" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.7</mml:mn></mml:mrow></mml:math></inline-formula>, the next <inline-formula><mml:math id="M95" display="inline"><mml:mrow><mml:mi>C</mml:mi><mml:mo>/</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula> forecasts use <inline-formula><mml:math id="M96" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.85</mml:mn></mml:mrow></mml:math></inline-formula>, and so on, with the final <inline-formula><mml:math id="M97" display="inline"><mml:mrow><mml:mi>C</mml:mi><mml:mo>/</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula> forecasts using <inline-formula><mml:math id="M98" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1.3</mml:mn></mml:mrow></mml:math></inline-formula>. Our qualitative conclusions are not sensitive to the precise distribution of the set <inline-formula><mml:math id="M99" display="inline"><mml:mi>S</mml:mi></mml:math></inline-formula>, although the specific choice can modify some aspects of the shape of the spread-error curves. 
 Nevertheless, we run experiments with a reduced or increased range of values in <inline-formula><mml:math id="M100" display="inline"><mml:mi>S</mml:mi></mml:math></inline-formula> to mimic underlying physical systems with small or large variability in their intrinsic uncertainty, respectively. These different physical systems could represent different spatial locations or periods in different seasons. Additionally, we can choose the forecast distribution (i.e., <inline-formula><mml:math id="M101" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>) to exactly match the observed distribution <inline-formula><mml:math id="M102" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> (which simulates a perfect model), or to follow a different distribution (which simulates model error). A summary of the different experiments and their associated parameter combinations is given in Table <xref ref-type="table" rid="T1"/>.</p>
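For concreteness, the case-generation procedure just described can be sketched in a few lines of NumPy. This is an illustrative reimplementation under stated assumptions (zero-mean Gaussian forecast and observed distributions; the function name `toy_forecasts` is ours), not the code used for the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_forecasts(M=100, C=3000, S=(0.7, 0.85, 1.0, 1.15, 1.3), S_F=None):
    """Generate C synthetic forecast cases: the first C/5 cases use the
    first value of S as true spread sigma_O, the next C/5 the second, and
    so on. By default sigma_F = sigma_O (perfect model)."""
    S_F = S if S_F is None else S_F
    sigma_O = np.repeat(S, C // len(S))                  # true uncertainty per case
    sigma_F = np.repeat(S_F, C // len(S_F))              # modelled uncertainty
    members = rng.normal(0.0, sigma_F[:, None], (C, M))  # ensemble draws
    obs = rng.normal(0.0, sigma_O)                       # verifying observations
    s2 = members.var(axis=1, ddof=1)                     # ensemble sample variance
    err2 = (members.mean(axis=1) - obs) ** 2             # squared error of ens. mean
    return s2, err2

s2, err2 = toy_forecasts()
```

In the perfect-model default, the mean ensemble variance and the mean squared error of the ensemble mean agree closely, consistent with the spread-error equality underlying the reference experiment.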
</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>Sensitivities of spread-error curves in the toy model</title>
      <p id="d2e2276">Figure <xref ref-type="fig" rid="F4"/>a shows the spread-error curve for the reference toy model experiment. This reference experiment uses a large ensemble size (<inline-formula><mml:math id="M103" display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">100</mml:mn></mml:mrow></mml:math></inline-formula>) with good sampling (<inline-formula><mml:math id="M104" display="inline"><mml:mrow><mml:mi>C</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">3000</mml:mn></mml:mrow></mml:math></inline-formula>) and a correct model representation of the forecast distribution (<inline-formula><mml:math id="M105" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>). Therefore, the spread-error curve lies almost exactly on the <inline-formula><mml:math id="M106" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> line, as expected, with a spread-reliability-slope of <inline-formula><mml:math id="M107" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.99</mml:mn></mml:mrow></mml:math></inline-formula>.</p>
      <p id="d2e2363">If we reduce the ensemble size to 10 members (Fig. <xref ref-type="fig" rid="F4"/>b), the spread-error curve becomes shallower and hence the <inline-formula><mml:math id="M108" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> decreases as random differences between the ensemble sample variance (<inline-formula><mml:math id="M109" display="inline"><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula>) and the underlying forecast population variance (<inline-formula><mml:math id="M110" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula>) become larger.</p>
      <p id="d2e2416">The reduction of the <inline-formula><mml:math id="M111" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> in this case is not caused by changes in the underlying predictable signal, but by sampling uncertainty associated with the finite ensemble size. At the level of individual forecasts, sampling noise affects both the estimated ensemble spread and the associated forecast error, leading to substantial scatter in the spread-error relationship. However, the <inline-formula><mml:math id="M112" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> is not diagnosed from individual spread-error pairs, but from averages over many forecasts grouped by similar estimated spread. Sampling-induced fluctuations in forecast error are largely random across cases and therefore mostly cancel out in the bin means, such that uncertainty in the error does not systematically bias the mean spread–error relationship within a bin.</p>
      <p id="d2e2443">A systematic effect instead arises from sampling noise in the spread estimate itself. For small ensemble sizes, random under- and overestimation of the ensemble sample variance <inline-formula><mml:math id="M113" display="inline"><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> leads to a redistribution of forecasts across variance bins. Consequently, bins with high estimated spread include some forecasts with lower true uncertainty, while bins with low estimated spread include some forecasts with higher true uncertainty. This misclassification systematically reduces the contrast between low- and high-spread bins, flattening the spread-error curve and leading to a reduced <inline-formula><mml:math id="M114" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula>.</p>
      <p id="d2e2477">Note that a similar argument applies if <inline-formula><mml:math id="M115" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> is obtained from a linear regression of the full spread-error distribution, rather than via explicit variance binning, as the reduction in slope is likewise driven by sampling-induced distortion of the spread-error contrast. Random fluctuations in forecast error would, in this case, primarily increase the scatter around the regression and the associated uncertainty of the fit, while leaving the best-fit slope, and thus the inferred <inline-formula><mml:math id="M116" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula>, largely unchanged.</p>
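The binned diagnostic discussed above can be sketched as follows. This is an illustrative implementation under the same Gaussian toy-model assumptions (equal-population bins sorted by estimated spread; the helper names are ours, and details may differ from the exact diagnostic used for Fig. 4).

```python
import numpy as np

rng = np.random.default_rng(1)

def srs_from_pairs(s2, err2, n_bins=5):
    """Slope of bin-mean squared error vs. bin-mean ensemble variance,
    using equal-population bins sorted by the estimated spread."""
    order = np.argsort(s2)
    x = [s2[idx].mean() for idx in np.array_split(order, n_bins)]
    y = [err2[idx].mean() for idx in np.array_split(order, n_bins)]
    return np.polyfit(x, y, 1)[0]

def perfect_model(M, C=3000, S=(0.7, 0.85, 1.0, 1.15, 1.3)):
    """Perfect-model toy forecasts: sigma_F = sigma_O, drawn blockwise from S."""
    sigma = np.repeat(S, C // len(S))
    members = rng.normal(0.0, sigma[:, None], (C, M))
    obs = rng.normal(0.0, sigma)
    return members.var(axis=1, ddof=1), (members.mean(axis=1) - obs) ** 2

srs_large = srs_from_pairs(*perfect_model(M=100))  # only mildly attenuated
srs_small = srs_from_pairs(*perfect_model(M=10))   # strongly flattened
```

Running both ensemble sizes shows the flattening directly: the diagnosed slope for 10 members is substantially smaller than for 100 members, even though the underlying system is identical.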

      <fig id="F4" specific-use="star"><label>Figure 4</label><caption><p id="d2e2506">Toy model experiments illustrating the spread–error relationship (ensemble variance vs. mean squared error of the ensemble mean). Pink shading shows the 2D distribution of individual spread–error pairs. Black dots indicate the average spread and error within each bin. For visual guidance, these bin means are connected with thick dashed black lines. Note that the pink shading highlights where samples are most frequent, while the black dots show bin means and can therefore lie in low-density regions, particularly in the distribution tails. Thin black dotted lines indicate the overall mean spread (vertical) and error (horizontal). The solid grey diagonal line represents the <inline-formula><mml:math id="M117" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> relationship. The spread-reliability-slope (<inline-formula><mml:math id="M118" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula>) is indicated for each case.</p></caption>
          <graphic xlink:href="https://wcd.copernicus.org/articles/7/767/2026/wcd-7-767-2026-f04.png"/>

        </fig>

      <p id="d2e2539">An equivalent geometric perspective on the <inline-formula><mml:math id="M119" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> reduction is the following. Consider that in a system with perfectly reliable spread (in which <inline-formula><mml:math id="M120" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> = <inline-formula><mml:math id="M121" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula>), the forecasts with the largest true variance, <inline-formula><mml:math id="M122" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula>, should lie on the <inline-formula><mml:math id="M123" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> line when plotting variance against the case-averaged squared error. 
 However, since neither <inline-formula><mml:math id="M124" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> nor <inline-formula><mml:math id="M125" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> is known in practice, we use the ensemble sample variance <inline-formula><mml:math id="M126" display="inline"><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> as an estimate. While this estimate would be exact for an infinitely large ensemble, it is subject to sampling noise when the ensemble size is finite. 
 As a result, averages over cases with the highest spread values (i.e., those with the largest ensemble spread <inline-formula><mml:math id="M127" display="inline"><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula>) will not only include cases with genuinely large forecast variance <inline-formula><mml:math id="M128" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula>, but also cases with smaller forecast variance that appear inflated due to sampling fluctuations. Forecasts misclassified as high-spread thus tend to have smaller errors than their spread suggests, lowering the average error for that variance value and pulling it below the 1:1 line. The reverse happens for averages over cases with the lowest spread values: these may include forecasts with higher true variance that were misclassified due to sampling variability. Such cases have larger errors than their spread suggests and increase the average error in the bin, pulling it above the 1:1 line. The net effect is a systematic reduction of the <inline-formula><mml:math id="M129" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula>. Note that bins in the centre of the spread range will usually include forecasts with both over- and underestimated spread, such that the biases due to sampling effects largely cancel out.</p>
      <p id="d2e2723">This behaviour is consistent with ensemble sampling theory, which provides insight into why forecasts with few members often misrepresent variability in spread. With small ensemble sizes <inline-formula><mml:math id="M130" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula>, the sample variance becomes noisy (high “variance of the variance”), and individual forecasts may, by chance, fail to sample extreme outcomes, causing the actual error to exceed the predicted spread. Increasing <inline-formula><mml:math id="M131" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula> reduces this sampling error, as the standard error of the spread estimate decreases in proportion to <inline-formula><mml:math id="M132" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:msqrt><mml:mi>M</mml:mi></mml:msqrt></mml:mrow></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx25" id="paren.24"><named-content content-type="pre">cf.</named-content></xref>. In general, larger ensembles are therefore required to obtain a stable spread–error relationship, especially for higher-order moments such as the ensemble variance.</p>
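The quoted scaling can be checked with a small Monte Carlo estimate (an illustrative sketch; the helper `spread_se` and the number of trials are our choices).

```python
import numpy as np

rng = np.random.default_rng(3)

def spread_se(M, trials=20000):
    """Empirical standard deviation of the M-member ensemble spread
    estimate, for a population with true sigma = 1."""
    s = rng.normal(0.0, 1.0, (trials, M)).std(axis=1, ddof=1)
    return s.std()

# Quadrupling the ensemble size should roughly halve the sampling error
# of the spread estimate, consistent with the 1/sqrt(M) scaling.
ratio = spread_se(25) / spread_se(100)
```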
      <p id="d2e2758">The impact of sampling noise due to finite ensemble size also depends on the intrinsic variability of the true uncertainty, <inline-formula><mml:math id="M133" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula>, which is determined by the characteristics of the underlying physical system. If <inline-formula><mml:math id="M134" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> varies strongly across cases (i.e., if the range <inline-formula><mml:math id="M135" display="inline"><mml:mrow><mml:mfenced open="[" close="]"><mml:mrow><mml:msub><mml:mo>min⁡</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:msub><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>,</mml:mo><mml:mspace linebreak="nobreak" width="0.33em"/><mml:msub><mml:mo>max⁡</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:msub><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:mfenced></mml:mrow></mml:math></inline-formula> is large) then the relative effect of sampling noise becomes less significant. 
In such cases, the differences between <inline-formula><mml:math id="M136" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> and its noisy estimate <inline-formula><mml:math id="M137" display="inline"><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> are small compared to the variability in <inline-formula><mml:math id="M138" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> itself. As a result, even with a finite ensemble size, the binning of forecasts by spread is more robust, and the spread-skill relationship appears less distorted. This effect is illustrated in Fig. <xref ref-type="fig" rid="F4"/>c, d. In panel c, <inline-formula><mml:math id="M139" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> varies within a narrow range (0.925 to 1.075), while in Fig. <xref ref-type="fig" rid="F4"/>d it spans a much broader range (0.475 to 1.525). 
 The comparison shows that larger variability in the spread of the physical system improves the clarity of the spread-skill relationship under finite-ensemble conditions, mitigating the <inline-formula><mml:math id="M140" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> reduction due to sampling error.</p>
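This mitigation can be illustrated numerically within the same toy setup (a sketch with our own helper `binned_srs` and equal-population binning; the case count and bin number are our choices). The forecast spread equals the true spread in both runs; only the case-to-case range of the true spread differs.

```python
import numpy as np

rng = np.random.default_rng(2)

def binned_srs(S, M=100, C=10000, n_bins=5):
    """SRS of a perfect-model toy ensemble whose true spread values are
    drawn blockwise from the set S."""
    sigma = np.repeat(S, C // len(S))                  # true spread per case
    members = rng.normal(0.0, sigma[:, None], (C, M))
    obs = rng.normal(0.0, sigma)
    s2 = members.var(axis=1, ddof=1)
    err2 = (members.mean(axis=1) - obs) ** 2
    order = np.argsort(s2)
    x = [s2[i].mean() for i in np.array_split(order, n_bins)]
    y = [err2[i].mean() for i in np.array_split(order, n_bins)]
    return np.polyfit(x, y, 1)[0]

narrow = binned_srs((0.925, 0.9625, 1.0, 1.0375, 1.075))  # strongly attenuated
wide = binned_srs((0.475, 0.7375, 1.0, 1.2625, 1.525))    # much closer to 1
```

With the narrow range, sampling noise in the spread estimate dominates the true case-to-case differences and the diagnosed slope drops well below 1; with the wide range, the same noise is comparatively unimportant.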
      <p id="d2e2930">In addition to sampling error and natural variability in spread, model biases in how intrinsic uncertainty responds to physical drivers can also affect the spread-error relationship. For instance, anomalies in ensemble spread may be systematically too small if the model responds too weakly to teleconnection patterns. The opposite would be true if the model responds too strongly to teleconnections. In our toy model, such biases can be mimicked by choosing different value sets for the observed (<inline-formula><mml:math id="M141" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>) and modelled (<inline-formula><mml:math id="M142" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>) spread distributions. Figure <xref ref-type="fig" rid="F4"/>e and f show experiments where the model under- or overestimates the anomalies in spread, respectively. Such a misrepresentation of the spread leads to a stretching or compression of the distribution in the variance direction. Analogous to the effect discussed with regard to sampling error, this will affect the slope of the spread-error curve and alter the <inline-formula><mml:math id="M143" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula>. 
 In general, an overestimation of spread variability (i.e., <inline-formula><mml:math id="M144" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">F</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub><mml:mo>/</mml:mo><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">c</mml:mi></mml:mrow></mml:msub><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, stretching the distribution in the variance direction) leads to <inline-formula><mml:math id="M145" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, while an underestimation of spread variability does the opposite and leads to <inline-formula><mml:math id="M146" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>. Note that we are discussing here the over- and underestimation of the variations in spread, not an overall over- or underestimation of the mean spread (which may or may not be accurate on average).</p>
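The two bias directions can likewise be mimicked numerically. The sketch below (our helper `srs_with_bias`; the exact diagnostic in Fig. 4 may differ) keeps the true spread at the reference values and pairs each case with a stretched or compressed modelled spread.

```python
import numpy as np

rng = np.random.default_rng(4)

def srs_with_bias(sig_F_set, M=100, C=10000, n_bins=5,
                  sig_O_set=(0.7, 0.85, 1.0, 1.15, 1.3)):
    """SRS of a toy ensemble whose case-to-case spread variations
    (sig_F_set) differ from the true ones (sig_O_set)."""
    sF = np.repeat(sig_F_set, C // len(sig_F_set))   # modelled spread per case
    sO = np.repeat(sig_O_set, C // len(sig_O_set))   # true spread per case
    members = rng.normal(0.0, sF[:, None], (C, M))
    obs = rng.normal(0.0, sO)
    s2 = members.var(axis=1, ddof=1)
    err2 = (members.mean(axis=1) - obs) ** 2
    order = np.argsort(s2)
    x = [s2[i].mean() for i in np.array_split(order, n_bins)]
    y = [err2[i].mean() for i in np.array_split(order, n_bins)]
    return np.polyfit(x, y, 1)[0]

flat = srs_with_bias((0.475, 0.7375, 1.0, 1.2625, 1.525))   # typically SRS < 1
steep = srs_with_bias((0.925, 0.9625, 1.0, 1.0375, 1.075))  # typically SRS > 1
```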
      <p id="d2e3044">Next we analyse the effect of a limited number of cases, i.e., few forecast-observation pairs (small <inline-formula><mml:math id="M147" display="inline"><mml:mi>C</mml:mi></mml:math></inline-formula>). This will in general lead to a violation of the underlying equality between errors and spread (see Sect. <xref ref-type="sec" rid="Ch1.S3"/>) and introduce random deviations of the spread-error curve from the <inline-formula><mml:math id="M148" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> line. Since these deviations are unsystematic, they can randomly lead to increases or decreases of the <inline-formula><mml:math id="M149" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula>. A system with an under-representation of cases can therefore, in principle, produce an <inline-formula><mml:math id="M150" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> larger than 1 (see Fig. <xref ref-type="fig" rid="F4"/>h), smaller than 1 or even smaller than 0 (see Fig. <xref ref-type="fig" rid="F4"/>g). Note that in situations with small <inline-formula><mml:math id="M151" display="inline"><mml:mi>C</mml:mi></mml:math></inline-formula> the <inline-formula><mml:math id="M152" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> could take very large or very small (or even negative) values despite the underlying ensemble forecast having many members and no model error.</p>
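The scatter induced by a small number of cases can be visualised by repeating the toy experiment many times (an illustrative sketch; the helper `one_srs` and the number of repetitions are our choices). Each repetition uses a large, error-free 100-member ensemble but only 60 verification dates.

```python
import numpy as np

rng = np.random.default_rng(5)

def one_srs(C=60, M=100, S=(0.7, 0.85, 1.0, 1.15, 1.3), n_bins=5):
    """One realisation of the binned SRS for a perfect-model toy ensemble."""
    sigma = np.repeat(S, C // len(S))
    members = rng.normal(0.0, sigma[:, None], (C, M))
    obs = rng.normal(0.0, sigma)
    s2 = members.var(axis=1, ddof=1)
    err2 = (members.mean(axis=1) - obs) ** 2
    order = np.argsort(s2)
    x = [s2[i].mean() for i in np.array_split(order, n_bins)]
    y = [err2[i].mean() for i in np.array_split(order, n_bins)]
    return np.polyfit(x, y, 1)[0]

slopes = np.array([one_srs() for _ in range(500)])
# The diagnosed slope scatters widely around 1, occasionally far from 1
# in either direction, purely because of the limited number of cases.
```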
      <p id="d2e3116">The toy model presented in this section allowed us to study the effects of different mechanisms on the spread-error curves and the <inline-formula><mml:math id="M153" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> in an isolated manner. While some of these effects are systematic and always increase or decrease the <inline-formula><mml:math id="M154" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> (e.g. the ensemble size effect), others are unsystematic (e.g. associated with number of forecast cases). A forecasting system will typically suffer from multiple error sources. This can lead to a superposition of the corresponding effects and hence <inline-formula><mml:math id="M155" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> values that either deviate strongly from one for multiple reasons, or <inline-formula><mml:math id="M156" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> values close to 1 despite major error sources due to cancellation. The next section analyses the reliability of spread forecasts in subseasonal ensembles by trying to disentangle the different mechanisms and studying their potential importance individually.</p>
</sec>
</sec>
<sec id="Ch1.S5">
  <label>5</label><title>Reliability of operational forecasts</title>
<sec id="Ch1.S5.SS1">
  <label>5.1</label><title>Sampling error due to ensemble size</title>
      <p id="d2e3184">Various mechanisms can affect the reliability of spread forecasts and lead to deviations of the <inline-formula><mml:math id="M157" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> from unity, as shown within the toy model in Sect. <xref ref-type="sec" rid="Ch1.S4"/>. This section goes through the list of individual mechanisms and analyses their importance within subseasonal ensemble forecasts of the real atmosphere.</p>
      <p id="d2e3201">We start by analysing the effect of sampling error on the <inline-formula><mml:math id="M158" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula>. Figure <xref ref-type="fig" rid="F5"/> shows the reliability of z1000 spread within the northern hemisphere for three different ensemble sizes. It can be seen that the <inline-formula><mml:math id="M159" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> generally increases with ensemble size. While 10-member ensembles show poor spread reliability almost throughout the entire hemisphere (<inline-formula><mml:math id="M160" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> close to zero), 50-member ensembles exhibit substantially more reliable spread in various regions (e.g. northern Europe, eastern Asia, western North America or around the Gulf of Mexico), with <inline-formula><mml:math id="M161" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> closer to 1. We see further <inline-formula><mml:math id="M162" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> improvements in many of these regions when increasing the ensemble size to 100 members. However, for 100-member ensembles we find regions with <inline-formula><mml:math id="M163" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> larger than 1 or negative <inline-formula><mml:math id="M164" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula>.
This is likely due to the limited number of 100-member ensembles available and reflects an under-representation of atmospheric evolutions in the system (cf. Fig. <xref ref-type="fig" rid="F4"/>c).</p>
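The ensemble-size dependence can be sketched in the same toy setting (again our own illustration, not the operational diagnostic): sampling noise in the ensemble variance of small ensembles flattens the spread-error curve and pulls the slope towards zero, while larger ensembles recover a slope near 1.

```python
# Toy illustration (our assumption): the spread-error slope increases with
# ensemble size M, cf. the 10/50/100-member comparison in Fig. 5.
import numpy as np

def spread_error_slope(C, M, n_bins=10, seed=0):
    rng = np.random.default_rng(seed)
    sigma = rng.uniform(0.5, 2.0, size=C)        # flow-dependent true spread
    obs = rng.normal(0.0, sigma)
    members = rng.normal(0.0, sigma[:, None], size=(C, M))
    s2 = members.var(axis=1, ddof=1)             # noisy for small M
    e2 = (members.mean(axis=1) - obs) ** 2
    order = np.argsort(s2)
    s2_b = [b.mean() for b in np.array_split(s2[order], n_bins)]
    e2_b = [b.mean() for b in np.array_split(e2[order], n_bins)]
    return np.polyfit(s2_b, e2_b, 1)[0]

for M in (10, 50, 100):
    print(M, spread_error_slope(C=4000, M=M))
```

The flattening for small M is a regression-dilution effect: the sampling noise in the ensemble variance is uncorrelated with the actual error, so binning by a noisy spread estimate attenuates the diagnosed slope.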

      <fig id="F5" specific-use="star"><label>Figure 5</label><caption><p id="d2e3295"><inline-formula><mml:math id="M165" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> metric computed as the slope of the z1000 spread-error curve in the northern hemisphere. As Fig. <xref ref-type="fig" rid="F3"/> but for forecasts of different ensemble sizes (10, 50 and 100 members). Note that panel <bold>(b)</bold> is identical to Fig. <xref ref-type="fig" rid="F3"/>.</p></caption>
          <graphic xlink:href="https://wcd.copernicus.org/articles/7/767/2026/wcd-7-767-2026-f05.png"/>

        </fig>

      <p id="d2e3323">Although 50- and 100-member ensembles show generally reliable spread in many regions, other regions do not exhibit visible improvements with increasing ensemble size. A pronounced region in eastern Canada, for example, is associated with a slope robustly close to zero. Other effects therefore seem to play an important role here, reducing the reliability of spread fluctuations in the forecasts. In the next section, we show that a lack of variability within the physical system is a major contributor to the lack of reliability in these regions.</p>
</sec>
<sec id="Ch1.S5.SS2">
  <label>5.2</label><title>Intrinsic variability of the physical system</title>
      <p id="d2e3334">As shown within the toy model in Sect. <xref ref-type="sec" rid="Ch1.S4"/>, the intrinsic variability of spread within the underlying physical system can have a strong effect on the spread-error curve and the <inline-formula><mml:math id="M166" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula>, due to modification of the sampling error effect. To illustrate and further quantify the effect of variability in spread we contrast two example locations, one in eastern Canada (55° N<inline-formula><mml:math id="M167" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>70° W) and one in northern Europe (60° N<inline-formula><mml:math id="M168" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>15° E). While northern Europe shows strong improvement of <inline-formula><mml:math id="M169" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> with increasing ensemble size and very reliable spread forecasts for 50- and 100-member ensembles, eastern Canada shows low <inline-formula><mml:math id="M170" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> for all ensemble sizes (Fig. <xref ref-type="fig" rid="F5"/>).</p>

      <fig id="F6"><label>Figure 6</label><caption><p id="d2e3394">Evolution of leadtime-dependent ensemble variance in <bold>(a)</bold> northern Europe at 60° N<inline-formula><mml:math id="M171" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>15° E and <bold>(b)</bold> eastern Canada at 55° N<inline-formula><mml:math id="M172" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>70° W. The thick black line shows the climatology for February as the average over all February initialisations, with shading indicating one standard deviation around the mean. Red and blue lines show the variance evolution of example forecasts with 50 members initialised on 20 February of the years 2020 and 2023, respectively. Dashed horizontal lines show the average of the respective variance between days 14 and 46. The vertical dashed line indicates day 14 as a visual aid.</p></caption>
          <graphic xlink:href="https://wcd.copernicus.org/articles/7/767/2026/wcd-7-767-2026-f06.png"/>

        </fig>

      <p id="d2e3423">Figure <xref ref-type="fig" rid="F6"/> illustrates the variability of ensemble spread at these two points. It can be seen that the climatological day-to-day variability in spread is generally larger in northern Europe (Fig. <xref ref-type="fig" rid="F6"/>a) than in eastern Canada (Fig. <xref ref-type="fig" rid="F6"/>b). The difference in variability between two points in time at a given location mostly comes from slowly varying modes of atmospheric variability that affect the spread, as discussed in the following. To distinguish between slow and fast modes of variability, we decompose changes in spread within and across forecasts into two components. These components are defined formally below using the ensemble variance as a function of forecast case and lead time.</p>
      <p id="d2e3433">Let <inline-formula><mml:math id="M173" display="inline"><mml:mrow><mml:msup><mml:mi>s</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>(</mml:mo><mml:mi>c</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> denote the ensemble variance of forecast <inline-formula><mml:math id="M174" display="inline"><mml:mi>c</mml:mi></mml:math></inline-formula> at lead time <inline-formula><mml:math id="M175" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>. We use the notation <inline-formula><mml:math id="M176" display="inline"><mml:mrow><mml:mo>〈</mml:mo><mml:mo>⋅</mml:mo><mml:msub><mml:mo>〉</mml:mo><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M177" display="inline"><mml:mrow><mml:mo>〈</mml:mo><mml:mo>⋅</mml:mo><mml:msub><mml:mo>〉</mml:mo><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> to denote averages over lead times and forecasts, respectively. The specific lead-time range over which the temporal averages are taken can be chosen as appropriate for the application.</p>
      <p id="d2e3501"><list list-type="bullet">
            <list-item>

      <p id="d2e3506"><italic>Inter-variability</italic>: This quantifies differences in the mean spread between different forecasts and is defined as the variance across forecasts of the time-averaged ensemble variance, <inline-formula><mml:math id="M178" display="inline"><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mi mathvariant="normal">inter</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:mi>C</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:mfrac></mml:mstyle><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>c</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>C</mml:mi></mml:msubsup><mml:msup><mml:mfenced open="(" close=")"><mml:mrow><mml:mo>〈</mml:mo><mml:msup><mml:mi>s</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:msub><mml:mo>〉</mml:mo><mml:mi>t</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mfenced close="〉" open="〈"><mml:mrow><mml:mo>〈</mml:mo><mml:msup><mml:mi>s</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:msub><mml:mo>〉</mml:mo><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:mfenced><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>. It therefore characterises variability in spread associated with differences between forecast cases, for example due to slowly varying modes of variability.</p>
            </list-item>
            <list-item>

      <p id="d2e3589"><italic>Intra-variability</italic>: This quantifies day-to-day fluctuations of the spread within individual forecasts and is defined as the forecast-mean of the temporal variance of the ensemble variance, <inline-formula><mml:math id="M179" display="inline"><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mi mathvariant="normal">intra</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mfenced close="〉" open="〈"><mml:mrow><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:mfrac></mml:mstyle><mml:msub><mml:mo>∑</mml:mo><mml:mi>t</mml:mi></mml:msub><mml:msup><mml:mfenced close=")" open="("><mml:mrow><mml:msup><mml:mi>s</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>-</mml:mo><mml:mo>〈</mml:mo><mml:msup><mml:mi>s</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:msub><mml:mo>〉</mml:mo><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mfenced><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M180" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> denotes the number of lead times included in the temporal average. This quantity therefore captures the contribution of faster, intra-forecast variability to the evolution of ensemble spread.</p>
            </list-item>
          </list></p>
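In array form, the two components defined above can be computed directly from the ensemble variances arranged as an array indexed by forecast case and lead time (a minimal sketch; the array layout and function names are ours):

```python
# Direct implementation of the two variability components defined above, for an
# array s2[c, t] of ensemble variances (forecast case c, lead time t over the
# chosen lead-time range).
import numpy as np

def inter_variability(s2):
    """Variance across forecasts of the time-averaged ensemble variance."""
    return s2.mean(axis=1).var(ddof=1)

def intra_variability(s2):
    """Forecast-mean of the temporal variance of the ensemble variance."""
    return s2.var(axis=1, ddof=1).mean()
```

For example, an array whose rows are constant but differ between forecasts has purely inter-variability, whereas identical rows fluctuating in time have purely intra-variability.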
      <p id="d2e3670">The concept of inter- and intra-variability is visualised in Fig. <xref ref-type="fig" rid="F6"/> by two example forecasts. Both forecasts are initialised on 20 February, but in two different years: 2020 and 2023. It can be seen that in northern Europe the two forecasts are associated with substantially different spread at subseasonal leadtimes, suggesting large inter-variability of the spread in this region. The inter-variability, i.e., the difference between the subseasonally averaged variances of the two forecasts, is of the same order as the intra-variability, i.e., the day-to-day fluctuations in spread. In eastern Canada, on the other hand, the two example forecasts show essentially no difference in their subseasonal mean variance and only deviate from each other due to intra-variability (i.e., day-to-day variations).</p>
      <p id="d2e3675">Figure <xref ref-type="fig" rid="F7"/> displays the dependence of inter- and intra-variability on the underlying ensemble size at the two points in eastern Canada and northern Europe. This allows us to study the two variability components more systematically. Figure <xref ref-type="fig" rid="F7"/> further shows how the two components should depend on ensemble size if variations were entirely due to sampling error and not due to physical drivers. Under this assumption, the theoretical sampling error scaling is taken to decrease with ensemble size <inline-formula><mml:math id="M181" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula> as <inline-formula><mml:math id="M182" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:msqrt><mml:mi>M</mml:mi></mml:msqrt></mml:mrow></mml:math></inline-formula>, with the reference value estimated from the corresponding variability obtained for 10-member ensembles. It can be seen that in northern Europe the inter-variability clearly converges to a value of about 3000 m<sup>2</sup> for large ensembles and does not follow the theoretical line of sampling errors. This suggests a pronounced inter-variability in the physical system, which is well-sampled with ensemble sizes exceeding about 50 members. Intra-variability, however, follows the theoretical line of sampling errors almost perfectly, suggesting that day-to-day variability is almost entirely spurious. In general, this leads to a gradual increase of the ratio of inter- over intra-variability, which exceeds one at an ensemble size of about 50.</p>
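One possible reading of the theoretical reference curve (an assumption on our part, since the anchoring is not spelled out here) is a <inline-formula><mml:math display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:msqrt><mml:mi>M</mml:mi></mml:msqrt></mml:mrow></mml:math></inline-formula> fall-off anchored so that the curve matches the variability estimate obtained from 10-member ensembles:

```python
# Hedged sketch of the sampling-error reference line: variability decays as
# 1/sqrt(M), anchored at the 10-member estimate v10 (anchoring is our reading).
import numpy as np

def sampling_error_line(M, v10):
    """Expected variability at ensemble size M if it were pure sampling noise."""
    return v10 * np.sqrt(10.0 / np.asarray(M, dtype=float))

# quadrupling the ensemble size halves purely spurious variability
print(sampling_error_line([10, 40, 90], v10=6.0))
```

A component whose estimates hug this line as M grows is then consistent with pure sampling noise, while convergence to a nonzero offset indicates physical variability.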

      <fig id="F7" specific-use="star"><label>Figure 7</label><caption><p id="d2e3714">Average <bold>(a, b)</bold> inter- and <bold>(c, d)</bold> intra-variability at subseasonal leadtimes computed for forecasts with varying ensemble size. <bold>(e)</bold> and <bold>(f)</bold> show the ratio of inter- over intra-variability. The top row <bold>(a, c, e)</bold> shows a point in northern Europe at 60° N<inline-formula><mml:math id="M184" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>15° E and the bottom row <bold>(b, d, f)</bold> shows a point in eastern Canada at 55° N<inline-formula><mml:math id="M185" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>70° W. Thin blue lines indicate the theoretical dependence of the variability components on ensemble size <inline-formula><mml:math id="M186" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula>, computed as the value for 10-member ensembles divided by <inline-formula><mml:math id="M187" display="inline"><mml:msqrt><mml:mi>M</mml:mi></mml:msqrt></mml:math></inline-formula>. Dashed horizontal lines in <bold>(e)</bold> and <bold>(f)</bold> indicate a ratio of 1.</p></caption>
          <graphic xlink:href="https://wcd.copernicus.org/articles/7/767/2026/wcd-7-767-2026-f07.png"/>

        </fig>

      <p id="d2e3779">For the point in eastern Canada, both inter- and intra-variability follow the line of sampling error theory rather closely. Even for 100-member ensembles the inter-variability has not yet converged and seems to be substantially affected by sampling errors. This leads to generally low inter-over-intra ratios. Figure <xref ref-type="fig" rid="F7"/> suggests that intra-variability is almost entirely spurious and a result of sampling error. The inter-over-intra ratio can therefore be interpreted as the ratio of natural spread variability of the system to sampling error effects. The spatially resolved maps of inter- and intra-variability, as well as the ratio, are shown in Fig. S1 in the Supplement.</p>
      <p id="d2e3784">As discussed before and shown with the toy model in Fig. <xref ref-type="fig" rid="F4"/>c and d, large intrinsic variability can reduce the effects of sampling error on the spread-error curve and hence improve the reliability of the fluctuations in ensemble spread of a forecast (i.e., increase the <inline-formula><mml:math id="M188" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula>). Figure <xref ref-type="fig" rid="F8"/>a shows that, indeed, regions with a large inter-over-intra ratio generally have a large <inline-formula><mml:math id="M189" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula>. These regions gain their spread reliability from slowly varying modes of variability that affect the forecast uncertainty. In that sense, these are also regions that show the potential to develop windows of forecast opportunity. The ensemble spread in regions with a low inter-over-intra ratio and correspondingly low <inline-formula><mml:math id="M190" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> (like eastern Canada) is dominated by spurious day-to-day variability and does not show robust and persistent changes in forecast uncertainty for the studied ensemble sizes. The pattern correlation between the <inline-formula><mml:math id="M191" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> and the inter-over-intra ratio for the northern hemisphere is 0.44 based on 50-member ensembles. This correlation increases to 0.82 for the perfect model approach discussed below in Sect. <xref ref-type="sec" rid="Ch1.S5.SS3"/>. Further note that the regions with large inter-over-intra variability are roughly consistent with regions that show large relative variability in ensemble spread, as shown in Fig. <xref ref-type="fig" rid="F1"/>.</p>
      <p id="d2e3844">Some of the spread reliability in subseasonal z1000 spread comes from the seasonal evolution, which also constitutes a slowly varying mode of atmospheric variability. Figure <xref ref-type="fig" rid="F8"/>b shows that the slope of spread-error curves generally decreases when computed for spread and error data that have been de-seasonalised. In particular, the north Atlantic and European regions show substantially reduced spread reliability when seasonal effects are removed. Reliability at the point in northern Europe (at 60° N<inline-formula><mml:math id="M192" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>15° E) decreases from 0.63 to 0.41. However, we find that inter-variability of spread is still a major source of reliability in de-seasonalised data, with the pattern correlation between <inline-formula><mml:math id="M193" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> and inter-over-intra ratio even increasing from 0.44 to 0.51 for the northern hemisphere.</p>
      <p id="d2e3868">The separation into inter- and intra-variability also connects our framework to classical frequency-based analyses of atmospheric variance. For example, <xref ref-type="bibr" rid="bib1.bibx3" id="text.25"/> decomposed Northern Hemisphere 500 hPa height variability into long, intermediate, and short time-scale components, highlighting the dynamical importance of low-frequency planetary-scale fluctuations. In our terminology, inter-variability reflects changes in spread between forecasts and therefore captures modulation of intrinsic predictability on similarly low-frequency time scales. In contrast, intra-variability is dominated by higher-frequency fluctuations and sampling effects. A more explicit frequency decomposition of the underlying flow could help to attribute inter-variability to specific variability bands or teleconnection patterns, and could be the subject of future research.</p>

      <fig id="F8" specific-use="star"><label>Figure 8</label><caption><p id="d2e3876">(shading) <inline-formula><mml:math id="M194" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> showing the slope of the z1000 spread-error curve (cf. Fig. <xref ref-type="fig" rid="F3"/>) and (contours) inter-over-intra variability ratio [unitless] in the northern hemisphere for 50-member ensembles. <bold>(a)</bold> is computed from the full spread and error and <bold>(b)</bold> from de-seasonalised data, where anomalies in spread and error are computed by removing a time-dependent climatology. Crosses in both panels indicate the points in northern Europe and eastern Canada analysed in other figures. The top-right title in each panel shows the area-weighted pattern correlation between the slope of the spread-error curves and the inter-over-intra ratio.</p></caption>
          <graphic xlink:href="https://wcd.copernicus.org/articles/7/767/2026/wcd-7-767-2026-f08.png"/>

        </fig>

</sec>
<sec id="Ch1.S5.SS3">
  <label>5.3</label><title>Model error and under-representation of evolutions</title>
      <p id="d2e3913">In the previous sections we studied the effects of sampling error and natural intrinsic variability of uncertainty on the reliability of ensemble spread in subseasonal forecast models. However, forecast models may not always accurately represent all physical processes and can hence misrepresent the flow-dependence of the forecast uncertainty. In this section we assess model error effects in two complementary ways: by comparing different forecast systems and by using a perfect-model framework within a single system.</p>
      <p id="d2e3916">A comparison between the IFS model primarily analysed in this study and the CNRM model (Fig. S2) shows qualitatively similar large-scale spatial patterns of spread reliability and a comparable dependence on ensemble size, despite the much smaller CNRM sample. This consistency suggests that the large-scale geography of spread reliability is strongly influenced by flow-dependent variability in the underlying physical system of the atmosphere, while model-specific errors primarily modulate, rather than determine, these patterns. Nevertheless, quantitative differences between models indicate that model error may still affect the regional amplitude of reliability.</p>
      <p id="d2e3919">To quantify model-error effects more directly within a single forecast system, we performed an analysis using a perfect-model approach: instead of computing the errors of the prediction as the difference between the ensemble mean and re-analysis data (which we regard as quasi-observations), we assumed the truth to be given by one of the ensemble members of the forecast. The prediction errors for an <inline-formula><mml:math id="M195" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula>-member forecast are then given by the difference between the mean of the remaining <inline-formula><mml:math id="M196" display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> ensemble members and that single selected member. The associated ensemble spread is also computed based on <inline-formula><mml:math id="M197" display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> members. This approach ensures that the model spread is on average exactly equal to the prediction error, i.e., we have a perfectly reliable ensemble. However, this exact equality only holds if the approach is performed <inline-formula><mml:math id="M198" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula> times, i.e., once for each of the ensemble members of the original ensemble, and then averaged over these <inline-formula><mml:math id="M199" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula> sets. By only computing the perfect-model error and variance once for every given forecast (so based on a single “model truth”), we retain the same sampling as is given for the true errors computed based on re-analysis data.</p>
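The perfect-model error and spread for a single "model truth" can be sketched as follows (a minimal illustration for one case and grid point; the function name and array layout are ours):

```python
# Sketch of the perfect-model construction described above: member m plays the
# role of the observation, and error/spread are computed from the remaining
# M-1 members.
import numpy as np

def perfect_model_error_and_spread(members, m=0):
    """members: 1-D array of M ensemble values for a single case/grid point;
    member m is treated as the truth."""
    truth = members[m]
    rest = np.delete(members, m)
    error2 = (rest.mean() - truth) ** 2    # squared error of the (M-1)-member mean
    spread2 = rest.var(ddof=1)             # ensemble variance of the M-1 members
    return error2, spread2

e2, s2 = perfect_model_error_and_spread(np.array([1.0, 2.0, 3.0, 4.0]), m=0)
```

Using a single fixed m per forecast, as in the text, preserves the sampling noise of verification against a single realisation; averaging over all M choices of m would instead enforce the exact spread-error equality.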

      <fig id="F9" specific-use="star"><label>Figure 9</label><caption><p id="d2e3970"><inline-formula><mml:math id="M200" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> describing the slope of z1000 spread-error curves with errors computed <bold>(a)</bold> with respect to re-analysis data and <bold>(c)</bold> based on the perfect model approach, i.e., with respect to a single ensemble member. <bold>(b)</bold> and <bold>(d)</bold> show the slope anomalies, computed as deviations from the hemispheric means, to highlight spatial structures. Contour lines in <bold>(b)</bold> and <bold>(d)</bold> show the inter-over-intra ratio [unitless], with pattern correlations between slope and ratio over the northern hemisphere indicated in the top right. All panels are based on 50-member ensembles. Points in northern Europe (60° N<inline-formula><mml:math id="M201" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>15° E) and eastern Canada (55° N<inline-formula><mml:math id="M202" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>70° W) are indicated by crosses. Note that panels <bold>(b)</bold> and <bold>(d)</bold> use different colour scales to emphasise the similarity of spatial patterns; the smaller anomaly magnitudes in <bold>(d)</bold> are evident from the reduced colour-bar range.</p></caption>
          <graphic xlink:href="https://wcd.copernicus.org/articles/7/767/2026/wcd-7-767-2026-f09.png"/>

        </fig>

      <p id="d2e4032">Figure <xref ref-type="fig" rid="F9"/> shows the <inline-formula><mml:math id="M203" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> in the northern hemisphere for 50-member ensembles computed with respect to re-analysis data and using the perfect model approach, respectively. To highlight the spatial structures, Fig. <xref ref-type="fig" rid="F9"/>b and d show the <inline-formula><mml:math id="M204" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> anomaly from the hemispheric mean of each respective experiment. In general, the spatial patterns in reliability of the perfect model approach match well with the reliability computed with respect to re-analysis. This agreement indicates that the spatial patterns seen in the <inline-formula><mml:math id="M205" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> maps mostly result from spatial inhomogeneities in the physical system (see Sect. <xref ref-type="sec" rid="Ch1.S5.SS2"/>). The correlation between <inline-formula><mml:math id="M206" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> and inter-over-intra ratio for the northern hemisphere in 50-member ensembles is 0.82, further supporting the interpretation that these spatial structures are largely governed by slowly evolving modes within the underlying physical system (Fig. <xref ref-type="fig" rid="F9"/>d). At the same time, the magnitudes of <inline-formula><mml:math id="M207" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> anomalies are generally smaller for the perfect-model approach, indicating that model errors in representing spread variability modulate the regional amplitude of reliability.</p>
      <p id="d2e4104">This interpretation is reinforced by the hemispheric-mean values: the area-weighted mean <inline-formula><mml:math id="M208" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> is 0.395 for the perfect-model framework and 0.391 when verified against re-analysis. This small difference shows that model-error effects only weakly modify the hemispheric-mean spread reliability. Taken together, these results suggest that the large-scale structure of spread reliability is primarily governed by slowly evolving large-scale modes of variability, including teleconnections, while model errors mainly redistribute reliability at regional or smaller spatial scales.</p>
</sec>
<sec id="Ch1.S5.SS4">
  <label>5.4</label><title>Post-processing and practical calibration of spread</title>
      <p id="d2e4127">One practical way to exploit our findings is to define a “corrected variance” that enforces an <inline-formula><mml:math id="M209" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> value of one (Fig. <xref ref-type="fig" rid="F10"/>). Such a correction could be constructed in various ways, with an intuitive way being the following: let <inline-formula><mml:math id="M210" display="inline"><mml:mover accent="true"><mml:mrow><mml:msup><mml:mi>s</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:math></inline-formula> denote the climatological mean of the ensemble spread <inline-formula><mml:math id="M211" display="inline"><mml:mrow><mml:msup><mml:mi>s</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> at a given grid point. A post-processed variance <inline-formula><mml:math id="M212" display="inline"><mml:mover accent="true"><mml:mrow><mml:msup><mml:mi>s</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover></mml:math></inline-formula> could be obtained based on the <inline-formula><mml:math id="M213" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> value computed from the associated spread-error curve via <inline-formula><mml:math id="M214" display="inline"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:msup><mml:mi>s</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mover accent="true"><mml:mrow><mml:msup><mml:mi>s</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mo 
mathvariant="normal">‾</mml:mo></mml:mover><mml:mo>+</mml:mo><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:msup><mml:mi>S</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mfenced close=")" open="("><mml:mrow><mml:msup><mml:mi>s</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mrow><mml:msup><mml:mi>s</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:mrow></mml:mfenced></mml:mrow></mml:math></inline-formula>. This transformation effectively rescales deviations of the instantaneous spread from its climatological mean according to the inverse slope of the diagnosed spread–error relationship. Importantly, the climatological mean spread is not modified. The correction only adjusts the amplitude of flow-dependent spread fluctuations while preserving the underlying mean uncertainty level.</p>
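As an illustration, the rescaling above can be written in a few lines of code. This is a minimal sketch assuming NumPy arrays; the function name and example numbers are ours, not from the study, and the formula is implemented exactly as written above.

```python
import numpy as np

def corrected_variance(s2, srs):
    """Rescale ensemble-variance fluctuations about their climatological
    mean by the inverse spread-error slope SRS, following the relation
    in the text: s2_hat = s2_bar + (s2 - s2_bar) / SRS.

    s2  : array of instantaneous ensemble variances (one per forecast)
    srs : diagnosed spread-error slope at this grid point
    """
    s2_bar = s2.mean()  # climatological mean spread, left unchanged
    return s2_bar + (s2 - s2_bar) / srs

# Made-up numbers: with SRS = 0.5, deviations from the climatological
# mean (here +/-2 around 10) are rescaled by 1/0.5 to +/-4.
s2 = np.array([8.0, 10.0, 12.0])
print(corrected_variance(s2, 0.5))  # → [ 6. 10. 14.]
```

Note that the climatological mean of the corrected variance equals that of the raw spread, consistent with the statement that only the amplitude of flow-dependent fluctuations is adjusted.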

      <fig id="F10"><label>Figure 10</label><caption><p id="d2e4256">Evolution of z1000 ensemble variance as a function of lead time at points in <bold>(a)</bold> northern Europe (60° N<inline-formula><mml:math id="M215" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>15° E) and <bold>(b)</bold> eastern Canada (55° N<inline-formula><mml:math id="M216" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>70° W). Black line shows the climatological mean, with shading indicating one standard deviation around the mean. Blue line shows the variance of an example forecast initialised on 20 February 2023. Pink line shows the corrected ensemble variance for that forecast, post-processed to achieve perfect reliability.</p></caption>
          <graphic xlink:href="https://wcd.copernicus.org/articles/7/767/2026/wcd-7-767-2026-f10.png"/>

        </fig>

      <p id="d2e4285">By construction, this rescaling ensures that ensemble variance and squared error align on average. In regions where <inline-formula><mml:math id="M217" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi><mml:mo>≈</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, the ensemble spread already provides a reliable estimate of forecast uncertainty and the correction is therefore negligible, preserving genuine flow-dependent information. In regions where <inline-formula><mml:math id="M218" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi><mml:mo>≪</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, fluctuations in ensemble spread are not strongly linked to variations in forecast error. In such cases, the correction effectively reduces the influence of these unreliable fluctuations and shifts the variance estimate closer to its climatological baseline, leading to a more stable and statistically consistent measure of uncertainty. In areas with very small inter-variability, the corrected variance therefore remains close to the climatological mean, reflecting the limited intrinsic potential for windows of opportunity.</p>
      <p id="d2e4321">This adjustment can be applied in real time and provides a transparent bridge between ensemble output and user needs for calibrated risk estimates. We emphasise that the approach assumes an approximately linear and stationary spread–error relationship. More sophisticated implementations could allow the correction factor to depend on season or flow regime, although such refinements are beyond the scope of the present study.</p>
      <p id="d2e4324">The ideas described in this section closely follow suggestions by <xref ref-type="bibr" rid="bib1.bibx8" id="text.26"/> based on idealised statistical models, in which large case-to-case variability is necessary to obtain reliable and practically useful spread forecasts. The present results provide an empirical demonstration of this principle in operational subseasonal ensemble forecasts.</p>
</sec>
<sec id="Ch1.S5.SS5">
  <label>5.5</label><title>Effect of temporal averaging</title>
      <p id="d2e4339">Another practical way to improve the reliability of subseasonal uncertainty estimates that emerged from this study, in addition to the direct post-processing discussed in Sect. <xref ref-type="sec" rid="Ch1.S5.SS4"/>, is additional time averaging. Starting from daily spread values, averaging the spread over subseasonal lead times suppresses spurious intra-forecast variability and increases the stability of the spread-error relationship, particularly when the relevant signal originates from slowly evolving large-scale modes. Such time-averaging approaches mimic the effect of a larger ensemble size and can enable even 10-member ensembles to outperform the daily reliability of larger ensembles based on the <inline-formula><mml:math id="M219" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula>, as shown in Fig. <xref ref-type="fig" rid="F11"/>. This improvement arises because averaging reduces sampling-induced variability in the ensemble variance, analogous to increasing the effective ensemble size. Alternatively, one can first average the ensemble members in time and then compute spread and error from these weekly means, which acts as a low-pass filter and emphasises slow variability. Such weekly mean datasets are widely used at subseasonal lead times. Supplementary Fig. S3 compares these two strategies and supports the finding of enhanced reliability through time averaging, with subtle regional differences: for example, over the polar Atlantic, averaging the daily spread yields higher reliability (larger <inline-formula><mml:math id="M220" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula>) than computing spread from averaged fields. This difference suggests that flow-dependent reliability in this region may be partly linked to faster synoptic variability rather than predominantly slow modes.</p>
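The distinction between the two averaging strategies can be made concrete with synthetic data. The sketch below uses made-up numbers (not the S2S forecasts; all variable names are ours), with a deliberately large member count so that sampling noise does not obscure the systematic difference:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ensemble over a one-week window: each member is a slowly
# evolving anomaly (constant across the week) plus fast daily noise.
n_members, n_days = 500, 7
slow = rng.normal(size=(n_members, 1))       # slow, large-scale component
fast = rng.normal(size=(n_members, n_days))  # fast synoptic component
members = slow + 0.5 * fast

# Strategy 1: daily ensemble variance, then averaged over the window.
mean_daily_spread = members.var(axis=0, ddof=1).mean()

# Strategy 2: time-average each member first (a low-pass filter that
# emphasises slow variability), then take the ensemble variance.
spread_of_weekly_means = members.mean(axis=1).var(ddof=1)

# Averaging members first filters out most of the fast component (its
# variance contribution drops roughly as 1/n_days), so the two
# strategies measure different mixtures of slow and fast variability.
print(mean_daily_spread, spread_of_weekly_means)
```

In this toy setting, strategy 1 retains the fast component's full variance contribution while strategy 2 suppresses it, mirroring why the two approaches can differ where synoptic variability matters.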

      <fig id="F11" specific-use="star"><label>Figure 11</label><caption><p id="d2e4372">Comparison between reliability of daily spread values and spread averaged over subseasonal lead times (days 14–46) at points in <bold>(a, c)</bold> eastern Canada (55° N<inline-formula><mml:math id="M221" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>70° W) and <bold>(b, d)</bold> northern Europe (60° N<inline-formula><mml:math id="M222" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>15° E). Top row shows reliability in 50-member ensembles, bottom row in 10-member ensembles. Numbers in the top-left corner of each panel indicate the <inline-formula><mml:math id="M223" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula>, describing the slopes of linear fits through the reliability curve.</p></caption>
          <graphic xlink:href="https://wcd.copernicus.org/articles/7/767/2026/wcd-7-767-2026-f11.png"/>

        </fig>
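The SRS numbers quoted in the figure are slopes of linear fits through the reliability curve. The sketch below shows one way such a slope could be estimated from paired spread and squared-error samples; the equal-population binning convention is our assumption for illustration, not the paper's exact definition.

```python
import numpy as np

def srs_from_pairs(spread2, err2, n_bins=10):
    """Estimate a spread-reliability slope: bin forecasts by ensemble
    variance, average the squared error per bin, and fit a line through
    the binned points. Binning convention assumed, not from the paper."""
    order = np.argsort(spread2)
    bins = np.array_split(order, n_bins)             # equal-population bins
    x = np.array([spread2[b].mean() for b in bins])  # mean spread per bin
    y = np.array([err2[b].mean() for b in bins])     # mean sq. error per bin
    slope, _ = np.polyfit(x, y, 1)
    return slope

# Synthetic, perfectly reliable forecasts: the expected squared error
# equals the ensemble variance, so the fitted slope should be near one.
rng = np.random.default_rng(2)
s2 = rng.uniform(0.5, 2.0, 5000)
e2 = s2 * rng.chisquare(6, 5000) / 6  # E[e2 | s2] = s2
print(srs_from_pairs(s2, e2))         # close to 1 for this reliable toy case
```

In this idealised case sampling scatter alone keeps the estimate slightly away from exactly one, which is the same effect that depresses SRS in small real ensembles.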

      <p id="d2e4413">While time averaging can improve reliability, other factors such as systematic model biases also influence the spread-error relationship. A systematic displacement of the ensemble mean adds a constant contribution to the squared-error term, uniformly shifting each point in a spread-error scatter plot (like Fig. <xref ref-type="fig" rid="F2"/>b) upward while leaving the fitted slope unchanged. On the other hand, ensemble mean biases can affect the ensemble spread when internal dynamics couple the mean flow to extreme behaviour. <xref ref-type="bibr" rid="bib1.bibx17" id="text.27"/>, for example, discuss how an anomalous position of the Atlantic storm track in the ensemble mean flow can lower the likelihood of storms over northern Europe, thereby reducing ensemble variance in that region. For our analysis framework, however, such cases can simply be considered as model errors in terms of spread itself and should be diagnosable through perfect-model approaches as done in Sect. <xref ref-type="sec" rid="Ch1.S5.SS3"/>.</p>
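The intercept-versus-slope argument can be checked directly with synthetic numbers. In this sketch (made-up data, our variable names) a constant bias <italic>b</italic> is added as its mean contribution <italic>b</italic><sup>2</sup> to the squared error, since the cross term averages to zero:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic spread-error pairs with unit slope: the squared error
# scatters around the ensemble variance.
n = 2000
spread = rng.uniform(1.0, 3.0, n)        # ensemble variance s^2
err2 = spread * rng.chisquare(8, n) / 8  # squared error, mean = spread

# A constant ensemble-mean bias b contributes b^2 to the squared error
# on average (the cross term 2*b*e has zero mean).
b = 1.5
err2_biased = err2 + b**2

slope, intercept = np.polyfit(spread, err2, 1)
slope_b, intercept_b = np.polyfit(spread, err2_biased, 1)

# The fitted slope is unchanged; only the intercept shifts by b^2.
print(slope, slope_b, intercept_b - intercept)
```

Because least-squares fitting is linear in the data, adding a constant to every squared-error value moves the intercept by exactly that constant and leaves the slope untouched.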
</sec>
</sec>
<sec id="Ch1.S6" sec-type="conclusions">
  <label>6</label><title>Conclusions and discussion</title>
      <p id="d2e4432">Our analysis of subseasonal wintertime forecasts, aided by an idealised toy model, shows that the reliability of ensemble spread depends on three intertwined factors: (1) sampling error (related either to a small ensemble size or to a small number of ensemble forecasts), (2) the strength of physically driven variability in intrinsic uncertainty, and (3) how well the model captures this variability. Regions such as northern Europe, the mid-east Pacific and the tropical west Pacific exhibit consistently high <inline-formula><mml:math id="M224" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> values (i.e., high spread reliability), often exceeding 0.6 for 50-member ensembles. Accordingly, our results should not be interpreted as identifying regions of uniformly high forecast skill, but rather regions where forecast uncertainty and error vary in a flow-dependent manner that is reliably captured by ensemble spread. These <inline-formula><mml:math id="M225" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> hotspots coincide with areas influenced by slowly varying atmospheric modes that provide “windows of forecast opportunity”. In northern Europe, for instance, the downward influence of the polar stratosphere has been linked to multi-week periods of anomalously low spread, due to a reduction in storm-induced synoptic variability <xref ref-type="bibr" rid="bib1.bibx22 bib1.bibx17" id="paren.28"/>. This process seems to be well captured in forecast models and hence leads to high values of <inline-formula><mml:math id="M226" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> in northern Europe. 
The mid-east Pacific signal could reflect ENSO modulation of the jet, while tropical west Pacific reliability may arise from the MJO’s planetary wave response, though these connections remain speculative and warrant targeted process studies.</p>
      <p id="d2e4474">In contrast, eastern Canada displays almost no reliability even when 100 members are available, with <inline-formula><mml:math id="M227" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> essentially zero. Consistent with this, the ensemble variance in eastern Canada is nearly constant through the subseasonal range, suggesting that the atmosphere itself offers little low-frequency modulation of forecast uncertainty. Enlarging the ensemble further would therefore add computational cost without creating useful uncertainty information, because the intrinsic potential for windows of opportunity is vanishingly small.</p>

      <fig id="F12" specific-use="star"><label>Figure 12</label><caption><p id="d2e4491">As Fig. <xref ref-type="fig" rid="F8"/> but for t2m rather than z1000.</p></caption>
        <graphic xlink:href="https://wcd.copernicus.org/articles/7/767/2026/wcd-7-767-2026-f12.png"/>

      </fig>

      <p id="d2e4503">Beyond the contrast between highly reliable regions such as northern Europe and nearly constant-spread regions such as eastern Canada, the North Atlantic sector itself exhibits additional structure (cf. Fig. <xref ref-type="fig" rid="F3"/>). In particular, the higher reliability over the northern part of the basin compared to the southern centre of action suggests that mechanisms projecting onto the NAO may influence forecast uncertainty asymmetrically. This behaviour is consistent with the pronounced maximum in relative spread variability over the northern North Atlantic (Fig. <xref ref-type="fig" rid="F1"/>). While enhanced variability is also present in the subtropical North Atlantic, it appears displaced further south (around 20–30° N), suggesting that distinct processes may contribute there. Previous work has shown that subseasonal spread anomalies linked to stratospheric variability can project more strongly onto northern NAO regions than onto the southern centre <xref ref-type="bibr" rid="bib1.bibx22" id="paren.29"/>. In addition, modulation of synoptic eddy activity has been identified as a key mechanism shaping subseasonal forecast spread over the North Atlantic <xref ref-type="bibr" rid="bib1.bibx17" id="paren.30"/>, suggesting that variability in eddy magnitude, rather than purely meridional shifts of storm tracks, may contribute to this asymmetric reliability pattern. Given the limited number of forecast initialisations, however, finer-scale spatial details should be interpreted with caution.</p>
      <p id="d2e4516">Generally, the reliability of spread forecasts can vary for different variables analysed. This dependence partly arises through different flow dependences of the ensemble spread on the basic state <xref ref-type="bibr" rid="bib1.bibx23" id="paren.31"><named-content content-type="pre">as, e.g., shown in</named-content></xref>. Figure <xref ref-type="fig" rid="F12"/>a shows the <inline-formula><mml:math id="M228" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> metric for the 2-metre temperature (t2m) and compares it to the t2m inter-variability. Two pronounced regions of high <inline-formula><mml:math id="M229" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> are clearly visible around 60° N, forming band-like structures across the two major northern-hemispheric landmasses. These regions also show high values of inter-over-intra variability, further suggesting that subseasonal reliability is mostly driven by slowly evolving modes, although the overall pattern correlation for the northern hemisphere is relatively small. Further, we find that the high <inline-formula><mml:math id="M230" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> for the t2m field in these regions mostly arises from the seasonal winter-to-spring transition in surface temperatures, which is most pronounced in the mid-latitudes and over land. Correspondingly, the <inline-formula><mml:math id="M231" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> strongly decreases in those regions when computing the reliability based on deseasonalised spread data (Fig. <xref ref-type="fig" rid="F12"/>b).</p>
      <p id="d2e4577">The results presented here might also be relevant with regard to the so-called “signal-to-noise paradox” (SNP), described by <xref ref-type="bibr" rid="bib1.bibx18" id="text.32"/>. The paradox refers to an apparent mismatch in climate and seasonal forecasting systems, in which forecasts correlate better with observed variability than with their own ensemble members. According to <xref ref-type="bibr" rid="bib1.bibx18" id="text.33"/>, such a situation arises if the unpredictable component (noise) of the observed atmosphere is systematically smaller than the ensemble spread suggests, leading to forecasts being paradoxically under-confident.</p>
      <p id="d2e4586">Recent work by <xref ref-type="bibr" rid="bib1.bibx15" id="text.34"/> investigates ensemble reliability and the SNP in large-ensemble subseasonal forecasts, with particular emphasis on the role of ensemble size and sampling effects in diagnosing apparent under-confidence. Specifically, they demonstrate that reliability diagnostics and SNP metrics depend sensitively on ensemble size, and they highlight the role of large ensembles in reducing sampling artefacts. While their focus is primarily on the amplitude of the predictable signal relative to the climatological ensemble spread and on the overall reliability characteristics of the forecast system, rather than on flow-dependent variability of spread fluctuations, studies like <xref ref-type="bibr" rid="bib1.bibx18" id="text.35"/> or <xref ref-type="bibr" rid="bib1.bibx15" id="text.36"/> underline the importance of carefully interpreting reliability measures in the subseasonal-to-seasonal regime. Similarly, <xref ref-type="bibr" rid="bib1.bibx24" id="text.37"/> suggest that the occurrence of the SNP is equivalent to reliability diagrams exhibiting slopes greater than unity, although their reliability metric differs from our spread-error slopes. Despite methodological differences, both the SNP framework and our spread-error analysis highlight the fundamental importance of accurately capturing atmospheric variability and ensemble spread characteristics in order to interpret forecast confidence in a dynamically consistent manner.</p>
      <p id="d2e4601">While our framework does not directly quantify the absolute level of ensemble spread (which is central to the SNP), it does address whether fluctuations in spread (e.g., departures from climatology) reliably represent variations in atmospheric uncertainty. Regions identified in our study as having low inter-over-intra variability ratios also exhibit correspondingly poor spread reliability and low <inline-formula><mml:math id="M232" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula>. Such conditions might indirectly reflect scenarios favourable for the paradox, potentially arising from systematic misrepresentations of ensemble spread variability. Indeed, <xref ref-type="bibr" rid="bib1.bibx10" id="text.38"/> demonstrate that teleconnections influencing the subseasonal skill are also associated with changes in the ensemble spread, further supporting a potential linkage between slowly varying atmospheric modes, ensemble spread representation, and the SNP. A more direct analysis of how precisely variability in spread connects to the paradox remains an open question and future research explicitly bridging these concepts could help clarify their relationship.</p>
      <p id="d2e4619">Although the present manuscript emphasises low-spread situations in the context of windows of opportunity, the <inline-formula><mml:math id="M233" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:math></inline-formula> metric itself is symmetric with respect to the spread distribution and therefore reflects the reliability of spread fluctuations at both the low- and high-spread ends of the spectrum. Low-spread regimes (windows of opportunity) are dynamically as relevant as high-spread regimes (“walls of adversity”). High-spread states correspond to periods of enhanced intrinsic uncertainty and reduced predictability. For example, episodes of strong stratosphere-troposphere coupling during intense upward wave activity can lead to uncertainty in the evolution of the polar vortex, including whether it undergoes a sudden stratospheric warming or re-strengthening. If such uncertainty propagates downward, it may temporarily amplify tropospheric ensemble spread, representing a dynamically driven high-uncertainty regime. From a dynamical perspective, both ends of the spectrum arise from flow-dependent modulation of intrinsic forecast uncertainty by slowly evolving large-scale modes; the difference lies only in whether those modes temporarily suppress or amplify error growth.</p>
      <p id="d2e4635">In summary, the ability of an ensemble to convey reliable uncertainty forecasts depends on two questions: does the physical system provide a window of opportunity, and is the model accurate enough to detect it? Our spread-error framework shows that, over large areas of the Northern Hemisphere, slowly varying teleconnections modulate intrinsic forecast uncertainty, creating the potential for flow-dependent reductions of that uncertainty (windows of opportunity) that can be reliably detected with 50 to 100 members. On the other hand, regions lacking a strong influence of such slow modes remain unreliable in terms of the spread-error relationship even when the ensemble size is large. While low-spread situations and the associated reduction in forecast uncertainty may enable enhanced forecast skill, the existence of a potential for windows of opportunity does not by itself guarantee skillful forecasts, which depend on additional aspects of model performance and predictability.</p>
</sec>

      
      </body>
    <back><notes notes-type="dataavailability"><title>Data availability</title>

      <p id="d2e4642">Detailed information on the ERA5 re-analysis and S2S forecast datasets used in this study can be found in Sect. <xref ref-type="sec" rid="Ch1.S2"/>.</p>
  </notes><app-group>
        <supplementary-material position="anchor"><p id="d2e4647">The supplement related to this article is available online at <inline-supplementary-material xlink:href="https://doi.org/10.5194/wcd-7-767-2026-supplement" xlink:title="pdf">https://doi.org/10.5194/wcd-7-767-2026-supplement</inline-supplementary-material>.</p></supplementary-material>
        </app-group><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e4656">PR and JS conceptualised the idea together. PR performed the analyses of subseasonal forecasts and wrote most of the manuscript. JS performed the toy model simulations and wrote the corresponding section. TB assisted with the interpretation of results and helped to improve the manuscript.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e4662">At least one of the (co-)authors is a member of the editorial board of <italic>Weather and Climate Dynamics</italic>. The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e4671">Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.</p>
  </notes><ack><title>Acknowledgements</title><p id="d2e4677">The authors thank the Transregional Collaborative Research Center SFB/TRR 165 “Waves to Weather” funded by the German Research Foundation (DFG) for support. We further thank Hella Garny for some inspirational discussions about predictability. We thank Tim Woollings and one anonymous referee for their constructive comments during the review process, and Tim Woollings for introducing the term “walls of adversity”.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e4682">This research has been supported by the Deutsche Forschungsgemeinschaft (grant no. SFB/TRR 165).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e4689">This paper was edited by Tim Woollings and reviewed by Tim Woollings and one anonymous referee.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Baggett et al.(2017)Baggett, Barnes, Maloney, and Mundhenk</label><mixed-citation> Baggett, C. F., Barnes, E. A., Maloney, E. D., and Mundhenk, B. D.: Advancing atmospheric river forecasts into subseasonal-to-seasonal time scales, Geophys. Res. Lett., 44, 7528–7536, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Baldwin et al.(2003)Baldwin, Stephenson, Thompson, Dunkerton, Charlton, and O'Neill</label><mixed-citation> Baldwin, M. P., Stephenson, D. B., Thompson, D. W., Dunkerton, T. J., Charlton, A. J., and O'Neill, A.: Stratospheric memory and skill of extended-range weather forecasts, Science, 301, 636–640, 2003.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Blackmon et al.(1984)Blackmon, Lee, and Wallace</label><mixed-citation> Blackmon, M. L., Lee, Y., and Wallace, J. M.: Horizontal structure of 500 mb height fluctuations with long, intermediate and short time scales, J. Atmos. Sci., 41, 961–980, 1984.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Bröcker and Smith(2007)</label><mixed-citation> Bröcker, J. and Smith, L. A.: Increasing the reliability of reliability diagrams, Weather Forecast., 22, 651–661, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Fortin et al.(2014)Fortin, Abaza, Anctil, and Turcotte</label><mixed-citation> Fortin, V., Abaza, M., Anctil, F., and Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean?, J. Hydrometeorol., 15, 1708–1713, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>Giggins and Gottwald(2019)</label><mixed-citation> Giggins, B. and Gottwald, G. A.: Stochastically perturbed bred vectors in multi-scale systems, Q. J. Roy. Meteorol. Soc., 145, 642–658, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Hersbach et al.(2020)Hersbach, Bell, Berrisford, Hirahara, Horányi, Muñoz-Sabater, Nicolas, Peubey, Radu, Schepers et al.</label><mixed-citation>Hersbach, H., Bell, B., Berrisford, P., et al.: The ERA5 global reanalysis, Q. J. Roy. Meteorol. Soc. [data set], 146, 1999–2049, <ext-link xlink:href="https://doi.org/10.1002/qj.3803" ext-link-type="DOI">10.1002/qj.3803</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Hopson(2014)</label><mixed-citation> Hopson, T.: Assessing the ensemble spread–error relationship, Mon. Weather Rev., 142, 1125–1142, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Johnson et al.(2014)Johnson, Collins, Feldstein, L’Heureux, and Riddle</label><mixed-citation> Johnson, N. C., Collins, D. C., Feldstein, S. B., L’Heureux, M. L., and Riddle, E. E.: Skillful wintertime North American temperature forecasts out to 4 weeks based on the state of ENSO and the MJO, Weather Forecast., 29, 23–38, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Karpechko et al.(2025)Karpechko, Butler, and Vitart</label><mixed-citation>Karpechko, A. Yu., Butler, A. H., and Vitart, F.: Signal, noise and skill in sub-seasonal forecasts: the role of teleconnections, Weather Clim. Dynam., 6, 1661–1681, <ext-link xlink:href="https://doi.org/10.5194/wcd-6-1661-2025" ext-link-type="DOI">10.5194/wcd-6-1661-2025</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Lakatos et al.(2023)Lakatos, Lerch, Hemri, and Baran</label><mixed-citation> Lakatos, M., Lerch, S., Hemri, S., and Baran, S.: Comparison of multivariate post-processing methods using global ECMWF ensemble forecasts, Q. J. Roy. Meteorol. Soc., 149, 856–877, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Leutbecher and Palmer(2008)</label><mixed-citation> Leutbecher, M. and Palmer, T. N.: Ensemble forecasting, J. Comput. Phys., 227, 3515–3539, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>Leutbecher et al.(2024)Leutbecher, Lang, Lock, Roberts, and Tsiringakis</label><mixed-citation>Leutbecher, M., Lang, S., Lock, S.-J., Roberts, C. D., and Tsiringakis, A.: Improving the physical consistency of ensemble forecasts by using SPP in the IFS, ECMWF, <ext-link xlink:href="https://doi.org/10.21957/mlz238dk1p" ext-link-type="DOI">10.21957/mlz238dk1p</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Mariotti et al.(2020)Mariotti, Baggett, Barnes, Becker, Butler, Collins, Dirmeyer, Ferranti, Johnson, Jones et al.</label><mixed-citation> Mariotti, A., Baggett, C., Barnes, E. A., Becker, E., Butler, A., Collins, D. C., Dirmeyer, P. A., Ferranti, L., Johnson, N. C., Jones, J., et al.: Windows of opportunity for skillful forecasts subseasonal to seasonal and beyond, B. Am. Meteorol. Soc., 101, E608–E625, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Roberts and Vitart(2024)</label><mixed-citation> Roberts, C. D. and Vitart, F.: Ensemble reliability and the signal-to-noise paradox in large-ensemble subseasonal forecasts, arXiv preprint arXiv:2411.17694, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Robertson et al.(2020)Robertson, Vigaud, Yuan, and Tippett</label><mixed-citation> Robertson, A. W., Vigaud, N., Yuan, J., and Tippett, M. K.: Toward identifying subseasonal forecasts of opportunity using North American weather regimes, Mon. Weather Rev., 148, 1861–1875, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Rupp et al.(2024)Rupp, Spaeth, Afargan-Gerstman, Büeler, Sprenger, and Birner</label><mixed-citation>Rupp, P., Spaeth, J., Afargan-Gerstman, H., Büeler, D., Sprenger, M., and Birner, T.: The impact of synoptic storm likelihood on European subseasonal forecast uncertainty and their modulation by the stratosphere, Weather Clim. Dynam., 5, 1287–1298, <ext-link xlink:href="https://doi.org/10.5194/wcd-5-1287-2024" ext-link-type="DOI">10.5194/wcd-5-1287-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Scaife and Smith(2018)</label><mixed-citation>Scaife, A. A. and Smith, D.: A signal-to-noise paradox in climate science, npj Clim. Atmos. Sci., 1, 28, <ext-link xlink:href="https://doi.org/10.1038/s41612-018-0038-4" ext-link-type="DOI">10.1038/s41612-018-0038-4</ext-link>,  2018.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Scherrer et al.(2004)Scherrer, Appenzeller, Eckert, and Cattani</label><mixed-citation> Scherrer, S. C., Appenzeller, C., Eckert, P., and Cattani, D.: Analysis of the spread–skill relations using the ECMWF ensemble prediction system over Europe, Weather Forecast., 19, 552–565, 2004.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Selz(2019)</label><mixed-citation> Selz, T.: Estimating the intrinsic limit of predictability using a stochastic convection scheme, J. Atmos. Sci., 76, 757–765, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Selz et al.(2022)Selz, Riemer, and Craig</label><mixed-citation> Selz, T., Riemer, M., and Craig, G. C.: The transition from practical to intrinsic predictability of midlatitude weather, J. Atmos. Sci., 79, 2013–2030, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Spaeth et al.(2024a)Spaeth, Rupp, Garny, and Birner</label><mixed-citation>Spaeth, J., Rupp, P., Garny, H., and Birner, T.: Stratospheric impact on subseasonal forecast uncertainty in the Northern extratropics, Commun. Earth Environ., 5, 126, <ext-link xlink:href="https://doi.org/10.1038/s43247-024-01292-z" ext-link-type="DOI">10.1038/s43247-024-01292-z</ext-link>, 2024a.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Spaeth et al.(2024b)Spaeth, Rupp, Osman, Grams, and Birner</label><mixed-citation>Spaeth, J., Rupp, P., Osman, M., Grams, C., and Birner, T.: Flow-dependence of ensemble spread of subseasonal forecasts explored via North Atlantic-European weather regimes, Geophys. Res. Lett., 51, e2024GL109733, <ext-link xlink:href="https://doi.org/10.1029/2024GL109733" ext-link-type="DOI">10.1029/2024GL109733</ext-link>, 2024b.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Strommen et al.(2023)Strommen, MacRae, and Christensen</label><mixed-citation>Strommen, K., MacRae, M., and Christensen, H.: On the Relationship Between Reliability Diagrams and the “Signal-To-Noise Paradox”, Geophys. Res. Lett., 50, e2023GL103710, <ext-link xlink:href="https://doi.org/10.1029/2023GL103710" ext-link-type="DOI">10.1029/2023GL103710</ext-link>,  2023.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Tempest et al.(2023)Tempest, Craig, and Brehmer</label><mixed-citation> Tempest, K. I., Craig, G. C., and Brehmer, J. R.: Convergence of forecast distributions in a 100,000-member idealised convective-scale ensemble, Q. J. Roy. Meteorol. Soc., 149, 677–702, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Vitart and Robertson(2018)</label><mixed-citation>Vitart, F. and Robertson, A. W.: The sub-seasonal to seasonal prediction project (S2S) and the prediction of extreme events, npj Clim. Atmos. Sci., 1, 3, <ext-link xlink:href="https://doi.org/10.1038/s41612-018-0013-0" ext-link-type="DOI">10.1038/s41612-018-0013-0</ext-link>,  2018.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Vitart et al.(2017)Vitart, Ardilouze, Bonet, Brookshaw, Chen, Codorean, Déqué, Ferranti, Fucile, Fuentes et al.</label><mixed-citation> Vitart, F., Ardilouze, C., Bonet, A., et al.: The subseasonal to seasonal (S2S) prediction project database, B. Am. Meteorol. Soc., 98, 163–173, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>Weisheimer and Palmer(2014)</label><mixed-citation>Weisheimer, A. and Palmer, T.: On the reliability of seasonal climate forecasts, J. Roy. Soc. Interf., 11, 20131162, <ext-link xlink:href="https://doi.org/10.1098/rsif.2013.1162" ext-link-type="DOI">10.1098/rsif.2013.1162</ext-link>,  2014.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>Weisheimer et al.(2019)Weisheimer, Decremer, MacLeod, O'Reilly, Stockdale, Johnson, and Palmer</label><mixed-citation> Weisheimer, A., Decremer, D., MacLeod, D., O'Reilly, C., Stockdale, T. N., Johnson, S., and Palmer, T. N.: How confident are predictability estimates of the winter North Atlantic Oscillation?, Q. J. Roy. Meteorol. Soc., 145, 140–159, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Wilks(2011)</label><mixed-citation>Wilks, D. S.: Statistical methods in the atmospheric sciences, Vol. 100, 4th edn., Academic Press, Elsevier, <ext-link xlink:href="https://doi.org/10.1016/C2017-0-03921-6" ext-link-type="DOI">10.1016/C2017-0-03921-6</ext-link>, 2011.</mixed-citation></ref>

  </ref-list></back>
</article>
