Stationary Waves and Upward Troposphere-Stratosphere Coupling in S2S Models

The simulated Northern Hemisphere stationary wave (SW) field is investigated in 11 subseasonal-to-seasonal (S2S) models. It is shown that while most models considered can well-simulate the stationary wavenumbers 1 and 2 during the first two weeks of integration, they diverge from observations following week 3. Those models with a poor resolution in the stratosphere struggle to simulate the waves, both in the troposphere and the stratosphere, even during the first two weeks, and 5 biases extend from the troposphere all the way up to the stratosphere. Focusing on the tropospheric regions where SWs peak in amplitude reveals that the models generally do a better job in simulating the Northwest Pacific stationary trough, while certain models struggle to simulate the stationary ridges both in Western North America and the North Atlantic. In addition, a strong relationship is found between regional biases in the stationary height field and model errors in simulated upward propagation of planetary waves into the stratosphere. In the stratosphere, biases mostly are in wave-2 in those models with high stratospheric 10 resolution, whereas in those models with low resolution in the stratosphere, a wave-1 bias is evident, which leads to a strong bias in the stratospheric mean zonal circulation due to the predominance of wave-1 there. Finally, biases in both amplitude and location of mean tropical convection and the subsequent subtropical downwelling, are identified as possible contributors to biases in the regional SW field in the troposphere.

to observed variability during the duration of their forecast. Previous works have used these models to assess their skill in capturing both tropospheric and stratospheric teleconnections that lead to polar vortex variability (Schwartz and Garfinkel, 2020;Garfinkel et al., 2018Garfinkel et al., , 2019. Specifically, Schwartz and Garfinkel (2020) investigated biases in SWs in 5 subseasonal forecast models to examine their SW pattern biases and their implications for upward coupling resulting from intraseasonal tropical 60 variability in the troposphere. They found that the NCEP, ECMWF and UKMO models have realistic SW patterns, particularly during the first week of reforecast, while biases in the CMA and BoM models are more pronounced throughout the run. They also demonstrate the importance of realistic simulated SWs for upward coupling, with the models with better simulated SWs, in particular over the North Pacific (NP) region, also simulating a more realistic upward coupling in response to subseasonal tropical variability.
In this work, we examine the fidelity of the simulated SWs both in the troposphere and stratosphere, and the implications for upward coupling and the stratospheric mean state, as represented in 11 subseasonal forecast models, and for three models we consider two distinct versions. Furthermore, we investigate the tropical biases that potentially contribute to the extratropical SWs biases in these models. 70 This paper is organized as follows. The data and methods used for the analysis are described in section 2. In section 3, the fidelity of the Northern Hemisphere tropospheric SWs is examined in subseasonal forecast models, with an emphasis put on regions where SWs are of large amplitude. In section 4, the SWs biases in the models are discussed in the context of troposphere-stratosphere coupling and the simulated mean stratospheric circulation. In section 5, the possibility that biases in 75 regional SWs in the extratropics emanate from model biases in the distribution of tropical convection is considered. Finally, conclusions and a discussion are in section 6.

Data and Methods
The fidelity of the stationary waves is examined in models that have contributed to the S2S Prediction project (Vitart et al., 2017). We include all eleven modeling centers -the Australian Bureau of Meteorology (BoM), the European Centre for Meteo France (CNRM). A summary of the reforecast availability, ensemble size, and selected properties of the vertical resolu-85 tion is presented in table 1. Throughout the paper, we look at the ensemble mean rather than individual ensemble members.
For the UKMO, we downloaded hindcasts for the operational model in use during 2015 and the winter of 2019/2020, for the ECMWF, we downloaded data for the model version in use during 2016 and the winter of 2019/2020 (CY41R1/CY41R2 and CY46R1), and for the CNRM we downloaded data of model versions 2014 and 2019. For the ECMWF model, we use only one reforecast each week, and for the NCEP model we only downloaded 9 reforecasts each month, for consistency with the 90 data availability for the other models. These various models differ in the quality of their representation of the stratosphere (Domeisen et al., 2020a): the stratosphere is less well resolved in CMA, CNR-ISAC, HMCR, and BoM as compared to the other models (Table 1). We consider reforecasts initialized in November through February and assess the stationary waves as a function of forecasted week. Note that each modeling center has made available reforecasts from different years and the reforecast initialization dates differ among the models even for a given year. 95 We define the stationary waves by first computing the time mean geopotential height over intializations during November-December-January-February (NDJF) for each model and forecast week, and then subtract off the zonal mean height at each latitude. For meridional eddy heat flux, we calculate the anomalies from the zonal mean of daily meridional wind, v, and temperature, T, separately, then multiply them, and average over all initializations in NDJF to obtain v * T * , where overbar denotes the zonal mean and asterisk denotes deviations from the zonal-mean. We do not explicitly filter out short-time variability, al-100 though these waves often take non-stationary features, however, by averaging over many initializations and over week-periods, the non-stationary features are filtered out. The wavenumber components of the geopotential height and meridional heat flux fields are obtained by performing a Fourier decomposition and cross-spectrum respectively. Each field is averaged over weekperiods, thus we look at weekly time leads from initialization.
Diabatic heating is not available as a standard output in the S2S archive. The Lagrangian pressure tendency (i.e. vertical veloc-105 ity on pressure coordinates, or ω) is available for all models, and we use ω at 500hPa as a proxy for convection. Most models make available outgoing longwave radiation (OLR), and the biases in OLR resemble those for ω.

Fidelity of Tropospheric Stationary Waves in Subseasonal Models
We begin our analysis with the fidelity of the full mid-tropospheric stationary wave field in the models. Figure 1 shows the 500 hPa eddy geopotential height field in the NCEP model, the corresponding days in ERA-I during weeks 1, 3, 5 and 6 of 110 integration, and the model biases.
During the first week, biases in the stationary wave field are small over the mid-high latitudes. However, during week 3, biases are already developed, with a strong positive and negative biases over the Northwest and Northeast Pacific, respectively. In addition, a negative bias develops over the Northwestern Atlantic. Following week 3, biases in the North Pacific sector strengthen in magnitude, while those over the North Atlantic slightly strengthen in magnitude and strongly project onto the negative node As seen in figure 2, the mid-tropospheric SW biases in the models, especially those which perform poorly, are mostly confined to three key regions in which the observed SW amplitude is relatively large. Those regions are the Northwest Pacific In the Northwest Pacific, about half of the models are already biased one week after initialization, and the bias persists through- While the biases in the Northwest Pacific are less than 10% of the observed trough for all models, biases are relatively larger in the other regions. In Western North America, the spread among models is relatively small by the end of week 1, although the bias of the simulated predominant stationary ridge over this region becomes larger at later lags (figure 3b). The BoM and 140 ISAC models stand out as very poorly performing during all weeks following week 1. Both either strongly overestimate (BoM) or underestimate (ISAC) the ridge. The JMA model and both versions of the ECMWF fail to simulate the persistence of the observed ridge, and after week 1, its simulated amplitude decreases. Similarly to its realistically simulated trough in the NW Pacific, the ECCC model is consistent in its simulated ridge in Western North America, despite a slight positive bias in week 2. All other models diverge from observations in weeks 2-3, with a weaker than observed ridge.
145 Figure 3c indicates that all models underestimate the strength of the upper-tropospheric North Atlantic stationary ridge during NDJF. In particular, the CNRM, HMCR, BoM and CMA models poorly simulate the ridge at all lags following week 1, and although other models better perform in the North Atlantic region, most of them also simulate a weaker ridge than that observed during all weeks after week 1. The only exception is the NCEP model that succeeds to simulate the ridge with a very small bias. These results are somewhat consistent with those that have been shown in previous works, such as Garfinkel et al.

150
(2020) and Park and Lee (2021), who demonstrate that GCMs struggle to produce a realistic and strong enough SW in the North Atlantic upper-troposphere.
Overall, the models are more biased over Western North America and over the North Atlantic compared to the Northwest Pacific region beyond week 2-3, and in some cases, even beyond week 1. . The BoM has a biased wave-2 in the lower stratosphere, while in the mid-stratosphere it is mostly biased towards wave-1 (figure S5d-i). In the models that struggle to simulate a realistic wave-2 175 structure, the simulated wave-1, which is predominant in the stratosphere, is still reasonably simulated, therefore, the simulated mean circulation is not expected to be strongly biased. This stands in contrast to the BoM and CMA models that have strongly biased wave-1 in the stratosphere.
To better demonstrate the effect of the regional biases in the troposphere impact the planetary-scale mean NH circulation, 180 which is particularly relevant for the stratospheric mean state, we show in figure 6 different relationships between the S2S models biases in the regional SW field and the amplitude of the stationary wave-1 and wave-2, obtained by Fourier decomposition. , it is evident that models with a weaker than observed Northwest Pacific trough also have too-weak of a wave-1 (r=-0.66). Over the North Atlantic, a shallower ridge is associated with a biased too weak wave-1 with a correlation of 185 r=0.55 (figure 6b). Figure 6c contrasts the magnitude of the wave-1 component of eddy height with wave-1 heat flux at 500 hPa, and indicates a strong connection (r=0.91), as expected from Garfinkel et al. (2010). Figure 6d,e show also a strong connection between mid-tropospheric wave-2 biases and regional biases over Western North America (r=0.73) and over Western Eurasia upward coupling of wave-2 has been stressed in previous works such as Garfinkel et al. (2019) and Karpechko et al. (2018). In both works, an error in the simulated West Eurasian ridge in S2S models, is linked to a weaker simulated upward propagation of planetary wave activity, and as a result, erroneous stratospheric variability. Figure 6g demonstrates this connection, with biases over Western Eurasia significantly linked to biases in upward propagation of planetary wave-2 activity, indicated by biases in mid-tropospheric wave-2 meridional eddy heat flux (r=0.65).

195
The connection of biases in the mid-troposphere to biases in the lower stratospheric planetary wave activity is shown in figures 6h,i, for wave-1 and wave-2, respectively. In both cases, models that are more biased in their simulated wave-1 and wave-2 in the mid-troposphere, are also more biased in their upward coupling of wave-1 (r=0.75) and wave-2 (r=0.78).
In general, the connection between the regional biases and the planetary wave-1 and wave-2 is strong across all regions. In particular, biases in regional mid-tropospheric SWs are closely related to biases in wave-1 and wave-2 meridional eddy heat 200 flux, which in turn, lead to a biased wave activity that enter the stratosphere.
The wintertime stratospheric polar vortex in the NH is weaker than its Southern Hemisphere counterpart. This is because of the time mean stationary waves, and in particular the wavenumber 1 and 2 components that originate in the troposphere.
Therefore, a key ingredient for a realistic stratospheric mean state is realistic eddy fluxes forced by stationary waves in the 205 troposphere. Figure 7 shows the time evolution of meridional heat flux of wavenumbers 1 and 2 and stratospheric polar vortex, both for the models (solid lines) and the corresponding days in reanalysis (dots) during NDJF. We examine both the midtropospheric and lower stratospheric meridional eddy heat flux over the mid-latitudes (40 • N-80 • N), v * T * , which is used as proxy for upward coupling between the troposphere and the stratosphere.
In the mid-troposphere, wave-1 v * T * is better simulated by NCEP, UKMO (both versions), ECCC and the KMA models. Both In the lower stratosphere, biases in v * T * are noticeable in most models. For wave-1, both the two versions of the UKMO and ECMWF models decently simulate wave-1 throughout the run, while the KMA model has slightly weaker wave-1 signal than that observed during weeks 2-4, but then it simulates the amplitudes very well. Nevertheless, other models struggle to simulate a realistic wave-1 throughout the run. The NCEP model largely overestimates the amplitude of wave-1 following week 2, while The BoM, ISAC and HMCR have the most poorly simulated wave-2, with too weak amplitudes after week 1, though their negative bias is not as large as in wave-1. The CMA has a decently simulated amplitude of wave-2 during the first 6 weeks of the integration, despite its poorly simulated wave-2 amplitude in the mid-troposphere. In contrast to its overly strong wave-1, the simulated wave-2 in the NCEP model is relatively realistic.
As mentioned earlier, during winter, the presence of stationary waves 1 and 2 in the stratosphere is the reason for a perturbed 230 and weakened vortex, which enables transients to propagate into the stratosphere and perturb the vortex even further. Therefore, a realistic simulated mean state vortex is an important factor for upward wave propagation and vortex daily variability. Figure   7e shows CNRM-2014, however, is considerably biased towards a stronger vortex than that observed. Interestingly, the ECCC is unable to simulate a realistic polar vortex after the first week, despite its decently simulated waves 1-2 both in the mid-troposphere and lower stratosphere.
Overall, the CMA, HMCR, ISAC and BoM models, i.e, those with a low resolution stratosphere (the latter three are also low-top within the stratosphere), struggle to simulate realistic v * T * , particularly in the stratosphere, which may be a result of 245 poorly represented upward coupling. This, in turn, affects the stratospheric mean state zonal circulation, and as a consequence, temporal variability is too weak, as demonstrated in previous works (Domeisen et al., 2020b;Schwartz and Garfinkel, 2020). Despite its high model-top and stratospheric resolution, the NCEP and CNRM-2019 models have a too weak vortex strength during winter, which is a result of too strong wave-1 and wave-2 respectively, signal in the lower stratosphere. As noted in section 1, the biases in the stationary wave pattern, both in the troposphere and the stratosphere, may be a result of a poorly represented time-mean tropical convection. More specifically, deep convection in the Tropical Western Pacific may be one of the drivers that directly (Hoskins and Karoly, 1981;Jin and Hoskins, 1995) and indirectly (Simmons et al., 1983) forms the PNA pattern. Figures 4 and 5 show the time-mean pressure velocity, ω, and zonal winds during week 3 of the integration in Next, we examine the regional biases in the tropics that possibly have an impact on the extratropical mean state. Figure 8 shows  (Hoskins and Ambrizzi, 1993;Jin and Hoskins, 1995). Panel b shows that biases 270 in the models, in large, are confined to these regions, and extend to the Eastern Pacific as well. The performance of the models in simulating realistic mean tropical convection is not the goal of this paper, however the divergent wind and subtropical convergence and downwelling in the Northern Hemisphere associated with this tropical upwelling can help seed Rossby waves (Sardeshmukh and Hoskins, 1988). Therefore, we focus now on how the biases in three regions within the tropics and subtropics are connected to the simulation of SWs in the key regions in the extratropics.
275 Figure 9 summarises the relationships between NDJF 500 hPa ω biases in the subtropical Western Pacific, tropical Eastern Pacific and the Caribbean, and the extratropical mid-tropospheric stationary height biases in Northwest Pacific, Western North America and North Atlantic, respectively, during week 3 of integration. Note that following Sardeshmukh and Hoskins (1988), Rossby wave sources (RWSs) are formed in regions of downwelling (and upper tropospheric convergence). Therefore, we examine subtropical regions where downwelling is considerably strong, as a result of tropical convection in the deep tropics.

280
One exception is the deep tropical Eastern Pacific, which is located in the descending branch of the Walker Cell, therefore is closer to the equator. The Northern hemisphere SWs play a major role in regulating the equator-to-pole temperature gradient, as well as modulating the storm tracks and large-scale circulation both in the troposphere and stratosphere. In particular, realistically simulated SWs by global-circulation models used both for future climate projections and medium-range forecasts is crucial for accurate weather and climate surface predictions, which have significant and potentially life-saving, societal and economic implications.
In this work, we investigated the how well tropospheric and stratospheric planetary SWs are simulated in 11 operational sub-295 seasonal models during NDJF. We showed that biases in the troposphere evolve with lead time, and are pronounced already by week 2-3. We showed that nearly all models have like-signed biases over the North Pacific, Western North America and the North Atlantic regions, regions where the amplitude of SWs peaks, and nearly all models simulate too weak a SW in these regions. More specifically, it has been shown that the biases in the North Pacific sector are already evident from week 1, both in amplitude and phase. However, they are smaller compared to those over Western North America and North Atlantic regions; 300 specifically some models suffer from biases of 50% or more in the Northwest North American and North Atlantic ridge.
We also considered SWs biases in the context of troposphere-stratosphere upward coupling, with a particular emphasis on wave-1 and wave-2 which are mostly relevant to upward propagation of waves into the stratosphere. In the troposphere, the models vary in their ability to simulate wave-1 meridional eddy heat flux, while wave-2 is at least decently simulated by most of them, with the KMA model the most realistic in simulated wave-1 and wave-2. Next, we showed that a connection exists be-305 tween regional biases of SWs in the troposphere and upward propagation of planetary waves, further stressing the importance of realistically simulated SWs in key regions in the mid-latitude troposphere.
In the stratosphere, it is demonstrated that the models with finer vertical resolution in the stratosphere and high model-top generally do a better job in simulating the stratospheric mean state, agreeing with previous works. Interestingly, models with less a complex stratosphere have a more biased wave-1, while those with a finer resolution are more biased in their simulated 310 wave-2. It is not clear if this difference might be a result of small sample size, or alternately a systematic effect of stratosphere complexity in the models. Nevertheless, a more complex stratosphere is only one ingredient for a realistically mean-state stratosphere as demonstrated in the case of the NCEP model, which is positively biased in its wave-1 meridional heat flux in the lower stratosphere, and as a result, its simulated time-mean polar vortex is too weak. This is also shown in Butler et al. (2016) who showed that of the models that participated in the Climate System Historical Forecast Project with a higher complexity 315 stratosphere are not necessarily better skilled in forecasting seasonal sea-level pressure over the North Pole regions in winters when no SST anomalies are present in the tropical Pacific.
Therefore, further work that integrates both the models' mean state and idealized modeling has to be done in order to better identify sources for SWs biases in the S2S models.
Finally, we also considered whether biases in the tropics and subtropics may play a role in the simulated tropospheric mid-320 latitude SW field in the models. Large biases in the models are evident in oceanic basins immediately to the south and west of the strongest features of the SW pattern, namely in the the Western Pacific, Eastern Pacific and Caribbean. We examined whether regional biases in SWs are related to a biased subtropical (and tropical in the case of the Eastern Pacific) down-welling, and found a significant connection in the models between a biased downwelling in the subtropical Western Pacific and Caribbean and biases in the Northwest Pacific trough and Northwest Atlantic. In addition, a significant relationship exists 325 between downwelling in the tropical Eastern Pacific, associated with the descending branch of the Walker Cell, and the Western North America ridge.
Our results demonstrate the importance of accurately simulated SWs for both the troposphere and stratosphere mean state in the models. Most subseasonal models show little consistency in their simulated SWs at lead times longer than 1-2 weeks, which may enhance forecast errors on subseasonal timescales, particularly those induced by the stratospheric variability.

330
Overall, it is shown that a well-simulated SWs field in the troposphere is at least a necessary ingredient for a realistically simulated stratospheric mean circulation, which in turn, enables more accurate temporal variability and surface predictability on subseasonal timescales.

Acknowledgements
C. I. G., C. S. and W.C. are supported by the ISF-NSFC joint research program (grant No.3259/19), D.D and P. Y. are supported 335 by the Swiss National Science Foundation through projects PP00P2_170523 and PP00P2_198896. This work is based on S2S data. S2S is a joint initiative of the World Weather Research Programme (WWRP) and the World Climate Research Programme (WCRP). The original S2S database is hosted at ECMWF as an extension of the TIGGE database and can be downloaded from the ECMWF server http://apps.ecmwf.int/datasets/data/s2s/levtype=sfc/type=cf/. We sincerely thank Ian P. White for the meaningful discussion throughout the work.

340
Correspondence and requests for data should be addressed to C. I. G. (email: chaim.garfinkel@mail.huji.ac.il).