Articles | Volume 7, issue 2
https://doi.org/10.5194/wcd-7-787-2026
© Author(s) 2026. This work is distributed under the Creative Commons Attribution 4.0 License.
Impacts of tropical forecast errors on two extreme precipitation events: insights from relaxation experiments using machine-learning weather prediction models
- Final revised paper (published on 19 May 2026)
- Preprint (discussion started on 14 Jan 2026)
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
RC1: 'Comment on egusphere-2026-35', Yannick Peings, 17 Feb 2026
- AC1: 'Reply on RC1', Siyu Li, 02 Mar 2026
- RC2: 'Comment on egusphere-2026-35', Anonymous Referee #2, 26 Feb 2026
Peer review completion
AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
AR by Siyu Li on behalf of the Authors (25 Mar 2026)
Author's response
Author's tracked changes
Manuscript
ED: Referee Nomination & Report Request started (03 Apr 2026) by Daniela Domeisen
RR by Yannick Peings (10 Apr 2026)
RR by Anonymous Referee #2 (13 Apr 2026)
ED: Publish subject to technical corrections (13 Apr 2026) by Daniela Domeisen
AR by Siyu Li on behalf of the Authors (21 Apr 2026)
Author's response
Manuscript
The paper examines the week 3-4 prediction skill of two machine-learning weather prediction (MLWP) models for two climate events that brought significant precipitation to California in 2022/23 (December-January 2022/23 and February-March 2023). The two MLWP models, NeuralGCM and Pangu-Weather, are compared to a traditional S2S General Circulation Model (GCM), UFS. The authors use a relaxation technique (nudging) to impose observed atmospheric variability in the tropics in a set of ensemble reforecasts, which they compare to the original reforecasts of the two climate events. They find that imposing accurate tropical variability largely improves the prediction skill for the North Pacific atmospheric circulation and the associated moisture flux at week 3-4 lead time, especially for the December case study. This holds for both MLWP models and for UFS, with comparable physical mechanisms leading to the improvement (Rossby wave sources in the subtropics). This demonstrates that improved S2S prediction in the tropics would yield higher prediction skill for such precipitation events in the mid-latitudes, and also that the new generation of MLWP models exhibits skill and mechanisms comparable to those of traditional physics-based forecast models when such tropical relaxation techniques are used (at much lower computational cost). The prediction skill of the two MLWP models is in fact slightly higher than that of UFS for the two case studies, both with and without tropical relaxation, but as noted by the authors, a more robust comparison of prediction skill would require a more systematic evaluation over a greater number of cases.
The paper is a nice contribution to the field of S2S prediction; it is clear and well written. However, there is room for improvement, and I list some comments and suggestions below.
1) l. 28: when discussing the potential for S2S prediction using MLWP models, some references are missing to reflect what has already been done. For instance, the following two papers are relevant references to include, as they discuss and demonstrate the advances in S2S forecast skill achieved with these models.
Weyn, J. A., Durran, D. R., Caruana, R., & Cresswell-Clay, N. (2021). Sub-seasonal forecasting with a large ensemble of deep-learning weather prediction models. Journal of Advances in Modeling Earth Systems, 13(7), e2021MS002502. https://doi.org/10.1029/2021ms002502
Chen, L., Zhong, X., Li, H., Wu, J., Lu, B., Chen, D., et al. (2024). A machine learning model that outperforms conventional global subseasonal forecast models. Nature Communications, 15(1), 6425. https://doi.org/10.1038/s41467-024-50714-1
2) l. 85: “Sea surface temperature are prescribed from ERA5.” Can you give more detail here? Do you maintain the SST anomalies from initialization (persistent SST anomalies)?
3) l. 91: “This leads to its significantly lower computational resource requirements compared to the other two models of this study.” Could you give a rough estimate of each MLWP model’s computational cost here, relative to UFS?
4) Section 2.3: it sounds like the daily anomalies for the models are calculated from the ERA5 daily climatology. Ideally, the model anomalies should be calculated using the model daily climatology, but this requires a set of hindcasts over a sufficiently long period. I do not think that using the model climatology would significantly change the results, but this should be mentioned for transparency.
5) l. 125, it is unclear what the “model replay” experiment is used for in the study.
6) Section 3.1: the December case study has also been highlighted in our recent paper (Peings et al. 2026) as a window of opportunity for S2S forecasting. The three models used in our study (two MLWP models and the ECMWF S2S model) exhibit good prediction skill for this period at week 2, as shown in the paper, but we also found good skill for week 3 and, more generally, for the week 2-4 window. We also performed a sensitivity study with one of the MLWP models to demonstrate that the skill was coming from the tropics. I think this paper is worth citing because it aligns with the results presented here.
Peings, Y., Dong, C., Mahesh, A., Pritchard, M., Collins, W., & Magnusdottir, G. (2026). Subseasonal forecasting and MJO teleconnections in machine learning weather prediction models. Journal of Geophysical Research: Atmospheres, 131, e2025JD044910. https://doi.org/10.1029/2025JD044910
7) The section about the physical mechanism leading to more skillful predictions for the two case studies would benefit from being developed. The RWS anomalies of Fig. 3 and Fig. 5 are noisy and not very explicit. I think it would be interesting to see how they bridge the tropics and the extratropics, i.e., to show the associated Rossby wave, maybe at different lead times (weeks 1, 2 and 3), to illustrate its development. You could also show how the deep convection anomalies in the tropics differ between CRL and NTR as a function of time, maybe using a Hovmöller plot (time versus longitude), which would reveal how MJO propagation changes with nudging and makes for a more accurate teleconnection. The paper only includes 6 figures, so there is room for a couple of figures further detailing the tropics-extratropics teleconnection leading to improved skill in the North Pacific/North America sector (especially for the December case).
8) In the conclusion, when stating that “However, drawing more definitive conclusions will require a systematic evaluation over multiple years and similar events to assess the generalization of these results”, it should be mentioned that a systematic evaluation of the S2S forecast skill for the North Pacific/Western North America region has been done for NeuralGCM (Peings et al. 2026). The study shows that two MLWP models (SFNO-HENS and NeuralGCM) exhibit S2S skill comparable to ECMWF for the MJO and North Pacific atmospheric patterns during the October-March season.
9) l. 296: “This suggests that a better representation of the tropical atmospheric state in the models would have improved the prediction of this particular event”. The conclusion would benefit from a discussion of how the MLWP models have the potential to improve prediction skill in the tropics, and consequently in the mid-latitudes (if they do).
Do the authors anticipate that S2S forecast skill will improve with future developments in both traditional dynamical models and machine-learning weather prediction (MLWP) systems? Or does the current similarity in S2S skill between MLWP models and GCMs indicate an intrinsic predictability limit of the climate system that may be difficult to surpass?
Nudging simulations such as those presented in the paper are valuable for investigating mechanisms and tracing potential sources of predictability for specific events. However, do we realistically expect S2S forecasts in the tropics to become sufficiently accurate to substantially improve prediction skill in the mid-latitudes? I know that is the million-dollar question, but it would be worthwhile to address it in the conclusion to place the results in a broader predictability context.
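As an illustration of comment 4, here is a minimal sketch of how anomalies computed against a reference climatology differ from anomalies computed against the model's own climatology. It is not the authors' procedure: the data are synthetic, and the function names, array shapes, and the 0.5 bias are hypothetical choices made only for this example. Under these assumptions, the two sets of anomalies differ exactly by the model's mean bias for each calendar day.

```python
import numpy as np

def daily_climatology(fields, day_of_year):
    """Average `fields` (time, points) over all samples sharing a calendar day."""
    return {int(d): fields[day_of_year == d].mean(axis=0)
            for d in np.unique(day_of_year)}

def anomalies(fields, day_of_year, clim):
    """Subtract the matching calendar-day climatology from each time step."""
    return np.stack([f - clim[int(d)] for f, d in zip(fields, day_of_year)])

# Synthetic stand-ins (assumptions): 10 years, 5 calendar days, 3 grid points.
rng = np.random.default_rng(0)
n_years, n_days, n_points = 10, 5, 3
days = np.tile(np.arange(1, n_days + 1), n_years)
reference = rng.normal(size=(n_years * n_days, n_points))  # plays the role of a reanalysis
# Hypothetical hindcasts: reference plus a constant 0.5 bias and small noise.
hindcasts = reference + 0.5 + 0.1 * rng.normal(size=reference.shape)

ref_clim = daily_climatology(reference, days)
model_clim = daily_climatology(hindcasts, days)

# Treat the last "year" of hindcasts as the forecast to be evaluated.
forecast = hindcasts[-n_days:]
forecast_days = days[-n_days:]

anom_vs_ref = anomalies(forecast, forecast_days, ref_clim)
anom_vs_model = anomalies(forecast, forecast_days, model_clim)

# The difference between the two anomaly definitions is the model's mean bias.
bias = np.stack([model_clim[int(d)] - ref_clim[int(d)] for d in forecast_days])
print(np.allclose(anom_vs_ref - anom_vs_model, bias))
```

If the model bias is small relative to the anomalies of interest, the choice of climatology matters little, which is consistent with the expectation stated in comment 4; a hindcast set would still be needed to quantify that bias.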