Articles | Volume 6, issue 4
https://doi.org/10.5194/wcd-6-1147-2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Estimating return periods for extreme events in climate models through Ensemble Boosting
- Final revised paper (published on 21 Oct 2025)
- Preprint (discussion started on 21 Feb 2025)
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on egusphere-2025-525', Cristian Martinez-Villalobos, 27 Mar 2025
- RC2: 'Comment on egusphere-2025-525', Anonymous Referee #2, 31 Mar 2025
- AC1: 'Comment on egusphere-2025-525', Luna Bloin-Wibe, 13 Jun 2025
Peer review completion
AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload
AR by Luna Bloin-Wibe on behalf of the Authors (19 Jun 2025)
Author's response
Author's tracked changes
Manuscript
ED: Referee Nomination & Report Request started (25 Jun 2025) by Roberto Rondanelli
RR by Cristian Martinez-Villalobos (11 Jul 2025)

RR by Anonymous Referee #2 (05 Aug 2025)

ED: Publish subject to minor revisions (review by editor) (02 Sep 2025) by Roberto Rondanelli

AR by Luna Bloin-Wibe on behalf of the Authors (02 Sep 2025)
Author's response
Author's tracked changes
Manuscript
ED: Publish as is (03 Sep 2025) by Roberto Rondanelli

AR by Luna Bloin-Wibe on behalf of the Authors (08 Sep 2025)
This paper introduces a new framework to estimate return periods of rare climate extremes using ensemble boosting and conditional probability theory. The technique enhances the sampling of extreme events through targeted perturbations, thereby improving return period estimates without requiring prohibitively long control runs. The method is carefully developed and applied to CESM2 under both stationary and transient conditions, including an application to the 2021 Pacific Northwest heatwave.
The manuscript is clearly written and proposes a promising and computationally efficient approach. That said, several assumptions and empirical decisions underlie the method, and their implications for robustness and generalizability are not fully explored. I believe the paper could make a strong contribution after revisions addressing the following points.
Main Comments
Assumptions in the estimator
The validity of the boosting estimator depends on assumptions that may deserve further testing or clarification:
Methodological choices and tuning
Several aspects of the boosting design are empirical and would benefit from more context or testing:
This isn’t to criticize the empirical design — that’s often necessary in early-stage methods — but documenting what was tested and what was fixed would strengthen the work and help future applications.
Validation in a simpler, fully controlled setting
To me, one of the most convincing ways to build confidence in the proposed estimator would be to test it in a much simpler, controlled setting — for example, a low-order stochastic model or linear inverse model where the true return periods are known (or can be computed empirically over very large samples).
This would allow a direct comparison between the boosted estimator and ground truth, and help isolate where biases or over-/under-confidence may arise. It could also help evaluate how the estimator behaves when assumptions like conditional independence or adequate ACₜ sampling are or aren't satisfied.
Even a basic demonstration of this kind would be extremely informative and, in my view, would strengthen the paper considerably.
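As a minimal sketch of the kind of controlled benchmark suggested here, the example below uses a unit-variance AR(1) process, for which the per-step exceedance probability of a threshold is known analytically from the Gaussian stationary distribution. It only illustrates how a ground-truth return period could be obtained and checked against a long brute-force run; it does not implement the boosting estimator from the manuscript. The process, the parameter values (`phi`, `threshold`, run length) and all function names are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist


def simulate_ar1(n_steps, phi, seed=0):
    """Simulate x[t+1] = phi * x[t] + noise with stationary variance 1."""
    rng = random.Random(seed)
    noise_std = math.sqrt(1.0 - phi ** 2)  # keeps the stationary variance at 1
    x = rng.gauss(0.0, 1.0)  # start from the stationary distribution
    series = []
    for _ in range(n_steps):
        series.append(x)
        x = phi * x + noise_std * rng.gauss(0.0, 1.0)
    return series


def empirical_return_period(series, threshold):
    """Return period in time steps: run length divided by exceedance count."""
    n_exceed = sum(1 for value in series if value > threshold)
    if n_exceed == 0:
        return math.inf
    return len(series) / n_exceed


def analytic_return_period(threshold):
    """Ground truth for a process whose stationary distribution is N(0, 1)."""
    exceedance_prob = 1.0 - NormalDist().cdf(threshold)
    return 1.0 / exceedance_prob


# Hypothetical settings: a long control-like run and a moderately rare threshold.
series = simulate_ar1(200_000, phi=0.8)
threshold = 2.0
rp_emp = empirical_return_period(series, threshold)
rp_true = analytic_return_period(threshold)
print(f"empirical: {rp_emp:.1f} steps, analytic: {rp_true:.1f} steps")
```

A boosted estimate computed on the same process could then be compared with `rp_true` at thresholds where the brute-force run contains few or no exceedances, which is where biases or overconfident intervals would be easiest to detect.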
Confidence interval handling
The method appears to yield narrower confidence intervals than GEV-based estimates in some cases. While this could reflect improved sampling, it might also result from underestimating uncertainty in the boosted setting. Appendix A mentions that bootstrapping is used, which is helpful. Still, it would be good to clarify whether the intervals fully reflect all sources of uncertainty (e.g., finite Nparent, dependence structures, or sensitivity to $N_b$).
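To make the concern concrete, here is a hedged sketch of a plain percentile bootstrap for an empirical return period. It is not the procedure of Appendix A, whose details are not given in this report. It assumes independent samples, so by construction it ignores dependence between samples, one of the uncertainty sources listed above, and can yield intervals that are too narrow when such dependence is present. The data, the threshold and all names are illustrative assumptions.

```python
import math
import random


def return_period(samples, threshold):
    """Empirical return period: sample count divided by exceedance count."""
    n_exceed = sum(1 for value in samples if value > threshold)
    return len(samples) / n_exceed if n_exceed > 0 else math.inf


def bootstrap_return_period_ci(samples, threshold, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling individual values i.i.d."""
    rng = random.Random(seed)
    estimates = sorted(
        return_period(rng.choices(samples, k=len(samples)), threshold)
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical data: 2000 independent standard-normal draws.
data_rng = random.Random(1)
samples = [data_rng.gauss(0.0, 1.0) for _ in range(2000)]
threshold = 2.0

point_estimate = return_period(samples, threshold)
ci_low, ci_high = bootstrap_return_period_ci(samples, threshold)
print(f"return period: {point_estimate:.1f} [{ci_low:.1f}, {ci_high:.1f}]")
```

Resampling at a coarser level, for instance whole parent events together with their boosted members, would be one way to let an interval also reflect finite parent sampling. Whether the manuscript does so is the question raised in the comment above.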
Minor comments/suggestions
Nonstationarity correction. Line 279: The paper states that results are corrected for non-stationarity, but the method used for that correction isn’t described in much detail. How is the rolling climatology computed? Is it applied to each member individually or to ensemble means? And does the choice of window matter?
Section 2.3: Including computational cost (e.g., node-hours or wall-clock time) for the boosted ensemble would help support the method’s efficiency claims.
Notation: Several variables (e.g., TXx5d, $T_b^n$, $T_{\text{ext}}$) appear. A glossary or symbol table might help readers.
Confidence intervals: Have you tested how return period confidence intervals behave if $N_b = 1500$ or 6000? Even a brief comment would help.
Alternative thresholds: Appendix A briefly discusses threshold sensitivity, but the main text might benefit from a more explicit statement. Would estimates change significantly if parents are selected above the 95th or 99th percentile instead of the 90th?
This is a creative and carefully implemented study with a potentially valuable method for return period estimation. The framework is promising and the examples are well chosen. I appreciate that the authors are transparent about the method’s limitations, particularly regarding subjective choices and empirical design. That said, several of these choices and assumptions could still benefit from additional testing and sensitivity analysis. In particular, validating the method in a simple, controlled setting where return periods can be measured directly would provide a powerful test of its performance. With these revisions, the paper would be a strong contribution to the literature on climate extremes.
Cristian Martinez-Villalobos