Estimating return periods for extreme events in climate models through Ensemble Boosting

Bloin-Wibe, Luna; Noyelle, Robin; Humphrey, Vincent; Beyerle, Urs; Knutti, Reto; Fischer, Erich

DOI: https://doi.org/10.5194/wcd-6-1147-2025

Articles | Volume 6, issue 4

© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.

Research article | Highlight paper | 21 Oct 2025


Final revised paper (published on 21 Oct 2025)
Preprint (discussion started on 21 Feb 2025)

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2025-525', Cristian Martinez-Villalobos, 27 Mar 2025
This paper introduces a new framework to estimate return periods of rare climate extremes using ensemble boosting and conditional probability theory. The technique enhances the sampling of extreme events through targeted perturbations, thereby improving return period estimates without requiring prohibitively long control runs. The method is carefully developed and applied to CESM2 under both stationary and transient conditions, including an application to the 2021 Pacific Northwest heatwave.
The manuscript is clearly written and proposes a promising and computationally efficient approach. That said, several assumptions and empirical decisions underlie the method, and their implications for robustness and generalizability are not fully explored. I believe the paper could make a strong contribution after revisions addressing the following points.
Main Comments
Assumptions in the estimator
The validity of the boosting estimator depends on assumptions that may deserve further testing or clarification:
The parent ensemble is assumed to adequately sample the antecedent condition set (AC^0_t). Section 4.2 includes some helpful discussion and testing of the number of parent events used, particularly in the pre-industrial slices. However, it would be useful to further clarify what aspects of the full antecedent condition space are critical for representativeness, and whether longer-term variability (e.g., decadal modes) might still be undersampled. Could the estimator be biased or unstable if AC^0_t is only sparsely or unevenly populated?

The method also assumes independence between \hat{p}_{T \ge T_{\text{ref}}} and the conditional ratio \frac{\hat{p}_{T \ge T_{\text{ext}} \mid AC^\epsilon_t}}{\hat{p}_{T \ge T_{\text{ref}} \mid AC^\epsilon_t}}. This assumption is discussed in Appendix A1 and appears plausible in the authors' setup, but it is not directly tested. It would be helpful to evaluate how return period estimates change when varying T_{\text{ref}} or the number of parent events, to better understand whether this independence holds in practice.

Methodological choices and tuning
Several aspects of the boosting design are empirical and would benefit from more context or testing:
The use of specific humidity as the perturbation variable is described as effective, but it’s not entirely clear why this variable was chosen over others. Were other variables (e.g., temperature, geopotential height) tried? If so, the rationale could be made more explicit.

The perturbation magnitude (1 + 10^{-13} \cdot R) seems designed to stay within numerical noise limits. Still, it would be good to mention whether other values were tested, or whether results depend on this factor at all.

The lead time of −12 days is said to balance realism and divergence. Section 4.3 justifies this choice based on ensemble spread and trajectory divergence, which is useful. Still, have return period estimates themselves been tested for robustness to this choice? Pooling across lead times (as shown in Fig. 4d) seems helpful — if that’s generally recommended, it might be worth saying so directly.

This isn’t to criticize the empirical design — that’s often necessary in early-stage methods — but documenting what was tested and what was fixed would strengthen the work and help future applications.
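The multiplicative perturbation mentioned above can be illustrated with a minimal sketch. The distribution of R is not stated in this comment, so a uniform draw on [-1, 1] is assumed here; the function name and the toy specific-humidity field are hypothetical and not taken from the manuscript.

```python
import numpy as np

def perturb_specific_humidity(q, seed):
    """Scale every grid point of q by (1 + 1e-13 * R).

    Assumption: R is uniform on [-1, 1]; only the form of the factor
    is given in the text, not the distribution of R.
    """
    rng = np.random.default_rng(seed)
    r = rng.uniform(-1.0, 1.0, size=q.shape)
    return q * (1.0 + 1e-13 * r)

# Hypothetical specific-humidity field (kg/kg) on a tiny 4 x 8 grid.
q = np.full((4, 8), 0.01)

# One perturbed initial state per boosted member, each with its own seed.
members = [perturb_specific_humidity(q, seed) for seed in range(3)]

# Largest relative departure from the unperturbed field across members.
max_rel_diff = max(np.max(np.abs(m - q) / q) for m in members)
print(f"largest relative perturbation: {max_rel_diff:.2e}")
```

Testing sensitivity to the factor would amount to replacing 1e-13 with other magnitudes and comparing the resulting return period estimates.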
Validation in a simpler, fully controlled setting
To me, one of the most convincing ways to build confidence in the proposed estimator would be to test it in a much simpler, controlled setting — for example, a low-order stochastic model or linear inverse model where the true return periods are known (or can be computed empirically over very large samples).
This would allow a direct comparison between the boosted estimator and ground truth, and help isolate where biases or over-/under-confidence may arise. It could also help evaluate how the estimator behaves when assumptions like conditional independence or adequate ACₜ sampling are or aren't satisfied.
Even a basic demonstration of this kind would be extremely informative and, in my view, would strengthen the paper considerably.
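As an illustration of such a controlled setting, the sketch below uses a first-order autoregressive (red-noise) process, for which the per-step exceedance probability is known in closed form because the stationary distribution is Gaussian. All parameter values are hypothetical, and the return period here is simply the inverse of the per-step exceedance probability; it is a ground-truth baseline only, not an implementation of the boosting estimator.

```python
import math
import numpy as np

def simulate_ar1(n_steps, phi, sigma, seed):
    """Simulate x[t] = phi * x[t-1] + Gaussian noise with std sigma."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=n_steps)
    x = np.empty(n_steps)
    # Draw the first value from the stationary distribution.
    x[0] = noise[0] / math.sqrt(1.0 - phi**2)
    for t in range(1, n_steps):
        x[t] = phi * x[t - 1] + noise[t]
    return x

phi, sigma = 0.8, 1.0  # hypothetical red-noise parameters
stationary_std = sigma / math.sqrt(1.0 - phi**2)
threshold = 2.0 * stationary_std

# Ground truth: Gaussian upper-tail probability at two standard deviations.
p_true = 0.5 * math.erfc(threshold / (stationary_std * math.sqrt(2.0)))

# Empirical estimate from one long simulation.
x = simulate_ar1(200_000, phi, sigma, seed=0)
p_emp = float(np.mean(x >= threshold))

print(f"true return period:      {1.0 / p_true:.1f} steps")
print(f"empirical return period: {1.0 / p_emp:.1f} steps")
```

Any candidate estimator can then be run on short segments of the same process and compared against `p_true`.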
Confidence interval handling
The method appears to yield narrower confidence intervals than GEV-based estimates in some cases. While this could reflect improved sampling, it might also result from underestimating uncertainty in the boosted setting. Appendix A mentions that bootstrapping is used, which is helpful. Still, it would be good to clarify whether the intervals fully reflect all sources of uncertainty (e.g., finite N_{\text{parent}}, dependence structures, or sensitivity to N_b).
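For reference, a generic percentile bootstrap for a return period can be sketched as follows. The bootstrap procedure in the manuscript's Appendix A is not reproduced in this comment, so this is an assumed textbook version that treats the samples as independent; the function name and the synthetic data are hypothetical.

```python
import numpy as np

def bootstrap_return_period_ci(samples, threshold, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for 1 / P(sample >= threshold).

    Assumption: samples are independent; any dependence structure would
    need a block or hierarchical resampling scheme instead.
    """
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples)
    n = samples.size
    # Resample with replacement: one row of indices per bootstrap replicate.
    idx = rng.integers(0, n, size=(n_boot, n))
    p_boot = (samples[idx] >= threshold).mean(axis=1)
    # Replicates with no exceedance have an undefined return period; drop them.
    p_boot = p_boot[p_boot > 0]
    rp_boot = 1.0 / p_boot
    lower, upper = np.quantile(rp_boot, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)

# Synthetic standard-normal data standing in for model output.
samples = np.random.default_rng(1).normal(size=2000)
threshold = 1.5
rp_hat = 1.0 / np.mean(samples >= threshold)
lo, hi = bootstrap_return_period_ci(samples, threshold)
print(f"return period {rp_hat:.1f}, 95% interval [{lo:.1f}, {hi:.1f}]")
```

Because this version resamples only the final values, it would miss uncertainty from the parent selection step, which is the kind of gap the comment above asks about.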
Minor comments/suggestions
Nonstationarity correction. Line 279: The paper states that results are corrected for non-stationarity, but the method used for that correction isn’t described in much detail. How is the rolling climatology computed? Is it applied to each member individually or to ensemble means? And does the choice of window matter?
Section 2.3: Including computational cost (e.g., node-hours or wall-clock time) for the boosted ensemble would help support the method’s efficiency claims.
Notation: Several variables (e.g., TXx5d, T_b^n, T_{\text{ext}}) appear. A glossary or symbol table might help readers.
Confidence intervals: Have you tested how return period confidence intervals behave if N_b = 1500 or 6000? Even a brief comment would help.
Alternative thresholds: Appendix A briefly discusses threshold sensitivity, but the main text might benefit from a more explicit statement. Would estimates change significantly if parents are selected above the 95th or 99th percentile instead of the 90th?
This is a creative and carefully implemented study with a potentially valuable method for return period estimation. The framework is promising and the examples are well chosen. I appreciate that the authors are transparent about the method’s limitations, particularly regarding subjective choices and empirical design. That said, several of these choices and assumptions could still benefit from additional testing and sensitivity analysis. In particular, validating the method in a simple, controlled setting where return periods can be measured directly would provide a powerful test of its performance. With these revisions, the paper would be a strong contribution to the literature on climate extremes.
Cristian Martinez-Villalobos
Citation: https://doi.org/10.5194/egusphere-2025-525-RC1
RC2: 'Comment on egusphere-2025-525', Anonymous Referee #2, 31 Mar 2025

My report is contained in the uploaded supplement.

Citation: https://doi.org/10.5194/egusphere-2025-525-RC2
AC1: 'Comment on egusphere-2025-525', Luna Bloin-Wibe, 13 Jun 2025

The comment was uploaded in the form of a supplement.

Citation: https://doi.org/10.5194/egusphere-2025-525-AC1

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Luna Bloin-Wibe on behalf of the Authors (19 Jun 2025) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (25 Jun 2025) by Roberto Rondanelli

RR by Cristian Martinez-Villalobos (11 Jul 2025)

RR by Anonymous Referee #2 (05 Aug 2025)

ED: Publish subject to minor revisions (review by editor) (02 Sep 2025) by Roberto Rondanelli

AR by Luna Bloin-Wibe on behalf of the Authors (02 Sep 2025) Author's response Author's tracked changes Manuscript

ED: Publish as is (03 Sep 2025) by Roberto Rondanelli

AR by Luna Bloin-Wibe on behalf of the Authors (08 Sep 2025)

Executive editor

This study introduces an original methodology combining ensemble boosting and conditional probability theory to estimate return periods of rare climate extremes without relying on long climate simulations. The method is rigorously developed, validated on a red-noise process, and applied to CESM2 simulations, including an analysis of the 2021 Pacific Northwest heatwave. Importantly, this framework enables linking probability estimates to specific climate storylines, allowing an assessment of the odds that an extreme event like the one examined will occur in the future.
