The authors have made some general improvements to the manuscript by incorporating ensemble sensitivity and improving the analysis discussion. I thank the authors for taking my suggestion to apply sensitivity analysis to examine dynamic sensitivities, but I do not feel the manuscript sufficiently benefited from the analysis presented. Additionally, I do not feel the authors have sufficiently addressed my main concern (shared by Reviewer #2) about the methodology and how this technique compares to other lateral-boundary perturbation methods. The proposed method is essentially brand new to the literature and should be presented as such. I have elaborated on these points below. Until the authors make considerable strides to discuss previous work on boundary-condition perturbations, and how their proposed technique is better or worse than the alternatives, I cannot accept the manuscript. The analysis and discussion are reasonably well done, and this paper could be a valuable addition to Weather and Climate Dynamics at a future date, pending additional motivation for the analyzed experiments and potentially a diagnosis of the forecast differences induced by the model configurations with new experiments (also discussed below).
1. Methodology
The technique used by the authors is new to the literature on convection-allowing ensemble forecasts. However, there is no clear motivation for why this technique is preferred over other boundary-perturbation techniques (e.g., Torn et al. 2006). Both I and Reviewer #2 saw this as a flaw in the previous manuscript, and it has not been addressed in the introduction. I do not expect the authors to complete companion analyses using other techniques to generate ensemble forecasts (although I would like to see those comparisons), but I do expect the authors to, at the very least, discuss the advantages and disadvantages of the various techniques (including their own) in a way that motivates this work. There is no reason to reinvent the wheel in ensemble generation unless sufficient reasoning motivates a simpler technique. What benefit does this technique have over others that makes it worthwhile to pursue?
One possible new avenue is to compare the conducted experiments with parallel forecasts that use small, stochastic perturbations of the boundary conditions of the reference forecast. I suspect that the forecast variability seen in the current forecasts is simply the result of numerical noise and chaos seeding (see Ancell et al. 2018), whereby small perturbations in remote locations of the domain (e.g., at the boundary) can induce very large changes in forecast precipitation for unrealistic reasons; the perturbations propagate during each model timestep across the stencil of the finite-differencing scheme, which is typically faster than the speed of sound. The authors could use this companion analysis to prove or disprove that their methodology is sound and actually producing meaningful spread in their forecasts, i.e., spread that represents true predictability.
2. Ensemble Sensitivity
My previous suggestion to use ensemble sensitivity was motivated by wanting to see that the reasoning for forecast differences proposed by the authors (e.g., closer proximity to cold SSTs) could be deduced statistically, and not just subjectively. For instance, does higher precipitation in Germany during the 1700-0000 UTC timeframe (Fig. 4) statistically relate to higher/lower CAPE, stronger convergence, or upper-level dynamics at an earlier time that can explain the differences between the simulations? The analysis presented is a good start, but it may be insufficient to answer the dynamical questions about what is influencing the forecasts. I suggest the authors apply ensemble sensitivity to the fields in Figure 8 to support their claims about what ultimately influences MCS initiation and propagation into Germany.
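For concreteness, the calculation I have in mind is the standard regression-based ensemble-sensitivity diagnostic: the sensitivity of a scalar forecast metric J (e.g., box-averaged precipitation over Germany) to an earlier-time state variable x at each grid point, estimated as cov(J, x)/var(x) across the ensemble members. A minimal sketch (the function name, array shapes, and toy data below are purely illustrative, not the authors' code):

```python
import numpy as np

def ensemble_sensitivity(J, X):
    """Regression-based ensemble sensitivity dJ/dx at each grid point.

    J : (n_members,) scalar forecast metric per member
        (e.g., box-averaged precipitation, 1700-0000 UTC).
    X : (n_members, n_points) earlier-time state variable per member
        (e.g., CAPE or low-level convergence at each grid point).
    Returns (n_points,) array of cov(J, x_i) / var(x_i).
    """
    Jp = J - J.mean()                       # member perturbations of the metric
    Xp = X - X.mean(axis=0)                 # member perturbations of the field
    n = J.size
    cov = Jp @ Xp / (n - 1)                 # covariance of J with each grid point
    var = (Xp ** 2).sum(axis=0) / (n - 1)   # ensemble variance at each grid point
    return cov / var

# toy demo: 10-member "ensemble", 2 grid points, exact linear relation
x = np.arange(10.0)
X = np.stack([x, 2.0 * x + 1.0], axis=1)    # (10 members, 2 points)
J = 3.0 * x + 5.0                           # metric depends linearly on point 0
S = ensemble_sensitivity(J, X)              # -> [3.0, 1.5]
```

With the authors' nine shifted-domain runs the sample size is small, so statistical significance of the regression slopes should also be reported.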
3. Forecast Differences
It may be helpful for the eventual readership to actually see forecast difference plots that illustrate the reduced CAPE near the French coastline, enhanced convergence, etc., rather than trying to interpret the differences from equivalent environmental plots. For instance, in Figure 10 the differences between the W, REF, and SE precipitation rates are fairly obvious, but differences in the CAPE fields (and presumably a number of other environmental parameters) are much more difficult to make out. Rather than showing identical forecast plots side by side as in Figs. 10, 11, and 12 (and elsewhere in the manuscript), please consider including difference plots, which would elucidate the claims made in the text more clearly.
Line 33, Line 36: What is the convective event on 8 June? The only mention of deep convection is the MCS on 9 June (Line 27). Please consider highlighting the preceding day or remove mention of an event on 8 June.
Lines 49-51: But this increase in lead time was not observed in the simulations by Barthlott et al. (2017), correct? Lines 37-38 only mention that extending the model domain “had the largest effect”. Please clarify what “largest effect” means in relation to the forecast skill and whether this skill improvement corroborates the statement in Lines 49-51.
Line 55: “there are various to generate” is missing a word (perhaps “there are various ways to generate”)
Line 59: “Recent studies by”
Lines 59-61: How do these studies contribute to ensemble generation (Line 56)? These references seem out of place for what this section is discussing.
Lines 7, 94, 182: I urge the authors to consider removing “surprising” and “surprisingly” from the manuscript. The results are NOT surprising when you consider the domain-shifting technique as a simple stochastic perturbation, which has been shown to have large impacts on convection by others when considering perturbed lateral boundary conditions (e.g., Clark et al. 2010, Romine et al. 2014). This wording is much too subjective and not scientific. Moreover, the general variability of convective systems can be dramatic within convection-allowing models, as demonstrated in other studies (e.g., Melhauser and Zhang 2012), so I would not consider the current results “surprising”.
Lines 109-110: What initiated the storms during the night and morning hours? Shortwave trough?
Line 129: Please specify the date with the initialization time
Line 148: I think this is the first time the authors have mentioned “reference run”. I would suggest highlighting the reference/control simulation in the previous section, maybe stating that before any domain shifting occurs, you run a reference/control simulation of the event to compare shifted experiments against. Section 4 then naturally transitions to the discussion of reference forecast skill.
Line 159: “of of” to “of”
Lines 169-170: What initiated the showers near Nantes? What was the initiating mechanism that ultimately led to the extreme event in Germany later in the forecast period?
Line 170: How was this track determined? Subjectively? Please specify
Line 174: “too far north”
Lines 184-187: These two sentences are at odds with one another. The first sentence states that there are no systematic responses to the forecasts when shifting the domain in a specific direction. The second sentence actually describes a systematic response resulting in poor forecasts when shifting to the east. Please rectify this discrepancy.
Please switch sections 4.2 and 4.3. It makes more sense to discuss the reference forecast, the sensitivity to domain choice, and then the results of the experiment simulations. It reads disjointed as is.
Lines 210-211: Please reference relevant literature on this topic (e.g., Ancell et al. 2018 and others) to support your claim.
Line 219: Please illustrate this box in one of your figures. A reader should be able to see/visualize where this box is placed at a particular time for the analysis presented in this section.
Line 225: I do not see the “slight reduction in intensity” within the Fig. 8a black line at 1700 UTC.
Line 225: Between 2100 and 2200 UTC
Lines 231-234: There is a clear bifurcation in directional shear between good and bad experiment forecasts by 1300 UTC. Can the authors please discuss the possible implications of these directional shear differences as they relate to MCS development and propagation?
Lines 245-246: This is a reasonable assumption, but can the authors actually demonstrate that this is what is occurring and not some other mesoscale process?
Line 250: What is the implication of reference simulation and “unsuccessful” runs having similar proportions of CAPE-to-CIN? It appears to me that the absence of CIN is the major contributor to initiation for the “successful” (warmer colors) runs.
Lines 265-266: Can the authors show this is indeed the cause of the lower CAPE and shear in the successful runs? It reads as speculative and should be backed up with some analysis, at the very least shown to the reviewers.
Line 269: Can the authors please provide supporting evidence that indeed the lower CAPE and shear is a result of storm modification? This appears to be speculation, and should be corroborated with evidence.
Lines 288-290: Why not compute a sensitivity analysis to the fields listed in Figure 8?
Line 290: Please show this box in a figure
Ancell, B. C., et al., 2018: Seeding Chaos: The Dire Consequences of Numerical Noise in NWP Perturbation Experiments. Bulletin of the American Meteorological Society, https://doi.org/10.1175/BAMS-D-17-0129.1.
Clark et al., 2010: Growth of Spread in Convection-Allowing and Convection-Parameterizing Ensembles. Weather and Forecasting, 25, 594-612.
Melhauser, C., and F. Zhang, 2012: Practical and Intrinsic Predictability of Severe and Convective Weather at the Mesoscales. Journal of the Atmospheric Sciences, 69, 3350-3371.
Romine, G., et al., 2014: Representing Forecast Error in a Convection-Permitting Ensemble System. Monthly Weather Review, 142, 4519-4541.
Torn, R., G. J. Hakim, and C. Snyder, 2006: Boundary conditions for limited-area ensemble Kalman filters. Monthly Weather Review, 134, 2490-2502.