Comment on wcd-2021-39

This study by Ghinassi et al. aims to document the representation of midlatitude Rossby wave activity in state-of-the-art climate models of the PRIMAVERA project. The authors find that, overall, the models represent the climatological mean local wave activity (LWA) reasonably well. Increasing model resolution generally improves the representation of the stationary LWA but not necessarily that of the transient LWA. Further, an improvement is found only when both the oceanic and the atmospheric resolution are changed; for models in which only the atmospheric resolution was changed, a deterioration in the models' ability to represent LWA is found. The study is well written and the figures are clear. However, I have some major comments that, in my view, are important and need to be addressed. Once these comments (which may change the interpretation) have been addressed, I am happy to provide a more detailed review. To be clear, I am generally very positive about the work, and I strongly encourage the authors to submit a revised version of the manuscript to WCD.

Most important comments:
1) Statistical significance of results: An important part of the manuscript is the discussion of the differences between the reanalysis and the PRIMAVERA simulations. In my view, this discussion lacks two important aspects. (i) More quantitative information on the biases would be very helpful. Accordingly, I suggest revising, e.g., Fig. 2 to show the mean LWA as contours and the differences between the PRIMAVERA simulations and the reanalyses as shading. This would clearly highlight the regions with the most pronounced biases. (ii) A calculation and discussion of the statistical significance of the differences is missing. I would therefore ask the authors to provide information on the statistical significance, for example using a bootstrap approach (resampling with replacement).
2) Choice of the isentropic level: I fully agree that the 320 K isentropic level is a suitable choice for investigating midlatitude LWA during Northern Hemisphere winter. However, I wonder how this choice affects any interpretation concerning Rossby wave packets (RWPs) along the subtropical jet. To me it is quite unexpected that no LWA signal is found along the subtropical jet, which during Northern Hemisphere winter stretches from North Africa across the Arabian Peninsula towards India. I therefore encourage the authors either to include an additional, higher isentropic level in their analysis or at least to comment on possible model biases in LWA along the subtropical jet.
3) Classification into HR and LR: A major goal of the study is to investigate the impact of model resolution on LWA. To this end, the models are classified as LR or HR. However, in its current form the classification is questionable, since LR and HR include model runs with the same atmospheric resolution. For example, CNRM-CM6 at 50 km is classified as LR, whereas ECMWF-IFS at 50 km is classified as HR. I therefore suggest reorganizing the classification so that each class contains only models with a similar range of atmospheric resolution. Likewise, it would be interesting to classify the simulations by ocean resolution (100 km vs. 25 km); I leave the final decision on this latter aspect to the authors.
4) WR identification: The authors state that "to allow the comparison between different models and the observations we choose to work with the same reference reduced phase space for all simulations, defined by the 4 leading EOFs obtained from ERA5 reanalysis." Accordingly, the model anomalies are projected onto this reference space. Although I understand the reasoning behind this choice, the reader is left wondering how potential model biases affect the projection onto the reference space. Is any bias correction of M performed prior to the projection? This important information needs to be included, and I suggest that the authors perform a bias correction prior to the regime identification.
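To illustrate the concern in comment 4, the following is a minimal sketch (Python with NumPy) of projecting model fields onto reference EOFs with and without first removing the model's own climatology. It is not the manuscript's procedure: the function names are hypothetical, the climatology is a simple time mean rather than a seasonal cycle, no latitude weighting is applied, and the data are synthetic.

```python
import numpy as np


def reference_eofs(ref_anom: np.ndarray, n_modes: int = 4) -> np.ndarray:
    """Leading EOF patterns (n_modes, n_space) of reference anomalies (n_time, n_space)."""
    # Rows of vt are orthonormal spatial patterns ordered by explained variance.
    _, _, vt = np.linalg.svd(ref_anom, full_matrices=False)
    return vt[:n_modes]


def project_onto_reference(model_field, eofs, ref_clim, bias_correct=True):
    """Principal components of a model field in the reference EOF space.

    bias_correct=True: anomalies relative to the model's own time-mean
    climatology, so a time-invariant mean-state bias is removed first.
    bias_correct=False: anomalies relative to the reference climatology,
    so any model mean-state bias is carried into the projection.
    """
    clim = model_field.mean(axis=0) if bias_correct else ref_clim
    anom = model_field - clim
    # The EOF rows are orthonormal, so the projection is a matrix product.
    return anom @ eofs.T


# Synthetic illustration only (not ERA5 or PRIMAVERA data): 200 days, 50 grid points.
rng = np.random.default_rng(0)
ref_field = rng.normal(size=(200, 50))
ref_clim = ref_field.mean(axis=0)
eofs = reference_eofs(ref_field - ref_clim)

# Hypothetical "model": same variability plus a mean-state bias that
# happens to project onto the first reference EOF.
bias = 3.0 * eofs[0]
model_field = rng.normal(size=(200, 50)) + bias

pcs_raw = project_onto_reference(model_field, eofs, ref_clim, bias_correct=False)
pcs_corr = project_onto_reference(model_field, eofs, ref_clim, bias_correct=True)
print("mean PCs without bias correction:", pcs_raw.mean(axis=0).round(2))
print("mean PCs with bias correction:   ", pcs_corr.mean(axis=0).round(2))
```

In this synthetic case the uncorrected projection carries a systematic offset in the first PC that stems purely from the mean-state bias, which could distort regime assignments; the corrected PCs have zero time mean by construction.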
Minor comments:
l. 40: Better to write "(e.g., wind, geopotential height, mean sea level pressure)" instead of "(wind, geopotential height, mean sea level pressure...)".
l. 225: This shift is consistent with Quinting and Vitart (2019), who found the same behaviour in models of the S2S reforecast database.
l. 365: Again, how is the significance of the trend determined? And can you quantify the magnitude of the trend?
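The bootstrap suggested in comment 1 and the questions at l. 365 (significance and magnitude of the trend) could both be addressed with percentile bootstrap confidence intervals. The sketch below (Python with NumPy) is one possible approach under stated assumptions, not the authors' method: the function names are hypothetical, the samples are assumed independent (e.g. one seasonal-mean value per winter), the trend is an ordinary least-squares slope, and the data are synthetic. A result is treated as significant at level alpha if its interval excludes zero.

```python
import numpy as np


def bootstrap_mean_difference(model, reanalysis, n_boot=5000, alpha=0.05, seed=0):
    """Mean of (model - reanalysis) with a percentile bootstrap confidence interval.

    Each sample is resampled independently with replacement.
    """
    rng = np.random.default_rng(seed)
    model = np.asarray(model, dtype=float)
    reanalysis = np.asarray(reanalysis, dtype=float)
    boot_m = model[rng.integers(0, model.size, size=(n_boot, model.size))].mean(axis=1)
    boot_r = reanalysis[rng.integers(0, reanalysis.size, size=(n_boot, reanalysis.size))].mean(axis=1)
    lo, hi = np.quantile(boot_m - boot_r, [alpha / 2, 1 - alpha / 2])
    return model.mean() - reanalysis.mean(), (lo, hi)


def _ols_slope(x, y):
    """Least-squares slope along the last axis."""
    xm = x - x.mean(axis=-1, keepdims=True)
    ym = y - y.mean(axis=-1, keepdims=True)
    return (xm * ym).sum(axis=-1) / (xm ** 2).sum(axis=-1)


def bootstrap_trend(years, values, n_boot=5000, alpha=0.05, seed=0):
    """Linear trend (units per year) with a pairs-bootstrap percentile CI.

    Resampling (year, value) pairs assumes independent years; with serial
    correlation, a block bootstrap would be more appropriate.
    """
    rng = np.random.default_rng(seed)
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    idx = rng.integers(0, years.size, size=(n_boot, years.size))
    slopes = _ols_slope(years[idx], values[idx])
    lo, hi = np.quantile(slopes, [alpha / 2, 1 - alpha / 2])
    return _ols_slope(years, values), (lo, hi)


# Synthetic illustration only (not PRIMAVERA or ERA5 data).
rng = np.random.default_rng(1)
diff, diff_ci = bootstrap_mean_difference(rng.normal(12.0, 1.0, 40), rng.normal(10.0, 1.0, 40))

years = np.arange(1950, 2015)
series = 0.02 * (years - 1950) + rng.normal(0.0, 0.3, years.size)
trend, trend_ci = bootstrap_trend(years, series)

print(f"difference {diff:.2f}, 95% CI {diff_ci[0]:.2f} to {diff_ci[1]:.2f}")
print(f"trend {trend:.4f} per year, 95% CI {trend_ci[0]:.4f} to {trend_ci[1]:.4f}")
```

Reporting the slope together with its interval answers both parts of the l. 365 question: the slope gives the magnitude, and whether the interval excludes zero gives the significance.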