Below are some thoughts and recommendations; I suggest moving forward with this manuscript with minor revisions.
The authors present a revised version of their ML-based surface front detection method. One certainly cannot dispute the usefulness of feature-based detection methods, but my reservation about whether manual surface analysis should guide the development of a next-generation automated feature-based method remains.
The discrepancy between the traditional TFP-based methods, which arguably have their own shortcomings, and the presented method for emulating DWD and WCP fronts does not, in my opinion, indicate a particular weakness of the TFP-based methods. It must instead be considered in light of the weakness of the manual surface maps, which either fail to account for, erroneously indicate, or displace relevant surface features otherwise correctly detected by the automated TFP-based method. The question remains as to what should be accepted as ground truth, and as stated earlier, I cannot recommend relying fully on manual analysis. This position is in contrast to several statements by the authors, who continue to hold on to manual analysis as a ground truth and consequently continue to argue throughout the manuscript that traditional TFP-based methods are outperformed. The answer might simply be that the manual analysis is erroneous in many cases and the ML method has learned this bias, while the TFP-based method – in fact – outperforms both. It is simply a fundamentally different viewpoint.
Even though the authors have improved their comparison between a TFP method and their new method, I still think the comparison is incorrect. A proper choice of baseline method must always be seen relative to the ground truth. Here, manual surface charts, which are drawn based on several variables at several heights, are used as ground truth, while the chosen baseline uses one variable at one height and was never developed with the specific goal of reproducing surface charts. As already mentioned, I recommend removing this comparison, since it is not necessary for the publication. Only the comparison with an earlier ML method would make sense. However, since even the authors of this study argue in their reply that they are unable to handle the code provided by one of the earlier ML methods, I am concerned about the reproducibility of these studies. At the end of this document, I recommend another ML method that uses the same ground truth – maybe this code is more user friendly and can serve as the baseline method the authors wish to have.
Nevertheless, in their revised introduction, the authors have addressed aspects of this discussion. Since the need for labeled training data is so central to their method, there is basically no option for the training of the ML method but to accept the authors' decision and evaluate the manuscript from their viewpoint.
To me, then, the automated method gives us gridded front data that might be useful for meteorological research related to phenomena associated with the passage of surface fronts. The presented example, a confirmation of earlier studies that addressed the question of front-related extreme precipitation events, unfortunately leaves open the question of what exactly can be learned using ML-based methods, given that the authors basically show that the method reproduces exactly what was previously found using a TFP-based method. Recommendation: It would be helpful to at least give some indication at the end of this section of what exact new physical insight can now be generated with the new method that could not be generated before.
The climatological application is otherwise a very nice example that would motivate a section on the explainability of data-driven methods. While the presented method produces climatological patterns in agreement with previous findings, it is, beyond that, capable of separating different front types in a clear manner. Of particular interest would be to understand which variables are key to the learning process, and whether the climatological patterns would look different if the model were trained on only a single variable. A particular strength of the ML method could be its ability to reproduce manual analysis from a low number of input features. To me, however, the results seem to stem from the combination of various input channels, while traditional methods often rely on a single variable, which seems insufficient to separate different front types. Layer-wise relevance propagation might be a simple way of showing which variables allow the network to develop this ability. Recommendation: It would be useful to give some indications in this direction at the end of the corresponding section.
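As a concrete illustration of the kind of attribution analysis suggested above, the following sketch uses gradient×input, a simple first-order relative of layer-wise relevance propagation, to rank input channels by their contribution to one output class. The model, channel count, and grid size are placeholders, not the authors' network:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the authors' U-Net: any model mapping
# (batch, channels, H, W) -> per-class scores works here.
model = nn.Sequential(nn.Conv2d(5, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 4, 1))

x = torch.randn(1, 5, 32, 64, requires_grad=True)  # 5 input variables
out = model(x)

# Gradient x input: a first-order attribution that coincides with LRP
# for piecewise-linear networks under common propagation rules.
score = out[:, 1].sum()          # total score of one front class
score.backward()
relevance = (x.grad * x).abs()   # shape (1, 5, 32, 64)

# Aggregate over space: which input variable drives this class?
per_channel = relevance.sum(dim=(0, 2, 3))
print(per_channel)
```

Repeating this per front class (and per trained model) would directly answer which channels the network relies on to separate front types.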
In the summary it is argued that the method can also be applied to higher-resolution data. I think this is not the case. To make the method mesh-independent, the input training data would need to be converted to continuous space, and training would need to be performed in continuous space, as is done in random-feature methods or, eventually, also in FFT space; operator-learning approaches of this kind effectively learn mappings between Banach spaces.
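To illustrate what a continuous-space, mesh-independent formulation looks like in contrast to a grid-tied network, here is a minimal random-feature regression sketch: the model is defined on continuous coordinates, so a fit obtained from a coarse grid can be evaluated directly on a finer grid. The target field and feature count are arbitrary stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target field sampled on a coarse 1-D grid (stand-in for a variable field).
x_train = np.linspace(0.0, 1.0, 32)
y_train = np.sin(2 * np.pi * x_train)

# Random Fourier features: the model lives in continuous space, so the
# training grid is just a sample of coordinates, not part of the model.
W = rng.normal(scale=10.0, size=64)
b = rng.uniform(0, 2 * np.pi, size=64)

def features(x):
    return np.cos(np.outer(x, W) + b)

coef, *_ = np.linalg.lstsq(features(x_train), y_train, rcond=None)

# Evaluate on a finer grid the model never saw: mesh independence.
x_fine = np.linspace(0.0, 1.0, 256)
y_fine = features(x_fine) @ coef
```

A U-Net trained on fixed 128x256 crops has no analogous way to be queried at new resolutions without retraining or interpolation of its inputs.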
The authors are encouraged to add more references to their statements relating fronts to, for example, wind gusts or extreme weather.
Some ML related questions:
- Is the batch normalization really needed? Usually, it accelerates the training process and additionally improves the skill. However, from a theoretical viewpoint, it is unclear why this is the case, and thus it might not be needed in this particular application.
- Why is the dropout rate set to 0.2? Is there any overfitting without it? How does this relate to the problem of choosing arbitrary thresholds? I recommend a brief discussion of the sensitivity.
- Why did you choose three dropout layers and average-pooling steps in your U-Net architecture, and not fewer or more?
- Why does the number of channels change from 330 to 64 after the first encoding block, while for all further encoding blocks it increases by a factor of two?
- A reference for the U-Net should also be given to Shelhamer et al. 2016 (doi: 10.1109/TPAMI.2016.2572683).
- L. 345: How did you determine the deformation factor of k=3? Shouldn't the choice be tested against randomness in some way? And, as before, how does this choice relate to the problem of choosing arbitrary thresholds, a common weakness of traditional methods?
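To make the sensitivity recommendations above concrete, a sweep of the kind I have in mind could look like the following sketch, which retrains a toy model at several dropout rates and compares held-out losses. The model, data, and rate grid are placeholders; the same protocol applies to the deformation factor k:

```python
import torch
import torch.nn as nn

# Toy classification data standing in for the front-detection task.
torch.manual_seed(0)
X = torch.randn(256, 10)
y = (X.sum(dim=1, keepdim=True) > 0).float()

def val_loss(p_drop: float) -> float:
    """Train a small model with the given dropout rate, return held-out loss."""
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(),
                          nn.Dropout(p_drop), nn.Linear(32, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(200):
        opt.zero_grad()
        loss = loss_fn(model(X[:192]), y[:192])  # first 192 samples: training
        loss.backward()
        opt.step()
    model.eval()
    with torch.no_grad():
        return loss_fn(model(X[192:]), y[192:]).item()  # last 64: validation

# Sweep the hyperparameter and report; a flat curve means the choice is benign.
results = {p: val_loss(p) for p in (0.0, 0.2, 0.5)}
print(results)
```

Even a coarse three-point sweep like this, reported in a sentence, would address the arbitrariness concern.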
Several previous studies have questioned the usefulness of front lines in general and for use in next-generation front detection methods; these studies rather recommend using frontal regions or frontal volumes. Is everything done in this section needed simply to obtain front lines?
I recommend removing this section and the corresponding comparison in Section 3.1.1. Also, it is noted that only midlatitude fronts are included for the TFP method, but in Section 3.2.2 the opposite is done.
Even though POD and SR are intuitive measures, I recommend explaining the meaning of nmws and nws more clearly. The latter is “the count of all provided fronts”, the former “all fronts that could be matched”. What does “provided” refer to (provided by whom)? What is a front that is provided but cannot be matched?
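For clarity, my reading of the two measures can be stated in a few lines, with "provided" assumed to mean the predicted front objects and matching done against the analysed fronts; the counts in the example are invented:

```python
# A minimal sketch of object-based POD and SR under the assumption that
# matching pairs predicted front objects with analysed front objects.
def pod(n_matched_obs: int, n_obs: int) -> float:
    """Probability of detection: matched analysed fronts / all analysed fronts."""
    return n_matched_obs / n_obs

def success_ratio(n_matched_pred: int, n_pred: int) -> float:
    """Success ratio: matched predicted fronts / all predicted fronts."""
    return n_matched_pred / n_pred

# Example: 80 of 100 analysed fronts are matched; the model predicted 120
# fronts, of which 80 are matched. A provided-but-unmatched front would be
# a prediction with no analysed counterpart within the matching tolerance.
print(pod(80, 100), success_ratio(80, 120))  # 0.8 0.666...
```

If this reading is correct, stating it this explicitly in the text would remove the ambiguity.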
Fig. 6 is missing a color bar for the gray shading.
Fig. 6 The yellow class is labelled as “no class” but there seems to be no yellow label in the figure.
Overall, I am afraid I do not understand the purpose of this section. Is it about showing that DWD and WCP fronts have gradients?
Fig. 9: What is the variance of the shown averaged values for each line, and are the differences between the methods within or outside, for example, the range given by ±2 standard deviations of the sample that went into the averaging for each method? The lines all look very similar to me and may not be significantly different from each other.
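A simple overlap check of the kind I have in mind, using hypothetical per-sample cross-front profiles in place of the data behind Fig. 9:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented per-sample profiles for two methods (samples x distance bins),
# standing in for the samples averaged into each line of Fig. 9.
method_a = rng.normal(loc=1.00, scale=0.3, size=(500, 21))
method_b = rng.normal(loc=1.05, scale=0.3, size=(500, 21))

def band(samples):
    """Mean profile with a +/- 2 standard deviation envelope per bin."""
    mean = samples.mean(axis=0)
    sd = samples.std(axis=0, ddof=1)
    return mean - 2 * sd, mean + 2 * sd

lo_a, hi_a = band(method_a)
lo_b, hi_b = band(method_b)

# Where the two envelopes overlap, the difference between the averaged
# lines is within the sample spread and should not be over-interpreted.
overlap = (lo_a <= hi_b) & (lo_b <= hi_a)
print(overlap.all())
```

Shading such envelopes behind the lines in Fig. 9 would let readers judge this at a glance.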
In all honesty, this section does not add much to the paper. It should be removed, as the paper can be published without this information.
I am afraid I do not support the use of an attribution measure with an attribution radius defined in terms of degrees. I would assume that 2.5 degrees corresponds to a different area/distance at different latitudes, so you will attribute less precipitation to fronts at higher latitudes, won't you?
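The latitude dependence is easy to quantify: the ground length of a fixed longitude interval scales with the cosine of latitude, so a 2.5-degree radius spans only about half the distance at 60° that it does at the equator (assuming a spherical Earth):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean spherical Earth radius

def lon_degree_km(lat_deg: float) -> float:
    """Ground length of one degree of longitude at the given latitude."""
    return math.radians(1.0) * EARTH_RADIUS_KM * math.cos(math.radians(lat_deg))

for lat in (0, 30, 60):
    # Zonal extent of a 2.5-degree attribution radius at this latitude.
    print(lat, round(2.5 * lon_degree_km(lat), 1))
```

A radius defined in kilometres (or a cos-latitude-scaled degree radius) would avoid this systematic bias.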
I am not sure the difference between fr and a(fr) is fully clear. Is the first the number of fronts at a grid point and the second a probability? What do you mean by “grid point p is associated with a front” other than “a front occurs at p”?
Fig. 10: Maybe I missed it but why are the polar regions not shown?
Fig. 10-12: Some words in the figure titles are capitalized, others are not.
The authors may consider the following paper, which appears to target the same ground truth but uses a random forest method. I guess that this is the baseline method the authors are looking for.
Bochenek, B.; Ustrnul, Z.; Wypych, A.; Kubacka, D. Machine Learning-Based Front Detection in Central Europe. Atmosphere 2021, 12, 1312. https://doi.org/10.3390/atmos12101312
This is a good, well-written paper that should be of interest to the readership. However, I have a couple of minor comments that could be addressed in a revised version of the paper.
The paper is longer than it needs to be, and some information is spread across the paper, which makes it difficult to extract the relevant pieces. E.g., the vertical levels are introduced in l.117, but only in l.196 is it mentioned that just 9 pressure levels are used. Why not describe the dataset augmentation together with the data in Section 2.1?
Sections 3.1 and 3.2: I have tried for a while to understand why you present results for the validation AND the test dataset, and gave up. Why do you need Section 3.1? You write in l.300: “We validated our model during training using 1460 samples of data from 2017. We evaluated our trained models on 1 year of data from 2016 using an object based evaluation described as described later in this section.” This does not really explain why you need the two sections. Also, Section 3.1 starts with “The trained models were evaluated on test sets…”, which generates ultimate confusion. Do you lose any information when removing Section 3.1? Maybe I am missing something.
l.193: I do not understand this. You say you “ignore the outer 20 pixel”. But then, in the caption of Figure 4, you say that the brighter areas can be used as input. Are they used as input but not predicted? But then the output domain should be smaller than the input domain in Figure 3…? And why do you crop to 128x256 pixels (l.199)? And then there is again a confusing mention of the 5 degree border in the caption of Table 2…
l.8: I would not call the baseline model “ETH”. ETH is a very large institution.
l.21: Maybe add a reference to the Mei-Yu front?
l.22: “Determining the position and propagation of surface fronts plays an important role for weather forecasting”. Well, for the prediction of the position, yes. But is the same true for the automatic detection? Fronts can easily be identified in field maps by the trained eye. Why do we need the ability to detect them automatically with ML? I do understand why, but it would be good if this were made more explicit in the introduction; otherwise it seems that you have a hammer and are searching for nails.
l.24: What are empirical guidelines?
Section 2.1: Maybe I missed it, but do you actually state the resolution of the NWS and DWD datasets somewhere (or the resolution equivalent of the PNG image)?
Figure 3: I do not understand the encode and decode blocks. Can you add some info here? Also, what are the white boxes the “copy” arrows end in?
l.198: “If both labels are available”. What does this mean? At a certain point in time? Why should this matter?
Table 2: The whole caption should be reformulated. “For the global region this border is included within the mentioned range.” ?
l.242: This paragraph is important but very difficult to understand. It should be rewritten.
l.279: I would not use “t” for the index of the channels as “t” is often used for time.
l.280-282: I do not understand this. “individually for each batch”? “more emphasize onto classification”? Either equation (2) holds, or it does not.
l.289: Why did you not evaluate the baseline at 0.25 degree? I guess there are good reasons, but please state them.
Table 3: You might as well remove the “Stationary” line.
Table 5: “The suffix “all”…” I do not understand this sentence.
l.488: I find this a bit confusing. You would not leave out a certain region in a real-world application, so why here?
l.253: predicted fronts
l.301: remove “described”
l.346: “be be”
l.349: “slight edge”?
l.351: “fact that training”
l.403: “most likely”
l.445: “and the European data”
Caption Figure 7: “on the for the”
l.514: “for is the lack”