of Automated detection and classification of synoptic-scale fronts from atmospheric data grids

Abstract. Automatic determination of fronts from atmospheric data is an important task for weather prediction. In this paper we introduce a deep neural network to detect and classify fronts from multi-level ERA5 reanalysis data. Model training and prediction are evaluated using two different regions covering Europe and North America. We apply label deformation within our loss function, which removes the need for skeleton operations or other complicated post-processing steps, as used in other work, to create the final output. We observe good prediction scores with a CSI higher than 62.9 % and an Object Detection Rate of more than 73 %. Frontal climatologies of our network are highly correlated (greater than 79.6 %) to climatologies created from weather service data. Evaluated cross sections further show that our network's classification is physically plausible. Comparison with a well-established baseline method (ETH Zurich) shows a better performance of our network classification.


Figure S1. Sketch of evaluation and comparison region used during CSI evaluation for two exemplary outputs (a) and (b). Green line segments are within the evaluation (and comparison) region, while red line segments can only be used during comparison but are not evaluated. Front segments connected within the comparison region are evaluated as a single front, even though they are not connected within the evaluation region alone.

S1 CSI evaluation sketch
Fig. S1 shows a sketch of the evaluation and comparison regions used during evaluation of the CSI scores for two exemplary outputs. The comparison region fully contains the evaluation region plus the darker shaded blue region, while the evaluation region consists only of the brighter blue shaded region. Parts of a front are green if the segment is within the evaluation region, red otherwise.

During evaluation the front segments within the evaluation region of Output 1 (Panel a) are compared against the front segments within the comparison region of Output 2 (Panel b) and vice versa. That means only the green segments are evaluated while the combination of green and red segments serves as potential matches.
As front objects are determined within the comparison region, both green segments of front 1 (front a) are correctly identified as being parts of the same front. Without the comparison region, these segments would be counted as two separate fronts, potentially skewing the count of matched or unmatched fronts in the end. Additionally, this method allows the algorithm to correctly match front e against the bottom right red front of Output 1. As this front is not located within the evaluation region, e would otherwise have been falsely counted as unmatched. This method is therefore useful to reduce errors in the evaluation introduced by the cropping of the outputs.
We carried out an additional evaluation of the CSI, POD and SR scores, where each front is only allowed to match against one front of the corresponding type rather than the complete set. Table S1 displays the scores for the DWD data-set, Table S2 for the NWS data-set. The values of all three measures decrease in comparison to the evaluation of the fronts with matching to the complete set. For the SR and classification results this effect is less pronounced.
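The cropping effect described in Section S1 can be illustrated with a small sketch. It is not the evaluation code used for the scores; the function `label_fronts`, the toy grid and the choice of 8-connectivity are assumptions made purely for illustration. The sketch labels front objects once in the cropped evaluation region and once in the larger comparison region.

```python
from collections import deque

import numpy as np


def label_fronts(mask):
    """Label 8-connected front objects in a boolean mask.

    Returns the label image (0 = background) and the number of objects.
    """
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # pixel already belongs to a labelled front
        count += 1
        labels[start] = count
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < mask.shape[0] and 0 <= cc < mask.shape[1]
                    if inside and mask[rr, cc] and not labels[rr, cc]:
                        labels[rr, cc] = count
                        queue.append((rr, cc))
    return labels, count


# Toy output: a U-shaped front whose connecting part lies outside the
# evaluation region (rows 0-3) but inside the comparison region (rows 0-5).
front = np.zeros((6, 7), dtype=bool)
front[1:5, 1] = True   # left arm
front[1:5, 5] = True   # right arm
front[4, 1:6] = True   # connection, outside the evaluation region
evaluation = slice(0, 4)

# Labelling only the cropped evaluation region splits the front in two.
_, n_cropped = label_fronts(front[evaluation])

# Labelling in the comparison region keeps it as one object; the evaluated
# (green) pixels then all carry the same front id.
labels, n_comparison = label_fronts(front)
ids_evaluated = np.unique(labels[evaluation][front[evaluation]])
```

In this toy case the cropped labelling yields two objects, while the comparison-region labelling yields one object whose evaluated pixels share a single id.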

S3 Cross Sections on NWS Data
Here we present additional cross sections of physical properties (Figs. S2 and S3) relative to the front position provided by the NWS data-set. Qualitatively, the same behaviour as for the DWD fronts can be seen. Note that the NWS data contains stationary fronts, thus cross-sections for these are indicated here by a solid yellow line.

The provided video supplement shows the predicted and classified fronts for January 2016 at each hour. The background consists of the equivalent potential temperature at 850 hPa. Color channels are chosen as follows: red: warm front. Fronts are created as described in Section 2.4.2.
In some cases a classification may not be exclusive for a pixel, resulting in a potential overlap in the color channels. This effect may occur when one type of front transitions into another. For example, a transition from a warm to a cold front may appear pink, due to mixing in the blue and red color channels. Weakly expressed fronts may appear fragmented, due to the filtering threshold.

S5 Connection between fronts and extreme precipitation
Additional information regarding the methodology used in Section 3.3 is provided here.

S5.1 Data and Definitions
For the determination of precipitation events, we use the surface precipitation as contained in the ERA5 data set (hourly 2D field). Extreme precipitation is defined as any precipitation event that exceeds the 99th percentile of precipitation at each grid point. Due to a limitation of available data we calculate this percentile using ERA5 data ranging from the years 2010 until 2018 (inclusive) using CDO.
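The percentile-based definition can be sketched as follows. The percentile in this work was computed with CDO; the NumPy version below, including the function name `extreme_precipitation`, the `(time, lat, lon)` array layout and the synthetic data, is only an assumed illustration of the same idea.

```python
import numpy as np


def extreme_precipitation(precip, reference, q=99.0):
    """Flag events exceeding the per-grid-point q-th percentile.

    precip, reference: arrays of shape (time, lat, lon). `reference` plays
    the role of the multi-year period from which the percentile is taken.
    """
    threshold = np.percentile(reference, q, axis=0)  # shape (lat, lon)
    return precip > threshold, threshold


# Synthetic reference period: values 0..100 at every point of a 2x2 grid.
reference = np.tile(np.arange(101.0)[:, None, None], (1, 2, 2))
# Two time steps to classify: one below and one above the threshold.
precip = np.tile(np.array([98.5, 99.5])[:, None, None], (1, 2, 2))

is_extreme, threshold = extreme_precipitation(precip, reference)
```

The threshold is computed independently at each grid point, so a given precipitation amount may be extreme at one location and not at another.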
We consider any grid point within an L2-distance of 2.5° (i.e. 10 grid points) to a front (extreme precipitation event) to be associated with a front (extreme precipitation event). We evaluate the connection between fronts and extreme precipitation
$x + y$: Events $x$ and $y$ occur at the same time at $p$ (e.g. $epr + a(fr)$ describes the event that an extreme precipitation event occurs at $p$ while $p$ is associated with a front).
We further define the proportions $P_{evt}(p) = N_{evt}(p)/N(p)$ for events $evt$ as defined above. Finally we also calculate the relations
$R_1(p) = P_{epr + a(fr)}(p) / P_{epr}(p)$, describing the proportion of extreme precipitation events at grid point $p$ that can be associated with a front, and
$R_2(p) = P_{fr + a(epr)}(p) / P_{fr}(p)$, describing the proportion of fronts at grid point $p$ that can be associated with an extreme precipitation event.
These definitions loosely resemble the formulation of conditional probability.
We further define as high altitude regions any grid point within a 5 pixel distance from any grid point exceeding a height of 2000 m. The height of a grid point is derived from the geopotential height variable of the ERA5 data-set.
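A minimal sketch of the association step and of the proportion of extreme precipitation events that are associated with a front (the quantity referred to as R 1 in Section S5.2) is given below. The function names `associate` and `proportion_with_front`, the boolean `(time, lat, lon)` layout and the toy data are assumptions; the sketch also ignores the periodic boundary in longitude.

```python
import numpy as np


def associate(mask, radius=10):
    """Mark every grid point within `radius` grid points (L2) of a True point."""
    ny, nx = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy * dy + dx * dx <= radius * radius:
                # OR in the mask shifted by (dy, dx); padding avoids wrap-around
                out |= padded[radius + dy: radius + dy + ny,
                              radius + dx: radius + dx + nx]
    return out


def proportion_with_front(epr, fronts, radius=10):
    """Per-grid-point share of extreme precipitation events near a front.

    epr, fronts: boolean arrays of shape (time, lat, lon).
    Returns NaN where no extreme precipitation event occurred.
    """
    assoc = np.stack([associate(f, radius) for f in fronts])
    n_epr = epr.sum(axis=0)
    n_both = (epr & assoc).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(n_epr > 0, n_both / n_epr, np.nan)


# Toy data: one front pixel at the first of two time steps and an extreme
# precipitation event next to it at both time steps.
fronts = np.zeros((2, 21, 21), dtype=bool)
fronts[0, 5, 5] = True
epr = np.zeros((2, 21, 21), dtype=bool)
epr[:, 5, 6] = True

r1 = proportion_with_front(epr, fronts, radius=2)
```

The same `associate` helper could be applied with a radius of 5 to a mask of grid points above 2000 m to obtain the high altitude regions.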

S5.2 Statistical Test
If we assume that both events, i.e. the occurrence of an extreme precipitation event and a front, are completely uncorrelated, we would expect $R_1$ to be similarly distributed as $P_{a(fr)}$, i.e. the frequency with which point $p$ is associated with a front. For each grid point $p$ poleward of 20° we calculate the frontal frequency $P_{a(fr)}(p)$. We then distribute all points $p$, according to their respective frontal frequency, into bins of 1% width. For each bin with at least $m$ entries we randomly select $m$ grid points (base points) and create 1000 event lists, each containing $k$ successive extreme precipitation events sampled at 6 points. Each of those points is located on the respective opposite hemisphere from the corresponding base point. $k$ is chosen as 50 such that we obtain at least 300 samples of extreme precipitation events for each base point. As a result, for each frequency bin we obtain $m$ sampled distributions of the proportion of extreme precipitation events occurring while the base point is associated with a front. Taking the median of each of those samples we get a sample of $m$ points per bin. We then apply a percentile regression on this data to obtain linear functions describing the 1st and 99th percentile of our data with respect to the frontal frequency. We then define that a significant connection between extreme precipitation and frontal frequency exists at each grid point $p$ where $R_1(p)$ is not within the limits described by these percentiles for the underlying frontal frequency. We are then able to additionally mask all grid points where no significant connection is found.
For our test $m = 12$ was chosen, as the maximum observed frontal frequency was around 53%. Ignoring bins with fewer than $m$ entries, a total of 576 base points were considered. This test will be used for the evaluation in different scenarios.
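Two steps of the test, the selection of base points per frequency bin and the final significance check, can be sketched as below. The sampling of event lists and the percentile regression itself are not reproduced; the function names, the `(slope, intercept)` representation of the fitted lines and all numerical values in the example are hypothetical.

```python
import numpy as np


def select_base_points(freq, m=12, bin_width=0.01, rng=None):
    """Pick m random base points from every frequency bin with >= m entries.

    freq: 1D array of frontal frequencies (fractions between 0 and 1).
    Returns a dict mapping bin number to the indices of the chosen points.
    """
    rng = np.random.default_rng() if rng is None else rng
    bins = np.floor(freq / bin_width).astype(int)
    selected = {}
    for b in np.unique(bins):
        members = np.flatnonzero(bins == b)
        if members.size >= m:  # bins with fewer than m entries are ignored
            selected[int(b)] = rng.choice(members, size=m, replace=False)
    return selected


def is_significant(r1, freq, lower, upper):
    """True where r1 lies outside the band spanned by two percentile lines.

    lower, upper: (slope, intercept) of the assumed 1st and 99th percentile
    regression lines as functions of the frontal frequency.
    """
    lo = lower[0] * freq + lower[1]
    hi = upper[0] * freq + upper[1]
    return (r1 < lo) | (r1 > hi)


# Hypothetical example: 20 points in the first bin, 5 in the second.
rng = np.random.default_rng(0)
freq = np.concatenate([np.full(20, 0.005), np.full(5, 0.015)])
base_points = select_base_points(freq, m=12, rng=rng)

# Hypothetical regression lines, chosen only to exercise the check.
flags = is_significant(np.array([0.1, 0.5, 0.9]), np.full(3, 0.5),
                       lower=(1.0, -0.2), upper=(1.0, 0.2))
```

With $m = 12$, the reported total of 576 base points corresponds to 48 bins that contained at least $m$ entries.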

S5.3 Box Plots
The box plots in Fig. 11 display $R_1$ as a function of $P_{a(fr)}$. For this we divided all sample points into $k = 21$ bins. Each bin $b_i$ with $0 \le i < k$, $i \in \mathbb{N}$ contains all $R_1(p)$ for all grid points $p$ within the midlatitudes, excluding high altitude regions, where $(i - 1) \cdot 5\% < P_{a(fr)}(p) \le i \cdot 5\%$.
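The binning rule can be written compactly with half-open intervals that are closed on the right. The sketch below assumes flat NumPy arrays of R 1 values and frontal frequencies (as fractions) for the already selected grid points; the function name `group_by_frequency` and the sample values are illustrative only.

```python
import numpy as np


def group_by_frequency(r1, p_front, k=21, width=0.05):
    """Split r1 into k bins with (i - 1) * width < p_front <= i * width.

    Bin 0 therefore only holds points with a frontal frequency of zero.
    """
    edges = np.arange(k) * width                   # 0.00, 0.05, ..., 1.00
    idx = np.digitize(p_front, edges, right=True)  # right-closed intervals
    return [r1[idx == i] for i in range(k)]


# Illustrative values only.
r1 = np.array([0.1, 0.2, 0.3, 0.4])
p_front = np.array([0.0, 0.02, 0.04, 0.12])
groups = group_by_frequency(r1, p_front)
```

Each entry of `groups` could then be passed to a box-plot routine, one box per bin.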

S5.4 Refined attribution radius
In Fig. S4 we provide the same content as Fig. 10(a) for two smaller radii of 5 px (1.25°) and 2 px (0.5°), respectively. The qualitative features (i.e. the regions with high correlations) remain the same but the correlation magnitude is reduced due to the smaller radius of influence. Such investigations cannot be carried out with classical TFP methods, since they are restricted to low resolution data sets.
Figure S4. As Fig. 10(a). Proportion of extreme precipitation events, which are also associated with a front, where the association radius is 5 px (1.25°) (a) and 2 px (0.5°) (b), respectively. Regions with high topography are shaded in light gray, while areas where no extreme precipitation events occurred in 2016 are shaded in dark gray. Regions where no significant correlation between extreme precipitation and fronts was found are blanked.