Seasonal forecasts of the Saharan Heat Low characteristics: A multi-model assessment

. The Saharan heat low (SHL) is a key component of the West African Monsoon system at the synoptic scale and a driver of summertime precipitation over the Sahel region. Therefore, accurate seasonal precipitation forecasts rely in part on a proper representation of the SHL characteristics in seasonal forecast models. This is investigated using the latest versions of two seasonal forecast systems namely the SEAS5 and MF7 systems from the European Center of Medium-Range Weather Forecasts (ECMWF) and Météo-France respectively. The SHL characteristics in the seasonal forecast models are assessed based on a comparison with the ﬁfth ECMWF Reanalysis (ERA5) for the period 1993–2016. The analysis of the modes of variability shows that the seasonal forecast models have issues with the timing and the intensity of the SHL pulsations when compared to ERA5. SEAS5 and MF7 show a cool bias centered on the Sahara and a warm bias located in the eastern part of the Sahara respectively. Both models tend to underestimate the interannual variability in the SHL. Large discrepancies are found in the representation of extremes SHL events in the seasonal forecast models. These results are not linked to our choice of ERA5 as a reference, for we show robust coherence and high correlation between ERA5 and the Modern-Era Retrospective analysis for Research and Applications (MERRA). The use of statistical bias correction methods signiﬁcantly reduces the bias in the seasonal forecast models and improves the yearly distribution of the SHL and the forecast scores. The results highlight the capacity of the models to represent the intraseasonal pulsations (the so-called east–west phases) of the SHL. We notice an overestimation of the occurrence of the SHL east phases in the models (SEAS5, MF7), while the SHL west phases are much better represented in MF7. In spite of an improvement in prediction score, the SHL-related forecast skills of the seasonal forecast models remain weak for speciﬁc variations for lead times beyond 1 month, requiring some adaptations. Moreover, the models show predictive skills at an intraseasonal timescale for shorter lead times.


Introduction
In the Sahel region, food security for populations depends on rain-fed agriculture which is conditioned by seasonal rainfall (Bickle et al., 2020;Durand, 1977), characterized by a strong convective activity in the summer, associated with a large 2.3 Strategy for the analysis of forecast As we analyse the representation of the SHL, we focus on the period going from June to September (denoted by JJAS in the rest of the study) because it corresponds approximately to the period of maximum heat low activity over the Sahara (Lavaysse et al., 2009). Climate models usually fail to forecast correctly events a long time in advance for a given target period. Therefore, 115 we are interested in a forecast launched at most two months in advance of the JJAS period. In order to do that, we consider forecasts initialised on the 1 st of April, May and June, which corresponds respectively to a June lead time of 2, 1, 0 month(s).
This technique allows us to quantify the sensitivity of the models in representing the SHL at different lead times. The forecast validation process is made separately for the whole JJAS period and individual months (June, July, August and September) because June and September temperature values are in the same range.

Methods
This section describes in more details the set of analyses carried out to achieve our goal. The methodology adopted is illustrated below.

Saharan Heat Low detection and metric
The location of the West African Heat low has a strong seasonal variation: North-South owing to the seasonal cycle of insolation 125 and East-West owing to orographic forcing (Lavaysse et al., 2009;Drobinski et al., 2005). It is termed SHL once it reaches its Saharan location generally within 20 • -30 • N × 7 • W -5 • E during the monsoon season, an area that is bounded by the Atlas mountains to the North, the Hogar mountains to the East, the Atlantic ocean to the West and the northern extent of the WAM to the South (Evan et al., 2015). The SHL has been detected in previous studies using the low level atmospheric thickness (LLAT) computed as a geopotential distance between two pressure levels 700 hPa and 925 hPa (Lavaysse et al.,130 2009). Because the LLAT is due to a thermal dilatation of the low troposphere and in order to simplify the detection process, the SHL can be monitored by using the 850 hPa temperature field. Lavaysse (2015) showed that this 850 hPa temperature field is well correlated to the LLAT and can be used as a proxy for the monitoring of the SHL (detection and intensity). In this study, we use the temperature at 850 hPa to analyse the SHL characteristics. Because fixed boxes are used, the detection of the SHL is not needed, but, strong (weak) phases of the SHL will be associated with high (low) respectively temperatures.

Variability modes: Wavelet analysis
A mode of variability represents a spatio-temporal structure highlighting the main characteristics of the evolution of atmospheric variables at a given time scale. There are several statistical methods for assessing the modes of variability that contribute to a raw signal. The one used here is the wavelet analysis of the temperature signal. The wavelet transform consists in applying a time-frequency analysis on a given signal. It is very useful to analyze non-stationary signals in which phenomena 140 occur at different scales. This method provides more information than the Fourier transform about the observed structures in the initial signal (starting and ending time, and the duration of propagation (frequency)). With this type of analysis, we observe the distribution of the signal intensity in time and frequency. A wavelet function is defined by a scale factor and a position factor (Büssow, 2007;Zhao et al., 2004).
Let f (t) be a real function of real variable, the wavelet transformation of this function denoted as W f (a, b) is given by: The function Ψ is called mother wavelet and must be of square integrable that means +∞ −∞ (Ψ(t)) 2 dt is finite, and also verify the following property: +∞ −∞ Ψ(t)dt = 0. The parameter b is the position factor and a is the scaling parameter greater than zero. For a given signal, a represents the frequency and b the time. There exist diverse types of mother wavelets; based on 150 the literature review and its common use, we chose the Morlet wavelet (Tang et al., 2010). The Morlet wave is defined as the product of a complex sine wave and a gaussian window (see "Eq. (3)") (Cohen, 2018). The wavelet analysis has been applied separately on the re-forecasts for an initialisation of the models on the 1 st of April, May and June for a 6 months period. We focused on signals with a period up to 32 days.

Bias Correction
Climate models provide a numerical representation of the earth and the interactions between its different reservoirs: the atmosphere, the ocean and the continental surfaces. Those interactions are very complex and take place at different spatio-temporal scales. This can lead in certain cases to an over/under-estimation of the evolution of atmospheric variables in the models. The 160 cause of this behavior in the models is often the presence of biases. To overcome this bias issue, we use here two univariate bias correction methods: "Quantile Mapping" (QMAP) and "Cumulative Distribution Function-transform" (CDF-t).

-QMAP
Quantile-mapping aims to adjust climate model simulations with respect to reference data, in determining a transfer function to match the statistical distribution of simulated data to the one of reference values (e.g., Dosio and Paruolo, 2011). When 165 reference data have a resolution similar to climate model simulations, this technique can be considered as a bias adjustment method. On the other hand, when the observations are of higher spatial resolution than climate simulations, quantile-mapping attempts to fill the scale shift and is then considered as a downscaling method (Michelangeli et al., 2009). The QMAP method is based on the assumption that the transfer function calibrated over the past period remains valid in the future. Let F o,h and X m,h , in a historical period h. The transfer function for bias correction of X m,p (t) which represents a modeled value at time t within a projected period p is given by the following relation (e.g., Cannon et al., 2015;Dosio and Paruolo, 2011): By applying this relation to the CDF F m,f of the modeled data X m,f in a future period f , it provides an estimation of the local CDF F o,f in the future period f : A quantile mapping can then be performed between F m,f andF o,f to obtain bias corrected values of future simulations.

190
More details about CDF-t can be found in (Vrac et al., 2012). All the computations for the CDF-t method were done with the R package "CDFt".
After applying the bias correction methods on the model outputs, the added value of the bias correction compared to the raw re-forecasts will be assessed by the computation of the Cramer-von Mises (hereafter Cramer) score (Henze and Meintanis, 2005;Michelangeli et al., 2009). The Cramer score measures the similarity between two distribution functions; the closer its 195 value is from 0 the closer are the distributions.
-Application of bias correction QMAP and CDF-t are usually used for downscaling tasks in a climate projection context. In this study, we adapted the application of these methods for bias correction in a seasonal forecast context. We used a leave-one out approach for the calibration process with CDF-t and QMAP. This method consists in removing the target year (the year we want to apply the 200 correction on) in the historical period before the estimation of the transfer function which allows to pass from the global scale to local scale data. In our case the calibration process has been made using 23 of the 24 years in the historical period 1993-2016 for every year. The correction or projection process is made differently using CDF-t and QMAP. For QMAP, we use as input the target year removed previously during the calibration phase. With CDF-t, we built a new dataset of 24 years which is the concatenation of the dataset used for the calibration and the target year, so that the year at the end of the new dataset represents 205 the target year.

Ensemble forecast verification
Ensemble forecast verification is the process of assessing the quality of a forecast. The forecast is compared against a corresponding observation or a reference, the verification can be qualitative or quantitative. Forecast verification is important to monitor forecast quality, improve forecast quality and compare the quality of different forecast systems. There are many met- Let P F (x) and P O (x) be the cumulative distribution functions respectively for the forecasts and observations, the CRPS is computed as follows:

Results
In this section, we present the main results obtained through the methodology described previously. We assessed the climato- the 4-8-day window with high intensity; secondly an intensification of events with strong intensity (spectral power > 16) is observed for periods of about 8-16 days. Finally, events at very high frequencies are observed and the intensity associated is very higher (spectral power > 64) than the previous ones. This shows the SHL activity becomes stronger at high frequencies.
The models tend to reproduce quite differently the pulsations observed in the reanalysis signals; there is an issue regarding the temporality, frequency and intensity of the pulsations in the forecast models.

235
To assess the climatology of the variability modes [Fig2], we analysed the distribution of days associated with spectral power greater than 1 (normalized value) here defined as significant days during the period 1993 to 2016. This threshold of 1 has been selected arbitrarily after applying a sensitivity test on several threshold values from 0.5 to 10 to get a robust selection of events at different periods. Note that the sensitivity to the threshold values does not significantly impact our results (not shown). We observe a similar behaviour in all products in terms of significant days with an increasing number of days with periods up 240 to 10 days followed by a quite steady activity for longer periods. Over the Sahara area, there is a tendency of both models to reproduce a SHL activity similar to ERA5 at too short period (∼ 10 days). ERA5 shows little variations in the number of significant days with periods between 12 − 26 days and tends to be constant for high periods (greater than 27 days). SEAS5 overestimates the SHL activity around the 15-day period while MF7 is shifted toward higher frequency and underestimates the longer period. Over the central SHL box, there is a tendency of both MF7/SEAS5 models to generate a significant SHL activity 245 at too short period (∼ 4/10 days) compared to ERA5. At longer timescales MF7 tends to overestimate the SHL activity within the 10-23 day period while SEAS5 shows an under-estimation of the SHL activity within the same window. The evolution of significant days over central SHL location and Sahara highlighted three main pulsations based on the period (or frequency).
The different pulsations identified are arbitrarily classified as follows: the class C1 = [0 − 10days] for low frequency, the class Those two types of waves have a periodicity between 1-6 days (Janicot et al., 2008a). The correlation between the models and ERA5 is very low, less than 0.4. In general, for most of the classes, MF7 shows a negative correlation . From this 255 analysis, we can see that the models tend to represent the climatological activity of the SHL at different frequencies even if some discrepancies are observed. However, the representation of the inter-annual variability of the SHL activity remains a big challenge for the models.

Monthly Bias and seasonal drifts in the models
In this section, we are assessing the spatial representation of the SHL over the Sahara region. In order to do that, we evaluate 260 the bias between the seasonal models (SEAS5, MF7) and ERA5 [Fig4]. The bias is defined here as the difference between the forecasts and the reanalyses; the mathematical expression of the bias is the following: where F t and R t are respectively the forecasts and reanalyses at time t.
The bias is computed for each month at lead time 0 during the season from January to December for the period 1993-2016.

265
When analyzing the SEAS5 model outputs [Fig4-a)], we notice an overestimation of temperature over the Atlantic Ocean and over the Mediterranean sea. We observe a cold bias between SEAS5 and ERA5 which appears progressively during the first months (January to April) and tends to intensify during the monsoon period over the Sahel region. This cold bias is centered over the Sahara between the North of Mali, Niger and the South of Algeria; and tends to decrease in intensity during the retreat phase of the monsoon in October. SEAS5 shows a colder trend than ERA5 and under-estimates the spatial evolution of the 270 SHL over the Sahara. In fact, some mechanisms such as ocean-surface interactions, continental fluxes can explain the cooling bias over West Africa; the study of these processes is beyond the scope of this paper. The analysis on MF7 shows a progressive appearance of a hot bias over the Sahel during January and February [Fig4-b)]. This hot bias tends to develop from March to September and affects the whole Sahara. It is more intense during the monsoon phase and is located over the eastern part of the Sahara. The observed bias tends to decrease in intensity during the retreat of the monsoon in October. MF7 has a hotter trend 275 than ERA5 and overestimates the spatial evolution of the SHL over the Sahara. The central SHL area is less affected by this warming in MF7 compared to the rest of the Sahara. This behaviour in the Sahara region, especially in the eastern part of the Sahara, could be related to an under-estimation of air advection coming from the Mediterranean regarding the prevalence of the hot bias to the eastern part of the Sahara [Fig1 -b)]. This analysis shows that the models have two contrasted representations of the SHL compared to ERA5 with a colder SHL in SEAS5 and a warmer SHL in MF7. They share however a similar seasonal 280 evolution of the bias (increasing bias during the monsoon season) and a large spatial scale of the bias that covers most of the Sahara. These biases in the representation of the SHL have also been found in climate models (e.g., Dixon et al., 2017).
After the evaluation of the spatial evolution of the SHL in the seasonal forecast models, the representation of the temporal drift is assessed [Fig5]. The method used here consists in computing the climatology of daily temperature ensemble mean and ensemble spread for the two models (SEAS5 and MF7); and the daily climatology of ERA5 from 1993-2016. For the models, 285 we consider only the re-forecasts launched respectively on the 1 st of April, May and June for a period of 6 months (see section 2.3 for more details). We can see that the climatology of ERA5 remains contained in the spread described by SEAS5 for all lead times over central SHL location and Sahara; this spread in SEAS5 seems to be constant in time and does not increase with the lead time. We observe for the first forecast days, a large spread with MF7 which is not present in SEAS5, likely associated with different perturbations and initialization techniques that are beyond the scope of this study. For all lead times, outputs of the models, SEAS5 tends to represent much better than MF7 the distribution the SHL intensities over the Sahara.
SEAS5 seems to underestimate the warming trends present in ERA5 during the 2005 s while an overestimation of this trend is observed with MF7 [FigS7]; this behavior in the models is present both over central SHL box and Sahara. Another specificity of MF7 is its slightly larger ensemble spread compared to SEAS5. By using this type of visualization (Heatmap which is a graphical representation of data where values are depicted by color), it is possible to assess the intensity of the climatological 315 trend with respect to the intra-seasonal variability [Fig7]. The inter-annual variability of the SHL anomalies distribution in the models is too far from ERA5 but some characteristics are captured by the models (e.g. the increase of the frequency of anomalies in ERA5 during the 2000 s ). We observed high frequencies in the SHL anomalies distribution at inter-annual time scale for MF7 and SEAS5 with more intense values over the Sahara [Fig7-b), c), g), h)]. From these two previous analyses, we can deduce that MF7 presents a behavior close to the climatology compared to SEAS5. To focus more on the evolution of 320 the tails of the distribution (i.e. the warmest and coldest temperatures), the anomaly of the pdf of temperature is provided in supplementary materials [FigS4]. An increase of the occurrence of the warmest temperature is observed in SEAS5 and MF7 during the 2010 s . MF7 tends to overestimate the inter-annual variability of the coldest and warmest temperature distribution, while SEAS5 exhibits an overall trend with some features close to ERA5. Despite the fact that seasonal models tend to capture some characteristics of the SHL variability, large differences are observed in comparison with ERA5. These differences can be 325 explained by systematic biases present in models, as well as approximations made during the models implementation (initial and boundary conditions, physical hypotheses, etc...). In order to improve the quality of the forecast, bias correction methods have been applied.

Bias Correction
The above analyses revealed the presence of biases in the models. In this section, the bias correction was applied over the JJAS 330 period for June lead time 2, 1 and 0 (which represent the forecast of the JJAS period initialised respectively in April, May and June). The bias correction techniques used are CDF-t and QMAP (see Section 2.4.3 for more details on their application).
The analysis of ensemble forecast models remains very delicate because of the many possible ways to approach the bias correction, i.e. should we use the unperturbed member, mean ensemble member, median ensemble member or the whole ensemble member? In our case, the bias correction is applied separately on each ensemble member firstly, and then on the 335 mean ensemble member. To evaluate the sensitivity of the Cramer score on the ensemble forecast models, we defined three different approaches as follows: -"CORR_NO_MEAN", in this approach the bias correction is applied on the whole ensemble member and the Cramer score is computed using the outputs of the correction; -"CORR_MEAN", here we compute first the mean over the outputs of the bias correction on the whole ensemble member; 340 and we use this mean to compute the Cramer score; -"MEAN_CORR", the method consists of applying the bias correction on the mean ensemble member and the computation of the Cramer is done directly using the outputs of the correction.
The Cramer score was calculated firstly using ERA5 and the raw forecast samples (SEAS5 and MF7), and secondly between ERA5 and the bias corrected forecast samples [Fig8]. We can observe that raw forecasts are not improved with initialisation A positive value of the dipole indicates a HLW occurrence while a negative value corresponds to the HLE event. We evaluate the method using the LLAT approach and the automatic detection of the SHL barycenter ( (Lavaysse et al., 2009)) used during the DACCIWA campaign (Knippertz et al., 2017), which aims to evaluate the seasonal location of the SHL with respect to its climatological position.