Simulations of Bay of Bengal tropical cyclones in a regional convection-permitting atmosphere–ocean coupled model

Abstract. Tropical cyclones (TCs) in the Bay of Bengal can be extremely destructive when they make landfall in India and Bangladesh. Accurate prediction of their track and intensity is essential for disaster management. This study evaluates simulations of Bay of Bengal TCs using a regional convection-permitting atmosphere-ocean coupled model. The Met Office Unified Model atmosphere-only configuration (4.4 km horizontal grid spacing) is compared with a configuration coupled to a three-dimensional dynamical ocean model (2.2 km horizontal grid spacing). Simulations of six TCs from 2016–2019 show that both configurations produce accurate TC tracks for lead times of up to 6 days before landfall. Both configurations underestimate high wind speeds and high rain rates, and overestimate low wind speeds and low rain rates. The ocean-coupled configuration improves landfall timing predictions and reduces wind speed biases relative to observations outside the eyewall but underestimates maximum wind speeds in the eyewall for the most intense TCs. The coupled configuration produces weaker TCs than the atmosphere-only configuration, consistent with lower sea surface temperatures in the coupled model and an overestimated cooling response in TC wakes. Both model configurations accurately predict rain rate asymmetry, suggesting a good representation of TC dynamics. Much of the rain rate asymmetry variation in the simulations is related to wind shear variations, with a preference for higher rain rates in the down-shear left quadrant.


Abstract. Tropical cyclones (TCs) in the Bay of Bengal can be extremely destructive when they make landfall in India and Bangladesh. Accurate prediction of their track and intensity is essential for disaster management.
This study evaluates simulations of Bay of Bengal TCs using a regional convection-permitting atmosphereocean coupled model. The Met Office Unified Model atmosphere-only configuration (4.4 km horizontal grid spacing) is compared with a configuration coupled to a three-dimensional dynamical ocean model (2.2 km 20 horizontal grid spacing). Simulations of six TCs from 2016-2019 show that both configurations produce accurate TC tracks for lead times of up to 6 days before landfall. Both configurations underestimate high wind speeds and high rain rates, and overestimate low wind speeds and low rain rates. The ocean-coupled configuration improves landfall timing predictions and reduces wind speed biases relative to observations outside the eyewall but underestimates maximum wind speeds in the eyewall for the most intense TCs. The 25 coupled configuration produces weaker TCs than the atmosphere-only configuration, consistent with lower sea surface temperatures in the coupled model and an overestimated cooling response in TC wakes. Both model configurations accurately predict rain rate asymmetry, suggesting a good representation of TC dynamics. Much of the rain rate asymmetry variation in the simulations is related to wind shear variations, with a preference for higher rain rates in the down-shear left quadrant. 30

Introduction
Tropical cyclones (TCs) are one of the most destructive weather phenomena. Accurate prediction of their track and intensity is essential for disaster management. The Bay of Bengal region of the northern Indian Ocean has a 35 mean annual frequency of five TCs, in addition to less intense monsoon depressions (Rao et al., 2001). Their associated hazards, such as strong winds, heavy rainfall and storm surges, inflict substantial losses on India and Bangladesh's large coastal populations (Ali, 1999;Balaguru et al., 2014;Bandyopadhyay et al., 2018;Dasgupta et al., 2014;Mishra, 2014). During the past two centuries, 42% of global TC-related deaths were in Bangladesh and 27% in India. Cyclone Bhola in the Bay of Bengal in 1970 was the deadliest in history, with an estimated 40 300,000 deaths (Shultz et al., 2005). Reducing TC impacts requires accurate predictions of the storm track, intensity and size, several days before landfall.
Forecasts have improved at a rate of about one day per decade, such that modern 5-day TC track forecasts in global numerical weather prediction (NWP) models are as accurate as 1-day simulations were 40 years ago 45 (Alley et al., 2019). Heming (2016) found that the global Met Office Unified Model (MetUM) had 3-day track errors comparable to 1-day errors in 1996. However, significant errors remain: TC position errors in the northern Indian Ocean are ~ 300 km at a lead time of 5 days in the global MetUM (Yamaguchi et al., 2017).
Predictions of storm strength have improved much more slowly than track simulations (DeMaria et al., 2013).
Efforts to increase TC prediction skill include improving observations to provide accurate initial and boundary 50 conditions, resolving the complex physical processes driving TCs, ensemble forecasting (e.g. Titley et al., 2020), and increasing model resolution (e.g. Gentry and Lackmann, 2009). Most operational global NWP models have a horizontal grid length of about 10 km for the high-resolution deterministic runs and 15-25 km for ensemble systems, insufficient to resolve small scale processes influencing storm development Short & Petch, 2017). A more accurate representation of intense TCs would require 55 horizontal resolution of a few km (Chen et al., 2007;Fierro et al., 2009;Gopalakrishnan et al., 2012), which allows explicitly represented, rather than parameterised, convection (Chan, 2005).
High TC surface wind stresses increase heat and moisture transfer to the atmosphere and cause vertical mixing 60 of the upper ocean and upwelling by Ekman pumping (Mogensen et al., 2017;Shay et al., 2000). The ocean surface cools, reducing surface fluxes. Therefore, SST cooling in TC wakes reduces potential intensity, which is a theoretical limit on cyclone strength based on modelling the storm as a thermal heat engine (Bender et al., 1993;Bender & Ginis, 2000;Dutta et al., 2020;Schade & Emanuel, 1999). The impact of air-sea interactions is greater for high energy-transfer (i.e. intense or slow-moving) TCs, which cause more mixing-induced cooling 65 (Neetu et al., 2019;Pothapakula et al., 2017;Vincent et al., 2012).
However, the simulations in this study are significantly longer than would be run operationally, and so a fixed SST assumption is not valid. Therefore, SST is re-initialised each day in ATM. To provide a fair benchmark against which to compare coupled simulations, the SST is updated daily with the OSTIA product representative 115 of the previous day's SST, as OSTIA data for each day is available for operational use the morning of the following day.
The second configuration (ocean coupled simulations, hereafter referred to as CPL) consists of an identical version of the MetUM and JULES components at 4.4 km resolution, coupled to the Nucleus for European 120 Modelling of the Oceans (NEMO) dynamical ocean model (Madec Gurvan et al., 2019), through the OASIS3-MCT coupling libraries (Ocean Atmosphere Sea Ice Soil, version 2.0; Valcke et al., 2015). The NEMO model has a 2.2 km horizontal resolution and 75 vertical levels. Surface fluxes are exchanged as hourly mean values every hour throughout the forecast. The regional ocean component in CPL is free-running and is initialised from a free-running ocean-only spin up run; there is no local data assimilation within the model domain in either the 125 forecast or the spin-up run.
In both configurations, atmospheric lateral boundary conditions are obtained from the global MetUM, which itself is re-initialised every day from the MetUM global analysis. Therefore, the simulations in this study are not forecasts, and this process could not be applied operationally. This setup is expected to result in lower TC track 130 errors than if a free-running forecast was used to provide atmospheric lateral boundary conditions, as forecast motion errors are primarily driven by errors in the large-scale environmental wind field (Galarneau and Davis, 2012). The model runs are configured in this way to support the evaluation of model TC intensity and structure compared to observations, without the complication of track locations and timings that are far removed from reality. Details of the model components and boundary conditions are given in Table 1. For further information 135 on the IND1 modelling framework, see Castillo et al. (in prep). Hourly (2D fields), 3-hourly (3D fields).

TC case studies 140
Six TCs in the Bay of Bengal from 2016-2019, with various track and intensity characteristics (Figure 1), were selected for analysis. Four storms (Titli, Vardah, Gaja and Phethai) occurred in the post-monsoon TC season (late September to December), while two (Fani and Roanu) occurred during the pre-monsoon season (April-June). Maximum intensities range from a Category 5 TC (Fani) to less intense tropical storms (Roanu and 145 Phethai); see Table 2 for details. Simulations were performed with ATM and CPL for simulation periods of between 4 and 13 days to cover the relevant period of TC evolution in each case. For each storm, deterministic simulations were initialised at 00:00 UTC on five consecutive days during storm evolution (four for TC Titli) to create time-lagged forecast ensembles, with each ensemble member having different initial conditions due to the initialisation time difference (Table 2). The end time is the same for all lagged ensemble members for each case 150 study. TC track errors and errors in minimum MSLP and maximum wind speed were analysed for all simulations; a summary of errors for all simulations is given in Supplementary Table S1. However, most of the analysis in this study focuses on process evaluation. For this purpose, there is limited value in including data where simulated TC tracks are so far from observed tracks that comparison with observations is not relevant. Therefore, only results where tracks were within ~ 500 km of observed tracks throughout the simulation and 155 made landfall within +/-1 day of observed landfall were included when making quantitative comparisons to observations for the purpose of process evaluation. Simulations with large track errors or poor landfall timing, which are excluded from this process evaluation, are marked with an asterisk in Table 2. If a simulation from either ATM or CPL meets these exclusion criteria, the simulation initialised on the same date from the other configuration was also excluded. Inaccurate simulations are excluded from the analysis presented in  Orange circles indicate RAMA buoys used for SST validation.  Knapp et al., 2010), which is a merged archive of TC tracks and intensities as observed by the World Meteorological Organization (WMO) Regional Specialized Meteorological Centres (RSMCs) and other TC agencies. Agencies apply different observational algorithms, leading to discrepancies in track and intensity. For the TCs in this study, data from two agencies are available: RSMC New Delhi, part of the India Meteorological Department, and the USA's National Oceanic and Atmospheric 180 Administration. Track positions from the two agencies generally agree well, so we compare the simulations to the average of the two. We compare forecast storm intensity to both available intensity estimates.
In addition to IBTrACS maximum wind speed, wind speed observations were obtained from the global SYNOP archive of surface synoptic land and ship-based observations. Precipitation analyses were taken from the multi-185 satellite GPM-IMERG gridded product (Huffman et al., 2020).  Figure 1. SST measurements were also obtained from Argo floats (Argo,190 https://doi.org/10.5194/wcd-2021-46 Preprint. Discussion started: 26 July 2021 c Author(s) 2021. CC BY 4.0 License. 2020). Argo data consist of vertical profiles from free-drifting floats. Temperature is almost constant in the upper 10 m of each profile used here (not shown), and therefore we assumed that the uppermost measurement of each profile represented the foundation SST (Donlon et al., 2012) if it was within 10 m of the sea surface.
ERA5 reanalysis, the fifth generation of ECMWF atmospheric global climate reanalyses (Hersbach et al., 2020), 195 was used in addition to in-situ observations. Since 2007, ERA5 SSTs are derived from OSTIA. ERA5 wind speeds on pressure levels were used to compute vertical wind shear along TC tracks.

TC tracking and analysis
TCs were identified and tracked hourly in the regional simulations using an objective feature-tracking methodology based on Nguyen et al. (2014). The method searches for minimum MSLP within 2.5° of the 205 cyclone centre at the last output time and calculates the new TC centre based on a pressure centroid centred on this minimum and with a radius which is 80% of the radius of maximum winds at 10 m (J. Ashcroft, personal communication, 2020). If the newly calculated centre is more than 2.5° from the previous centre, the track stops, ensuring the storm is not tracked when the centre is unclear due to the storm breaking up. To verify this approach, TC Fani was also tracked using the objective tracking algorithm TRACK (Hodges, 1995), which 210 tracks the maximum vertical average of vorticity between 850 and 600 hPa. The two methods produced similar results (not shown).
The majority of our analysis uses simulation data relative to the diagnosed cyclone track, where a track could be identified in model output. For simulations where the cyclone had formed at the start of the run, data prior to 215 T+12 h was discarded to remove sensitivity to model spin-up.
To compare CPL and ATM track errors and perform statistical significance testing, matching times in IBTrACS and the simulations were found such that the number of data points and times used from each simulation ensemble member was the same in ATM and CPL. This allowed paired t-tests to be performed on the absolute 220 errors and p-values obtained for the null hypothesis (no difference), the ATM less than hypothesis (significantly lower error in ATM than in CPL) and the CPL less than hypothesis (significantly lower error in CPL than in ATM). This analysis used as much of the tracks as possible. Mean absolute errors and RMSE over this time are quoted below and summarised in Supplementary Table S1, although Figures 2-4 in this manuscript show data for the whole track of each simulation. 225 TC intensity (maximum wind speed and minimum MSLP), at 3-hourly intervals, was verified against IBTrACS.
Maximum wind speed in IBTrACS is a spatial maximum of the 10-minute sustained wind speed. The model 10 m wind field is not directly comparable to observed measurements of maximum sustained wind, but this is a standard comparison in model evaluation which, when applied consistently, indicates long-term trends in model 230 accuracy (Heming, 2017). Model performance for simulating TC hazards was assessed by comparing simulated wind speed and rainfall to SYNOP and IMERG observations, respectively. Finally, the representation of TC dynamics was examined by comparing TC structure to ERA5/IMERG, using vertical wind shear (200-850 hPa), inner (< 250 km) and outer (250-500 km) rain rates and the asymmetry of inner rain rates as diagnostics.

Tropical cyclone track errors
Track errors are low, even at long lead times, and are similar in the two model configurations (Figures 2 and 3), with larger track differences between different initialisation dates for a given storm than between model 240 configurations. Track errors are higher for those storms moving east to west, although this is dominated by the large errors in the early initialisations of Vardah. Most simulations have track errors < 400 km, with a maximum of ~ 1100 km (calculated relative to IBTrACS storm location at the same validity time). Tracks in CPL generally fall to the left (west for northward-moving TCs; south for westward-moving TCs) of tracks in ATM.
This relative shift is expected as CPL produces SST cold wakes, which tend to be more pronounced to the right 245 of the TC track, as, in the Northern Hemisphere, this is often the location of maximum wind speeds, and hence wind-driven mixing, due to the alignment of TC translation and primary circulation. Cooler SSTs to the right of the TC track drive the TC towards the relatively warmer SSTs to its left. The leftward displacement of TC tracks in CPL agrees with other studies of coupled models (Khain & Ginis, 1991;Wu et al., 2005), although some authors have also found rightward displacement (Bender et al., 1993) or both leftward and rightward 250 displacement (Feng et al., 2019).

Tropical cyclone intensity
TC intensity was assessed using near-surface wind speeds, a critical hazard associated with landfalling TCs, and MSLP. In general, CPL predicts less intense storms (lower maximum wind speed and higher minimum MSLP) than ATM, which is expected as CPL simulates air-sea feedbacks which can lower SST and moderate intensity; 295 the impact of coupling on SST is examined in Section 3.3. Delhi data. For cyclone Titli, peak wind speeds in ATM are lower than IBTrACS USA data but a good match to New Delhi data. Peak wind speeds in CPL are lower than those from either observing agency. The absolute wind errors over the whole Titli track compared to USA data are statistically significantly lower in ATM than CPL. For Vardah and Gaja, simulated and observed maximum winds agree well, but there are significant differences in the timing of peak intensities between the observations and simulations (Figure 4e-h). For these 305 westward-moving storms the absolute error in maximum wind speeds over the whole track was statistically significantly lower in CPL than ATM for both agencies.  shown in Figure 4 for wind speed. However, the wind-MSLP bias in CPL is overall lower than in ATM ( Figure   5).   Table 2) are excluded.

Simulated SSTs and association with intensity biases
The passage of TCs across the Bay of Bengal leads to ocean cooling (Figure 8). Compared to observations from 375 three fixed RAMA buoys (see Figure 1 for locations), CPL predicts the timing of SST minima well but typically produces a cold bias (Figure 8), with minimum SSTs too cold by up to ~ 1°C. TC Fani is an exception, with SST cooling underestimated at the 8°N buoy and overestimated at the 15°N buoy. The SST discrepancies are partly due to storm position errors, as the simulated cold wake is centred along the simulated track, and there is a sharp gradient in SST moving away from the track. Tracks for Fani in simulations are further from the 8°N 380 RAMA buoy, compared to observed tracks, when the storm is at 8°N but closer to the 15°N buoy when the storm is at 15°N. The SST diurnal cycle phase and amplitude are reasonably well represented in CPL for most storms, but for Vardah and Gaja the amplitude is overestimated.
In comparison, ATM has a lag in the timing of SST cooling, which is associated with the use of the OSTIA 385 product that would have been available for initialisation on a given day had these simulations been run in near real-time, which is the OSTIA product that represents the previous day's SST (see Section 2.1). ERA5, however, uses OSTIA SST applied to the relevant day (Hirahara et al., 2016). Therefore, the differences between ERA5 SST and ATM SST are due to the time difference in the OSTIA data used and the coarser resolution of ERA5. The magnitude of the SST cooling is generally underestimated in the ATM simulations 390 relative to RAMA observations. Therefore, SST in ATM is generally too high. Using a daily analysis product as a surface boundary condition for fast-moving phenomena such as TCs will always induce a mismatch between observed and forcing SSTs. As TC-induced cooling can be ~ 2°C, ATM SST errors can also be as high as 2°C ( Figure 8).

395
ATM and CPL use different ocean initial analysis and boundary conditions (OSTIA SST in ATM and a freerunning ocean-only IND1 configuration in CPL; see Table 1), and ATM uses daily-updated SST observations. Therefore, we cannot determine whether SSTs in CPL are cooler than in ATM due to air-sea coupling or https://doi.org/10.5194/wcd-2021-46 Preprint. Discussion started: 26 July 2021 c Author(s) 2021. CC BY 4.0 License. experiment design. However, by analysing data from free-drifting Argo floats, which give good spatial coverage of the model domain, we can investigate TC-induced cooling by examining SST biases in the TC wakes. Figure  400 9 shows SST in ATM and CPL relative to the Argo float observations, with floats in the wake of a cyclone (defined as < 250 km from the observed storm track after the storm has passed) plotted using filled symbols. For Vardah, Gaja and Phethai, the mean bias in the wake is greater than the mean bias outside the wake. For Fani, 410 the mean SST bias in the wake is equal to that in the wider model domain, although the maximum SST biases occur in the TC wake. For Roanu, the mean SST bias in the wake is less than in the wider domain. SST bias is insensitive to forecast lead time (not shown) which is expected as the regional ocean initial condition in CPL is taken from subsequent days of the same long run. Compared to the wider model domain, the larger SST biases in TC wakes in CPL suggest an overestimation of TC-induced SST cooling in addition to cold SST biases 415 throughout the domain. This bias is consistent with errors in ocean initialisation (too-cold upper-ocean temperatures) and a negative bias in net surface heat input into the ocean (Valdivieso et al., 2021). For most cyclones in this study, CPL underestimates peak intensity; it is expected that if the cold bias in SST were reduced, wind and MSLP could be better reproduced.  For exact simulation initialisation dates, see Table 2. Simulations with inaccurate tracks (marked with an asterisk in Table 2)  TC inner and outer rainfall, and inner rainfall asymmetry, are evaluated against satellite-derived rainfall retrievals (Figures 10-11). The radii for inner rainfall (250 km) and outer rainfall (250-500 km) are based on the radii used in Hence & Houze (2012), who found a distinction in the convective nature of rainfall inside and outside a radius of ~ 200 km. They found inner rainfall to be convective and stratiform, but outer rainfall was 440 much more sparse and largely convective. The inner radius used here is larger than that used in many inner rainfall asymmetry studies (more typically ≤ 100 km) because this study looks at both pre-and post-landfall, and after landfall, the maximum rain rate radius is often > 100 km. Rainfall is averaged over the two hours centred on each 3-hourly tracking output time.  Table 2) are excluded. 450 Inner rain rates are much higher than outer rain rates in both simulations and observations (Figure 10, note different axis scales). Lower rain rates tend to be overestimated, and higher rain rates underestimated, in both CPL and ATM. The more intense storms in terms of maximum wind speeds tend to have higher simulated inner rain rates (up to ~ 9 mm/hr for Fani) than the least intense storms (up to ~ 4 mm/hr for Phethai), but this is not 455 https://doi.org/10.5194/wcd-2021-46 Preprint. Discussion started: 26 July 2021 c Author(s) 2021. CC BY 4.0 License. the case in observations (up to ~ 10 mm/hr for Fani and ~ 9 mm/hr for Phethai). On average, rain rates are slightly lower in CPL than ATM (black dashed lines compared to black solid lines), possibly also due to the lower storm intensity (maximum wind speeds) in CPL. It was found that inner rain rates in IMERG tend to drop off once landfall has been made, so simulated inner rain rates are often lower than IMERG over the ocean but higher than IMERG over land. Rainfall biases over land may be related to high orography (near the Himalaya 460 for Fani, Titli and Roanu and smaller areas of high orography in Tamil Nadu for Vardah and Gaja). The absolute error in inner rain rate over the whole time of the tracks (excluding simulations with inaccurate tracks, marked with an asterisk in Table 2) is significantly lower in CPL (1.6 mm/hr) than ATM (1.8 mm/hr) and this is largely due to the lower errors over land when rain rates are lower (see Supplementary Table S2).

465
TCs may display non-axisymmetric behaviour due to storm motion, environmental vertical wind shear, dry-air intrusions, interaction with mid-level and upper-level synoptic systems, and non-uniform surface characteristics resulting in asymmetric heating (Alvey III et al., 2015;Chan & Liang, 2003;Chen et al., 2006;Corbosiero & Molinari, 2003;DeHart et al., 2014;Frank & Ritchie, 1999;Shu et al., 2018;Thakur et al., 2018;Wang & Wu, 2004;Xu et al., 2014). These processes cause asymmetric distributions of rainfall and near-surface winds, both 470 critical hazards to forecast. Asymmetric structures also impact TC intensity, with more symmetric structures tending to maintain intensity and asymmetric structures weakening (Wang & Wu, 2004). Our study concentrates on the asymmetry of rainfall due to the hazards associated with heavy rainfall. In this study, normalised north minus south and east minus west rain rate asymmetry were calculated for the inner rain rate (within 250 km of centre) for all TC cases. 475 Although high-frequency variability in rain rate asymmetry is not well captured, the simulations accurately predict daily changes over TC lifetimes (not shown). Inner rain rate asymmetry in the simulations generally matches that in observations, with Pearson correlation coefficients between 0.62 and 0.72 (see Supplementary Figure S10). The mean absolute errors for the N-S asymmetry are significantly lower in ATM (0.58) than CPL 480 (0.60) due to differences over land. The E-W asymmetry has similar mean absolute errors but they are not significantly different in ATM and CPL. Errors can be due to timing errors in the simulations, such that one would not expect the environmental conditions to be the same. Studies show that vertical wind shear is a dominant control on rainfall asymmetry when shear is > 5 ms -1 and that rainfall tends to occur in the down-shear left quadrant (e.g. Alvey III et al., 2015;Chen et al., 2006;DeHart et al., 2014;Xu et al., 2014). Figure 11  485 shows the inner rain rate asymmetry vs the (200 hPa -850 hPa) vertical wind shear calculated as a mean over the annulus 500-750 km from the storm centre. There is more rain north of the TC centre with larger northward wind shear in observations and simulations, and likewise more rain east of the TC centre with larger eastward shear, suggesting a preference for rain down-shear.

490
In the cases of TCs Fani and Roanu (spring, pre-monsoon), the vertical wind shear over the ocean is westwards to south-westwards (rear-left of storm motion) but becomes roughly north-eastwards (aligned with storm motion) where the storm makes landfall. In the other TC cases (winter, post-monsoon), the vertical wind shear is northwards to north-westwards (quite well aligned with storm motion) and rarely goes southwards. Therefore, the spring TCs (Fani and Roanu) occupy a different plot region than winter cyclones, and correlation 495 https://doi.org/10.5194/wcd-2021-46 Preprint. Discussion started: 26 July 2021 c Author(s) 2021. CC BY 4.0 License. coefficients have been calculated separately. The high positive Pearson correlation coefficients show that inner rain rate asymmetry is primarily controlled by vertical wind shear in simulations and observations, with spring having a higher N-S correlation than winter because of the greater spread of meridional shear values. When simulated rain rate asymmetry did not match observed asymmetry well, this was often due to the different vertical wind shear at that time. 500 Figure 11. Inner rain rate asymmetry vs vertical wind shear for north minus south (top) and east minus west (bottom) and for observations (left), ATM (middle) and CPL (right). The Pearson correlation coefficients are given at the top of each panel for the storms in spring (Fani and Roanu) and in winter (all others). Simulations 505 with inaccurate tracks (marked with an asterisk in Table 2) are excluded.
For pre-monsoon cyclones (Fani and Roanu), the shear over the ocean shifts from south-westwards (rear-left of the track) to north-eastwards (more aligned with the track) in observations and simulations around landfall. The shift occurs at landfall for Fani (still left of the track) and a day before landfall for Roanu (slightly right of the 510 track). Both simulations and IMERG show a preference for rain to the rear-left of the centre before the shift and front-left afterwards (Figure 12a-c and 12m-o). The cross-track asymmetry after the shift in shear is more variable for Roanu. In CPL simulations of TC Fani, the rain shifts to the east (up-shear) immediately before landfall, although the shear in CPL is more to the south, suggesting shear is not the primary control on rainfall at this time. Fani is the only TC case where ATM and CPL show distinct differences in all initialisations. 515 For the post-monsoon cyclones during the day before landfall, shear is generally well aligned with or slightly to the right of the track, resulting in more rainfall to the front and often to the left in both simulations and observations. After landfall, the shear shifts more to the right. For the north-eastward moving storms (Titli and Phethai), the shear becomes stronger, resulting in more rain to the front and right in observations and 520 simulations, although cross-track rain asymmetry is more variable in simulations than in observations (

525
When the shear is well aligned with the storm motion, the rain preference is clearly to the front, but slight differences in the shear direction, in the cyclonic wind speeds and the storm's life stage, can determine whether rain preference is to the right or left. When shear is not well aligned with the storm motion, the simulations more accurately match the observed cross-track asymmetry. In conclusion, ATM and CPL represent cyclone structure reasonably well, and there is little difference between model configurations.  Of all the IND1 model simulations in this study, which are initialised up to 6 days before landfall, landfall position errors range from 15 to 409 km (mean of 130 km), if we exclude storm Roanu, for which landfall position errors are primarily due to the track of the storm closely following the coastline. Track errors over the whole TC lifetime are smaller for TCs moving from south to north than for east to west propagating systems. 545 Both models are more skilful when initialised at higher-intensity stages of the TC lifecycle, in agreement with previous studies (e.g. Routray et al., 2017). Cyclone tracks are better in ATM than CPL for all but the strongest two TCs and better in CPL than ATM for the strongest TC. However, differences are small, which is expected as forecast motion errors are primarily driven by errors in the environmental wind field (Galarneau & Davis, 2012), which is constrained by identical lateral boundary conditions in both sets of simulations. Location and 550 timing of landfall are more accurate for more intense storms and those moving from south to north. Excluding TC Roanu, the mean absolute landfall timing error is 13 hours (ATM) or 12 hours (CPL). For TC Roanu the errors are 83 hours (ATM) or 78 hours (CPL). For similar lead times, landfall position errors are comparable to reported errors for Vardah and Roanu forecasted using atmosphere-only configurations of WRF and the Hurricane Weather Research and Forecasting (HWRF) model (Nadimpalli et al., 2020). Position errors also 555 compare favourably to global model simulations of TC Fani (Singh et al., 2021).
Both ATM and CPL underestimate the intensity of stronger TCs and overestimate the intensity of weaker TCs. Nadimpalli et al. (2020) have highlighted a similar bias in the WRF model for Bay of Bengal cyclones. In general, CPL predicts less intense storms than ATM, consistent with the lower SSTs and overestimation of SST 560 cooling in cyclone wakes in CPL. Compared to IBTrACS and SYNOP observations, both models generally underestimate high wind speeds and overestimate low wind speeds. The CPL configuration gives improved pressure-wind relationships than ATM ( Figure 5). However, in both configurations, there is a bias in the maximum wind speed-minimum MSLP relationship shown here (too strong MSLP/too weak wind speeds), which is also a known feature of the global atmosphere-only MetUM with parameterised convection (Heming, 565 2018). Theory, models and reanalysis suggest that TC potential intensity increases with SST at a rate of approximately 7.6 m s -1 °C -1 (Ramsay & Sobel, 2011;Vecchi & Soden, 2007), and the predictability of TC intensity is sensitive 585 to SST (Keshavamurthy & Kieu, 2021). TC potential intensity is linked to both regional mean SST and local SST changes, such as those associated with cold wakes caused by atmosphere-ocean feedback (Vecchi and Soden, 2007;Ramsay and Sobel, 2011;Lin et al., 2013). Cold wakes act to reduce available heat and moisture transfer from the ocean to the atmosphere under the TC centre, moderating TC intensity (Lin et al., 2013). It is consistent that the storms in CPL are weaker because CPL explicitly simulates cold ocean wakes, which in 590 ATM are poorly timed and weaker than observed wakes. SSTs are also cold-biased throughout the model domain in CPL; however, the biases are relatively low (< 1°C) given that the ocean component is free running and only forced with assimilative global model boundaries.
In CPL, SST cold biases are stronger in cyclone wakes than the mean cold bias in the wider region. SST cooling 595 could be overestimated if the modelled TCs were too intense or too slow. However, this does not appear to be the case here, as SST cooling is overestimated even for TCs with accurate landfall times and where intensity is underestimated in the simulations (e.g., TC Titli). Therefore, CPL overestimates the cooling response for a given TC intensity. SST cooling caused by wind-driven mixing is strongly modulated by pre-storm oceanic conditions, including mixed layer depth (Dutta et al., 2020;Vincent et al., 2012;Yesubabu et al., 2020). 600 Analysis of a month-long period in summer 2016 has shown that coupled IND1 simulations produce a shallower mean ocean mixed layer depth (depth < ~15 m) across the Bay of Bengal in IND1 compared to FOAM (Forecast Ocean Assimilation Model) analyses and the three RAMA moorings in the Bay of Bengal. This bias is primarily due to errors in ocean initialisation (too-cold upper-ocean temperatures) and, to a lesser degree, a negative bias in net surface heat input into the ocean (Valdivieso et al., 2021). Enhanced wind-driven mixing, 605 and the resulting entrainment of too-cold water from the thermocline associated with the TC, would lead to cold-biased SSTs in the coupled IND1 model. Sensitivity experiments where SST is artificially increased in CPL to correct the SST bias would indicate the extent to which TC intensity biases could be corrected by improving the ocean initial conditions. Coupling the atmosphere and ocean models to a wave model would also influence ocean vertical mixing, although this effect is expected to be small for the Bay of Bengal region. 610 In the post-monsoon TC season (late September to December), the Bay of Bengal's hydrography is not conducive to SST cooling (e.g. Chaudhuri et al., 2019). It is characterised by a shallow surface layer of freshwater from river runoff and monsoon rainfall, which caps a deep warm layer. This warm water at depth, together with salinity stratification, which reduces the depth of vertical mixing, can mean there is a near-absence 615 https://doi.org/10.5194/wcd-2021-46 Preprint. Discussion started: 26 July 2021 c Author(s) 2021. CC BY 4.0 License. of SST cooling in the wake of even the strongest TCs post-monsoon (Chaudhuri et al., 2019;Neetu et al., 2012;Sengupta et al., 2008;Subrahmanyam et al., 2005). General circulation models have difficulty simulating the Bay of Bengal's salinity stratification (Chowdary et al., 2016). Freshwater inputs to the Bay of Bengal are not represented in the IND1 model, meaning there are salinity biases in the TC runs in this study; this is a crucial future area for model development. In contrast, in the pre-monsoon period, the warmest water is at the surface, 620 and mixing cools SST (Krishna et al., 1993;Kumar et al., 2019), meaning TC-induced cooling is three times larger in the pre-monsoon season (Neetu et al., 2012). These differences in pre-and post-monsoon ocean conditions could explain why CPL overestimates cooling in the storm wake for the four post-monsoon storms in this study (Titli, Vardah, Gaja and Phethai) but has smaller negative biases in the wake compared to the wider region for the two pre-monsoon storms (Fani and Roanu). The SST cooling simulated in the CPL simulations 625 after the passage of tropical storm Roanu (up to ~ 2°C) is a good match to data from a custom-made Lagrangian float and Argo floats from Kumar et al. (2019), which show that almost the entire Bay of Bengal north of 12°N cooled by 1-2 °C during the passage of the storm, with ~ 50% of this cooling attributed to wind-driven mixing.
While SST cooling in the wake of the post-monsoon storms is too strong in CPL, it is not sufficiently strong in 630 ATM, indicated by larger warm biases in the cyclone wakes than the rest of the domain. This could be partly due to the lag between the time of observations contributing to OSTIA data and the validity time at which it is applied in the simulations. A time lag of one day is significant given the short timescales of SST change associated with TC intensification and cold wakes (hours to days; Price, 1981). The warm-biased TC wakes in ATM could indicate that the cooling feature in OSTIA occurs after the end of the simulations. Also, 635 comparisons of OSTIA observations to moored buoys have shown that cyclone cold wakes can be poorly captured in OSTIA  due to the difficulty in obtaining accurate satellite SST measurements through thick clouds such as those associated with cyclones (Sunder et al., 2020).
TC simulations are sensitive to initial SST conditions, as well as coupling (Feng et al., 2019). This means that 640 the negative intensity biases in CPL could be due to cold-biased initial SSTs, in addition to an overestimation of cooling in cyclone wakes. In fact, coupling alone cannot explain the intensity biases in this study, as we find no relationship between SST biases in the storm wake and intensity biases (not shown). However, it is challenging to observe storm wake bias directly under the eyewall, where cooling would most impact intensity. To determine the relative impacts of coupling and initial SST on cyclone intensity requires SST sensitivity 645 experiments where ATM and CPL are run with the same ocean initial conditions, and ATM SSTs are updated with the output of standalone ocean runs so that SSTs in the ATM and CPL runs are equivalent apart from airsea feedbacks.
High rain rates are underestimated in both configurations relative to IMERG, while low rain rates are 650 overestimated. Compared to IMERG observations, both models produce good predictions for the magnitude of rainfall associated with landfalling TCs. Inner rain rate is slightly underestimated over the ocean but overestimated after landfall. It is important to note here that differences between IMERG and other satellite datasets have been found to vary with the surface type (land or ocean; Liu, 2016). Moreover, comparisons to rain gauges have shown that IMERG is less accurate at higher altitudes , so caution should be 655 https://doi.org/10.5194/wcd-2021-46 Preprint. Discussion started: 26 July 2021 c Author(s) 2021. CC BY 4.0 License. exercised when interpreting biases relative to IMERG after landfall when the storms all move over higheraltitude regions. In general, CPL has lower rain rates than ATM and lower accumulated rainfall for 48 hours around landfall. Rain rates are much higher for more intense storms in the simulations, but not in IMERG, which is consistent with studies showing that maximum rainfall in IMERG is not related to TC intensity in IBTrACS for TCs in the Bay of Bengal (Thakur et al., 2018) and China (Yu et al., 2017). In addition to 660 intensity, rainfall bias in the simulations may be associated with large-scale SST biases, which affect moisture availability and vertical instability; therefore, the lower rain rates in CPL could be due to the cold-biased SSTs in the CPL configuration, which originate from the ocean initialisation. This relationship could be investigated using the same SST sensitivity experiments described above, where SST in the ATM model is artificially increased or decreased. Both models produce reasonable predictions of the location of rainfall associated with 665 landfalling TCs. Maximum rain rates are generally observed to the left of the cyclone track before landfall, typical for Bay of Bengal TCs (Ankur et al., 2020) and monsoon depressions (Hunt et al., 2016); this is well captured in the models. The simulations predict more spiral rainbands than IMERG.
Both ATM and CPL produce accurate predictions for rain rate asymmetry, with little difference between 670 models, suggesting a good representation of TC dynamics in the model configurations. Much of the variation in rain rate asymmetry, in both the simulations and observations, can be explained by variations in wind shear with higher rain rates in the down-shear left quadrant (Alvey III et al., 2015;Chen et al., 2006;DeHart et al., 2014;Xu et al., 2014;Yu et al., 2017). Wind shear cannot explain all the rainfall asymmetry variation in the simulations or all the discrepancies between the simulations and IMERG. 675 In operational forecasts, simulations of TC track, landfall time, landfall location and intensity are essential in planning mitigation efforts. In this study, the IND1 regional model is driven at the boundaries by the global MetUM, which is re-initialised every day from the MetUM global analysis, a procedure that cannot be applied operationally. Using a truly free-running atmospheric forecast to provide lateral boundary conditions is likely to 680 degrade forecast performance relative to that shown in this study, and would be of value in future assessments of the IND1 system.. In addition, most operational atmosphere-only TC simulations use persisted SST throughout the model run (e.g. Routray et al., 2017) in contrast to the daily updating SST data used in this study.
Previous studies suggest that the biases in ATM might deteriorate if SST were instead persisted throughout model runs: Rai et al. (2019) conducted simulations of TC Phailin (October 2013, Bay of Bengal) with persisted 685 and daily updating SST. Using the daily updating SST improved storm track and intensity estimates by 37% and 41%, respectively. Mohanty et al. (2019) determined that the impact of updating simulations with realistic SSTs during TC lifetimes improved landfall position and timing predictions by 20% and 33%, respectively. With a lack of data assimilation in the oceanic component, the CPL configuration performs well relative to ATM, given the observed SST used in ATM. 690 This study has found that the convection-permitting regional IND1 model, in both an atmosphere-only and a coupled configuration, can accurately simulate the track, intensity, and structure of tropical cyclones in the Bay of Bengal. The model generally underestimates high wind speeds and high rain rates and overestimates low wind speeds and low rain rates. The track errors are generally slightly lower in ATM, but minimum MSLP 695 https://doi.org/10.5194/wcd-2021-46 Preprint. Discussion started: 26 July 2021 c Author(s) 2021. CC BY 4.0 License. absolute errors are generally slightly lower in CPL, and for several storms, CPL has slightly lower maximum wind absolute errors. The similarity between ATM and CPL indicates that many of the deficiencies in the simulations originate in the atmospheric model. Future areas for model improvement include reducing the intensity biases (or applying bias-correction methods) and improving the representation of rapid intensification for the most intense cyclones. Development of the CPL configuration should include an adjustment of ocean 700 initial and boundary conditions, as SST throughout the model domain is too cold. Future refinements could also be made to the regional ocean dynamics and vertical structure, including coupling the present configuration to a wave model, as CPL overestimates cooling in TC wakes. Any candidate operational implementation would require further evaluation to determine the impact of lateral atmospheric boundary conditions obtained from free-running forecasts, ideally from a global coupled model, and for the ATM configuration, the impact of 705 persisting SST throughout model runs. Coupled systems provide a physically sound approach to introducing air-sea interactions, which are important for track and intensity predictions, to tropical cyclone forecasts.
However, there are outstanding challenges related to ocean dynamics in both the coupled system development and model initialisation.

710
Code/data availability. The model code and raw data were generated at the Met Office. The data that support the findings of this study are available from the corresponding author, J Saxby, upon reasonable request. Crook prepared the manuscript with contributions from all co-authors. All co-authors helped guide the analysis and reviewed and edited the manuscript. 720 responsible for this use of the Copernicus data. We are grateful to the GTMBA Project Office of NOAA/PMEL for the use of RAMA buoy data. Argo float data were collected and made freely available by the International Argo Program and the national programs that contribute to it (https://argo.ucsd.edu, https://www.oceanops.org). The Argo Program is part of the Global Ocean Observing System. We are grateful to the Centre for Environmental Data Analysis, the NCAS British Atmospheric Data Centre and the Met Office for access to 740 SYNOP data through the MetDB service. IMERG data were provided by NASA and PPS, which develop and compute IMERG as a contribution to GPM, which are archived at the NASA GES DISC.