The Response of Tropical Cyclone Intensity to Temperature Profile Change

Abstract. Theory indicates that tropical cyclone intensity should respond to changes in the vertical temperature profile. While the sensitivity of tropical cyclone intensity to sea surface temperature is well understood, less is known about sensitivity to the temperature profile. In this paper, we combine historical data analysis and idealised modelling to explore the extent to which historical tropospheric warming and lower stratospheric cooling can explain observed trends in the tropical cyclone intensity distribution. Observations and modelling agree that historical global temperature profile changes coincide with higher lifetime maximum intensities. But observations suggest the response depends on the tropical cyclone intensity itself. Historical lower- and upper-tropospheric temperatures in hurricane environments have warmed significantly faster than the tropical mean. In addition, hurricane-strength storms have intensified at twice the rate of weaker storms per unit warming at the surface and at 300-hPa. Idealized simulations respond in the expected sense to various imposed changes in the temperature profile and agree with tropical cyclones operating as heat engines. Yet lower stratospheric temperature changes have little influence. Idealised modelling further shows an increasing altitude of the TC outflow but little change in outflow temperature. This enables increased efficiency for strong tropical cyclones despite the warming upper troposphere. Observed sensitivities are generally larger than modelled sensitivities, suggesting that observed tropical cyclone intensity change responds to a combination of the temperature profile change and other environmental factors.


et al., 2019; Kossin et al., 2020; Emanuel, 2021). TC intensity sensitivity to the underlying SST, or more accurately the thermal disequilibrium between the SST and the near-surface atmosphere, is relatively well understood (Emanuel, 1987; Elsner et al., 2008; Strazzo et al., 2015). Global average TC intensity scales by 2.5 % per kelvin of SST warming (Knutson et al., 2019). Yet the magnitude and mechanistic response of TC intensity to changes in the vertical profile of temperature are less well understood.

A Carnot heat engine has been used to link TC intensity to the vertical temperature profile (Emanuel, 1986; 1991; 2006; Ramsay, 2013; Pauluis and Zhang, 2017). This maximum potential intensity (PI) theory suggests that maximum TC intensity changes in response to the engine's efficiency, which is set by the temperature difference between the surface and the level of the TC outflow (e.g., Emanuel, 1988; Holland, 1997). Numerical experiments agree (Shen et al., 2000; Bryan and Rotunno, 2009a; Emanuel and Rotunno, 2011). In idealised axisymmetric simulations under radiative-convective equilibrium, PI increased by about 1 m s^-1 per degree of lower stratospheric cooling, and by about 2 m s^-1 per degree of surface warming (Ramsay, 2013).
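The efficiency argument can be illustrated with a few lines of arithmetic. The sketch below assumes the classical Carnot form, (Ts - To) / Ts, with both temperatures in kelvin; the function name and the numerical values are illustrative assumptions and are not taken from this study or from any PI code.

```python
def carnot_efficiency(t_surface_k: float, t_outflow_k: float) -> float:
    """Classical Carnot efficiency (Ts - To) / Ts, temperatures in kelvin."""
    if t_surface_k <= 0 or t_outflow_k <= 0:
        raise ValueError("temperatures must be positive (kelvin)")
    return (t_surface_k - t_outflow_k) / t_surface_k


# Illustrative values only (assumptions, not results from the paper).
base = carnot_efficiency(301.15, 200.0)
cooler_outflow = carnot_efficiency(301.15, 199.0)  # outflow 1 K colder
warmer_surface = carnot_efficiency(302.15, 200.0)  # surface 1 K warmer
warmer_outflow = carnot_efficiency(301.15, 201.0)  # outflow 1 K warmer
```

In this simple form, a colder outflow or a warmer surface raises the efficiency, while a warmer outflow lowers it, which is the qualitative behaviour the heat-engine framework predicts.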
While lower stratospheric cooling revs the Carnot engine by increasing thermodynamic efficiency and potential intensity, the warming maximum in the upper troposphere has the opposite effect and limits TC intensification associated with ocean warming (Shen et al., 2000; Hill and Lackmann, 2011; Tuleya et al., 2016). The spread in historical temperature trends across reanalysis datasets also results in a spread in PI trends (Emanuel et al., 2013).
Yet the realized response of the TCs themselves may be quite different from the response of the PI. Idealized GCM simulations (Vecchi et al., 2013) did not show significant sensitivity of the TC intensity distribution to lower stratospheric cooling despite an increasing PI. The TC intensity distribution did, however, respond to temperature perturbations in the upper troposphere, corresponding with PI changes. Furthermore, the realized response of TCs appears to depend on the TC intensity itself. Indeed, the highest sensitivity to surface warming resides in the strongest storms (e.g., Elsner et al., 2008; Knutson et al., 2010).
We hypothesize that observed tropical temperature profile changes exert predictable influences on TC intensity characteristics including intensification rate and maximum intensity. Furthermore, we explore whether historical temperature profile changes are sufficient to explain past trends in the TC intensity distribution. Our approach blends historical data analysis with idealized numerical modelling. Observational analyses bring together a global homogenized radiosonde temperature dataset with a homogeneous TC intensity record to minimize contamination by artificial trends. Naturally, observed trends in TC intensity are not due to changes in temperature alone, and respond to changes in other environmental factors. Our goal is to isolate the influence of temperature change on TC intensity. We focus on a global-scale analysis over a 37-year historical period, scales at which TC intensity should be more strongly constrained by thermodynamic change than by other environmental or geographic factors (Deser et al., 2012). Idealized numerical modelling further isolates and quantifies the TC intensity response to observed trends and future temperature profile changes.

Historical temperature and tropical cyclone datasets
We use multiple temperature and TC datasets to characterise historical trends and the relationships between TC intensity and thermal structure. Temperature data are compared across radiosonde soundings and two reanalysis datasets and related to two historical TC datasets.
Global radiosonde data are obtained from the Radiosonde Observation Correction Using Reanalyses (RAOBCORE) v1.5.1, available twice daily on a 10° × 5° grid at 16 pressure levels (Haimberger, 2007; Haimberger et al., 2012). RAOBCORE was developed to be suitable for climate applications and was created by applying a time-series homogenization to the Integrated Global Radiosonde Archive (IGRA; Durre et al., 2006). This procedure uses temperature differences between radiosonde observations and background forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA-40; Uppala et al., 2005) to correct discontinuities tied to observing system changes and remove persistent biases. These corrections are particularly important for lower stratospheric temperatures, where measurements are susceptible to radiation errors (Sherwood et al., 2005). Haimberger et al. (2008) showed that RAOBCORE compares favourably with satellite-derived estimates of temperature trends in the upper troposphere and lower stratosphere, consistent with theoretical and model expectations. Sounding profiles are sufficiently numerous to characterise the thermal structure from the 925 hPa level up to 50 hPa. While sounding locations in TC genesis regions are sparse, their spatial representativeness for temperature scales with the large radius of deformation at low latitudes. In addition, we only use stations that have at least 70 % complete records over the period 1981 to 2017 and do not contain breakpoints. Breakpoints are detected following the methodology described in Prein and Heymsfield (2020). Briefly, four different breakpoint detection algorithms are applied, and time series for which more than two algorithms identified a breakpoint in the same year were excluded.
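The two station screening rules described above (record completeness and breakpoint agreement) can be sketched as a small filter. The function name, its arguments, and the data layout are hypothetical; the breakpoint detection algorithms themselves are not reproduced here, and only their per-station output (a set of flagged years per algorithm) is assumed.

```python
from collections import Counter


def keep_station(n_valid, n_expected, breakpoint_years_by_algorithm,
                 min_complete=0.70, max_agreeing=2):
    """Return True if a station passes both screening rules.

    n_valid / n_expected: observed and expected record counts, 1981 to 2017.
    breakpoint_years_by_algorithm: one collection of flagged years per
    detection algorithm (four algorithms in the text).
    """
    if n_expected <= 0:
        raise ValueError("n_expected must be positive")
    # Rule 1: at least 70 % complete record.
    if n_valid / n_expected < min_complete:
        return False
    # Rule 2: reject if more than two algorithms flag the same year.
    votes = Counter()
    for years in breakpoint_years_by_algorithm:
        votes.update(set(years))
    return all(count <= max_agreeing for count in votes.values())


# Two algorithms agree on 1990: kept. Three agree: rejected.
kept = keep_station(80, 100, [{1990}, {1990}, {2001}, set()])
rejected = keep_station(80, 100, [{1990}, {1990}, {1990}, set()])
```

Whether agreement is required in exactly the same year or within a tolerance window is a detail of the cited methodology; the sketch uses exact year matches as a simplifying assumption.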

The two reanalysis datasets analysed here, both produced by the ECMWF, are the Interim reanalysis (ERA-I; Dee et al., 2011; accessed from European Centre for Medium-Range Weather Forecasts, 2009) and the more recent ERA5 (Hersbach et al., 2020; accessed from European Centre for Medium-Range Weather Forecasts, 2019). These reanalyses differ in important ways that may affect trends in the vertical temperature profile, including horizontal and vertical grid spacing, model physics, data assimilation technique, and the data sources assimilated. The horizontal grid spacings are 79 km/TL255 (ERA-I) and 31 km/TL639 (ERA5), and the numbers of vertical levels and vertical extent are 60 levels up to 10 hPa for ERA-I and 137 levels up to 1 hPa for ERA5.
ERA-I and ERA5 assimilate vast quantities of in situ, radiosonde, and remote sensing observations, and the observing systems change over time. This can lead to discontinuities in the simulated time series (Dee et al., 2011; Simmons et al., 2014). ERA-I assimilates the RAOBCORE data and ERA5 assimilates radiosonde data that have been homogenized using a newer procedure that uses neighbouring stations rather than departure statistics alone. ERA5 contains a pronounced cold bias in the lower stratosphere from 2000 to 2006 due to the use of inappropriate background error covariances (Hersbach et al., 2020; Simmons et al., 2020). This bias has been corrected in ERA5.1 (Simmons et al., 2020).
Observations of historical TCs are taken from two sources: the International Best Track Archive for Climate Stewardship version 4 (IBTrACS; Knapp et al., 2010; downloaded on June 14, 2021) and a reanalysed intensity record provided by Kossin et al. (2020). IBTrACS has formed the basis for many studies of TC variability and change. Here, we use USA agency data, which are largely derived from the National Hurricane Center's HURricane DATa 2nd generation (HURDAT2) dataset and reports from the Joint Typhoon Warning Center. However, spatial and temporal variations in the instrumental observing system challenge the interpretation of TC variability and change, particularly in the early record (e.g., Landsea et al., 2006; Klotzbach and Landsea, 2015). Indeed, substantial differences across the reporting agencies (Knapp and Kruk, 2010) can contaminate global climatologies (Schreck et al., 2014). In response, Kossin et al. (2013) reanalysed the historical intensity record by applying an intensity algorithm (the Advanced Dvorak Technique, ADT) to a homogenized geostationary satellite dataset (the Hurricane Satellite record, HURSAT). The resulting ADT-HURSAT dataset was recently extended to cover the period 1979 to 2017. The key advantage of ADT-HURSAT compared to IBTrACS is its consistency in time and space, which makes it suitable for trend analysis, especially from 1981 onwards. Both TC datasets are included here to demonstrate sensitivity of TC intensity change to artifacts of the datasets, and to connect results back to prior work.
The 37-year observational analysis period of 1981 to 2017 is chosen for data availability and because it roughly coincides with the start of the recent warming trend (e.g., Rahmstorf et al., 2017, their Fig. 2) and its influence on global TC behaviour (Holland and Bruyère, 2014). Where possible, we use minimum central sea level pressure (Pmin) as a measure of storm intensity, though for some analyses we also use maximum 10 m wind speeds (Vmax). The advantages of Pmin over Vmax are discussed by Klotzbach et al. (2020), including a significantly higher correlation with normalized TC damage.

Idealized model experiments
We hypothesize that observed tropical temperature profile changes exert predictable influences on trends in the intensification rate and maximum intensity of TCs. As discussed above, previous studies have explored the sensitivity of TC intensity to both the tropical upper-tropospheric warming maximum and lower stratospheric cooling. From the conceptual framework of a Carnot heat engine, an upper tropospheric warming maximum in the ambient TC environment reduces the thermodynamic efficiency of a TC by warming the outflow temperature, especially for weaker TCs with lower altitude outflow (rising, saturated air parcels experience a lower equilibrium level). Lower stratospheric cooling, on the other hand, could increase thermodynamic efficiency, owing to colder outflow temperatures, particularly for stronger TCs with higher altitude outflow (this would increase the altitude of a parcel's equilibrium level). We use ensembles of simulations from an axisymmetric model to test these predictions, and to quantify the magnitude of these influences on TC intensity.
The axisymmetric TC capability of Cloud Model 1 (CM1; Bryan and Fritsch, 2002; Bryan and Rotunno, 2009a) is well suited for our experiments. The limitations of axisymmetric simulations are outweighed by the reduced computational expense, which allows us to run ensembles of simulations. Axisymmetric models have proven useful in the evaluation of TC maximum intensity (e.g., Rotunno and Emanuel, 1987; Bryan and Rotunno, 2009a; Hakim, 2011; Rousseau-Rizzi and Emanuel, 2019). We acknowledge that some three-dimensional effects, such as vortex Rossby waves, are known to be important to TC intensity (e.g., Wang, 2002; Gentry and Lackmann, 2010; Persing et al., 2013). There is no reason to believe that these factors would vary substantially in direct response to changes in the environmental temperature profile, but they could vary with storm intensity. Thus, the response of axisymmetric vortices to changes in the thermodynamic profile is deemed sufficient to test our hypotheses, but fully three-dimensional simulations are needed to investigate this limitation. The axisymmetric domain in our simulations features 4 km grid length, a model top of 25 km (59 vertical levels), and a radial domain length of 768 km. The horizontal mixing length in this version of CM1 is a linear function of surface pressure, varying from 100 m at 1015 hPa to 1000 m at 900 hPa (Bryan, 2012).
We initialize CM1 (version r19.10) with the Dunion (2011) "moist tropical" sounding, derived from western North Atlantic rawinsonde data from 1995 to 2002 (Fig. 1a). The model is initialized with a weak vortex (~12 m s^-1 maximum azimuthal velocity in gradient thermal wind balance) like that in the control simulation of Rotunno and Emanuel (1987). A potentially important difference between our experimental design and that of Rotunno and Emanuel (1987) is that our initial conditions are not in a state of radiative-convective equilibrium. This is to assess the influence of temperature profile differences more directly during the TC intensification stage, although we acknowledge that the TC begins to modify the environment immediately, and we have not eliminated this change in our simulations. Our present-day simulations feature an SST of 28 °C, close to the value obtained by lowering the 1000 hPa air temperature in the Dunion moist-tropical sounding adiabatically to the surface (~1015 hPa). Bryan and Rotunno (2009b, p. 3046) discuss their use of 28 °C SST in the control simulation of Bryan and Rotunno (2009a), citing Cione et al. (2000) for observational support for air-sea temperature differences.
We ran the simulations for 8 days, which allowed the idealized TCs to intensify to a maximum and then equilibrate to a quasi-steady-state intensity. We recognize that much longer integrations have been used in several equilibrium studies (e.g., Hakim, 2011; Ramsay, 2013), but TC modification of the environment in longer integrations would limit our ability to detect environmental influences. Given our goal of examining TC responses to changes in the environmental temperature profile, we focus on the equilibrium state rather than the peak core strength (Rousseau-Rizzi et al., 2021), though we present both.
Owing to the sensitivity of simulated TC intensity to various model parameterization choices, we ran an ensemble of 21 simulations for each environmental profile, varying the turbulence, radiation, sea surface and microphysical parameterizations (Tables 1 and A1). Despite temporal variability, the ensemble mean intensity appears close to the analytical value predicted by the Emanuel (1988) maximum potential intensity (E-PI, Table 2); we recognize that considerable uncertainty also exists in the E-PI values owing to various choices that go into that calculation.

Table 1. [...] exchange coefficients (isftcflx), atmospheric radiation (radopt), relaxation term that mimics atmospheric radiation (rterm), and explicit moisture scheme (ptype); see Table A1 for specific settings for each of the 21 ensemble members.

parameter    description
sfcmodel     CM1 (1), "WRF" (2), "revised WRF" (3), GFDL (4), MYNN (6)
oceanmodel   constant SST (1), ocean mixed layer model (2)
isftcflx     Donelan (1), or Donelan/Garratt for Cd and Ce (2)
radopt       simple (0, with rterm = 1), NASA (1), or RRTMG (2)
ptype        Morrison (5) or Thompson (3)

To explore the sensitivity of simulated TC intensity to changes in the environmental thermodynamic profile, we ran five additional 21-member ensemble experiments (Table 2). These were primarily designed to explore TC intensity response to extrapolated observational trends based on RAOBCORE data discussed in Sect. 2.1 and presented in Sect. 3.1. The "mid-century" experiment corresponds to conditions approximately in the year 2050 if current trends are extrapolated, and the "end-of-century" experiment applies changes extrapolated over a century-long period. Two additional experiments allow us to isolate the sensitivity of TC intensity to specific changes observed in tropical temperature profiles. The "no upper warming maximum" ensemble is based on a temperature change profile that is nearly constant with height in the troposphere (Fig. 1b), and the "no stratospheric cooling" simulations explore TC response to a temperature change profile that eliminates lower stratospheric cooling (Fig. 1b). Recognizing the limitations in extrapolation of current observational trends, we ran an additional ensemble experiment based on a multi-model mean of IPCC AR5 GCM change profiles, for end-of-century conditions under the RCP8.5 scenario (see Jung and Lackmann, 2019, their Table 2).
Based on the thermodynamic and Carnot efficiency considerations mentioned in Sect. 1 and the E-PI calculations shown in Table 2, we predict a priori that the present-day simulation would produce the weakest ensemble-mean TC, followed in order of increasing intensity by the mid-century and end-of-century simulations. We further expect that simulations omitting the tropical upper warming maximum would be slightly stronger than the default end-of-century ensemble, and that the ensemble removing stratospheric cooling would be slightly weaker in intensity relative to the default end-of-century run. We expect the GCM-based ensemble to yield the strongest storm, given significantly greater warming. Of course, the numerical simulations are not constrained to agree with these theoretically motivated predictions.
To further test our hypotheses relating changes in TC intensity to environmental temperature changes, we computed thermodynamic efficiency following Emanuel (1987; 1988) and Gilford (2021). Given the availability of high-resolution numerical simulations, we also computed the simulated TC outflow temperature directly, defined as the temperature of air with outward radial flow exceeding 1.0 m s^-1 and cloud ice mixing ratio exceeding 10^-5 kg kg^-1. Experimentation with these threshold values demonstrates that this setting works well to represent the temperature of the cirrostratus outflow layer, though the ensemble average values obtained were not highly sensitive to changes in the radial velocity or cloud ice mixing ratio thresholds (not shown). In our analysis of derived outflow temperatures, we noted substantial differences between simulations conducted with "complex" versus "simple" representation of radiation and have stratified the results accordingly.
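The outflow temperature definition above amounts to a threshold mask on the model fields. The sketch below is a minimal illustration, assuming the fields have been flattened to equal-length sequences and that a simple unweighted mean over qualifying grid points is acceptable; the function name and the averaging choice are assumptions, since the weighting used in the study is not stated.

```python
def outflow_temperature(temp_k, u_radial, q_ice, u_min=1.0, qi_min=1e-5):
    """Mean temperature (K) of grid points in the outflow layer.

    A grid point qualifies when the radial wind exceeds u_min (m/s,
    positive outward) and the cloud ice mixing ratio exceeds qi_min (kg/kg).
    Returns None when no grid point qualifies.
    """
    if not (len(temp_k) == len(u_radial) == len(q_ice)):
        raise ValueError("input fields must have the same length")
    selected = [t for t, u, q in zip(temp_k, u_radial, q_ice)
                if u > u_min and q > qi_min]
    if not selected:
        return None
    return sum(selected) / len(selected)


# Hypothetical values: only the first point meets both thresholds.
t_out = outflow_temperature([200.0, 210.0, 250.0],
                            [2.0, 1.5, 0.5],
                            [2e-5, 5e-6, 3e-5])
```

Exposing the two thresholds as arguments makes it straightforward to repeat the kind of threshold sensitivity check mentioned in the text.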

Historical temperature and tropical cyclone observations
To begin exploring whether observed temperature profile changes are sufficient to explain observed trends in the TC intensity distribution, we start with an analysis of historical data. Historical summertime tropical temperature profile trends are compared across RAOBCORE, ERA5 and ERA-I in Fig. 2a. The known upper tropospheric warming maximum and lower stratospheric cooling are present across all three datasets but vary significantly in magnitude and vertical structure. As expected, ERA-I and RAOBCORE trend profiles agree well with each other (since ERA-I assimilates RAOBCORE data), with peak warming located at the 300 hPa level. ERA5 exhibits 30 % weaker peak warming than RAOBCORE and locates peak warming higher in altitude, at 175 hPa. Cooling rates in the lower stratosphere are strongest in ERA5, reportedly due to the assimilation of radiosonde data adjusted by the RICH method (Haimberger et al., 2012; Hersbach et al., 2020). Simmons et al. (2014) suggest that the weaker cooling trend in ERA-I may be related to a cold bias in the lower stratosphere which persisted through the early 2000s and then was corrected through a new assimilation of radio occultation data. We next examine whether the trend is stable across the decades, or whether the change concentrates in a particular decade.
The rate of change in the temperature profile is roughly constant across the four decades throughout the troposphere (Fig. 2b). But decadal changes in the lower stratosphere are less stable, reflecting the known step changes in temperature linked to volcanic eruptions (Ramaswamy et al., 2006).
Figure 2c shows that temperature trends proximal to strong TCs are significantly different from trends for the tropics as a whole. Proximal is defined here as an average within 0.5° of the LMI locations (according to ADT-HURSAT) two days before a TC arrives at the location. The sample sizes are 2174 tropical storm environments and 1774 hurricane environments.
Strong TC environments have warmed significantly faster than the tropical mean environment below the 850-hPa level, warming twice as fast. The peak warming in the upper troposphere is correspondingly stronger and located at a higher level.
The middle troposphere warms more slowly, but not significantly so. Trends also differ between proximal environments for tropical storms and hurricane strength storms, but not significantly so. Tropical storm environments also do not trend significantly differently from the tropical mean environment.
Our purpose here is not to comment on which temperature dataset produces the most accurate trends, but rather to document that the choice of temperature dataset matters for the magnitude and structure of the temperature trend. We also update previous work (Emanuel et al., 2013; Vecchi et al., 2013) that compared across reanalysis datasets by including the more recent ERA5 combined with ERA5.1. By extension, analysed relationships between TC intensity trends and temperature profile trends may also vary by choice of temperature dataset. Later in this section we make links between temperature trends and TC intensity trends. This requires a temperature dataset with globally uniform coverage. We choose the ERA5 dataset for this purpose given its higher spatial resolution and newer data assimilation procedures compared to ERA-I. We next turn our attention to the changing TC intensity distribution.
At the same time as the global tropical temperature profile has changed, so too has the distribution of global TC intensity. Figure 3a,b shows TC intensity distributions by historical decade in both the IBTrACS and ADT-HURSAT datasets. We first notice the differently shaped distributions between IBTrACS and ADT-HURSAT. Kossin et al. (2020) explain that cirrus-obscured TC eyes can cause underestimation of lifetime maximum intensity (LMI) at around 33 m s^-1. ADT-HURSAT therefore likely over-reports LMI values below 33 m s^-1, with higher LMI only reported if the algorithm locks onto a clearing eye signature as TCs intensify. ADT-HURSAT thus sacrifices storm-level accuracy for improved long-term statistics. The well-established bimodal distribution is present in both datasets, and both reproduce the known result of an increasing proportion of the strongest storms over time (e.g., Elsner et al., 2008; Kossin et al., 2020). We also reproduce the stronger trends in IBTrACS than ADT-HURSAT.
For the proportion of major hurricanes (category 3 and higher on the Saffir-Simpson scale), Kossin et al. (2020) find the increase in ADT-HURSAT is about half that in IBTrACS and suggest that half the trend in IBTrACS is attributable to changes in observing systems. When considering the proportion of category 4 and 5 storms, we find even larger discrepancies. In IBTrACS, the proportion of category 4 and 5 storms increases from 11.3 % in the 1980s to 20.9 % in the 2010s, an increase by a factor of 1.85. For ADT-HURSAT, the proportion increases from 14.1 % in the 1980s to 17.7 % in the 2010s, a factor of only 1.26 and a rate approximately 3 times lower than in IBTrACS. Our finding here is consistent with the greater impact of observing system change for the strongest storms.
Interestingly, we also find that IBTrACS produces more than half the change between the first two decades (1980s to the 1990s), whereas ADT-HURSAT produces more than half the change between the final two decades (2000s to the 2010s).
Our purpose in reproducing and expanding upon known trends and discrepancies among datasets is to show that the choice of TC dataset matters for intensity trend magnitudes. The choice may be particularly important for trend analyses that subset trends by TC intensity.
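The category 4 and 5 proportions quoted above can be checked with simple arithmetic. The sketch below uses only the four percentages given in the text; the variable names are illustrative, and the text does not state whether the "approximately 3 times lower" comparison is based on relative or percentage-point increases, so both are computed.

```python
def growth_factor(start_pct: float, end_pct: float) -> float:
    """Ratio of the final to the initial proportion."""
    return end_pct / start_pct


# Proportions of category 4 and 5 storms, 1980s to 2010s (from the text).
ibtracs_factor = growth_factor(11.3, 20.9)
adt_hursat_factor = growth_factor(14.1, 17.7)

# Two ways to compare the rates of increase between the datasets.
relative_ratio = (ibtracs_factor - 1.0) / (adt_hursat_factor - 1.0)
point_ratio = (20.9 - 11.3) / (17.7 - 14.1)
```

Rounded, the growth factors are 1.85 and 1.26, and the two comparison ratios are about 3.3 (relative increases) and about 2.7 (percentage points), both broadly consistent with the factor of approximately 3 stated in the text.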
We now begin to explore statistical linkages between the changing TC intensity and temperature profiles. We use quantile regression models to explore how the strength of the statistical relationship between LMI and environmental temperature varies by storm intensity, following the approach used in Elsner et al. (2008) and Kossin et al. (2013). Our quantile regression models specify how the LMI quantile changes with variation in temperature. This allows us to identify whether relationships with the temperature profile differ between strong and weak storms. We later compare these assessments to those derived from our numerical simulations.
We start by quantifying temporal trends in LMI to link back to existing work and provide a starting point from which to explore trends with respect to temperature. When considering all TCs (Fig. 4a), only those exceeding hurricane strength (>33 m s^-1) show intensification, but trends are not significantly different from zero. Kossin et al. (2020) report that quantile regression can be highly sensitive to the range of the data. When considering only hurricane strength storms (Fig. 4b), we found that intensification is significantly different from zero, peaking at 3 m s^-1 per decade for a hurricane quantile of 0.4.
These results reproduce those of Kossin et al. (2020).
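For readers unfamiliar with quantile regression, the following minimal sketch fits a straight line at a chosen quantile by minimizing the pinball (check) loss. It relies on the fact that, for a single predictor, an optimal line passes through two of the data points, so an exhaustive search over point pairs suffices for small samples. This is an illustrative stand-in, not the implementation used in this study; the function names and the toy data are assumptions, and a real analysis would use a statistical library and bootstrap or analytical confidence intervals.

```python
from itertools import combinations


def pinball_loss(residual: float, tau: float) -> float:
    """Check loss for quantile tau: asymmetric absolute error."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def quantile_fit(x, y, tau):
    """Return (intercept, slope) of the tau-quantile line y = a + b * x.

    Exhaustive search over lines through pairs of points; cost grows as
    n**3, so this is suitable for small illustrative samples only.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not a valid regression line
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = sum(pinball_loss(yk - (a + b * xk), tau)
                   for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    if best is None:
        raise ValueError("x must contain at least two distinct values")
    return best[1], best[2]


# Toy fan-shaped data: the upper envelope rises with x, the lower does not.
x_toy = [1, 1, 2, 2, 3, 3, 4, 4]
y_toy = [0, 1, 0, 2, 0, 3, 0, 4]
_, slope_low = quantile_fit(x_toy, y_toy, 0.1)
_, slope_high = quantile_fit(x_toy, y_toy, 0.9)
```

In the toy data the upper quantile has a steeper slope than the lower quantile, which is the kind of intensity-dependent response that quantile regression is designed to reveal.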
We next explore how these trends in LMI quantiles compare to trends in the theoretical maximum potential intensity, to determine how strong vs. weak storms have kept pace with trends in their PI. The theoretical maximum potential intensity is calculated using E-PI (Emanuel, 1988) on thermodynamic profiles from ERA5 data proximal to individual TCs at the time of LMI. The linear trend in mean E-PI is 1.2 m s^-1 per decade for locations of all TCs and 0.9 m s^-1 per decade for locations of hurricane strength TCs only. Given that tropical storm strength TCs show no temporal trend, they have not kept pace with their rising E-PI. But hurricane strength storms exhibit super-E-PI trends and have therefore closed the gap between realized and maximum potential intensity.

Figures 4c,d,e show relationships between LMI quantiles over all TCs and SST, temperature at the 300 hPa level (T300) and temperature at the 50 hPa level (T50). As before for the calculation of E-PI, representative environmental temperatures are obtained using LMI proximal values. In general, we find large and statistically significant relationships. Intensity has increased significantly with warming SSTs almost universally across LMI quantiles, but with a markedly different response between hurricane strength storms and weaker storms. Tropical storm strength quantiles have increased by approximately 0.6 m s^-1 per K, whereas the rate rises rapidly with LMI quantiles above hurricane category 1 strength, reaching a maximum of 2.6 m s^-1 per K at the highest quantiles. This is markedly different behaviour from the temporal trends, where the higher rates are located at the middle quantiles. We also note the dip in the trend at quantiles close to about 33 m s^-1. The trend values there may not be reliable because this is the intensity at which the ADT-HURSAT determinations can be influenced by cirrus-obscured eyes.
The response of LMI quantiles to T300 is qualitatively similar to the response to SST, but trends plateau for the highest quantiles. This similarity may be expected given the strong correlation between proximal SST and proximal T300 (R = 0.78). The reduced rates of change for the highest quantiles may also be expected given the larger change in upper tropospheric temperature per unit change in SST. As before for SST, hurricane strength TCs exhibit markedly different behaviour to weaker storms: they intensify with T300 warming at approximately twice the rate of weaker storms.
The response of LMI quantiles to T50 (Fig. 4e) shows increasing intensity with cooling across most LMI quantiles but is statistically significant for tropical storm strength storms only. We therefore do not find a significant relationship between trends in hurricane intensity and lower stratosphere temperature. This is consistent with the GCM study by Vecchi et al. (2013) but inconsistent with idealized simulations by Ramsay (2013).
In summary, our analysis of historical records finds that hurricane strength storms exhibit markedly different behaviour to weaker storms in environments of changing temperature profile. Hurricane strength storm intensity increases at twice the rate or more compared to weaker storms within environments of sea surface temperature warming. Hurricane strength storm intensity also increases at twice the rate compared to that of weaker storms in environments of upper tropospheric warming.
Despite upper warming having a limited correlation with TC intensity, this result is perhaps unsurprising given the strong correlation between SST and T300 (not shown). The response of hurricane strength storms within environments of lower stratospheric cooling was mixed and did not reach statistical significance.

Idealized model experiments
Towards the goal of isolating and quantifying the effects of temperature profile changes on TC intensity, we turn to idealized simulations which are free from other changes. If the results of these simulations agree with expectation, we can be more confident in attributing observed TC intensity trends to temperature profile changes, which are perhaps more reliably projected by GCMs. On the other hand, if the idealized simulations indicate TC intensity trends that differ markedly from observations, then we can be more confident that other environmental changes are dominant in driving the observed changes.
As discussed in Sect. 2.2, numerical simulations were conducted with the CM1 model in an axisymmetric TC configuration.
The 21-member control (present climate) ensemble features an initial period of slightly weakening TC intensity, followed by steady vortex intensification between simulation hours 12 and 90 (Fig. 5). Considerable ensemble spread develops by hour 50, with central pressure values ranging from less than 900 hPa to nearly 960 hPa at hour 100. The simulated ensemble mean TC minimum sea level pressure attained a minimum (maximum intensity) around hour 130, followed by slight weakening and quasi-steady ensemble mean intensity until the end of the simulation. Simulations using a simple Newtonian cooling radiation parameterization generally resulted in weaker TCs (blue lines in Fig. 5), motivating use of an ensemble subset consisting of the 13 members using more complex radiation parameterizations. The complex-radiation subset features reduced ensemble spread, and a lower ensemble-mean central pressure (Table 2). The intensification phase of TCs in the complex-radiation members consistently begins earlier in the simulation relative to the simple-radiation subset; for instance, the time required for Pmin to reach 960 hPa is nearly 24 hours shorter for the complex-radiation members (Fig. 5). We evaluate both the maximum "core" ensemble mean intensity and the steady period at the end of the simulations, consistent with "equilibrium intensity" in the nomenclature of Rousseau-Rizzi et al. (2021). The core intensity corresponds to the LMI.
For the additional experiments, time series of ensemble-mean maximum near-surface wind speed and minimum central pressure sort out precisely as expected based on theoretical predictions: the present-day simulation features the weakest ensemble-mean TC, while the end-of-century simulations are all stronger, with the mid-century ensemble falling between (Fig. 6, Table 2). This overall trend matches the E-PI calculations in a relative sense (Table 2).
One notable difference is the removal of the stratospheric cooling, which had no impact on E-PI but weakened the simulated storm slightly. The GCM-modified end-of-century environment yields the greatest intensity, with filtered ensemble-mean Pmin values approaching 900 hPa in the complex-radiation ensemble subset (Fig. 6a). This is consistent with the fact that future changes under the CMIP5 RCP8.5 scenario exceed those due to extrapolation of current observed trends (compare purple and red curves in Fig. 6a, and abscissa values in Figs. 1b,c). In all of the simulations, the ensemble mean Pmin values were lower than the E-PI calculations, though this difference was reduced for the equilibrium period Pmin values. Note that there is uncertainty in the E-PI calculation owing to several choices in parameter settings, as is the case with the CM1 model.
Each ensemble experiment exhibits considerable variability, and the ensemble standard deviations are generally larger than the differences in ensemble mean between the experiments (Fig. 6b, Table 2). That the relative ranking of the experimental ensemble mean intensity matches expectation from theory is notable, but the large ensemble variability provides context regarding statistical robustness, or lack thereof. While we refrain from a dichotomous declaration of "statistically significant" or not (e.g., Amrhein et al., 2019; Wasserstein et al., 2019), we recognize that the differences between the experiments are "small" in this sense. Inspection of the individual ensemble experiments demonstrates that the relative intensity of the different ensemble members exhibits considerable consistency, motivating use of a Wilcoxon signed-rank test (Wilcoxon, 1945), appropriate for paired samples (Fig. 6b).
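As a minimal sketch of why a paired test can detect a shift that the ensemble spread obscures, the following pure-Python example computes an exact two-sided Wilcoxon signed-rank p-value by enumerating sign assignments. The Pmin values and the function name signed_rank_exact are hypothetical illustrations, not simulation output, and the enumeration approach is an assumption about how such a test could be carried out for a small ensemble rather than a description of the procedure used in this study.

```python
from itertools import product
from statistics import mean, stdev


def signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W+, p), where W+ is the rank sum of positive differences y - x.
    Zero differences are dropped; tied |differences| share their average rank.
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2
    observed = abs(w_plus - center)
    # Under the null hypothesis each rank is equally likely to carry either
    # sign, so enumerate all 2**n assignments (feasible for small ensembles).
    extreme = 0
    for signs in product((False, True), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical Pmin values (hPa) for 13 paired members: large member-to-member
# spread, but every member deepens in the perturbed experiment.
present = [930.0 + 3.0 * m for m in range(13)]
future = [p - (5.0 + 0.5 * m) for m, p in enumerate(present)]

mean_shift = mean(future) - mean(present)
spread = stdev(present)
w_plus, p_value = signed_rank_exact(present, future)
print(f"mean shift {mean_shift:.1f} hPa, ensemble SD {spread:.1f} hPa, p = {p_value:.5f}")
```

In this constructed case the mean shift is smaller than the ensemble standard deviation, yet the paired test returns a small p-value because the sign of the change is the same for every member.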
Except for the mid-century experiment, small p-values relative to the present-day simulation provide more confidence in the significance of the results than comparison with the overall ensemble mean suggests (top labels in Fig. 6b). Comparison of the end-of-century ensemble with the no-upper-warming ensemble yields a signed-rank p-value of 0.13, and comparison with the no-stratospheric-cooling ensemble yields a value of 0.29 (not shown).
While the smoothed, ensemble mean changes are highly consistent with theoretical expectations, neither the changes predicted by E-PI theory nor those resulting from the numerical simulations are dramatic in terms of Pmin. For extrapolations of current RAOBCORE trends, the end-of-century ensemble mean is characterized by Pmin values that are approximately 10 hPa lower than for the present-day ensemble. That is not to say that these intensity increases are insignificant, however.
The GCM-modified environment under the RCP8.5 scenario produces the strongest change in ensemble-mean Pmin, approximately 12 hPa lower. The strengthening seen in the extrapolated RAOBCORE experiments is consistent with that reported for a 2 K change by Knutson et al. (2020), while the GCM experiment change, accompanied by an SST warming in excess of 3 K, is somewhat less than what would be anticipated from the Knutson et al. (2020) review.
The agreement between the CM1 simulation results and the theoretical E-PI intensity calculations suggests that the simulated TC responses to environmental change can be interpreted through the concept of a Carnot heat engine (e.g., Emanuel, 1988; 1991). Because we use Pmin to measure storm intensity, we are not concerned with supergradient wind speeds as analysed by Hakim (2011) and Smith et al. (2008). Our hypothesis in this analysis is that in the quiescent (unsheared) axisymmetric CM1 environment, the TC response to changes in the environmental temperature profile will be consistent with PI theory and the concept of thermodynamic engines. These idealized simulations provide an estimate of the expected effect of such changes on TC characteristics, allowing us to relate the simulation responses back to the observational TC statistics presented in Sect. 3.1.
We compute the temperature of cloudy, outflowing air in the upper troposphere for each ensemble member in each experiment, and use this information in conjunction with SST to compute the thermodynamic efficiency (see Sect. 2.2) according to Eq. (1).
The outflow temperature is remarkably similar between the different experiments (Table 3). While the warmest outflow is in the GCM-modified experiment, as expected, this does not reach statistical significance. The similarity in outflow temperatures is consistent with the Fixed Anvil Temperature (FAT) hypothesis (Hartmann and Larson, 2002). The FAT hypothesis argues that the environmental cooling rate is largely governed by temperature. This follows from the saturation vapor pressure dependence on temperature via the Clausius-Clapeyron relation. The temperature at which cooling rates rapidly decrease with height (and therefore also the temperature of the outflow) should remain approximately constant. Surface warming therefore raises the altitude of the outflow but has less effect on outflow temperature. In agreement, we find the average pressure altitude of the outflow exhibits considerable difference among the experiments, with the present-day ensemble showing the lowest outflow altitude, and the GCM experiment the highest (~190 hPa, Table 3). Although the differences are small relative to the ensemble standard deviation, the no-stratospheric-cooling and no-upper-warming-maximum experiments exhibit the expected changes in outflow pressure. Interestingly, the average outflow pressure generally reflects an altitude above the upper warming maximum, especially for the stronger TCs in the GCM ensemble.
Table 3: Ensemble mean outflow temperature, pressure, and thermodynamic efficiency computations for the 13-member complex-radiation ensemble subset; radial wind threshold of 1.0 m s⁻¹ and cloud ice threshold of 10⁻⁵ kg kg⁻¹. Ensemble standard deviation (SD) is shown for outflow temperature and pressure. Values apply to the "equilibrium" time window of the simulations, hours 150 to 192.

Experiment | Efficiency | SST (K) | T outflow / SD (K) | P outflow / SD (hPa)
For the GCM experiment, the slightly warmer outflow temperature is more than compensated by the increased SST, resulting in the greatest thermodynamic efficiency among the experiments. The GCM experiment also produces the lowest Pmin (Table 2). In fact, the numerical simulation experiments ranked by intensity match exactly the ranking in thermodynamic efficiency (Tables 2 and 3). The differences in thermodynamic efficiency between the ensemble members are small in magnitude, but the agreement between these changes and the relative Pmin values matches expectation, lending confidence to this interpretation.
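The efficiency argument can be illustrated with a short numerical sketch. The exact form of Eq. (1) is not reproduced here, so the example assumes the standard Carnot form (SST − T_out)/SST. The grid-point samples, the function names, the use of an unweighted mean, and the strict greater-than comparisons are all hypothetical choices made for illustration; only the two threshold values are taken from the Table 3 caption.

```python
def outflow_temperature(samples, u_min=1.0, qi_min=1e-5):
    """Unweighted mean temperature (K) of cloudy, outflowing grid points.

    samples: iterable of (temperature_K, radial_wind_m_s, cloud_ice_kg_kg).
    A point counts as outflow if radial wind > u_min and cloud ice > qi_min.
    """
    selected = [t for t, u, qi in samples if u > u_min and qi > qi_min]
    if not selected:
        raise ValueError("no grid points satisfy the outflow thresholds")
    return sum(selected) / len(selected)


def carnot_efficiency(sst_k, t_out_k):
    """Assumed Carnot form: (SST - T_out) / SST, temperatures in kelvin."""
    return (sst_k - t_out_k) / sst_k


# Hypothetical upper-tropospheric samples, not model output.
samples = [
    (205.0, 3.2, 4e-5),  # cloudy outflow
    (199.0, 5.1, 8e-5),  # cloudy outflow
    (196.0, 2.0, 2e-5),  # cloudy outflow
    (230.0, 0.4, 1e-4),  # cloudy, but radial wind below threshold
    (215.0, 2.5, 1e-6),  # outflowing, but cloud ice below threshold
]
t_out = outflow_temperature(samples)

present = carnot_efficiency(301.0, t_out)
fixed_outflow = carnot_efficiency(303.0, t_out)         # SST +2 K, outflow unchanged
warmed_outflow = carnot_efficiency(303.0, t_out + 3.0)  # SST +2 K, outflow +3 K
print(present, fixed_outflow, warmed_outflow)
```

Under these assumed numbers, efficiency rises when the outflow temperature stays fixed while the SST warms, and falls when the outflow warms more than the surface, which is the contrast the FAT interpretation relies on.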

Concluding Discussion
In a quiescent environment, theory indicates that TC intensities should exhibit considerable sensitivity to changes in the temperature profile, from the sea surface up into the lower stratosphere (Emanuel, 1991; Kieu and Zhang, 2018; Tao et al., 2020). In this paper, we explored whether observed temperature profile changes are sufficient to explain observed trends in the TC intensity distribution. To do so we worked to isolate and quantify the response of TC intensity to observed trends in environmental temperature using a combination of historical data analysis and idealized numerical modelling. By establishing the linkage between temperature profile changes and TC intensity, we aimed to strengthen understanding and improve interpretation of observed and emerging trends in the TC intensity distribution.
Our historical data analysis focused on global scales spanning four decades to emphasise the scales where thermodynamic change is large and circulation change is minimized. Tropical storm strength intensities show no temporal trend and have therefore not kept pace with rising PI. Hurricane strength storms, however, exhibit significant temporal trends that reach super-PI rates for some intensity quantiles. Storms at these quantiles have therefore closed the gap between realized and maximum potential intensity. This is consistent with our finding that hurricane environments have warmed faster at lower and upper levels than the tropical mean environment.
In changing our frame of reference from time to temperature, we again found markedly different sensitivities between tropical storms and hurricane strength storms. Hurricane strength storms intensified at up to four times the rate of tropical storms per unit increase in surface and upper tropospheric temperature. The response of storms within environments of lower stratospheric cooling was mixed and did not reach statistical significance. The differing trend magnitudes among commonly used historical temperature and TC intensity datasets challenge our ability to understand relationships using historical data alone.
We then turned to idealized modelling to further isolate, quantify, and understand the effects of temperature profile changes on TC intensity, and to interpret the empirical statistics. Idealised TC simulations responded in the expected sense to various imposed changes in the temperature profile and agree with TCs operating as heat engines. The imposed historic warming trend has faster warming aloft than at the surface, thereby reducing the temperature difference between the surface and a fixed upper level. TC efficiency would therefore be expected to decline, yet our simulations show the opposite: increased TC efficiency. Analysis of TC outflow found little change in the outflow temperature but a rising mean outflow altitude that is located above the altitude of peak upper tropospheric warming. The near constancy of outflow temperatures suggests the increase in thermodynamic efficiency is being driven largely by surface warming. While the FAT hypothesis appears to explain our findings well, further work is needed to understand, at a process level, the extent of applicability of the FAT hypothesis for TCs. The FAT hypothesis for tropical convection has support from observational analysis (Xu et al., 2007) and convection-resolving idealized numerical simulations (Kuang and Hartmann, 2007).
Some additional supporting evidence for a FAT for TCs is provided by idealized cloud resolving modelling (Khairoutdinov and Emanuel, 2013) and by analysis of TC cloud top temperatures in ADT-HURSAT data (Kossin, 2015). However, detecting trends in TC cloud top temperatures is complicated by a poleward trend in the latitude of LMI (Kossin, 2015).
Increasing TC efficiency with warming may also explain the fastest temporal trends in intensity for the middle LMI quantiles. With warming, increasing efficiency closes the gap with E-PI. The strongest storms, however, were already close to their E-PI, and weaker storms are more strongly limited by other environmental factors such as shear or dry air.
Techniques to simulate weaker storms within the idealized modelling framework are needed to test this hypothesis.
The magnitude of the simulated changes, even for extrapolated trends, is relatively small compared to observed trends in TC characteristics. This suggests that temperature profile changes contributed to some of the observed TC intensity change, but that other environmental factors dominated as the root causes, including, for example, changes in vertical wind shear, humidity, incipient disturbances, or in the large-scale circulation. Removal of the tropical upper-tropospheric warming maximum resulted in modest changes in core or equilibrium TC intensity in the idealized simulations. The agreement in the sense of the idealized simulation changes with theory and observation supports the concept of a TC as a heat engine. Computations of thermodynamic efficiency in the idealized experiments were also consistent with initial hypotheses, and with the sense of changes in TC strength and intensification rate.

Omission of the observed lower stratospheric cooling exerted relatively little influence on TC intensity in our simulations, in line with our observational analysis and with the GCM study by Vecchi et al. (2013). However, the simulated equilibrium TC intensity with omission of stratospheric cooling did weaken, as expected, albeit slightly (Table 2).
Axisymmetric simulations out to radiative convective equilibrium by Ramsay (2013) showed stronger vortex intensity with stronger imposed lower stratospheric cooling rates. This was despite much of the outflow being confined to the upper troposphere. We agree with Ramsay (2013) and Ferrara et al. (2017) that it is challenging to reconcile contrasting results across different models with different parameter settings and analysis procedures, and across studies using limited historical datasets.
We hypothesized that observed tropical temperature profile changes also exert predictable influences on trends in the intensification rate of TCs. A preliminary analysis of observations finds historical trends in intensification characteristics (not shown). Specifically, the average onset time of rapid intensification now occurs significantly sooner (by 16 h) after the first reported track point than in the first half of our period of record (not shown). Emanuel (2017) notes that sooner rapid intensification has important implications for watches, warnings, and predictability. Our idealized modelling setup did not allow us to pursue intensification because of possible contamination from model initialization and potentially important processes missing from the 2-D dynamics. Suitable modelling frameworks need to be developed to test this hypothesis.
The differing trends in TC environments compared to the tropical mean environment have implications for climate change studies that use the Pseudo Global Warming (PGW) method. PGW typically applies a long time-average change from GCMs to reanalysis conditions and uses those high-resolution conditions to drive regional model simulations of historical and future weather events (e.g., Lackmann, 2015; Gutmann et al., 2018). TCs may respond differently to environmental change more representative of that taking place locally within TC environments.
Extrapolated observational temperature trends resulted in weaker TC intensity trends relative to change profiles based on an ensemble of CMIP5 GCMs under the RCP8.5 emission scenario. Future extensions of this work could omit the GCM-based tropical upper warming maximum or stratospheric cooling to determine whether a more substantial change results relative to these exercises with the extrapolated observations. Use of CMIP6 trends would also be useful. Future work could also start from a different base sounding, other than the Dunion (2011) moist tropical sounding. It is possible that the differing sensitivity magnitudes between the historical data analysis and the idealized simulations are due, in part, to our use of this single profile, which allows all simulated storms to reach the highest observed intensities. Base soundings representative of the observed tropical storm and hurricane strength storm environments may yield more nuanced sensitivity to temperature profile change, given permitted variations in outflow altitude. Future work should also include tests with fully 3-D TC simulations; such simulations would allow examination of changes in intensification rate and timing. Finally, more comprehensive physical process studies are needed to interpret the empirical and idealized modelling findings reported here and work towards untangling the factors driving observed intensity changes.