1 Introduction
There is a strong evidence of global warming induced as a result of the increasing concentration of greenhouse gases in the atmosphere (Lai and Dzombak, 2019). It can be expected that this global warming will manifest itself in more frequent warm days and many papers suggest that heat waves will become more frequent (Lemonsu et al., 2014; Alexander, 2016). This anticipated scenario along with the exceptionally long heat wave observed in Europe in August 2003 has increased interest in analyzing climate extremes. The analysis of heat waves is particularly important due to the potential for serious anthropogenic, environmental, and economic impacts (Amengual et al., 2014; Campbell et al., 2018). Extreme heat raises significant health concerns in humans as it can result in death, can change the range or niche for plants and animals, and can lead to heatdriven peaks in electricity demands or lost crop income.
A challenge in analyzing heat waves stems from the ongoing debate over its exact definition. According to the World Meteorological Office (WMO), a heat wave should be defined as a period persisting at least three consecutive days of marked unusual hot weather (maximum, minimum and daily average temperature) over a region with thermal conditions above given thresholds based on local climatological conditions. This definition suggests that analyses of heat waves require temperature series at a daily scale, but offers no operational guidance with regard to the various choices of implementation. Khaliq et al. (2005) and Reich et al. (2014) used only maximum temperature, while Keellings and Waylen (2014) and, more recently, Abaurrea et al. (2018) considered both maximum and minimum temperatures. Perkins and Alexander (2013) and Smith et al. (2013) addressed the issue of analyzing different measurements and definitions of this phenomenon.
To circumvent the lack of agreement on heat wave definition, we work with the concept of an extreme heat event (EHE) in order to formalize the notion of a period of persistent extremely high temperatures. EHEs are based on an exceedance of threshold approach, often used in the analysis of extremes. Then, at a given location, an EHE is defined as a run of consecutive daily temperature observations exceeding the extreme threshold for that location. According to the WMO, the thresholds should be based on local conditions, such as the 95th percentile of the local temperature series during the warm months.
Much work on extreme events focuses on the analysis of their occurrence using asymptotic results from extreme value theory. These results state that the occurrence of peaks over threshold follow a Poisson process, under mild conditions (Ogata, 1988; GarcíaCueto et al., 2014; Abaurrea et al., 2015). A limitation of this approach is that important behaviors which describe the nature of an EHE, beyond incidence, such as the duration, or the average exceedance, or the maximum exceedance above the threshold, are not captured.
Models for heat waves and extreme heat events commonly consider the occurrence of the events at particular locations, i.e., in the form of a time series over a window of time. However, spatial aspects of this phenomenon are also of interest and should be introduced into the modeling process. That is, spatial dependence is anticipated in the occurrence of EHEs. A joint model for time series at different locations that incorporates spatial dependence will borrow strength from nearby time series, and will anticipate an increase in the probability of an EHE occurring at locations near where an EHE did occur. Furthermore, given the evidence of changing climate, the considered models should also be able to capture nonstationary behavior in maximum daily temperature in time. This can be done by allowing temporal random effects, time varying coefficients or timevarying covariates
(Cheng et al., 2014; Abaurrea et al., 2015).The contribution of this paper is to take up the challenge of providing a methodology to study the occurrence and behavior of extreme heat events during a period of time over a region which incorporates spatial dependence. The intent is to enable more complete characterization of the events. We develop a spacetime model based on a spatially referenced collection of temperature time series that enables the prediction of both the incidence and the characteristics of the EHEs occurring at any location in the region.
A direct approach considers a spatial model for daily maximum temperatures which can then be used to characterize the EHEs. A shortcoming of this approach is that such a model will be driven by the bulk of the distribution, i.e., where most of the data is observed. The model can yield poor fits for the tails (particularly the upper tail), where the main interest for the model lies when attempting to characterize EHEs (Keellings and Waylen, 2015; Shaby et al., 2016). A more promising approach to effectively model both the extremes as well as the bulk of the distribution is to consider a model incorporating thresholding. Here, we propose a model which switches between two observed states, one that defines extreme heat days (those above the temperature threshold) and the other that defines nonextreme heat days (those below the temperature threshold). Importantly, this twostate structure allows temporal dependence of the observations but also that the parameters controling the effects of covariates and the spatial dependence can differ between the two states. In the proposed model, the transition probabilities between the two observed states are driven by a twostate Markovian switching model. For daily maximum temperatures, we are able to demonstrate substantially improved outofsample predictive performance compared with a corresponding spatial autoregressive model which ignores thresholds.
The full model is specified in a hierarchical Bayesian framework and estimated using a Markov chain Monte Carlo (MCMC) algorithm. In this framework, posterior predictive distributions of the foregoing features of the EHEs (rate of incidence, duration, maximum exceedance, average exceedance) can be easily obtained. Specifically, for a given site, we can obtain posterior predictive samples of the time series of daily maximum temperatures from which posterior predictive samples of the characteristics of their EHEs can be extracted.
There is limited literature using Markov models to analyze EHEs.
Smith et al. (1997) studied daily temperature exceedances over a threshold and fitted a Markov model to them. Shaby et al. (2016)proposed a twostate switching hidden Markov model and used a latent variable that controlled whether a day was assigned to the heat wave or the nonheat wave state. By contrast, given local thresholds, our approach takes these state variables as observed rather than latent.
Furthermore, none of this modeling introduces spatial structure. Spatial dependence has been included in the climate extremes literature but using different approaches. For instance, Cooley et al. (2007) proposed a Bayesian spatial modeling of extreme precipitation in order to obtain maps of precipitation return levels. Reich and Shaby (2019) suggested a spatial Markov model for climate extremes, using latent clustering of neighboring regions. Guillot et al. (2015) used Markov random fields to model highdimensional spatial fields for paleoclimate reconstructions.
There are several motivations for building an effective spacetime model. First, the model can be used to study the behavior of extreme heat events over time. That is, for observed locations, we can extract EHEs and their characteristics descriptively. Second, the model can be used to make predictions with regard EHE characteristics at unobserved locations. These predictions are important for understanding the possible impacts of EHEs on human health, on ecosystems, and on the economy, since the assessment of these effects may require predictions at locations where monitoring devices are not available. Such spatial prediction may also be useful to complete observed series with missing information. A further potential application of the model is to make predictions of the spatial extent of an EHE, defined as the geographical area (often expressed as a percentage of the region) affected by extreme temperatures at a given time. Lastly, the model can be used to provide information for a deeper understanding regarding the temporal evolution of aspects related to temperatures across a region. For instance, it can be used to study changes in the incidence of EHEs and their characteristics between decades, to assess whether the factors affecting the nonextreme and the extreme daily maximum temperatures are the same, or to study EHE behavior averaged over a region. Spatial variation with regard to all of these concerns is provided.
As a final comment, there is an important distinction between the objective of modeling daily maximum temperatures and the objective of learning about characteristics of extreme heat events. The distinction is analogous to that of predicting weather versus climate. That is, like weather, if the goal is to predict an annual time series of daily maximum temperatures, our twostate threshold based model will not outperform a corresponding autoregressive model which ignores thresholds. However, like climate, if our goal is to predict incidence and characteristics of EHEs over a window of time, say, a decade, then our twostate model will consequentially outperform the corresponding autoregressive model.
The data analyzed in this work consist of approximately 60 years of daily maximum temperatures at 18 locations throughout the Comunidad Autónoma de Aragón in Spain. Details and summaries of this dataset are presented in the Section 2. As noted above, we fit our models to this dataset within a Bayesian framework using MCMC. We do out of sample model comparison using leaveoneout validation.
The outline of the paper is as follows. Section 2 presents an exploratory analysis of the above dataset. Section 3 describes the modeling details. Comparison of the proposed twostate model to a more traditional autoregressive model ignoring the threshold is presented in Section 4. Results and some associated inference summaries are supplied in Section 5. Finally, Section 6 concludes with a summary and possible directions for future work.
2 Exploratory data analysis
The region of interest is the Comunidad Autónoma de Aragón region in northeastern Spain, located in the Ebro basin (85,362 km). The Ebro river flows from the NW to the SE through a valley bordered by the Pyrenees and the Cantabrian Range in the north and the Iberian System in the southwest. The maximum elevation is approximately 3400 m in the Pyrenees, 2600 m in the Cantabrian Range and Iberian System, and between 200400 m in the central valley. In general, the area is characterized by a Mediterraneancontinental dry climate with irregular rainfall, and a large temperature range. However, several climate subareas can be distinguished due to the heterogeneous orography and other influences, such as the Mediterranean sea to the east, and the continental conditions of the Iberian central plateau in the southwest. Zaragoza, the largest city in the region, is located in the central part of the valley, and experiences more extreme temperatures and drier conditions. This variation in climate conditions suggests that Aragón is an interesting study region.
Daily maximum temperatures were obtained at 18 sites across and around the Comunidad Autónoma de Aragón region for the years 19532015. The names and locations of the 18 sites are shown spatially in Figure 1. Data from the years 19531962 were used to obtain the locationspecific thresholds for extreme heat events; the data for the remaining years 19632015 were used in the modeling. Specifically, the 95th percentile of daily maximum temperature for the months June, July, and August during the 10 years 1953 to 1962 was computed for each site to determine the threshold for defining extreme heat events at the site. These thresholds are shown in parentheses in Figure 1. Figure 2 shows these percentiles versus elevation/altitude for each site. In general, there is a negative, roughly linear relationship between altitude and daily maximum temperature percentile. In addition, while higher elevations are found in both the northern and southern part of the Aragón region, the 95th percentiles tend to be lower in the north than south indicating a latitudinal gradient in addition to an elevation gradient for daily maximum temperature.
Monthly averages of daily maximum temperature for each location during for the years 19761985 and 20062015 are supplied in Figure S1 of the supplementary material. Also shown are the differences (later minus earlier) in these monthly averages between the two time periods for the months April through October. We see an increase in April and very sharp increase in May across all sites, indicating that warmer weather is now starting earlier. We continue to see an increase through the summer months but in September we see a decrease for many sites suggesting an earlier end to summer at these sites. Raw summaries of the daily maximum temperature data are included in Tables 1 and 2. The total number of days with maximum temperature above the specified threshold for two 10 year periods (19761985 and 20062015) are given in Table 1. These summaries are computed for each month for June, July, and August. In general, July and August have the greatest number of days with maximum temperatures above the threshold and the latter time period experienced more days above the threshold. Table 2 shows the number of unique heat events, which considers each run of consecutive days having a maximum temperature above the threshold as one event. Thus, the number of distinct extreme heat events is less than the number of days above the threshold since many events last more than one day. In Table 2 each event is allocated to the month in which the event starts. Again, the majority of heat events occur in July and August for each of the locations and more events occurred in the latter decadal period.
June  July  August  

Location  ’76’85  ’06’15  ’76’85  ’06’15  ’76’85  ’06’15 
Pamplona  0  7  19  26  1  32 
Buñuel  7  27  39  74  9  45 
El Bayo  11  15  43  42  23  36 
Morella  2  14  52  48  26  46 
Huesca  16  32  72  112  35  79 
Tornos  5  13  40  55  18  50 
Santa Eulalia  12  10  71  57  36  35 
Calatayud  0  9  20  20  7  30 
Panticosa  4  13  45  60  16  41 
Puebla de Híjar  23  26  87  82  49  62 
Ansó  7  9  72  39  44  43 
Daroca  6  20  63  92  28  76 
Zaragoza  6  26  42  89  13  57 
La Sotonera  6  11  34  41  17  38 
Pallaruelo de Monegros  7  12  33  36  16  26 
Cueva Foradada  12  41  89  124  39  96 
Sallent de Gállego  11  15  85  47  44  47 
Yesa  6  7  45  22  24  28 
June  July  August  

Location  ’76’85  ’06’15  ’76’85  ’06’15  ’76’85  ’06’15 
Pamplona  0  4  8  20  1  17 
Buñuel  6  13  21  35  8  22 
El Bayo  5  8  19  22  16  15 
Morella  2  8  23  21  14  22 
Huesca  6  13  30  47  19  27 
Tornos  3  5  20  32  10  23 
Santa Eulalia  5  5  30  24  13  15 
Calatayud  0  6  12  10  5  15 
Panticosa  2  5  20  27  10  17 
Puebla de Híjar  12  11  32  37  25  22 
Ansó  3  4  26  23  22  19 
Daroca  4  10  24  41  13  32 
Zaragoza  5  10  20  46  10  22 
La Sotonera  1  6  16  21  10  13 
Pallaruelo de Monegros  3  7  15  16  10  13 
Cueva Foradada  8  21  33  53  17  33 
Sallent de Gállego  4  5  24  20  20  21 
Yesa  4  3  22  13  15  12 
Next, we investigate the autoregressive behavior of daily maximum temperature. Let denote the maximum temperature on day at location s and let denote the threshold for location s such that when we have a day with extreme heat. Additionally, define the binary indicator , where indicates a daily maximum temperature above the threshold on day at location s, otherwise it is .
In an exploratory mode, we fit independent AR(1) models using the data from the years 19632012 for each location where is the autoregressive parameter and is pure error. From the estimated
, we compute the kernel density estimate of
for both and . Figure 3 shows kernel density estimates for three representative locations. It reveals both the differences across locations as well as differences when conditioning on . Notably, the distribution is much more diffuse than that of for all three locations.Finally, we explore the variability in temperature for days above threshold and days below threshold by location. For each location, we computed the variance in maximum daily temperature for the days above threshold and days below threshold, as well as the ratio of these variances. As expected, there is much more variation in maximum daily temperatures for temperatures below the threshold than above threshold. The locationspecific variances in daily maximum temperature for the below threshold days range from 49.6 to 83.2 with a mean of 65.0
, whereas the above threshold days range from 1.17 to 2.45 with a mean of 1.72.3 Modeling details
As in the previous section, let denote the daily maximum temperature at time and location . We propose a twostate model where the state for a given day defines whether the location is experiencing an extreme heat event or not. Let denote the state at time for location where a value of 0 denotes the below threshold state and a 1 denotes the above threshold state. We specify a threshold, at each location, developed as discussed in Section 2. Again, we are interested in EHEs at s relative to the climate at s. Here, is a spatial binary time series process. At location s it provides a time series reflecting times of transition or stateswitching. It is observed for each at a monitored site but is latent elsewhere.
Under a typical hidden Markov model, we would assume is a Markov process where the state depends only on the previous state . Then, the distribution of would be specified explicitly given . Here, we take a different approach since our states are not hidden. That is, the states are observed given a threshold and the temperatures are modeled under the restriction to a given state. Conversely, given the threshold, is a binary function of . Furthermore, we introduce a model specification to allow the transition probabilities between states to be a function of previous temperature.
This specification ensures the transition probabilities to be “local”, i.e., to vary with location and to depend upon the previous day’s maximum temperature at that location. For example, if the previous day’s maximum temperature was sufficiently high so that we were in an extreme heat state, then we would expect this to affect the probability of staying in the extreme heat state for the next day. However, since EHEs are fairly rare, it is perhaps more clear to say that we would like a higher probability of being in an EHE state at day if we were in an EHE state at day than if we were not in an EHE state at day . Vice versa, if the previous day’s maximum temperature resulted in a nonextreme heat state. In different words, we are investigating persistence of extreme heat. We elaborate this further below.
Two remarks are useful here. First, we could condition on a longer history of maximum temperatures or an average of previous maxima. We experimented with introducing additional lags in the modeling but found no gain in predictive performance. Second, failing to condition on the previous day’s temperature provides transition probabilities which don’t predict well the persistence of above or below threshold states. Thus, we supply a locationspecific autoregressive model for current daily maximum temperature which depends upon the current day state as well as the previous day’s maximum temperature.
We define our joint distribution for temperature and state in a first order Markov fashion explicitly as follows. Given
, and thus, , for two consecutive time points, and , we write the joint distribution asHence, the process is started with , followed by the distribution , and so on. Figure 4 provides a simple graph of the specification and also reveals how it differs from a customary dynamic model specification.
This formulation requires three model specifications:



.
With
denoting the threshold (quantile) for location
s, for (i) and (ii), we need truncated distributions, i.e., when and when, respectively. We adopt truncated normal distributions, with suitable centering,
(1) 
where the ’s and ’s supply the truncation as follows:
and
Details for are given below.
It is natural to ask why we do not consider a customary generalized Pareto distribution (GPD) for the exceedance distribution, i.e., for the upper tails? First, we are modeling a time series of daily maximum temperatures; we are not modeling peaks over thresholds for a temperature series which is the usual setting for the GPD. Second, we experimented with the GPD, exploring many different specifications for the parameters and were unable to find improved outofsample predictive performance with it. More disconcerting, we were unable to find posterior mass for a shape parameter . All exploratory model fitting yielded a negative shape parameter which implies bounded support for the daily maximum temperatures. While this is practically true, it would force an awkward selection or estimation of a local bound for each s.
Following the exploratory modeling in the previous section, it also emerged that what is needed is a different, in fact, smaller variance for the above threshold daily maximum temperature distribution than for the below threshold daily maximum temperature distribution. This seems evident since the support for daily maximum temperatures above the threshold is much shorter than the support for daily maximum temperatures below the threshold. Otherwise, Gaussian tails seem to be adequate. Further, we introduce spatially varying variances, expecting that variation in say, Jaca (in the Pyrenees in the north of the region) would be different from variation in say, Zaragoza (flat and central in the region). Again, the exploratory discussion in Section 2 supports this.
For (iii) we introduce a probit link to define
(2) 
with given below.
Now, we turn to the specifics of s and s. There are many possibilities that we can imagine and illustrative choices follow: For consider
Here, denotes a global (across the domain for our dataset) intercept and denotes a local spatial intercept, i.e., providing local adjustment to the global intercept. Each is modeled as a mean Gaussian process with exponential covariance function. For , where denotes the greatest integer function, we are counting years with this subscript and, as a result, the ’s provide annual intercepts to allow for yearly shifts, i.e., for hotter or colder years. The sin and cos terms are introduced to capture annual seasonality with their coefficients reflecting associated amplitudes. In leap years, the specification is analogous, replacing 365 with 366 days. This seasonality is critical to ensure that an annual daily maximum temperature trajectory over the course of a year at a location will provide sensible realizations.
We note that with circularity in time, this definition yields a discontinuity in from December 31 to January 1.^{1}^{1}1We could start the year at another day but, regardless, we will always experience a jump on that day. However, since time is modeled discretely, we do not expect that this will be a concern with regard to EHE behavior. Elev is the elevation at s and Lat is the latitude. We explored additional potential site level covariates, e.g., slope, aspect, distance from water (distance from the Ebro river and the Mediterranean sea) but found no improved predictive performance. Finally, provides a centered AR(1) specification, bringing in the previous day’s temperature, .
For we propose
Here, centering by the threshold yields more sensible transition probabilities. can be moved over to the intercept term in order to provide a spatially varying offset. However, the inclusion of , modeled as a Gaussian process, allows for a richer spatiallyvarying intercept. Writing in terms of being a deviation from suggests that there is no need to model or as spatially varying coefficients. The term with coefficient provides a slope adjustment according whether we are previously in state or in state . This adjustment ensures continuity in (hence, ) as a function of .
These specifications are offered as an attempt to provide regressors that capture the critical features that drive daily maximum temperatures with transitions relative to a threshold state. Other variations of these specifications could be considered. For instance, indicator functions could be considered to introduce an intercept and/or slope adjustment. Also interaction effects might be examined. We explored a few of these richer model specifications and found no additional benefits in model performance.
4 Model fitting and comparison
We briefly summarize the complete model specification including priors as well as the model fitting and its challenges. Then, we turn to the outofsample model comparison.
4.1 Full specification of the model and fitting details
Model inference is obtained in a Bayesian framework, requiring prior distributions for each of the model parameters. When possible, diffuse and conjugate prior distributions are assigned. We start with the parameters of models (i) and (ii),
and . The autoregressive parameters and are each assigned a noninformative and independent Uniform prior distribution. The coefficient parameters , , and as well as , , andare each assigned independent and diffuse normal prior distributions with mean 0 and standard deviation 100. Independent normal prior distributions with mean 0 and standard deviation 100 are also assigned to the coefficients of the seasonal terms,
and . The yearly random effects and are assigned normal distributions with mean 0 and standard deviation 1. For identifiability, each of the random effects for the first year are fixed to 0.Mean 0 Gaussian process priors are assumed for both and . The spatial covariance matrix is specified using the exponential covariance function. The variance parameter for each of the spatial covariances is assigned an independent InverseGamma (2,2) prior distribution. Similarly, the spatially varying variance parameters, log( and log
, are also assigned independent Gaussian process priors. Hyperpriors are assigned to the mean and variance of both processes; each mean is assigned a N(0,1) and each variance is assigned an InverseGamma(2,2). Whereas the other Gaussian processes were mean 0, specifying a hyperprior for the mean of the spatial variances enabled Bayesian learning with respect to the variances of the above and below threshold processes. An exponential covariance function is again used to specify the spatial dependence.
The following priors are assigned to the parameters of model (iii), . Here, independent normal prior distributions with mean 0 and standard deviation 100 are assigned to the coefficients , , , , The spatial random effect is assigned a mean 0 Gaussian process prior. The variance is assigned an InverseGamma (2,2) hyperprior and the spatial covariance is again specified with an exponential covariance function.
Altogether, the model consists of five Gaussian processes, , , log(), log(, and . Spatial models are needed for each of these components in order to be able to predict at new locations. Each of these Gaussian processes is assumed independent with the range parameter of the exponential covariance function fixed such that the effective range is equal to 400 km. This distance is approximately the maximum distance across the Comunidad Autónoma de Aragón region from the north to the south and was chosen to capture largescale spatial dependence across the region. This choice assumes that the local behavior in daily maximum temperature will be captured by the autoregressive process in the model.
Markov chain Monte Carlo is used to obtain samples from the joint posterior distribution. The sampling algorithm is a MetropoliswithinGibbs algorithm. Posterior draws of each of the spatial random effects are obtained using an efficient elliptical slice sampler (Murray et al., 2010).
4.2 Model comparison and inference
For model comparison, we confine ourselves to just two models, the specification above and a corresponding specification which ignores state relative to threshold. This second model will be an AR(1) time series model for the daily maximum temperatures. This is analogous to our models in (i) above with , however, now without the truncation according to the state. That is, we specify a normal distribution for . In both models we adopt spatially varying variances for prediction at new locations. Under the simpler model, we impose thresholds on the posterior predictive distributions after model fitting in order to capture EHEs and their characteristics.
Model comparison is implemented by singlepoint deletion (leaveoneout) validation. While we have done single site deletions for all sites, we show results for three illustrative locations, one in the southern part of the region, one central, and one in the north. With primary interest in capturing persistence of EHEs, comparison is made using conditional and marginal error rates (defined below) with regard to prediction of an EHE day. We shall see that our proposed model is substantially better at such prediction.
Then, using our model, inference is provided, employing posterior predictive summaries, for the EHE characteristics presented in Section 1  duration, maximum exceedance above threshold and average exceedance above the threshold. Such inference is provided for the entire time span of the data as well as by decade for two decades of interest.
The thresholds, , are known/observed for each monitoring site, adopting a th percentile of the daily maxima at a site over the extreme heat period, June, July, and August (e.g., Abaurrea et al., 2007) as described in Section 2. These values were used directly in the model fitting for the twostate models specified above and are known for our leaveoneout validation. For prediction beyond our sites, we assume these thresholds can be supplied from external information for unobserved locations of interest. Alternatively, with interest in prediction of given an observed set , we could propose a stochastic process specification. Spatial quantile regression models using asymmetric Laplace processes (ALP) are one approach (Lum and Gelfand, 2012); however this is beyond the scope of our work here.
5 Results of model comparison and inference
We begin with the results of the model comparison using leaveoneout cross validation and then provide the inferential summaries from our model for the region of interest. Comparison is between an autoregressive model which ignores thresholds in the specification (base model) and the model presented in Section 3. Each model was fitted to the daily maximum temperature data for the 50 years spanning 1966 to 2015. Markov chain Monte Carlo was run for 200,000 iterations. The first half of each chain was discarded as burnin and the remaining samples were used for posterior inference and prediction.
5.1 Model comparison
A leaveoneout comparison of the model is carried out with three series, Zaragoza, Tornos and Yesa. These series show the variation in climate across the study region. Zaragoza is located in the center of the region, surrounded by other locations in the dataset with similar climate. Yesa is located in a valley in the northwest part of the region, with a climate that is quite different from that of the rest of locations in that area. Tornos is near the southwest border of the region with a climate much different from that of its neighbors to the east with the same latitude.
As noted in the previous section, we conduct our model comparison through prediction of exceedance days. The prediction of an exceedance day requires prediction of . Since
is a binary variable, it is natural to assess its performance in terms of misclassification error rates. There are two possible errors here: (i) predicting
when and (ii) predicting when . Arguably, the second error has more impact, since it means failing to predict well an extreme heat exceedance given that it happened.Three different measures of error for days when are considered, two conditional and one marginal error. The first conditional error is
where we condition on previous days which were exceedance days. It gives the error in persistence of heat days, so that smaller errors of this type imply better prediction of persistence. The second conditional error is , where we condition on previous days which were non exceedance days, and it represents the error in predicting the first day of an EHE. The marginal error is , i.e., given the previous day’s temperature regardless of whether or not it was an exceedance.
Each error rate is estimated as the average of daily point estimates for over a selected time window. Here, we choose a 92 day window, arising from the months June, July and August (JJA), since this is when most of the exceedance events occur. We can then average over a year, decade, or the entire time window.
Figure 5 shows the distributions of and , for all exceedance days in JJA across all years for the three outofsample locations. It is observed that in both cases, the distribution of the errors of the base model is shifted towards higher values, and the modes of the errors in our model are lower than in the base model.
The posterior predictive mean estimates of the error rates for all exceedance days in both models are summarized in Table 3 for the two 10year periods. The performance of our model is clearly better, since the marginal errors and the errors conditioned on are lower in all the cases, with a reduction in the latter that varies from 17 to 48%. Therefore, our model captures the observed persistence of exceedance days better than the base model. With regard to the conditional measures of error given , or rather, those concerning the predictions of the first day of an EHE, our model outperforms the base model with less distinction. For example, in Tornos, the errors of our model are lower, but closer to those of the base model, while in Zaragoza, the errors of both models are similar.
While these error rates might seem large, it is important to clarify their interpretation and utility. Recall that exceedance days are not common; the locationspecific thresholds are at the 95% percentile. Furthermore, when they occur, there are usually short in duration meaning their persistence is low. (We note that failure rate in predicting them can not be compared to say, a fair coin flip since coin flips would reveal dismal predictive performance outside of JJA.)
To investigate the predictive performance of our model with regard to persistence, we expand our conditional measures of error. Table 4 shows the posterior predictive mean estimates of the conditional error rates for all exceedance days, given that the EHE has already persisted for one, two, or three days. In general, the conditional error estimates are lower for those events with higher levels of persistence. Compared to the base model, the error rates in our model are lower for each location and level of persistence.
19761985  20062015  

Base  Our  Base  Our  
Model  Model  Model  Model  
Marginal Error:  
Tornos  0.72  0.56  0.72  0.60 
Zaragoza  0.70  0.63  0.64  0.59 
Yesa  0.77  0.68  0.75  0.68 
Conditional Error, :  
Tornos  0.58  0.39  0.56  0.40 
Zaragoza  0.61  0.44  0.48  0.41 
Yesa  0.63  0.48  0.62  0.51 
Conditional Error, :  
Tornos  0.85  0.72  0.86  0.78 
Zaragoza  0.79  0.79  0.82  0.80 
Yesa  0.85  0.85  0.88  0.84 
Base  Our  

Conditional Error  Model  Model 
:  
Tornos  0.58  0.40 
Zaragoza  0.65  0.48 
Yesa  0.61  0.54 
:  
Tornos  0.52  0.34 
Zaragoza  0.58  0.41 
Yesa  0.56  0.44 
:  
Tornos  0.51  0.33 
Zaragoza  0.56  0.37 
Yesa  0.54  0.42 
5.2 Model adequacy
We briefly consider model adequacy with regard to the main objective of our study: outofsample prediction of characteristics of extreme heat events. We do this by comparing the posterior predictive distribution of exceedance days as well as EHE characteristics (duration, and intensity) with the observed empirical counterparts. Comparisons are made for the entire time window of the analysis, 19662015, as well as for two 10year periods, 19761985 and 20062015 to examine the time evolution.
Using posterior predictive samples of time series for each of the three outofsample locations, we compute the mean and 90% credible interval of the probability density for events lasting 3 days, 45 days, 67 days, and 8 or more days. That is, for each time series, we compute the proportion of EHEs during a specified time window lasting each of these durations and then compute the mean and 90% credible intervals of these estimates over the posterior predictive time series. These estimates are shown for each of the locations computed over the entire time window in Figure
6. The empirical probabilities are also shown for each location and duration length. The results reveal that, for each site and for each duration bin, our predictive intervals always capture the observed/true proportion. Similar plots are included in the supplementary material for the two decades of interest, 19761985 and 20062015 (Figure S2) and reveal similar results.Next, we turn to the two measures of the intensity of an EHE, the average and maximum excess (degrees over the threshold) during the event (Abaurrea et al., 2007). Following the WMO suggestion provided in Section 1, we focused on EHEs lasting three or mores days. For each posterior predictive time series, we computed the average (or maximum) excess for each EHE and then obtained the cumulative probability of the average (or maximum) excess being greater than or equal to a set of discrete values. These cumulative probabilities provide an estimate of the distribution of average and maximum excess for each location. Using all of the posterior predictive time series samples, we computed the mean and 90% credible of these cumulative probabilities for each location. The estimates are shown in Figure 7 for the average (top) and maximum (bottom) excess for each out of sample locations over the entire time window of the study. Our model appears to capture these cumulative probabilities well for both average and maximum exceedance. Similar plots are shown in Figures S3 and S4 of the supplementary material for the years 19761985 and 20062015 with similar conclusions.
5.3 Model inference
5.3.1 Models for daily maximum temperature
Both the above and below threshold processes for daily maximum temperature include global and local spatial intercepts, annual random effects, fixed effects of trigonometric terms to provide annual seasonality, spatial covariates, elevation and latitude, and an autoregressive term to model temporal autocorrelation. The posterior distributions for the autoregressive coefficients and the coefficients of the spatial covariates are summarized by their mean and 90% credible intervals in Table 5. The temporal dependence is similar in both models with a high autoregressive coefficient of approximately 0.7. The elevation coefficient, which represents the gradient of temperature with respect to the elevation, is negative in both cases, but slightly smaller in magnitude for the below threshold process. On the other hand, there is evidence of a latitudinal effect only in the below threshold process, where it reflects a temperature decrease with increasing (northern) latitudes. The credible intervals are more precise in the above threshold process due to a larger sample size.
Below threshold  Above threshold  

process  process  
Intercept  18.74 (18.60, 18.88)  23.58 (22.90, 24.24) 
Elevation  1.62 (1.88, 1.34)  2.12 (3.26, 1.10) 
Latitude  1.24 (2.10, 0.45)  0.14 (2.02, 3.10) 
Autoregressive coefficient  0.73 (0.73, 0.73)  0.68 (0.65, 0.71) 
Boxplots showing the posterior distributions of the annual random effects, the ’s, in the above and below models are shown in Figure 8. The time evolution is quite different for the two models: while the below threshold process shows a clear increasing trend of daily maximum temperature through time, no monotonic temporal trend is detected in the above threshold process. This signifies that the model detects an overall warming trend in daily maximum temperature between the years 1966 and 2015, yet the extreme heat temperatures are remaining relatively constant. A similar conclusion was obtained in Abaurrea et al. (2007), who did not find any evidence of trend in the distribution of the maximum and mean intensity of EHEs from daily temperature data obtained for the same region. This important inference can only be obtained from specifying different models for daily maximum temperatures above and below the threshold.
The posterior distributions of the spatial random effects, the ’s, at the observed locations suggest that, in general, the spatial local adjustment to the global intercept is not very strong. Only the posterior distributions for Pallaruelo de Monegros and Tornos did not contain zero (supplementary material Figure S5). We did detect spatial differences in the standard deviations, , across the region, both in the above and below threshold process, supporting the hypothesis that variation in different locations is not the same. Additionally, the mean of the spatial processes of the above and below threshold standard deviations had posterior means of 1.75 and 2.75, respectively. This is in accord with our exploratory data analysis in Section 2. Boxplots of these posterior distributions are given in the supplementary material (Figure S6).
5.3.2 The above and below threshold state models
The model for the above and below threshold states includes a global and local spatial intercept, fixed effects for the difference between the previous temperature and the threshold, and trigonometric terms. The posterior distributions of the spatial random effects, , show large spatial variation in the model for the above and below threshold states. Boxplots of these posterior distributions are shown in Figure S7 of the supplementary material.
The posterior mean estimates of the conditional probabilities across day of the year for three values of corresponding to a previous day with temperature below, equal to, and above the threshold, are shown in Figure 9. These conditional probabilities are shown for Tornos (left) and Buñuel (right), as these locations have the most extreme random effects for the local intercept . Since the model is specified using as opposed to raw temperature values, we do not expect to see major differences between these distributions, even though the climate and altitude of these two locations are different. However, we do see larger probabilities for Buñuel than Tornos for all three values of indicating a greater likelihood for EHE events to occur or persist. The mean estimates of the probabilities show a clear seasonal behavior, modeled by the trigonometric terms, with the maximum value occurring on day 203 (July 22, July 21 in leap years). The effect of the previous daily temperature (relative to the threshold) on the probability of being over the threshold is also very strong. For example, the probability in the middle of summer when C is nearly three times the probability when C. This difference is evidence of the increased probability of persistence compared to the probability of the first exceedance day of an EHE.
6 Discussion and future work
We have proposed a threshold driven twostate model for learning about features of extreme heat events over a year period for the region of Aragón in Spain. These features include incidence, duration, maximum of the daily maximum temperatures over the event, and average exceedance above the threshold over the event. Using leave one out validation, we demonstrate that our model predicts these features well.
Future work includes the possibility of comparison of results for the region of Aragón with other regions in Spain since national databases are available. In this regard, we could consider a national assessment of EHE characteristics. In order to do this we would likely need to extend our model to include spatially varying coefficients, anticipating that, e.g., mountainous response to predictors would be different from coastal response to the predictors. An additional extension would envision that there is seasonal variation in uncertainty and would encourage timevarying variances beyond our current above and below spatially varying specifications. A further challenge is to use our approach for forecasting future EHE behavior. Using regional climate model scenarios over 50 year windows, we look to provide useful insight into the evolution of EHE behavior over this future time period.
References
 Modeling and forecasting extreme hot events in the central Ebro valley, a continentalMediterranean area. Global and Planetary Change 57 (12), pp. 43–58. Cited by: §4.2, §5.2, §5.3.1.
 Modelling the occurrence of heat waves in maximum and minimum temperatures over Spain and projections for the period 203160. Global and Planetary Change 161 (), pp. 244–260. Cited by: §1.
 Modeling and projecting the occurrence of bivariate extreme heat events using a nonhomogeneous common Poisson shock process. Stochastic Environmental Research and Risk Assessmeant 29, pp. 309–322. Cited by: §1, §1.
 Global observed longterm changes in temperature and precipitation extremes: a review of progress and limitations in IPCC assessments and beyond. Weather and Climate Extremes 11, pp. 4 – 16. Cited by: §1.
 Projections of heat waves with high impact on human health in Europe. Global and Planetary Change 119 (), pp. 71 – 84. Cited by: §1.
 Heatwave and health impact research: a global review. Health & Place 53, pp. 210 – 218. Cited by: §1.
 Nonstationary extreme value analysis in a changing climate. Climatic Change 127 (2), pp. 353–69. Cited by: §1.
 Bayesian spatial modeling of extreme precipitation return levels. Journal of the American Statistical Association 102 (479), pp. 824–840. Cited by: §1.
 Analysis and modeling of extreme temperatures in several cities in Northwestern Mexico under climate change conditions. Theoretical and Applied Climatology 116 (12), pp. 211–25. Cited by: §1.
 Statistical paleoclimate reconstructions via Markov random fields. The Annals of Applied Statistics 9 (1), pp. 324–352. Cited by: §1.
 Increased risk of heat waves in Florida: characterizing changes in bivariate heat wave risk using extreme value analysis. Applied Geography 46, pp. 90 – 97. Cited by: §1.
 Investigating teleconnection drivers of bivariate heat waves in Florida using extreme value analysis. Climate Dynamics 44 (11), pp. 3383–3391. Cited by: §1.
 Frequency analysis and temporal pattern of occurrences of southern Quebec heatwaves. International Journal of Climatology 25 (4), pp. 485–504. Cited by: §1.
 Use of historical data to assess regional climate change. Journal of Climate 32 (14), pp. 4299–4320. Cited by: §1.
 Evolution of heat wave occurrence over the Paris basin (France) in the 21st century. Climate Research 61 (), pp. 75–91. Cited by: §1.
 Spatial quantile multiple regression using the asymmetric Laplace process. Bayesian Analysis 7 (2), pp. 235–258. Cited by: §4.2.

Elliptical slice sampling.
Journal of Machine Learning Research: Workshop and Conference Proceedings (AISTATS)
9, pp. 541?548. Cited by: §4.1.  Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical Association 83 (401), pp. 9–27. Cited by: §1.
 On the Measurement of Heat Waves. Journal of Climate 26 (13), pp. 4500–4517. Cited by: §1.
 A hierarchical model for seriallydependent extremes: a study of heat waves in the western US. Journal of Agricultural, Biological, and Environmental Statistics 19 (1), pp. 119–135. Cited by: §1.
 A spatial markov model for climate extremes. Journal of Computational and Graphical Statistics 28 (1), pp. 117–126. Cited by: §1.
 A Markovswitching model for heat waves. The Annals of Applied Statistics 10 (1), pp. 74–93. Cited by: §1, §1.
 Markov chain models for threshold exceedances. Biometrika 84 (2), pp. 249–268. Cited by: §1.
 Heat waves in the United States: definitions, patterns and trends. Climatic Change 118 (3), pp. 811–825. Cited by: §1.
Comments
There are no comments yet.