Modeling Climate Variables of Rivers Basin using Time Series Analysis (Case Study: Karkheh River Basin at Iran)

Stochastic (time series) models have been proposed as one technique for generating scenarios of future climate change. Precipitation, temperature and evaporation are among the main indicators in climate studies. The goal of this study is to simulate and model annual precipitation, temperature and evaporation using stochastic methods (time series analysis). Forty years of precipitation data and 37 years of temperature and evaporation data from Jelogir Majin station (upstream of the Karkheh dam reservoir) in western Iran were used. Based on the ARIMA approach, using the auto-correlation and partial auto-correlation functions, parameter estimation and evaluation of candidate model types, suitable models for forecasting annual precipitation, temperature and evaporation were obtained. After model validation and evaluation, predictions were made for the following ten years (2006 to 2015). The predictions indicate that precipitation will decrease relative to recent years, whereas mean annual temperature and evaporation will increase.


Introduction
The study of meteorological parameters is very important in hydrological problems, since these parameters largely form the climate of a region, and their variations (in water, wind, rain, etc.) give rise to problems such as floods and droughts. Therefore, accuracy in the collection of such data is of particular importance. Studying the long-term statistical behavior of climate parameters, and analyzing how a phenomenon behaved in the past, makes it possible to assess its probable trend in the future. Climatic variations can therefore be studied by predicting and estimating parameters such as precipitation, temperature and evaporation from their past behavior [1]. There are basically two approaches to predicting natural variables such as streamflow and precipitation: physical models, which include rainfall-runoff hydrological models, and statistical models, which cover data-driven methods such as time series analysis. The latter type of model fits the series using parameters estimated from the historical data and does not consider any exogenous information that could affect the hydrological regime and, consequently, the electricity generation. Statistical methods have two objectives: (1) understanding the random process and (2) forecasting future values of the series [2]. Within the wide domain of statistical hydrology, time series analysis (in particular the ARIMA model) is a powerful method for analyzing climate variable data from river basins and for predicting non-seasonal hydrologic series. It is used to build mathematical models of climate variable data and to compute their statistics. Time series analysis has developed rapidly in theory and practice since the 1970s for predicting and controlling climate parameters such as precipitation and temperature. This type of analysis generally deals with data that are not independent but serially dependent on one another.
Shamsnia et al. (2009) used stochastic methods to model the monthly precipitation and mean monthly temperature at Shiraz station, Iran. They reported that the suitable models for predicting monthly precipitation and mean monthly temperature are ARIMA(0,0,0)(2,1,0) and ARIMA(2,1,0)(2,1,0), respectively [3]. Gurudeo and Mahbub (2010) applied time series analysis to rainfall and temperature interactions in coastal catchments of Queensland, Australia, and reported that the ARIMA model is suitable for predicting these series [4]. Eni and Adeyeye (2015) applied seasonal ARIMA modeling to predict rainfall in Warri town, Nigeria. An ARIMA(1,1,1)(0,1,1) model was fitted to the series, with an AIC value of 281; model adequacy checks showed that the model was appropriate, and the coefficients of the fitted model were confirmed by residual tests [5]. Sarraf et al. (2011) used ITSM software to model and predict relative humidity and mean monthly temperature at Ahvaz synoptic station, Iran. The ARIMA(0,0,1)(0,1,1) and ARIMA(0,0,1)(2,1,2) models were fitted to the mean monthly temperature and relative humidity series, respectively [6]. Jahanbakhsh and Babapour-Basser (2003) used the ARIMA model for the mean monthly temperature of Tabriz station, Iran; the monthly temperature over a forty-year period was examined using auto-correlation and partial auto-correlation methods, and the normality of the residuals was checked [7]. Wang et al. (2014) used an improved ARIMA model to predict monthly precipitation at the Lanzhou station in Lanzhou, China. The results showed that the accuracy of the improved model is significantly higher than that of the seasonal model, with the mean residual reaching 9.41 mm and the prediction accuracy increasing by 21% [8]. Dodangeh et al. (2012) employed time series modeling to predict climatic parameters such as evaporation, temperature and relative humidity. The selected models were ARIMA(0,0,1)(0,0,1), ARIMA(2,0,4)(1,1,0), ARIMA(4,0,0)(0,1,1), ARIMA(1,0,1)(0,1,1) and ARIMA(1,0,0)(0,1,1) for relative humidity, evaporation, air temperature, wind speed and sunshine, respectively [9]. Frausto et al. (2003) showed that AR and ARMA models could be used to describe the inside air temperature of an unheated [10]. Hamidi Machekposhti et al. (2017) studied stochastic models of the inflow to Karkheh dam, Iran, and suggested that ARIMA(4,1,1) is the best stochastic model for annual mean streamflow [11]. Sathish and Khadar Babu (2017) predicted hydrological time series for water resources (such as floods) in river basins in India using stochastic time series analysis (the Thomas-Fiering model) [12]. Ayare and Dhekale (2015) studied multiplicative seasonal ARIMA models for monthly streamflows of the Choriti River at the Natuwadi dam site in the Konkan region, Maharashtra, based on 20 years of data. They found that the ARIMA(0,0,1)(0,1,1) model gave closer agreement between the forecasted and historical monthly inflow series [13]. Akpanta et al. (2015) built seasonal ARIMA (SARIMA) models for monthly rainfall in Umuahia, Abia State, Nigeria, and suggested that the SARIMA(0,0,0)(0,1,1) model can be used for monthly forecasting [14]. Etuk et al. (2014) identified and established the adequacy of a seasonal ARIMA(5,1,0)(0,1,1) model for modeling and forecasting monthly rainfall in Port Harcourt, Nigeria [15]. Bari et al. (2015) forecasted monthly precipitation in Sylhet city, Bangladesh, using ARIMA models and found ARIMA(0,0,1)(1,1,1) to be the most effective for predicting future precipitation with a 95% confidence interval [16].
Therefore, given the importance of precipitation, temperature and evaporation and their role in determining other climatic elements, modeling and predicting them with advanced statistical methods is a necessity and could be a basic pillar of hydrology, agriculture and water resource management.
The goal of the present study is to analyze the behavior of precipitation, temperature and evaporation, and to simulate them and provide models for their prediction using time series analysis (ARIMA models) at the Jelogir Majin station (upstream of the Karkheh dam reservoir) in the Karkheh basin, western Iran. The specific objectives are: (1) to develop stochastic time series (ARIMA) models for predicting precipitation, temperature and evaporation in the Karkheh river basin; (2) to estimate the parameters of the ARIMA models for annual precipitation, temperature and evaporation; and (3) to test the validity of the predicted annual precipitation, temperature and evaporation against measured values and to evaluate the performance of the best selected models.

Materials and Methods
In this study, annual data on precipitation, temperature and evaporation at Jelogir Majin station were used; the required information was collected from the tables and databases of the Iran Water Resources Management Company (IWRMC). Jelogir Majin station lies in the Karkheh river basin in Khuzestan province, western Iran. The study area is located between 46˚ 57′ and 49˚ 10′ E longitude and 31˚ 48′ and 34˚ 58′ N latitude, with an elevation of 1216 m and an area of 50,000 km². The mean annual precipitation is 510 mm, the mean annual temperature of the study area is about 13.5 °C, and evaporation exceeds 2000 mm per year. The geographical location of the study region is shown in Figure 1 (station number 9). This station is located upstream of the Karkheh dam reservoir, supplies most of the water entering the reservoir, and has the greatest impact on the reservoir's water. The statistical period under study covers the crop years 1966-1967 through 2015-2016 for the precipitation series and 1969-1970 through 2015-2016 for the temperature and evaporation series. The data up to the crop year 2005-2006 were used to train the models, and the last ten years were used for prediction. Initially, the homogeneity of the data was confirmed using the run test; a homogeneity test should be carried out before statistical analysis to ensure that the data are random. The homogeneity testing was done using the SAS and SPSS software packages. Then, based on the results obtained and on the sequence of observations and the past behavior of the phenomenon, an appropriate model was devised for prediction using time series analysis and stochastic methods. To model the data, the time series of annual precipitation, temperature and evaporation were prepared separately and then made stationary. Fitting an ARIMA model to a time series consists of three phases: model identification, parameter estimation and diagnostic checking [17].
The identification stage determines the differencing required to produce stationarity and the orders of the autoregressive (AR) and moving average (MA) operators for a given series. Stationarity is a necessary condition for building an ARIMA model that is useful for prediction. A stationary time series has the property that its statistical characteristics, such as the mean and the auto-correlation structure, are constant over time. When the observed time series shows trend and heteroscedasticity, differencing and power transformation are often applied to remove the trend and stabilize the variance before an ARIMA model is fitted.
The estimation stage uses the data to estimate, and make inferences about, the parameters conditional on the tentatively identified model. The parameters are estimated such that an overall measure of the residuals is minimized, which can be done with a nonlinear optimization procedure. Several estimation methods are available, including Maximum Likelihood (ML), Conditional Least Squares (CLS) and Unconditional Least Squares (ULS). Among these methods, maximum likelihood seems to be the best. The parameters should be statistically significant at the α = p% level and satisfy two conditions: stationarity for the autoregressive part and invertibility for the moving average part.
Diagnostic checking of model adequacy is the last stage of model building. This stage determines whether the residuals are independent, homoscedastic and normally distributed. Several diagnostic statistics and residual plots can be used to examine the goodness of fit; if the model is inadequate, a new tentative model should be identified, followed again by parameter estimation and model verification. Diagnostic information may help to suggest alternative model(s). The most common tests of time-independence and normality are the Portmanteau lack-of-fit test and the nonparametric Kolmogorov-Smirnov and Anderson-Darling tests. This three-step model building process is typically repeated several times until a satisfactory model is finally selected, which can then be used for prediction. Plotting the original series may reveal trends in the mean and variance [18]. The ARIMA model is essentially an approach to forecasting time series data; however, it requires stationary time series data [19].
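The identification stage can be illustrated with a minimal sketch, written here in plain Python rather than the SAS and SPSS packages used in the study. It applies first differencing and computes the sample auto-correlation function with the approximate 95% limits of ±1.96/√n. The function names and the synthetic series are hypothetical and serve only as an illustration; they are not the station data.

```python
import math

def difference(series, d=1):
    """Apply d rounds of first differencing: x[t] - x[t-1]."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def sample_acf(series, max_lag):
    """Sample auto-correlation r_k for k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

def significant_lags(series, max_lag):
    """Lags whose |r_k| exceeds the approximate 95% limit 1.96/sqrt(n)."""
    limit = 1.96 / math.sqrt(len(series))
    acf = sample_acf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > limit]

# Hypothetical annual values with a linear trend (NOT the Jelogir Majin data).
annual = [500.0 + 3.0 * t + (10.0 if t % 2 else -10.0) for t in range(40)]
diffed = difference(annual, d=1)  # d = 1 removes the linear trend
print(significant_lags(diffed, max_lag=8))
```

Lags flagged by such a check are only candidates for the AR and MA orders; the tentative model still has to pass the estimation and diagnostic stages.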

The Modeling Procedures
Time series modeling can be carried out by several methods, one of which is the ARIMA or Box-Jenkins method, also called the (p,d,q) model [18]. In the (p,d,q) model, p denotes the number of autoregressive (AR) terms, q denotes the number of moving average (MA) terms and d is the order of differencing (I), that is, the number of differencing operations required to bring the series to statistical equilibrium. In an ARIMA model, (p,d,q) is called the nonseasonal part of the model; p describes the dependence of the series on its own past values, and q describes its dependence on the random factors involved in its construction. The mathematical formulation of ARIMA models is shown in Equation 1, where X(t) is the value of the variable at time t and Z(t) is the residual term of the model, a white noise with constant variance [20].
A time series is analyzed in several stages. At the first stage, the initial values of p, d and q are determined using the Auto-correlation Function (ACF) and Partial Auto-correlation Function (PACF). A careful study of the ACF and PACF plots gives a general view of the structure of the time series, its trend and its characteristics. This general view is usually the basis for selecting a suitable model; the plots are also used to confirm the goodness of fit and the correctness of the model selection. At the second stage, it is examined whether the estimated p and q terms should remain in the model or be removed. At the third stage, it is evaluated whether the residuals are random and normally distributed; only then can the model be said to fit well and to be appropriate.
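For reference, one standard way of writing the ARIMA(p,d,q) model in terms of X(t) and Z(t) is sketched below. The backshift operator B, the coefficients φ_i and θ_j, and the sign convention of the MA polynomial are notation assumed here for illustration and may differ from the form used in [20].

```latex
\phi(B)\,(1 - B)^{d} X_t = \theta(B)\, Z_t, \qquad Z_t \sim \mathrm{WN}(0, \sigma^2),
\quad\text{where}\quad
\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^{p}, \qquad
\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^{q}, \qquad
B X_t = X_{t-1}.
```

With d = 1, the differencing factor reduces to (1 − B)X_t = X_t − X_{t−1}, so the ARMA(p,q) structure is fitted to the year-to-year changes of the series rather than to its levels.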

Model Selection Criteria
Several candidate models may be suitable for describing a given set of data. Sometimes the selection is easy, whereas at other times it is much more difficult. Therefore, numerous criteria have been introduced for comparing models, which differ from the methods used for model identification. Some of these criteria are based on statistics summarized from the residuals of a fitted model, such as the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC) and the Schwarz Bayesian Criterion (SBC); others are based on the prediction error computed from out-of-sample predictions, such as the Mean Percent Error (MPE), the Mean Square Error (MSE), the Mean Absolute Error (MAE) and the Mean Absolute Percent Error (MAPE). The model for which these statistics are lowest is selected as the suitable model. Akaike (1979) suggested a mathematical formulation of the parsimony criterion of model building, the Akaike Information Criterion (AIC), for selecting the optimal model for a given data set. The AIC is defined as [21]:
$$\mathrm{AIC}(M) = n \ln \sigma_a^2 + 2M$$
where M is the number of AR and MA parameters to estimate, σ²_a is the residual variance and n is the number of observations. The model that gives the minimum AIC is selected as the parsimonious model. The summarized steps of time series modeling (ARMA and ARIMA models) are shown in Figure 2.
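The minimum-AIC rule can be sketched in a few lines of Python. The sketch assumes the common form AIC(M) = n ln σ²_a + 2M, with M, σ²_a and n as defined above; the candidate labels, residual variances and sample size are hypothetical values chosen only to show the trade-off between fit and number of parameters, not results from this study.

```python
import math

def aic(n_obs, residual_variance, n_params):
    """AIC(M) = n * ln(sigma_a^2) + 2 * M."""
    return n_obs * math.log(residual_variance) + 2 * n_params

# Hypothetical candidates: (label, residual variance, number of AR + MA parameters).
n_obs = 39
candidates = [
    ("A", 9500.0, 1),
    ("B", 9300.0, 2),
    ("C", 6000.0, 9),
]

# Score every candidate and keep the one with the smallest AIC.
scores = {label: aic(n_obs, var, m) for label, var, m in candidates}
best = min(scores, key=scores.get)
print(best, round(scores[best], 2))
```

The 2M term penalizes each additional parameter, so a higher-order model is preferred only when it reduces the residual variance enough to offset that penalty.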

Results and Discussion
The time series of precipitation, temperature and evaporation at Jelogir Majin station are shown in Figures 3 to 5, together with trend lines used to check the stationarity of the series. Figure 4 shows a slight increasing trend for temperature, and Figures 3 and 5 show slight decreasing trends for precipitation and evaporation. The original series are therefore not stationary, and first differencing (d = 1) was applied to obtain stationary series. The ACF and PACF plots of the differenced series (Figures 6 to 8) show significant peaks at lags 4, 6, 7 and 8, where the ACF and PACF coefficients fall outside the confidence limits. The maximum likelihood (ML), conditional least squares (CLS) and unconditional least squares (ULS) methods were used to estimate the model parameters. The parameter estimates for the three candidate models (1,1,0), (1,1,1) and (8,1,1) for precipitation, (1,1,0), (1,1,1) and (6,1,1) for temperature, and (1,1,0), (1,1,1) and (7,1,1) for evaporation are shown in Tables 1 to 3. These tables show that all the models are suitable for these data, because the absolute values of the estimated AR and MA parameters of the ARIMA(p,1,q) models are less than 1, so the models satisfy the stationarity and invertibility conditions.
After estimating the model parameters, diagnostic checking was applied to see whether each model is adequate, using the Portmanteau lack-of-fit test and the residual auto-correlation function. The results (Tables 4 to 6 and Figures 9 to 11) show that all the selected models are adequate for the series studied. In Tables 4 to 6, the probability values of the selected models are greater than the chosen significance level of 0.05, so the selected models are suitable. Figures 9 to 11 show the auto-correlograms of the residual series for annual precipitation, temperature and evaporation, respectively; in these figures all residual auto-correlations lie within the confidence limits, which again indicates that the selected models are suitable.
The goodness-of-fit statistics are shown in Table 7, which gives the AIC values for all the selected models. The model with the lowest AIC value is the best model. Accordingly, ARIMA(8,1,1) with CLS parameter estimation is the best model for precipitation, and ARIMA(6,1,1) and ARIMA(7,1,1) with ML parameter estimation are the best models for temperature and evaporation, respectively. The predicted values for the crop years 2006 to 2015 are shown in Table 8 and Figures 12 to 14. For model validation, Figures 15 to 17 show the correlation between the observed (x) and predicted (y) data from the ARIMA models for the crop years 2006 to 2015. The coefficients of determination (R²) for precipitation, temperature and evaporation are 0.69, 0.63 and 0.84, respectively. Given these correlations between observed and predicted data, the selected models are suitable for simulating and predicting precipitation, temperature and evaporation. The models' predictions for the years 2006-2015 are consistent with what actually occurred in the area.
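The two checks used in this section can be sketched as follows. The sketch assumes the Ljung-Box form of the Portmanteau statistic, Q = n(n+2) Σ r_k²/(n−k), and takes R² as the squared Pearson correlation between observed and predicted values; the text does not state which variants the software applied, so both are assumptions. The function names and sample numbers are hypothetical.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k) on model residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(v * v for v in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        # Lag-k residual auto-correlation.
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mx = sum(observed) / n
    my = sum(predicted) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(observed, predicted))
    sxx = sum((x - mx) ** 2 for x in observed)
    syy = sum((y - my) ** 2 for y in predicted)
    return sxy * sxy / (sxx * syy)

# Hypothetical observed and predicted annual values (NOT the study data).
observed = [410.0, 520.0, 480.0, 390.0, 450.0]
predicted = [430.0, 500.0, 470.0, 420.0, 440.0]
print(round(r_squared(observed, predicted), 3))
```

Q is compared with a chi-square distribution with h − p − q degrees of freedom, where h is the number of lags tested; a probability value above 0.05 means the hypothesis of uncorrelated residuals is not rejected.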

Conclusions
Time series analysis is an important tool for modeling and predicting climatic parameters such as precipitation, temperature and evaporation. In this study, the Box-Jenkins (ARIMA) model was used to simulate and predict annual precipitation, temperature and evaporation in the Karkheh river basin, Iran, and the final models were selected using the AIC criterion. ARIMA(8,1,1) with the CLS estimation method was the suitable model for annual precipitation, and ARIMA(6,1,1) and ARIMA(7,1,1) with the ML estimation method were suitable for annual temperature and evaporation, respectively. These models were developed through step-wise analysis of nonseasonal parameters and various diagnostic checks. The predictions for the ten-year period are considered to be accurate: for model validation, the coefficients of determination (R²) for precipitation, temperature and evaporation were 0.69, 0.63 and 0.84, respectively. Consequently, the models can be used for predicting the variables studied. By generating scenarios for the next few years, they can assist policy makers and decision makers in planning for disasters or extreme conditions in the districts and towns of the Karkheh river basin. The predictions indicate that precipitation is likely to decrease. The increasing trend of mean temperature and evaporation, especially in recent years, has continued, and the predictions show an increase in temperature along with a narrowing of its range of variation.

Figure 2. Steps of time series modeling (ARMA and ARIMA model)
In the present study, the ARIMA model, the SPSS and SAS software packages and the AIC criterion were used for modeling and predicting climatic parameters including precipitation, temperature and evaporation. The SPSS and SAS software determine the best model as the one with minimum AIC. The best model was also validated using model efficiency: to evaluate it, the correlation coefficient was used for the crop years 2006 to 2007 through 2015 to 2016 for the series studied.

Figure 9. Auto-correlogram of residual series parameter for annual precipitation

Table 4. Results of the auto-correlation check of residuals for annual precipitation
ML: Maximum Likelihood; CLS: Conditional Least Squares; ULS: Unconditional Least Squares

Table 5. Results of the auto-correlation check of residuals for annual temperature
ML: Maximum Likelihood; CLS: Conditional Least Squares; ULS: Unconditional Least Squares

Figure 10. Auto-correlogram of residual series parameter for annual temperature

Table 6. Results of the auto-correlation check of residuals for annual evaporation
ML: Maximum Likelihood; CLS: Conditional Least Squares; ULS: Unconditional Least Squares

Figure 11. Auto-correlogram of residual series parameter for annual evaporation