Flood Analysis in Karkheh River Basin using Stochastic Model ,

This study analyzed the annual streamflow of Karkheh River in Karkheh river basin in the west of Iran for flood forecasting using stochastic models. For this purpose, we collected annual stremflow (peak and maximum discharge) during the period from 1958 to 2015 in Jelogir Majin hydrometric station (upstream of Karkheh dam reservoir). A time series model (stochastic model or ARIMA) has three stages consists of: model identification, parameter estimation and diagnostic check. Model identification was done by visual inspection on the Autocorrelation and Partial Autocorrelation Function. Three types of ARIMA(p,d,q) models (0,1,1), (1,1,1) and (4,1,1) suggested for the studied series. The suggested model parameters were computed using the Maximum Likelihood (ML), Conditional Least Square (CLS) and Unconditional Least Square (ULS) methods. In model verification, the chosen criterion for model parsimony was the Akaike Information Criteria (AIC) and the diagnostic checks include independence of residuals. The best ARIMA model for this series was (4,1,1), with their AIC values of 88.9 and 77.8 for annual peak and maximum streamflow respectively. Forecast series up to a lead time of ten years future (2006 to 2015) were generated using the accepted ARIMA models. Model accuracy was checked by comparing the predicted and observation series by coefficient of determination (R2). Results show that the ARIMA model was adequate for the flood analysis in Karkheh River and the forecast of the series in short time at future.


Introduction
Flood analysis is a form of extreme value analysis in nature.The main interest in analyzing extreme hydrological events is not in what has occurred but possibilities that further extreme events such as flood will occur in the future.Flood analysis in particular, allows hydrologists and statisticians to estimate future flood occurrence probabilities as well as the peak magnitude of streamflow.Another reason flood analyses are important is that the design and operation of hydraulic structures such as dams and reservoirs are determined based on them.Flood modelling depends on available data to generate efficient estimations.There are several approaches for hydrological modelling such as deterministic, probabilistic and stochastic.The stochastic models are related to the probability models in the sense that both types of models have random variables.Time series analysis and regression techniques are applied in order to build a stochastic model in flood analysis.The chosen method of study falls under the category of time series modelling.Time series is commonly used in the field of hydrology and water resource management.The beauty of time series modelling is that future values of a variable can be estimated using its historical values.A time series often exhibits trends, sometimes shifts (jumps), seasonality and periodicity.These attributes are referred to as components in Equation 1.The components this equation are trend (Tt), seasonal component (St), cyclical component (Ct) and irregular component (et).2017) studied the stochastic model to inflow of Karkheh dam at Iran and suggested ARIMA (4,1,1) is the best stochastic model for annual mean streamflow [4].Time series modelling of annual maximum flow (AMF) of River Indus at Sukkur India to be examined by Shakeel et al. (1993) and they found that the suitable stochastic model was ARIMA (2,1,1) in this river [15].The applied of ACF and PACF for annual flow of Australian streams were studied by Srikanthan et al. (1983).They determined the appropriate form of stochastic models (Box-Jenkins time series models) for annual flow [17].O'Connel (1977) suggested a type of stochastic models (ARMA (1,1)) to generate synthetic flow series and predict streamflow in future.[13].Stojković et al. (2015) suggested that the stochastic flows (annual discharges) simulated by the stochastic model can be used for hydrological phenomenon simulations in river basins of large European rivers [18].In a study, Vijaya kumar and Vennila (2016) found that an ARMA (2,4) model is the suitable model for generated and predicted annual inflow of Krishnagiri reservoir in the state of Tamilnadu at India [20].Musa (2013) studied ARMA model for flow discharge from the Shiroro River (about 22 years (1990-2011)) and analyzed with 3 different models namely; AR, ARMA and ARIMA models.Based on the model analysis and evaluations, appropriate predictions were made for the effective usage of the flow from the river for farming activities and generation of power for both industrial and domestic [10].Huang et al. (2016) analyzed the annual maximum stage readings of three rivers in Langat River basin in Malaysia for flood predicting using stochastic model (ARIMA model).They found that ARIMA(1,1,0), (1,1,0) and (1,1,1) were appropriate models for the Dengkil, Kg.Lui and Kg.Rinching series respectively, with their AIC values of 133.736, 55.348 and 42.292 [5].This study develop stochastic models (ARIMA models) for prediction of flood (annual extreme streamflow such as peak and maximum discharge) using Box-Jenkins methodology in Jelogir Majin hydrometric station (upstream of Karkheh dam reservoir) in Karkheh river of Karkheh river basin in Khuzestan state at Iran.

The purpose of this study is:
(1) To generate or develop stochastic time series model (ARIMA model) for prediction of flood in Karkheh river basin.
(2) To estimate parameters of ARIMA model for annual streamflow (annual peak and maximum discharge) and.
(3) To test the validity of the annual predicted streamflow with measured and evaluated the performance of the best selected model.

Study Area and Data Collection
The study area is the Karkheh river basin in west of the Iran, located in the central and southern regions of the Zagros mountain range and its area is more than 50000 km 2 .In terms of the geographical coordination, this region has been extended between 46˚ 57′-49˚ 10′ E longitudes and 31˚ 48′-34˚ 56′ N latitudes.The Karkheh River is the third largest river in Iran with 900 km long.This river is directly connected to the Karkheh dam, the largest surface reservoir in the region, which has an important role in supplying water to the region.The annual streamflow (annual peak and maximum discharge) selected for modelling from 1958 to 2015 (58 years) in Jelogir Majin hydrometric station (station number 9 in Figure 1).This station is located at the upper reaches of the reservoir of Karkheh dam and is the supplier of the most water entering the dam reservoir and has the greatest impact on reservoir water dam.The plotted of data are shown in Figures 3 and 4.This data were taken from Iran Water Resources Management Organization (IWRMO).The goal of this study is to decrease the flood problems in Karkheh river basin through developing stochastic models (ARIMA model) for the study Karkheh river using Box-Jenkins approach and then, forecast future annual peak and maximum streamflow (discharge) values in this river by the best ARIMA model.

ARIMA Model
The ARIMA modelling is actually an approach that has the flexibility to fit a model which is adapted from the data structure itself.The time series' stochastic nature can be modelled by the help of the computed Auto-correlation and Partial Auto-correlation Function (ACF and PACF) and fundamental information such as trend, periodic components, random components and serial correlation can be obtained.The Box-Jenkins approach to ARIMA modelling is an iterative model building process where the best models have to be determined through trial and error.However, with the advent of computers and statistical software, this iterative process can be simplified.Commonly SAS, SPSS, MINITAB and STATISTICA software use for this purpose.The basic methodology of ARIMA development is shown in Figure 2. The ARIMA model has three main components of an ARIMA model are AR, I and MA.The AR component represents the auto-correlation between past and current observations, the MA component describes the autocorrelation structure of error and I component represents the level of differencing required to transform a non-stationarity series into a stationary series.A non-seasonal ARIMA model is usually denoted by (p,d,q).The order of the AR, I and MA components are denoted by p, d and q respectively.The general ARIMA (p,d,q) model is: (3) Which Φp = auto-regressive parameter, εt = residual, θq = moving-average parameter, U=d th difference of the dependent variable and X= dependent variable.

Stationary
Before the Box-Jenkins approach can be carried out, we need to recognize if time series is stationarity or nonstationarity.For this purpose, there are many ways to determine stationary such as Augmented Dickey-Fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test and trend tests (Mann-Kendall trend test).

Independence
The basic assumption in Box-Jenkins approach is that the residuals of an ARIMA model are white noise.A white noise series have uncorrelated random shock with zero mean and constant variance.If the residuals are independent, it means that there is no more information that could be extracted from the series.One of the ways to determine the independence is to visually inspect the correlogram of the residuals.If the correlogram shows values that are close to zero, the residuals are uncorrelated and independent.

Transformation
Many statistical analyses are done based on the assumption that the population being investigated is normally distributed with a common variance.One of the popular transformation methods is the Box-Cox transformation.Also another method is Ln transformation of natural series.

Stationarity Tests
Unit root tests such as the Augmented-Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test were carried out to test the presence of a unit root while the Mann Kendall trend test was performed to check for the presence of a trend.The presence of a unit root or a trend should indicate non-stationarity of the series.The significance level used was 5%.If the series is non-stationary, differencing is required to transform it into a stationary series.On the other hand, if the series is stationary, the series is modelled as an ARMA process instead, which requires no differencing.

Differencing
The series was initially differenced once (d=1) and the ACF and PACF of the differenced series were plotted and analyzed.If the ACF and PACF decay rapidly then it indicates stationarity is achieved.Another indicator is the standard deviation of the differenced series.The optimum differenced series should have the lowest standard deviation.If the standard deviation of the current series is lower than that of the previous series, then the current series has the optimum order of differencing.

Plotting the Series and Its ACF and PACF
The main tools used for identification of model parameter were the visual displays of the series, which included the plot of natural data against time, which will show up important aspect of a time series such as trend, seasonality, outliers and etc., ACF and PACF.By using the annual streamflow as the input time series, the auto covariance function (c), the autocorrelation coefficients () and the partial autocorrelation coefficients (()) were calculated and the series with its ACF and PACF were plotted using SAS and SPSS software.The number of lags k should fall between N/4 and N, therefore the chosen number of lags in this study was sufficient.
If all the ACF and PACF values are insignificant and fall within the confidence band, it indicates that the observations are independent.In such a case the time series is a white noise process and no modelling could be performed.A stationary time series has a rapidly decaying ACF.If the ACF is slow decaying, it indicates that the series may be non-stationary and requires differencing.

Identifying p and q
Having identified the optimum order of differencing (d), the next step was to identify the order of the autoregressive (p) and moving average (q) parameters using ACF and PACF of series.

Choosing the Best ARIMA Model
The previous step gave an indication of the order of p and q that should be fitted in the model.However, it was recommended to try a few different values of p and q to get the best model while preserving the parsimony of the parameters.To test for the parsimony of parameters, the Akaike Information Criteria (AIC) was used.The model with the minimum AIC was selected as the best model.The SPSS and SAS software can find the best model based on the AIC values calculated for a range of p and q.In this study, the maximum p and q were selected four.

Diagnostic Checks
After estimating the model parameters, the diagnostic checking is applied to see if the model is adequate or not.Therefore the following statistical tests are used:

A. Port Manteau Lack of fit Test
Port manteau lack of fit test is used for this purpose.It is a test of the residual independency and uses the Q-statistic defined as: Where rk (at) is the autocorrelation coefficient of the residual (at) at lag k, and M is the maximum lag considered (about N/4), ARIMA model is considered adequate if p>chi square is greater than the level of significant 0.05.

B. Residual autocorrelation Function Test
The second test is the independency of the resulting (at) series, the correlogram of this series are computed for lag (M=N/5).The figures of RACF and RPACF (residual autocorrelation and partial autocorrelation function) show that the most of computed lags inside the tolerance interval (±2/√N, at 95% confidence limits).

Series Comparison and Forecasting
Forecasting can be categorized into long term and short term forecasting.Short term forecasting can predict values that are a few time periods (a few years) into the future while long term forecasting can predict values for time periods that extend far beyond that.In terms of applications, long term forecasts are used for strategic planning while short term forecasts are used for project developments as well as operation management.Statistical methods are good for short term forecasting because the historical data normally exhibit inertia and do not show drastic changes [7].Short term forecasting is based on identifying, modelling and extrapolating the patterns found in the data.The best model that passed the diagnostic checking, will predict the data series in future.We determine the degree of similarity between the predicted and observation data series by coefficient of determination (R 2 ).If the pattern of the predicted series appears similar to the pattern of the original series, then the fitted model is a good model.The final step was to generate a forecast of future values.The ARIMA model can predict future values as well as its confidence interval using the calculated model parameters.In this study, the chosen number of predicted values was ten, which means that the values were predicted for the next ten years after the last observation.

Results and Discussion
Before to start of modelling, we must test the two series for normal distribution.In this study, the natural series have not normality therefore we gave Ln transformation of natural series.Plotting the observations of Ln natural data against time (Figures 5 and 6) show that there is increase trend for studied series.In this study, the Ln of natural series is not stationary and then has first differencing of Ln natural data to achieving stationary series (d=1).This plot shows in Figures 7 and 8.It is stationary and there is not trend.

Model Identification
The ACF and PACF plot of Ln series for the Karkheh River show in Figures 9 and 10.Since some the ACF and PACF values are significant and do not fall within the confidence band, it indicates that the time series are not white noise processes and modelling could be performed.The ACF and PACF of the once differenced (d=1) series decayed rapidly compared to the ACF and PACF of the Ln series.Therefore, the optimum level of differencing for the series was one and the d value used in the ARIMA model would be one (d=1).The ACF and PACF of this series show in Figures 11 and 12. Tjelogir majin in Figures 9-12 and 13-14 mean Ln transformation of data in Jelogir majin station.The original data in this station was not normal distribution therefore the Ln transformation is necessary.The suggested model for p and q are 1, 2, 3, 4 of ACF and PACF this series and we start to estimation parameters some of models.

Estimation of the Model Parameters
After the identification of model using the parameter estimation methods such as Maximum Likelihood (ML), Conditional Least Square (CLS) and Unconditional Least Square (ULS) are done.The values of parameters estimation are shown in Tables 1 and 2. The value of this tables showed that all selected model are suitable for entrance to next stage because it have two conditional stationary and inevitability.The model that gives the minimum AIC is selected as best fit model.Obviously, model ARIMA (4,1,1) has the smallest values of AIC, then one would temporarily have a model ARIMA (4,1,1).The goodness of fit statistic and Akaike Information Criterion (AIC) values for the different ARIMA models are shown in Table 5 (AIC=88.87 in CLS estimation method for annual peak streamflow and AIC=77.75 in ML estimation method for annual maximum streamflow).In this case, the initial suggested model structure has the minimum AIC value and has been chosen as best model structure for annual streamflow time series.The results of Port manteau lack of fit test indicate in Tables 3 and 4.This tables shows that all three models are adequate for forecasting of studied series data.In conclusion, the ARIMA (4,1,1) model is the best model for annual peak and maximum streamflow (discharge) in Karkheh River at Karkheh river basin.Also the RACF and RPACF residual autocorrelation Function test are shown in Figures 13 and 14.This figures show that the suggested model can be considered as appropriate model.

Forecasting
In this step, we forecasted annual data for ten years ahead of original data for the period from 2006 to 2015 by applying the best model (ARIMA (4,1,1)) in CLS and ML estimation parameter method for annual peak and maximum discharge respectively.After obtaining the forecasted series (Zt for t=1, 2, 3…10), then the final series (Xt) is determined by reversing (Ln) transformation.The forecasting actuals were brought in Table 6.In this table we can see forecasting values in ten futures.Figures 13 and 14 show predicted of series for these data.The figures show us that forecasting is appropriate.The corresponding observed values are also shown in the Figures 15 and 16 and since agreement between observed and forecasted values of annual peak discharge (R 2 =0.84) and annual maximum discharge (R 2 =0.87) are very good, it is confirmed that the ARIMA (4,1,1) model is adequate for forecasting of annual peak and maximum discharge.The forecasted values in ten years future uses for calibration and verification of the best selected model.

Conclusion
The objectives of this study were develop stochastic ARIMA models for the study rivers using Box-Jenkins approach and forecast annual streamflow values in future using the developed ARIMA models.Recognizing and predicting annual peak and maximum streamflow (discharge) of Karkheh River in Jelogir Majin station during statistical period is necessary for flood control and planning the agricultural activities.Results from this reviewing indicated that:  The best ARIMA model for annual peak streamflow in Jelogir Majin station at Karkheh River was (4,1,1), with their AIC values of 88.9 in CLS estimation method.Also for annual maximum streamflow (discharge) was (4,1,1), with their AIC values of 77.8 in ML estimation method.Forecast series up to a lead time of ten years were generated using the accepted ARIMA models.Model accuracy was checked by comparing the predicted and observation series by coefficient of determination (R 2 ).This coefficient (R 2 ) was 0.84 and 0.87 for annual peak and maximum streamflow respectively.Results show that the ARIMA model was adequate for the river and forecast of the series.
 The study reveals that the Box-Jenkins (ARIMA) model methodology could be used as an appropriate tool to predict the flood in this river for the up-coming 10 years (2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015).Also this methodology can help farmers in the area in order to planning the agricultural activities to enlarge the areas of land to be cultivated using supplemental irrigation.
 The significant ACF and PACF functions with order of four can be caused by factors such as area good vegetation and snowmelt.The good vegetation of the region and so the forest causes water retention in the soil surface layer and delay in the rise in surface runoff.
 The ARIMA model is suitable for short term forecasting of series because the ARMA family models can model short term durability very well.The AR model is a finite memory model, thus it does not fare well in long term forecasting.
 The model identification is the critical step in ARIMA modelling.The values of p, q and d had to be determined visually and they depended on the modeler's experience and judgment.

Figure 1 .
Figure 1.The study area location