A Comparison of Multiple Imputation Methods for Recovering Missing Data in Hydrological Studies

Fatimah Bibi Hamzah, Firdaus Mohd Hamzah, Siti Fatin Mohd Razali, Hafiza Samad


Missing data is a common problem in hydrological studies, so data reconstruction is critical, especially when all available records, including incomplete ones, must be used. Missing data can also distort the results of statistical analyses and prevent the variability in the data from being estimated properly. This study therefore compared the performance of three imputation methods in predicting recurrence in streamflow datasets: robust random regression imputation (RRRI), k-nearest neighbours (k-NN), and classification and regression tree (CART). Complete historical daily streamflow data from 2012 to 2014 were utilised as the training dataset to assess and validate the effectiveness of the imputation methods in addressing missing streamflow data. Following that, all three methods, each coupled with multiple linear regression (MLR), were used to restore streamflow rates in Malaysia's Langat River Basin from 1978 to 2016. The effectiveness of the estimation techniques was evaluated using the Nash-Sutcliffe efficiency coefficient (CE), root-mean-square error (RMSE), and mean absolute percentage error (MAPE). The results showed that RRRI coupled with MLR (RRRI-MLR) had the lowest RMSE and MAPE values, outperforming all other techniques tested for filling missing data in daily streamflow datasets. This indicates that RRRI-MLR is the best of the tested methods for dealing with missing data in streamflow datasets.
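The three evaluation metrics named above can be illustrated with a minimal sketch. It uses the standard textbook definitions of CE, RMSE, and MAPE; the abstract does not give the exact formulas used in the study, so these are assumptions. The function names and the sample flow values are hypothetical and were made up purely for illustration.

```python
import math

def nash_sutcliffe(observed, estimated):
    """Nash-Sutcliffe efficiency coefficient (CE); 1.0 is a perfect match."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance

def rmse(observed, estimated):
    """Root-mean-square error, in the same units as the data."""
    squared = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    return math.sqrt(squared / len(observed))

def mape(observed, estimated):
    """Mean absolute percentage error; undefined if any observed value is zero."""
    ratios = sum(abs((o - e) / o) for o, e in zip(observed, estimated))
    return 100.0 * ratios / len(observed)

# Hypothetical daily flows (not from the study): observed values that were
# artificially removed, and the values an imputation method filled in.
observed = [10.0, 20.0, 30.0, 40.0]
imputed = [12.0, 18.0, 33.0, 39.0]

ce_value = nash_sutcliffe(observed, imputed)
rmse_value = rmse(observed, imputed)
mape_value = mape(observed, imputed)
print(f"CE={ce_value:.3f} RMSE={rmse_value:.3f} MAPE={mape_value:.3f}%")
```

Under these definitions, a better imputation method yields a CE closer to 1 and lower RMSE and MAPE, which matches the direction of the comparison reported in the abstract.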


DOI: 10.28991/cej-2021-03091747

Keywords: Missing Data; Streamflow; Robust Regression; CART; k-NN; MLR.




Copyright (c) 2021 Fatimah Bibi Hamzah, Firdaus Mohd Hamzah, Siti Fatin Mohd Razali

