Development of Machine Learning for Debris Flow Event Prediction in a Volcanic Area

Keywords: Naive Bayes; Efficient Logistic Regression; Debris Flow; Rainfall Data; Machine Learning; Volcanic Area

The integration of machine learning (ML) into debris flow prediction in volcanic areas, exemplified by the Gendol River watershed of Mount Merapi, offers transformative potential for hazard mitigation. This study aimed to develop real-time, computationally efficient ML models that integrate multi-source data, including rainfall intensity (e.g., 25 mm/hour, linked to debris flow heights of 300 cm), antecedent precipitation, and geomorphological variables, to predict debris flows with actionable lead times. Key objectives included optimizing prediction accuracy, minimizing the false positive rate for "Debris Flow" events (reduced to 18.2%), and enhancing model interpretability for deployment in data-scarce volcanic regions. Results demonstrated that ensemble methods and deep learning architectures outperformed traditional models; Efficient Logistic Regression and Linear SVM achieved an accuracy of 82.35%, and Cosine KNN attained a prediction speed of 272 observations per second. Critical predictors included temporal rainfall patterns (contributing more than 50% to flow initiation) and ash deposit thickness (with a 70% influence on decision-making). However, challenges persisted: imbalanced datasets, with only nine training instances of "Debris Flow" events, led to a 100% misclassification rate for hybrid events such as "Rainfall and Debris Flow," while models such as Naive Bayes exhibited instability, with accuracy dropping to 50%. The identified research gaps include data scarcity for high-magnitude events, limited geographic transferability, and the absence of standardized evaluation metrics. Technical limitations included reliance on low-resolution remote sensing data, high computational costs for ensemble models requiring 10 operational cost units, and the opacity of neural networks, which hindered stakeholder trust. Despite these constraints, the ML models achieved 85% accuracy in recognizing non-events, and Bagged Trees achieved 76.47% precision, offering scalable frameworks for early warning systems.
The study highlights the importance of enriched datasets, adaptive algorithms, and interdisciplinary collaboration in transforming volcanic risk management from a reactive to a proactive approach, ultimately safeguarding vulnerable communities through data-driven, life-saving predictions.
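The abstract reports overall accuracy, class-specific precision, and a false positive rate for "Debris Flow" events. As a hedged illustration of how such figures are typically derived from a multi-class confusion matrix, the minimal Python sketch below computes one-vs-rest metrics. It assumes the false positive rate is FP / (FP + TN) for a given class; the study may define it differently (for example, as a false discovery rate). The class labels reuse event names from the abstract, but the names `LABELS`, `CM`, `accuracy`, and `one_vs_rest_metrics` are hypothetical, and the confusion-matrix counts are toy values, not the study's data.

```python
from typing import Dict, List

# Hypothetical class labels, reusing event names mentioned in the abstract.
LABELS: List[str] = ["No Event", "Debris Flow", "Rainfall and Debris Flow"]

# Hypothetical toy counts (NOT the study's data).
# CM[i][j] = number of observations whose true class is LABELS[i]
#            and whose predicted class is LABELS[j].
CM: List[List[int]] = [
    [8, 1, 0],  # true "No Event"
    [1, 4, 0],  # true "Debris Flow"
    [0, 2, 0],  # true "Rainfall and Debris Flow": every case misclassified
]


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of all observations on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def one_vs_rest_metrics(cm: List[List[int]],
                        labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Per-class precision, recall, and false positive rate (one-vs-rest)."""
    n = len(labels)
    total = sum(sum(row) for row in cm)
    metrics: Dict[str, Dict[str, float]] = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(cm[k]) - tp                        # truly k, predicted another class
        tn = total - tp - fp - fn                   # neither truly nor predicted k
        metrics[name] = {
            # Convention assumed here: report 0.0 when a denominator is zero.
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "fpr": fp / (fp + tn) if fp + tn else 0.0,
        }
    return metrics


METRICS = one_vs_rest_metrics(CM, LABELS)

if __name__ == "__main__":
    print(f"accuracy = {accuracy(CM):.4f}")
    for name, m in METRICS.items():
        print(f"{name:26s} precision={m['precision']:.3f} "
              f"recall={m['recall']:.3f} fpr={m['fpr']:.3f}")
```

In this toy matrix, every "Rainfall and Debris Flow" observation is assigned to another class, so that class has a recall of 0 even though overall accuracy is 0.75. This is the kind of pattern the abstract describes for hybrid events with few training instances, and it is why a single accuracy figure can hide poor performance on rare classes.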