Abstract:
Accurate groundwater level (GWL) prediction is key to the precise management of groundwater resources and to sound scientific decision-making. Machine-learning-based GWL prediction faces two challenges: the input variables often lack clear physical significance, and the models themselves are “black boxes”; together these lead to poor interpretability. To address this problem, this study constructs a GWL prediction framework that integrates K-Means clustering, LASSO-CV variable screening, and machine learning models. Data from 36 observation wells in the Yongding River alluvial fan are used to identify seven distinct combination patterns of driving factors, each with clear physical significance. Spatial analysis reveals that the impact of human activities (such as water supply) on groundwater level dynamics (GWLC) increases significantly from the middle and upper reaches to the lower reaches of the alluvial fan. A comparison of four machine learning models shows that the Long Short-Term Memory (LSTM) and Support Vector Regression (SVR) models are more suitable than the other two: their average Nash-Sutcliffe efficiency (NSE) values for GWLC prediction during the validation phase are −0.07 and −0.03, respectively, with 13 and 15 wells achieving acceptable results (NSE > 0). After coupling with the variable screening mechanism, the prediction accuracy of the LSTM and SVR models (K-LSTM, K-SVR) improves, with the average validation-phase NSE increasing by 0.13 and 0.04, respectively, relative to the original models. To further enhance model interpretability, the Shapley Additive Explanations (SHAP) method is used to quantify the contributions of precipitation (P), temperature (T), and water supply (WS) in different types of wells: P generally has a significant positive impact, the effect of T varies regionally, and WS mainly has a negative impact.
The integrated framework of “feature screening – model selection – method establishment – result interpretation” proposed in this study jointly improves the accuracy and physical interpretability of GWL prediction, and offers a methodology and a reliable decision-making basis for similar prediction and attribution problems in complex environmental systems.
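
For readers unfamiliar with the metric, the NSE threshold quoted above (NSE > 0 as "acceptable") can be illustrated with a minimal sketch. It assumes the standard Nash-Sutcliffe definition, one minus the ratio of the squared prediction error to the variance of the observations about their mean; the function name `nash_sutcliffe_efficiency` and the sample values are hypothetical and are not taken from the study.

```python
import numpy as np


def nash_sutcliffe_efficiency(observed, simulated):
    """Return the NSE of a simulated series against observations.

    Assumes the standard definition:
    NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).
    NSE = 1 is a perfect fit; NSE = 0 means the model is no better
    than always predicting the observed mean; NSE < 0 is worse than that.
    """
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    residual_ss = np.sum((obs - sim) ** 2)       # squared prediction error
    total_ss = np.sum((obs - obs.mean()) ** 2)   # spread of the observations
    return 1.0 - residual_ss / total_ss


# Hypothetical groundwater level changes (illustrative values only).
obs = [1.0, 2.0, 3.0, 4.0]
nse_perfect = nash_sutcliffe_efficiency(obs, obs)
nse_mean = nash_sutcliffe_efficiency(obs, [2.5, 2.5, 2.5, 2.5])
```

Under this definition, a slightly negative average NSE, such as the values reported for the uncoupled models, indicates predictions marginally worse than the observed mean, which is why an increase of even a few hundredths can move a well across the NSE > 0 threshold.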