A dynamic prediction framework for groundwater level integrating feature selection and interpretable machine learning

Abstract: Accurate groundwater level (GWL) prediction is key to the precise management of groundwater resources and to scientific decision-making. Machine-learning-based GWL prediction currently faces two challenges: the physical meaning of the input variables is often unclear, and the "black box" nature of the models limits interpretability. To address both, this study constructs a GWL prediction framework that integrates K-Means clustering, LASSO-CV variable screening, and machine learning models. Using data from 36 observation wells in the Yongding River alluvial fan, the study first identifies seven driving-factor combination patterns with clear physical meaning. Spatial analysis shows that the influence of human activities (such as water supply) on groundwater level dynamics (GWLC) strengthens significantly from the middle and upper reaches to the lower reaches of the fan. A comparison of four machine learning models finds that Long Short-Term Memory (LSTM) networks and Support Vector Regression (SVR) are better suited to this region: their average Nash-Sutcliffe efficiency coefficients (NSE) for GWLC prediction in the validation phase are −0.07 and −0.03, respectively, and 13 and 15 wells, respectively, reach an acceptable level (NSE > 0). Most importantly, once the variable screening mechanism is coupled in, the prediction accuracy of the LSTM and SVR models (K-LSTM, K-SVR) improves significantly, with the average validation-phase NSE rising by 0.13 and 0.04, respectively, relative to the original models. To further enhance interpretability, the Shapley Additive Explanations (SHAP) method is used to quantify the contributions of precipitation (P), air temperature (T), and water supply (QWS) in the different categories of wells: P generally has a significant positive effect, the effect of T varies by region, and QWS has a mainly negative effect. The integrated workflow proposed in this study, which runs from feature screening through model selection and method establishment to result interpretation, jointly improves the accuracy and the physical interpretability of GWL prediction, and offers a novel methodology and a reliable basis for decision-making in prediction and attribution problems for similarly complex environmental systems.
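The abstract reports model skill as NSE and treats NSE > 0 as acceptable. As a minimal sketch of what that criterion measures, the block below computes the standard Nash-Sutcliffe efficiency, one minus the ratio of the squared prediction error to the squared deviation of the observations from their mean, so that a model no better than predicting the observed mean scores 0. The function names `nse` and `count_acceptable` and all numbers are hypothetical illustrations, not the study's code or data.

```python
from statistics import fmean


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency of simulated values against observations."""
    if len(observed) != len(simulated):
        raise ValueError("series must have equal length")
    mean_obs = fmean(observed)
    # Squared error of the model versus squared spread of the observations.
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("observations are constant; NSE is undefined")
    return 1.0 - sse / spread


def count_acceptable(scores, threshold=0.0):
    """Count wells whose validation NSE exceeds the threshold (NSE > 0 in the abstract)."""
    return sum(1 for value in scores.values() if value > threshold)


# Hypothetical GWLC series for one well (illustrative numbers only).
obs = [0.2, -0.1, 0.4, -0.3, 0.1]
mean_only = [fmean(obs)] * len(obs)  # baseline: always predict the observed mean

print(nse(obs, obs))        # a perfect prediction
print(nse(obs, mean_only))  # the mean-only baseline
```

Read this way, the reported averages of −0.07 and −0.03 sit slightly below the mean-only baseline, which is why the abstract also counts how many individual wells exceed zero.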

     
