A study on landslide susceptibility of LightGBM-SHAP based on different factor screening methods
-
Abstract: Taking Qianjiang District of Chongqing Municipality as a case study, 23 evaluation factors were selected to build a database of landslide conditioning factors, and two factor screening methods, the geodetector and Pearson correlation combined with principal component analysis (PCA), were used to select the optimal factor combination. Landslide susceptibility was then evaluated with a Bayesian-LightGBM-SHAP hybrid model, the model accuracy was verified, and the dominant factors affecting landslide occurrence in Qianjiang District were analyzed. The initial model has an AUC of 0.801, the Pearson Correlation Coefficient-Bayesian-LightGBM model has an AUC of 0.824, and the GeoDetector-Bayesian-LightGBM model has an AUC of 0.835. The factor importance ranking shows that multi-year average rainfall, elevation, POI kernel density, and distance from rivers are the most important factors for landslide occurrence, whereas the sediment transport index, stream power index, and slope position have weaker effects. Combining a factor screening method with Bayesian-LightGBM improves model accuracy and provides a reference framework for building a reasonable factor database.
Comparison with the factor importance results confirms that the geodetector can accurately detect the contribution of each factor to landslide occurrence and highlight the correlations among combinations of landslide conditioning factors, which helps to explore the relationship between each factor and landslides.
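The abstract names the geodetector as one of the two screening methods but does not spell out how a factor's contribution is measured. A minimal sketch of the commonly used factor-detector q statistic, q = 1 - Σ(N_h·σ_h²) / (N·σ²), is given below, where h indexes the classes of a factor and σ² is the variance of the landslide label. The function name `geodetector_q` and the toy inputs are assumptions made for illustration; they are not the authors' implementation or data.

```python
from collections import defaultdict
from statistics import pvariance


def geodetector_q(strata, values):
    """Factor-detector q statistic: 1 - sum(N_h * var_h) / (N * var).

    strata: class label of the factor for each sample (e.g. an elevation class)
    values: response for each sample (e.g. 1 = landslide, 0 = non-landslide)
    """
    if len(strata) != len(values):
        raise ValueError("strata and values must have the same length")
    total_var = pvariance(values)
    if total_var == 0:
        raise ValueError("the response has zero variance; q is undefined")

    # Group the response values by factor class.
    groups = defaultdict(list)
    for stratum, value in zip(strata, values):
        groups[stratum].append(value)

    # Within-class sum of squares, accumulated over all classes.
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return 1.0 - within / (len(values) * total_var)


if __name__ == "__main__":
    # Hypothetical toy samples: class 1 holds most landslides, class 2 none.
    factor_class = [1, 1, 1, 2, 2, 2]
    landslide = [1, 1, 0, 0, 0, 0]
    print(geodetector_q(factor_class, landslide))
```

A q close to 1 means the factor's classes separate landslide from non-landslide samples almost completely, while a q close to 0 means the classes carry little information; under this reading, factors with low q are candidates for removal during screening.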
-
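Table 1 Data and data sources
Data | Source | Year | Type | Resolution/scale
Multi-year average rainfall | Geographical Information Monitoring Cloud Platform | 2003-2019 | Raster | 30 m
DEM | Global digital elevation model (GDEM) | 2019 | Raster | 30 m
Landsat 8 | United States Geological Survey | 2019 | Raster | 30 m
Landsat 5 | United States Geological Survey | 2019 | Raster | 30 m
Geological data | National Geological Data Center | 2019 | Vector | 1:200 000
Land use | Geographical Information Monitoring Cloud Platform | 2015 | Vector | 1:100 000
Administrative divisions | Geographical Information Monitoring Cloud Platform | 2019 | Vector | 1:100 000
River network | Resource and Environment Science Data Center, Chinese Academy of Sciences | 2019 | Vector | 1:100 000
Roads | Resource and Environment Science Data Center, Chinese Academy of Sciences | 2019 | Vector | 1:100 000
Historical landslides | Chongqing Geological Monitoring Station | 2003-2019 | Data table | -
POI (2016) | Web crawler | 2016 | Vector | 1:100 000

Table 2 Classification of landslide factors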
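Factor group | Factor | Number of classes | Classification criteria
Topography | Elevation/m | 11 | 1) <530; 2) 530~632; 3) 632~723; 4) 723~809; 5) 809~895; 6) 895~987; 7) 987~1093; 8) 1093~1225; 9) 1225~1390; 10) 1390~1600; 11) 1600~1953
Topography | Topographic position index | 10 | 1) <-13; 2) -13~-7; 3) -7~-4; 4) -4~-1; 5) -1~0; 6) 0~2; 7) 2~4; 8) 4~8; 9) 8~14; 10) 14~58
Topography | Relief/m | 7 | 1) <20; 2) 20~30; 3) 30~40; 4) 40~50; 5) 50~80; 6) 80~170; 7) ≥170
Topography | Slope/° | 8 | 1) <5; 2) 5~10; 3) 10~15; 4) 15~20; 5) 20~25; 6) 25~30; 7) 30~35; 8) ≥35
Topography | Aspect | 9 | 1) flat; 2) north; 3) northeast; 4) east; 5) southeast; 6) south; 7) southwest; 8) west; 9) northwest
Topography | Slope position | 6 | 1) valley bottom; 2) lower slope; 3) flat slope; 4) middle slope; 5) upper slope; 6) ridge
Topography | Curvature | 6 | 1) <-1; 2) -1~-0.5; 3) -0.5~0; 4) 0~0.5; 5) 0.5~1; 6) ≥1
Topography | Profile curvature | 6 | 1) <-1; 2) -1~-0.5; 3) -0.5~0; 4) 0~0.5; 5) 0.5~1; 6) ≥1
Topography | Plan curvature | 6 | 1) <-1; 2) -1~-0.5; 3) -0.5~0; 4) 0~0.5; 5) 0.5~1; 6) ≥1
Topography | Micro-landform | 10 | 1) canyon/deeply incised stream; 2) midslope drainage/shallow valley; 3) upland drainage/headwater; 4) U-shaped valley; 5) plain; 6) open slope; 7) upper slope/mesa; 8) ridge within a local valley; 9) midslope ridge/small hill in a plain; 10) mountain top/high ridge
Topography | Surface cutting depth/m | 6 | 1) <256; 2) 256~545; 3) 545~789; 4) 789~1002; 5) 1002~1197; 6) 1197~1920
Topography | Roughness index | 6 | 1) <1.05; 2) 1.05~1.12; 3) 1.12~1.24; 4) 1.24~1.41; 5) 1.41~1.72; 6) 1.72~3.77
Geological conditions | Lithology | 9 | 1) є2-3; 2) S2; 3) S1; 4) O1; 5) Qb2b; 6) Z; 7) D; 8) T1j; 9) J1z-2x
Geological conditions | Distance from faults/m | 7 | 1) <2276; 2) 2276~4815; 3) 4815~7355; 4) 7355~9894; 5) 9894~12521; 6) 12521~16111; 7) ≥16111
Environmental conditions | Normalized difference vegetation index (NDVI) | 5 | 1) <0.5; 2) 0.5~0.6; 3) 0.6~0.7; 4) 0.7~0.8; 5) 0.8~0.9
Environmental conditions | Distance from rivers/m | 7 | 1) <3508; 2) 3508~7380; 3) 7380~11251; 4) 11251~15123; 5) 15123~19115; 6) 19115~23713; 7) ≥23713
Environmental conditions | Land use | 9 | 1) forest land; 2) grassland; 3) cultivated land; 4) garden land; 5) residential land; 6) transportation land; 7) industrial, mining and storage land; 8) water bodies and water conservancy facilities; 9) other land
Human activity | Distance from roads/m | 7 | 1) <342; 2) 342~755; 3) 755~1243; 4) 1243~1827; 5) 1827~2581; 6) 2581~3578; 7) ≥3578
Human activity | POI kernel density | 8 | 1) <4; 2) 4~10; 3) 10~26; 4) 26~50; 5) 50~77; 6) 77~120; 7) 120~170; 8) 170~233
Hydrological conditions | Multi-year average rainfall/mm | 7 | 1) <1318; 2) 1318~1347; 3) 1347~1377; 4) 1377~1409; 5) 1409~1445; 6) 1445~1489; 7) 1489~1551
Hydrological conditions | Topographic wetness index (TWI) | 6 | 1) <4; 2) 4~6; 3) 6~8; 4) 8~10; 5) 10~13; 6) 13~26
Hydrological conditions | Sediment transport index (STI) | 8 | 1) <20; 2) 20~50; 3) 50~100; 4) 100~150; 5) 150~200; 6) 200~300; 7) 300~400; 8) 400~720
Hydrological conditions | Stream power index (SPI) | 8 | 1) <250; 2) 250~1000; 3) 1000~2000; 4) 2000~3000; 5) 3000~5000; 6) 5000~10000; 7) ≥10000

Table 5 Factor combinations screened by PCA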
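Factor combination | Number of factors | Principal components extracted | Name
TRI, topographic position index, slope and relief | 4 | 1 | P1
Slope position, curvature and micro-landform | 3 | 1 | P2

Table 3 Pearson correlation coefficients of the topographic factors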
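Factor | TRI | Topographic position index | Elevation | Plan curvature | Slope | Slope position | Aspect | Profile curvature | Relief | Surface cutting depth | Curvature | Micro-landform
TRI | 1 | | | | | | | | | | |
Topographic position index | 0.663 | 1 | | | | | | | | | |
Elevation | 0.129 | 0.673 | 1 | | | | | | | | |
Plan curvature | 0.040 | 0.086 | −0.025 | 1 | | | | | | | |
Slope | 0.874 | 0.654 | 0.073 | 0.055 | 1 | | | | | | |
Slope position | 0.003 | 0.066 | 0.066 | 0.237 | −0.029 | 1 | | | | | |
Aspect | −0.028 | −0.057 | −0.02 | 0.003 | −0.031 | 0.032 | 1 | | | | |
Profile curvature | 0.114 | 0.023 | −0.074 | 0.014 | 0.118 | −0.291 | 0.310 | 1 | | | |
Relief | 0.766 | 0.663 | 0.153 | 0.048 | 0.773 | 0.028 | −0.017 | 0.110 | 1 | | |
Surface cutting depth | −0.340 | 0.115 | 0.624 | 0.002 | −0.392 | −0.015 | 0.079 | −0.003 | 0.083 | 1 | |
Curvature | −0.014 | 0.014 | 0.041 | 0.195 | −0.064 | 0.511 | 0.106 | −0.309 | 0.007 | 0.065 | 1 |
Micro-landform | 0.022 | 0.104 | 0.105 | 0.226 | −0.021 | 0.632 | 0.070 | −0.346 | 0.051 | 0.033 | 0.535 | 1

Table 4 Pearson correlation coefficients of the other factors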
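Geological conditions | Distance from faults | Lithology
Distance from faults | 1 |
Lithology | 0.395 | 1

Human activity | POI kernel density | Distance from roads
POI kernel density | 1 |
Distance from roads | −0.241 | 1

Environmental conditions | NDVI | Distance from rivers | Land use
NDVI | 1 | |
Distance from rivers | −0.157 | 1 |
Land use | 0.268 | 0.064 | 1

Hydrological conditions | SPI | STI | TWI | Multi-year average rainfall
SPI | 1 | | |
STI | 0.476 | 1 | |
TWI | 0.158 | 0.270 | 1 |
Multi-year average rainfall | 0.020 | 0.066 | −0.090 | 1

Table 6 Statistics of the landslide susceptibility zoning classes for the 3 models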
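Model | Susceptibility class | Area/km² | Number of landslides | Landslide density/(landslides/km²)
Initial factors-Bayesian-LGB | Very low | 1981.105 | 42 | 0.021
Initial factors-Bayesian-LGB | Low | 780.323 | 101 | 0.129
Initial factors-Bayesian-LGB | Moderate | 242.908 | 130 | 0.535
Initial factors-Bayesian-LGB | High | 99.845 | 115 | 1.152
Initial factors-Bayesian-LGB | Very high | 53.963 | 157 | 2.909
PCC-Bayesian-LGB | Very low | 2041.892 | 48 | 0.024
PCC-Bayesian-LGB | Low | 716.984 | 105 | 0.146
PCC-Bayesian-LGB | Moderate | 240.890 | 116 | 0.482
PCC-Bayesian-LGB | High | 107.440 | 125 | 1.163
PCC-Bayesian-LGB | Very high | 50.962 | 151 | 2.963
GD-Bayesian-LGB | Very low | 2213.294 | 51 | 0.023
GD-Bayesian-LGB | Low | 603.123 | 93 | 0.154
GD-Bayesian-LGB | Moderate | 199.891 | 103 | 0.515
GD-Bayesian-LGB | High | 90.802 | 121 | 1.333
GD-Bayesian-LGB | Very high | 51.052 | 177 | 3.467

Table 7 Accuracy comparison of the 3 landslide susceptibility models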
Model | Accuracy | Precision | Recall | F1 score | AUC (test set) | AUC (training set)
Initial factors-LightGBM | 0.908 | 0.808 | 0.908 | 0.898 | 0.801 | 0.949
PCC-LightGBM | 0.922 | 0.828 | 0.922 | 0.922 | 0.824 | 0.981
GD-LightGBM | 0.925 | 0.828 | 0.926 | 0.928 | 0.835 | 0.989
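The landslide density column of Table 6 is the number of landslides in a zone divided by the zone's area. The short sketch below reproduces that column for the GD-Bayesian-LGB model from the areas and counts listed in Table 6, and also reports each zone's share of all landslides. The names `GD_ZONES` and `zone_statistics` are illustrative assumptions, not part of the original study.

```python
# (area in km², number of landslides) per susceptibility class, copied from
# the GD-Bayesian-LGB rows of Table 6, ordered from very low to very high.
GD_ZONES = {
    "very low": (2213.294, 51),
    "low": (603.123, 93),
    "moderate": (199.891, 103),
    "high": (90.802, 121),
    "very high": (51.052, 177),
}


def zone_statistics(zones):
    """Return landslide density and share of landslides/area for each zone."""
    total_count = sum(count for _, count in zones.values())
    total_area = sum(area for area, _ in zones.values())
    stats = {}
    for name, (area, count) in zones.items():
        stats[name] = {
            # Density in landslides per km², rounded as in Table 6.
            "density": round(count / area, 3),
            "landslide_share": count / total_count,
            "area_share": area / total_area,
        }
    return stats


if __name__ == "__main__":
    for name, row in zone_statistics(GD_ZONES).items():
        print(
            f"{name:>9}: density={row['density']:.3f}, "
            f"landslides={row['landslide_share']:.1%}, "
            f"area={row['area_share']:.1%}"
        )
```

With these inputs, the high and very high zones together hold roughly 55% of the recorded landslides on about 4.5% of the mapped area, which is the kind of concentration a useful susceptibility zoning is expected to show.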