Supported by the National Natural Science Foundation of China (No. 41971050), the Special Fund for Science and Technology Innovation of Fujian Agriculture and Forestry University (No. KFA17616A), the Science and Technology Planning Project of Fujian Province (No. 2017N5006) and the National Undergraduate Innovation Program (No. 201910389026)
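Soil organic carbon (SOC) content in farmland is both an important indicator of soil quality and an important source and sink of agricultural greenhouse gases. Combining environmental variables with the random forest (RF) algorithm is an important way to improve the accuracy of spatial SOC prediction, but how different combinations of environmental variables affect the prediction accuracy of RF models still needs further study. Taking the complex geomorphic region of Southeast Fujian as a case study, this paper used two combinations of environmental variables (remote sensing variables + climate factors, and remote sensing variables + climate factors + soil attributes) as input datasets, applied the RF algorithm to predict topsoil SOC content in farmland, compared the accuracy of the two, and further compared them with the ordinary kriging (OK) interpolation model. The results show that the RF model built on all environmental variables performed best: compared with the model without soil attributes, its goodness of fit and prediction accuracy improved significantly (r increased by 7.95% to 0.95, and RMSE decreased by 45.13%), and it captured the spatial variation of SOC more precisely. The OK model had the lowest overall prediction accuracy. The farmland SOC content of the study area estimated with the optimal model was 14.70±2.95 g·kg⁻¹, lower in the eastern coastal area than in the western inland area. Variable importance analysis shows that, besides hydrolysable nitrogen (N), which is closely related to soil carbon, the digital elevation model (DEM) among the remote sensing variables is also an important variable affecting SOC prediction accuracy in Southeast Fujian. Therefore, an RF model driven jointly by remote sensing variables, climate factors and soil attributes can serve as an effective method for predicting the spatial distribution of farmland SOC content in the complex geomorphic region of Southeast Fujian.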
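The OK benchmark interpolates SOC at an unsampled location as a weighted average of the sampled values, with weights obtained from a semivariogram under the constraint that they sum to one. A minimal pure-Python sketch follows. The exponential variogram and its nugget, sill and range values are assumptions for illustration (the abstract does not report the fitted variogram), leave-one-out is only one possible form of the cross-validation, which the abstract does not specify, and all function names are hypothetical.

```python
import math

def exp_variogram(h, nugget=0.0, sill=1.0, rng=10.0):
    """Exponential semivariogram (assumed model); rng is the practical range."""
    if h == 0.0:
        return 0.0  # gamma(0) = 0 by definition
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / rng))

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ordinary_kriging(points, values, target, **vario):
    """Predict at target; returns (estimate, weights). vario goes to exp_variogram."""
    n = len(points)
    gamma = lambda p, q: exp_variogram(math.dist(p, q), **vario)
    # OK system: [[Gamma, 1], [1^T, 0]] @ [w, mu] = [gamma_0, 1]
    a = [[gamma(points[i], points[j]) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])  # unbiasedness constraint: weights sum to 1
    b = [gamma(p, target) for p in points] + [1.0]
    sol = solve(a, b)
    w = sol[:n]  # sol[n] is the Lagrange multiplier
    return sum(wi * vi for wi, vi in zip(w, values)), w

def loo_cross_validation(points, values, **vario):
    """Leave-one-out: predict each sample from all the others."""
    preds = []
    for k in range(len(points)):
        pts = points[:k] + points[k + 1:]
        vals = values[:k] + values[k + 1:]
        preds.append(ordinary_kriging(pts, vals, points[k], **vario)[0])
    return preds
```

For real data, a library such as PyKrige, with a variogram fitted to the sample semivariance, would normally replace the hand-set parameters used here.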
[Objective] Soil organic carbon (SOC) plays an important role in soil fertility and in the carbon cycle of terrestrial ecosystems. A detailed understanding of the spatial distribution of SOC is vital to managing soil resources and mitigating global climate change. With the development of 3S technologies (remote sensing, geographic information systems and the global positioning system), models that predict soil properties from environmental variables have become increasingly popular. This study aims to model the complex, nonlinear relationship between SOC and environmental variables and to evaluate how much soil attributes contribute to the accuracy of SOC mapping. [Method] To this end, a machine learning method, the random forest (RF) model, was applied to map the spatial distribution of topsoil organic carbon content in farmland of the high-yield agricultural areas of Southeast Fujian. A set of environmental variables, comprising 5 hard-to-obtain quantitative soil attributes (e.g., hydrolysable nitrogen, available phosphorus and pH) and 11 easy-to-obtain variables (topography factors, vegetation indexes and climate factors), was compiled, drawing on analysis of a large number of soil samples collected in the region, and then processed with the RF algorithm to predict the spatial distribution of SOC content in the topsoil of the region's farmland. Two combinations of these variables were used as inputs to two models: RF-S used only the easy-to-obtain variables, whereas RF-A used all variables, both easy-to-obtain and hard-to-obtain.
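The RF-S/RF-A comparison comes down to training the same learner on two column subsets. The sketch below is a from-scratch random forest regressor (bootstrap resampling of samples, a random subset of `max_features` candidate variables at every split, and averaging over trees), written in plain Python only so that it is self-contained. The tree depth, number of trees and the p/3 default are illustrative assumptions, not the authors' settings, and `select` is a hypothetical helper for building an RF-S-style subset. In practice a tested implementation such as scikit-learn's `RandomForestRegressor` would normally be used.

```python
import random
from statistics import fmean

def _sse(ys):
    """Sum of squared deviations from the mean (regression node impurity)."""
    m = fmean(ys)
    return sum((v - m) ** 2 for v in ys)

def build_tree(X, y, depth, max_depth, min_split, max_features, rnd):
    # Stop at the depth limit, on small nodes, or when the node is pure.
    if depth >= max_depth or len(y) < min_split or max(y) == min(y):
        return fmean(y)
    # Random subset of candidate variables at this split.
    feats = rnd.sample(range(len(X[0])), max_features)
    best = None
    for f in feats:
        vals = sorted({row[f] for row in X})
        for lo, hi in zip(vals, vals[1:]):
            t = (lo + hi) / 2
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            cost = _sse(left) + _sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    if best is None:  # no candidate variable varies within this node
        return fmean(y)
    _, f, t = best
    go_left = [row[f] <= t for row in X]
    kids = []
    for side in (True, False):
        Xs = [row for row, g in zip(X, go_left) if g == side]
        ys = [v for v, g in zip(y, go_left) if g == side]
        kids.append(build_tree(Xs, ys, depth + 1, max_depth, min_split, max_features, rnd))
    return (f, t, kids[0], kids[1])

def predict_tree(node, x):
    # Internal nodes are (feature, threshold, left, right); leaves are floats.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

class RandomForest:
    def __init__(self, n_trees=50, max_depth=6, min_split=4, max_features=None, seed=0):
        self.n_trees, self.max_depth, self.min_split = n_trees, max_depth, min_split
        self.max_features, self.seed = max_features, seed

    def fit(self, X, y):
        rnd = random.Random(self.seed)
        p, n = len(X[0]), len(y)
        k = min(p, self.max_features or max(1, p // 3))  # default ~p/3 for regression
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rnd.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, self.min_split, k, rnd))
        return self

    def predict(self, x):
        # Forest prediction = mean of the individual tree predictions.
        return fmean([predict_tree(tree, x) for tree in self.trees])

def select(X, cols):
    """Keep only the listed variable columns (e.g. an RF-S-style subset)."""
    return [[row[c] for c in cols] for row in X]
```

For example, `RandomForest().fit(select(X, easy_cols), y)` and `RandomForest().fit(X, y)` would play the roles of RF-S and RF-A, where `easy_cols` is a hypothetical list of the easy-to-obtain column indices.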
The root mean square error (RMSE), mean absolute error (MAE), Pearson correlation coefficient (r), coefficient of variation (CV), relative error (RE) and relative root mean square error (RRMSE) of the two models were calculated to evaluate their prediction accuracy and to select the optimal RF model, which was subsequently used to map SOC in the study area from the raster datasets of all variables. Cross-validation was then performed to compare the optimal RF model with the ordinary kriging (OK) interpolation model. [Result] Of the two models with different environmental-variable inputs, RF-A, which used remote sensing variables, climate factors and soil attributes, performed much better and explained most of the spatial heterogeneity of SOC. Compared with RF-S, RF-A improved significantly in both fit and prediction (r increased by 7.95% to 0.95, and RMSE decreased by 45.13%). The farmland SOC content of the region predicted with RF-A was 14.70±2.95 g·kg⁻¹, and its spatial distribution was similar to that obtained with the OK model, increasing from the eastern coastal area toward the western inland area. Regardless of the sampling proportion, RF-A generally outperformed the OK model in prediction accuracy and in capturing spatial heterogeneity, and was especially preferable when sampling sites were relatively few. Among the variables, hydrolysable nitrogen (N) was the most important for RF-A, followed by elevation (DEM); both significantly affected the spatial heterogeneity of SOC and were positively related to it. [Conclusion] A random forest model driven by remote sensing variables, climate factors and soil attributes is therefore a promising approach to predicting the spatial distribution of SOC in Southeast Fujian. In addition, soil attribute variables such as N and P should be taken into account to improve the accuracy of SOC mapping in regions with complex geomorphology.
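The six accuracy indices are named in the abstract but not defined. The helpers below use common definitions, which are assumptions here: RRMSE as RMSE divided by the observed mean, RE as the mean absolute relative error, and CV as the population standard deviation over the mean, all in percent; the paper may define them differently. `percent_change` shows how a figure such as "RMSE decreased by 45.13%" is typically obtained, taking RF-S as the baseline (also an assumption).

```python
import math
from statistics import fmean, pstdev

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(fmean([(o - p) ** 2 for o, p in zip(obs, pred)]))

def mae(obs, pred):
    """Mean absolute error."""
    return fmean([abs(o - p) for o, p in zip(obs, pred)])

def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    mo, mp = fmean(obs), fmean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)

def rrmse(obs, pred):
    """RMSE relative to the observed mean, in percent (assumed definition)."""
    return 100.0 * rmse(obs, pred) / fmean(obs)

def re(obs, pred):
    """Mean absolute relative error, in percent (assumed definition)."""
    return 100.0 * fmean([abs(o - p) / o for o, p in zip(obs, pred)])

def cv(values):
    """Coefficient of variation: population SD / mean, in percent (assumed)."""
    return 100.0 * pstdev(values) / fmean(values)

def percent_change(old, new):
    """Relative change from old to new, in percent (negative = decrease)."""
    return 100.0 * (new - old) / old
```

Under these assumptions, `percent_change(rmse_s, rmse_a)` would return about -45.13 for the reported RMSE reduction, where `rmse_s` and `rmse_a` are hypothetical names for the two models' RMSE values.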
YUAN Yuqi, CHEN Hanyue, ZHANG Liming, REN Biwu, XING Shihe, TONG Junyue. Prediction of Spatial Distribution of Soil Organic Carbon in Farmland Based on Multi-Variables and Random Forest Algorithm—A Case Study of A Subtropical Complex Geomorphic Region in Fujian as An Example[J]. Acta Pedologica Sinica, 2021, 58(4): 887-899. DOI: 10.11766/trxb202001140623