Acta Pedologica Sinica  2021, Vol. 58 Issue (4): 887-899  DOI: 10.11766/trxb202001140623

### Cite this article

YUAN Yuqi, CHEN Hanyue, ZHANG Liming, et al. Prediction of Spatial Distribution of Soil Organic Carbon in Farmland Based on Multi-Variables and Random Forest Algorithm—A Case Study of a Subtropical Complex Geomorphic Region in Fujian as an Example. Acta Pedologica Sinica, 2021, 58(4): 887-899.

### Corresponding author

Prediction of Spatial Distribution of Soil Organic Carbon in Farmland Based on Multi-Variables and Random Forest Algorithm—A Case Study of a Subtropical Complex Geomorphic Region in Fujian as an Example
YUAN Yuqi, CHEN Hanyue, ZHANG Liming, REN Biwu, XING Shihe, TONG Junyue
University Key Lab of Soil Ecosystem Health and Regulation in Fujian, College of Resource and Environment, Fujian Agriculture and Forestry University, Fuzhou 350002, China
Abstract: 【Objective】Soil organic carbon (SOC) plays an important role in soil fertility and in the carbon cycle of terrestrial ecosystems. A detailed understanding of the spatial distribution of SOC is vital to managing soil resources and mitigating global climate change. With the development of 3S technology, models that predict soil properties from environmental variables are becoming increasingly popular. This study aims to simulate the complex, nonlinear relationship between SOC and environmental variables, and to evaluate how much soil attributes contribute to the accuracy of SOC mapping.【Method】A random forest (RF) model, a machine learning method, was applied to map the spatial distribution of topsoil organic carbon content in farmland of the high-yield agricultural areas of Southeast Fujian. A set of environmental variables, comprising 5 hard-to-obtain quantitative soil attributes (hydrolysable nitrogen, available phosphorus, pH, etc.) and 11 easy-to-obtain variables (topographic factors, vegetation indices and climate factors), was compiled for a large number of soil samples collected in the region and processed with the RF algorithm to predict the spatial distribution of SOC content in the topsoil of the region's farmland. Two combinations of these variables were used as input to two models: the RF-S model used only the easy-to-obtain variables, whereas the RF-A model used all variables, both easy and hard to obtain. Root mean square error (RMSE), mean absolute error (MAE), Pearson correlation coefficient (r), coefficient of variation (CV), relative error (RE) and relative root mean square error (RRMSE) were calculated for both models to evaluate prediction accuracy and to select the optimal RF model, which was then used to map SOC in the study area based on the raster datasets of all variables. Cross-validation was then performed to compare the optimal RF model with the ordinary kriging (OK) interpolation model.【Result】Of the two models, the RF-A model, which was based on remote sensing variables, climate factors and soil attributes, performed much better and explained most of the spatial heterogeneity of SOC. Compared with the RF-S model, the RF-A model was significantly better in fitting and prediction (r increased by 7.95% and RMSE decreased by 45.13%). The SOC content of farmland in the region predicted with the RF-A model averaged 14.70±2.95 g·kg⁻¹, and its spatial distribution was similar to that obtained with the OK model, rising from the eastern coastal area to the western inland of the study area. Regardless of the sampling percentage, the RF-A model was generally more accurate than the OK model and better at capturing spatial heterogeneity, and it is especially preferable when sampling sites are relatively few. Among the variables, hydrolysable nitrogen (N) was the most important in the RF-A model, followed by elevation (DEM). Both significantly affected the spatial heterogeneity of SOC and were positively related to it.【Conclusion】A random forest model based on remote sensing variables, climate factors and soil attributes is therefore a promising approach to predicting the spatial distribution of SOC in Southeast Fujian. In addition, soil attribute variables such as N and P should be taken into account to improve the accuracy of SOC mapping in regions with complex geomorphology.
Key words: Soil organic carbon    Random forest    Combination of variables    Spatial distribution    Accuracy evaluation

1 Materials and methods
1.1 Overview of the study area

Fig. 1 Location of the study area and distribution of soil sampling sites and meteorological stations
1.2 Data sources

1.3 Acquisition, combination and screening of environmental variables

1.4 RF model construction and validation
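
The accuracy measures named in the abstract (RMSE, MAE, Pearson r and RRMSE) can be sketched in plain Python as follows. This is only an illustration: the function names and the sample values are made up, and RRMSE is assumed here to be RMSE expressed as a percentage of the mean observed value, which may differ from the definition used in the paper.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    ss_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    ss_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    return cov / (ss_o * ss_p)


def rrmse(obs, pred):
    """Relative RMSE in percent (assumed definition: RMSE / mean observed)."""
    return 100.0 * rmse(obs, pred) / (sum(obs) / len(obs))


# Made-up SOC values in g/kg, for illustration only (not data from the study).
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]

print(rmse(observed, predicted), mae(observed, predicted))
print(pearson_r(observed, predicted), rrmse(observed, predicted))
```

Comparing two models then amounts to calling these functions on each model's predictions for the same validation samples.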

1.5 Data processing methods

$\frac{NIR - red}{NIR + red}$ (1)
$\left( \frac{NIR - red}{NIR + red} + 0.5 \right)^{\frac{1}{2}} \times 100$ (2)
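
Expressions (1) and (2) can be evaluated per pixel as in the sketch below. The function names are hypothetical, the reflectance values are invented, and the names of the two indices are left open because the left-hand sides of the expressions are not given here.

```python
import math


def normalized_difference(nir, red):
    """Expression (1): (NIR - red) / (NIR + red)."""
    return (nir - red) / (nir + red)


def transformed_index(nir, red):
    """Expression (2): square root of (expression (1) + 0.5), times 100.

    The square root is only defined when expression (1) is at least -0.5.
    """
    return math.sqrt(normalized_difference(nir, red) + 0.5) * 100


# Invented near-infrared and red reflectances for a single pixel.
nir, red = 0.6, 0.2
nd = normalized_difference(nir, red)
ti = transformed_index(nir, red)
print(nd, ti)
```

The same arithmetic applies element-wise to whole raster bands if the scalars are replaced by arrays and `math.sqrt` by an array square root.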

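RF model construction and prediction were both implemented with the RandomForestRegressor class of the Python scikit-learn library. The relative importance ranking of the variables can be obtained directly from the feature_importances_ attribute of the fitted model.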
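
A minimal sketch of this workflow with scikit-learn is given below. The data are synthetic and the column names, the 70/30 split and the hyperparameters are assumptions made for illustration; they are not the settings or the data of the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic predictors (hypothetical names); the target depends strongly on
# the first column, weakly on the second, and not at all on the third.
rng = np.random.default_rng(0)
n_samples = 300
feature_names = ["hydrolysable_N", "elevation", "noise"]
X = rng.normal(size=(n_samples, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=n_samples)

# Hold out part of the samples for validation (split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit the random forest and predict the held-out samples.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rank the variables by impurity-based importance (values sum to 1).
importances = model.feature_importances_
ranking = sorted(zip(feature_names, importances), key=lambda t: t[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

By construction the first synthetic column ranks highest here; with real data the ranking depends on the samples and on correlations among the predictors.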

2 Results
2.1 Comparison of RF model prediction accuracy under different variable combinations

Fig. 2 Cumulative distribution of measured SOC values and of values predicted by the models with two different variable combinations
2.2 Importance of environmental variables in the RF model

2.3 Accuracy assessment based on different sampling percentages

2.4 Spatial distribution of farmland soil organic carbon content

Fig. 3 Spatial distribution of farmland SOC in Southeast Fujian estimated by the RF-S model (a), RF-A model (b) and OK model (c)

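The RF-A model gave a mean SOC content of 14.70±2.95 g·kg⁻¹ for Southeast Fujian, with a range of 3.63 to 25.51 g·kg⁻¹. The 13 to 19 g·kg⁻¹ class covered the largest area, more than 65% of the total farmland area of the study region, and was mainly distributed on the southeastern side of the Daiyun Mountain-Boping Ridge section of the central Fujian mountain belt in the western inland. Areas below 10 g·kg⁻¹ and above 19 g·kg⁻¹ accounted for a small share, less than 10%, and were located in the three major plains of Southeast Fujian (Zhangzhou Plain, Quanzhou Plain and Puxian Plain) and at the highest elevations in the west, respectively. The 10 to 13 g·kg⁻¹ class covered about 19% of the area and lay in the transition zone between the high and low values.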
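
Area shares per SOC class like those reported above can be derived from a predicted raster by counting pixels in each class. The sketch below uses the class breaks from the text (10, 13 and 19 g·kg⁻¹) but invented pixel values, assumes that all pixels have equal area, and assigns a value equal to a break to the upper class; none of this is taken from the study's processing chain.

```python
from bisect import bisect_right

# Class breaks in g/kg as used in the text; labels are for display only.
breaks = [10.0, 13.0, 19.0]
labels = ["<10", "10-13", "13-19", ">19"]


def class_shares(values):
    """Percentage of equal-area pixels falling into each SOC class."""
    counts = [0] * len(labels)
    for v in values:
        # bisect_right puts a value equal to a break into the upper class.
        counts[bisect_right(breaks, v)] += 1
    total = len(values)
    return {label: 100.0 * c / total for label, c in zip(labels, counts)}


# Invented predicted SOC values for ten pixels (not results from the study).
pixels = [8.2, 11.5, 12.9, 13.0, 14.7, 15.3, 16.1, 17.8, 18.4, 20.6]
shares = class_shares(pixels)
print(shares)
```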

Fig. 4 Relationship between soil organic carbon content predicted by the RF-A model and representative factors
3 Discussion
3.1 Spatial prediction of soil organic carbon in Southeast Fujian and the influence of the main environmental variables

3.2 Accuracy of the RF-A model

4 Conclusions
