Impact of Sample Size and Sampling Method on Accuracy of Topsoil pH Prediction on a Regional Scale

doi:10.11766/trxb202112010651

Acta Pedologica Sinica, Volume 60, Issue 6, 2023, pp. 1595-1609

Fund Project: Supported by the National Key Research and Development Program (No. 2021YFD1700900)

Abstract:

【Objective】Against a background of intensive use of soil resources, digital soil mapping has become an effective way to obtain and characterize soil information quickly, efficiently and accurately. The accuracy and reliability of soil spatial prediction and digital mapping are constrained by multiple factors, such as soil sample size, sampling strategy, prediction model, the complexity of the geomorphology and soil-forming environment in the target region, and the quality of covariate data.

【Method】Taking Henan Province as the study region, we applied five of the most representative machine learning (ML) algorithms to spatially predict and digitally map the topsoil pH of croplands. We then compared how different sample sizes and sampling methods affected the performance of these ML models and the prediction accuracy of topsoil pH.

【Result】The results showed that:
(1) As the soil sample size increased from 200 to 2 000, the performance of all ML models and the prediction accuracy of topsoil pH generally increased rapidly, regardless of the sampling method. Once the sample size reached or exceeded 2 000, the performance of most ML models tended to stabilize and the gain in prediction accuracy slowed markedly, suggesting that about 2 000 samples may be the sample size threshold for these ML models to predict the topsoil pH of croplands in this area.
(2) The five ML models differed significantly in performance and in topsoil pH prediction accuracy. The tree-based models, Random Forest (RF) and Cubist, performed best: regardless of the sampling method, once the sample size exceeded 2 000, both achieved a stable coefficient of determination (R²) between 0.75 and 0.80 and an RMSE below 0.50.
(3) When the soil sample size was large enough, the sampling method had little impact on ML model performance or topsoil pH prediction accuracy; its influence became gradually more evident when the sample size fell below 2 000. Comparatively, conditioned Latin hypercube sampling (cLHS) had advantages when the sample size was small. With 1 000 samples, cLHS still kept the R² of RF and Cubist at about 0.80, and even with only 200 samples, all five ML models achieved an R² above 0.55 under cLHS.
(4) The uncertainty analysis showed that 73.9% of the observed topsoil pH values of the validation samples fell within the 90% prediction interval (PI) of the RF model. Because this coverage is below the nominal 90%, the model is slightly overconfident, but it remains within an acceptable range. In addition, the prediction uncertainty of the model was not significantly correlated with sample size.

【Conclusion】The tree-based ML models RF and Cubist stood out in this case. The accuracy of spatial prediction and digital mapping of soil target variables cannot be improved simply by enlarging the sample set and increasing sampling density; model prediction performance and covariate data quality must be improved at the same time. When the sample size is large enough, the sampling strategy has little effect on ML model performance or on the prediction accuracy of topsoil pH; when the sample size falls below a certain threshold, the sampling method has a significant impact on model performance and prediction results.
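The abstract judges the models by R², RMSE and the share of validation observations that fall inside the 90% prediction interval, a quantity often called the prediction interval coverage probability (PICP). The short sketch below shows how these three quantities can be computed from validation data. It assumes R² is defined as 1 - SS_res/SS_tot (some studies instead report the squared Pearson correlation); the function names are hypothetical and the toy numbers are invented, not taken from the study.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def picp(observed: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Share of observations lying inside their prediction interval [lower, upper]."""
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return inside / len(observed)


if __name__ == "__main__":
    # Toy validation data (invented for illustration, not from the study).
    obs = [6.5, 7.8, 8.1, 5.9, 7.2]
    pred = [6.7, 7.5, 8.0, 6.3, 7.1]
    lower = [6.0, 7.0, 7.4, 5.6, 6.5]
    upper = [7.3, 7.7, 8.6, 6.9, 7.6]  # the second observation falls outside its interval
    print(f"R2 = {r_squared(obs, pred):.3f}, RMSE = {rmse(obs, pred):.3f}")
    print(f"coverage of nominal 90% PI = {picp(obs, lower, upper):.2f}")
```

A well-calibrated 90% interval should give a coverage close to 0.90. A lower value, such as the 73.9% reported for RF, means the intervals are too narrow, which is why the abstract describes the model as slightly overconfident.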

Citation:

SUN Yueqi, SUN Xiaomei, WU Zhenfu, YAN Junying, ZHAO Yanfeng, CHEN Jie. Impact of Sample Size and Sampling Method on Accuracy of Topsoil pH Prediction on a Regional Scale[J]. Acta Pedologica Sinica,2023,60(6):1595-1609.

History

Received: December 01, 2021
Revised: July 03, 2022
Accepted: September 15, 2022
Online: September 27, 2022
Published: November 28, 2023
