Abstract: Diffuse reflectance spectroscopy in the visible and near-infrared (Vis-NIR) range is a promising approach for acquiring soil properties and for digital soil mapping. The technique is rapid, nondestructive, environmentally friendly and more efficient than conventional laboratory analysis. However, because soils are diverse and spatially heterogeneous, prediction models built on this technique face the problem of limited universality, i.e. poor transferability across regions. Total nitrogen (TN) in soil is not only a key index of soil fertility but also an important factor determining crop yield, so timely information on soil TN is essential. This paper introduces locally weighted regression (LWR) as a complement to Vis-NIR spectroscopy for predicting soil TN at the regional scale, and evaluates the prediction accuracy of Vis-NIR combined with LWR. To this end, 450 soil samples were collected from Zhejiang, Jilin, Yunnan, Hainan and Gansu provinces, air-dried, and ground to pass a 2 mm sieve. Their Vis-NIR diffuse reflectance spectra were measured with a FieldSpec Pro FR spectrometer. The reflectance spectra from 400 to 2 450 nm were denoised with a Savitzky-Golay filter and transformed to first derivatives. Three quarters of the samples were selected as the calibration dataset with the Kennard-Stone algorithm, and the remaining quarter served as the validation dataset. The core of LWR is to select, for each validation sample, the calibration samples that are most spectrally similar to it. The LWR algorithm proceeds in three steps. First, the spectral matrix is decomposed and compressed by principal component analysis (PCA). Second, for each validation sample, a local modeling subset of similar samples is selected from the calibration dataset by Euclidean distance.
Third, each sample in the local modeling subset is weighted in the regression model according to its spectral distance to the validation sample, using a tri-cube weight function. The number of principal components and the number of similar samples are the crucial parameters of the LWR model; in this study they were optimized to 5 and 40, respectively. The partial least squares regression (PLSR) model gave a coefficient of determination for prediction (R²p) of 0.63, a root mean square error of prediction (RMSEp) of 0.36 g kg⁻¹ and a ratio of performance to deviation (RPD, the standard deviation of the validation data divided by the standard error of prediction) of 1.63. The support vector machine (SVM) and artificial neural network (ANN) models were more accurate than PLSR (R²p = 0.75–0.80, RMSEp = 0.27–0.30 g kg⁻¹, RPD = 1.98–2.22). Because LWR builds each local model mainly from highly similar samples, it reduced the interference of less similar samples and further improved TN prediction accuracy (R²p = 0.83, RMSEp = 0.25 g kg⁻¹, RPD = 2.41). The results also show that the correlation coefficient between soil TN and first-derivative reflectance peaks at 820, 1 400, 1 430, 1 630, 1 800, 1 930, 2 100, 2 200 and 2 300 nm, bands that overlap those important for spectral modeling of soil organic matter. Owing to the spatial heterogeneity of the study areas and soil samples, the two crucial LWR parameters (the number of similar samples and the number of principal components) vary with the modeling dataset and should therefore be optimized whenever LWR is used to predict TN. Compared with PLSR, the LWR model reduces the underestimation of TN content and brings predictions closer to the 1:1 line. Moreover, LWR outperforms the nonlinear ANN and SVM models in TN prediction and, unlike them, is not a black box.
Therefore, LWR can be considered a reliable method for predicting soil TN content when a large spectral database is available. As large-scale soil spectral libraries continue to be built and improved, LWR can be used to extract more useful information from these databases and make full use of them.
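The spectral preprocessing and calibration/validation split described in the abstract can be sketched in Python with NumPy and SciPy. This is a minimal sketch under stated assumptions: the Savitzky-Golay window length (11) and polynomial order (2) are illustrative placeholders because the abstract does not report them, the Kennard-Stone distances are computed on the preprocessed spectra themselves (the abstract does not say which space was used), and `sg_first_derivative` and `kennard_stone` are hypothetical helper names. The Kennard-Stone routine starts from the two most distant samples and repeatedly adds the sample whose distance to its nearest already-selected sample is largest.

```python
import numpy as np
from scipy.signal import savgol_filter
from scipy.spatial.distance import cdist


def sg_first_derivative(spectra, window_length=11, polyorder=2):
    """Savitzky-Golay smoothing plus first derivative along the wavelength axis.

    window_length and polyorder are assumed values; they are not given in the abstract.
    Rows are samples, columns are wavelengths.
    """
    return savgol_filter(spectra, window_length, polyorder, deriv=1, axis=1)


def kennard_stone(X, n_select):
    """Split rows of X into (selected, remaining) indices with the Kennard-Stone algorithm."""
    dist = cdist(X, X)  # pairwise Euclidean distances between samples
    # Start from the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select and remaining:
        # For every remaining sample, the distance to its nearest selected sample.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        # Move the remaining sample farthest from the current selection into the set.
        selected.append(remaining.pop(int(np.argmax(nearest))))
    return np.array(selected), np.array(remaining)
```

For a three-quarters split, `n_select` would be set to about 0.75 times the number of samples, giving calibration indices (first array) and validation indices (second array). A common variant runs Kennard-Stone on a few principal-component scores instead of full spectra, which reduces the cost of the distance matrix.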
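The three LWR steps (PCA compression, Euclidean nearest-neighbour selection, tri-cube weighting) can be illustrated with a minimal NumPy sketch. Beyond what the abstract states, it assumes that PCA is computed from the mean-centred calibration spectra via SVD, that distances are scaled by the largest distance within the local subset before applying the tri-cube function w = (1 − d³)³, and that each local model is a weighted least-squares regression on the principal-component scores (a locally weighted principal component regression). The helper names `tricube` and `lwr_predict` are hypothetical; the defaults of 5 components and 40 neighbours follow the values optimized in this study.

```python
import numpy as np


def tricube(d):
    """Tri-cube weights for scaled distances: w = (1 - d**3)**3, zero for d >= 1."""
    d = np.clip(d, 0.0, 1.0)
    return (1.0 - d ** 3) ** 3


def lwr_predict(X_cal, y_cal, X_val, n_components=5, n_neighbors=40):
    """Sketch of LWR: one weighted local regression per validation spectrum."""
    y_cal = np.asarray(y_cal, dtype=float)
    # Step 1: compress the calibration spectra with PCA (SVD of mean-centred data).
    mean = X_cal.mean(axis=0)
    _, _, vt = np.linalg.svd(X_cal - mean, full_matrices=False)
    loadings = vt[:n_components].T            # (n_wavelengths, n_components)
    T_cal = (X_cal - mean) @ loadings         # calibration scores
    T_val = (X_val - mean) @ loadings         # validation scores in the same PC space

    preds = np.empty(len(X_val))
    for i, t in enumerate(T_val):
        # Step 2: Euclidean distance in score space; keep the n_neighbors closest samples.
        dist = np.linalg.norm(T_cal - t, axis=1)
        idx = np.argsort(dist)[:n_neighbors]
        # Step 3: tri-cube weights on distances scaled by the largest local distance.
        d_max = dist[idx].max()
        w = tricube(dist[idx] / d_max) if d_max > 0 else np.ones(len(idx))
        # Assumed local model: weighted least squares on the scores with an intercept.
        A = np.column_stack([np.ones(len(idx)), T_cal[idx]])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y_cal[idx] * sw, rcond=None)
        preds[i] = coef[0] + t @ coef[1:]
    return preds
```

With this scaling the most distant of the selected neighbours receives zero weight, which is the usual convention for tri-cube weighting. Other distance scalings, or a local PLSR model in place of the score regression, are equally plausible choices, and both parameters would still need tuning per dataset as the abstract recommends.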
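The three validation statistics reported in the abstract (R²p, RMSEp and RPD) can be computed as follows. This sketch assumes common definitions: R²p as one minus the ratio of the residual to the total sum of squares, RMSEp as the square root of the mean squared residual, and RPD as the sample standard deviation of the observed values divided by RMSEp (some authors use the bias-corrected standard error of prediction instead). `validation_metrics` is a hypothetical helper name.

```python
import numpy as np


def validation_metrics(y_obs, y_pred):
    """Return (R2p, RMSEp, RPD) for observed and predicted values."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_obs - y_pred
    rmsep = np.sqrt(np.mean(resid ** 2))
    # Coefficient of determination: 1 - SS_residual / SS_total.
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)
    # Ratio of performance to deviation: SD of observed values over the prediction error.
    rpd = np.std(y_obs, ddof=1) / rmsep
    return r2, rmsep, rpd
```

As a rough consistency check under the same assumption, the PLSR figures (RMSEp = 0.36 g kg⁻¹, RPD = 1.63) imply a validation-set standard deviation of about 0.36 × 1.63 ≈ 0.59 g kg⁻¹, and the LWR figures give about 0.25 × 2.41 ≈ 0.60 g kg⁻¹, as expected when both models are evaluated on the same validation set.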