Training Sample Selection Method Based on Grading of Soil Types by Area for Updating Conventional Soil Maps
Author: LIU Xueqi, ZHU Axing, YANG Lin, MIAO Yamin, ZENG Canying
Affiliation:

Clc Number:

Fund Project:

The National Natural Science Foundation of China (41431177; 41471178), the Natural Science Research Program of Jiangsu (14KJA170001), the Graduate Research Innovation Program of Jiangsu (KYLX15_0715), the National Basic Research Program of China (2015CB954102), and the “One-Thousand Talents” Program of China

    Abstract:

    【Objective】Traditional soil surveys have produced a large number of conventional soil maps that vary in scale and nature. Although these maps are not very high in spatial detail or accuracy, they contain a large volume of valuable expert knowledge about soil-environment relationships in the regions they cover. Data mining models can be used to extract this knowledge from the maps and use it to update them. When data mining models are used to extract information on the spatial distribution of soils, the selection of training samples is an essential step. The quality of the training samples largely determines how fully the soil-environment relationships are expressed and how accurate the updated soil maps are. The area-weighted proportion method is a common method for selecting training samples. However, it usually assigns too much weight to soil types that are large in area, so that too many training samples are selected for them. In addition, random selection of training samples from polygons of the same soil type may bring in “noise” samples located in transition areas between soil types, which lowers the accuracy of the updated soil maps.

    【Method】In this paper, a new method was developed to select training samples from conventional soil maps based on grading of soil types by area. The method consists of two steps. The first step is to identify typical (representative) samples of each soil type from the conventional soil map, so as to avoid “noise pixels” caused by misplaced boundaries between soil polygons. It is assumed that most of the boundaries of the polygons of a given soil type are correctly delineated, so that the peak of the histogram of an environmental factor within those polygons represents the typical environmental condition under which the soil type develops or exists. The pixels close to this typical environmental condition, that is, within the peak zone of the histogram, are considered representative samples. All the representative samples selected through the histograms of the various environmental factors for a given soil type are combined into the typical sample set of that soil type. The second step is to select training samples based on grading of soil types by area, with a view to keeping the numbers of samples of the soil types in balance. Soil types in the same grade have the same number of training samples drawn from their respective typical sample sets. A random forest model was used in this study to update the conventional soil map based on the selected training samples. To evaluate the proposed method, it was compared with two other training sample selection methods. One randomly selects training samples from the polygons of each soil type, with the number of training samples for each soil type depending on the proportion assigned to the grade the soil type is in. The other is the common area-weighted proportion method, which randomly selects training samples from the polygons of each soil type, with the number of training samples for each soil type depending on the area-weighted proportion of the soil type. The study area was a small watershed in Raffelson, Wisconsin, USA. Each of the three selection methods was run 500 times, and the mean accuracy of the inferred map and the proportion of the conventional soil map that was updated were validated with 92 independent field verification samples.

    【Result】Based on the 500 trials, about 79.5%, 71.8% and 63.6% of the conventional soil map could be updated with the proposed method and the two comparison methods, respectively. Moreover, the soil maps updated with the proposed training sample selection method are more consistent with the actual soil distribution in the Raffelson watershed.

    【Conclusion】The proposed method is an effective training sample selection method for data mining models used to update conventional soil maps.
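    The two selection steps described in the Method can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the abstract does not give the bin count, the definition of the “peak zone”, the area breakpoints between grades, or the sample count per grade, so `n_bins`, `peak_fraction`, `breaks` and `n_per_grade` are hypothetical parameters. Combining the per-factor selections by set union is also an assumption about what “combined” means.

```python
import random


def peak_zone_indices(values, n_bins=10, peak_fraction=0.5):
    """Indices of values that fall in the contiguous peak zone of their histogram.

    Assumption: the peak zone is the run of bins around the modal bin whose
    counts are at least peak_fraction times the modal count.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant factor
    bin_of = [min(int((v - lo) / width), n_bins - 1) for v in values]
    counts = [0] * n_bins
    for b in bin_of:
        counts[b] += 1
    mode = counts.index(max(counts))
    threshold = peak_fraction * counts[mode]
    left = right = mode
    while left > 0 and counts[left - 1] >= threshold:
        left -= 1
    while right < n_bins - 1 and counts[right + 1] >= threshold:
        right += 1
    return {i for i, b in enumerate(bin_of) if left <= b <= right}


def typical_sample_set(pixels, factors, **hist_kwargs):
    """Step 1: combine (here: union) the peak-zone pixels of every factor.

    pixels is a list of dicts mapping factor name -> value for the pixels
    inside the polygons of ONE soil type.
    """
    selected = set()
    for factor in factors:
        selected |= peak_zone_indices([p[factor] for p in pixels], **hist_kwargs)
    return sorted(selected)


def grade_of(area, breaks):
    """Grade index = number of (hypothetical) area breakpoints reached."""
    return sum(area >= b for b in breaks)


def select_training_samples(typical_sets, areas, breaks, n_per_grade, seed=0):
    """Step 2: soil types in the same area grade get the same sample count."""
    rng = random.Random(seed)
    chosen = {}
    for soil, candidates in typical_sets.items():
        n = n_per_grade[grade_of(areas[soil], breaks)]
        chosen[soil] = rng.sample(candidates, min(n, len(candidates)))
    return chosen


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic pixels for one soil type; factor names are illustrative only.
    pixels = [{"slope": rng.gauss(8, 2), "elevation": rng.gauss(300, 15)}
              for _ in range(200)]
    typical = typical_sample_set(pixels, ["slope", "elevation"])
    picked = select_training_samples({"soil_A": typical}, {"soil_A": 120.0},
                                     breaks=[50.0, 500.0],
                                     n_per_grade={0: 20, 1: 40, 2: 60})
    print(len(typical), len(picked["soil_A"]))
```

    The selected pixels of all soil types would then serve as labelled training data for a classifier such as the random forest model mentioned above. Fixing the count per grade, rather than scaling it with area, is what keeps large soil types from dominating the training set.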

Get Citation

LIU Xueqi, ZHU Axing, YANG Lin, MIAO Yamin, ZENG Canying. Training Sample Selection Method Based on Grading of Soil Types by Area for Updating Conventional Soil Maps[J]. Acta Pedologica Sinica, 2017, 54(1): 36-47.

History
  • Received: March 21, 2016
  • Revised: July 13, 2016
  • Accepted: July 27, 2016
  • Online: October 17, 2016
  • Published: