Removes or flags records that are outliers in geographic space according to the method
defined via the `method` argument. Geographic outliers often represent
erroneous coordinates, for example due to data entry errors, imprecise
geo-references, or individuals in horticulture/captivity.
```r
cc_outl(x, lon = "decimallongitude", lat = "decimallatitude",
        species = "species", method = "quantile", mltpl = 5,
        tdi = 1000, value = "clean", sampling_thresh = 0,
        verbose = TRUE)
```
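| Argument | Description |
|---|---|
| `x` | data.frame. Containing geographical coordinates and species names. |
| `lon` | character string. The column with the longitude coordinates. Default = `"decimallongitude"`. |
| `lat` | character string. The column with the latitude coordinates. Default = `"decimallatitude"`. |
| `species` | character string. The column with the species name. Default = `"species"`. |
| `method` | character string. The method for outlier selection; one of `"distance"`, `"quantile"` or `"mad"`. See details. Default = `"quantile"`. |
| `mltpl` | numeric. The multiplier of the interquartile range (`method = "quantile"`) or of the median absolute deviation (`method = "mad"`) used to identify outliers. See details. Default = 5. |
| `tdi` | numeric. The minimum absolute distance, in km, of a record to all other records of the same species for it to be identified as an outlier (`method = "distance"`). See details. Default = 1000. |
| `value` | character string. Defining the output value, either `"clean"` or `"flagged"`. See value. |
| `sampling_thresh` | numeric. Cut-off threshold for the sampling correction. Indicates the quantile of sampling effort in which outliers should be ignored. For instance, if `sampling_thresh = 0.25`, records in the 25% least-sampled countries are not flagged as outliers. Default = 0 (no sampling correction). |
| `verbose` | logical. If `TRUE`, reports the name of the test and the number of records flagged. |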
Depending on the `value` argument, either a data.frame
containing the records considered correct by the test (`"clean"`) or a
logical vector (`"flagged"`), with `TRUE` = test passed and `FALSE` = test failed/potentially
problematic. Default = `"clean"`.
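As a small illustration of the `"flagged"` convention, the sketch below uses a made-up logical vector in place of a real `cc_outl()` result (the coordinates and the flags are invented for illustration). It shows how the same vector recovers what `value = "clean"` would return and, conversely, isolates the records that failed the test for manual review.

```r
# Toy occurrence records (made-up coordinates).
x <- data.frame(species = c("a", "a", "a", "b"),
                decimallongitude = c(10.0, 10.1, -150.0, 20.0),
                decimallatitude = c(10.0, 10.1, -60.0, 5.0))

# Hypothetical result standing in for cc_outl(x, value = "flagged"):
# TRUE = test passed, FALSE = test failed/potentially problematic.
flags <- c(TRUE, TRUE, FALSE, TRUE)

clean <- x[flags, ]     # corresponds to the records value = "clean" would keep
suspect <- x[!flags, ]  # records that failed the test, worth checking by hand
suspect
```

Keeping the logical vector, rather than the cleaned data.frame, makes it easy to inspect or map the suspect records before deciding to drop them.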
The method for outlier identification depends on the `method` argument.
If `"quantile"`: a boxplot method is used, and records are flagged as
outliers if their mean distance to all other records of the same
species lies more than `mltpl` times the interquartile range of the mean distances
of all records of this species above the upper quartile (or below the lower quartile).
If `"mad"`: the median absolute deviation is used. In this case a record is flagged
as an outlier if its mean distance to all other records of the same species is larger
than the median of the mean distances of all records of the species plus `mltpl` times
their median absolute deviation (or smaller than the median minus it). If `"distance"`:
records are flagged as outliers if the minimum distance to the nearest other
record of the species is greater than `tdi`. For species with records from more than 10000
unique locations, a random sample of 1000 records is used for
the distance matrix calculation.
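To make the three rules above concrete, here is a minimal base-R sketch. It is not the package's implementation: `haversine_matrix()` and `flag_outliers_sketch()` are hypothetical helpers, distances are assumed to be great-circle distances on a sphere of radius 6371 km, the quantile rule is assumed to be a two-sided boxplot fence, and the random subsample for species with more than 10000 unique locations is left out.

```r
# Sketch only (assumptions: spherical Earth, radius 6371 km; no subsampling).

# Pairwise great-circle (haversine) distances in km between all points.
haversine_matrix <- function(lon, lat, radius = 6371) {
  rad <- pi / 180
  phi <- lat * rad
  lam <- lon * rad
  dphi <- outer(phi, phi, "-")
  dlam <- outer(lam, lam, "-")
  a <- sin(dphi / 2)^2 + outer(cos(phi), cos(phi)) * sin(dlam / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(sqrt(a), 1))
}

# Hypothetical helper: TRUE = test passed, FALSE = potential outlier,
# mirroring the value = "flagged" convention.
flag_outliers_sketch <- function(x, method = c("quantile", "mad", "distance"),
                                 mltpl = 5, tdi = 1000,
                                 lon = "decimallongitude",
                                 lat = "decimallatitude",
                                 species = "species") {
  method <- match.arg(method)
  passed <- rep(TRUE, nrow(x))
  for (sp in unique(x[[species]])) {
    idx <- which(x[[species]] == sp)
    if (length(idx) < 2) next  # a single record has nothing to compare against
    d <- haversine_matrix(x[[lon]][idx], x[[lat]][idx])
    diag(d) <- NA  # ignore each record's distance to itself
    if (method == "distance") {
      # distance to the nearest other record of the same species
      nearest <- apply(d, 1, min, na.rm = TRUE)
      out <- nearest > tdi
    } else {
      # mean distance to all other records of the same species
      score <- rowMeans(d, na.rm = TRUE)
      if (method == "quantile") {
        # boxplot fence: quartiles -/+ mltpl * IQR
        q <- stats::quantile(score, c(0.25, 0.75))
        fence <- mltpl * stats::IQR(score)
        out <- score < q[[1]] - fence | score > q[[2]] + fence
      } else {
        # median -/+ mltpl * median absolute deviation
        centre <- stats::median(score)
        fence <- mltpl * stats::mad(score)
        out <- score < centre - fence | score > centre + fence
      }
    }
    passed[idx[out]] <- FALSE
  }
  passed
}

# Nine tightly clustered records plus one on the other side of the globe.
pts <- data.frame(species = "a",
                  decimallongitude = c(10 + (0:8) / 10, -150),
                  decimallatitude = c(rep(10, 9), -60))
flag_outliers_sketch(pts, method = "quantile")  # nine TRUE, then FALSE
```

On this toy species all three methods flag only the distant record. The quantile and MAD rules score records by their mean distance, so they adapt to how widely a species is spread, whereas the distance rule uses a fixed absolute threshold and only looks at the nearest neighbour.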
The likelihood of occurrence records being erroneous outliers is linked to the sampling effort in any given location. To account for this, the sampling correction (controlled by `sampling_thresh`) fetches the number of occurrence records available from www.gbif.org per country as a proxy of sampling effort. The outlier test (the mean distance) for each record is then weighted by the log-transformed number of records per square kilometer in that country. See https://ropensci.github.io/CoordinateCleaner/articles/Tutorial_geographic_outliers.html for an example and further explanation of the outlier test.
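The cut-off part of the sampling correction can be sketched as follows. This is an assumption-laden illustration, not the package's code: `apply_sampling_thresh_sketch()` is a hypothetical helper, the country codes and per-country densities are made up rather than fetched from GBIF, and only the "ignore outliers in the least-sampled countries" step is shown; the weighting of the mean distance itself is not reproduced.

```r
# Hypothetical sketch of a sampling_thresh cut-off.
# passed:  logical, TRUE = record passed the outlier test
# country: country code of each record (assumed column)
# density: named numeric, records per km^2 per country (assumed input)
apply_sampling_thresh_sketch <- function(passed, country, density,
                                         sampling_thresh = 0) {
  if (sampling_thresh <= 0) {
    return(passed)  # default: no sampling correction
  }
  log_density <- log(density)
  cutoff <- stats::quantile(log_density, sampling_thresh)
  poorly_sampled <- names(density)[log_density <= cutoff]
  # records from the least-sampled countries are never flagged
  passed | country %in% poorly_sampled
}

passed <- c(TRUE, FALSE, FALSE)
country <- c("AA", "BB", "CC")            # hypothetical country codes
density <- c(AA = 10, BB = 0.01, CC = 5)  # made-up records per km^2
apply_sampling_thresh_sketch(passed, country, density, sampling_thresh = 0.34)
# returns TRUE TRUE FALSE
```

Here the record from the poorly sampled country `BB` is un-flagged, while the flagged record from the better sampled country `CC` stays flagged.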
See https://ropensci.github.io/CoordinateCleaner/ for more details and tutorials.
Other Coordinates: `cc_cap`, `cc_cen`, `cc_coun`, `cc_dupl`, `cc_equ`, `cc_gbif`, `cc_inst`, `cc_iucn`, `cc_sea`, `cc_urb`, `cc_val`, `cc_zero`
```r
x <- data.frame(species = letters[1:10],
                decimallongitude = runif(100, -180, 180),
                decimallatitude = runif(100, -90, 90))

cc_outl(x)
#>     species decimallongitude decimallatitude
#>   1       a        33.595505      -49.261924
#>   2       b        67.203619       63.716334
#>   3       c        84.711681      -11.590733
#>   4       d       -14.002125      -37.929287
#>   5       e        58.319740       24.765241
#>   6       f      -179.655140      -42.382897
#>   7       g       171.347591      -20.471692
#>   8       h      -100.741156       21.454915
#>   9       i       -69.721978        7.502055
#>  10       j        83.363989       -2.730420
#>  11       a      -131.937759       47.573085
#>  12       b      -144.059018       82.984849
#>  13       c      -118.301345      -41.294784
#>  14       d        42.185372       41.508040
#>  15       e        65.974791      -65.850202
#>  16       f       104.606665      -81.288319
#>  17       g      -103.230674       36.649273
#>  18       h       105.123919      -63.501517
#>  19       i        33.348387       48.447960
#>  20       j        38.254897      -58.256980
#>  21       a      -140.123324      -70.063380
#>  22       b      -109.580890       78.901633
#>  23       c      -177.454521       62.364805
#>  24       d       158.713414       12.805845
#>  25       e      -155.444634       32.236813
#>  26       f      -130.705495      -73.921996
#>  27       g       -60.185871       51.367343
#>  28       h        -3.712215      -49.147659
#>  29       i      -117.736366       -9.327876
#>  30       j      -156.621873      -60.979025
#>  31       a       -17.399884      -58.299896
#>  32       b       -43.415153      -54.318938
#>  33       c       147.347412      -25.629963
#>  34       d      -169.219275      -57.360023
#>  35       e      -109.873309       11.006075
#>  36       f        64.612574       28.634942
#>  37       g        89.087869       28.845614
#>  38       h        53.833577      -89.567858
#>  39       i      -152.070956       88.820286
#>  40       j        35.875316       22.948062
#>  41       a       136.443892      -87.366540
#>  42       b        83.533376      -53.067921
#>  43       c      -151.180889       29.353800
#>  44       d       119.789932       -6.526075
#>  45       e       131.294051      -25.137332
#>  46       f        76.667390       34.085950
#>  47       g        92.666560      -43.405058
#>  48       h      -135.923790       63.216906
#>  49       i      -110.268763       -3.599156
#>  50       j       -66.895816        6.707950
#>  51       a      -130.488126       29.095182
#>  52       b       125.209037      -35.846188
#>  53       c        -5.320085      -42.569727
#>  54       d       -10.712619      -35.091697
#>  55       e       104.776335       67.492488
#>  56       f       -67.326022       39.318699
#>  57       g       -68.844775      -16.754455
#>  58       h       -55.744079      -86.515295
#>  59       i       118.239593      -80.967535
#>  60       j       -81.361621       36.405186
#>  61       a       -24.899009      -78.053351
#>  62       b       -29.916778      -83.496174
#>  63       c      -131.647608      -86.372335
#>  64       d        67.082904      -22.899403
#>  65       e      -118.051878      -49.380227
#>  66       f       152.457640       -1.454742
#>  67       g        40.947693       -9.603872
#>  68       h       -66.373399       34.236882
#>  69       i       111.503913       61.693196
#>  70       j       144.854676      -22.349940
#>  71       a      -134.892792       87.280144
#>  72       b        23.600590       81.436739
#>  73       c      -134.049202       88.181732
#>  74       d        45.935142      -10.236896
#>  75       e      -152.883377      -62.322507
#>  76       f       -77.237523       80.050918
#>  77       g       -68.183126        3.503954
#>  78       h      -127.178078       -7.985771
#>  79       i       107.917385      -50.383764
#>  80       j      -118.081411      -64.947171
#>  81       a       102.846172      -51.091581
#>  82       b       -10.787125      -16.977839
#>  83       c       -24.135218      -47.927774
#>  84       d        26.831830      -85.488055
#>  85       e        82.789180       45.421553
#>  86       f       141.907373       41.773058
#>  87       g      -100.689707      -16.360781
#>  88       h       -75.025282       23.554405
#>  89       i       112.643550      -78.219903
#>  90       j      -164.303161      -63.235413
#>  91       a         1.897799      -39.653682
#>  92       b       -93.003883       63.118226
#>  93       c        32.091271      -86.122660
#>  94       d       169.610605      -44.490622
#>  95       e       -96.220439       28.208070
#>  96       f       132.270486       52.046894
#>  97       g      -140.125773       37.592217
#>  98       h       108.537041      -78.373261
#>  99       i        56.852804        2.036042
#> 100       j        54.739547      -53.832897

cc_outl(x, method = "quantile", value = "flagged")
#>   [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#>  [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#>  [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#>  [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#>  [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#>  [76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#>  [91] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

cc_outl(x, method = "distance", value = "flagged", tdi = 10000)
#>   [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#>  [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#>  [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#>  [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#>  [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#>  [76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#>  [91] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

cc_outl(x, method = "distance", value = "flagged", tdi = 1000)
#>   [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#>  [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#>  [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#>  [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#>  [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#>  [76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#>  [91] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
```