Dataset Open Access
Flora of Russia on iNaturalist backup 2020 Sep 08 (886K records in total). Metadata only, with hyperlinks to photos, in csv format.
The “Flora of Russia” project on iNaturalist has brought together professional scientists and amateur naturalists from all over the country. Over 10,000 people take part in data collection. In 20 months, the participants collected 750,000 confirmed and 136,000 unverified photo observations of 6,850+ species of the Russian flora. This is the largest open dataset of distributional data on the country’s biodiversity and a leading source of data on the current state of the nation’s flora.
We prepared a stable version of the project’s data as of Sep. 08, 2020. Of the 66 columns available in the iNaturalist export, we use 27 for further processing, since the full iNaturalist dataset is long-tailed; a 28th column, gbif_id, was added later (see below). Column labels and descriptions are given below (abbreviations: A – automatically generated data, usually from the EXIF metadata of photos; M – manually entered data; AM – both options are possible, i.e. automatically generated data that can be edited manually):
1. id – Unique identifier for the observation (A)
2. observed_on_string – Date/time as entered by the observer (AM)
3. observed_on – Normalised date of observation (A)
4. time_observed_at – Normalised date/time of observation (A)
5. time_zone – Time zone of observation (AM)
6. user_id – Unique identifier for the observer (A)
7. user_login – Username of the observer (A)
8. created_at – Date/time observation was created (A)
9. updated_at – Date/time observation was last updated (A)
10. quality_grade – Quality grade of this observation: only "research grade" in the "Flora of Russia" project files and only "needs ID" in the project's backlog file (A)
11. license – License the observer has chosen for this observation (AM)
12. url – URL for the observation (A)
13. image_url – URL for the default image (A)
14. oauth_application_id – Which application was used to post the observation (A)
15. latitude – Publicly visible latitude (AM)
16. longitude – Publicly visible longitude (AM)
17. positional_accuracy – Accuracy estimate in meters (AM)
18. private_latitude – Private latitude, set if observation private or obscured (AM)
19. private_longitude – Private longitude, set if observation private or obscured (AM)
20. private_positional_accuracy – Coordinate precision, set if observation private or obscured (AM)
21. geoprivacy – Whether or not the observer has chosen to obscure or hide the coordinates (AM)
22. taxon_geoprivacy – Most conservative geoprivacy applied due to the conservation statuses of taxa in current identification (A)
23. coordinates_obscured – Whether or not the coordinates have been obscured, either because of geoprivacy or because of a threatened taxon (A)
24. positioning_device – Device used to determine coordinates (A)
25. positioning_method – How coordinates were determined (A)
26. scientific_name – Scientific name of the observed taxon according to iNaturalist taxonomic backbone (AM)
27. taxon_id – Unique identifier for the observed taxon (A)
28. gbif_id – URL for the corresponding GBIF record (A)
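The coordinate columns come in public and private pairs. The Python sketch below, using only the standard library, shows one way to read a file of this dataset and pick one coordinate triple per record. It assumes (our assumption, not a rule stated for the dataset) that private_* values, when filled in, should be preferred over the public ones. The helper names and the two sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, Optional, Tuple

Coords = Tuple[Optional[float], Optional[float], Optional[float]]


def cell_as_float(row: Dict[str, str], column: str) -> Optional[float]:
    """Return the cell as a float, or None if the column is missing or empty."""
    value = (row.get(column) or "").strip()
    return float(value) if value else None


def best_coordinates(row: Dict[str, str]) -> Coords:
    """Return (latitude, longitude, accuracy_m) for one observation.

    Hypothetical helper: prefers the private_* coordinates when both are
    filled in, otherwise falls back to the publicly visible ones.
    """
    lat = cell_as_float(row, "private_latitude")
    lon = cell_as_float(row, "private_longitude")
    if lat is not None and lon is not None:
        return lat, lon, cell_as_float(row, "private_positional_accuracy")
    return (
        cell_as_float(row, "latitude"),
        cell_as_float(row, "longitude"),
        cell_as_float(row, "positional_accuracy"),
    )


# Two invented rows with a subset of the columns listed above (illustration only).
SAMPLE_CSV = """id,latitude,longitude,positional_accuracy,private_latitude,private_longitude,private_positional_accuracy
1,55.75,37.62,20,,,
2,59.9,30.3,28000,59.93,30.31,15
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
coords = [best_coordinates(row) for row in rows]
# coords == [(55.75, 37.62, 20.0), (59.93, 30.31, 15.0)]
```

For a real file, io.StringIO(...) would be replaced by open(path, newline="", encoding="utf-8"); the UTF-8 encoding is assumed here.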
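Using iNaturalist export tools, we downloaded four csv files containing the records that meet the project’s criteria (each entry gives the observation date range, the number of records, and the download time):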
1) Observations made from 1970-09-01 to 2019-06-30 (184,103 records), 10:57 MSK Sep. 8, 2020
2) Observations made from 2019-07-01 to 2020-05-20 (191,156 records), 12:28 MSK Sep. 8, 2020
3) Observations made from 2020-05-21 to 2020-07-05 (195,060 records), 14:04 MSK Sep. 8, 2020
4) Observations made from 2020-07-06 to 2020-09-08 (181,261 records), 15:39 MSK Sep. 8, 2020
The fifth file contains the project’s backlog, i.e. the observations that are either unidentified or unverified (“needs ID” grade in iNaturalist terms):
5) Observations from needs-id-backlog of “Flora of Russia” (136,669 records), 17:04 MSK Sep. 8, 2020
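As a consistency check, the counts of the five downloaded files can be reconciled with the totals given in the title and introduction. The short calculation below combines them with the exclusions reported in the Sep. 25, 2020 amendment (1,106 project records and 587 backlog records); the variable names are ours.

```python
# Record counts of the downloaded files (from the list above).
project_exports = [184_103, 191_156, 195_060, 181_261]  # files 1-4
backlog_export = 136_669                                 # file 5

# Records later excluded for positional accuracy > 50,000 m (Sep. 25, 2020 amendment).
excluded_project = 1_106
excluded_backlog = 587

raw_project = sum(project_exports)                  # 751,580
project_total = raw_project - excluded_project      # 750,474 (~750,000 confirmed)
backlog_total = backlog_export - excluded_backlog   # 136,082 (~136,000 unverified)
grand_total = project_total + backlog_total         # 886,556 (~886K in total)
```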
We amended the dataset on Sep. 25, 2020, following a data audit performed by Dr Robert Mesibov (https://www.datafix.com.au) during preparation of the data paper submitted to “Biodiversity Data Journal”. All records with a positional accuracy exceeding 50,000 m were flagged as having an inaccurate location and reported to their observers. Altogether, on these grounds we excluded 1,106 observations of the project’s data and 587 observations of the backlog from the backup.
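The accuracy rule above can be sketched as a simple row filter. This is an illustration under stated assumptions, not the procedure actually used in the audit: it checks the public positional_accuracy column, keeps rows where that column is empty (the text does not say how such rows were handled), and treats exactly 50,000 m as acceptable because only values exceeding the threshold are rejected. Function names and sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

# Threshold from the text: positional accuracy exceeding 50,000 m is treated as inaccurate.
MAX_ACCURACY_M = 50_000

Row = Dict[str, str]


def has_accurate_location(row: Row, max_accuracy_m: float = MAX_ACCURACY_M) -> bool:
    """True unless positional_accuracy is filled in and exceeds the threshold.

    Assumptions: the public positional_accuracy column is checked, and rows
    with an empty value are kept.
    """
    raw = (row.get("positional_accuracy") or "").strip()
    if not raw:
        return True
    return float(raw) <= max_accuracy_m


def split_by_accuracy(rows: Iterable[Row]) -> Tuple[List[Row], List[Row]]:
    """Partition rows into (kept, flagged) according to has_accurate_location."""
    kept: List[Row] = []
    flagged: List[Row] = []
    for row in rows:
        (kept if has_accurate_location(row) else flagged).append(row)
    return kept, flagged


# Invented sample rows (illustration only).
SAMPLE_CSV = """id,positional_accuracy
1,20
2,50000
3,120000
4,
"""

kept, flagged = split_by_accuracy(csv.DictReader(io.StringIO(SAMPLE_CSV)))
# [r["id"] for r in kept] == ["1", "2", "4"]; [r["id"] for r in flagged] == ["3"]
```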
The “research-grade” observations with free licenses (CC0, CC-BY, and CC-BY-NC) are all available in GBIF as part of the “iNaturalist Research-grade Observations” occurrence dataset (https://doi.org/10.15468/ab3s5x). We added a final column, gbif_id, to all csv files of our dataset; it holds the URLs of the corresponding GBIF records and is based on the GBIF Occurrence Download https://doi.org/10.15468/dl.msfxkn performed on Sep. 28, 2020.
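The text does not describe how GBIF records were matched to observations. One plausible approach, sketched below, is to read the occurrence table of the GBIF download (tab-separated), map each iNaturalist observation id to its GBIF key, and turn the key into a record URL of the form https://www.gbif.org/occurrence/<gbifID>. We assume that the download's catalogNumber column holds the iNaturalist observation id; the helper names and sample values are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Iterator

GBIF_OCCURRENCE_URL = "https://www.gbif.org/occurrence/{}"


def build_gbif_index(gbif_rows: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Map iNaturalist observation id -> GBIF occurrence URL.

    Assumption: catalogNumber in the GBIF download holds the iNaturalist id.
    """
    index: Dict[str, str] = {}
    for row in gbif_rows:
        obs_id = (row.get("catalogNumber") or "").strip()
        gbif_key = (row.get("gbifID") or "").strip()
        if obs_id and gbif_key:
            index[obs_id] = GBIF_OCCURRENCE_URL.format(gbif_key)
    return index


def add_gbif_column(
    observations: Iterable[Dict[str, str]], index: Dict[str, str]
) -> Iterator[Dict[str, str]]:
    """Yield copies of observation rows with a gbif_id column ("" when unmatched)."""
    for row in observations:
        out = dict(row)
        out["gbif_id"] = index.get(row["id"].strip(), "")
        yield out


# Invented sample data: the GBIF table is tab-separated, the observation file is csv.
GBIF_SAMPLE = "gbifID\tcatalogNumber\n1000000001\t111\n"
OBS_SAMPLE = "id,scientific_name\n111,Betula pendula\n222,Pinus sylvestris\n"

index = build_gbif_index(csv.DictReader(io.StringIO(GBIF_SAMPLE), delimiter="\t"))
rows = list(add_gbif_column(csv.DictReader(io.StringIO(OBS_SAMPLE)), index))
```

Unmatched observations (for example, backlog records, which are not research grade) simply get an empty gbif_id in this sketch.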
Finally, we kept in this dataset only observations with free licenses (CC0, CC-BY, and CC-BY-NC). These five files form the stable project backup of 760,251 records:
1) Observations made from 1970-09-01 to 2019-06-30 (167,501 records), 10:57 MSK Sep. 8, 2020
2) Observations made from 2019-07-01 to 2020-05-20 (171,277 records), 12:28 MSK Sep. 8, 2020
3) Observations made from 2020-05-21 to 2020-07-05 (164,269 records), 14:04 MSK Sep. 8, 2020
4) Observations made from 2020-07-06 to 2020-09-08 (148,960 records), 15:39 MSK Sep. 8, 2020
5) Observations from needs-id-backlog of “Flora of Russia” (108,244 records), 17:04 MSK Sep. 8, 2020
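The license step can be sketched as a filter on the license column, together with a check that the five file counts above add up to the stated 760,251 records. We assume that the license codes in the export are spelled as in the text; this sketch ignores case and surrounding whitespace and treats an empty cell as not free. The function name and sample rows are hypothetical.

```python
import csv
import io
from typing import Dict

# Licenses kept in the final backup, as listed in the text.
FREE_LICENSES = {"CC0", "CC-BY", "CC-BY-NC"}

# Record counts of the five final files (from the list above).
FINAL_COUNTS = [167_501, 171_277, 164_269, 148_960, 108_244]


def has_free_license(row: Dict[str, str]) -> bool:
    """True if the license cell is one of FREE_LICENSES.

    Assumption: codes are spelled as in the text; case and surrounding
    whitespace are ignored, and an empty cell counts as not free.
    """
    return (row.get("license") or "").strip().upper() in FREE_LICENSES


# Invented sample rows (illustration only).
SAMPLE_CSV = """id,license
1,CC-BY-NC
2,CC-BY-NC-ND
3,
4,cc0
"""

free_rows = [row for row in csv.DictReader(io.StringIO(SAMPLE_CSV)) if has_free_license(row)]
# [r["id"] for r in free_rows] == ["1", "4"]
# sum(FINAL_COUNTS) == 760_251, matching the stated backup size
```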
All project data can be freely used in scientific, educational and environmental activities.