Published October 1, 2020 | Version v1
Dataset Open

Flora of Russia on iNaturalist backup 2020 Sep 08 (750K + 136K records)

  • 1. Lomonosov Moscow State University
  • 2. inaturalist.org

Description

Flora of Russia on iNaturalist backup 2020 Sep 08 (886K records in total). The files contain metadata and hyperlinks to photos only, in csv format.

The “Flora of Russia” project on iNaturalist brought together professional scientists and amateur naturalists from all over the country. Over 10,000 people participate in data collection. In 20 months, the participants collected 750,000 confirmed and 136,000 unverified photo observations on 6850+ species of the Russian flora. This is the largest dataset of open distributional data on the country’s biodiversity and a leading source of data on the current state of the nation’s flora.

We prepared a stable version of the project’s data as of Sep. 08, 2020. Of the 66 available columns, we use 27 for further processing, since the whole iNaturalist dataset is long-tailed. Column labels and column descriptions are given below (abbreviations: A - automatically generated data, usually from the EXIF data of photos; M - manually inserted data; AM - both options are possible, i.e. automatically generated data that can be manually edited):

1. id – Unique identifier for the observation (A)

2. observed_on_string – Date/time as entered by the observer (AM)

3. observed_on – Normalised date of observation (A)

4. time_observed_at – Normalised date/time of observation (A)

5. time_zone – Time zone of observation (AM)

6. user_id – Unique identifier for the observer (A)

7. user_login – Username of the observer (A)

8. created_at – Date/time observation was created (A)

9. updated_at – Date/time observation was last updated (A)

10. quality_grade – Quality grade of this observation; "research grade" only for the "Flora of Russia" project and "needs ID" only for the project's backlog (A)

11. license – License the observer has chosen for this observation (AM)

12. url – URL for the observation (A)

13. image_url – URL for the default image (A)

14. oauth_application_id – Which application was used to post the observation (A)

15. latitude – Publicly visible latitude (AM)

16. longitude – Publicly visible longitude (AM)

17. positional_accuracy – Accuracy estimate in meters (AM)

18. private_latitude – Private latitude, set if observation private or obscured (AM)

19. private_longitude – Private longitude, set if observation private or obscured (AM)

20. private_positional_accuracy – Coordinate precision, set if observation private or obscured (AM)

21. geoprivacy – Whether or not the observer has chosen to obscure or hide the coordinates (AM)

22. taxon_geoprivacy – Most conservative geoprivacy applied due to the conservation statuses of taxa in current identification (A)

23. coordinates_obscured – Whether or not the coordinates have been obscured, either because of geoprivacy or because of a threatened taxon (A)

24. positioning_device – Device used to determine coordinates (A)

25. positioning_method – How coordinates were determined (A)

26. scientific_name – Scientific name of the observed taxon according to iNaturalist taxonomic backbone (AM)

27. taxon_id – Unique identifier for the observed taxon (A)

28. gbif_id – URL for the corresponding GBIF record (A)
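As an illustration of how the columns listed above might be read from one of the csv files, here is a minimal sketch using only the Python standard library. The function name `read_observations` and the in-memory sample are hypothetical; the column labels are the 28 listed above, and the sketch assumes the files have a header row with exactly these labels.

```python
import csv
import io

# The 28 column labels listed in the description, in order.
COLUMNS = [
    "id", "observed_on_string", "observed_on", "time_observed_at",
    "time_zone", "user_id", "user_login", "created_at", "updated_at",
    "quality_grade", "license", "url", "image_url", "oauth_application_id",
    "latitude", "longitude", "positional_accuracy", "private_latitude",
    "private_longitude", "private_positional_accuracy", "geoprivacy",
    "taxon_geoprivacy", "coordinates_obscured", "positioning_device",
    "positioning_method", "scientific_name", "taxon_id", "gbif_id",
]


def read_observations(handle, columns=COLUMNS):
    """Yield one dict per csv row, restricted to the requested columns.

    Assumes the first line of the file is a header row (hypothetical helper).
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in columns if c not in header]
    if missing:
        raise ValueError(f"columns not found in header: {missing}")
    for row in reader:
        yield {c: row[c] for c in columns}


# Tiny in-memory sample standing in for a real file (values are made up).
sample = io.StringIO("id,license,description\n1,CC0,ignored text\n")
for record in read_observations(sample, columns=["id", "license"]):
    print(record)
```

For a real file, the same function could be called with a handle from `open(path, newline="", encoding="utf-8")`; the encoding is an assumption, not something the record states.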

Using iNaturalist export tools, we downloaded four csv files with the records fitting the project’s criteria:

1) Observations made from 1970-09-01 to 2019-06-30 (184,103 records), 10:57 MSK Sep. 8, 2020

2) Observations made from 2019-07-01 to 2020-05-20 (191,156 records), 12:28 MSK Sep. 8, 2020

3) Observations made from 2020-05-21 to 2020-07-05 (195,060 records), 14:04 MSK Sep. 8, 2020

4) Observations made from 2020-07-06 to 2020-09-08 (181,261 records), 15:39 MSK Sep. 8, 2020

The fifth file contains the project’s backlog, i.e. the observations that are either unidentified or unverified (“needs ID” grade in iNaturalist terms):

5) Observations from needs-id-backlog of “Flora of Russia” (136,669 records), 17:04 MSK Sep. 8, 2020

We amended the dataset on Sep. 25, 2020 after a data audit performed by Dr Robert Mesibov (https://www.datafix.com.au) in connection with the preparation of the data paper submitted to “Biodiversity Data Journal”. All records with positional accuracy exceeding 50,000 m were marked as having inaccurate location and reported to users. On this ground, we excluded from the backup 1,106 observations from the project’s data and 587 observations from the backlog.
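The positional accuracy rule described above can be sketched as a simple row filter. This is only an illustration, not the procedure actually used in the audit: the function names are hypothetical, and treating blank or non-numeric accuracy values as "not flagged" is an assumption.

```python
MAX_ACCURACY_M = 50_000  # threshold in meters, as stated in the description


def has_inaccurate_location(row, threshold=MAX_ACCURACY_M):
    """Return True if positional_accuracy strictly exceeds the threshold."""
    raw = (row.get("positional_accuracy") or "").strip()
    if not raw:
        return False  # assumption: a missing accuracy value is not flagged
    try:
        return float(raw) > threshold
    except ValueError:
        return False  # assumption: unparsable values are not flagged


def split_by_accuracy(rows):
    """Split rows into (kept, flagged) lists according to the threshold."""
    kept, flagged = [], []
    for row in rows:
        (flagged if has_inaccurate_location(row) else kept).append(row)
    return kept, flagged


# Made-up rows for illustration only.
sample = [
    {"id": "1", "positional_accuracy": "25"},
    {"id": "2", "positional_accuracy": "120000"},
    {"id": "3", "positional_accuracy": ""},
]
kept, flagged = split_by_accuracy(sample)
print(len(kept), "kept;", len(flagged), "flagged")
```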

The “research-grade” observations with free licenses (CC0, CC-BY, and CC-BY-NC) are fully available in GBIF within the “iNaturalist Research-grade Observations” occurrence dataset (https://doi.org/10.15468/ab3s5x). We added the last column "gbif_id", containing the URLs of the GBIF records, to all csv files of our dataset using the GBIF Occurrence Download https://doi.org/10.15468/dl.msfxkn performed on Sep. 28, 2020.
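The record does not say how the GBIF records were matched to observations. One plausible way to fill a "gbif_id" column is sketched below, under loudly stated assumptions: that the GBIF download has `gbifID` and `occurrenceID` columns, that `occurrenceID` equals the iNaturalist observation URL stored in the "url" column, and that a GBIF record URL has the form `https://www.gbif.org/occurrence/<gbifID>`. The function names are hypothetical.

```python
# Assumed URL pattern for a GBIF occurrence record page.
GBIF_OCCURRENCE_URL = "https://www.gbif.org/occurrence/{}"


def build_gbif_lookup(gbif_rows):
    """Map occurrenceID (assumed to be the observation URL) to a GBIF URL."""
    lookup = {}
    for row in gbif_rows:
        occurrence_id = (row.get("occurrenceID") or "").strip()
        gbif_key = (row.get("gbifID") or "").strip()
        if occurrence_id and gbif_key:
            lookup[occurrence_id] = GBIF_OCCURRENCE_URL.format(gbif_key)
    return lookup


def add_gbif_id(observations, lookup):
    """Return copies of the observation rows with a 'gbif_id' column added.

    Observations without a match get an empty string.
    """
    result = []
    for obs in observations:
        enriched = dict(obs)
        enriched["gbif_id"] = lookup.get(obs.get("url", ""), "")
        result.append(enriched)
    return result


# Made-up identifiers for illustration only.
gbif_rows = [{"gbifID": "1000", "occurrenceID": "https://www.inaturalist.org/observations/1"}]
observations = [
    {"id": "1", "url": "https://www.inaturalist.org/observations/1"},
    {"id": "2", "url": "https://www.inaturalist.org/observations/2"},
]
for row in add_gbif_id(observations, build_gbif_lookup(gbif_rows)):
    print(row["id"], row["gbif_id"] or "(no GBIF record)")
```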

Finally, we left in this dataset only observations with free licenses (CC0, CC-BY, and CC-BY-NC). These five files form the stable project backup with 760,251 records:

1) Observations made from 1970-09-01 to 2019-06-30 (167,501 records), 10:57 MSK Sep. 8, 2020

2) Observations made from 2019-07-01 to 2020-05-20 (171,277 records), 12:28 MSK Sep. 8, 2020

3) Observations made from 2020-05-21 to 2020-07-05 (164,269 records), 14:04 MSK Sep. 8, 2020

4) Observations made from 2020-07-06 to 2020-09-08 (148,960 records), 15:39 MSK Sep. 8, 2020

5) Observations from needs-id-backlog of “Flora of Russia” (108,244 records), 17:04 MSK Sep. 8, 2020
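The license filter and the per-file record counts above can be checked with a short sketch. The counts are copied from the list above; the helper name `is_free_license` is hypothetical, and the assumption that the "license" column holds strings such as "CC0", "CC-BY" and "CC-BY-NC" (compared case-insensitively here) is not confirmed by the record.

```python
# Free licenses named in the description.
FREE_LICENSES = {"CC0", "CC-BY", "CC-BY-NC"}


def is_free_license(row):
    """Return True if the row's license is one of the three free licenses.

    Assumes the 'license' column uses the short labels above (any case).
    """
    return (row.get("license") or "").strip().upper() in FREE_LICENSES


# Record counts of the five final files, as listed in the description.
FINAL_COUNTS = {
    "1970-09-01 to 2019-06-30": 167_501,
    "2019-07-01 to 2020-05-20": 171_277,
    "2020-05-21 to 2020-07-05": 164_269,
    "2020-07-06 to 2020-09-08": 148_960,
    "needs-id backlog": 108_244,
}

total = sum(FINAL_COUNTS.values())
print("total records:", total)  # 760251, matching the stated 760,251
```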

All project data can be freely used in scientific, educational and environmental activities.

Files

Files (136.3 MB)

md5:39a4f9af897c6c300f87b553cf61a948 (30.4 MB)
md5:0e824817925b27b6620f015534a0d8ed (31.0 MB)
md5:635b4d8fed51e29633762911f4e7342d (29.7 MB)
md5:94f5a3047a21c1bdc00500a34a094a34 (27.2 MB)
md5:72481a2a74b8ec3c58e4cb62f4098113 (18.0 MB)
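
A downloaded file can be checked against the MD5 checksums listed above. The sketch below uses Python's standard `hashlib`; the helper names are hypothetical, and because the listing does not show file names, it only tests whether a file's checksum is one of the five published values.

```python
import hashlib

# MD5 checksums published for the five files of this record.
EXPECTED_MD5 = {
    "39a4f9af897c6c300f87b553cf61a948",
    "0e824817925b27b6620f015534a0d8ed",
    "635b4d8fed51e29633762911f4e7342d",
    "94f5a3047a21c1bdc00500a34a094a34",
    "72481a2a74b8ec3c58e4cb62f4098113",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_known_checksum(path):
    """Return True if the file's MD5 matches one of the published values."""
    return md5_of_file(path) in EXPECTED_MD5
```

A call such as `is_known_checksum("downloaded_file.csv")` (the path is a placeholder) returns `False` for a truncated or altered download.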

Additional details

Related works

Is new version of
Dataset: 10.13140/RG.2.2.17886.87362/1 (DOI)
References
Dataset: 10.15468/dl.msfxkn (DOI)