Signed Biodiversity Data Packages: A Method to Cite, Verify, Mobilize, and Future Proof, Large Image Corpora. hash://sha256/0154b9ddce4d2e280e627a08d1a2d42884201af6ac1ec19606e393deda57f4bb hash://md5/bae7f441cdd2648d2356b2330e4b71e8
Creators
- 1. Ronin Institute; UC Santa Barbara Cheadle Center for Biodiversity and Ecological Restoration
- 2. Botanical Research Institute of Texas
Description
Access to Natural History Collections helps researchers to better understand the natural world.
Millions of digital images of herbarium specimens are openly available via the Internet.
However, using these images in a data-intensive research project raises basic questions such as
"How do I efficiently access and verify hundreds of thousands of images?" and "How do I cite
a version of a large image corpus?"
Here, we present a method to cite, verify and mobilize such image corpora across different
locations and medium types. We demonstrate our method with >100k images made available
through the Botanical Research Institute of Texas using available tools (e.g., rsync, Preston)
and technologies (e.g., internet, postal service). Our results show that a packaged corpus
sent through the US Postal Service was transferred at about 3 images/s, whereas
retrieving individual images via HTTP achieved a transfer rate of about 0.2 images/s.
Our results suggest that signed digital packaging of image corpora enables distributed storage
using readily available transfer and storage methods. In addition, our method is future proof
because the packages can be used with any digital media, including those that are not yet available.
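To put the two reported rates side by side, the following sketch estimates how long a corpus would take to move at each rate. The corpus size of 100,000 images is an assumption (the abstract only states >100k), and the helper name `transfer_hours` is hypothetical.

```shell
# Estimate the transfer time in hours for a corpus at a given rate.
# Usage: transfer_hours <number_of_images> <images_per_second>
transfer_hours() {
  awk -v n="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", n / rate / 3600 }'
}

# Assumed corpus size: 100,000 images (the abstract reports ">100k").
images=100000

echo "HTTP at 0.2 images/s:         $(transfer_hours "$images" 0.2) hours"
echo "Postal service at 3 images/s: $(transfer_hours "$images" 3) hours"
```

Under that assumption, the HTTP route takes roughly 139 hours (close to six days) and the postal route roughly 9 hours, which matches the 15-fold ratio between the two reported rates.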
Included files are:
00_Poelen_DD2023.mp4 - recorded presentation (see also https://vimeo.com/832006741)
00_Poelen_DD2023.pdf - presentation slides
00_Poelen_DD2023.pptx - presentation slides (PowerPoint)
00_Poelen_DD2023_Abstract.pdf - presentation abstract
Part of:
Digital Data in Biodiversity Data Conference 2023
@ Arizona State University 5-7 June 2023.
Provenance:
preston head \
  --remote https://linker.bio \
  --anchor hash://sha256/0154b9ddce4d2e280e627a08d1a2d42884201af6ac1ec19606e393deda57f4bb
hash://sha256/0154b9ddce4d2e280e627a08d1a2d42884201af6ac1ec19606e393deda57f4bb
preston head \
  --remote https://linker.bio \
  --anchor hash://sha256/0154b9ddce4d2e280e627a08d1a2d42884201af6ac1ec19606e393deda57f4bb \
  | preston cat \
  --remote https://linker.bio \
  --anchor hash://sha256/0154b9ddce4d2e280e627a08d1a2d42884201af6ac1ec19606e393deda57f4bb \
  | md5sum \
  | sed 's+^+hash://md5/+g' \
  | cut -d ' ' -f1
hash://md5/bae7f441cdd2648d2356b2330e4b71e8
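The pipeline above turns a stream of bytes into a `hash://md5/...` identifier. As a minimal sketch of the same idea for local files, the helpers below compute both identifier forms and compare a stream against an expected identifier. The function names `sha256_uri`, `md5_uri` and `verify_uri` are hypothetical and not part of Preston; the sketch assumes the GNU coreutils `md5sum` and `sha256sum` commands are installed.

```shell
# Print the hash://sha256/... identifier of the bytes read from stdin.
sha256_uri() {
  sha256sum | cut -d ' ' -f1 | sed 's+^+hash://sha256/+'
}

# Print the hash://md5/... identifier of the bytes read from stdin.
md5_uri() {
  md5sum | cut -d ' ' -f1 | sed 's+^+hash://md5/+'
}

# Succeed only if the bytes on stdin match the expected identifier.
# Usage: verify_uri <expected hash URI> < file
verify_uri() {
  local expected="$1" actual
  case "$expected" in
    hash://sha256/*) actual="$(sha256_uri)" ;;
    hash://md5/*)    actual="$(md5_uri)" ;;
    *) echo "unsupported identifier: $expected" >&2; return 2 ;;
  esac
  [ "$actual" = "$expected" ]
}

# Demo on local bytes only (no network access needed).
printf 'hello\n' | sha256_uri
printf 'hello\n' | md5_uri
```

A downloaded copy could then be checked with, for example, `verify_uri hash://md5/bae7f441cdd2648d2356b2330e4b71e8 < provenance.nq`, where the file name `provenance.nq` is a placeholder for wherever the content was saved.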
preston alias \
  --remote https://linker.bio \
  --anchor hash://sha256/0154b9ddce4d2e280e627a08d1a2d42884201af6ac1ec19606e393deda57f4bb
<urn:preston:guoda:bio:Poelen_DD2023.pdf> <http://purl.org/pav/hasVersion> <hash://sha256/56e997ea728d9276d750c837820796536e86915ca56168f035023e9e254a1f1d> <urn:uuid:03d88005-cb81-4c89-a5a3-6a1c0d2b2b15> .
<urn:preston:guoda:bio:Poelen_DD2023.pptx> <http://purl.org/pav/hasVersion> <hash://sha256/77a0a28a66dcfb335cb4a636f7b2ae0910e911e7e090a8d2b77c44647f916d2d> <urn:uuid:f84952cb-7a08-4acb-8066-fc94e103c073> .
<urn:preston:guoda:bio:Poelen_DD2023.mp4> <http://purl.org/pav/hasVersion> <hash://sha256/085ce0a06692610606f90a7f8a3be79fe2b3d8d670f7b7ad6109600e0e6af05a> <urn:uuid:aac8ae03-0f4c-4db3-89d6-bee1843f5124> .
<urn:uuid:24509fda-cb15-4bf0-816a-52f8795ed9d0> <http://purl.org/pav/hasVersion> <hash://sha256/085ce0a06692610606f90a7f8a3be79fe2b3d8d670f7b7ad6109600e0e6af05a> <urn:uuid:4c326c81-dfa2-460e-a544-63cc3c1dc900> .
<https://docs.google.com/presentation/d/1EDp2PWdggttM0ZumuSED8MKEwQ_iKNgWZ4_wpFoqQaI/export/pptx> <http://purl.org/pav/hasVersion> <hash://sha256/77a0a28a66dcfb335cb4a636f7b2ae0910e911e7e090a8d2b77c44647f916d2d> <urn:uuid:1869c40a-367e-46b4-b23a-079c5dfbe0fd> .
<https://docs.google.com/presentation/d/1EDp2PWdggttM0ZumuSED8MKEwQ_iKNgWZ4_wpFoqQaI/export/pdf> <http://purl.org/pav/hasVersion> <hash://sha256/56e997ea728d9276d750c837820796536e86915ca56168f035023e9e254a1f1d> <urn:uuid:fdf921ad-3dfa-4720-a49d-f5f9f7f99b0f> .
<urn:preston:guoda:bio:Poelen_DD2023.mp4> <http://purl.org/pav/hasVersion> <hash://sha256/2a5ae6e43b324f7faee95f5bcaa25fcf2fc19a42b60bdcf880333dcb2e4cc77e> <urn:uuid:c21ff7b2-1932-4cf9-afcc-1e82900f4295> .
<urn:preston:guoda:bio:Poelen_DD2023.pdf> <http://purl.org/pav/hasVersion> <hash://sha256/c4c1602421ec450a61417123b29d13998451ea1df885add2de807cceb3dd1278> <urn:uuid:e1dd4e5a-56fb-4a53-9c44-12898deb0fdb> .
<urn:preston:guoda:bio:Poelen_DD2023_Abstract.pdf> <http://purl.org/pav/hasVersion> <hash://sha256/ec9d5f60c62994a0c5f512c0ce54f3e0faba08b70cae454022f0be6c8455e9b0> <urn:uuid:fbf0c0e4-ca25-49f1-87cf-14c0bff11960> .
<urn:preston:guoda:bio:Poelen_DD2023.pptx> <http://purl.org/pav/hasVersion> <hash://sha256/3b581c7123f742b4d806f4c56df798dbb46f927d9f55a496b26918329a7ca627> <urn:uuid:2cb0e901-5019-4058-b0b7-a8b216d2e6ad> .
<urn:uuid:a36b76d8-dc5c-490c-a4f0-4d71560922ab> <http://purl.org/pav/hasVersion> <hash://sha256/2a5ae6e43b324f7faee95f5bcaa25fcf2fc19a42b60bdcf880333dcb2e4cc77e> <urn:uuid:238049a1-c722-4511-9dac-085ba3effd92> .
<urn:uuid:0f12ad74-d270-43f2-824d-b8f445657346> <http://purl.org/pav/hasVersion> <hash://sha256/ec9d5f60c62994a0c5f512c0ce54f3e0faba08b70cae454022f0be6c8455e9b0> <urn:uuid:57abcd50-04a1-4681-a109-d357b74940ad> .
<https://docs.google.com/presentation/d/1EDp2PWdggttM0ZumuSED8MKEwQ_iKNgWZ4_wpFoqQaI/export/pptx> <http://purl.org/pav/hasVersion> <hash://sha256/3b581c7123f742b4d806f4c56df798dbb46f927d9f55a496b26918329a7ca627> <urn:uuid:05b6a358-3037-4d25-9fe1-9d56cee84647> .
<https://docs.google.com/presentation/d/1EDp2PWdggttM0ZumuSED8MKEwQ_iKNgWZ4_wpFoqQaI/export/pdf> <http://purl.org/pav/hasVersion> <hash://sha256/c4c1602421ec450a61417123b29d13998451ea1df885add2de807cceb3dd1278> <urn:uuid:e9b0717d-557a-404b-92af-eb3054969eeb> .
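Each statement above links a name (an alias or a URL) to a content identifier through `http://purl.org/pav/hasVersion`. The sketch below shows one way such statements could be filtered to list the content identifiers recorded for a single name. It assumes one statement per line with whitespace-separated terms, as in the listing above, and the function name `versions_of` is hypothetical.

```shell
# Print every content identifier linked to a name via pav:hasVersion.
# Reads statements shaped like the listing above from stdin.
# Usage: versions_of <name> < statements
versions_of() {
  awk -v subject="<$1>" '
    $1 == subject && $2 == "<http://purl.org/pav/hasVersion>" {
      gsub(/^<|>$/, "", $3)   # strip the angle brackets around the identifier
      print $3
    }'
}

# Demo with three statements copied from the listing above.
versions_of 'urn:preston:guoda:bio:Poelen_DD2023.pdf' <<'EOF'
<urn:preston:guoda:bio:Poelen_DD2023.pdf> <http://purl.org/pav/hasVersion> <hash://sha256/56e997ea728d9276d750c837820796536e86915ca56168f035023e9e254a1f1d> <urn:uuid:03d88005-cb81-4c89-a5a3-6a1c0d2b2b15> .
<urn:preston:guoda:bio:Poelen_DD2023.pptx> <http://purl.org/pav/hasVersion> <hash://sha256/77a0a28a66dcfb335cb4a636f7b2ae0910e911e7e090a8d2b77c44647f916d2d> <urn:uuid:f84952cb-7a08-4acb-8066-fc94e103c073> .
<urn:preston:guoda:bio:Poelen_DD2023.pdf> <http://purl.org/pav/hasVersion> <hash://sha256/c4c1602421ec450a61417123b29d13998451ea1df885add2de807cceb3dd1278> <urn:uuid:e1dd4e5a-56fb-4a53-9c44-12898deb0fdb> .
EOF
```

For the PDF alias the demo prints two identifiers, since the listing records that name against two different content hashes.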
Files (229.6 MB)
MD5 checksum | Size |
---|---|
md5:6bf28a821512caef815c2b512ab7e4d1 | 91.9 MB |
md5:f3439f1f3ee60f67bd3fe4b7a885b989 | 4.7 MB |
md5:c555077be93ea0341c1649853d5b2822 | 3.9 MB |
md5:29a5953e93bf19966f0366ec37697cc0 | 49.8 kB |
md5:bae7f441cdd2648d2356b2330e4b71e8 | 3.2 kB |
md5:a057c5998d1a57fed5f798b57c8ba4ac | 78 Bytes |
md5:6bf28a821512caef815c2b512ab7e4d1 | 91.9 MB |
md5:0dc20ccffb9368e65644ba5591bec45d | 78 Bytes |
md5:874953efd7a0e0f789ffc90f09cde1df | 78 Bytes |
md5:eb7efda1d477239a6de54017c55e6871 | 4.0 kB |
md5:27b9d0e7a6772d6bc422a91fb286318f | 20.1 MB |
md5:333b52d9f919024754229dd90a712969 | 78 Bytes |
md5:ca32899fbbc9ee0cc011c661dc3d4ac7 | 78 Bytes |
md5:d40f77038e20f9c208240144ab048fa6 | 2.8 kB |
md5:178043d342a60d3f6fd67fbd0cd261c5 | 3.9 MB |
md5:40586a0e5b6cca17db710b64c07f8aed | 2.8 kB |
md5:47194d7f8a3894d8fbb4006aa0b445ef | 78 Bytes |
md5:f3439f1f3ee60f67bd3fe4b7a885b989 | 4.7 MB |
md5:0c762a75a57e9582cc06714fc93b9793 | 78 Bytes |
md5:82d3bba99652916e724c73857d252306 | 2.8 kB |
md5:c555077be93ea0341c1649853d5b2822 | 3.9 MB |
md5:c5b7974f2f8950e8c891f7cfaa33a6da | 4.0 kB |
md5:933ef30fab390fb3db9c564003b6f432 | 78 Bytes |
md5:0f8d11aa0fb03455190c9e5bcc46d604 | 78 Bytes |
md5:f2886b23264125854bed6468572e01d7 | 78 Bytes |
md5:06d6ef35cf718766137ed6b0d2e3bc7c | 2.8 kB |
md5:747cdc95a316df98078ff1e11e90ab43 | 78 Bytes |
md5:26a380d456ee3b2a5b6f0d0f5d674501 | 2.8 kB |
md5:968aa5a4b1f4dc5b3b551cae0d8b98fe | 78 Bytes |
md5:8f2bbf3dc85496c6c30b4bd260f75155 | 4.7 MB |
md5:83cef7e4e5937686e01480a9e525a46a | 78 Bytes |
md5:18c891b081de5469eecc85a2ca5816e1 | 2.8 kB |
md5:7c3d68db3bb26bab8151c699dfdcad7f | 5.6 kB |
md5:b460f05523fb155279e31b2c12d6b936 | 3.4 kB |
md5:64e2cd037dc496ce184df32d79218eae | 4.0 kB |
md5:29a5953e93bf19966f0366ec37697cc0 | 49.8 kB |
md5:a96f09d798262518014fefc77495c2d6 | 2.8 kB |
md5:80b4935b58d2b33cf5718059ea320f81 | 5.4 kB |
md5:7e6dbcabe414296c2ef68283d982812e | 78 Bytes |
Additional details
Related works
- Cites
  - Journal article: 10.1038/s41597-023-02230-y (DOI)
  - Journal article: 10.1016/j.ecoinf.2020.101132 (DOI)
- Is derived from
  - Dataset: hash://sha256/76d40abccfc71bc2cdaf4ea4a6003b9ac49123b27abe9f0d81e233299baf5e94 (URL)
  - Presentation: https://docs.google.com/presentation/d/1EDp2PWdggttM0ZumuSED8MKEwQ_iKNgWZ4_wpFoqQaI (URL)
- Is source of
  - Video/Audio: https://vimeo.com/832006741 (URL)
Funding
- U.S. National Science Foundation
  - EAGER: Towards the Web of Biodiversity Knowledge: Understanding Data Connectedness to Improve Identifier Practices (award 1839201)