Building a Natural and Cultural Heritage Repository for the Storage and Dissemination of Knowledge: The Algarium Veneticum and the Archivio di Studi Adriatici Case Study

ABSTRACT The Archivio di Studi Adriatici (ASA) is a repository of the Institute of Marine Sciences (ISMAR-CNR) of Venice. The ASA repository, completely open source and open access, hosts natural collections, heritage books, documents, and maps of the Institute of Marine Sciences. It was developed following the discovery of a historical algal collection at the Biblioteca Storica di Studi Adriatici of Venice. After being catalogued, this collection was digitized with a digital planetary scanner. Digitized specimens and metadata, compiled using the Dublin Core and Simple Darwin Core formats, are hosted on a website based on the Fedora Repository and the Islandora framework.


Background
The following overview summarizes the role of herbaria as data sources that provide both scientific and historical information.
A herbarium can be described as a data bank with a vast quantity of raw data. Each specimen holds information about the vegetation of an area, a population, and the taxon to which it belongs (Rollins, 1965). Therefore, the collection represents a source of primary information about explorations and observations of the vegetation and records the results of much of the past inquiry into the nature and relationships of plants (Massey, 1974).
"Historical plant collections represent physical evidence of the occurrence of a species at a particular time and place, provide us with information on the botanical interest of past centuries and, in some cases, tell us something about the history of plant names and uses" (van Andel et al., 2012). Digital and traditional collections are used to document the decline of species, the range of rare species, and the spread of invasive plants (Schmidt, 2007; Shaffer, Fisher, & Davidson, 1998).
Researchers are interested in establishing databases to record biodiversity and specimen inventories, which will allow regional and global inventory and monitoring of plant species (O'Connell, Gilbert, & Hatfield, 2004; Smith et al., 2003). To meet this need, herbaria around the world have begun to digitize images of their collections and to create searchable databases to maintain them (Begnoche, 2002; Ong et al., 2002).

Project
In 2010, 28 folders containing a forgotten historical algal herbarium were found in the Biblioteca Storica di Studi Adriatici at the Institute of Marine Sciences (ISMAR-CNR) of Venice.
The collection, consisting mainly of specimens of the red alga (Rhodophyta) Gracilaria, was re-ordered and a provisional catalogue was created. Initially the herbarium sheets were manually photographed and numbered with a unique code; then, due to the consistency of the samples, the collection was transferred to the headquarters of the institute for digitization.
To manage the historical collection properly, a new herbarium named Algarium Veneticum (Index Herbariorum code: <ISMAR>) has been established at the Institute of Marine Sciences (Ceregato, Armeli Minicante, Sigovini, & Trincardi, 2015), with the following aims: (i) digitizing the historical and modern algal collections and publishing the metadata on the Archivio di Studi Adriatici (ASA) website (www.archiviostudiadriatici.it) and on the CIGNo platform (http://cigno.ve.ismar.cnr.it/); and (ii) expanding the algarium with modern algal collections from the Venice Lagoon and the Adriatic Sea.

Cataloguing and digitization
The first step was to number all folders and sheets using a progressive alphanumeric code (e.g., ISMAR0148) to obtain a list of the specimens and to collect preliminary information about the collection. Subsequently, a label was created and attached to each sheet, containing the voucher and the collection ID, the scientific name of the species, the date and the collection site, and the authors of the collection. Lastly, each sheet was digitized with a Bookeye 3 digital planetary scanner.
Original folders with handwritten notes by the authors were preserved, and the associated metadata (specimen information, collection data, taxonomy, specimen details, and publications) were recorded in a spreadsheet.
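The numbering and spreadsheet steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-digit zero padding is inferred from the example code ISMAR0148, and the function names and column names (voucher_code, catalogue_rows, scientific_name, station) are hypothetical, not those of the actual project spreadsheet.

```python
import csv
import io


def voucher_code(n, prefix="ISMAR", width=4):
    """Return a progressive alphanumeric code such as ISMAR0148.

    The prefix and the zero-padded width are assumptions based on the
    single example given in the text.
    """
    if n < 1:
        raise ValueError("sheet numbers start at 1")
    return f"{prefix}{n:0{width}d}"


def catalogue_rows(sheets):
    """Attach a voucher code to each sheet record, in the order given."""
    return [
        {"voucher": voucher_code(i), **sheet}
        for i, sheet in enumerate(sheets, start=1)
    ]


def to_csv(rows):
    """Serialize the catalogue rows as CSV text (a stand-in for the spreadsheet)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sheet record; column names are illustrative only.
rows = catalogue_rows(
    [{"scientific_name": "Gracilaria confervoides (L.) Greville", "station": "Z1_01"}]
)
csv_text = to_csv(rows)
```

Generating the codes programmatically rather than by hand keeps the sequence gap-free and makes it easy to check that every sheet has exactly one voucher.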
The collection was conceived by Michelangelo Minio (1872-1960), an eminent botanist, and carried out together with Nicolò Spada and Giacomo Zolezzi; it includes 1,169 herbarium sheets (exsiccata). The main section is entitled "Distribuzione e polimorfismo di Gracilaria confervoides nella laguna di Venezia" (Distribution and polymorphism of Gracilaria confervoides in the Venice Lagoon) and contains 19 folders with 884 exsiccata labeled as Gracilaria confervoides (L.) Greville. The specimens are arranged individually or grouped together on the same sheet (Figure 1). Samples were collected between 1941 and 1950 from 107 stations, belonging to four sampling zones distributed around the historical city of Venice and three zones corresponding to the islands of Chioggia, Lido, and Murano. The collection also includes a miscellaneous section with nine folders containing 285 exsiccata belonging to different taxa of red, green, and brown algae collected in the Venice Lagoon during the same period (Figure 2).

Bibliographic research and historical background
A literature search carried out at the Biblioteca Storica di Studi Adriatici, the Biblioteca Marciana, and the Natural History Museum of Venice made it possible to retrace the historical and scientific background that led to the creation of the Gracilaria collection. When Italy entered the Second World War (1940), imports of most goods from abroad decreased progressively. A number of primary commodities ran low, including agar-agar, a natural compound commonly used in food manufacturing and in biological laboratories, which at the time was extracted exclusively from red algae (Gracilariaceae) from Japan and the Indo-Pacific area. The Commissariato Generale per la Pesca (General Fisheries Commission), the office in charge of national fishery management policy, assigned to Gustavo Brunelli, director of the Laboratorio Centrale di Idrobiologia (Central Hydrobiological Laboratory), the task of finding how to extract agar-agar from Mediterranean algae. The red algae Gracilaria and Gelidium were rapidly identified as the most promising local sources (Labranca & Maldura, 1941). The next step was to study their biology and distribution along the Italian coastline. For this reason, every Italian institution working on marine sciences at that time was asked to investigate these topics. In 1941, Giacomo Zolezzi, a fishery scientist from the hydrobiological lab at the Osservatorio di Pesca Marittima di Venezia (Marine Fishery Observatory of Venice [OPM]), was entrusted with managing the project in the Venice Lagoon, involving the Istituto di Studi Adriatici (Institute for Adriatic Studies, ISA), the OPM, and the Natural History Museum of Venice. Michelangelo Minio, former director of the museum, together with Nicolò Spada, secretary of the ISA and acting director of the OPM, carried out the sampling and studies, focusing on Gracilaria confervoides (L.) Greville, the most abundant Gracilaria species in the Venice Lagoon (Zolezzi, 1946, 1947).
The expertise of Minio, a naturalist and botanist with pioneering research in phenological studies, combined with the information provided by the Vatova and Schiffner studies, advanced the knowledge of the reproduction (Minio, 1949), the phenology and distribution in the Venice Lagoon (Minio & Spada, 1950), and the physiological features of Gracilaria confervoides (Polli, 1951; Minio & Spada, 1952).

Georeferencing
Each sampling station is identified by a unique ID in which the first part is the sampling zone (e.g., Z1_01). Each station was also characterized by one short sentence describing its position, which was revised and recorded as metadata. Sampling stations were georeferenced based on several sources of information. The authors of the collection represented the position of 102 of the 107 sampling stations on seven maps, each one focusing on a sampling zone (Minio & Spada, 1950). Samples from five sampling stations were lost over time.
The maps and associated notes in the paper by Minio and Spada (1950) may be considered primary sources; however, they do not allow a complete and accurate georeferencing of the stations. This is due to several distinct issues. First, the spatial scale of the sampling stations is neither defined nor consistent: few samples were collected once at a precise location, whereas others were collected over time from a given spatial context (e.g., a mudflat or a channel) and then pooled together. Second, the accuracy of the station positions depends on the cartographic scale (about 1:20,000) and on the size of the markers (and in some cases their intentional offset). Finally, some errors may have been introduced in the map design, or some stations may have been left out. In fact, three stations belonging to the Chioggia zone were neither described in the paper nor included in the map. Two stations belonging to the Murano zone, actually located on the nearby Vignole Island, were also not located on the map. Therefore, the original handwritten annotations on the sheets were considered the main primary sources. Comparison with autographed documents allowed most of them to be attributed to Nicolò Spada. The original annotations localize the sampling stations by referring to places and landmarks. Since the sampling years, however, both the territory and the toponymy have undergone some changes, in some cases forcing the cataloguers to conduct research on historical maps, pictures, and other documents. Some of the landmarks mentioned have disappeared; others still exist, such as the typical wooden beacons along the channels (known as bricole), which have retained essentially the same position and numbering system. After comparing the published maps with the original notes, the positions of stations Z1_05 and Z1_06 and those of Z4_07 and Z4_08 had to be switched. Some stations are now located on dry land.
Georeferencing was performed on the WGS 84 datum. The positional accuracy was estimated taking into account all the previously listed sources of uncertainty and was recorded as metadata. Absolute accuracy ranges between 10 m and 200 m on the ground.
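As an illustration of how a georeferenced station could be represented, the following Python sketch stores the station ID, WGS 84 coordinates, and the estimated positional accuracy in one record, and derives the sampling zone from the ID. The class and field names are hypothetical, the ID pattern is inferred from the examples in the text (Z1_01, Z4_07), and the coordinates are placeholders near Venice, not actual station positions.

```python
import re
from dataclasses import dataclass

# Station IDs as described in the text: zone, underscore, station (e.g. Z1_01).
# The exact pattern is an assumption inferred from the published examples.
STATION_ID = re.compile(r"^Z(\d+)_(\d+)$")


@dataclass(frozen=True)
class Station:
    station_id: str
    latitude: float      # decimal degrees, WGS 84
    longitude: float     # decimal degrees, WGS 84
    accuracy_m: float    # estimated positional accuracy on the ground, in metres
    description: str = ""

    def __post_init__(self):
        # Reject malformed records early so they never reach the metadata.
        if STATION_ID.match(self.station_id) is None:
            raise ValueError(f"malformed station ID: {self.station_id!r}")
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude out of range")
        if self.accuracy_m <= 0:
            raise ValueError("accuracy must be positive")

    @property
    def zone(self) -> int:
        """Sampling zone, taken from the first part of the ID."""
        return int(STATION_ID.match(self.station_id).group(1))

    @property
    def number(self) -> int:
        """Station number within its zone."""
        return int(STATION_ID.match(self.station_id).group(2))


# Placeholder values only: NOT a real station position or accuracy estimate.
example = Station("Z1_01", 45.44, 12.33, accuracy_m=50.0,
                  description="hypothetical position note")
```

Keeping the accuracy estimate next to the coordinates means that any later spatial analysis can account for the 10 m to 200 m uncertainty reported above instead of treating every point as exact.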

Metadata and ASA website
In order to create a digital herbarium or any digital collection, it must be taken into account how the material is originally organized and displayed (Schmidt, 2007).
For this reason we chose to integrate the Dublin Core with the Darwin Core to create a metadata set that would be useful to both biologists and librarians (Biodiversity Information Standards-TDWG, 2015). The Darwin Core is an extension for biodiversity data of the standards developed by the Dublin Core Metadata Initiative (DCMI, http://dublincore.org/). The Dublin Core Metadata Element Set itself consists of 15 data elements; all elements are optional, and all elements are repeatable (Caplan, 2003). The Dublin Core has proven useful in several library contexts. Basically, Darwin Core is a set of terms having clearly defined semantics that can be understood by people or interpreted by machines, making it possible to determine appropriate uses of the data encoded therein. The terms are organized into 13 classes, six of which cover broad aspects (event, location, geological context, occurrence, taxon, and identification) of the biodiversity domain. The remaining categories cover relationships to other resources, measurements, and generic information about records. Due to repository requirements we implemented a Simple Darwin Core schema, a "flat" version of a subset of Darwin Core terms. The Simple Darwin Core and the Dublin Core elements selected in this work are shown in Table 1.
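To make the combination of the two vocabularies concrete, the sketch below builds a small flat XML record that mixes Dublin Core elements with Simple Darwin Core terms, using Python's standard library. It is an illustration, not the ASA export format: the wrapper element name "record" and the field values are hypothetical, while the two namespace URIs and the term names (title, type, identifier, basisOfRecord, scientificName, catalogNumber, geodeticDatum) are those published by DCMI and TDWG.

```python
import xml.etree.ElementTree as ET

# Published namespace URIs for the Dublin Core element set and Darwin Core terms.
DC = "http://purl.org/dc/elements/1.1/"
DWC = "http://rs.tdwg.org/dwc/terms/"
ET.register_namespace("dc", DC)
ET.register_namespace("dwc", DWC)


def build_record(dc_fields, dwc_fields):
    """Build a flat record: one child element per term, no nesting."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in dc_fields.items():
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    for name, value in dwc_fields.items():
        ET.SubElement(root, f"{{{DWC}}}{name}").text = value
    return root


# Illustrative values only; they do not reproduce an actual ASA record.
record = build_record(
    {
        "title": "Gracilaria confervoides (L.) Greville",
        "type": "StillImage",
        "identifier": "ISMAR0148",
    },
    {
        "basisOfRecord": "PreservedSpecimen",
        "scientificName": "Gracilaria confervoides (L.) Greville",
        "catalogNumber": "ISMAR0148",
        "geodeticDatum": "WGS84",
    },
)
xml_text = ET.tostring(record, encoding="unicode")
```

Because Simple Darwin Core is flat, each term appears once as a direct child of the record, which is what allows it to sit alongside Dublin Core elements in a single datastream.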
We chose to make some of the Dublin Core and Darwin Core elements associated with the images immediately visible, in order to provide the most comprehensive information about the subject without overloading the user during consultation. In addition, a linked list on the left margin allows a user to download the XML containing all the image-related metadata (both Dublin Core and Darwin Core) and the voucher image (in TIFF and JPG format), and to access external Web pages for taxonomic information (www.algaebase.org). Algaebase was chosen because of the high level of information it makes available, including taxonomic status (classification, status of name, synonyms), environment and distribution details, and exhaustive references; last but not least, the database is constantly updated with the taxonomic changes that often occur in phycology.
At present, digitized specimens and metadata are already hosted on the Archivio di Studi Adriatici website (www.archiviostudiadriatici.it). The purpose of the website is to act as a portal, presenting the project, providing contact information, and directing users to the repository. In particular, specific buttons were created to guide users to the various collections, offering an overview and stimulating the consultation of the various archives. The portal also lists the initiatives in which the Archivio di Studi Adriatici is taking part (e.g., Global Registry of Scientific Collections, CollMap, and Index Herbariorum). The ASA website is based on the Fedora repository for digital preservation and the Islandora framework as the presentation layer (Ceregato, Armeli Minicante, Minuzzo, Birello, & Perin, 2017). The Fedora Repository framework is widely used for the management of digital objects, since it supports large amounts of data, assignment of persistent identifiers, programmable ingestion, and semantic description of relationships among objects, and has a model-based architecture. In addition, Solr, another open-source software package, provides a ready-to-use, high-performance indexing and search platform.
Table 1 (excerpt). Dublin Core and Simple Darwin Core elements selected in this work.
dc type: The nature or genre of the resource. For Darwin Core, recommended best practice is to use the name of the class that defines the root of the record. Example: StillImage.
dwc basisOfRecord: The specific nature of the data record. Example: PreservedSpecimen.
The front end consists of two main applications installed on an Apache HTTP Server: Drupal CMS and Islandora. Drupal hosts the Islandora open-source software framework, which releases specialized solution packs that allow users to work with different data types (such as maps, images, and books) and knowledge domains. Islandora includes a suite of tools that communicate directly with the Fedora repository, allowing access to objects and their metadata and presenting them to the user in a well-organized view. This combination of software has also been used for the V2P2 repository hosted by IRCrES-CNR of Turin (Abbà et al., 2015). The idea behind the V2P2 repository was to preserve and make accessible the research data about plant-microbe interactions produced by the Institute for Sustainable Plant Protection (IPSP) of the National Research Council of Italy (CNR) in over 50 years of activity. The IPSP archives contain research data in several nondigital formats, which can be divided into two main categories: texts (such as congress proceedings, registries, technical reports, and notebooks) and images (e.g., photographic glass plates, film negatives, prints on high-definition photo paper, and black-and-white/color slides). At present, the repository (http://v2p2.to.cnr.it/) mainly hosts objects related to the study of a specific plant virus and is still in progress. The architecture of the V2P2 repository is completely open source and is hosted and managed by IRCrES-CNR, as are the ASA repository and the Digibess project. The Islandora architecture is extremely flexible and versatile, but it was not originally developed to deal with biological objects. Therefore, in collaboration with the Islandora development community and the Islandora founders, the IRCrES IT office and library are developing new configurations for biological research data to expand the utility of Islandora, that is, to include Darwin Core metadata, as in the ASA repository.
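The role of the search layer in this stack can be illustrated with a short sketch that composes a query URL for Solr's standard select handler. Only the handler path (/select) and the q, rows, and wt parameters are part of Solr itself; the host, the core name, and the indexed field name used here are hypothetical, since they depend on how a given Islandora installation is configured.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, query, rows=10):
    """Compose a URL for Solr's standard /select handler, asking for JSON."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Hypothetical host, core, and field name; a real installation will differ.
url = solr_select_url(
    "http://localhost:8983/solr",
    "collection1",
    'dc.title:"Gracilaria confervoides"',
    rows=5,
)
```

In practice Islandora issues such queries on the user's behalf, so a sketch like this is mainly useful for checking what the index contains during configuration.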
Together with the Algarium Veneticum section, two boxes link to the repositories that host the historical library and the map collections of the Biblioteca Storica di Studi Adriatici (in progress).
Furthermore, from the ASA website it is possible to access the spatial data of the sampling stations provided by CIGNo (http://cigno.ve.ismar.cnr.it/), which allows users to produce and export map layers (Figure 3).
The entire architecture was built on open-source software. Besides the economic aspect, the choice to embrace open-source software is

Algarium Veneticum: Workflow and perspectives
The Algarium Veneticum workflow is illustrated in Figure 4: the algal samples added to the herbarium are catalogued, identified, and digitized so that they are available, with their respective metadata, on the ASA website (www.archiviostudiadriatici.it). The spatial data are available on the CIGNo (http://cigno.ve.ismar.cnr.it/) and, subsequently, on the Atlante della Laguna (www.atlantedellalaguna.it) local platforms. At this stage, the GET-IT editor (www.get-it.it) will help check place names against a controlled vocabulary, such as GeoNames (http://www.geonames.org/), and generate new ones if necessary.
The next step of this project is to revise the Algarium Veneticum samples, including the historical samples, through an integrated approach combining traditional taxonomic methods and DNA barcoding techniques, using protocols designed for the study of ancient DNA. According to Guiry & Guiry (2017), specimens reported as Fucus verrucosus, Fucus confervoides, Gracilaria verrucosa, and Gracilaria confervoides (in fact, all terete Gracilaria) require individual examination to determine whether they belong to Gracilaria or Gracilariopsis and then a decision as to which species is in question. Once molecular analyses have been performed, the data will be released on international databases such as BOLD systems (http://v4.boldsystems.org/) or GenBank (https://www.ncbi.nlm.nih.gov/). Furthermore, the information retrieved from the historical algarium may allow users to evaluate compositional and ecological changes that have occurred over nearly one century in the sampling areas. In addition, the Algarium Veneticum will be enriched with new algal collections from the Venice Lagoon and the Adriatic Sea, which include a new section of Chlorophyta (green algae) already established. This will be a valuable tool for phycological and ecological studies, allowing researchers to monitor floristic and vegetational changes that may be due to human impacts on the lagoon, in particular the increasing introduction and spread of non-indigenous species.
The Algarium Veneticum collections will be recorded and available through national and international initiatives, including the CollMap Database (http://www.anms.it/collmap/), for a census of the natural history collections of Italian scientific museums, and the LifeWatch-Italia infrastructure (http://www.servicecentrelifewatch.eu/web/lifewatch-italia/home), aimed at biodiversity studies.
The work procedure set up here also represents a guideline for further studies that may be undertaken within the Archivio di Studi Adriatici (ISMAR-CNR) and that concern additional fields (zoological collections, geological samples, etc.).

Conclusions
It is estimated that the number of specimens in natural history collections worldwide is greater than two billion (Arino, 2010). It is hard to quantify the data sets about species and their distributions produced by laboratories worldwide (Robertson et al., 2014). Data quality and fitness for use are primary concerns in the biodiversity community (Guralnick & Hill, 2009; Hill, Otegui, Arino, & Guralnick, 2010), where information comes from heterogeneous sources spanning the globe over hundreds of years (Wieczorek et al., 2012).
A number of authors have pointed to a decline in natural history research (Suarez & Tsutsui, 2004; Tewksbury et al., 2014). However, knowledge of organisms (their life cycle, their range distribution, and so forth) remains vital to science and society, thanks to the vast information potential held by both single specimens and whole collections in global climate change research and eco-informatics (Lavoie, 2013; Ward, 2012). As reported by Tewksbury et al. (2014), herbaria have seen a steady consolidation of collections. Consolidation can often increase ease of access for taxonomists, curators, librarians, and ecologists, and access is also improving thanks to the skills of computer technicians.
Among the technological tools, the digital platforms are focused on expanding participation in the collection, curation, and exchange of natural history information. These platforms (e.g., Encyclopedia of Life, GBIF, Map of Life, iNaturalist) represent a fundamental shift away from private records and individual papers and toward a more collaborative approach to observing and understanding our world (Tewksbury et al., 2014).
According to Schmidt (2007), librarians who take on digitization of herbarium specimens must know how to handle the specimens and how botanists catalogue and organize them. Education or experience in biology facilitates discussions with scientists or botanists through the use of a common language. In order to create a digital herbarium, or for that matter, any digital collection, librarians should note how the material is originally organized and displayed. The metadata for the digital collection will be a combination of the Darwin Core and Dublin Core metadata, including information for botanists and the general public. Furthermore, the digital collection would increase access to a previously hidden collection and increase visibility through the World Wide Web (Schmidt, 2007).
As reported in "The Importance of Herbaria" by Vicki A. Funk (2003), "Herbaria, dried pressed plant specimens and their associated data, ancillary collections (e.g., photographs) and library materials, are remarkable and irreplaceable sources of information about plants and the world they inhabit. They provide the comparative material that is essential for studies in taxonomy, systematics, ecology, anatomy, morphology, conservation biology, biodiversity, ethnobotany, and paleobiology." Finally, they have educational as well as historical value. Herbaria and library materials are a veritable gold mine of information even after a long time, so they must be preserved for future generations.