Knowledge sharing and discovery across heterogeneous research infrastructures

Research infrastructures play an increasingly essential role in scientific research. They provide rich data sources for scientists, such as services and software packages, via catalog and virtual research environments. However, such research infrastructures are typically domain-specific and often not connected. Accordingly, researchers and practitioners face fundamental challenges introduced by fragmented knowledge from heterogeneous, autonomous sources with complicated and uncertain relations in particular research domains. Additionally, the exponential growth rate of knowledge in a specific domain surpasses human experts’ ability to formalize and capture tacit and explicit knowledge efficiently. Thus, a knowledge management system is required to discover knowledge effectively, automate the knowledge acquisition based on artificial intelligence approaches, integrate the captured knowledge, and deliver consistent knowledge to agents, research communities, and end-users. In this study, we present the development process of a knowledge management system for ENVironmental Research Infrastructures, which are crucial pillars for environmental scientists in their quest for understanding and interpreting the complex Earth System. Furthermore, we report the challenges we have faced and discuss the lessons learned during the development process.


Introduction
Contemporary societies are faced with a new challenge for the 'globe': the changing of the world's climate 1 . Climate change is unpredictable in its form and scope and is long-term rather than immediate in its impacts and remedies. Any practical solution lies beyond any act of national will and requires international collaboration of unprecedented dimension and complexity. While an effective solution to the challenge would play out over several decades, it needs to be shaped and put in place over the next few years 2 .
Climate change has been identified as a major environmental problem for humanity by the United Nations and the European Union. Research is expected on potential scenarios on climate change that will drastically affect natural ecosystems, plants, habitat, and animals, contributing to speedup in biodiversity loss in some areas. The impacts would have knock-on effects for many communities and sectors that rely on natural resources, including agriculture, fisheries, fuels, tourism, and water. Additionally, the ocean plays a central role in regulating the Earth's climate 3 .
Assessments of climate change and their association with the driving forces must be based on trustworthy and well-documented observations. This is a difficult task due to the many interactions that exist between the atmosphere, soil, and hydrosphere. The resulting impacts on ecosystems all need particular and focused, high-quality long-term observations. This forces us to have better observations and data on these essential preconditions to inform decision-makers better to take the measures necessary to maintain a thriving society 4 .
Research Infrastructures (RIs) are vital for providing the required information to support science and fact-based policy development. Research infrastructures, including advanced computing and storage infrastructure, in environmental science, are essential requirements for scientists in this domain to understand and analyze the sophisticated earth system 5 . Interdisciplinary research communities and research infrastructures collaborate with the neighboring disciplines, namely atmosphere, biosphere, hydrosphere, and geosphere. Internal cooperation across different realms resulted in the formation of distinct research traditions, skills, and cultures. The interconnected essence of the earth system, on the other hand, requires the scientific community to transcend well-established divisions between disciplines and domains and work toward a common understanding of the world as a whole 6 .
The data from the ICOS 1 Research Infrastructure, for example, aids climate science by informing scientists and the general public on natural and human-caused greenhouse gas emissions and uptake from the ocean, land ecosystems, and atmosphere. It gives access to high-quality data processed by the Thematic Centers as raw, near real-time, and final quality-controlled data and supplemented with elaborated (model) data and analyses, almost always licensed under a CC4BY 2 license. The IAGOS 3 research infrastructure provides atmospheric composition information, including greenhouse gas observations from commercial aircraft. IAGOS data are used by researchers worldwide for process studies, trend analysis, validation of climate and air quality models, and spaceborne data retrievals validation. Aerosols and their precursors are monitored by the ACTRIS 4 research infrastructure. Aerosols have a significant impact on the Earth's radiation balance, and consequently, the climate. Their levels are inextricably linked to human activity and emissions. Such RIs are part of a more significant worldwide effort to advance science-based, high-quality observations that will help people in making better decisions. As a result, the data and procedures are based on international, typically community-based standards.
Typically, RIs are domain-specific and are not connected, so interoperability can be a critical issue for scientists involved in interdisciplinary research projects. Moreover, researchers and developers are not knowledgeable in all domains, so a knowledge management system is required to capture cross-domain environmental knowledge automatically and to enable researchers to access data, software tools, and services from different sources and integrate them into cohesive experimental investigations with well-defined, replicable workflows for processing data and tracking the provenance of results. Accordingly, research communities require a knowledge management system that can (1) discover cross-domain knowledge and capture it automatically, (2) answer any domain question without being limited to its current search space, (3) deal with noisy sets of retrieved documents, which are likely to include many irrelevant documents as well as semantically and syntactically ill-formed ones, (4) provide an advanced search engine that interprets and reformulates queries using information retrieval algorithms, (5) return a set of recommended solutions (answers) based on the retrieved documents, and (6) visualize its outcomes to facilitate data analysis for research communities. This paper introduces a novel knowledge management system, called ENVRI-KMS, to meet the ENVRI research community's requirements and make the research assets Findable, Accessible, Interoperable, and Reusable (FAIR 10 ) for the community.
The rest of this study is structured as follows: Section 2 introduces knowledge discovery and sharing challenges, formulates the design research questions, and elaborates on the research methods that have been employed to capture knowledge regarding the ENVRI-KMS.
Section 3 outlines the development process of the ENVRI-KMS. Section 3.1 explains the online survey that we conducted to collect requirements of the ENVRI-KMS. Section 3.2 shows the use case scenarios that we identified based on the survey. Section 3.3 introduces the design decisions that we made to design the ENVRI-KMS architecture. Section 4 elaborates on the selected technologies that we employed to develop the ENVRI-KMS and demonstrates part of the current implementation. Section 5 analyzes the requirements and maps them to the survey questions and design research questions based on the participants' responses. Section 6 highlights the challenges and lessons learned during the development process of the ENVRI-KMS. Section 7 positions the proposed approach in this study among the other knowledge management approaches in the literature. Finally, Section 8 summarizes the proposed approach, defends its novelty, and offers directions for future studies.

Challenges regarding knowledge sharing and discovery
In this paper, we present a novel knowledge management system, called ENVRI-KMS, to meet the ENVRI research community requirements and make the research assets Findable, Accessible, Interoperable, and Reusable (FAIR 10 ) for the community. The ENVRI-KMS is a Knowledge-as-a-Service (KaaS) for ENVRI-FAIR research communities to document the development and operation processes of RIs and support them with their engineering and design decisions. In general, the ENVRI-KMS should (1) ingest technical results from ENVRIplus, FAIR assessment 5 , the key sub-domains, and other tasks using a formal language for knowledge representation and proven semantic technologies; (2) provide services and tools to enable RI developers and data managers to browse, search, retrieve, and compare RI technical statuses and technical solutions to development problems via available content; (3) provide content management tools for specialists in the ENVRI community to ingest new knowledge and control the quality of content; and (4) provide interfaces to other existing semantic resources, e.g., the service catalog of a future ENVRI-HUB 6 , to enhance knowledge discovery and cross-RI search across knowledge services and the online presence of ENVRI resources.
A significant number of advanced research infrastructures, such as ICOS 7 and IAGOS 8 , are available to facilitate the access of researchers to research assets (e.g., data products, best practices, data service design decisions, software tools, and services). Such research assets are scattered among a wide range of heterogeneous knowledge resources 5 . This study employs a mixed research method based on design science research, surveys, and documentation analysis to capture knowledge regarding knowledge management systems and answer the design research questions. The research approach for creating the proposed knowledge management system, called ENVRI-KMS, is Design Science, which addresses research by building and evaluating artifacts to meet identified business needs 11 in an iterative process 12 . Furthermore, we designed a survey form and asked several of our colleagues to critique it. We conducted an online survey in the context of 26 research infrastructures to collect their functional requirements and quality concerns. In total, 35 domain experts participated in the research to assist us with the ENVRI-KMS development life cycle and the requirement analysis phase. Moreover, to develop the ENVRI-KMS, we reviewed webpages, whitepapers, scientific articles, fact sheets, technical reports, product wikis, product forums, product videos, and webinars to collect data. A structured coding procedure is employed to extract knowledge from the selected sources of knowledge.
Knowledge management systems employ problem-solving techniques and knowledge discovery approaches to answer particular questions 13,14 . Knowledge discovery is the process of extracting useful and hidden information 15 . A variety of knowledge management systems have been introduced in the literature [16][17][18] .
Most of the existing knowledge management systems in the literature are bound to a limited search space and optimized to address questions in a particular context. Each question-answer-context tuple is well-formed, standardized, and derived from the context in which the question and answer were extracted.

ENVRI knowledge management system
The ENVRI-KMS 19 is a cluster-level knowledge base that allows different ENVRI users, such as RI developers and data managers, to effectively share their technical practices, identify common data and service requirements and design patterns, and facilitate the search and analysis of existing RI solutions for environmental RI interoperability challenges.

Requirement analysis
We organized a webinar to facilitate an online survey within the context of 26 research infrastructures actively engaged in the ENVRI-FAIR project 20 . The primary objective was to gather comprehensive functional requirements and quality concerns 21 from these infrastructures. In pursuit of this goal, a total of 35 domain experts were carefully selected to participate in the research, specifically to contribute to the ENVRI-KMS development life cycle and requirement analysis phase. The selection process took into account their extensive expertise and considerable years of experience in their respective domains. On average, the participants possessed over ten years of domain-specific experience, rendering them highly attuned to the potential challenges researchers within their communities and fields might encounter while performing their daily tasks.
To initiate the webinar, we introduced the potential functionalities of the ENVRI-KMS, drawing upon a literature review and internal meetings conducted with a select group of domain experts. We subsequently employed an online survey tool called Mentimeter 22 , leveraging its capabilities to distribute a virtual questionnaire encompassing the following inquiries:
(Q1) What specific information do you typically seek from the ENVRI community?
(Q2) What typical queries would you pose to the ENVRI-KMS?
(Q3) How do you currently navigate the process of accessing information from the ENVRI community?
(Q4) From your perspective, which aspects of knowledge management system functionality prove most beneficial to you?
(Q5) What functionalities do you anticipate in the next version of the ENVRI-KMS?
Following the completion of the online survey, we collected and analyzed all responses. To prioritize the requirements, we examined the frequencies at which similar statements or themes emerged, allowing us to identify patterns and establish a hierarchy of importance.
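The frequency-based prioritization described above can be illustrated with a minimal sketch. It assumes that each response has already been coded manually into theme labels; the labels, the sample data, and the function name `prioritize` are hypothetical and are not taken from the survey itself.

```python
from collections import Counter

# Hypothetical coded survey statements: each response has been manually
# assigned one or more theme labels (illustrative values only).
coded_responses = [
    ["semantic search", "rdf support"],
    ["semantic search", "provenance"],
    ["api access", "semantic search"],
    ["rdf support", "provenance"],
]


def prioritize(responses):
    """Rank themes by the number of responses that mention them."""
    # set(themes) ensures a theme is counted at most once per response.
    counts = Counter(theme for themes in responses for theme in set(themes))
    # Sort by descending frequency, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranking = prioritize(coded_responses)
for theme, count in ranking:
    print(f"{theme}: {count}")
```

Themes mentioned by more participants appear first; ties are broken alphabetically so that the ranking is reproducible.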
It is noteworthy that the participants in our study were specifically selected from the Atmosphere, Ecosystem, Marine, and Solid Earth domains. We requested the research infrastructures involved in the ENVRI-FAIR project to nominate individuals who possessed expertise in their respective domains, ensuring a comprehensive understanding of their concerns and requirements. This deliberate selection process aimed to capture a representative sample of participants deeply knowledgeable about their domains, fostering the integrity and validity of our study.
Next, we collected all responses and prioritized them by analyzing the frequencies of similar statements 7 .
Compatible with semantic web technologies. As the Resource Description Framework (RDF) is the most common format for knowledge storage, representation, and reasoning, support for RDF is the core requirement in the design and development of the ENVRI-KMS. This requirement can include the following specific options: RDF import/export, RDF storage, OWL import, SPARQL, and GeoSPARQL support. RDF provides many advantages, especially for integrating and operating on heterogeneous knowledge sources and for linking to existing external resources. However, RDF, like the overall concept of operating on a non-monolithic set of data collections, also comes with specific limitations, such as a lack of support for referential integrity. Nevertheless, the ENVRI-KMS content is assumed to be rather non-volatile, which shifts this aspect into the background.
Semantic search and query functionality. An interface for searching and discovering ENVRI-KMS content should be provided; this could be a conventional keyword-based search or a faceted search. A semantic search function is further expected to permit search based on 'similar' or 'related' terms across multiple ontologies/controlled vocabularies rather than strict adherence to a single controlled vocabulary or keyword set 26 .
Footnote 7: We have published the responses of the domain experts who participated in the survey, together with the data analysis phases, on Mendeley Data 21 .
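To make the difference between keyword search and search over 'related' terms concrete, the following sketch expands a query with related terms before matching. The term mapping, the sample documents, and the function names are assumptions made for illustration; a real deployment would draw related terms from ontologies or controlled vocabularies rather than from a hard-coded dictionary.

```python
# Hypothetical mapping from a term to related terms drawn from several
# vocabularies (illustrative values only).
RELATED_TERMS = {
    "aerosol": {"particulate matter", "pm10"},
    "greenhouse gas": {"co2", "methane"},
}

# Hypothetical document collection: id -> text.
DOCUMENTS = {
    "doc1": "ACTRIS monitors aerosol precursors",
    "doc2": "Station reports PM10 concentrations",
    "doc3": "ICOS publishes CO2 flux data",
}


def expand(query):
    """Return the query term together with its related terms."""
    term = query.lower()
    return {term} | RELATED_TERMS.get(term, set())


def semantic_search(query, documents):
    """Return ids of documents that mention the query or a related term."""
    terms = expand(query)
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if any(term in text.lower() for term in terms)
    )


print(semantic_search("aerosol", DOCUMENTS))
print(semantic_search("greenhouse gas", DOCUMENTS))
```

In this toy collection, a strict keyword search for "aerosol" would find only the first document, whereas the expanded query also retrieves the document that mentions PM10.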
Open and flexible knowledge ingestion. Because source types vary widely in the ENVRI community, various methods of knowledge acquisition should be supported, such as form-based manual RDF ingestion, questionnaire-based RDF triple generation, integration of existing RDF, and transformation of structured and unstructured information. Specific measures should be considered to make it straightforward for non-technical users to add knowledge.
Provenance and version control of the knowledge. Considering the typical case where multiple users contribute to the ENVRI-KMS, provenance is of fundamental importance for monitoring and tracking issues, for example, enabling a third party to reproduce a scientific workflow or an authority to audit the whole process. This primarily refers to tracking individual additions, deletions, and updates and their administration, i.e., approval, rejection, and reversion.
User-friendly and customizable user interface. A clear and straightforward user interface is needed for users to fulfill their objectives, such as querying and semantic search. Different user interfaces should be offered to meet the requirements of the general public and professional users.
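As a rough illustration of the statement-level provenance requirement listed above, the sketch below records every addition and deletion of a triple together with the responsible user and a timestamp, and supports reverting the latest change. The class names, the prefixed identifiers, and the user names are hypothetical; this is not how any particular platform implements provenance.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Change:
    user: str
    action: str      # "add" or "delete"
    triple: tuple    # (subject, predicate, object)
    timestamp: str   # ISO 8601, UTC


@dataclass
class ProvenanceStore:
    triples: set = field(default_factory=set)
    history: list = field(default_factory=list)

    def _record(self, user, action, triple):
        stamp = datetime.now(timezone.utc).isoformat()
        self.history.append(Change(user, action, triple, stamp))

    def add(self, user, triple):
        self.triples.add(triple)
        self._record(user, "add", triple)

    def delete(self, user, triple):
        self.triples.discard(triple)
        self._record(user, "delete", triple)

    def revert_last(self, user):
        """Undo the most recent change by applying its inverse.

        The reversion is itself recorded, so the history stays complete.
        Raises IndexError if there is nothing to revert.
        """
        last = self.history[-1]
        if last.action == "add":
            self.delete(user, last.triple)
        else:
            self.add(user, last.triple)


store = ProvenanceStore()
fact = ("envri:ICOS", "rdf:type", "envri:ResearchInfrastructure")
store.add("alice", fact)          # a contributor adds a statement
store.revert_last("curator")      # a curator rejects and reverts it
```

Recording the reversion as a new entry, instead of erasing the original one, keeps an audit trail that a third party could later inspect.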
Scaling and increasing performance. A choice between centralized and distributed storage should be considered to handle the growing size of the ENVRI-KMS. Dynamic resource scheduling should also be considered to cope with concurrent search/query requests. Other features, such as collaborative editing, are required to enable users to comment on contributions by other users.
API interface. An application programming interface (API) abstraction layer can make the knowledge in the ENVRI-KMS accessible to other applications.
Beyond these technical requirements, the ENVRI-KMS should play a key role in helping the ENVRI communities develop FAIR data services and share their best practices.

Use case scenarios
Based on the survey we conducted (see Section 3.1), we identified the following four types of users (see Figure 1) of the ENVRI-KMS:
(1) End users may use the ENVRI-KMS to find answers to their general questions about available sources of data, services, and tools, and to use the discovered information to perform further research activities using other tools, such as Virtual Research Environments, or services, such as the RI catalogs of data or services.
(2) RI managers or operators may use the ENVRI-KMS to check the FAIRness status of specific repositories or update the state of their RIs. The update process often needs the output of third-party tools, e.g., FAIRness assessment tools.
(3) RI developers may use the ENVRI-KMS to check the existing technologies, e.g., those development results in the ENVRI portfolio or the demonstrators prepared for some known FAIRness gaps. They can also publish or update the technical descriptions using ENVRI-KMS components, such as an online description form.
(4) Knowledge curators and knowledge base operators may use the ENVRI-KMS to ingest content from new sources and respond to possible errors that occur during ingestion or operation.

Conceptual architecture
Based on the use case scenarios (see Figure 1), we designed the key components of the ENVRI-KMS from a conceptual point of view. The architecture is based on the Open Distributed Processing (ODP) framework 27-30 . Figure 2 shows the key components in three layers. The interface layer at the top contains components dealing with user-related activities. The ENVRI-KMS will be an open system for community users; the user management component is not intended for acquiring and processing users' personal information but rather for providing customized user support based on their interactions or contexts. A user can log in to the system using an open identity provider. The User Interface (UI) components are the parts of the application that allow users to interact with it. The UI can be formatted and rendered into various presentations to address different users' requirements. Additionally, it validates and collects the required data from users.
The service layer abstracts the functionality that the ENVRI-KMS offers; it can be roughly split into three sub-layers, namely:
(1) The Application sub-layer provides customized application logic (e.g., FAIRness gap analysis, engineering support, or knowledge discovery from the ENVRI community) based on the data passed from the underlying Discovery sub-layer, and passes the results up to the User Interface component.
(2) The Discovery sub-layer provides the functionality for searching the ENVRI-KMS, ranking the results, and recommending relevant content.
(3) The Content sub-layer provides functionality for managing the content in the ENVRI-KMS, typically in a pipeline covering: ingesting information, the transformation from information to knowledge, quality control of the knowledge generation, CRUD (Create, Read, Update, Delete) of the ENVRI-KMS content, and the provenance of these activities.
The storage layer at the bottom is responsible for data storing and access. The data storage options needed in this project include RDF Triple Store and Inverted Index. Currently, information collected in the ENVRI-KMS consists of two main parts, as illustrated in Figure 3. The structured data in the ENVRI-KMS is based on RDF and mainly includes: (1) OIL-e (ontology of the ENVRI Reference Model) based ENVRI RI description, (2) description of the service portfolio from the previous project, and the possibly new ones in ENVRI-FAIR, (3) FAIRness principles and the results of assessing the ENVRI research infrastructures, and (4) demonstrators for tackling the known gaps, e.g., those being identified during the FAIRness assessment.
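The role of the inverted index in the storage layer can be illustrated with a minimal sketch that maps each token to the set of documents containing it. The sample documents and the function names are hypothetical, and the sketch deliberately omits ranking, stemming, and persistence, which a production search backend would provide.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index: token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical one-line descriptions used only for illustration.
documents = {
    "icos": "ICOS greenhouse gas observations",
    "iagos": "IAGOS aircraft greenhouse gas observations",
    "actris": "ACTRIS aerosol monitoring",
}
index = build_index(documents)
print(search(index, "greenhouse gas"))
```

Because lookups touch only the posting sets of the query tokens, the cost of a search does not grow with the full text of the collection, which is the main reason for keeping such an index alongside the triple store.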
The versions of the structured data can currently be managed via version control systems; GitHub is used at present. The dynamic data in the ENVRI-KMS will be ingested from different online sources of the ENVRI communities. Figure 4 depicts the necessary information flow of the knowledge ingestion.
(1) A significant amount of relevant information is represented in human-readable form, residing in wikis, other content management systems, or static webpages, as well as in the "offline" text found in documents such as books, project deliverables, or scientific publications. In the ENVRI-FAIR context, the research infrastructure websites are an excellent source of related information, including news/events, background knowledge, etc. Similarly, community websites such as those of ENVRI and ENVRI-FAIR contain a lot of related information, like news/events, community introduction, community landscape, project information, progress, etc. These information sources come in different formats, such as webpages, Word documents, and PDF files.
(2) Another approach to populate the ENVRI-KMS would be to process such free-text information to extract structured, machine-readable information. Named entity recognition would represent the first step in this regard, while the application of more complicated Natural Language Processing operations could be a valuable field of research in its own right.
(3) Information from the available catalogs of data and services. It should be clear that the indexes generated from those sources will not aim to replicate the entire catalogs but to provide a quick searching capability for community users. For some RIs, such information is already managed in RDF format and accessible from triplestores.
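The named entity recognition step mentioned in item (2) above can be approximated, for illustration only, by a dictionary (gazetteer) lookup. This is a deliberate simplification and not the extraction approach used in the ENVRI-KMS; the gazetteer entries, the labels, and the function name are assumptions.

```python
import re

# Hypothetical gazetteer: surface form -> entity label.
GAZETTEER = {
    "ICOS": "ResearchInfrastructure",
    "IAGOS": "ResearchInfrastructure",
    "ACTRIS": "ResearchInfrastructure",
    "greenhouse gas": "Topic",
    "aerosol": "Topic",
}


def extract_entities(text):
    """Return (surface form, label, start offset) for each gazetteer match."""
    entities = []
    for term, label in GAZETTEER.items():
        # Word boundaries avoid matching a term inside a longer word.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            entities.append((match.group(0), label, match.start()))
    # Order the entities by their position in the text.
    return sorted(entities, key=lambda entity: entity[2])


sentence = "ICOS provides greenhouse gas data; ACTRIS monitors aerosol levels."
for surface, label, start in extract_entities(sentence):
    print(start, surface, label)
```

A lookup of this kind cannot recognize entities that are missing from the dictionary, which is one reason why statistical or neural NER models are usually preferred for free text.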

Prototype
The ENVRI-KMS development follows an iterative approach, in which the requirements based on the experts' responses (see Section 3.1) have been analyzed, and technical choices have been made according to the state-of-the-art review published in 31. In the current prototype, we use OntoWiki to manage the RDF triples and Open Semantic Search to develop the ENVRI-KMS's search engine. Several tools were developed for ingesting specific knowledge, e.g., a technology description form for describing the service portfolio, an interactive graph visualizer for the search results, and a dynamic online data ingestion pipeline. These tools are described in the following sections.

Knowledge storage
The comparison of existing RDF content management platforms is summarized in 31. We selected OntoWiki for managing RDF content. The main reasons for this decision were as follows:
(1) Direct operation on RDF triples: OntoWiki can directly operate on a triplestore as the underlying storage layer and provides an API to populate it with RDF.
(2) Integrated User management and statement-level provenance: Ontowiki supports user management with varying permissions and offers a detailed create/update/delete history on the RDF statement level.
(3) Named-graph-based separation of RDF content and administrative data: RDF data ingested via Ontowiki is directly written as-is into the underlying triplestore, while all the administrative statements such as provenance etc., are stored separately.
(4) Plugin-based extensions: Ontowiki offers a framework for developing plugin extensions.
The choice of OntoWiki directly affected the selection of the underlying triplestore since OntoWiki provides a pre-configured connector to the OpenLink Virtuoso data management system, which members of the ENVRI-KMS team already had experience with from previous projects. The open-source edition of OpenLink Virtuoso 32 (Version 7.2.5.1) was therefore deployed for that purpose and configured for OntoWiki (and vice versa).

Tools for ingesting knowledge
The population of knowledge bases can take different routes.
On the one hand, existing collections of information can sometimes be transformed so that they can be "bulk" imported into the ENVRI-KMS, which includes rearrangements and mappings of existing collections of structured information but potentially also the extraction of structured content from unstructured sources such as free text, which is by no means an easy task considering the complexity in the natural language processing/understanding. On the other hand, it is usually possible to manually add ENVRI-KMS's contents, "fact by fact". However, manual input can be slow, tedious, and error-prone if not supported by dedicated tools. In the context of the ENVRI-KMS, it should be possible to provide content in both ways.
As far as manual data entry is concerned, the system supports the creation of valid RDF data via custom HTML Web forms. The forms are dynamically created using the RDForms 33 JavaScript library based on formal JSON descriptions of the underlying data model. These descriptions also include the specification of constrained SPARQL queries for the dynamic retrieval of menu options, which maintains consistent RDF relationships between the described entity and related terminology and other entities already stored in the ENVRI-KMS.
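The step from a filled-in form to RDF statements can be sketched as follows. The sketch does not use the RDForms API; it only shows, under stated assumptions, how validated form values could be serialized as N-Triples. The namespace `https://example.org/envri/`, the field names, and the function names are hypothetical.

```python
BASE = "https://example.org/envri/"  # hypothetical namespace, not a real ENVRI IRI


def literal(value):
    """Serialize a string as an N-Triples literal with basic escaping."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def form_to_ntriples(subject_id, form):
    """Turn a dict of form values into one N-Triples line per field."""
    subject = f"<{BASE}{subject_id}>"
    lines = []
    for field_name, value in sorted(form.items()):
        predicate = f"<{BASE}{field_name}>"
        # Assumption: values that look like web addresses are links to
        # other entities; everything else is stored as a plain literal.
        if value.startswith(("http://", "https://")):
            obj = f"<{value}>"
        else:
            obj = literal(value)
        lines.append(f"{subject} {predicate} {obj} .")
    return lines


form = {
    "title": 'CO2 "flux" service',
    "provider": "https://example.org/envri/icos",
}
for line in form_to_ntriples("service/1", form):
    print(line)
```

Restricting link-valued fields to entries retrieved from the knowledge base, as the constrained menu options do, is what keeps the generated relationships consistent with existing entities.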

FAIRness status sharing and gap analysis
To improve the findability, accessibility, interoperability, and reusability of digital research objects for both researchers and machines, the ENVRI-KMS offers a FAIR assessment dashboard 8 . It supports RIs by discovering gaps in FAIR principle implementation at the granularity of their repositories and the discovery of possible technology solutions to address such gaps. For instance, the FAIRness assessment of a particular RI can be modified to indicate whether the repository contains machine-readable provenance information. By selecting an RI, the user interface gives an overview regarding its FAIRness status and gap analysis.
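The gap analysis described above can be sketched as a comparison between the assessed principles of a repository and a list of candidate solutions. The repository names, the assessment values, and the suggested technologies are hypothetical examples and do not reflect actual assessment results.

```python
# Hypothetical assessment records: for each repository, whether a given
# FAIR principle is currently implemented (illustrative values only).
ASSESSMENTS = {
    "repo-a": {"F1": True, "A1": True, "I1": False, "R1.2": False},
    "repo-b": {"F1": True, "A1": True, "I1": True, "R1.2": True},
}

# Hypothetical mapping from a principle to candidate technology solutions.
SOLUTIONS = {
    "I1": ["shared controlled vocabularies"],
    "R1.2": ["machine-readable provenance (e.g., W3C PROV)"],
}


def gap_analysis(repository):
    """Return unmet principles with the candidate solutions for each."""
    assessment = ASSESSMENTS[repository]
    gaps = sorted(
        principle for principle, implemented in assessment.items()
        if not implemented
    )
    return {principle: SOLUTIONS.get(principle, []) for principle in gaps}


for principle, candidates in gap_analysis("repo-a").items():
    print(principle, "->", candidates)
```

A repository without gaps yields an empty result, and a gap without a known solution is still reported with an empty list of candidates.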

Ontowiki as a knowledge management platform
OntoWiki is a free and open-source semantic wiki web application that serves as an ontology editor and a knowledge acquisition system. Additionally, Ontowiki is a suitable RDF data management platform. A test instance is configured 34 and slightly customized to use the ENVRI logo and display the ENVRI RSS news feed on the front page. It currently serves as a data gateway for the facts added via forms based on the FAIR assessment dashboard. Ontowiki was found to perform well as RDF "middleware" used to ingest data from the RDF forms.

Search Engine
In this section, we present a running example of the ENVRI-KMS Search Engine. To help general users explore the ENVRI-KMS easily, we built the ENVRI-KMS Search Engine on the fundamental concepts and components of Open Semantic Search 35 . Figure 5 illustrates the search interface 9 .
A searcher can go to the landing page of the ENVRI-KMS (See Figure 5 (a)) directly and enter her search query in the search box and see the results immediately (See Figure 5 (b)).
The results and their relevance to RIs can be visualized using the graph visualization of the ENVRI-KMS (See Figure 5 (c)). The searcher can also limit the search space of the ENVRI-KMS to a particular category, such as Webpages and RIs. For instance, Figure 5 (d) shows the dataset search of the ENVRI-KMS.
The ENVRI-KMS can automatically capture, extract, and index knowledge regarding research assets based on the URL of the RIs (See Figure 5 (e)). Additionally, knowledge curators can ingest research assets manually to the knowledge base of the ENVRI-KMS (See Figure 5 (f)). Note, the ENVRI-KMS checks the indexed documents periodically to keep its knowledge base always up-to-date.

Operational workflow
This section elaborates on the operational workflow of the ENVRI-KMS 10 and presents its constituent components (See Figure 6). The crawler starts with a list of URLs to visit, called the seeds. As the crawler visits such URLs, it identifies all the hyperlinks on the webpages with the aid of a sitemap extractor and adds them to the list of URLs to visit, called the crawl frontier. In the knowledge extraction process, named entity recognition (NER) and relation extraction (RE) approaches identify the entities represented in documents and the relations among them. The extracted knowledge is used to build the knowledge graph in the knowledge base of the ENVRI-KMS. Data storage technologies, including Apache Solr and MySQL, are used to store the acquired knowledge systematically. The Knowledge Base of the ENVRI-KMS integrates user profiles, user search histories, and decision models (e.g., meta-models), and infers solutions (results) based on searchers' queries. The User Interface receives user queries, such as keywords and user stories, and presents the results (e.g., publications, graph visualizations, and recommendations) to the searchers 11 (See Section 3.2).
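The interplay between seeds and the crawl frontier can be illustrated with a minimal breadth-first sketch. To keep the example self-contained, it traverses an in-memory link graph instead of fetching real webpages; the URLs, the graph, and the `max_pages` limit are hypothetical.

```python
from collections import deque

# Hypothetical link graph standing in for real webpages:
# URL -> hyperlinks found on that page.
LINKS = {
    "https://example.org/": [
        "https://example.org/data",
        "https://example.org/news",
    ],
    "https://example.org/data": [
        "https://example.org/",
        "https://example.org/data/co2",
    ],
    "https://example.org/news": [],
    "https://example.org/data/co2": [],
}


def crawl(seeds, max_pages=100):
    """Visit pages breadth-first, starting from the seed URLs."""
    frontier = deque(seeds)   # the crawl frontier: URLs still to visit
    seen = set(seeds)         # URLs already queued, to avoid revisits
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        # A real crawler would download the page and extract its links here.
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


for url in crawl(["https://example.org/"]):
    print(url)
```

Tracking the URLs that have already been queued prevents cycles in the link graph from causing repeated visits, and the page limit bounds the crawl when the frontier keeps growing.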

Analysis
In this subsection, we reflect on each of the proposed design research questions based on our observations during the development process, the online survey, and documentation analysis.

Design decisions
We revisit the requirements and analyze the gap for the tools or platforms we investigated in terms of the requirements identified in Section 3.1.
Compatible with Semantic Web technologies. The two storage solutions (Apache Jena and Virtuoso) are triplestores dedicated to storing RDF data, thus fully meeting semantic web technology compatibility requirements. Regarding the knowledge management solutions, as the comparison in 31 indicates, both Semantic Mediawiki and Ontowiki are RDF compatible.
Semantic search and query functionality. Although the knowledge management systems investigated (such as Ontowiki and Semantic Mediawiki) allow users to explore, search, and edit the ENVRI-KMS's content via GUI tools, they do not offer an easy user experience because of the technical knowledge they require. Semantic Mediawiki was originally intended for the semantic annotation of wiki pages, and Ontowiki was originally intended as a knowledge base editor.
Open and flexible knowledge ingestion. As shown in 31, knowledge management systems, such as Semantic Mediawiki and Ontowiki, support RDF import, facilitating the ingestion of knowledge. However, to prepare RDF triples or transform the required information into knowledge, some customized tools need to be designed and implemented, considering the diversity of our project's information sources.
Provenance and version control of the knowledge. As far as the considered knowledge management platforms are concerned, Ontowiki meets the requirements by providing detailed user management and statement-level provenance for RDF data, allowing tracking and potentially editing individual user contributions to the ENVRI-KMS.
User-friendly and customizable user interface. As already analyzed, although the knowledge management systems provide a GUI for search and query, their target users are knowledge base administrators, given the technology barriers. For general users without much technical knowledge of SPARQL or triplestores, a straightforward user interface for searching and exploration is expected to improve the user experience.
Scaling and increasing performance. Apache Jena Fuseki does not currently support horizontal scaling, but there are workarounds, such as coordinating the updates from a staging server and publishing (read-only) to external clients.
Based on the comparison, it is clear that no single solution satisfies all the requirements. The optimal solution should combine existing options, and other software, such as Blazegraph, could be a candidate.

Design research questions
To answer the first two design research questions (RQ1 and RQ2), we conducted an extensive literature review alongside a set of interviews with domain experts at the RIs to build the search space (including webpages, datasets, etc.) of the ENVRI-KMS and capture knowledge systematically. The current search space 11 of the ENVRI-KMS includes all research infrastructures that are mentioned on the ENVRI community knowledge base 12 . It is essential to highlight that the search space is not limited to the initial sets and grows automatically. Accordingly, the third design research question (RQ3) can be addressed based on the natural language processing approach and Open Semantic Search that we have employed in the implementation of the ENVRI-KMS 31 . To answer the fourth design research question (RQ4), we evaluated a set of technologies that can be employed to store and retrieve data. The last design research question (RQ5) is one of the key challenges in this research. We plan to build a community around the ENVRI-KMS and ask the stakeholders, including domain experts, practitioners, and researchers, to actively assess the search results and recommendations.
The FAIRness of the ENVRI-KMS should be elaborated in order to answer the study's main design research question. Research assets become Findable when adequate metadata characterizes them and a searchable resource efficiently indexes them, allowing them to become recognized and available to potential users. A unique and persistent identifier should also be established so that the data may be referenced and cited in research communications without ambiguity. The identifier facilitates data discovery and reuse by allowing persistent mapping between data, metadata, and other associated resources. Examples of related resources are the code or models required to utilise the data, research literature that provides additional insights into the data's development and interpretation, and other related information. The ENVRI-KMS indexes research assets and assigns them a unique identifier, allowing them to be shared among RIs.
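To make the Findable criterion concrete, the following sketch indexes asset metadata and assigns each asset an identifier. It is only an in-memory stand-in under stated assumptions: the `AssetIndex` class, its methods, and the example.org records are hypothetical and do not reflect how the ENVRI-KMS or Apache Solr are implemented. The identifier is derived deterministically from the asset URL with `uuid.uuid5`; a truly persistent identifier would additionally have to be registered with a resolver service.

```python
import uuid
from collections import defaultdict


class AssetIndex:
    """Toy searchable index over research-asset metadata (illustrative only)."""

    def __init__(self):
        self.assets = {}                 # identifier -> metadata record
        self.by_term = defaultdict(set)  # lowercase term -> identifiers

    def add(self, url, title, keywords):
        # uuid5 is deterministic: the same URL always yields the same identifier.
        asset_id = str(uuid.uuid5(uuid.NAMESPACE_URL, url))
        self.assets[asset_id] = {"url": url, "title": title, "keywords": list(keywords)}
        for term in title.lower().split() + [k.lower() for k in keywords]:
            self.by_term[term].add(asset_id)
        return asset_id

    def search(self, query):
        """Return the titles of assets matching every term of the query."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.by_term.get(t, set()) for t in terms))
        return sorted(self.assets[i]["title"] for i in hits)


# Hypothetical metadata records (assumption, not ENVRI-KMS content).
index = AssetIndex()
first_id = index.add("https://example.org/dataset/1", "Ocean temperature dataset", ["marine", "sensor"])
index.add("https://example.org/service/2", "Atmospheric aerosol service", ["atmosphere", "sensor"])
print(index.search("sensor"))
```

Deriving the identifier from the URL means that re-indexing the same asset does not create a second record, which is one simple way to keep references unambiguous.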
Accessibility means that a human or a machine is given the exact conditions under which research assets can be accessed via metadata. Researchers in research communities can use the ENVRI-KMS to access research assets in accordance with RI policies and regulations.
The ENVRI-KMS search entities are characterized using normative and community-accepted specifications, vocabularies, and standards that define the precise meaning of concepts and qualities represented by the data. Interoperability is a crucial aspect of research assets' value and usefulness. It is not only semantic interoperability that is important, but also technical and legal interoperability. Technical interoperability refers to the research assets being encoded using a standard that can be read by all systems involved.
The FAIR principles highlight the necessity for extensive metadata and documentation that match relevant community standards and give information about provenance in order for research materials to be reusable. The ability of humans and machines to evaluate and select research assets based on provenance information criteria is critical to their reuse. Reusability also necessitates the publication of research assets with a "clear and accessible usage license," which means that the terms under which the assets can be utilized should be transparent to both humans and machines.
RQ5). Additionally, the table shows that more than half of the identified requirements (62%) are at least partially addressed so that the main components of the ENVRI-KMS are functional.
11 https://search.envri.eu/
12 https://envri.eu/research-infrastructures/

Discussion
This section summarizes our observations and highlights several lessons learned during the development process of the ENVRI-KMS.
Software engineers have a broad knowledge of software development technologies, and they apply software engineering principles to develop software products. By employing such engineering principles in the software development lifecycle, from requirement analysis to software implementation and then deployment, they can build customized software products for individual stakeholders. The demand for highly skilled and qualified software engineers seems to have no end. This demand is growing in a changing economic landscape and fueled by the necessity of software development technologies. On the one hand, billions of dollars are spent annually on software products 37 that are produced and maintained by software engineers. On the other hand, business processes are introduced and managed by stakeholders and top-level managers who principally understand businesses 38 .
Software architecture deals with the base structure, subsystems, and interactions among these subsystems, so it is critical to the success or failure of any software system 39 . Software architecting can be thought of as a decision-making process in which software architects consider a collection of possible solutions for solving a system design problem and choose the one that is evaluated as optimal 40 . Software architecture decisions are design decisions that meet both functional and quality requirements in a system. Design decisions are concerned with the system's application domain, the architectural patterns employed in the system, commercial off-the-shelf components, other infrastructure selections, and other aspects needed to satisfy all requirements 41 . According to Avgeriou et al. 42 , failing to make architectural design decisions during software development has well-known implications, such as costly system evolution, weak stakeholder communication, limited reusability of architectural assets, and poor traceability between specifications and implementation.
To design the architecture of the ENVRI-KMS, we analyzed several alternative tools that could be used to build the fundamental components of the knowledge base. Selecting the right database management system(s) (DBMS) was one of those design decisions. The DBMS selection problem is a subclass of the commercial off-the-shelf (COTS) selection problem, and both are subclasses of Multi-Criteria Decision-Making (MCDM) problems 43 . Accordingly, we used the decision support system introduced by Farshidi et al. 44 to evaluate potential alternative solutions for storing and retrieving data. After an extensive evaluation, we decided to use Apache Solr to index the search entities and MySQL to manage user profiles and user search histories.
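As a rough illustration of what an MCDM evaluation involves, the sketch below ranks alternatives with a simple weighted-sum model. This is a generic textbook technique, not the decision model of the decision support system cited above; the criteria, weights, ratings, and the placeholder names "Option A" to "Option C" are all hypothetical and do not describe any real product.

```python
def weighted_scores(weights, ratings):
    """Weighted-sum score per alternative, normalized by the total weight."""
    total = sum(weights.values())
    return {
        alt: sum(weights[c] * alt_ratings[c] for c in weights) / total
        for alt, alt_ratings in ratings.items()
    }


def rank(weights, ratings):
    """Alternatives ordered from best to worst; ties broken alphabetically."""
    scores = weighted_scores(weights, ratings)
    return sorted(scores, key=lambda alt: (-scores[alt], alt))


# Hypothetical criteria weights and 0-5 ratings (assumptions for illustration).
weights = {"full_text_search": 3, "transactions": 1, "maturity": 2}
ratings = {
    "Option A": {"full_text_search": 5, "transactions": 1, "maturity": 4},
    "Option B": {"full_text_search": 2, "transactions": 5, "maturity": 5},
    "Option C": {"full_text_search": 3, "transactions": 3, "maturity": 3},
}

scores = weighted_scores(weights, ratings)
print(rank(weights, ratings), scores)
```

The sketch also suggests why different components can end up with different technologies: changing the weights to favor, say, transactional behavior over full-text search changes the ranking.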
Judging the suitability of a set of technologies, such as programming languages, for developing a knowledge base system is a non-trivial task. For instance, a purely functional language like Haskell is considered a good fit for writing parallel programs that can, in principle, efficiently exploit huge parallel machines working on large data sets 45 . However, when developing a dynamic website, one software engineer might consider ASP.NET the best alternative, whereas others might prefer PHP or a similar scripting language. It is interesting to highlight that successful projects have been built with both: Stack Overflow is built in ASP.NET, whereas Wikipedia is built in PHP. Furthermore, a software engineer might prioritize particular criteria, such as scalability in enterprise applications, whereas other criteria, such as technology maturity level, might have lower priorities.
We realized that we needed to select the right programming language ecosystems for developing the ENVRI-KMS. We used the decision model in the knowledge base of the decision support system 46 to evaluate potential programming languages for this purpose. As mentioned earlier, in the initial phase of the development process we used an open-source tool called Open Semantic Search, whose backend is implemented in PHP and Python, so these two languages were our first candidate solutions. However, the decision support system suggested C#, Java, and Ruby as three more alternatives. Finally, we decided to continue using Python, as we had more experience with it and found many open-source projects on GitHub implemented in Python that could boost the development process.
Some issues were discovered regarding the cross-referencing of statements between knowledge bases (named graphs). A workaround published in a newsgroup provided a potential fix for static data but would have to be extended for a continuously growing data collection. A possible solution would be to store information that is expected to change/grow, e.g., the entity descriptions and the user terminology collected from the RDF forms, in a typical named graph and to configure Ontowiki filters for its efficient navigation while storing more static content, such as external ontologies, in separate graphs. While Ontowiki supports flexible navigation and data editing at the RDF statement level, the interface is arguably not appropriate for the vast majority of RI managers or developers. We conducted some experiments with the atmospheric domain, but RIs did not engage with the user interface. This is to be expected since Ontowiki relies on a good understanding of the RDF data model. Moreover, presenting information at the RDF statement's granularity is typically inadequate for high-level information needs, e.g., discovering FAIR gaps in the data centers of an RI. We thus suggest that Ontowiki can act as an RDF-based middleware that powers high-level user applications and services. A critical aspect of using Ontowiki to manage the generated RDF data will be the question of versioning. While built-in features such as the statement-level provenance in principle allow detailed tracking of changes/revisions of the provided data, a backup strategy using external means should be considered as well. One straightforward step would be to export complete RDF dumps of the provided content in regular intervals and to track their versions in source code repositories such as GitHub.

Practical Implications
Software engineers with broad knowledge and software engineering principles are essential for successful software development projects.
Software architecture, encompassing base structure and subsystems, significantly impacts the success or failure of a software system.
Design decisions in software architecture should meet both functional and quality requirements, considering the application domain and other aspects.
Selecting the appropriate database system(s) (DBMS) is a crucial design decision that affects data storage and retrieval in the ENVRI-KMS.
Evaluating and selecting the right programming language ecosystem is crucial for developing a knowledge base system like the ENVRI-KMS.
Cross-referencing statements between knowledge bases (named graphs) requires careful consideration, with separate graphs for static and dynamic data.
Ontowiki can serve as an RDF-based middleware, supporting high-level user applications and services in the ENVRI-KMS.
Versioning and backup strategies are crucial for managing RDF data in the ENVRI-KMS, including exporting RDF dumps and using external means.
Continuous development and growth of the ENVRI-KMS depend on the efforts and contributions of the ENVRI subdomains and research infrastructures.
Interaction and collaboration with other subdomain developers and semantic search workgroups provide valuable input for the ENVRI-KMS.
Future development efforts will focus on continuous content ingestion and curation, improvement based on community feedback, DevOps practices, and community involvement in content maintenance.
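The suggestion of exporting complete RDF dumps at regular intervals and tracking their versions can be sketched as follows. This is an assumption-laden illustration rather than part of the ENVRI-KMS: the function `save_dump_if_changed`, the file-naming scheme, and the example triple are hypothetical, and the dump is assumed to be line-based N-Triples text obtained by some external export step. Committing the resulting files to a repository is left out.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_dump_if_changed(dump_text, directory, now=None):
    """Write an N-Triples dump only if it differs from the latest stored one."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    # Sort and de-duplicate lines so statement order does not affect the digest.
    lines = sorted({line for line in dump_text.splitlines() if line.strip()})
    canonical = "\n".join(lines) + "\n"
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    existing = sorted(directory.glob("dump-*.nt"))  # timestamps sort lexicographically
    if existing and existing[-1].stem.endswith(digest):
        return None  # content unchanged since the latest dump
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    path = directory / f"dump-{stamp}-{digest}.nt"
    path.write_text(canonical, encoding="utf-8")
    return path


# Hypothetical example triple (assumption, not ENVRI-KMS data).
with tempfile.TemporaryDirectory() as tmp:
    triples = '<https://example.org/ri/1> <http://www.w3.org/2000/01/rdf-schema#label> "Example RI" .\n'
    first = save_dump_if_changed(triples, tmp, datetime(2024, 1, 1, tzinfo=timezone.utc))
    second = save_dump_if_changed(triples, tmp, datetime(2024, 1, 2, tzinfo=timezone.utc))
    first_name = first.name
    print(first_name, second)
```

Sorting the statements before hashing keeps unchanged content from producing a new version merely because the export order differed; dumps containing blank nodes would need a proper canonicalization step instead.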

Related work
In this research, snowballing was the primary method used to investigate the existing literature regarding tools and techniques that address the knowledge management challenges. A subset of selected studies is presented in Table 4.
Since 1990, business publications have published an extensive list of research articles on knowledge management and decision support systems 47 . Wielinga et al. 48 explained knowledge-based systems' development as a modeling activity. Sapuan 49 reported a set of knowledge management systems'
(2) Knowledge Management, which explains the process of creating, sharing, using, and managing the knowledge and information of an organization.
(3) Knowledge Discovery, which refers to the process of finding explicit knowledge in data and emphasizes the "high-level" application of particular data mining methods. The main goal is to extract such knowledge from data in the context of large databases.
(4) Knowledge Acquisition, which is the process used to define the rules and ontologies required for a knowledge-based system and is the process of extracting, structuring, and organizing knowledge from one source, usually human experts.
(5) Knowledge Representation, which translates information from the real world into a machine-understandable form and then utilizes acquired knowledge to solve complex decision-making problems.
(6) Decision-Making Process, which is a reasoning process based on assumptions of values, preferences, and beliefs of decision-makers. It leads to suggesting a set of solutions among several possible alternative options.
The development of the ENVRI-KMS will continue in the rest of the ENVRI-FAIR project. In the next phase, the development effort will mainly focus on the following aspects:
(1) Continuous content ingestion and curation. The ENVRI-KMS team will improve the knowledge ingestion tool and continuously ingest the description (metadata) of high-quality results from the ENVRI community (e.g., sub-domain or RI developers), including development results (e.g., best practices, software technologies, recommendations, updated FAIRness assessment possibly generated by new tools) in the ENVRI-KMS, and make those descriptions FAIR for the community.
(2) Continuous improvement of the ENVRI-KMS based on the feedback received from the community. Extra features, e.g., for ENVRI-KMS discovery and recommendation, will be further explored.
(3) The development and operation of the ENVRI-KMS will also follow the software engineering DevOps practices. A continuous testing, integration, and deployment pipeline will be established.
(4) We will also extend the content maintenance to community specialists. In this way, we hope the community will play a key role in the ENVRI-KMS.

Dear authors, Thanks for your submission to Open Research Europe. Since I have stepped in during the second round of revision of the present paper, I will ground my comments even on the previous reviews and the previous reviewers' comments.
As a first impression, I have noticed a strong improvement on the paper, especially about setting the boundary of your study on ENVironmental Research Infrastructures (ENVRI), while I see additional room for improvement in some aspects of the paper.
In particular, there are two points that require an additional revision, also considering the previous reviewers' comments:
○ The representativity of the sample and the details of the participants are still not clear. You stated you conducted an online webinar with experts while not providing details about the questions, the topic of discussion, and the approach to the interviews. At the same time, you did not disclose any information about the participants in the subsequent online survey that is the core of your study. I would suggest a table (or some text) detailing the sample and how the sample has been selected.
○ The second focal point to be addressed is about the practical implications. In Section 6, you aim to summarize the lessons learned during the ENVRI-KMS project. However, a practitioner could find it difficult to apply such findings or to locate them inside the paper. As a result, I suggest adding a table summarizing in a few bullet points the main implications of your study.
Thanks for your time and good luck with the review.

Is the description of the software tool technically sound? Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Partly

Is the rationale for developing the new software tool clearly explained? Partly
Is the description of the software tool technically sound? Partly
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? No
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Partly
In detail, the paper leaves too much room for technology reviews and generalities. On the other hand, many questions remain open concerning the implementation and use of the tool. Many descriptions of the individual components are unfortunately still fragmentary and incoherent. In addition, a profound discussion is missing, which explains, for example, what distinguishes the proposed tool in comparison with commercial solutions and how the individual components and differentiate between community specific and generic requirements Section 3
This is a very topical article, given that sharing and discovery across different areas promises new insight, but is, in practice, difficult to do.
From my side, I would be interested in: a) update and use of the described tool b) implications for European policies, in particular in regards infrastructures (ESFRI & co) but also broader data management and sharing policies in Horizon Europe and the ERA.

Competing Interests:
No competing interests were disclosed.