2024-03-28T14:12:07Z
https://zenodo.org/oai2d
oai:zenodo.org:5569659
2021-10-15T01:48:29Z
user-gateways-2021
Michels, Alexander
Padmanabhan, Anand
Li, Zhiyu
Wang, Shaowen
2021-10-14
<p>JupyterHub [1] has become a popular choice in many scientific communities, offering an easy-to-use interface for users with little to no frontend development work while promoting reproducible and replicable (R&R) science [2]. In the broad geospatial science community, CyberGISX [3] provides such a gateway environment with many cyberGIS (i.e., geospatial information science and systems based on advanced cyberinfrastructure) and geospatial software packages prebuilt and ready to use. Like other JupyterHub-based solutions, CyberGISX also provides container-based access for its users and must balance a trade-off between providing a static compute environment which enhances R&R and continuously updating the software environment to keep up with advances in scientific software. Solutions such as Binder [4] have attempted to address this trade-off by having required dependencies encoded in the package and building the software environment at the time of use. However, such a solution comes with two major disadvantages: (a) software is built at the time it is needed, increasing startup time and introducing the possibility that some of the dependencies of the environment are no longer available or have changed; and (b) the onus of specifying and managing software installations is passed to notebook developers, many of whom are domain scientists and not comfortable with such responsibilities.<br>
To address these challenges and enhance R&R with minimal effort from end-users, we have designed and implemented a solution on CyberGISX that allows software to be kept on an external file server mounted into each user's environment. Scientific software is installed with Easybuild [5] and managed by Lmod [6], giving a variety of benefits: (1) the compute environment is more standardized and easily reproducible outside of the gateway; (2) multiple versions of software can be made available to users without increasing container size; and (3) exact copies of software remain available on the gateway instead of being rebuilt for every release, further enhancing R&R. We also employ an Easybuild-installed Anaconda [7] to create and manage conda environments on the file server. The combination of the software stack from Easybuild and the Python environments from conda provides end-users with kernels for their Jupyter notebooks that persist unchanged as the gateway's container is updated. This design enhances R&R and adds functionality for advanced users without introducing technical barriers for non-technical end-users. As such, domain scientists using this solution need not build their own software or specify dependencies, which helps prevent the notebooks they have developed from being broken by the next software release. This talk explores the new architecture and applications of this solution to CyberGISX [3] and CyberGIS-Jupyter for Water (CJW) [8].</p>
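As an illustrative sketch (not taken from the abstract above), the following shows how a conda environment living on a shared file server could be exposed as a persistent Jupyter kernel by writing a standard kernelspec that points at the environment's interpreter; the paths and environment name are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical location of a conda environment installed on the shared
# software file server (mounted read-only into every user container).
SHARED_ENV = Path("/shared/easybuild/anaconda/envs/cybergisx-2021.10")
KERNEL_DIR = Path.home() / ".local/share/jupyter/kernels/cybergisx-2021.10"

# A standard Jupyter kernelspec: it launches ipykernel from the shared
# environment, so the kernel survives gateway container upgrades.
kernelspec = {
    "argv": [
        str(SHARED_ENV / "bin/python"),
        "-m", "ipykernel_launcher",
        "-f", "{connection_file}",
    ],
    "display_name": "CyberGISX 2021.10 (shared)",
    "language": "python",
}

KERNEL_DIR.mkdir(parents=True, exist_ok=True)
(KERNEL_DIR / "kernel.json").write_text(json.dumps(kernelspec, indent=2))
print(f"Registered kernel at {KERNEL_DIR}")
```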
https://doi.org/10.5281/zenodo.5569659
oai:zenodo.org:5569659
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5569658
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
cyberGIS
Easybuild
Geospatial Software
Jupyter
Towards Reproducible Research on CyberGISX with Lmod and Easybuild
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5569397
2021-10-14T19:09:10Z
user-gateways-2021
Casertano, Matthew R.
Brookes, Emre H.
Gama Lima Costa, Raquel
Fushman, David
2021-10-14
<p>Knowledge of the structure and motions of proteins and nucleic acids is required for understanding how these molecular machines work and for development of therapeutics to combat diseases. Painting an adequate and reliable portrait of these inherently dynamic biological macromolecules presents a significant challenge due to the need to integrate experimental data from various sources and to treat these systems as structural ensembles because the experimental data are a convoluted average of contributions from multiple conformations [1]. In particular, Nuclear Magnetic Resonance (NMR) spectroscopy is a powerful, versatile experimental technique that provides vital information on molecular structure and dynamics. Several approaches and software packages have been developed to tackle the complicated analysis of experimental NMR data in order to extract important structural information from underdetermined structural data. However, many of them are offline packages or command-line applications that require users to set up the run-time environment and to possess certain programming skills, which inevitably limits the accessibility of this software to the broad scientific community. To address these current limitations in NMR data analysis and applications, we developed a science gateway designed for the NMR/structural biology community. The NMRSuite gateway enables comprehensive analysis of distance- and orientation-dependent effects caused by paramagnetic tagging, including pseudo-contact shifts (PCS), residual dipolar couplings (RDC), and paramagnetic relaxation enhancement (PRE). The analysis is performed both for a single-structure model and for a conformational ensemble. The latter option supports two ensemble treatments: the Sparse Ensemble Selection (SES) method and the Maximum Entropy (MaxEnt) method [1]. Both methods are integrated with an additional module, Predict, which enables ab initio prediction of the relevant experimental NMR data for a given molecular structure. To make these analytical tools broadly available, we used the GenApp framework for scientific gateways [2] to transform the original software packages into a science gateway that provides advanced computational functionalities, streamlines cloud-based data input, output, and storage, and offers interactive 2D and 3D plotting and visualizations. All the aforementioned modules were originally written as standalone programs in Matlab or Java, while NMRSuite is written almost entirely in a single language, Python, to simplify future development and maintenance. All code for NMRSuite is available in a public GitHub repository, allowing anyone to read it and/or suggest improvements, and offering a valuable reference for others who wish to create a GenApp gateway hosted, like NMRSuite, on Jetstream or another publicly available resource. This gateway will assist researchers by providing simple, yet customizable, tools for analyzing the structure, dynamics, and function of biological macromolecules on a single, easily accessible website. NMRSuite modules have been successfully used in teaching Biomolecular NMR courses and in research at the University of Maryland. As a next step, we plan to integrate NMRSuite with the SASSIE-web gateway [3] to enable joint analysis of NMR and small-angle scattering data and provide access to more analysis tools under a single website. We hope that our work will inspire other researchers to deploy their work through a scientific gateway.</p>
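To make the ensemble idea concrete, here is a small illustrative sketch (not NMRSuite code) of fitting ensemble weights so that weighted averages of per-conformer predicted observables, such as RDCs, reproduce experimental values; the data are synthetic and the non-negative least-squares solver is just one simple choice of fitting method.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Synthetic example: 50 candidate conformers, 30 measured observables
# (e.g., RDCs). Each column holds the values predicted for one conformer.
n_obs, n_conf = 30, 50
predicted = rng.normal(size=(n_obs, n_conf))

# Pretend the "true" ensemble is a sparse mixture of three conformers.
true_w = np.zeros(n_conf)
true_w[[3, 17, 42]] = [0.5, 0.3, 0.2]
experimental = predicted @ true_w + rng.normal(scale=0.01, size=n_obs)

# Fit non-negative weights so the ensemble average matches experiment.
weights, residual = nnls(predicted, experimental)
weights /= weights.sum()  # normalize to a population distribution

top = np.argsort(weights)[::-1][:5]
for i in top:
    print(f"conformer {i:2d}: population {weights[i]:.2f}")
print(f"fit residual: {residual:.3e}")
```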
https://doi.org/10.5281/zenodo.5569397
oai:zenodo.org:5569397
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5569396
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Science Gateway
GenApp
Generalized Application Framework
Biomolecular NMR
Conformational Ensemble
Structural Biology
Biophysics
GenApp-Generated Science Gateway for NMR Data Analysis and Determination of Structural Ensembles of Biological Macromolecules
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5569976
2021-10-15T01:48:26Z
user-gateways-2021
DeHart, Jennifer
Javornik, Brenda
2021-10-14
<p>Radar and lidar instruments are widely used for the remote sensing of the atmosphere in three dimensions. These are complex instruments that produce copious quantities of data. Both the size of the data, and the complexities associated with real-time applications and post-analyses, pose a challenge for researchers, students, and instrument developers. Good software tools are needed to facilitate research and education and to maximize returns on NSF investments in observing facilities for atmospheric research. The Lidar Radar Open Software Environment (LROSE) [1], funded by NSF and developed collaboratively by the National Center for Atmospheric Research (NCAR) and Colorado State University, is a tested and documented software toolbox for analyzing radar and lidar data.<br>
Installation is one of the biggest challenges for users of research-grade software. Based on community feedback from prior and recent surveys, LROSE is no exception. The LROSE science gateway was developed as an accessible solution to the installation problem, providing a web-based interface and allowing jobs to be run on NSF-provided resources in the cloud. The gateway extends these services to users who otherwise might not be able to install the software on their desktops, laptops, or servers, or who use only a few applications.<br>
Currently, the gateway supports a few key functionalities. A user can run a subset of key LROSE applications individually. A popular workflow that estimates rainfall from radar data steps a user through a series of applications that convert data into a standard format, calculate key parameters and three-dimensional precipitation rates, and estimate the rainfall occurring just above the surface. Users can upload their own research data, access radar data in tutorials, and retrieve operational National Weather Service Next-Generation Radar (NEXRAD) data directly from the cloud through Amazon Web Services. Future plans include expanding to other data sources in the cloud, implementing other key LROSE workflows, such as the Wind Analysis workflow shown in Figure 1, and developing modules that educators could use in the classroom to teach students about radar and lidar data processing. The wind analysis tools are a critical piece of LROSE that can be computationally expensive. Workflows for research and educational purposes also support the continued development and improvement of tutorials and documentation for users.</p>
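As a hedged illustration of the kind of cloud data access described above, the snippet below lists NEXRAD Level II objects from the public `noaa-nexrad-level2` bucket on Amazon Web Services using anonymous S3 access; the station and date are arbitrary examples, and the bucket layout is assumed to follow the usual `YYYY/MM/DD/STATION/` prefix.

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous (unsigned) client for the public NEXRAD Level II archive.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# Example prefix: 2021-10-14, radar station KFTG (Denver area).
prefix = "2021/10/14/KFTG/"
response = s3.list_objects_v2(
    Bucket="noaa-nexrad-level2", Prefix=prefix, MaxKeys=5
)

for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])

# A selected volume file could then be downloaded for LROSE processing:
# s3.download_file("noaa-nexrad-level2", key, "local_volume_file")
```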
https://doi.org/10.5281/zenodo.5569976
oai:zenodo.org:5569976
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5569975
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
gateway applications
radar
lidar
atmospheric science
The LROSE Science Gateway: Accessible Lidar and Radar Processing in the Cloud
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5569393
2021-11-15T09:42:15Z
user-is-enes3
user-gateways-2021
user-eu
Pagé, Christian
Zimmerman, Klaus
Aoun, Abel
Veldhuizen, Mats
Spinuso, Alessandro
van der Neut, Ian
Bärring, Lars
Striewski, Friedrich
2021-10-14
<p>Researchers and end users working with climate data face a challenge when they analyze the data they need. Data volumes are increasing very rapidly, and downloading all of the needed data is often no longer feasible. Most climate analysis tools for research and application needs must use very large datasets, often distributed across several data centres and split into a large number of files. This is especially true when the data are stored in a federated architecture like the ESGF.<br>
<br>
One of these tools is icclim (https://github.com/cerfacs-globc/icclim ), a flexible Python software package for calculating climate indices and indicators. The tool adheres as much as possible to metadata conventions such as CF and also implements provenance information. It also aims to provide increasing support for all FAIR aspects. It is designed with performance and optimisation in mind, because the goal is to provide on-demand calculations for users. It implements most of the international standard climate indices, such as ECAD, ETCCDI, and ET-SCI, including the correct methodology for calculating percentile-based indices using the bootstrapping method. It has also been validated against R.Climdex (https://cran.r-project.org/web/packages/climdex.pcic/index.html ). The new 5.x version of icclim is based on functions from the xclim Python library, which was inspired by earlier versions of icclim but uses xarray and dask for data access and processing. icclim is also a candidate as the software for calculating climate indices for the C3S toolbox (https://cds.climate.copernicus.eu/cdsapp#!/toolbox ).<br>
<br>
This tool is integrated into the IS-ENES C4I 2.0 platform (https://dev.climate4impact.eu/ ), using a Jupyter notebook collection in a SWIRRL environment (Software for Interactive Reproducible Research Labs, https://gitlab.com/KNMI-OSS/swirrl ). Having access to this type of complex analysis tool is very useful, and integrating such tools with front-ends like C4I enables their use by a larger number of researchers and end users.<br>
<br>
This project (IS-ENES3) has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement N°824084.</p>
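As a rough illustration of the kind of index calculation icclim and xclim perform, the sketch below computes the ETCCDI "summer days" index (annual count of days with daily maximum temperature above 25 °C) directly with xarray on synthetic data; it is not the icclim API itself, just the underlying idea.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily maximum temperature (Kelvin) for three years, one grid cell.
time = pd.date_range("2000-01-01", "2002-12-31", freq="D")
tasmax = xr.DataArray(
    273.15 + 15 + 12 * np.sin(2 * np.pi * time.dayofyear / 365.25)
    + np.random.default_rng(0).normal(0, 3, time.size),
    coords={"time": time},
    dims="time",
    name="tasmax",
)

# ETCCDI "SU" (summer days): annual count of days with tasmax > 25 degC.
threshold = 273.15 + 25.0
summer_days = (tasmax > threshold).resample(time="YS").sum()
summer_days.name = "SU"
print(summer_days.values)  # one count per year
```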
https://doi.org/10.5281/zenodo.5569393
oai:zenodo.org:5569393
Zenodo
https://zenodo.org/communities/is-enes3
https://zenodo.org/communities/gateways-2021
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.5569392
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
climate analysis
climate indices
climate indicators
jupyter notebook
scientific data
interactive platform
data analysis
python software
An Interactive Platform for Climate Analysis using a Climate Indices Tool
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5570549
2021-10-15T01:48:30Z
user-gateways-2021
Campbell, Rob
Kalyanam, Rajesh
Song, Carol
Zhao, Lan
2021-10-14
<p>A minimal set of reusable software packages and methodologies is described that has enabled the efficient creation of multiple, self-contained, data-centric web applications ("tools"). The tools themselves are dedicated to the analysis, sharing, and visualization of data produced by researchers from diverse domains. The development template described is intended to support a range of development contexts, developer proficiencies, and hosting environments. Specifically, we organized the tools' codebases within a simplified Model–View–Controller pattern and used Anaconda, Python, Jupyter, Docker, and Voilà [1] to develop and deploy the tools. The basis of the tool code is Python running as a Jupyter notebook. However, we forgo traditional, cell-based notebook organization and use Jupyter only to enable the use of ipywidgets for user interface controls. Further, this development context allowed us to leverage libraries such as Pandas for data access, Ipyleaflet for geospatial visualization, and Matplotlib and Plotly for plots, charts, and graphs.<br>
A portion of the tools produced were published as stand-alone, containerized, independently hosted [2] web applications. We relied on Docker and Voilà to create containerized applications which include their own web servers. Others were published as tools embedded within a HUBzero [3] based science gateway. In this context, we relied on the gateway's Jupyter server and its ability to render the tool using Jupyter's "Appmode" extension. Thus far, six tools have been developed using this template, with subsequent tools taking only approximately one third as long as the initial effort of developing the template.<br>
The development template emerged from the need to enhance code created by researchers and supplement it with a user interface (UI), exposing a tool that allows others to explore the researchers' data through a variety of rich UI elements that provide data-specific filtering, query, and visualization capabilities. One requirement was that the researchers themselves, not dedicated web developers, should be able to enhance and maintain the tool code. The template then evolved as we used it as the basis for the development of other tools and then as the basis for student interns to quickly build data-centric applications.<br>
We describe the steps we use to create these types of tools as well as the pitfalls we encountered as we assembled the template over time.</p>
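A minimal sketch of the Model–View–Controller split the template uses, assuming only ipywidgets; in a real tool the model would wrap the researchers' Pandas/Plotly code, and the notebook would be served with Voilà or Appmode so that only the widgets are visible.

```python
import ipywidgets as widgets

# --- Model: holds the data/computation, no UI code ----------------------
class ThresholdModel:
    def __init__(self, values):
        self.values = values

    def count_above(self, threshold):
        return sum(v > threshold for v in self.values)

# --- View: ipywidgets controls only --------------------------------------
slider = widgets.IntSlider(value=5, min=0, max=10, description="Threshold")
output = widgets.Label()

# --- Controller: wires view events to model calls -------------------------
model = ThresholdModel(values=[1, 3, 4, 6, 8, 9])

def on_change(change):
    output.value = f"{model.count_above(change['new'])} values above threshold"

slider.observe(on_change, names="value")
on_change({"new": slider.value})  # initialize the label

# In a notebook cell (rendered by Voila/Appmode) one would display:
widgets.VBox([slider, output])
```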
https://doi.org/10.5281/zenodo.5570549
oai:zenodo.org:5570549
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5570548
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
gateways
tools
development
Jupyter
Voilà
A Template for Rapid Development of Interactive Computing Tools
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5569422
2021-10-14T19:09:59Z
user-gateways-2021
Bockholt, H. Jeremy
Verner, Eric
Salman, Mustafa S.
Baker, Bradley T.
Misiura, Maria B.
Calhoun, Vince D.
2021-10-14
<p>We present BrainViz, a novel scientific gateway (SG) designed for psychiatry researchers to access artificial intelligence (AI) methods and explore options for treatment and characterization of various phases of disease. Our initial implementation of BrainViz provides access to (1) analysis tools, (2) visualization tools, (3) collaboration capabilities, (4) cluster computing, and (5) a data repository. We have implemented BrainViz as a scalable web application built using the Python Django [1] framework within Amazon Web Services (AWS) with a HIPAA-compliant architecture [2]. Amazon Relational Database Service [3] (RDS), a scalable, serverless database, is used for the back end. The data repository storage is configured using S3, with backups of the repository synchronized to a separate Glacier storage tier within AWS. The front-end website is hosted at brainviz.net, introducing this SG to the community. A user may (1) create an account; (2) manage contact information, affiliation, and password; (3) learn about and join a collaborative community; (4) upload community datasets; (5) test a compliant dataset using an existing AI model; (6) visualize the results of the application of the model; and (7) share data derivatives with the community.</p>
https://doi.org/10.5281/zenodo.5569422
oai:zenodo.org:5569422
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5569421
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Precision Medicine
MRI
Mental Health
Brain
BrainViz: A Scientific Gateway for Deploying AI Methods to be used as a Clinical Decision Support System for Treatment in Brain Disorders
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5569388
2021-10-14T19:06:14Z
user-gateways-2021
Meiring, Joseph
Franklin, Nathan
Park, Ian
2021-10-14
<p>The DesignSafe [1] portal is a web-based application created to facilitate researchers in the natural hazards community, providing data storage, computational resources, and publication pipelines to its users. With partners such as the RAPID response field reconnaissance teams who collect data immediately after a disaster event, the geospatial component of the data is essential information needed in subsequent analysis. Storing, searching, and visualizing this geospatial information has subsequently become a need for the DesignSafe and natural hazards community. A typical field reconnaissance mission can generate many gigabytes of geotagged images, videos and point cloud datasets. To integrate these data with the DesignSafe platform, we have developed a new set of RESTful APIs and a single page application to store, query and visualize these datasets.<br>
<br>
As one of the requirements for these tools is tight integration with the existing DesignSafe platform, which is built using Tapis [2] APIs, we have implemented the GeoAPI as a separate microservice in the Tapis v2 platform. This design allows us to easily utilize the existing data storage and user management tools built into DesignSafe. Users can access their existing data that is shared with their colleagues and already stored on the DesignSafe platform. Similarly, by utilizing the authentication services in Tapis, users can access these services with a single account.<br>
<br>
The backend has been developed using Python and the Flask framework. Application data is stored in a Postgres database with the PostGIS extension providing geospatial indexing. We have leveraged open-source libraries in the application such as GDAL, PDAL, Laspy, GeoPandas, and Shapely. Reconnaissance teams often use Lidar to collect point cloud data of structures damaged during a natural hazard event. These datasets alone can grow to many hundreds of GB and require additional processing. We use Celery to asynchronously process these data with Potree [3] so that they can be visualized in the browser. Static assets such as images and videos are saved into a Ceph filesystem and are served directly by an Nginx proxy for performance. Maps can be automatically generated from data already in DesignSafe and asynchronously updated when new data is added to the storage system. The application is deployed into a Kubernetes cluster at TACC. A general architecture diagram is shown in Figure 1.<br>
<br>
To visualize these datasets, we have created Hazmapper 2.0, a single-page web application written in TypeScript using the Angular framework and Leaflet libraries. Additionally, we use an auto-generated client library, Ng-Tapis [4], to interact with the Tapis APIs. By doing so, we can utilize the API directly from the browser without the need for a separate backend application. External map tiles can be imported through a searchable interface using the QuickMapServices (QMS) APIs [5].<br>
<br>
The decoupled backend architecture allows us to have multiple user interfaces that are specialized for different end-user needs. One such example is the Taggit [6] application from DesignSafe researchers. Separate from Hazmapper, this application allows tagging and classification of georeferenced images.</p>
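The following is a schematic sketch, not the GeoAPI code itself, of how a Flask endpoint might hand a large point-cloud file off to a Celery worker for asynchronous processing; the endpoint name, broker URL, and processing step are placeholders.

```python
from celery import Celery
from flask import Flask, jsonify, request

app = Flask(__name__)
# Placeholder broker URL; a real deployment would point at RabbitMQ/Redis.
celery = Celery(__name__, broker="amqp://guest@localhost//")

@celery.task
def process_point_cloud(path: str) -> str:
    # In a real pipeline this would run the point-cloud conversion on `path`
    # and write browser-ready tiles; here it is just a stand-in.
    return f"processed {path}"

@app.route("/point-clouds", methods=["POST"])
def submit_point_cloud():
    path = request.json["path"]             # location of the uploaded LAS/LAZ file
    task = process_point_cloud.delay(path)  # enqueue and return immediately
    return jsonify({"task_id": task.id}), 202
```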
https://doi.org/10.5281/zenodo.5569388
oai:zenodo.org:5569388
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5569387
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
User interfaces
Tapis
DesignSafe
GeoSpatial
GeoApi and Hazmapper: A RESTful API and Responsive User Interface For Geospatial Data In The Natural Hazards Community
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5569610
2021-10-15T01:48:28Z
user-gateways-2021
McHenry, Kenton
Bobak, Mike
Coakley, Kevin
Fils, Doug
Gatzke, Lisa
Richard, Steve
Valentine, David
Zaslavsky, Ilya
Zhang, Bing
Kirkpatrick, Christine
2021-10-14
<p>The NSF EarthCube program has worked to bring together data repositories within the geosciences in the adoption of schema.org to annotate stored datasets and make them crawlable. Through this, data stored within the union of these repositories can be indexed centrally by resources such as Google or by other services perhaps more tuned to geoscience or a particular area within it. We will present an effort by the past two EarthCube offices, GeoCODES [1], to prototype such a service. In addition to crawling, indexing, and presenting datasets within a unified search interface, the effort aims to provide additional capabilities, in the spirit of a gateway, that allow users of the system to make better use of the data. One such aspect is bringing together the tools catalogued within the EarthCube Resource Registry so that they are discoverable along with the datasets that they are capable of operating on. This is being extended further to allow notebooks collected from the community, through the annual call for notebooks [2] or elsewhere, to also be associated with datasets and even executed from the results page interface through MyBinder or Google Colab, with the retrieved data already pre-loaded. Further, given the growing realization of the importance of how users interact with and effectively make use of software tools, we will present our interactions with a professional user interface/user experience (UI/UX) team and the resulting aims for layout improvements. Overall, we will describe and demonstrate GeoCODES [3] in the context of a possible scientific gateway providing capabilities on top of the data as described, as well as other capabilities the geoscience community has identified as important, such as identifying related datasets and temporal and geospatial search.</p>
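To illustrate the schema.org annotation that GeoCODES harvests, here is a hedged example of the JSON-LD a repository landing page might embed for a dataset, built as a Python dictionary; the dataset itself is fictitious and the field choices are only a common minimal subset of the vocabulary.

```python
import json

# A minimal, fictitious schema.org Dataset annotation of the kind a
# repository would embed in a <script type="application/ld+json"> tag
# so that crawlers such as GeoCODES can index it.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example seafloor bathymetry grid",
    "description": "Gridded bathymetry collected on a hypothetical cruise.",
    "url": "https://repository.example.org/dataset/1234",
    "keywords": ["bathymetry", "geoscience", "example"],
    "temporalCoverage": "2020-06-01/2020-06-30",
    "spatialCoverage": {
        "@type": "Place",
        "geo": {"@type": "GeoShape", "box": "30 -120 35 -115"},
    },
    "distribution": {
        "@type": "DataDownload",
        "contentUrl": "https://repository.example.org/files/1234.nc",
        "encodingFormat": "application/x-netcdf",
    },
}

print(json.dumps(dataset, indent=2))
```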
https://doi.org/10.5281/zenodo.5569610
oai:zenodo.org:5569610
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5569609
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
geoscience
data repositories
discoverability
data analysis
EarthCube GeoCODES
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5569811
2021-10-15T01:48:30Z
user-gateways-2021
Nicholson, Todd
Marini, Luigi
McHenry, Kenton
Witharana, Chandi
Udawalpola, Rajitha
Walker, Lauren
Jones, Matt
Thiessen-Bock, Robyn
Nitze, Ingmar
Wind, Gala
Jones, Chris
Liljedahl, Anna
2021-10-14
<p>A warming climate is causing rapid and significant change to permafrost in the Arctic region. Fortunately, a large quantity of satellite data is available for analysis. The goal of the Permafrost Discovery Gateway is to a) enable the creation of pan-Arctic geospatial products and b) make them accessible to both scientists and the public through visualization and analysis tools. To support the development of these large geospatial data products, we are building a science gateway to manage hybrid machine learning pipelines using both cloud and HPC resources.<br>
<br>
Part of this pipeline takes high resolution satellite imagery and maps permafrost thaw features across the Arctic region. This novel high performance image analysis framework, Mapping application for Arctic Permafrost Land Environment (MAPLE), detects ice wedge polygons from very high resolution optical imagery data archived at the Polar Geospatial Center, in three steps. The first step is image preprocessing, the second is DLCNN (Deep Learning Convolutional Neural Network) prediction, followed by a third post-processing step. The first and third steps have CPU implementations, but the DLCNN requires GPU resources.<br>
<br>
Furthermore, we create geospatial datasets of lake area change, fire scars, and retrogressive thaw slumps that occurred over the past 20 years across the Arctic permafrost region. These datasets are based on Landsat imagery, which is pre-processed through Google Earth Engine and further analyzed using machine learning and geospatial data analysis in an automated processing pipeline.<br>
<br>
For visualization, we incorporate Cesium as a 3D tile-based imagery viewer that allows exploration of pan-Arctic, sub-meter map products over time and export of publication-quality map images. We also incorporate the Fluid Earth Viewer to enable global and regional visualization of Arctic data products over time. A third visualization tool will provide 2D-4D graph plotting of the big geospatial data.<br>
<br>
We host data on an instance of Clowder using a Kubernetes cluster hosted on NCSA Radiant OpenStack. We have adapted existing workflows (MAPLE and the analysis of data from Google Earth Engine) as Clowder information extractors. Jobs that do not require GPU resources are executed in the local Clowder cluster, while those that require GPU resources are submitted to external clusters, such as XSEDE Bridges-2, and the results are uploaded back to the Clowder instance. We are in the process of automating the data ingestion and processing steps.<br>
<br>
Together, these components provide a starting environment to support permafrost science. Our long term goal is to apply lessons learned from implementing these solutions for specific use cases to other research questions around the study of the Arctic region.</p>
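A schematic sketch of the three-step MAPLE structure described above (not the actual MAPLE code): CPU-bound pre- and post-processing wrapped around a GPU-bound DLCNN prediction step, so that only the middle step needs to be routed to a GPU cluster. All names and function bodies are placeholders.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Scene:
    """One high-resolution satellite image tile (placeholder)."""
    path: str

def preprocess(scene: Scene) -> List[str]:
    # Step 1 (CPU): split the scene into model-sized tiles.
    return [f"{scene.path}#tile{i}" for i in range(4)]

def predict_ice_wedge_polygons(tiles: List[str]) -> List[dict]:
    # Step 2 (GPU): run the deep learning convolutional neural network.
    # In the real pipeline this step is submitted to an external GPU cluster.
    return [{"tile": t, "polygons": []} for t in tiles]

def postprocess(predictions: List[dict]) -> dict:
    # Step 3 (CPU): merge per-tile detections into one geospatial product.
    return {"n_tiles": len(predictions), "polygons": []}

def run_maple_like_pipeline(scene: Scene) -> dict:
    tiles = preprocess(scene)
    predictions = predict_ice_wedge_polygons(tiles)
    return postprocess(predictions)

print(run_maple_like_pipeline(Scene(path="example_scene.tif")))
```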
https://doi.org/10.5281/zenodo.5569811
oai:zenodo.org:5569811
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5569810
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
permafrost
arctic
global warming
kubernetes
geospatial
satellite
machine learning
visualization
climate
Creating a Permafrost Discovery Gateway - Providing Researchers and the Public with access to arctic data
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5569673
2021-10-15T01:48:30Z
user-gateways-2021
Burnette, Maxwell
Kooper, Rob
Lambert, Michael
2021-10-14
<p>NCSA has two open-source applications focused on research data management and accessibility: Clowder is a scalable data repository with extensive metadata search capabilities and support for automated extraction of metadata from uploaded files, and Labs Workbench is an application catalog capable of registering and running instances of containerized research environments in the cloud. Both of these applications are designed around making research data available and interactable for a broad set of communities with minimal effort from the user. In the summer and fall of 2021, we are integrating these two applications to enable users to seamlessly move data between Clowder instances, where files are organized, and Workbench applications, where files can be examined and processed and outputs can be shared back into the Clowder environment.<br>
As scientific research has increasingly moved towards cloud computing, big data and dense software dependency trees, it has also become increasingly difficult for researchers to perfectly replicate the environments necessary to house and analyze these datasets. Big files are slow to move around, individual laptops may not have the necessary storage or processing power, and differences between operating systems cause inconsistencies. Clowder has a user-friendly GUI for uploading files and datasets, tagging and searching for them, and submitting them to extractors for processing. Labs Workbench provides a cloud management environment for containers running applications in browsers, such as Jupyter notebooks or GIS interfaces. We recognized an opportunity to move these complementary feature sets together to build a complete platform of data storage, sharing, analysis and discovery.<br>
The goal of our integration is to provide a direct path for datasets to move between Clowder instances and Workbench applications easily and seamlessly, particularly in environments where data storage can be mounted on co-located Clowder and Workbench virtual machines. Users will see options in the Clowder interface to send their data to their chosen Workbench instance, while in Workbench the files will land in a shared home directory accessible between all of the user’s applications. Behind the scenes, we have leveraged Clowder’s existing interface for submitting datasets to extractors, which are containers running prepared scripts that do a single task like running text-to-speech on an audio file or creating a face recognition mask on a photograph. This extractor framework is widely used, and we wanted to provide simple ways for researchers to develop new extractors and share them back to the community. Workbench was a great opportunity to provide a simple path: researchers can move a sample subset of data from Clowder to Workbench, develop an algorithm to process meaningful metadata from it, and deploy that algorithm as a Clowder extractor in an identical container environment to process the full dataset. In cases where data are co-located on a shared mount, large datasets can be moved and processed instantly.</p>
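To make the extractor concept concrete, here is a small hedged sketch of the shape such a component takes: a function that receives one file, computes some metadata, and returns it for attachment back to the dataset. It deliberately avoids the real Clowder extractor API and uses only the standard library.

```python
import hashlib
import json
from pathlib import Path

def example_extractor(file_path: str) -> dict:
    """A stand-in for a Clowder-style extractor: one file in, metadata out.

    A real extractor would be packaged in a container, subscribe to upload
    events, and post its result back to the Clowder API.
    """
    data = Path(file_path).read_bytes()
    return {
        "filename": Path(file_path).name,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

if __name__ == "__main__":
    sample = Path("sample.txt")
    sample.write_text("hello, Clowder")
    print(json.dumps(example_extractor(str(sample)), indent=2))
```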
https://doi.org/10.5281/zenodo.5569673
oai:zenodo.org:5569673
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5569672
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
cloud computing
research platforms
containerization
data repositories
Integration of Clowder Research Data Framework with NCSA Labs Workbench
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5570263
2021-10-15T01:48:28Z
user-gateways-2021
Cho, In Ho
Kim, Jae-Kwang
Yang, Yicheng
2021-10-14
<p>There is a strong need for a large/big data-oriented imputation method to accelerate data-driven scientific discovery in the new era of big data and powerful computing. Imputation is a statistics-based procedure for filling in missing data, and a wide spectrum of methods exists. Still, they are often not applicable to large/big incomplete data and require difficult statistical assumptions. With support from NSF (OAC-1931380), we developed the ultra data-oriented parallel fractional hot-deck imputation (UP-FHDI [1,2]), a general-purpose, assumption-free software package for handling item nonresponse in big incomplete data by leveraging the theory of FHDI and parallel computing. Here, “ultra” data means a data set with high dimensions and many instances (i.e., concurrently big-p and big-n; see Figure). UP-FHDI inherits the strength of FHDI [3], which can cure multivariate missing data by filling each missing unit with multiple observed values without requiring any prior distributional assumptions.<br>
UP-FHDI adopts a parallel file system that supports inter-processor communication and allows simultaneous access from multiple compute servers to the hard drive, optimizing memory usage by keeping essential data in memory and other data on the hard drive. Meanwhile, we use the Optimal Overload IO Protection System with UP-FHDI to dynamically adjust the intensive and simultaneous IO workload during a job to avoid global file system performance degradation. Exploiting the strengths of this parallel file system, we provide full details of the ultra data-oriented parallelism of the major steps of UP-FHDI: cell construction, estimation of cell probability using expectation maximization, parallel imputation, and parallel variance estimation. The cell construction step adopts a parallel k-nearest neighbors method for deficient donor selection to break the computational bottleneck of the cell-merging scheme of serial FHDI. Sure independence screening is embedded in UP-FHDI for ultrahigh-dimensional variable reduction and thus overcomes the curse of dimensionality. Besides the parallel jackknife method, UP-FHDI implements another computationally efficient variance estimation using parallel linearization techniques.<br>
We validate UP-FHDI’s accuracy by conducting Monte Carlo simulations. Results confirm that UP-FHDI can handle an ultra dataset with one million instances and 10,000 variables concurrently and has no specific limitation on data volume. UP-FHDI exhibits promising scalability with different ultra datasets, and its practical computational performance agrees well with the cost models of speedup and execution time. Furthermore, we confirm UP-FHDI’s positive impact on subsequent deep learning performance. Since machine learning models may over-fit with ultrahigh-dimensional data, we adopted a two-stage feature selection method that leverages mutual information and the graphical LASSO to reduce ultrahigh dimensionality to a small subset as a pre-processing remedy.<br>
We provide full documentation to illustrate how to deploy, compile, and run UP-FHDI for curing ultra incomplete data. To maximize the benefit to the broad research community, many synthetic and practical example data sets for UP-FHDI are made publicly available in IEEE DataPort.</p>
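As a rough, deliberately simplified illustration of the hot-deck idea behind FHDI (not the UP-FHDI algorithm itself), the sketch below fills each missing value with a value observed in one of the record's k nearest fully observed donors; the distance measure and donor selection here are naive single-donor stand-ins for the fractional, multi-donor scheme the abstract describes.

```python
import numpy as np

def naive_hot_deck_impute(data: np.ndarray, k: int = 3, seed: int = 0) -> np.ndarray:
    """Fill NaNs in each row using values from k nearest fully observed rows."""
    rng = np.random.default_rng(seed)
    filled = data.copy()
    donors = data[~np.isnan(data).any(axis=1)]          # fully observed rows
    for i, row in enumerate(data):
        missing = np.isnan(row)
        if not missing.any():
            continue
        observed = ~missing
        # Distance to each donor, using only this row's observed variables.
        dists = np.linalg.norm(donors[:, observed] - row[observed], axis=1)
        nearest = donors[np.argsort(dists)[:k]]
        donor = nearest[rng.integers(len(nearest))]      # pick one donor at random
        filled[i, missing] = donor[missing]
    return filled

rng = np.random.default_rng(1)
x = rng.normal(size=(10, 4))
x[2, 1] = np.nan
x[7, 3] = np.nan
print(naive_hot_deck_impute(x))
```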
https://doi.org/10.5281/zenodo.5570263
oai:zenodo.org:5570263
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5570262
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Parallel Imputation
Incomplete Data Curing
Fractional Hot Deck Imputation
General-Purpose Open-Source Program for Ultra Incomplete Data-Oriented Parallel Fractional Hot Deck Imputation
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5569653
2021-10-15T01:48:31Z
user-gateways-2021
Clark, Steven
Padilla, Daniel Mejia
Strachan, Alejandro
2021-10-14
<p>HUBzero has different subsystems for managing and launching the execution of software tools. These subsystems make it easy to set up and install software tools for use by the end-user community via the web browser. Earlier subsystems provided the capability to interact with graphical tools via VNC or to submit compute jobs to HPC clusters. This tutorial will introduce and discuss the new SimTools, which provide a pluggable framework to package and launch rich interactive web applications by leveraging Jupyter notebooks. In addition, SimTools declare their services and requirements, and all the data they generate are accessible and queryable.<br>
<br>
In this tutorial, we will walk attendees through how SimTools operate within the nanoHUB.org gateway. In a 90-minute session designed for beginners, attendees will create an account on nanoHUB to complete hands-on activities during the tutorial. Some experience with Python is preferred.</p>
https://doi.org/10.5281/zenodo.5569653
oai:zenodo.org:5569653
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5569652
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
SimTools
nanoHUB
HUBzero
FAIR
scientific software
SimTools: Software tools that are FAIR
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5592123
2021-11-12T10:22:15Z
openaire
user-re3data
user-gateways-2021
Schabinger, Rouven
Strecker, Dorothea
Wang, Yi
Weisweiler, Nina Leonie
2021-10-21
<p>re3data is the global registry for research data repositories. As of October 2021, the service lists over 2700 digital repositories across all scientific disciplines and provides an extensive description of repositories based on a detailed metadata schema, the updated version of which was recently published. The service promotes open science practices and the visibility of science-driven open infrastructures. A variety of funders, publishers, and scientific organizations around the world refer to re3data within their guidelines and policies, recommending the service to researchers looking for appropriate repositories for storage and discovery of research data.</p>
<p>The re3data web interface and API function as an important Science Gateway, allowing its end users and other services to access the largest index of research data repositories in the world. Since January 2020, the re3data COREF project has received funding from the German Research Foundation (DFG) to further develop and enhance the registry service.</p>
<p>This presentation from the Gateways 2021 conference introduces the re3data service and gives an overview of the ongoing project work, including a short introduction to the recently published “Conceptual Model for User Stories” for re3data. The editorial process for the inclusion of repositories in re3data is explained and readers learn how to submit new entries and suggest changes. Readers also learn how to retrieve information and metadata from re3data.</p>
<p>A recording of the session will be uploaded on <a href="https://www.youtube.com/playlist?list=PLTkmCX5R7siNrdUtgrn4ncEpmW9ZyJJff">Youtube</a>.</p>
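As a small illustration of the programmatic access mentioned above, the snippet below queries the public re3data REST API for the repository listing; the endpoint path and XML element names are assumptions based on the documented v1 API and may change.

```python
import xml.etree.ElementTree as ET
import requests

# Assumed public endpoint of the re3data API (v1); it returns XML.
URL = "https://www.re3data.org/api/v1/repositories"

response = requests.get(URL, timeout=30)
response.raise_for_status()

root = ET.fromstring(response.content)
# Print the first few repository identifiers and names from the listing.
for repo in root.findall("repository")[:5]:
    print(repo.findtext("id"), repo.findtext("name"))
```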
https://doi.org/10.5281/zenodo.5592123
oai:zenodo.org:5592123
eng
Zenodo
https://doi.org/10.5281/zenodo.5570662
https://zenodo.org/communities/re3data
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5592122
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Gateways 2021, 19-21 October 2021
Research Data Repositories
Repository Registry
Index
re3data COREF Project
Introducing re3data – the Registry of Research Data Repositories
info:eu-repo/semantics/lecture
oai:zenodo.org:5570247
2021-10-15T01:48:30Z
user-gateways-2021
Kalyanam, Rajesh
Campbell, Rob
Woo, Jungha
Brewer, Nicole
Zhao, Lan
Song, X. Carol
2021-10-14
<p>We describe our experience in designing and deploying a scalable and extensible approach to processing files in response to create, update, rename, and delete operations on a science gateway. The MyGeoHub gateway is based on the HUBzero framework and extends it with novel capabilities that enable researchers to self-manage, visualize, search, and discover geospatial data from various domains. Specifically, structured geospatial metadata is extracted automatically and ingested into HUBzero’s search infrastructure, and embedded web preview of geospatial files is enabled by processing and registering geospatial files to a map server. Since HUBzero (and MyGeoHub) enables both the web content management system (CMS) and containerized tools (running on a separate server) to expose a shared file space, any such file processing needs to be agnostic to the source of the file events.<br>
<br>
We utilize a combination of kernel-level event monitoring (via auditd), event message filtering via Logstash, and a Kubernetes/Helm-based cloud deployment of a high-availability message broker (RabbitMQ), scalable, extensible message processing worker containers, and a QGIS map server to implement our file processing pipeline. The various features of our technology choices that were leveraged and the lessons learned during this deployment are as follows: (1) the ability to filter, simplify, and/or augment the kernel messages via Logstash enables us to reduce the load on the message broker; (2) the ability to expose the message broker and map server via static fully qualified domain names (FQDN) through the use of the Kubernetes Ingress Load Balancer (ILB) reduces the need for configuration updates on the gateway; (3) the ability to deploy rolling updates to the message processing workers via Helm upgrades enables a phased deployment and evaluation of feature extensions; and (4) the use of QGIS projects to manage map layers for files in a particular folder speeds up preview processing and simplifies the web preview interface implementation.<br>
Our file processing pipeline is designed to be independent of the MyGeoHub gateway and can be easily integrated with any other gateway platform as long as file event messages in appropriate formats are generated and streamed to the message broker. The file processing pipeline is currently deployed to Jetstream cloud and the corresponding Helm charts can be found in our GitHub repository.</p>
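A hedged sketch of what one of the message-processing workers might look like: a RabbitMQ consumer (using the pika client) that receives filtered file-event messages and dispatches them to a handler. The queue name, message schema, and handler body are placeholders, not the MyGeoHub implementation.

```python
import json
import pika

QUEUE = "file-events"  # placeholder queue name

def handle_event(event: dict) -> None:
    # Placeholder for metadata extraction / map-server registration.
    print(f"{event.get('operation')} on {event.get('path')}")

def on_message(channel, method, properties, body):
    handle_event(json.loads(body))
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue=QUEUE, durable=True)
channel.basic_qos(prefetch_count=1)          # one message at a time per worker
channel.basic_consume(queue=QUEUE, on_message_callback=on_message)
print("Waiting for file events...")
channel.start_consuming()
```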
https://doi.org/10.5281/zenodo.5570247
oai:zenodo.org:5570247
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5570246
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
gateways
pipeline
file processing
Kubernetes
message queue
Design and Deployment of Event-based File Processing in a Gateway
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5570009
2021-10-15T01:48:31Z
user-gateways-2021
Rocha, Alex
Franklin, Nathan
Tijerina, Sal
2021-10-14
<p>The challenges in building science gateways are as varied as the problems they seek to solve, and there is no guarantee that the obstacles of tomorrow will look anything like the challenges of today. To meet the ever-changing needs of researchers, gateways must be rapidly deployable, easy to customize, and yet consistent enough that researchers can intuitively understand and use them.<br>
<br>
At the Texas Advanced Computing Center (TACC), a variety of science gateways are used by researchers to access TACC’s computing resources, manage data, and collaborate with other researchers. First-generation portals were deployed directly to a virtual machine and suffered from a slew of issues over time, such as package conflicts, difficulties in allowing for customization, and brittle deployments caused by VM upgrades (e.g., security updates). To address these challenges, the services of science gateways - such as a user guide, content management system (CMS), dashboard and other applications - were separated into Docker containers hosting distinct services. We then developed a Docker container-based deployment framework called Camino to facilitate the deployment [1]. The Camino framework employs configuration settings stored in a repository to facilitate easy, transparent and reproducible deployments. With this approach, developers can easily customize and deploy upgrades at will to individual containers, meeting the rapidly evolving needs of PIs and stakeholders. Camino is currently used to deploy and manage a multitude of scientific gateways at TACC. As an example, these gateways are typically centered around our core science gateway, a Django/React application called Core-Portal [2], which provides a dashboard with access to HPC jobs, data, and collaboration tools, and runs in its own isolated container.<br>
<br>
Join us for this lightning talk where we will share the tools, strategies, and real-world solutions we are building now for the challenges of tomorrow. Whether the needs require a science gateway for documentation, or one that can interface with storage and HPC resources, come learn about the tooling and strategies we use to deliver customizable on-demand gateways. We will present the tech we use and describe the strategies and software design philosophies we are implementing to generate composable, extendable gateways that can deliver everything from full-fledged multi-service solutions to a-la-carte style service selection.</p>
https://doi.org/10.5281/zenodo.5570009
oai:zenodo.org:5570009
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5570008
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
containers
docker
gateway
services
frontend
backend
deployment
continuous
infrastructure
Deploying Configurable and Containerized Science Gateways
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5569640
2021-10-15T01:48:29Z
user-gateways-2021
Quick, Rob
2021-10-14
<p>The Target Enablement to Accelerate Therapy Development in Alzheimer’s Disease (TREAT-AD) Center at the Indiana University School of Medicine and Purdue provides cutting-edge research aimed at drug discovery for the crippling disease. It consists of five core activities: medicinal chemistry, structural biology, high throughput screening, bioinformatics, and administration and data management. To enable these essential activities, the Cyberinfrastructure Integration Research Center (CIRC) at Indiana University has created the Bioinformatics and Computational Biology Portal (BCB Portal). This science gateway provides a repository for tools and data used by researchers at TREAT-AD. The BCB Portal integrates data management, diverse computational resources, visualization tools, publicly accessible user interfaces, and secure researcher interfaces. Currently, the BCB Portal is active. Several bioinformatics applications have been integrated into the Apache Airavata framework to leverage its robust features, including data management, resource provisioning, application porting, and user-centric design. We will also provide a look at the drug discovery process: identifying target drugs, computational analysis of those targets, and, finally, the final output of the TREAT-AD Center, target enablement packages (TEPs). These TEPs allow promising compounds to move to laboratory and clinical testing. The initial conceptualization of the BCB Portal was presented at Gateways 2020, and this lightning talk will cover the implementation of those concepts over the past year.</p>
https://doi.org/10.5281/zenodo.5569640
oai:zenodo.org:5569640
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5569639
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Alzheimer's Disease
Science Gateway
Bioinformatics
Apache Airavata
The Alzheimer's Disease Bioinformatics and Computational Biology Portal
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5569430
2021-10-14T19:11:20Z
user-gateways-2021
Thompson, Christopher
Ukkusuri, Satish
Gehlot, Hemant
Lei, Zengxiang
Verma, Rajat
Xue, Jiawei
Qian, Xinwu
Xianyuan, Zhan
Rice, Shawn
Song, Carol
2021-10-14
<p>While there is no shortage of research teams developing complex modeling code across the spectrum of scientific domains, most are not prepared to transition their simulations from private labs into publicly accessible gateway platforms. Research software engineers (RSEs) can bridge this vital knowledge gap, and in doing so help researchers both reach a greater audience for their work and realize a greater return on their research funding. This presentation details the case of Purdue University RSEs helping build a gateway to the A-RESCUE simulation [1], a realistic vehicle traffic model for studying how efficiently an entire population will evacuate through a city-scale road network.<br>
<br>
Each project has unique challenges that will place constraints on what solutions are viable. RSEs for this project balanced what is technically the best direction for the architecture against the future roles of the researchers. They will be maintaining the gateway independently after the collaboration period, operating within the computing environment they have already created, and staying within the bounds of their grant deliverables.<br>
<br>
This project began by evaluating the existing simulation interface. The research team had developed a robust model which operated solely through a GUI provided by the framework vendor. This ran on a standard workstation but required each end user to set up a development environment and run the model through a specialized tool. For potential research collaborators this may be viable, but it would set too high a bar for any potential lay user, such as emergency personnel wanting to plan city evacuations.<br>
<br>
RSEs were able to raise these concerns and propose the creation of a new server interface built around an alternate batch-mode API found within the framework’s documentation. This allowed the simulation to be made available to new classes of remote users through a simple network socket with almost no modification of the underlying simulation code and no local software installation necessary.<br>
<br>
A-RESCUE is written in the Java language using the Repast Simphony agent-based modeling framework [2]. This factored into the solution platform choice, as the researchers would need to maintain the gateway themselves after development concluded. NodeJS [3] was selected as the best platform for the gateway as it has excellent community support to help the team debug future issues, uses JavaScript, with which they were already proficient, and meshed well with their Java model code.<br>
<br>
With the server solution determined, the RSEs aided the team with refining a prototype web client and composing all the systems together within servers they had already purchased for the grant. Working with campus computing staff, they gained access to additional resources for server management and professional UI development for their web client prototype.<br>
<br>
When research teams utilize campus RSEs to build scientific gateways, it adds value for all involved. Rather than merely optimizing model code, RSEs can create interfaces between applications, introduce new technologies, and streamline workflows for easier maintainability. These collaborations generate new audiences for both the research team’s work and the institution’s computing center. Promoting these collaborations should be a priority for all organizations.</p>
https://doi.org/10.5281/zenodo.5569430
oai:zenodo.org:5569430
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5569429
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
scientific gateways
NodeJS
user experience
ease-of-use
collaboration
research software engineer
rse
The Development of A-RESCUE Online Task Management
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5570605
2021-10-15T01:48:29Z
user-gateways-2021
Brewer, Nicole
2021-10-14
<p>We describe our experience designing and implementing a highly interactive, online computational tool in Jupyter Notebooks. The lessons learned and subsequent design choices can be applied to similar tools in various domains. This tool, Superpower [1], is a graphical interface for a set of functions designed to help users to perform power analysis on their study design in psychology. The supported statistical functions are computationally non-trivial in that each power analysis function requires many multidimensional parameters that unavoidably make use of lists or numpy arrays. It is also highly interactive such that user manipulation of any single value or element may require a cascade of updates to others.<br>
We chose an MVC design pattern to separate computational logic from the view. We used ipywidgets for view components such as selection boxes and numeric parameter settings, as it is the de facto standard for interactive components in Jupyter. Additionally, we used Interactive Matplotlib (ipympl) to create features such as a click and drag bar chart. ipywidgets is built on a traits library called traitlets [2]. Traits are object attributes that are designed to create and send a notification upon an event or a change in value. Correspondingly, event listeners are functions or methods designed to respond to these notifications. For example, the IntText widget is frequently used to manipulate the number of subjects in the user interface (UI). An observe function can be used to update the corresponding value in the model upon change of this number in the view (UI).<br>
However, traitlets proved insufficient for our use case due to the lack of support for “container traits”. Container traits are important in our models because of the need to monitor changes in multi-dimensional lists and arrays. For example, a grid of IntText and FloatText widgets were used to represent a set of related multidimensional parameters represented in the model as numpy arrays. Changing any value in this grid would require updates to others. Therefore, it was imperative to use lists observable at the item level. Consequently, we used Enthought’s traits library [3] to implement our models; traitlets is, in fact, a lightweight implementation of traits that is missing support for “container traits (list, dict, tuple) that can trigger notification if their contents change” [2].<br>
traits is syntactically similar to traitlets, with a few additional complexities that may make it more difficult to learn for someone not already familiar with a traits library; for example, traits does not raise errors that occur in observe functions, making debugging more difficult. Furthermore, unlike traitlets, traits does not support “links” that simplify the setup of bi-directional mapping between the view and the model. Traditional ipywidgets is likely to be sufficient for tools that do not have multilevel data structures or heavily interactive components. For tools with these complexities, we recommend the use of traits to monitor changes in a model class. Our design choices can serve as a guide for developers looking to address complex interactivity in a Jupyter-based gateway tool.</p>
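A minimal sketch of the view-to-model wiring described above, using the ipywidgets IntText widget and an observe handler; the model class is a toy stand-in rather than Superpower's actual model, and container-trait notification is deliberately left out.

```python
import ipywidgets as widgets

class PowerModel:
    """Toy stand-in for the tool's model layer."""
    def __init__(self, n_subjects: int = 20):
        self.n_subjects = n_subjects

model = PowerModel()

# View: an IntText widget for the number of subjects.
n_subjects_view = widgets.IntText(value=model.n_subjects, description="Subjects")

# Event listener: push changes from the view into the model.
def on_n_subjects_change(change):
    model.n_subjects = change["new"]
    print(f"model.n_subjects is now {model.n_subjects}")

n_subjects_view.observe(on_n_subjects_change, names="value")
n_subjects_view  # displayed in a notebook cell
```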
https://doi.org/10.5281/zenodo.5570605
oai:zenodo.org:5570605
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5570604
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
gateways
traits
Jupyter Notebook
interactive tools
Leveraging Traits for Highly Interactive Computational Tools in Jupyter
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5570657
2021-10-15T01:48:29Z
user-gateways-2021
Liang, Sheldon
MacCarthy, Elijah
Hall, Clara
Gardner, Patricia
2021-10-14
<p>Advanced Integral Digitalization (AID) aims to liaise with universal interface, aggregate intelligent browsing around Anything-as-a-Service (XaaS), and engage with user-centric experience throughout digital archiving and transformed analytics (DATA). AID-to-DATA proposes a panel tutorial in a CIA triad framework with ease to Contextual expression, Interactive engagement and Accessible emergence from wiseCIO.<br>
<br>
AID promotes a CIA triad framework for ACTiVE applications: contextuality that supports universal (user) interface design (with little coding), interactivity that engages users with a user-centric experience, and accessibility to archival content management and delivery (ACMD) via machine learning automata (without coding). With AID, we have developed a set of hybrid online courseware used in hands-on learning throughout the tutorial sessions. Case studies through XaaS have been explored and exploited with DATA across broad fields, such as ARM (archival repository for manageable accessibility), BUS (biological understanding from STEM), DASH (deliveries assembled for fast search & hits), DIGIA (digital intelligence governing instruction and administration), HARP (historical archives & religious preachings), MATH (mathematical apps in teaching and hands-on exercise), and SHARE (studies via hands-on assignment, review/revision and evaluation).<br>
<br>
AID not only promotes interactivity among tri-parties (developer, user, and computer) but also prompts spontaneity, nurturing users’ patience and passion for their subjects (researchers and developers, instructors and students, and librarians alike). As a result, AID propels integral DATA to help student users retain the knowledge conveyed, with a known attendance rate increase from 50% to 90%. This proposed tutorial will share educational excellence emerging from wiseCIO. The AID-to-DATA combination plays a key role in hybrid online courseware that fosters learners’ patience and passion with AWE -- Appreciation for knowing what to learn, Wonderment at how to be excellent, and Excelling through their studies.<br>
<br>
Online Proposal/Paper: https://drive.google.com/file/d/1Z04TREovNtBncFZtp_aRcVp9cvQ2irt0/view?usp=sharing</p>
https://doi.org/10.5281/zenodo.5570657
oai:zenodo.org:5570657
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5570656
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
AID: Advanced Integral Digitalization
wiseCIO: web-based intelligent service engaging with Cloud Information Outlet
ACTiVE: accessible contextual traceable information for Vast Engagement
ACMD: Archival Content Management & Delivery
AID to DATA ~ Digital Archiving and Transformed Analytics -- Advanced Integral Digitalization via Machine Learning Automata
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5570569
2021-10-15T01:48:29Z
user-gateways-2021
Chuah, Joon Yee
Rosenberg, Jake
Strmiska, Keith
Stubbs, Joe
Cleveland, Sean
McLean, Jared
2021-10-14
<p>The Tapis framework provides researchers with a hosted, web-accessible application programming interface (API) for managing data and executing software on cloud, high-throughput, and high-performance cyberinfrastructure. Tapis aims to reduce the expertise and effort required to utilize advanced computing technologies by abstracting the underlying access and security protocols as well as the scheduling interfaces used by these different resources. More than 15 actively funded projects leverage Tapis to accelerate their research, and many more use Tapis without formal engagement; over 50,000 OAuth clients have been registered in the platform since 2015.<br>
<br>
Still, utilizing Tapis requires web programming knowledge that most scientists and engineers do not have. Large, well-funded efforts such as CyVerse or DesignSafe employ multiple full-time web developers over several years to build full-featured web portals on top of Tapis. These large projects must also concern themselves with the administration and maintenance of the servers where the web applications run. This presents a difficult challenge for individual labs and small research groups that simply want to provide an intuitive, graphical web interface for their research analyses.<br>
In response, the Tapis project is developing Tapis UI, a generic, customizable, open source, web-accessible graphical interface to the Tapis API. Tapis UI is intended for small teams and individual researchers who do not have dedicated web developers on staff. Tapis UI employs a novel, “serverless” architecture that can be hosted directly out of GitHub Pages, meaning that a research team can run a customized version of the application without having to manage any servers.<br>
<br>
The architecture of Tapis UI strives to meet the operational requirements and constraints of small science gateway teams. As a serverless web application, it requires no dedicated backend or hosting infrastructure. Tapis UI can be deployed as a science gateway by forking the repository, configuring it to use an associated Tapis tenant, and running a simple deployment script that publishes to GitHub Pages. The demonstration instance of Tapis UI, which accesses Texas Advanced Computing Center resources, is deployed at https://tapis-project.github.io/tapis-ui.<br>
<br>
For science gateway web developers who are interested in contributing to or customizing Tapis UI, the project has a layered approach that provides abstractions of Tapis REST endpoints. The lowest layer of Tapis UI is the @tapis/tapis-typescript NPM package. This package provides TypeScript object definitions and API bindings that are automatically generated from Tapis OpenAPI specifications. tapis-typescript can be used in any web framework, such as Angular or Vue. The Tapis UI application is built in React with a tapis-hooks layer that provides stateful data abstractions of tapis-typescript API calls. tapis-hooks objects such as file listings or application definitions are represented by tapis-ui components that provide user interface building blocks. Finally, the application composes tapis-ui components into the deployed application with interfaces for authentication, system and file navigation, application launching, and job management.<br>
<br>
The Tapis UI project provides a low-requirements pathway to help researchers stand up science gateways for collaborating and reducing the time to science using the Tapis APIs.</p>
https://doi.org/10.5281/zenodo.5570569
oai:zenodo.org:5570569
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5570568
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
gateway
serverless
web
react
api
javascript
typescript
Tapis UI - A Rapid Deployment Serverless Science Gateway Built on the Tapis API
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5570267
2021-10-15T01:48:28Z
user-gateways-2021
Olshansky, Alex
Hayes, Cassandra
Kulkarni, Chaitra
Kee, Kerk
2021-10-14
<p>As the fields of science gateways (SG) and cyberinfrastructure (CI) advance, outreach and workforce development have become increasingly necessary to sustain their growth. Opinion leaders, or SG/CI influencers, play a fundamental role in accelerating the adoption and diffusion of SG/CI and broadening its workforce.<br>
Based on thematic analysis of 132 interviews with SG/CI practitioners, including domain scientists, computational technologists, and supercomputing center administrators, among others, we identified three key aspects of SG/CI influencers: who they are, what qualities they possess, and how they communicate. From these insights, we explain why they are successful at influencing others to adopt SG/CI, and we offer a prescriptive model for better SG/CI communication, message design, and influencer development in a crucial time for accelerating user adoption and diffusion of SG/CI.<br>
First, addressing who influencers are, we find that influencers exist across different roles (i.e., SG/CI users, developers, administrators, and outreach educators). Influencers tend to facilitate growth within SG/CI at two levels: (a) They play a mentoring role (e.g., professors, advisors) to help individuals grow, and (b) they are active and visible in a particular academic or professional field to help the community grow. Fundamentally, they are influencers because they introduce SG/CI to others and pass along essential knowledge.<br>
Second, we categorized the qualities influencers possess into two main types: boundary spanning [1] and reputational [2]. Within the SG/CI community, individuals with boundary-spanning qualities cross organizational or cultural boundaries to facilitate interactions and the exchange of knowledge between groups, and possess qualities such as being visionary, innovative, and cross-disciplinary. Reputational qualities of influencers include being respected, trusted, and regarded as distinguished or prolific in the field. Thus, influencers are opinion leaders because, as diffusion of innovations suggests, they can lead people’s opinions by their reputations (e.g., village elders, religious leaders, etc., in other social studies) [2].<br>
Finally, understanding how influencers are successful at persuading others to adopt SG/CI can provide a foundation for better communication practices and SG/CI promotional message design. We find that there are three dimensions to influencer communication, including the skills of (a) articulating and explaining what SG/CI is; (b) expressing why SG/CI is important for science; and (c) projecting passion and excitement for SG/CI.<br>
Thus, SG/CI messengers will be most effective if they (a) hold mentoring roles and (b) are active and visible members of an academic or professional community. Influencers in these roles facilitate growth within SG/CI at the individual and community levels. In summary, SG/CI messengers need to possess the boundary-spanning and reputational qualities described previously, combined with message design that includes three main elements: (a) effectively conveying what SG/CI is; (b) expressing why SG/CI is important; and (c) expressing emotion, passion, and excitement for SG/CI. To overcome the sometimes unavoidable “spray and pray” approach to SG/CI promotion, we suggest that a successful diffusion strategy would target users who are “ready to grow” with SG/CI.<br>
<br>
REFERENCES<br>
[1] Burt, R. S. (1992). Structural Holes. Harvard University Press.<br>
[2] Rogers, E. M. (2003). Diffusion of Innovations (5th ed.). New York: Free Press.</p>
https://doi.org/10.5281/zenodo.5570267
oai:zenodo.org:5570267
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5570266
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
cyberinfrastructure
diffusion of innovations
influencers
opinion leaders
science gateways
technology adoption
The Characteristics of Influencers and Opinion Leaders of Science Gateways and Cyberinfrastructure
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5570662
2021-10-15T01:48:29Z
user-gateways-2021
Weisweiler, Nina
Strecker, Dorothea
Schabinger, Rouven
Bertelmann, Roland
Elger, Kirsten
Ferguson, Lea Maria
Kindling, Maxi
Nguyen, Thanh Binh
Pampel, Heinz
Petras, Vivien
Schnepf, Edeltraud
Semrau, Angelika
Trofimenko, Margarita
Ulrich, Robert
Vierkant, Paul
Wang, Yi
Witt, Michael
2021-10-14
<p>re3data is the global registry for research data repositories [1]. As of September 2021, the service lists over 2700 digital repositories across all scientific disciplines and provides an extensive description of repositories based on a detailed metadata schema, the updated version of which was recently published [2]. The service promotes open science practices and the visibility of science-driven open infrastructures. A variety of funders, publishers, and scientific organizations around the world refer to re3data within their guidelines and policies, recommending the service to researchers looking for appropriate repositories for storage and discovery of research data [3].<br>
<br>
The re3data web interface and API function as an important Science Gateway, allowing its end users and other services to access the largest index of research data repositories in the world [4]. Since January 2020, the re3data COREF project has received funding from the German Research Foundation (DFG) to further develop and enhance the registry service [5].<br>
<br>
The planned presentation will introduce the re3data service and give an overview of the ongoing project work, including a short introduction to the recently published “Conceptual Model for User Stories” for re3data [6]. The editorial process for the inclusion of repositories in re3data will be explained, and participants will learn how to submit new entries and suggest changes. Participants will also learn how to retrieve information and metadata from re3data.<br>
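As a hedged sketch of the programmatic access mentioned above, the public re3data API can be queried with a few lines of Python; the endpoint paths are assumed from the re3data API v1 documentation, and the repository identifier below is only a placeholder:

```python
# Minimal sketch: retrieving repository metadata from the public re3data API.
# Endpoint paths are assumed from the re3data v1 API documentation; the
# repository identifier below is a placeholder.
import requests
import xml.etree.ElementTree as ET

BASE = "https://www.re3data.org/api/v1"

# List all registered repositories (returned as XML).
listing = requests.get(f"{BASE}/repositories", timeout=30)
listing.raise_for_status()
root = ET.fromstring(listing.content)
print("repositories listed:", len(list(root)))

# Fetch the detailed metadata record for a single repository by its re3data ID.
repo_id = "r3d100000001"  # placeholder identifier
detail = requests.get(f"{BASE}/repository/{repo_id}", timeout=30)
detail.raise_for_status()
print(detail.text[:200])
```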
<br>
[1] Pampel, H. et al. (2013) ‘Making Research Data Repositories Visible: The re3data.org Registry’, PLoS ONE. Edited by H. Suleman, 8(11), p. e78080. doi: 10.1371/journal.pone.0078080.<br>
<br>
[2] Strecker, D. et al. (2021) ‘Metadata Schema for the Description of Research Data Repositories: Version 3.1’. doi: 10.48440/RE3.010.<br>
<br>
[3] Kindling, M. et al. (2017) ‘The Landscape of Research Data Repositories in 2015: a Re3data Analysis’, D-Lib Magazine, 23(3/4). doi: 10.1045/march2017-kindling.<br>
<br>
[4] Witt, M. et al. (2019) ‘Connecting Researchers to Data Repositories in the Earth, Space, and Environmental Sciences’, in Manghi, P., Candela, L., and Silvello, G. (eds) Digital Libraries: Supporting Open Science. Cham: Springer International Publishing, pp. 86–96. doi: 10.1007/978-3-030-11226-4_7.<br>
<br>
[5] https://coref.project.re3data.org/project<br>
<br>
[6] Vierkant, P. et al. (2021) ‘re3data Conceptual Model for User Stories’. doi: 10.48440/re3.012</p>
https://doi.org/10.5281/zenodo.5570662
oai:zenodo.org:5570662
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5570661
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Research Data Repositories
Repository Registry
Index
re3data COREF Project
Introducing re3data – the Registry of Research Data Repositories
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5570117
2021-10-15T01:48:27Z
user-gateways-2021
Shao, Danying
Kellogg, Gretta
Nematbakhsh, Ali
Kuntala, Prashant
Mahony, Shaun
Pugh, B Frank
Lai, William
2021-10-14
<p>Reproducibility is one of the cornerstones of scientific research. However, reproducibility has been a longtime challenge across many scientific fields [1-3]. These difficulties arise from complexities in experimental and bioinformatic workflows that diverge over time, across different operators, and often with limited versioning [4-6]. In the field of genomics, collections of massive datasets that can be parsed in many ways have added to the reproducibility challenge [7-11]. What is needed is systematic metadata capture and management software that is tailored to (epi)genomic data collection.<br>
In general, a genomic project is composed of two distinct but interrelated components: ‘wet-bench’ biochemistry experiments and ‘dry-bench’ bioinformatic analysis. In wet-bench experiments, sample type (human tissue biopsy, yeast, etc.), reagents (catalogue number, wash buffer recipe, etc.), growth environment (log growth, % confluence, etc.), and experimental protocols (ChIP-seq, Western blot, etc.) are examples of critical metadata that need to be captured. Minor variations in these experimental components can result in distinct experimental outcomes [12, 13]. Compounding these issues is the traditional reliance on storing experiment metadata in hand-written notebooks, which are not searchable and are often incomprehensible to a third party [14]. Consequently, it can be difficult to follow and accurately reproduce an experimental protocol from start to finish.<br>
Similarly, in bioinformatic analysis, different analytical tools, software versions, and tool parameters may generate different analytical outcomes. While progress has been made in tracking and reproducing informatic workflows (e.g., Pegasus, Galaxy), these platforms are generally limited to reproducing software workflows [15, 16]. To our knowledge, there are no free, open-source platforms that manage entire experimental pipelines, from wet-bench experiments to bioinformatic analyses. Laboratory information management systems (LIMS) typically focus on inventory management and sample tracking and have limited capability to record experimental metadata and data analysis parameters or to interface with project team members [17-19]. Although there have been several commercial efforts in this direction, they can be limited in scope (e.g., only tracking sequencing reagents) and/or rather expensive for small academic laboratories [20, 21]. These platforms typically have limited integration between data production and the wide range of experimental metadata.<br>
We developed the Platform for Epigenomic and Genomic Research (PEGR), a web-based project management platform that integrates wet-bench sample tracking with downstream bioinformatic workflows (managed by Galaxy.org workflows) to address the challenge of experimental reproducibility. PEGR provides end-to-end management of (epi)genomics projects from initial reagent usage through to the reproducible generation of publication-quality figures (Figure 1). PEGR logs sample information and experimental details. It manages metadata produced by Galaxy or any other workflow system and provides sample reporting and visualization. It supports Findable, Accessible, Interoperable and Reusable (FAIR) best practices by tracking the metadata throughout the entire sequencing pipeline to enable experimental reproducibility [22]. Previously, we presented a proof-of-concept vision of PEGR software [23]. We now present a fully functional and open-source version of PEGR software, including streamlined experiment tracking with reagent barcoding, flexibility for multiple heterogeneous bioinformatics workflows, and a project management approach. PEGR is freely available and open source at https://github.com/seqcode/pegr.</p>
https://doi.org/10.5281/zenodo.5570117
oai:zenodo.org:5570117
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5570116
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
genomics
Galaxy
high-throughput sequencing
data management system
reproducibility
science
Platform for EpiGenomic Research (PEGR): A flexible management platform for reproducible epigenomic and genomic research
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5585404
2021-10-21T01:48:56Z
user-gateways-2021
Taswell, Carl
2021-10-20
<p>The Nexus-PORTAL-DOORS-Scribe (NPDS) cyberinfrastructure provides a ‘who what where’ diristry-registry-directory system for identifying, describing, locating, and linking things on the internet, web, and grid. PORTAL registries identify resources with unique labels and lexical tags in a manner compatible with the lexical web. DOORS directories specify locations and semantic descriptions for these identified resources in a manner compatible with the semantic web. PORTAL registries and DOORS directories were designed to be analogous to IRIS registries and DNS directories. This original design has been enhanced with Nexus diristries to provide integrated services combining the functions of both PORTAL registries and DOORS directories. The principles for the PORTAL-DOORS Project (PDP) were first proposed and described by Taswell in 2006 as the foundation for work on PDP and the NPDS cyberinfrastructure. This work on PDP and NPDS has been continuously available since 2007 from a publicly accessible website at www.PORTALDOORS.org. The 2006 PDP principles were renamed in 2019 as the DREAM principles, with the acronym DREAM standing for "Discoverable Data with Reproducible Results for Equivalent Entities with Accessible Attributes and Manageable Metadata". PDP-DREAM software, available as open-source software on GitHub, provides a comprehensive suite of tools for managing the data repositories in the NPDS cyberinfrastructure. A version of PDP-DREAM software has been implemented with Microsoft platform technologies (C#, SQL Server, IIS Server), has been tested on the previews of .NET 6, and will be fully validated for compatibility with .NET 6 concomitant with its general availability release later in 2021.</p>
https://doi.org/10.5281/zenodo.5585404
oai:zenodo.org:5585404
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5585403
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
data stewardship
metadata management
PORTAL-DOORS project
NPDS cyberinfrastructure
DREAM principles
PDP-DREAM software
The NPDS Cyberinfrastructure
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5570184
2021-10-15T01:48:27Z
user-gateways-2021
Winkel, Brian
2021-10-14
<p>SIMIODE – the Systemic Initiative for Modeling Investigations and Opportunities with Differential Equations – is a non-profit organization devoted to supporting the teaching of the pivotal STEM course in differential equations using modeling first and throughout.<br>
<br>
We describe the rich benefits of our SGCI associations, beginning with three members of our team attending the SGCI “Boot Camp” in Indianapolis in May 2019 and continuing through our current status after the NSF grant period. SGCI provided our team with an eye-opening experience and forced us to think about important issues we either wanted to ignore or to which we were oblivious. We returned from the workshop with many good intentions and plans and have worked on a number of them – never enough, though. For example, we conducted a survey of our membership asking how they found us, why they came to us, what keeps them involved, what they like, what they want improved, etc., from which we learned a great deal. We learned about more effective ways to use analytics and to cold call – although the latter still scares us! We came to adopt a customer viewpoint, which often is not natural for the professoriate.<br>
<br>
Most importantly, at our sessions in Indianapolis we learned about sustainability, or “What do you do after the grant runs out?” Some of what we heard was filled with non-traditional words for academics, like “sales” and “marketing.” SGCI provided ongoing, stimulating consulting which got us to face these realities. We have since taken a fresh look at or initiated three sustainability efforts: (1) SCUDEM – an undergraduate and high school student team challenge which has grown, (2) EXPO – an international virtual conference, and (3) a virtual textbook as a complement to our online OER teaching resources. Currently we are considering developing a SIMIODE MOOC, featuring our virtual text and offering all our OER materials in support of the course.<br>
<br>
The memory of discussions surrounding building community is with us, and we are still trying to do so, this time with the richer structure and friendlier interfaces of QUBES Hub as we migrate all of SIMIODE – members and resources – into QUBES. We valued our sessions on user interfaces and will now incorporate those lessons in our migration of the entire 4,800-member SIMIODE community, hundreds of resources, and community opportunities to QUBES Hub. Once SIMIODE is in the QUBES community, we plan to do more focused community building; for example, we have a high school coordinator and a community college coordinator who are ready to work on engagement activities, and we will take advantage of other social and community actions afforded by the rich QUBES community features.<br>
<br>
We feel strongly about sharing news of what SGCI has done for our SIMIODE gateway.</p>
https://doi.org/10.5281/zenodo.5570184
oai:zenodo.org:5570184
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5570183
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
sustainability
community
student challenge
virtual conference
textbook
outreach
analytics
SGCI Helps Shape SIMIODE Sustainability and More
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5570056
2021-10-15T01:48:28Z
user-gateways-2021
Padmanabhan, Anand
Xiao, Zimo
Vandewalle, Rebecca
Michels, Alexander
Wang, Shaowen
2021-10-14
<p>Geospatial research and education have become increasingly dependent on cyberGIS, defined as geographic information science and systems based on advanced cyberinfrastructure (CI) [1], to tackle computation and data challenges. However, the use of advanced cyberGIS capabilities has typically been constrained to a small set of research groups with the technical expertise to use CI resources. Over the past few years, CyberGIS-Jupyter [2,3] has been developed to provide access to cyberGIS capabilities through an easy-to-use Jupyter Notebook interface, which has made cyberGIS more accessible. Many cyberGIS and geospatial applications need to access CI resources to solve complex problems at scale. However, leveraging CI resources for geospatial applications is challenging due to both the steep learning curve and the lack of appropriate tools. CyberGIS-Compute fills this gap by providing an easy-to-use middleware tool for using and contributing geospatial application codes that leverage CI resources. This substantially lowers the learning curve for both geospatial users and developers to access cyberGIS capabilities at scale. CyberGIS-Compute is backed by Virtual ROGER (Resourcing Open Geospatial Education and Research), a geospatial supercomputer with a number of popular geospatial libraries readily available.<br>
<br>
With CyberGIS-Compute we have designed an easy-to-use middleware and associated Python SDK to provide access to cyberGIS capabilities, allowing geospatial applications to easily scale and employ advanced cyberinfrastructure resources. This presentation will first describe the basics of CyberGIS-Jupyter and CyberGIS-Compute, and then introduce the Python SDK for CyberGIS-Compute with a simple example. We will then walk through multiple real-world geospatial use cases, such as spatial accessibility and wildfire evacuation simulation using agent-based modeling. Lastly, we will describe the mechanism for contributing applications to the CyberGIS-Compute framework.</p>
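The following is a hedged sketch of what invoking the CyberGIS-Compute Python SDK from a notebook might look like; the class, method, and parameter names are assumptions drawn from publicly documented examples and may differ from the current release:

```python
# Hedged sketch of connecting to CyberGIS-Compute from a CyberGIS-Jupyter
# notebook. Class, method, and parameter names are assumptions based on
# public examples and may not match the current SDK exactly.
from cybergis_compute_client import CyberGISCompute

# Connect to the CyberGIS-Compute middleware (URL is illustrative).
cybergis = CyberGISCompute(url="cgjobsup.cigi.illinois.edu", isJupyter=True)

# Launch the interactive job-submission UI inside the notebook, which lists
# contributed applications and the CI resources (e.g., Virtual ROGER) they use.
cybergis.show_ui()
```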
https://doi.org/10.5281/zenodo.5570056
oai:zenodo.org:5570056
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5570055
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
cyberGIS
Jupyter
GIScience
Cyberinfrastructure
Enabling computationally intensive geospatial research on CyberGIS-Jupyter with CyberGIS-Compute
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5565544
2021-10-14T01:48:25Z
user-gateways-2021
Shawn Rice
Kevin Colby
2021-10-13
<p>Under a shared campus cluster model, with many research groups investing different amounts across annual new hardware acquisitions, management of users and resources can become very complex. At Purdue, the Research Computing division set out in 2011 to design a cluster management solution to empower faculty to manage access to their own purchased resources. This system also needed to allow center staff to quickly provision resources, provide accurate accounting and tracking of faculty purchases over time, and provide a single location for data about the cluster program.<br>
<br>
This need was not unique to Purdue and, as more institutions enter into HPC services, the growing interest in similar systems is evidenced by new applications such as ColdFront. Perhaps most critical for the many “condo”-style HPC centers, purchases of services can be tracked alongside other access information. This allows self-service ordering for internal accounting environments where purchases must go through organizational approvals, and all center purchase history may be correlated with allocations or other information. For Purdue, the user allocation/authorization system alone was estimated to save 6.7 hours of center staff time per week over the first 18 months.<br>
<br>
As operations at Purdue expanded, the internal portal took on many aspects of the operations of an HPC center beyond resource allocation and management. These aspects are seldom topics of discussion but are often just as important to the success of an HPC center; they include customer relationship management, news, online presence, and documentation. Halcyon not only provides tools for these but also integrations between them, allowing for such actions as emailing outage or maintenance news articles to users of a specific resource and auto-archiving documentation when a resource is retired.<br>
<br>
While the original Purdue Research Computing portal served its purpose well over the years, making changes and adding features could be a cumbersome task. Reconstituting the portal features into a modular, extensible framework was an essential next step to allow for the growth and maintenance necessary. Re-architecture focused on:<br>
<br>
1) Abstraction of environment assumptions and settings<br>
2) Restructure for modularity and easier extension<br>
<br>
To address (1), locally standard terminology for many aspects of the site, site content, and various hard-coded site settings were pulled into the internal database, and interfaces were created for editing them. Other environment-specific site settings were incorporated through new configuration files stored in a central location.<br>
<br>
To address (2), the code was divided into modules—logical groupings based on the data being handled, tasks, and interfaces. Third-party services are integrated as plugins, allowing modules to ingest data from sources such as LDAP or REST APIs. Essential to interactions with external services, modules implement REST APIs conforming to the OpenAPI v3 specification.<br>
<br>
Halcyon provides a platform for integration between various self-service and admin-only elements such as resource allocation, access, communications, and knowledge. Halcyon allows for easier targeting and management of information critical to end users, and reduces time and effort on the part of center staff by integrating these missions.</p>
https://doi.org/10.5281/zenodo.5565544
oai:zenodo.org:5565544
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5565543
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Halcyon
high performance computing center operations
resource allocation
customer relations
news management
documentation management
Halcyon: Unified HPC Center Operations
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5570296
2021-10-15T01:48:30Z
user-gateways-2021
Jamthe, Anagha
Stubbs, Joe
Packard, Mike
Chuah, Joon Yee
Looney, Julia
Curbelo, Gilbert
2021-10-14
<p>Over the last 15 years, a great investment has been made in providing web access to advanced computing resources for computational research. Historically, such web applications, or “science gateways”, have enabled users to run analyses asynchronously on remote systems. More recently, as a growing number of disciplines bring big data techniques to bear on fundamental problems, interactive computing modes such as Jupyter Notebooks have gained tremendous popularity for the ease with which one can perform a range of computational tasks in real time, including data cleansing, analysis, visualization, and post-processing. Nevertheless, to date there is no national-scale offering that provides production-grade, scalable interactive computing that integrates deeply into the academic cyberinfrastructure (CI) ecosystem.<br>
<br>
The Texas Advanced Computing Center recently launched the Scientific and Interactive Computing (SCINCO) project to provide a hardened, production-grade Jupyter-notebooks-as-a-service platform capable of utilizing advanced storage and computing CI. Launched in 2020, SCINCO builds upon a custom JupyterHub offering developed at TACC since 2015. TACC’s custom JupyterHub supports more than 1600 users running across five different clusters at TACC. It has become a crucial component of independently funded gateway projects such as DesignSafe Cyber-Infrastructure, Synergistic Data Discovery Environment, 3DEM, and HETDEX. Researchers from astronomy, biology, climate science, neuroscience, and other fields are leveraging SCINCO to analyze big data, implement computational models, disseminate results, and train researchers.<br>
<br>
SCINCO tackles the scalability challenges faced by TACC’s original JupyterHub by combining state-of-the-art open source container technologies such as Kubernetes with a customizable JupyterHub to deliver a platform capable of serving dozens or even hundreds of projects (i.e., “tenants”) with minimal developer overhead. SCINCO executes notebook containers across a shared Kubernetes cluster running on bare metal, using namespaces to isolate projects from one another. SCINCO inherits all the advantages of Kubernetes, including scalability, reproducibility, and portability, while significantly reducing the administrative overhead associated with managing servers. For example, TACC’s original JupyterHub utilized thirty 16 GB virtual machines arranged into 5 different clusters; SCINCO can provide the same computational power with better load balancing using just 2 bare-metal physical servers with 256 GB of RAM each. Customizations (Figure 1) made within SCINCO support user-ID and group-ID lookup for an associated username when launching a user’s notebook server. Additionally, the SCINCO design dynamically determines which persistent volumes the user has access to at runtime. SCINCO also allows for dynamic configuration changes at runtime, such as adding or removing volume mounts for a specific user group, notebook images, admin users, etc.<br>
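The following jupyterhub_config.py fragment is a minimal, generic sketch of this kind of runtime customization (per-user UID lookup and dynamic volume selection with KubeSpawner); it is not SCINCO's actual configuration, and the lookup helper is hypothetical:

```python
# Minimal, generic jupyterhub_config.py sketch (not SCINCO's actual code):
# look up a user's UID and attach only the volumes they may access at spawn time.
c.JupyterHub.spawner_class = "kubespawner.KubeSpawner"

def lookup_uid(username):
    # Hypothetical helper; a real deployment might query LDAP or a site API.
    return 1000

def pre_spawn_hook(spawner):
    user = spawner.user.name
    spawner.uid = lookup_uid(user)
    # Decide at runtime which persistent volumes this user can mount.
    spawner.volumes = [{
        "name": "project-data",
        "persistentVolumeClaim": {"claimName": f"pvc-{user}"},
    }]
    spawner.volume_mounts = [
        {"name": "project-data", "mountPath": "/home/jovyan/work"},
    ]

c.KubeSpawner.pre_spawn_hook = pre_spawn_hook
```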
<br>
The SCINCO project is currently developing an administrative portal, which will empower project administrators to manage their own JupyterHub clusters without direct support from the project staff. Project admins will get access to a dashboard where they can manage users, custom images, and volume mounts for individual users. They will be able to start and stop users’ notebook servers and open shell sessions on them. The SCINCO project made its initial production release in July 2021, and the administrative portal will be released for early adopters in Fall 2021.</p>
https://doi.org/10.5281/zenodo.5570296
oai:zenodo.org:5570296
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5570295
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
JupyterHub
Kubernetes
Cyberinfrastructure
SCINCO
Interactive computing
Enriching scientific and interactive computing with project SCINCO: JupyterHub on Kubernetes
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5570280
2021-10-15T01:48:29Z
user-gateways-2021
Canon, Shane
Christianson, Danielle
Duncan, William
Eloe-Fadrosh, Emiley
Fagnan, Kjiersten
Hays, David
Huntemann, Marcel
Lebedeva, Sofya
Miller, Kayd
Miller, Mark
Mouncey, Nigel
Mungall, Chris
Reddy, Tbk
Rudolph, Marisa
Sarrafan, Setareh
Sundaramurthi, Jagadish Chandrabose
Unni, Deepak
Vangay, Pajau
Wood-Charlson, Elisha
Ahmed, Faiza
Baumes, Jeffrey
Davis, Brandon
Anubhav, Fnu
Borkhum, Mark
Bramer, Lisa
Corilo, Yuri
Lipton, Mary
Mans, Douglas
McCue, Lee Ann
Millard, David
Piehowski, Paul
Prymolenna, Anastasiya
Purvine, Samuel
Richardson, Rachel
Smith, Montana
Stratton, Kelly
Babinski, Michal
Chain, Patrick
Davenport, Karen
Flynn, Mark
Hu, Bin
Kelliher, Julia
Li, Po-E
Lo, Chien-Chi
Jackson, Elais Player
Shakya, Migun
Xu, Yan
Drake, Meghan
Martin, Stanton
Wilson, Bruce
Winston, Donny
2021-10-14
<p>The cross-cutting nature of microbiome research in environmental sciences, health, agriculture, energy, and natural and built environments, and the velocity at which microbiome data are generated, have far outpaced current data infrastructure. Resources and solutions for the collection, processing, and distribution of these data in an effective, uniform, accessible, and reproducible manner are lacking even at the largest data centers. The National Microbiome Data Collaborative (NMDC) is a pilot initiative launched to support microbiome data exploration and discovery through a collaborative, integrative science gateway. The NMDC is tackling infrastructure challenges in microbiome data science by making use of distributed computational resources available across four Department of Energy National Laboratories: Lawrence Berkeley National Laboratory (LBNL), Los Alamos National Laboratory (LANL), Pacific Northwest National Laboratory (PNNL), and Oak Ridge National Laboratory (ORNL).<br>
The NMDC team aims to deliver a set of unique microbiome data science capabilities, which include: leveraging existing ontology mapping software and curation resources to enable automated annotation of standardized metadata; developing workflows for metagenome, metatranscriptome, metaproteome, and metabolomics data processing leveraging HPC systems, and integrating the execution of these pipelines to produce NMDC-compliant data products; developing data registration, indexing, and access services to link data through a suite of publicly available APIs; and developing communication and sustainability strategies to assess current and future needs and capabilities to empower users, and promote the NMDC to the larger scientific community.<br>
To ensure that the NMDC data ecosystem supports and evolves with the needs of the microbiome research community, the NMDC team uses a community-centered design approach to seek feedback from the scientific research community throughout its phases of iterative development. Community feedback has informed the priorities of the NMDC data standards, bioinformatic workflows, and engagement activities, but perhaps the most visible contributions of the research community can be seen through the features and enhancements on the NMDC data portal. Here, we present an overview of the NMDC mission and vision, the distributed data infrastructure, and our community-centered design approach in developing the NMDC data portal.</p>
https://doi.org/10.5281/zenodo.5570280
oai:zenodo.org:5570280
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5570279
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
microbiome
data science
data infrastructure
science gateway
The National Microbiome Data Collaborative: a data science ecosystem for microbiome research
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5570274
2021-10-15T01:48:29Z
user-gateways-2021
Stubbs, Joe
Jamthe, Anagha
Black, Steve
Cleveland, Sean
Looney, Julia
2021-10-14
<p>This tutorial will focus on providing attendees exposure to state-of-the-art techniques for portable, reproducible research computing, enabling them to easily transport analyses from cloud to HPC resources, share computations with collaborators and disseminate final results to communities of interest. We will introduce various open source technologies, including Jupyter, Docker and Singularity, and show how to utilize these tools within the NSF-funded Tapis v3 platform, an Application Program Interface (API) for distributed computation. After a brief introduction to the open source technologies above, this tutorial will be focused on hands-on exercises in which the attendees will build a portable analysis that can be seamlessly moved to different execution environments, including a small virtual machine and a national-scale supercomputer. Using techniques covered in the tutorial, attendees will also be able to easily share their results with one or more additional users. The tutorial will make use of a specific machine learning image classifier analysis to illustrate the concepts, but the techniques introduced can be applied to a broad class of analyses in virtually any domain of science or engineering.<br>
<br>
Description and Format:<br>
<br>
TACC training accounts will be set up for all registered attendees, with access to allocations on XSEDE cloud systems and one or more HPC resources such as TACC’s Stampede2 or Frontera. The tutorial will consist of hands-on exercises in which attendees interact with the Tapis v3 services within a Jupyter notebook. Registered attendees will be notified of their account details closer to the tutorial date. All course materials will be published on GitHub Pages so that attendees have access to them during and after the tutorial. We will have enough proctors throughout the session to help attendees through Slack or breakout sessions. The proposed tutorial schedule is shown in Table 1.<br>
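As a hedged preview of the hands-on portion, a minimal tapipy sketch for authenticating against a Tapis v3 tenant and listing registered systems might look like the following; the tenant base URL and credentials are placeholders, not the tutorial's actual training tenant:

```python
# Minimal sketch of interacting with Tapis v3 from a Jupyter notebook using
# the tapipy client. The tenant base URL and credentials are placeholders.
from tapipy.tapis import Tapis

t = Tapis(base_url="https://training.tapis.io",
          username="trainee", password="********")
t.get_tokens()  # obtain an OAuth access token for subsequent calls

# List the storage/execution systems visible to this account.
for system in t.systems.getSystems():
    print(system.id)
```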
<br>
Learning Outcomes:<br>
<br>
In this tutorial, attendees will gain an understanding of using container technology (Docker, Singularity) for portable analyses, programmatically executing analyses in both cloud and HPC environments using an API, interacting with and visualizing results in Jupyter notebooks, and sharing results with collaborators. By the end of this workshop, attendees will:<br>
• Have a basic understanding of Docker and Singularity containers in relation to computational research.<br>
• Use Tapis to access HPC storage and compute resources in a programmatic and reproducible way.<br>
• Utilize Jupyter notebooks for interactive computing.<br>
• Use Tapis to share results with others.<br>
<br>
Content Level and Length: Beginner 70%, Intermediate 30%<br>
3 hours.<br>
<br>
Audience Prerequisites: Basic familiarity with Jupyter notebooks and Python will be helpful. Attendees must use their own laptop for the hands-on part of the tutorial.</p>
https://doi.org/10.5281/zenodo.5570274
oai:zenodo.org:5570274
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5570273
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Jupyter
Docker
Singularity
Tapis
Machine Learning
Portable, Scalable, and Reproducible Scientific Computing: from Cloud to HPC
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5569498
2021-10-15T01:48:28Z
user-gateways-2021
Pierce, Marlon
Abeysinghe, Eroma
Christie, Marcus
Coulter, Eric
Marru, Suresh
Pamidighantam, Sudhakar
Quick, Rob
Ranawaka, Isuru
Wang, Jun
Wannipurage, Dimuthu
2021-10-14
<p>Tutorial length: 90 minutes<br>
Skill level: Any<br>
Technology requirements: None<br>
<br>
Since its inception in the Apache Software Foundation in 2011, Apache Airavata has evolved from a middleware system for supporting science gateway workflow executions to a comprehensive set of semi-autonomous subsystems that can be used to provide solutions for a wide range of science gateways. This tutorial provides a series of lightning overviews of each of these major subsystems and illustrates their usage in different science gateways.<br>
<br>
The Virtual Cluster System provides a mechanism for creating dynamic virtual clusters on OpenStack-based clouds. These virtual clusters can be used to execute both containerized serial and parallel scientific applications, providing users and gateways with their own private clusters. They can also be deployed with the JupyterHub interface, providing on-demand access to JupyterLab servers.<br>
<br>
Apache Airavata’s metadata and workflow scheduling infrastructure (the original core of Apache Airavata) builds on Apache Helix and Airavata’s own metadata management system to manage the full lifecycle for job executions, capturing the metadata needed to audit and reproduce execution outcomes.<br>
<br>
The Airavata Django Portal provides an out-of-the-box end user environment for all of the Apache Airavata middleware subsystems. Through the use of the Wagtail Content Management System and the Django Apps extension mechanism, the Airavata Django Portal can be extensively customized to create unique user interfaces that meet the usability requirements of different research communities.<br>
<br>
Airavata Custos encompasses Apache Airavata’s security services for managing user accounts; federated authentication; role, group, and attribute-based authorization; sharing and permissions; and resource credential (secrets) management. Custos services can be used independently of other Airavata services and can be integrated into other science gateway platforms such as Galaxy through the Custos API.<br>
<br>
Airavata Managed File Transfer (MFT) subsystem supports data transfer and storage endpoint management for users’ local storage systems, parallel file and mass storage systems operated by research computing systems, and cloud storage systems such as Amazon S3, Google Drive, and Box. Central MFT services and locally deployed agents can support emerging high performance transfer protocols and provide optimized transfers that are decoupled from gateway middleware.<br>
<br>
Airavata Data Lake provides secured, controlled access to data from a wide range of sources including scientific instruments, results of computations, and user and machine annotated metadata. The Airavata Data Lake system can orchestrate data movements managed by Airavata MFT and execute data pipelines to extract searchable metadata. Collectively these components enable processing of data, the movement of data from the data sources to central storage points, and distribution of data to respective authorized users.<br>
<br>
The Science Gateways Platform as a Service (SciGaP) is an operational deployment of the Airavata software stack that is run by the Indiana University Cyberinfrastructure Integration Research Center for over 40 client gateways.<br>
<br>
We conclude the tutorial with a discussion of future directions for the Apache Airavata software stack and gateways in general, including greater support for FAIR science and secure integration of a greater number of edge systems.</p>
https://doi.org/10.5281/zenodo.5569498
oai:zenodo.org:5569498
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5569497
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Science gateways
Scientific workflows
Cybersecurity
Managed file transfer
Web portal
Open source software
An Overview of the Apache Airavata Software Stack for Science Gateways
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5565087
2021-10-13T01:48:32Z
user-gateways-2021
Qi Sun
Ali Nematbakhsh
Prashant Kuntala
Gretta Kellogg
Frank Pugh
William Lai
2021-10-12
<p>The ability to aggregate experimental data analysis and results into a concise and interpretable format is a key step in evaluating the success of an experiment. This critical step determines baselines for reproducibility and is a key requirement for data dissemination. However, in practice it can be difficult to consolidate data analyses that encapsulate the broad range of datatypes available in the life sciences. We present STENCIL, a web templating engine designed to organize, visualize, and enable the sharing of interactive data visualizations. STENCIL leverages a flexible web framework for creating templates to render highly customizable visual front ends. This flexibility enables researchers to render small or large sets of experimental outcomes, producing high-quality downloadable and editable figures that retain their original relationship to the source data. REST API-based back ends provide programmatic data access and support easy data sharing. STENCIL is a lightweight tool that can stream data from Galaxy, a popular bioinformatic analysis web platform. STENCIL has been used to support the analysis and dissemination of two large-scale genomic projects containing the complete data analysis for over 2,400 distinct datasets. Code and implementation details are available on GitHub: https://github.com/CEGRcode/stencil</p>
https://doi.org/10.5281/zenodo.5565087
oai:zenodo.org:5565087
eng
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5565086
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Genomics
Sequencing
Interactive visualization
STENCIL: A web templating engine for visualizing and sharing life science datasets
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5570225
2021-10-15T01:48:29Z
user-gateways-2021
Chalker, Alan
Settlage, Rober
Hudak, David
2021-10-14
<p>Open OnDemand (openondemand.org) is an NSF-funded open-source HPC platform currently in use at over 200 HPC centers around the world. It is an intuitive, innovative, and interactive interface to remote computing resources. Open OnDemand (OOD) helps computational researchers and students efficiently utilize remote computing resources by making them easy to access from any device. It helps computer center staff support a wide range of clients by simplifying the user interface and experience.<br>
<br>
This working session is meant to be an open discussion to guide the future roadmap for OOD in the near term, by getting feedback from the science gateways community on the development and integration of applications within OOD.<br>
<br>
Nearly any software application can be made accessible via OOD. The official OOD GitHub repo currently has links to software that appeals to a wide range of scientific disciplines, such as Jupyter, Abaqus, ANSYS, COMSOL, MATLAB, RStudio, Tensorboard, QGIS, VMD, RELION, STATA, and Visual Studio. The OOD development team is also aware of planned or ongoing work to integrate many other software packages and platforms, including many that are prominent within the science gateways community, such as Galaxy, TAPIS, Globus, and Pegasus.<br>
<br>
The OOD team has held more generic ‘Birds of a Feather’ sessions at multiple PEARC and SC conferences in recent years, as well as regular online webinars, each of which has seen significant attendance of many dozens of people. As the Gateways series of conferences has historically had many attendees from sites that utilize OOD, as well as attendees from many of the most prolific science gateway development teams, holding a working session to bring together some of these people is a natural fit for the conference.<br>
<br>
This proposed working group session is meant to follow the same general format used at PEARC and SC in the past and to be an open discussion to guide the future roadmap for OOD with regard to application development and integration. In keeping with our previous webinars and BoFs, key outcomes include a summary of the comments and discussion points, including reports on installation and utilization issues from sites that currently have OOD installed, as well as a list of feature requests and development priorities. The initial slides that the organizers will present, briefly providing an OOD overview, roadmap summary, and items of note, will also be posted on our website for review by the community.</p>
https://doi.org/10.5281/zenodo.5570225
oai:zenodo.org:5570225
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5570224
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
App Development
Open OnDemand
HPC
Open OnDemand App Development and Integration
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5570615
2021-10-15T01:48:28Z
user-gateways-2021
Barakhshan, Parinaz
Eigenmann, Rudolf
Marrs, Adam
Safronova, Marianna
Arora, Bindiya
2021-10-14
<p>In many applications, ranging from studies of fundamental physics to the development of future technologies, accurate atomic theory is indispensable to the design and interpretation of experiments. Direct experimental measurement of relevant parameters is often infeasible if not impossible.<br>
This paper reports the release of Version 1 of an online atomic portal for high-precision atomic data and computation that provides such information to a wide community of users. Version 1 of the portal provides transition matrix elements, transition rates, radiative lifetimes, branching ratios, hyperfine constants, quadrupole moments, and scalar and dynamic polarizabilities for atoms and ions. Version 1 includes data for the elements and ions Li, Be+, Na, Mg+, K, Ca+, Rb, Sr+, Cs, Ba+, Fr, and Ra+. The atomic properties are calculated using a high-precision, linearized coupled-cluster method.<br>
All values include estimated uncertainties. Where available, experimental values are included, with references to their sources. Data for more elements and properties will be added in the future, with alkaline earth metal atoms planned for the next release. Future updates will also include releases of user-friendly, atomic computational codes. Community input is sought to improve the portal and guide the next stages of the project. The portal includes the requisite user feedback functions for guiding future releases. A site tour and help functions allow new users to orient themselves quickly.<br>
The current portal version shows pre-computed information through an interactive interface, providing instant access to the requested data, equipped with common print and download functionality. Future versions will also allow users to request properties and elements that are rarely needed and information that has not yet been computed. Such requests will invoke the needed computations on-demand and notify the user when complete. For some requests, the response time may be a few seconds, while more advanced calculations require minutes to hours.<br>
The current implementation of the portal uses a static page design, with the data directly embedded in HTML. The pages, written in HTML, CSS, and JavaScript, are created automatically via Python scripts from the original CSV-formatted physics data. One challenge was representing the designations of electron states, such as 5f6p<sup>2</sup> <sup>2</sup>F<sup>o</sup><sub>5/2</sub>, for which LaTeX inserts are generated. Graphs for properties such as polarizabilities make use of the Bokeh visualization package.<br>
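A schematic sketch of the generation pipeline described above (not the portal's actual build script): pre-computed values are read from CSV, written into an HTML table, and paired with an embedded Bokeh plot. The file and column names are illustrative only:

```python
# Schematic sketch of the static-page generation pipeline (illustrative file and
# column names, not the portal's actual build script): read pre-computed CSV
# data, embed it as an HTML table, and attach a Bokeh plot of a polarizability curve.
import pandas as pd
from bokeh.plotting import figure
from bokeh.embed import components

data = pd.read_csv("cs_polarizability.csv")  # columns: wavelength_nm, alpha_au

plot = figure(title="Cs ground-state dynamic polarizability",
              x_axis_label="Wavelength (nm)",
              y_axis_label="Polarizability (a.u.)")
plot.line(data["wavelength_nm"], data["alpha_au"])

script, div = components(plot)  # JS and HTML snippets to inline in the page

rows = "\n".join(
    f"<tr><td>{w}</td><td>{a}</td></tr>"
    for w, a in zip(data["wavelength_nm"], data["alpha_au"])
)
page = f"<table>{rows}</table>\n{div}\n{script}"
with open("cs_polarizability.html", "w") as fh:
    fh.write(page)
```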
The computational codes are written in Fortran 77/90 and have been parallelized using OpenMP for efficient execution on multicore nodes. These runs consume significant resources, using many nodes of the computational clusters at the University of Delaware concurrently to compute the properties of multiple elements simultaneously.<br>
The team includes both physicists and computer scientists, driving the design of the portal from both the physics and the software engineering angle, applying best practices for collaboration and software design.</p>
https://doi.org/10.5281/zenodo.5570615
oai:zenodo.org:5570615
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5570614
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
transition matrix elements
transition rates
radiative lifetimes
branching ratios
hyperfine constants
quadrupole moments
scalar polarizabilities
dynamic polarizabilities
Portal for High-Precision Atomic Data and Computation
info:eu-repo/semantics/conferencePaper
oai:zenodo.org:5569451
2021-10-14T19:12:20Z
user-gateways-2021
Ganapathy, Kaushik Ram
Man, Bailey
Lei, Jiaxi
Zaslavsky, Ilya
2021-10-14
<p>Geographically-enabled Agent-based model for COVID-19 Transmission (GeoACT) is a new science gateway developed to assist K-12 schools in evaluating potential strategies for safe school reopening amid the COVID-19 pandemic. School closures have resulted in learning interruptions, increased inequality and adverse economic impacts, and additional challenges to the social and intellectual development of students. Reopening schools for in-person instruction requires careful planning to avoid serious risks of infection, especially with the rise of more infectious virus variants. The core component of the gateway is an agent-based model simulating viral transmission in the course of daily school activities under several intervention strategies. The model follows a modified SEIR approach, describing transitions between Susceptible, Exposed, Infected (symptomatic and asymptomatic), and Removed/Quarantined states for student and staff agents. A distinct feature of the model is that risks of viral exposure through both aerosol and droplet transmission are modeled for granularly defined school activities (learning at desks, group activities, recess, lunch, etc.), which are simulated at 5-minute intervals in specific spatial settings. The latter are extracted from school floor plans, including dimensions of classrooms and other school areas, their occupancy, and ventilation characteristics. The model is deployed on the XSEDE Expanse resource and accessed via the Apache Airavata portal. Users can enter school-specific parameters, such as the number of employees and students in each grade; non-pharmaceutical interventions (NPIs) like mask-wearing, improved ventilation, and cohort separation; and pharmaceutical measures, namely vaccination levels and testing strategies. An additional module simulates viral transmission on school buses, given different bus types, seating patterns, mask compliance, and whether windows and hatches are open or closed. Model runs generate risk heat maps and histograms that visually demonstrate the impact of the chosen intervention strategies. The preliminary results for selected elementary schools and bus routes showed that efficient reopening plans rely on a combination of mitigation strategies, most importantly minimizing cohort sizes, improving ventilation in enclosed spaces, and maximizing mask-wearing compliance. Model results have been validated against outcomes of similar models and infection patterns reported in the literature. Empirical model validation is pursued through our ongoing work with the San Diego County Office of Education (SDCOE) on generating and tracking school exposure notifications. Additional ongoing work focuses on improving model performance (in particular, switching to Julia and the Agents.jl framework from the Python MESA and MESA-GEO packages we used earlier), modeling additional school activities such as athletics, integration with a spatial analysis dashboard, and developing a meta-analysis module on the Airavata portal. We wish to acknowledge guidance and collaboration from the San Diego County Health and Human Services Agency, SDCOE, and San Diego Unified School District, as well as support from XSEDE and the Science Gateways Community Institute (SGCI). NSF support (award 2139740) is gratefully acknowledged.</p>
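A greatly simplified Mesa sketch of the SEIR-style agent logic described above, assuming the Mesa 2.x API with a RandomActivation scheduler; the transition probabilities are illustrative, and none of GeoACT's spatial, activity, or aerosol/droplet modeling is reproduced here:

```python
# Greatly simplified Mesa sketch of SEIR-style agents stepped at 5-minute
# intervals (illustrative transition probabilities; no rooms, activities, or
# aerosol/droplet physics from the actual GeoACT model).
import random
from mesa import Agent, Model
from mesa.time import RandomActivation


class PersonAgent(Agent):
    def __init__(self, uid, model):
        super().__init__(uid, model)
        self.state = "S"  # Susceptible

    def step(self):
        # One step corresponds to a single 5-minute interval.
        if self.state == "S" and random.random() < self.model.p_exposure:
            self.state = "E"
        elif self.state == "E" and random.random() < self.model.p_onset:
            self.state = "I"
        elif self.state == "I" and random.random() < self.model.p_removal:
            self.state = "R"  # removed / quarantined


class SchoolModel(Model):
    def __init__(self, n_agents=100, p_exposure=0.001, p_onset=0.002, p_removal=0.001):
        super().__init__()
        self.p_exposure, self.p_onset, self.p_removal = p_exposure, p_onset, p_removal
        self.schedule = RandomActivation(self)
        for i in range(n_agents):
            self.schedule.add(PersonAgent(i, self))

    def step(self):
        self.schedule.step()


model = SchoolModel()
for _ in range(12 * 7):  # one 7-hour school day in 5-minute steps
    model.step()
```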
https://doi.org/10.5281/zenodo.5569451
oai:zenodo.org:5569451
Zenodo
https://zenodo.org/communities/gateways-2021
https://doi.org/10.5281/zenodo.5569450
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
COVID-19
School Reopening
Agent-based Model
Gateway
Simulation
Transmission
Geographically-enabled Agent-based Model for COVID-19 Transmission
info:eu-repo/semantics/conferencePaper