
Published June 28, 2023 | Version 0.0.2
Software Open

VocPopuli

Description


VocPopuli is a Python-based software platform for the collaborative development of controlled vocabularies (glossaries, taxonomies, or thesauri). VocPopuli's vocabularies are designed to support the collection of FAIR research data.

Basic Functionality

From a technical perspective, VocPopuli is a frontend for a graph database, backed up to GitLab, that enables people to collaboratively define, version, and organize the terms of controlled vocabularies. Once a vocabulary repository has been initialized, new terms can be added to it by filling out the corresponding form in VocPopuli.

Setup

Currently, VocPopuli can be installed locally, while the database connection and the GitLab synchronization can be shared. It is recommended to run VocPopuli in a virtual environment, which can be created with tools such as conda.

Required Software

VocPopuli requires Python 3.9+ to run.

The Python packages needed to run the tool must first be installed in the chosen environment by running the following command inside the vocpopuli directory:

pip install -r requirements.txt
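Since VocPopuli requires Python 3.9 or newer, a helper script could check the interpreter version before anything else is imported. The following is a minimal sketch under that assumption; `check_python_version` is a hypothetical name and not part of VocPopuli.

```python
import sys

# Minimum interpreter version stated in the VocPopuli README.
MIN_PYTHON = (3, 9)


def check_python_version(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version meets the minimum.

    Hypothetical helper: VocPopuli does not necessarily ship such a check.
    """
    # Compare only (major, minor) so that patch releases do not matter.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not check_python_version():
        sys.exit(
            "VocPopuli requires Python %d.%d+, found %d.%d"
            % (MIN_PYTHON + tuple(sys.version_info[:2]))
        )
    print("Python version OK")
```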

Infrastructure Details

Redis is an in-memory database.

It is used to store some of the necessary user-specific data on the server running VocPopuli with the help of Flask-Session.

Flask-Session also supports other storage backends, but these have not yet been tested.

Celery also uses Redis as its task broker and results backend.

This allows VocPopuli to run some of its functions in the background without disturbing the user experience.

Celery supports additional brokers and backends as well, but these have not yet been tested.

Vocabularies are stored in a Neo4j database and synchronized with GitLab. To run migrations, you need to download neo4j-migrations.

To set up a local database using Docker, set the PATH_TO_DB environment variable to the folder where the database data should be kept, and run the following command:

docker run \
    -p 7474:7474 -p 7687:7687 \
    --volume "$PATH_TO_DB":/data \
    -e NEO4J_apoc_export_file_enabled=true \
    -e NEO4J_apoc_import_file_enabled=true \
    -e NEO4J_apoc_import_file_use__neo4j__config=true \
    -e NEO4J_PLUGINS=\[\"apoc\"\] \
    --restart=unless-stopped \
    -d neo4j
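After docker run returns, the container may still be starting. The following standard-library sketch (a hypothetical helper, not part of VocPopuli) polls a TCP port until it accepts connections; it assumes the port mapping from the command above, where 7687 is Neo4j's Bolt port. An open port only shows that something is listening, not that the database has finished initializing.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll host:port until a TCP connection succeeds or the timeout elapses.

    Returns True if the port became reachable, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # 7687 is the Bolt port published by the docker run command.
    ok = wait_for_port("localhost", 7687, timeout=60)
    print("Neo4j Bolt port reachable" if ok else "Neo4j did not start in time")
```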

GitLab Setup

VocPopuli uses GitLab to store and version vocabulary entries. The advantage of this is that the vocabulary owner always retains full control over the vocabulary's access and visibility rights (an administrator can even block VocPopuli's access to it, if desired). For VocPopuli to integrate with GitLab, the specific VocPopuli installation first needs to be registered as a GitLab application. For this, consult the following guide.

The following redirect URIs should be defined during the application's registration:

https://your-domain-name.com/login/gitlab
https://your-domain-name.com/login/gitlab/authorized

https://your-domain-name.com is a placeholder for the domain of the server hosting VocPopuli. For development purposes, it can be set to http://localhost:5000.
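Both redirect URIs share the same base URL, so they can be derived from it, for example when switching between http://localhost:5000 during development and the production domain. The helper below is hypothetical and only illustrates how the two URIs relate to the base URL.

```python
from urllib.parse import urljoin

# Relative paths of the two redirect URIs listed above.
REDIRECT_PATHS = ("login/gitlab", "login/gitlab/authorized")


def redirect_uris(base_url):
    """Build the GitLab OAuth redirect URIs for a given VocPopuli base URL."""
    # Ensure exactly one trailing slash so urljoin appends instead of replacing.
    base = base_url.rstrip("/") + "/"
    return [urljoin(base, path) for path in REDIRECT_PATHS]


if __name__ == "__main__":
    for uri in redirect_uris("http://localhost:5000"):
        print(uri)
```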

After the app has been registered in GitLab, its Application ID and Secret should be stored on the machine running VocPopuli as environment variables named OAUTH_CLIENT_ID and OAUTH_CLIENT_SECRET, respectively. Alternatively, they can be hard-coded directly in the Config class inside config.py.
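As a sketch of how the two credentials might be read, the class below takes them from the environment and falls back to None when they are not set. It is hypothetical: VocPopuli's actual Config class in config.py may be organized differently. Flask's app.config.from_object() copies the uppercase attributes of the object it is given, which is why the settings use uppercase names.

```python
import os
from typing import Mapping, Optional


class Config:
    """Hypothetical configuration object; VocPopuli's config.py may differ."""

    def __init__(self, env: Mapping[str, str] = os.environ):
        # Credentials of the registered GitLab application.
        # For local development only, a literal could replace env.get(...).
        self.OAUTH_CLIENT_ID: Optional[str] = env.get("OAUTH_CLIENT_ID")
        self.OAUTH_CLIENT_SECRET: Optional[str] = env.get("OAUTH_CLIENT_SECRET")


if __name__ == "__main__":
    cfg = Config()
    print("OAuth client ID set:", cfg.OAUTH_CLIENT_ID is not None)
```

In a Flask application, such an object could then be applied with app.config.from_object(Config()).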

VocPopuli also relies on GitLab's user management for its own user management. For that, a GitLab user needs to be selected who will act as the administrator of all vocabularies managed by a given VocPopuli instance. If a single person uses the tool alone, this person becomes the administrator. The chosen user needs to generate an access token in GitLab, which is used for some of the API calls made by the tool; the token's scope should be set to api. The access token should be copied and stored in a manually created file named .env (with no name before the extension), located in the main vocpopuli directory. The file should contain the following two lines:

VOCPOPULI_TOKEN = 'your-token-here'
VOCPOPULI_ADMIN_USER_ID = '***'

The value of VOCPOPULI_ADMIN_USER_ID is the administrator's GitLab User ID, a purely numeric value that can be found on their profile page. Alternatively, the .env file can be created by copying the .env_template file and removing the _template suffix.
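The .env file consists of simple KEY = 'value' lines. If python-dotenv is installed, Flask's command-line interface loads a .env file automatically; whether VocPopuli relies on that mechanism is not stated here. The hypothetical parser below only illustrates the expected format and checks that the admin user ID is numeric.

```python
from pathlib import Path


def parse_env_file(text):
    """Parse lines of the form KEY = 'value' into a dict.

    Minimal hypothetical parser: blank lines and '#' comments are skipped,
    whitespace around '=' is ignored, and one pair of matching single or
    double quotes around the value is removed.
    """
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Malformed line in .env: {raw_line!r}")
        value = value.strip()
        # Strip one pair of matching quotes, e.g. 'abc' -> abc.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_vocpopuli_env(path=".env"):
    """Read the .env file and check that the admin user ID is numeric."""
    values = parse_env_file(Path(path).read_text(encoding="utf-8"))
    admin_id = values.get("VOCPOPULI_ADMIN_USER_ID", "")
    if not admin_id.isdigit():
        raise ValueError("VOCPOPULI_ADMIN_USER_ID must be the numeric GitLab user ID")
    return values
```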

Neo4j Setup

In addition to GitLab, VocPopuli relies on a Neo4j graph database instance for fast querying of the vocabularies and their terms. A Neo4j instance can be set up either in the cloud or on a dedicated server. Detailed information on the topic can be found on the Neo4j website.

After an instance has been set up, its credentials need to be added to VocPopuli's configuration, as described in the Configuration section below.

Configuration

The configuration variables defining the behaviour of a given VocPopuli instance are stored inside the various Config classes in config.py. Their values can either be hard-coded in the file (not recommended for non-development purposes), or stored as environment variables.

An overview of the configuration variables can be found below:

Variable name            Description
OAUTH_CLIENT_ID          See GitLab Setup above.
OAUTH_CLIENT_SECRET      See GitLab Setup above.
SECRET_KEY               The application's secret key. A randomly generated string is recommended.
VOCABULARIES_DIR         The path of the directory used for locally storing vocabulary repositories.
SESSION_TYPE             The storage method used by Flask-Session.
SESSION_PERMANENT        See Flask-Session.
SESSION_USE_SIGNER       See Flask-Session.
CELERY_BROKER_URL        The URL of the instance running the Celery broker.
result_backend           The URL of the Celery results backend.
NEO4J_URI                The URI of the Neo4j database.
NEO4J_USERNAME           The username for the Neo4j database (default: neo4j).
NEO4J_PASSWORD           The password for the Neo4j database.

Additional variables     Description
VOCPOPULI_TOKEN          See GitLab Setup above.
VOCPOPULI_ADMIN_USER_ID  See GitLab Setup above.
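Because a missing variable often surfaces only as a confusing error at runtime, it can help to check the configuration up front. The sketch below is hypothetical: treating every variable without a documented default as required is an assumption, SESSION_PERMANENT and SESSION_USE_SIGNER are left out because Flask-Session provides defaults for them, and only the NEO4J_USERNAME default (neo4j) comes from the table above.

```python
import os
from typing import Dict, Mapping

# Assumption: every variable from the table without a documented default is required.
REQUIRED = (
    "OAUTH_CLIENT_ID",
    "OAUTH_CLIENT_SECRET",
    "SECRET_KEY",
    "VOCABULARIES_DIR",
    "SESSION_TYPE",
    "CELERY_BROKER_URL",
    "result_backend",
    "NEO4J_URI",
    "NEO4J_PASSWORD",
    "VOCPOPULI_TOKEN",
    "VOCPOPULI_ADMIN_USER_ID",
)
# Default documented in the configuration table.
DEFAULTS = {"NEO4J_USERNAME": "neo4j"}


def resolve_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the relevant settings, applying defaults; raise if any are missing."""
    settings = dict(DEFAULTS)
    for name in (*REQUIRED, *DEFAULTS):
        # Empty strings count as unset.
        if env.get(name):
            settings[name] = env[name]
    missing = [name for name in REQUIRED if name not in settings]
    if missing:
        raise RuntimeError("Missing configuration variables: " + ", ".join(missing))
    return settings
```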

Starting VocPopuli

Before VocPopuli can be started, the script load_repositories.py in the main vocpopuli directory needs to be run:

python load_repositories.py

This ensures that the vocabularies to which the administrator (VOCPOPULI_ADMIN_USER_ID) has access are loaded properly.

Afterwards, the Redis instance referenced by SESSION_TYPE, CELERY_BROKER_URL, and result_backend needs to be started. This is usually done by running the command redis-server (with additional arguments if needed) in a new console window.
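Redis-backed Celery brokers and result backends are usually addressed with URLs of the form redis://host:port/db. Assuming that form, the hypothetical helper below splits such a URL into its parts, with 6379 (Redis's standard port) and database 0 as defaults. Comparing the host and port of CELERY_BROKER_URL and result_backend is a quick way to confirm that both point at the same redis-server.

```python
from urllib.parse import urlparse

DEFAULT_REDIS_PORT = 6379  # Redis's standard port


def parse_redis_url(url):
    """Split a redis://host:port/db URL into (host, port, db)."""
    parts = urlparse(url)
    if parts.scheme not in ("redis", "rediss"):
        raise ValueError(f"Not a Redis URL: {url!r}")
    host = parts.hostname or "localhost"
    port = parts.port or DEFAULT_REDIS_PORT
    # The path holds the database number, e.g. "/0"; default to database 0.
    db_part = parts.path.lstrip("/")
    db = int(db_part) if db_part else 0
    return host, port, db
```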

Next, the following environment variables need to be set so that the Flask application containing VocPopuli can be started: FLASK_APP=vocpopuli.py and FLASK_DEBUG=1 (the latter enables debug mode and is intended for development).
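When VocPopuli's commands are launched from a Python script rather than an interactive shell, the two variables can be passed to a subprocess instead of being exported. A minimal sketch with a hypothetical flask_env helper:

```python
import os


def flask_env(base=None, debug=True):
    """Return a copy of the environment with FLASK_APP and FLASK_DEBUG set."""
    env = dict(os.environ if base is None else base)
    env["FLASK_APP"] = "vocpopuli.py"
    # FLASK_DEBUG=1 enables Flask's debug mode; use it for development only.
    env["FLASK_DEBUG"] = "1" if debug else "0"
    return env
```

For example, subprocess.run(["flask", "run"], env=flask_env(), check=True) could be executed from inside the vocpopuli directory. Since flask migrate is not one of Flask's built-in commands, it is presumably provided by the application and would need FLASK_APP as well.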

After that, a Celery worker needs to be started from inside the main vocpopuli directory by running the following command in a new console window:

celery -A celery_worker.celery worker --loglevel=info

When running VocPopuli on Windows, the previous command needs to be changed as follows:

celery -A celery_worker.celery worker --loglevel=info --pool=solo
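The only difference between the two commands is the --pool=solo option. A hypothetical helper can pick the right variant from sys.platform. The solo pool runs tasks in the worker's main process, which is the usual workaround because Celery's default prefork pool does not work reliably on Windows.

```python
import sys


def celery_worker_command(platform=sys.platform, loglevel="info"):
    """Build the Celery worker command from the README for the given platform."""
    cmd = ["celery", "-A", "celery_worker.celery", "worker", f"--loglevel={loglevel}"]
    # sys.platform is "win32" on Windows; only there is --pool=solo added.
    if platform.startswith("win"):
        cmd.append("--pool=solo")
    return cmd
```

The resulting list could be handed to subprocess.Popen with cwd set to the vocpopuli directory.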

Next, make sure that the Neo4j database is running. For local development, you can use Neo4j Desktop or the neo4j Docker image. To apply any outstanding migrations, make sure that the environment variables starting with NEO4J_ are exported, and run:

flask migrate

VocPopuli can now be started by running the flask run command inside the vocpopuli directory and opening http://localhost:5000 in a browser.

Setting up an initial vocabulary

Currently, an initial empty vocabulary can be created from inside VocPopuli by clicking 'Import/Create vocabulary' in the navigation bar. After a name for the vocabulary has been entered and the form submitted, the vocabulary is created automatically and exported to GitLab. No JSON files need to be uploaded when creating a new, empty vocabulary.

Afterwards, new terms can be added using the 'New term' option in the navigation bar.

Version History

  • 0.0.2 Multiple improvements; most notably, the inclusion of a graph database for faster performance
  • 0.0.1 Initial working version

Remarks

  • VocPopuli is in its early development stages. Bugs are to be expected.
  • Currently, the software has been tested mainly on macOS and Ubuntu. Windows compatibility might be limited.

Funding

VocPopuli is funded by the Initiative and Networking Fund of the Helmholtz Association in the framework of the Helmholtz Metadata Collaboration project call.

Files

vocpopuli-main.zip (898.6 kB)
md5:603de26bee26179ff6e8664c4b6cbb43

Additional details

Related works

Is derived from
Software: https://gitlab.com/metacook/vocpopuli