TaxonWizard
The TaxonWizard has the task to build, update and extend the species list.
It also monitors the list for errors.
Administration lists
In order for the TaxonWizard to build the species list, some administration lists are needed.
The final species list is generated on the basis of these lists.
The following shows the status of the TaxonWizard's admin lists.
TaxBase
Structure: This list contains the base set of Aphia Ids for the final species list.
Mandatory fields: AphiaId
Meaning: It represents the minimum set of taxa for the final species list.
TaxAaid (expert knowledge)
Structure: In this list, Aphia Ids are mapped to divergent Accepted Aphia Ids.
Mandatory fields: aid (Aphia Id), aaid (Accepted Aphia Id).
Meaning: The WoRMS assignment of accepted taxa is adjusted where in-house expert opinion differs from it.
TaxPrivate (expert opinion)
Structure: In this list, alternative taxon names are mapped to Aphia Ids.
Mandatory fields: private_name (alternative name), aid (Aphia Id)
Purpose: Unofficial taxon names, e.g. those found in historical datasets, are mapped to Aphia Ids.
This makes such data reachable by the front-end search of the species list.
A name is looked up not only directly in the species list but also in this list.
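The two-stage name lookup described above could be sketched as follows. This is a minimal illustration, not the TaxonWizard's actual search code: the function `resolve_name`, the dictionary layout and the Aphia Id values are assumptions made up for the example.

```python
from typing import Dict, Optional

# Hypothetical in-memory stand-ins; the Aphia Ids are invented for illustration.
species_names: Dict[str, int] = {"Abra alba": 1001}       # scientificname -> aid
tax_private: Dict[str, int] = {"historic name A": 1001}   # private_name -> aid


def resolve_name(name: str,
                 species: Dict[str, int],
                 private: Dict[str, int]) -> Optional[int]:
    """Return the Aphia Id for a name, or None if the name is unknown.

    The species list is consulted first; TaxPrivate serves as a fallback
    for unofficial names.
    """
    if name in species:
        return species[name]
    return private.get(name)


official = resolve_name("Abra alba", species_names, tax_private)
historic = resolve_name("historic name A", species_names, tax_private)
unknown = resolve_name("no such name", species_names, tax_private)
```

Both the official and the historic name resolve to the same Aphia Id, while an unknown name yields `None`.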
TaxColony (expert knowledge)
Structure/meaning: This list contains the Aphia Ids of taxa declared as colony-living organisms.
Mandatory fields: aid (Aphia Id).
TaxChange (expert knowledge)
Structure: Aphia Ids are assigned the following data:
- Validity field (validity): includes the datasets that are valid for the assignment
- Group name of the new taxon complex (treated)
- Workflow abbreviation: defines the type of data, e.g. Arctic, Antarctic, North Sea
- Date field (date_revision): temporal validity range of the mapping
- Source field (source): Scientific source from which the mapping is derived
Mandatory fields: aid (Aphia Id), treated (group name)
Background: Over time, taxa may be reclassified or classified more precisely.
For example, suppose the taxon with Aphia Id x was measured at time 0 at location X.
At time 1, a taxon with Aphia Id y is determined, and due to a reclassification it turns out that the taxon x
measured at location X is actually a mixture of taxon x and taxon y.
This reclassification is handled by grouping the Aphia Ids x and y into a taxon complex.
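The grouping into a taxon complex can be illustrated with a small sketch. It uses only the two mandatory fields `aid` and `treated`; the optional fields (validity, workflow, date_revision, source) are ignored here, and the row layout, the Aphia Id values and the function names are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set

# Hypothetical TaxChange rows; Aphia Ids 1 and 2 stand for x and y.
tax_change: List[dict] = [
    {"aid": 1, "treated": "x/y complex"},
    {"aid": 2, "treated": "x/y complex"},
]


def build_complexes(rows: List[dict]) -> Dict[str, Set[int]]:
    """Group Aphia Ids by the name of their taxon complex (treated)."""
    complexes: Dict[str, Set[int]] = defaultdict(set)
    for row in rows:
        complexes[row["treated"]].add(row["aid"])
    return dict(complexes)


def complex_of(aid: int, rows: List[dict]) -> Optional[str]:
    """Return the complex name for an Aphia Id, or None if it is in no complex."""
    for row in rows:
        if row["aid"] == aid:
            return row["treated"]
    return None


complexes = build_complexes(tax_change)
```

With such a mapping, records for x and y can be evaluated together under the complex name instead of separately.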
TaxOut
Structure: ScientificNames are mapped to external databases.
Mandatory fields: scientificname (taxon name)
Background: Some taxa in the database are not in WoRMS but should remain in the database.
So that they do not fall through the grid, they need an artificial Aphia Id (e.g. Aphia Id: -2,
Scientificname: variable, or can also be found in the TaxOut list).
This way they can be introduced into the system and linked to the corresponding external databases by a search
in the TaxOut list.
Alternatively, a WoRMS-compatible manual species list entry could be generated here so that these taxa remain
usable in evaluations.
Species list
For each taxon, the species list generated from the administration lists contains a subset of all attributes provided
by WoRMS plus the following additional information:
- Accepted Aphia Id (aaid)
- Accepted Scientific Name (aname)
- is_colony
The Accepted Aphia Id has either been determined via WoRMS or has been added using the admin list TaxAaid.
It is added to the taxon list for simplicity.
The same applies to the Accepted Scientificname.
The entry in the is_colony column is taken from the administration table TaxColony.
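How the three additional columns could be derived for a single taxon is sketched below. This is an illustration under stated assumptions, not the TaxonWizard's implementation: the record layout, the helper `build_entry`, the name lookup via `names_by_aid` and all Aphia Id values are hypothetical.

```python
from typing import Dict, Optional, Set


def build_entry(worms_record: dict,
                tax_aaid: Dict[int, int],
                tax_colony: Set[int],
                names_by_aid: Dict[int, str]) -> dict:
    """Extend a WoRMS record with aaid, aname and is_colony.

    An entry in TaxAaid (aid -> aaid) takes precedence over the accepted
    Aphia Id delivered by WoRMS.
    """
    aid = worms_record["aid"]
    aaid = tax_aaid.get(aid, worms_record["aaid"])
    aname: Optional[str] = names_by_aid.get(aaid)
    return {
        **worms_record,
        "aaid": aaid,
        "aname": aname,
        "is_colony": aid in tax_colony,
    }


# Invented example data.
names_by_aid = {10: "Taxon a", 20: "Taxon b"}
record = {"aid": 10, "scientificname": "Taxon a", "aaid": 10}

entry_plain = build_entry(record, tax_aaid={}, tax_colony=set(), names_by_aid=names_by_aid)
entry_expert = build_entry(record, tax_aaid={10: 20}, tax_colony={10}, names_by_aid=names_by_aid)
```

Without expert entries the WoRMS values are kept; with a TaxAaid mapping the accepted id and name switch to the expert choice.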
Below a generated species list is shown.
Building Species List
In CRITTERBASE the species list is handled with great care.
The user cannot directly modify or create the taxon list.
Only the TaxonWizard is able to do this.
The user has only two ways to send taxa to the TaxonWizard in order to build up the species list:
- Direct import of the TaxonWizard's base list
- Import via the biota sheet during the ingest process
Below you can see the import dialog window for the TaxonWizard's administration lists.
Below you can see a base list.
Below you can see the dialog window for ingestion of North Sea data.
Updating the species list and detecting problems
During the update or assembly process of a species list, the following quality tests are performed:
Ambiguous scientificname entries
If the same scientific name occurs in the admin lists ...
- TaxPrivate, TaxOut and in the species list or in ...
- TaxPrivate and in TaxOut
... an error is reported.
This prevents a name-based taxon search in the taxon system from returning ambiguous results, either within the
imposed expert knowledge itself or in combination with the taxon list.
For example, the alternative nomenclature of TaxPrivate must not contain scientific names that also exist
in the species list; e.g. the name Abra alba must not exist as an alternative name in TaxPrivate
if it also exists in the species list.
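The ambiguity test can be expressed as set intersections. The sketch below reads the first condition as "a name from TaxPrivate or TaxOut that also occurs in the species list"; this reading, the function name and the example names (apart from Abra alba) are assumptions.

```python
from typing import Iterable, List


def find_ambiguous_names(species_names: Iterable[str],
                         private_names: Iterable[str],
                         out_names: Iterable[str]) -> List[str]:
    """Return all names that would make a name-based search ambiguous."""
    species = set(species_names)
    private = set(private_names)
    out = set(out_names)
    # Admin-list names colliding with the species list ...
    clashes_with_species = (private | out) & species
    # ... and names colliding between the two admin lists.
    clashes_between_lists = private & out
    return sorted(clashes_with_species | clashes_between_lists)


errors = find_ambiguous_names(
    species_names=["Abra alba", "Taxon b"],
    private_names=["Abra alba", "old name"],
    out_names=["old name", "Taxon c"],
)
```

Here "Abra alba" is reported because it is both a species list name and an alternative name, and "old name" because it occurs in both admin lists.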
Missing ScientificName or Accepted ScientificName
If the ScientificName or Accepted ScientificName field of an entry t in the taxon list is empty (no value entered),
a warning is issued.
Such gaps can arise later through changes in the WoRMS dataset.
For example, taxon t changes status from accepted to unaccepted and refers to a new accepted taxon.
WoRMS then deletes the corresponding scientific name instead of leaving it in the now unaccepted t.
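A gap check of this kind could look like the following sketch. The field names `scientificname` and `aname`, the entry layout and the function name are assumptions for illustration.

```python
from typing import List, Tuple

NAME_FIELDS = ("scientificname", "aname")


def find_name_gaps(taxa: List[dict]) -> List[Tuple[int, List[str]]]:
    """Return (aid, missing fields) for every entry with an empty name field."""
    gaps = []
    for taxon in taxa:
        # None, "" and whitespace-only strings all count as "no value entered".
        missing = [f for f in NAME_FIELDS if not (taxon.get(f) or "").strip()]
        if missing:
            gaps.append((taxon["aid"], missing))
    return gaps


# Invented example entries.
taxa = [
    {"aid": 1, "scientificname": "Taxon a", "aname": "Taxon a"},
    {"aid": 2, "scientificname": "", "aname": "Taxon a"},
    {"aid": 3, "scientificname": None, "aname": None},
]
warnings = find_name_gaps(taxa)
```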
This results in the following problem:
At the time a record with such a taxon entry t was imported, its scientific name was in the species list.
A backend search for this name in the taxon system returned the matching Aphia Id, which could then be looked up in the
biota table.
Now the lookup by name is blocked.
The only way to find out which taxon the user meant at the time of import is then the explicit,
optional additional entry of the scientific name in the biota table.
The systematic search path via the taxon system has thus been undermined after the fact.
Two possible solutions are discussed:
Either an additional admin table (e.g. TaxGapFix) could be created which solves this WoRMS-induced problem.
Alternatively, after a named search in the species system fails, the entire dataset could be searched for optional
taxon names (biota.given_taxon_name).
At the moment we prefer the first option, because the system then continues to search for this species in the taxon system
independently of the actual measurement data (biota) until the corresponding Aphia Id is found.
This Aphia Id is then used to search the measurement data.
Duplicate names in the ScientificName field
It is checked whether taxa with different Aphia Ids and the same scientific name exist.
A search using the ScientificName would then be ambiguous.
This case can be quite legitimate, since, for example, different
non-accepted taxa in an Aphia Id cascade may share the same scientific name.
A corresponding notice is nevertheless generated.
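A duplicate-name check can be sketched by grouping Aphia Ids per name. The entry layout and the function name are assumptions; the `field` parameter is included so that the same helper could also be applied to the Accepted ScientificName column.

```python
from collections import defaultdict
from typing import Dict, List


def find_duplicate_names(taxa: List[dict],
                         field: str = "scientificname") -> Dict[str, List[int]]:
    """Return every name in `field` that is shared by more than one Aphia Id."""
    aids_by_name = defaultdict(set)
    for taxon in taxa:
        name = taxon.get(field)
        if name:  # empty names are handled by the gap check, not here
            aids_by_name[name].add(taxon["aid"])
    return {name: sorted(aids)
            for name, aids in aids_by_name.items()
            if len(aids) > 1}


# Invented example entries.
taxa = [
    {"aid": 1, "scientificname": "Taxon a"},
    {"aid": 2, "scientificname": "Taxon a"},
    {"aid": 3, "scientificname": "Taxon b"},
]
duplicates = find_duplicate_names(taxa)
```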
Duplicate names in the Accepted ScientificName field
It is checked whether taxa with different Aphia Ids and the same Accepted ScientificName exist.
A search using the Accepted ScientificName would then also be ambiguous.
This case is an error and should not occur.
The cause may be an error in the WoRMS database.
A corresponding warning is generated.
No solution: How this circumstance can be fixed is not yet clarified.
Taxa with misleading Accepted AphiaIds
A check is made for totally unaccepted taxon entries in the species list, i.e. entries that do not lead to an
accepted entry via the Aphia Id cascade.
For details see the misfit section in chapter MyWoRMS.
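Following the Aphia Id cascade can be illustrated with a short sketch. The exact misfit definitions are given in the MyWoRMS chapter and are not reproduced here; the data layout (aid mapped to `aaid` and `status`), the status strings and the function names are assumptions.

```python
from typing import Dict, List, Optional


def resolve_accepted(aid: int, taxa: Dict[int, dict]) -> Optional[int]:
    """Follow aid -> aaid links until an accepted entry is reached.

    Returns the accepted Aphia Id, or None if the cascade ends in a
    missing entry or runs in a circle.
    """
    seen = set()
    current = aid
    while current in taxa and current not in seen:
        seen.add(current)
        entry = taxa[current]
        if entry["status"] == "accepted":
            return current
        current = entry["aaid"]
    return None


def find_totally_unaccepted(taxa: Dict[int, dict]) -> List[int]:
    """List all Aphia Ids whose cascade never reaches an accepted entry."""
    return [aid for aid in taxa if resolve_accepted(aid, taxa) is None]


# Invented example cascade.
taxa = {
    1: {"aaid": 2, "status": "unaccepted"},
    2: {"aaid": 2, "status": "accepted"},
    3: {"aaid": 4, "status": "unaccepted"},   # 3 and 4 point at each other
    4: {"aaid": 3, "status": "unaccepted"},
    5: {"aaid": None, "status": "unaccepted"},  # dead end
}
misfits = find_totally_unaccepted(taxa)
```

Entry 1 resolves to the accepted entry 2, whereas 3, 4 and 5 never reach an accepted taxon and would be reported.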
Taxa with other configuration problems
All taxa are checked for correct configuration of AphiaIds, names and status.
For details see the misfit section in chapter MyWoRMS.
Taxa in Limbo
Furthermore, reference is made to taxa that are in a so-called limbo status.
For details see the misfit section in chapter MyWoRMS.
The following is the result log of a taxon list update process.