JSON schema for occupational substance exposure named entity corpus

Type: array of object

Schema for a corpus annotated with named entities relating to occupational substance exposures

No Additional Items

Each item of this array must be:


Article annotated with named entities relating to occupational substance exposures

No Additional Properties

Type: object

The following properties are required:

  • DOI
Type: object

The following properties are required:

  • URL

Type: string

Unique identifier for annotated document

Type: enum (of string)

Main type of substance exposure discussed in the document: diesel exhaust or respirable crystalline silica (RCS)

Must be one of:

  • "diesel-exhaust"
  • "rcs"

Type: string

PubMed PMID for the document (if document is indexed by PubMed)

Type: string

Digital Object Identifier (DOI) for the document, if available

Type: string

URL where document can be accessed, if no DOI is available

Type: string

Text corresponding to the subsections of the document that have been annotated, i.e., the abstract/summary, Methods section and Results section

Type: array of object

An array of sentences within 'doc_text'

No Additional Items

Each item of this array must be:

Type: object

Properties of individual sentences in 'doc_text'

No Additional Properties

Type: string

Unique identifier for sentence

Type: number

Start character offset of sentence in 'doc_text'

Type: number

End character offset of sentence in 'doc_text'

Type: string

Text covered by the sentence
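The relationship between a sentence's offsets and its text can be checked mechanically. The following is a minimal sketch, not part of the schema itself: the key names `start`, `end` and `text` are hypothetical (this description does not give the property names), and it assumes that end offsets are exclusive, as in Python slicing.

```python
def sentence_matches(doc_text, sentence):
    """Return True if the sentence text equals the slice of doc_text it claims to cover.

    Assumptions (not confirmed by the schema description):
    - the sentence object uses the hypothetical keys "start", "end" and "text";
    - "end" is an exclusive character offset.
    """
    return doc_text[sentence["start"]:sentence["end"]] == sentence["text"]


# Illustrative data only; not taken from the corpus.
doc_text = "Silica dust was measured. Workers wore samplers."
sentences = [
    {"start": 0, "end": 25, "text": "Silica dust was measured."},
    {"start": 26, "end": 48, "text": "Workers wore samplers."},
]

# Collect any sentences whose offsets do not line up with their text.
mismatches = [s for s in sentences if not sentence_matches(doc_text, s)]
```

If the corpus turns out to use inclusive end offsets, the slice would need `sentence["end"] + 1` instead.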

Type: array of object

Array of named entity annotations in document

No Additional Items

Each item of this array must be:

Type: object

Properties of a named entity annotation

No Additional Properties

Type: string

Unique identifier for the named entity annotation

Type: array of object

Array of text spans in 'doc_text' that constitute the named entity annotation. Annotation spans may be continuous or discontinuous; a discontinuous annotation consists of two or more non-contiguous text spans

No Additional Items

Each item of this array must be:

Type: object

Properties of a span that constitutes or forms part of a named entity annotation

No Additional Properties

Type: number

Start character offset of span in 'doc_text'

Type: number

End character offset of span in 'doc_text'

Type: string

Text covered by the named entity annotation. For annotations with discontinuous spans, the value of this attribute is created by concatenating the non-contiguous spans, separated by spaces.
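The concatenation rule for discontinuous annotations can be sketched as follows. This is an illustration under stated assumptions rather than the corpus authors' implementation: the span keys `start` and `end` are hypothetical names, end offsets are assumed to be exclusive, and spans are assumed to be joined in document order.

```python
def annotation_text(doc_text, spans):
    """Rebuild the text of an annotation from its spans.

    Assumptions (not confirmed by the schema description):
    - each span uses the hypothetical keys "start" and "end";
    - "end" is an exclusive character offset;
    - spans are concatenated in order of their start offset.
    """
    ordered = sorted(spans, key=lambda span: span["start"])
    return " ".join(doc_text[span["start"]:span["end"]] for span in ordered)


# Illustrative data only; not taken from the corpus.
doc_text = "personal and area samples"

# A continuous annotation has a single span.
continuous = annotation_text(doc_text, [{"start": 13, "end": 25}])

# A discontinuous annotation skips the words between its spans.
discontinuous = annotation_text(
    doc_text,
    [{"start": 0, "end": 8}, {"start": 18, "end": 25}],
)
```

Here `continuous` is "area samples" and `discontinuous` is "personal samples", the latter joined with a single space as the description above specifies.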

Type: enum (of string)

Semantic category of the named entity annotation

Must be one of:

  • "IndustryWorkplace"
  • "OccupationJobTitle"
  • "JobTaskactivity"
  • "SubstanceOrExposureMeasured"
  • "OHMeasurementDevice"
  • "SampleTypePersonal"