Utterance

Validate against: http://json-schema.org/schema#

Description

The term utterance is intentionally ambiguous, and refers to any unit of a text above the word level. The DLx framework imposes no requirements regarding the size of this unit or how the text should be segmented into units. The user may choose to segment a text based on prosodic units, turns, sentences, or any other appropriate subdivision.

Type: object

Required Properties

  • transcription
  • translation

Additional properties: true

Dependencies

  • If the property startTime is present, the following properties must also be present:

    • endTime
  • If the property endTime is present, the following properties must also be present:

    • startTime
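The required properties and the startTime/endTime dependency above can be checked with a few lines of code. The following is a minimal sketch, not a full schema validator: the function name check_utterance is hypothetical, and the property values are placeholders, since only the presence of each key is tested here.

```python
# Properties that every Utterance must carry, per the list above.
REQUIRED = ("transcription", "translation")


def check_utterance(utterance: dict) -> list:
    """Return a list of problems; an empty list means these basic checks pass."""
    problems = [
        f"missing required property: {name}"
        for name in REQUIRED
        if name not in utterance
    ]

    # startTime and endTime must appear together or not at all.
    has_start = "startTime" in utterance
    has_end = "endTime" in utterance
    if has_start != has_end:
        present, absent = (
            ("startTime", "endTime") if has_start else ("endTime", "startTime")
        )
        problems.append(f"{present} is present, so {absent} must be present too")

    return problems


if __name__ == "__main__":
    # Placeholder values: only key presence matters for this sketch.
    sample = {"transcription": {}, "translation": {}, "startTime": 1.25}
    for problem in check_utterance(sample):
        print(problem)
```

Because additional properties are allowed, the sketch deliberately ignores any keys it does not know about.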

Properties

  • Type: "type"

    Description

    The type of object. Must be set to Utterance.

    Type: string

  • Key: "key"

    Description

    A key which uniquely identifies this Utterance within the Text. The key for an Utterance consists of the abbreviation of the Text, a period, and then the number of this Utterance within the Text (index starts at 1). For example, the third Utterance of a Text with the abbreviation A would be A.3. Keys should be unique within a corpus.

    Type: string

    Regular expression pattern: ^[(a-z)|(A-Z)|(0-9)]+\.[0-9]{1,3}$
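As an illustration of the key format, the sketch below builds a key from a Text abbreviation and a 1-based index, then tests it against the pattern above. The helper names make_key and is_valid_key are hypothetical. The pattern is copied verbatim from this schema; note that it allows at most three digits after the period, and that inside the square brackets the parentheses and the pipe are literal characters, so the pattern as written accepts them in the abbreviation as well.

```python
import re

# Pattern copied verbatim from the schema. Inside [...] the characters
# "(", ")" and "|" are literals, so they are accepted alongside a-z, A-Z, 0-9.
KEY_PATTERN = re.compile(r"^[(a-z)|(A-Z)|(0-9)]+\.[0-9]{1,3}$")


def make_key(text_abbreviation: str, index: int) -> str:
    """Join the Text abbreviation and the 1-based Utterance number with a period."""
    return f"{text_abbreviation}.{index}"


def is_valid_key(key: str) -> bool:
    """Check a key against the schema's regular expression."""
    return KEY_PATTERN.match(key) is not None


if __name__ == "__main__":
    key = make_key("A", 3)
    print(key, is_valid_key(key))
```

A key such as A.1000 does not match, because the pattern permits only one to three digits for the Utterance number.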

  • End Time: "endTime"

    Description

    The time at which the speaker finishes producing this Utterance within the media file(s) associated with this Text. The timestamp should be formatted as SS.MMM (seconds and milliseconds).

    Type: number

    Minimum: 0.001

  • Language: "language"

    Description

    The key for the Language used in this Utterance, e.g. spa or eng. If the Text is labeled with a Language, all its Utterances are assumed to be in the same Language unless labeled otherwise. Likewise, if an Utterance is given a Language, all its words are assumed to be in the same Language unless a word is labeled otherwise.

    Must be an instance of the Abbreviation schema.
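The inheritance rule described for language can be sketched as a lookup that tries the most specific object first. This is only an illustration: the function name resolve_language is hypothetical, and it assumes that Text and Word objects store their Language under a "language" property, as the Utterance does.

```python
def resolve_language(text: dict, utterance: dict, word: dict = None):
    """Return the Language key that applies, or None if nothing is labeled.

    Lookup order: Word, then Utterance, then Text. The "language" property
    name on Text and Word objects is an assumption of this sketch.
    """
    for obj in (word, utterance, text):
        if obj is not None and "language" in obj:
            return obj["language"]
    return None


if __name__ == "__main__":
    text = {"language": "spa"}
    utterance = {}                  # inherits "spa" from the Text
    word = {"language": "eng"}      # an English word inside a Spanish Utterance
    print(resolve_language(text, utterance))
    print(resolve_language(text, utterance, word))
```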

  • Link: "link"

    Description

    A URL where a presentational format for this resource may be viewed

    Type: string

    Format: uri

  • Notes: "notes"

    Description

    A collection of notes about this Utterance

    Type: array

    Unique items: true

    Items

    Note

    Description

    A note about this Utterance

    Must be an instance of the Note schema.

  • Speaker: "speaker"

    Description

    The abbreviation of the person who produced (uttered, signed, spoke, sang) this Utterance. The value of this field must match the abbreviation of one of the people listed in the contributors array of the Text. If the Text has a single contributor with the role of speaker, that speaker is assumed to be the speaker for all Utterances in the Text. If multiple contributors with a speaker role are included in a Text, each Utterance must have its speaker attribute specified.

    Must be an instance of the Abbreviation schema.
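The speaker rules can be sketched as follows. The shape of the entries in the contributors array is not defined in this section, so the "abbreviation" and "role" property names used here are assumptions, as is the function name resolve_speaker. Returning None when no speaker can be determined is also a choice made for this sketch.

```python
def resolve_speaker(text: dict, utterance: dict):
    """Work out who produced an Utterance, following the rules described above.

    Assumes each contributor is a dict with "abbreviation" and "role" keys
    (hypothetical names; the contributor shape is not given in this section).
    """
    contributors = text.get("contributors", [])

    if "speaker" in utterance:
        # An explicit speaker must match one of the Text's contributors.
        known = {c.get("abbreviation") for c in contributors}
        if utterance["speaker"] not in known:
            raise ValueError(f"unknown speaker: {utterance['speaker']}")
        return utterance["speaker"]

    speakers = [c.get("abbreviation") for c in contributors if c.get("role") == "speaker"]
    if len(speakers) == 1:
        # A single speaker is assumed for every Utterance in the Text.
        return speakers[0]
    if len(speakers) > 1:
        raise ValueError("multiple speakers in Text: Utterance must specify its speaker")
    return None


if __name__ == "__main__":
    text = {"contributors": [{"abbreviation": "JS", "role": "speaker"}]}
    print(resolve_speaker(text, {}))
```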

  • Start Time: "startTime"

    Description

    The time at which the speaker begins producing this Utterance within the media file(s) associated with this Text. The timestamp should be formatted as SS.MMM (seconds and milliseconds).

    Type: number
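Since startTime and endTime are numbers in seconds with millisecond precision, a timestamp read off a media player as minutes, seconds, and milliseconds has to be converted first. The sketch below shows one way to do that, along with a check for the endTime minimum of 0.001. Both function names are hypothetical.

```python
END_TIME_MINIMUM = 0.001  # minimum given for endTime in this schema


def to_timestamp(minutes: int, seconds: int, milliseconds: int) -> float:
    """Convert a minutes/seconds/milliseconds reading to seconds (SS.MMM)."""
    # Sum in whole milliseconds first so that only one division is needed.
    total_ms = (minutes * 60 + seconds) * 1000 + milliseconds
    return total_ms / 1000


def is_valid_end_time(value) -> bool:
    """Check that a value is a number no smaller than the endTime minimum."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return is_number and value >= END_TIME_MINIMUM


if __name__ == "__main__":
    end = to_timestamp(1, 2, 345)
    print(end, is_valid_end_time(end))
```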

  • Tags: "tags"

    Description

    A set of tags for this Utterance

    Must be an instance of the Tags schema.

  • Transcript: "transcript"

    Description

    A transcript of this Utterance, including things like prosodic markup, overlap, pauses, and various other discourse features. This field is intended for use by those doing discourse or conversation analysis, who need to mark up their text without affecting the phonemic transcription (in the transcription property). The transcript may be given in multiple orthographies or representational systems (e.g. a CA transcript and a DT transcript, for transcripts using Conversation Analysis and Discourse Transcription conventions respectively).

    Must be an instance of the Transcription schema.

  • Transcription: "transcription"

    Description

    The transcriptions for this Utterance, optionally in multiple orthographies. This field is intended for use with purely phonemic / morphophonemic transcriptions. Punctuation should generally be avoided. To add punctuation and other discourse-level transcriptional features, use the transcript property. The transcription must be provided in at least one orthography.

    Must be an instance of the Transcription schema.

  • Translation: "translation"

    Description

    The translations for this Utterance, optionally in multiple languages. Also includes an optional type attribute, for specifying things like free or literal translation. The translation must be provided in at least one language.

    Must be an instance of the Translation schema.

  • URL: "url"

    Description

    The URL where this Utterance can be retrieved in JSON format

    Type: string

    Format: uri

  • Words: "words"

    Description

    A collection of the word tokens contained in this Utterance. Tokens do not need to be unique.

    Type: array

    Unique items: false

    Items

    Word

    Description

    A Word object

    Must be an instance of the Word schema.