1 Introduction

An abbreviation list is an obligatory part of linguistic articles that nobody reads. These lists contain definitions of the abbreviations used in the article (e. g. corpora names or sign language names), as well as a list of linguistic glosses, i. e. the abbreviations used in interlinear examples. There is a standardized set of glossing rules (Comrie, Haspelmath, and Bickel 2008), which ends with a list of 84 standard abbreviations. A much bigger list is available on the Wikipedia page. However, researchers can deviate from those lists and provide their own abbreviations.

The worst abbreviation list that I have found in a published article makes it clear that there is room for improvement:

NOM = nominative, GEN = nominative, DAT = nominative, ACC = accusative, VOC = accusative, LOC = accusative, INS = accusative, PL = plural, SG = singular

Apart from the obvious mistakes in this list, there are some more problems that I want to emphasize:

  • the list is not in alphabetical order;
  • some abbreviations used in the article (sbjv, imp) are absent from the abbreviation list.

The main goal of the lingglosses R package is to provide an option for creating:

  • linguistic glosses for the .html output of rmarkdown (Xie, Allaire, and Grolemund 2018; see footnote 1);
  • a semi-automatically compiled abbreviation list.

You can install the stable version from CRAN:

install.packages("lingglosses")

You can also install the development version of lingglosses from GitHub with:

# install.packages("remotes")
remotes::install_github("agricolamz/lingglosses")

In order to use the package you need to load it with the library() call:

library(lingglosses)

You can go through the examples in this tutorial, but you can also create a lingglosses example from the rmarkdown template (File > New File > R Markdown… > From Template > lingglosses Document).

2 Create glossed examples with gloss_example()

2.1 Basic usage

The main function of the lingglosses package is gloss_example(). This function has the following arguments:

  • transliteration;
  • glosses;
  • free_translation;
  • comment;
  • orthography (see footnote 2);
  • line_length.

Except for the last one, all arguments are self-explanatory.

gloss_example(transliteration = "bur-e-**ri** c'in-ne-sːu",
              glosses = "fly-NPST-**INF** know-HAB-*NEG*",
              free_translation = "I cannot fly. (Zilo Andi, East Caucasian)",
              comment = "(lit. do not know how to)",
              orthography = "Бурери цIиннессу.")
Бурери цIиннессу.
bur-e-ri c’in-ne-sːu
fly-npst-inf know-hab-neg
(lit. do not know how to)
‘I cannot fly. (Zilo Andi, East Caucasian)’

In this first example you can see that:

  • the transliteration line is italic by default (if you do not want it, just add the italic_transliteration = FALSE argument);
  • users can use standard markdown syntax (e. g. **a** for bold and *a* for italic);
  • the free translation line is framed with quotation marks.

Since argument names are optional in R, users can omit them as long as they follow the order of the arguments (you can always find the correct order in ?gloss_example):

gloss_example("bur-e-**ri** c'in-ne-sːu",
              "fly-NPST-**INF** know-HAB-_NEG_",
              "I cannot fly. (Zilo Andi, East Caucasian)",
              "(lit. do not know how to)")
bur-e-ri c’in-ne-sːu
fly-npst-inf know-hab-neg
(lit. do not know how to)
‘I cannot fly. (Zilo Andi, East Caucasian)’

It is possible to number and reference your examples using the standard rmarkdown tool for generating lists (@):

(@) my first example
(@) my second example
(@) my third example

renders as:

  1. my first example
  2. my second example
  3. my third example

In order to reference examples in the text you need to give them some names:

(@my_ex) example for the referencing
  1. example for the referencing

Once an example has a name, you can reference it in the text, as in example (4), using the following code: (@my_ex).

This kind of example referencing can be used with lingglosses examples, as in (5) and (6). The only important details are:

  • set the code chunk option echo = FALSE (or specify it for all code chunks with the following command at the beginning of the document: knitr::opts_chunk$set(echo = FALSE));
  • do not put an empty line between the reference line (with (@...)) and the code chunk with the lingglosses code.
  1. bur-e-ri c’in-ne-sːu
    fly-npst-inf know-hab-neg
    (lit. do not know how to)
    ‘I cannot fly. (Zilo Andi, East Caucasian)’
  2. Zilo Andi, East Caucasian
    bur-e-ri c’in-ne-sːu
    fly-npst-inf know-hab-neg
    (lit. do not know how to)
    ‘I cannot fly.’
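
The source of an example like (6) is not shown above, so here is a minimal sketch of how such a document could be put together. It is written as R code that assembles the lines of an R Markdown file and saves them to a temporary location. The file name referenced_example.Rmd, the label andi_ex and the contents of the setup chunk are illustrative assumptions, not something prescribed by lingglosses. Note that the reference line is immediately followed by the code chunk, with no empty line in between.

```r
# Build a code fence programmatically so that this listing stays readable.
fence <- strrep("`", 3)

# Lines of a hypothetical minimal R Markdown document.
rmd_lines <- c(
  "---",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r, include = FALSE}"),
  "library(lingglosses)",
  "knitr::opts_chunk$set(echo = FALSE)",  # hide the code of all chunks
  fence,
  "",
  "(@andi_ex) Zilo Andi, East Caucasian",  # reference line, no empty line after it
  paste0(fence, "{r}"),
  "gloss_example(\"bur-e-ri c'in-ne-sːu\",",
  "              \"fly-NPST-INF know-HAB-NEG\",",
  "              \"I cannot fly.\")",
  fence,
  "",
  "See example (@andi_ex)."  # in-text reference to the named example
)

# Save the document to a temporary folder (hypothetical file name).
rmd_path <- file.path(tempdir(), "referenced_example.Rmd")
writeLines(rmd_lines, rmd_path)
```

The resulting file can then be knitted, e. g. with rmarkdown::render(rmd_path), provided that rmarkdown and lingglosses are installed.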

Sometimes people gloss morpheme by morpheme (this is especially useful for polysynthetic languages). This is also possible in lingglosses (and you can annotate slots with the orthography argument, see footnote 2 for the details):

  1. Abaza, West Caucasian (Arkadiev and Lander 2020: example 5.2)
gloss_example("s- z- á- la- nəq'wa -wa -dzə -j -ɕa -t'",
              "1SG.ABS POT 3SG.N.IO LOC pass IPF LOC 3SG.M.IO seem(AOR) DCL",
              "It seemed to him that I would be able to pass there.")
s- z- á- la- nəq’wa -wa -dzə -j -ɕa -t’
1sg.abs pot 3sg.n.io loc pass ipf loc 3sg.m.io seem(aor) dcl
‘It seemed to him that I would be able to pass there.’

The gloss extraction algorithm implemented in lingglosses is case sensitive, so if you want to prevent a capitalized word from being treated as a gloss, you can wrap it in curly brackets:

  1. Kvankhidatli Andi (Verhees 2019: 203)
gloss_example("den=no he.ʃː-qi hartʃ'on-k'o w-uʁi w-uk'o.",
              "{I}=ADD DEM.M-INS watch-CVB M-stand.AOR M-be.AOR",
              "And I stood there, watching him.")
den=no he.ʃː-qi hartʃ’on-k’o w-uʁi w-uk’o.
I=add dem.m-ins watch-cvb m-stand.aor m-be.aor
‘And I stood there, watching him.’

In the example above, {I} is just the English word I: it is escaped and does not appear in the gloss list as a marker of class I.
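
To make the idea of case-sensitive extraction with curly-bracket escaping more concrete, here is a rough sketch in base R. It is not the code used inside lingglosses: the function name extract_glosses_sketch() and the set of delimiters are assumptions made for illustration. The sketch drops everything in curly brackets, splits the gloss line on common delimiters and keeps only the pieces that consist of capital letters and digits.

```r
# A rough sketch, not the lingglosses implementation.
extract_glosses_sketch <- function(gloss_line) {
  # drop everything wrapped in curly brackets, e.g. "{I}"
  unescaped <- gsub("\\{[^}]*\\}", "", gloss_line)
  # split on whitespace and on common morpheme delimiters (assumed set)
  tokens <- unlist(strsplit(unescaped, "[[:space:].=():;_-]+"))
  # keep only tokens that consist of capital letters and digits
  glosses <- tokens[grepl("^[[:upper:][:digit:]]+$", tokens)]
  # unique glosses in a locale-independent order
  sort(unique(glosses), method = "radix")
}

escaped <- extract_glosses_sketch(
  "{I}=ADD DEM.M-INS watch-CVB M-stand.AOR M-be.AOR")
not_escaped <- extract_glosses_sketch(
  "I=ADD DEM.M-INS watch-CVB M-stand.AOR M-be.AOR")

escaped      # "ADD" "AOR" "CVB" "DEM" "INS" "M"
not_escaped  # "ADD" "AOR" "CVB" "DEM" "I" "INS" "M"
```

Without the curly brackets, the English pronoun I would be collected as if it were a gloss, which is exactly what the escaping prevents.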

2.2 Multiline examples

Sometimes examples are too long and do not fit on the page. In that case you need to add the argument results='asis' to your chunk, and gloss_example() will automatically split your example into multiple rows.

  1. Mishlesh Tsakhur, East Caucasian (Maisak and Tatevosov 2007: 386)
gloss_example('za-s jaːluʁ **wo-b** **qa-b-ɨ**; turs-ubɨ qal-es-di ǯiqj-eː jaːluʁ-**o-b** **qa-b-ɨ**',
              '1SG.OBL-DAT shawl.3 AUX-3 PRF-3-bring.PFV woolen_sock-PL NPL.bring-PL-A.OBL place-IN shawl.3-AUX-3 PRF-3-bring.PFV',
              '(they) **brought** me a shawl; instead of (lit. in place of bringing) woolen socks, (they) **brought** a shawl.',
              '(Woolen socks are considered to be more valuable than a shawl.)')
za-s jaːluʁ wo-b qa-b-ɨ; turs-ubɨ qal-es-di
1sg.obl-dat shawl.3 aux-3 prf-3-bring.pfv woolen_sock-pl npl.bring-pl-a.obl
ǯiqj-eː jaːluʁ-o-b qa-b-ɨ
place-in shawl.3-aux-3 prf-3-bring.pfv
(Woolen socks are considered to be more valuable than a shawl.)
‘(they) brought me a shawl; instead of (lit. in place of bringing) woolen socks, (they) brought a shawl.’

If you are not satisfied with the result of the automatic split, you can change the value of the line_length argument (the default value is 70, which means that the longest line may contain up to 70 characters).
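
The effect of line_length can be illustrated with a small sketch in base R. This is only my reading of the description above, not the algorithm used by lingglosses: the function split_example_sketch() is hypothetical, and it assumes that each aligned column is as wide as the longer of the word and its gloss, with one space between columns.

```r
# A rough sketch, not the lingglosses implementation: assign each word of an
# example to an output row so that no row is wider than line_length characters.
split_example_sketch <- function(transliteration, glosses, line_length = 70) {
  words <- strsplit(transliteration, " ", fixed = TRUE)[[1]]
  gloss_words <- strsplit(glosses, " ", fixed = TRUE)[[1]]
  stopifnot(length(words) == length(gloss_words))

  # each aligned column is as wide as the longer of the two items
  widths <- pmax(nchar(words), nchar(gloss_words))

  row <- integer(length(words))
  current_row <- 1L
  used <- 0L
  for (i in seq_along(words)) {
    # a column needs one separating space unless it starts a row
    needed <- widths[i] + as.integer(used > 0L)
    if (used > 0L && used + needed > line_length) {
      # the column does not fit, so it starts a new row
      current_row <- current_row + 1L
      used <- 0L
      needed <- widths[i]
    }
    used <- used + needed
    row[i] <- current_row
  }
  data.frame(row = row, transliteration = words, gloss = gloss_words,
             stringsAsFactors = FALSE)
}

short_limit <- split_example_sketch("bur-e-ri c'in-ne-sːu",
                                    "fly-NPST-INF know-HAB-NEG",
                                    line_length = 20)
default_limit <- split_example_sketch("bur-e-ri c'in-ne-sːu",
                                      "fly-NPST-INF know-HAB-NEG")
```

With line_length = 20 the two columns no longer fit into one row (12 + 1 + 12 = 25 characters), so the second word moves to row 2. With the default of 70 both words stay in row 1.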

2.3 In-text examples

When an example is short, the author may not want to put it in a separate paragraph and may prefer to keep it within the text. This can be achieved with standard rmarkdown inline code. The result of R code can be inserted into an rmarkdown document using backticks and a small r; for example, `r 2+2` will be rendered as 4. Currently lingglosses cannot automatically detect whether code is provided via a code chunk or inline. So if you want to use in-text glossed examples and want them to appear in the gloss list, write them using gloss_example() with the intext = TRUE argument. Here is a Turkish example from DeLancey (1997): Kemal gel-miş (Kemal come-mir), which was produced with the following inline code:

`r gloss_example("Kemal gel-miş", "Kemal come-MIR", intext = TRUE)`

In the third section I show how to create a semi-automatically compiled abbreviation list for your document, using the abbreviation list for this very document as an example. Even though the mir gloss appears only in the in-text example above, it is included in the gloss lists presented in the third section.

2.4 Stand-alone glosses with add_gloss()

Sometimes glosses are used without any example, e. g. in a table or in running text. If you want such glosses to appear in the gloss list, write them using the add_gloss() function. As an example, I adapted part of the verbal inflection paradigm of Andi (East Caucasian) from Table 2 in Verhees (2019: 199):

|     | aff  | neg      |
|-----|------|----------|
| aor | -∅   | -sːu     |
| msd | -r   | -sːu-r   |
| hab | -do  | -do-sːu  |
| fut | -dja | -do-sːja |
| inf | -du  | -du-sːu  |

which is generated using the following markdown code (see footnotes 3 and 4):

|                      | `r add_gloss("AFF")` | `r add_gloss("NEG")` |
|----------------------|----------------------|----------------------|
| `r add_gloss("AOR")` | -∅                   | *-sːu*               |
| `r add_gloss("MSD")` | *-r*                 | *-sːu-r*             |
| `r add_gloss("HAB")` | *-do*                | *-do-sːu*            |
| `r add_gloss("FUT")` | *-dja*               | *-do-sːja*           |
| `r add_gloss("INF")` | *-du*                | *-du-sːu*            |

As with the in-text example above, the abbreviation list in the third section also covers stand-alone glosses: even though the fut and msd glosses appear only in the table above, they are included in the gloss lists presented there.

2.5 Glossing Sign languages

Unfortunately, gloss extraction implemented in lingglosses is case sensitive. That makes it hard to use for glossing sign languages, because:

  1. Sign linguists usually gloss lexical items with capitalized English translations;
  2. Sign language glosses are sometimes split into two lines associated with the two hands (or even more lines if you want to account for non-manual markers);
  3. Sign language glosses should somehow be aligned with video or pictures (see the fascinating signglossR by Calle Börstell);
  4. There can be empty space in glosses;
  5. There can be placeholders that correspond to an utterance by one articulator (e. g. a hand) that is held stationary in the signing space during the articulation made by another articulator.

I will illustrate all these problems with an example from Russian Sign Language (Kimmelman 2012: 421):

gloss_example(glosses = c("LH: {CHAIR} ________",
                          "RH: {} CL:{SIT}.{ON}"),
              free_translation = "The cat sits on the chair", 
              comment = "[RSL; Eks3–12]",
              drop_transliteration = TRUE)
lh: CHAIR ________
rh: cl:SIT.ON
[RSL; Eks3–12]
‘The cat sits on the chair’

First of all, capitalized words that are not glosses are enclosed in curly brackets, so lingglosses does not treat them as glosses. The two separate gloss lines for the different hands are provided as a vector with two elements (see the c() function for vector creation). It is important to provide the drop_transliteration = TRUE argument; otherwise internal tests within the gloss_example() function will fail.

It is also possible to use pictures in a transliteration line, see example from Kazakh-Russian Sign Language (Kuznetsova et al. 2021: 51) (pictures are used with the permission of Anna Kuznetsova, the author):

gloss_example("![](when.png) ![](mom.png) ![](tired.png)",
              c("br_raise_______ {} {}",
                "chin_up_______ {} {}",
                "{WHEN} {MOM} {TIRED}"), 
              "When was mom tired?")
br_raise_______
chin_up_______
WHEN MOM TIRED
‘When was mom tired?’

The first line contains pictures in markdown format; the image files should be located in the same folder as the document (otherwise you need to specify the path to them, e. g. ![](images/your_plot.png)). The next three lines correspond to the different lines of the example, including non-manual articulation: as before, all glossing lines are stored as a vector of strings. The user can replace {} with _______ in order to show the scope of the non-manual articulation.

3 Create semi-automatic compiled abbreviation list

After you have finished your text, you can call the make_gloss_list() function in order to automatically create a list of abbreviations.

make_gloss_list()

1sg — first person singular; 3 — third person; 3sg — third person singular; a — agent-like argument of canonical transitive verb; abs — absolutive; add — additive; aff — affix; aor — aorist; aux — auxiliary; cl — clitic; cvb — converb; dat — dative; dcl — declarative; dem — demonstrative; fut — future; hab — habitual; imp — imperative; in — in a container; inf — infinitive; ins — instrumental; io — indirect object; ipf — imperfective; lh — left hand; loc — locative; m — masculine; mir — mirative; msd — masdar; n — neuter; neg — negation; np — noun phrase; npl — neutral plural; npst — non-past; obl — oblique; pfv — perfective; pl — plural; pot — potential; prf — perfect; prfx — prefix; rh — right hand; root — root; sbjv — subjunctive; sfx — suffix

This function works with the built-in dataset glosses_df, which is compiled from the Leipzig Glossing Rules, the Wikipedia page and articles from the open access journal Glossa (see footnote 5). Everybody can download and change this dataset for their own purposes. I will be thankful if you leave your proposals for changes to the dataset in the issue tracker on GitHub.

It is possible that the user will not be satisfied with the result of the make_gloss_list() function. In that case there are two possible strategies. The first strategy is to copy the result of make_gloss_list(), modify it and paste it into your rmarkdown document. The second is useful when you work on a volume dedicated to one group of languages and want to ensure that glosses are the same across all articles: you can compile your own table with the columns gloss and definition and pass it to the make_gloss_list() function. As you can see, all glosses specified in the my_abbreviations dataset changed their definitions in the output below:

my_abbreviations <- data.frame(gloss = c("NPST", "HAB", "INF", "NEG"),
                               definition = c("non-past tense", "habitual aspect", "infinitive", "negation marker"))
make_gloss_list(my_abbreviations)

1sg — first person singular; 3 — third person; 3sg — third person singular; a — agent-like argument of canonical transitive verb; abs — absolutive; add — additive; aff — affix; aor — aorist; aux — auxiliary; cl — clitic; cvb — converb; dat — dative; dcl — declarative; dem — demonstrative; fut — future; hab — habitual aspect; imp — imperative; in — in a container; inf — infinitive; ins — instrumental; io — indirect object; ipf — imperfective; lh — left hand; loc — locative; m — masculine; mir — mirative; msd — masdar; n — neuter; neg — negation marker; np — noun phrase; npl — neutral plural; npst — non-past tense; obl — oblique; pfv — perfective; pl — plural; pot — potential; prf — perfect; prfx — prefix; rh — right hand; root — root; sbjv — subjunctive; sfx — suffix
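
The effect of a custom table can be pictured as a simple override of the default definitions. The following base R sketch mimics that behaviour on a few glosses from the output above. It is an illustration only, not the code inside make_gloss_list(): the function override_definitions_sketch() and the small default_glosses table are hypothetical.

```r
# A handful of default definitions, copied from the default output above.
default_glosses <- data.frame(
  gloss = c("HAB", "INF", "NEG", "NPST", "PL"),
  definition = c("habitual", "infinitive", "negation", "non-past", "plural"),
  stringsAsFactors = FALSE
)

# The custom table from the example above.
my_abbreviations <- data.frame(
  gloss = c("NPST", "HAB", "INF", "NEG"),
  definition = c("non-past tense", "habitual aspect",
                 "infinitive", "negation marker"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace a default definition whenever the custom
# table contains the same gloss, and keep the default otherwise.
override_definitions_sketch <- function(defaults, custom) {
  position <- match(defaults$gloss, custom$gloss)
  has_custom <- !is.na(position)
  defaults$definition[has_custom] <- custom$definition[position[has_custom]]
  defaults
}

merged <- override_definitions_sketch(default_glosses, my_abbreviations)
merged
```

Glosses that are missing from the custom table (here PL) keep their default definition, which matches what happens to pl in the output above.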

Unfortunately, some glosses can have multiple meanings in different traditions (e. g. ass can be either associative plural or assertive mood). By default make_gloss_list() shows only the entries that were chosen by the package author. You can see all possibilities if you add the argument all_possible_variants = TRUE. As you can see, there are multiple possible values for aff, ass, cl, imp, in, ins, and prf:

make_gloss_list(all_possible_variants = TRUE)

1sg — first person singular; 3 — third person; 3sg — third person singular; a — agent-like argument of canonical transitive verb; abs — absolutive; add — additive; aff — affirmative; aff — affix; aor — aorist; ass — assertive; ass — associative; aux — auxiliary; cl — classifier; cl — clitic; cvb — converb; dat — dative; dcl — declarative; dem — demonstrative; fut — future; hab — habitual; imp — imperative; imp — imperfect; imp — imperfective; imp — impersonal; in — in a container; in — inclusive; in — inessive; inf — infinitive; ins — instantiated; ins — instrumental; io — indirect object; ipf — imperfective; lh — left hand; loc — locative; m — masculine; mir — mirative; msd — masdar; n — neuter; neg — negation; np — noun phrase; npl — neutral plural; npst — non-past; obl — oblique; pfv — perfective; pl — plural; pot — potential; prf — perfect; prf — perfective; prfx — prefix; rh — right hand; root — root; sbjv — subjunctive; sfx — suffix

Note that problematic glosses (those without a definition or with more than one definition) are colored. This can be switched off by adding the argument annotate_problematic = FALSE:

make_gloss_list(all_possible_variants = TRUE, annotate_problematic = FALSE)

1sg — first person singular; 3 — third person; 3sg — third person singular; a — agent-like argument of canonical transitive verb; abs — absolutive; add — additive; aff — affirmative; aff — affix; aor — aorist; ass — assertive; ass — associative; aux — auxiliary; cl — classifier; cl — clitic; cvb — converb; dat — dative; dcl — declarative; dem — demonstrative; fut — future; hab — habitual; imp — imperative; imp — imperfect; imp — imperfective; imp — impersonal; in — in a container; in — inclusive; in — inessive; inf — infinitive; ins — instantiated; ins — instrumental; io — indirect object; ipf — imperfective; lh — left hand; loc — locative; m — masculine; mir — mirative; msd — masdar; n — neuter; neg — negation; np — noun phrase; npl — neutral plural; npst — non-past; obl — oblique; pfv — perfective; pl — plural; pot — potential; prf — perfect; prf — perfective; prfx — prefix; rh — right hand; root — root; sbjv — subjunctive; sfx — suffix
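
If you maintain your own table of abbreviations, the same two kinds of problems (duplicated glosses and missing definitions) can be checked with a few lines of base R. This is a sketch under my own assumptions and not how lingglosses detects them. The entries for aff, aor and ass are taken from the output above, while xyz is a made-up gloss without a definition.

```r
# A small gloss table; "xyz" is a hypothetical gloss without a definition.
gloss_table <- data.frame(
  gloss = c("aff", "aff", "aor", "ass", "ass", "xyz"),
  definition = c("affirmative", "affix", "aorist",
                 "assertive", "associative", NA),
  stringsAsFactors = FALSE
)

# A gloss is duplicated if it occurs more than once in the table.
is_duplicated <- duplicated(gloss_table$gloss) |
  duplicated(gloss_table$gloss, fromLast = TRUE)

# A gloss is undefined if its definition is missing or empty.
is_undefined <- is.na(gloss_table$definition) | gloss_table$definition == ""

gloss_table$problematic <- is_duplicated | is_undefined
problematic_glosses <- unique(gloss_table$gloss[gloss_table$problematic])
problematic_glosses  # "aff" "ass" "xyz"
```

Such a check can be useful before passing a custom table to make_gloss_list(), since it shows which entries still need a decision.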

If you want to remove some glosses from the list, you can use the remove_glosses argument:

make_gloss_list(remove_glosses = c("1SG", "3SG"))

3 — third person; a — agent-like argument of canonical transitive verb; abs — absolutive; add — additive; aff — affix; aor — aorist; ass — associative; aux — auxiliary; cl — clitic; cvb — converb; dat — dative; dcl — declarative; dem — demonstrative; fut — future; hab — habitual; imp — imperative; in — in a container; inf — infinitive; ins — instrumental; io — indirect object; ipf — imperfective; lh — left hand; loc — locative; m — masculine; mir — mirative; msd — masdar; n — neuter; neg — negation; np — noun phrase; npl — neutral plural; npst — non-past; obl — oblique; pfv — perfective; pl — plural; pot — potential; prf — perfect; prfx — prefix; rh — right hand; root — root; sbjv — subjunctive; sfx — suffix

It is really important not to treat the results of the make_gloss_list() function as carved in stone: once the list is compiled, you can copy, modify and paste it into your document. You can try to spend time improving the output of the function, but at the final stage it is faster to correct it manually.

4 Other output formats

Right now there is no direct way of knitting lingglosses to the .docx format; however, as a workaround you can copy and paste from the .html version.

The .pdf output is possible, however there are some known restrictions:

  • markdown bold and italic annotations do not work;
  • example numbers appear above the example;
  • there is no non-breaking space in the glosses list.

So if you want to avoid those problems, the best solution is to use one of the LaTeX glossing packages listed in the first footnote and the glossaries package for automatic compilation of glosses.

References

Arkadiev, P., and Y. Lander. 2020. “The Northwest Caucasian Languages.” In The Oxford Handbook of the Languages of the Caucasus, 369–446.
Comrie, B., M. Haspelmath, and B. Bickel. 2008. “The Leipzig Glossing Rules: Conventions for Interlinear Morpheme-by-Morpheme Glosses.”
DeLancey, S. 1997. “Mirativity: The Grammatical Marking of Unexpected Information.” Linguistic Typology 1 (1): 33–52.
Goldsmith, J. 1979. “The Aims of Autosegmental Phonology.” In Current Approaches to Phonological Theory, edited by D. A. Dinnsen, 202–22. Indiana University Press Bloomington, IN.
Kimmelman, V. 2012. “Word Order in Russian Sign Language.” Sign Language Studies 12 (3): 414–45.
Kuznetsova, A., A. Imashev, M. Mukushev, A. Sandygulova, and V. Kimmelman. 2021. “Using Computer Vision to Analyze Non-Manual Marking of Questions in KRSL.” In Proceedings of the 1st International Workshop on Automatic Translation for Signed and Spoken Languages (At4ssl), 49–59. Association for Machine Translation in the Americas. https://aclanthology.org/2021.mtsummit-at4ssl.6.
Maisak, T., and S. Tatevosov. 2007. “Beyond Evidentiality and Mirativity: Evidence from Tsakhur.” In L’Énonciation médiatisée II, 377–406.
Verhees, S. 2019. “General Converbs in Andi.” Studies in Language. International Journal Sponsored by the Foundation “Foundations of Language” 43 (1): 195–230.
Xie, Y., J. J. Allaire, and G. Grolemund. 2018. R Markdown: The Definitive Guide. CRC Press.

  1. If you want to render a .pdf version, you can either use LaTeX and one of the linguistic packages developed for it (see e. g. gb4e, langsci, expex, philex), or render .html first and convert it to .pdf afterwards.

  2. It is also possible to use this tier for the annotation of words, as here:
    HL H L H
    eze a za a
    np prfx root sfx
    ‘Eze swept… (Igbo, from (Goldsmith 1979: 209))’

  3. The table generated with markdown is visually poor. There are many other ways to generate a table in R: kable() from knitr, the kableExtra package, the DT package and many others.

  4. It is easier to create Markdown or LaTeX tables in LibreOffice or MS Excel and then use an online conversion website such as https://www.tablesgenerator.com/.

  5. The script for collecting the glosses is available here. The glosses list was manually corrected and merged with glosses from other sources. Glosses of this kind are marked in the glosses_df dataset as lingglosses in the source column.