Frequency distributions of punctuation marks in English

The analysis of punctuation in philology is mainly carried out with a view to better understanding the meaning of the literature concerned. Punctuation is generally believed to play the role of ‘assisting the written language in indicating those elements of speech that cannot be conveniently set down on paper: chiefly the pause, pitch and stress in speech’ (Markwardt, 1942: 156). Many of us overlook the importance of punctuation in writing systems and tend to believe that punctuation depends only on tradition and the personal styles of writers. In fact, punctuation marks may contribute significantly to the clarity of expression. Many linguists associate punctuation with intonation, but the truth is more complex than that: punctuation marks may affect orthography, morphology, syntactic relations and semantic information, and can even influence textual structure.


Introduction
Before the 1980s, most studies on punctuation were prescriptive rather than descriptive, and much of the research belonged to philology, in most cases seeking to explain the unclear meaning of punctuation in ancient literature. For example, style guides and grammar books (e.g. Partridge, 1953) gave prescriptive accounts of punctuation. The first descriptive attempt can be found in Meyer's PhD dissertation (Meyer, 1987), where he synthesized an account of punctuation on the basis of a small English corpus. Punctuation in grammatical relations has been explored in Quirk et al. (1985: 1610-1639), Nunberg (1990), Jones (1996), Huddleston & Pullum (2002: 1731), etc. For example, Nunberg (1990) first put forward the idea of the 'linguistics of punctuation', together with an attempt to situate punctuation as a linguistic subsystem which is closely associated with lexis and grammar. He did so through the construction of two concepts: text-grammar and lexical grammar.
The basic functions of punctuation marks can be classified into two types: grammatical and rhetorical. In their grammatical functions, punctuation marks show the boundaries between segments and indicate how the segments of text are supposed to relate to one another. In their rhetorical functions, by contrast, they show the emphasis or prosody that readers want to give to a segment or a larger segment. However, the two functions have not been equally developed in the historical tradition. For one thing, although the rhetorical, linguistic and orthographical functions of punctuation are closely related to language use, few studies on punctuation have probed into the characteristics and development of language by analyzing these functions. Further, the rapid development of corpora and of natural language processing technology allows us to collect linguistic data and to investigate linguistic phenomena more scientifically. Specifically, quantitative analysis supported by these new techniques can help us better understand patterns of punctuation, although only a few studies on punctuation have so far been built on data analysis. For example, analyzing the frequency distributions of punctuation marks from synchronic and diachronic perspectives helps us discover patterns and regularities in language use.
The frequency distribution of punctuation based on large corpora has rarely been investigated before. Anyone with a little awareness can see that commas and full stops are used heavily in written language. Frequency can be treated as a ranking of the occurrences of a given phenomenon. For instance, word frequency refers to words as ranked by their frequency in language (the words with the highest frequency are well known: a, and, he ...). Similarly, various punctuation marks are used with differing frequency. Frequency plays an important role in language and in the physical world. Nature is strict in its rules and laws; hence we can often discover its patterns in the form of frequency distributions. This paper will focus on punctuation in different registers, varieties of English and the development of English. There have been numerous corpus-based descriptions of the linguistic characteristics of particular registers, which are treated as different textual categories, such as the novel, spoken language, the lecture, etc. Studies have also been made using comparisons across registers. These studies are linguistic descriptions of lexical and grammatical features (e.g. Biber, 1988, 1995; Biber et al., 1999). However, although the frequency distribution of punctuation is helpful in showing differences between registers, differences in punctuation across registers have seldom been examined. The frequency distribution of punctuation marks in different registers helps us to understand the many distinctions between these registers. Additionally, English is widely used in many countries across the globe. These Englishes differ from each other to some extent, and they are treated as varieties of English. Considering the importance of frequency distributions, we can ask whether punctuation marks differ in frequency across registers and varieties of English, and whether these patterns have changed over time.
Addressing these questions can definitely help us to gain a better understanding of English and its characteristics. With the development of new technology, we have far greater access to frequency distributions of punctuation marks and can examine them from the perspective of big data.
A thorough study of punctuation using frequency data is therefore helpful in understanding the role that punctuation marks have played in the English language from synchronic and diachronic perspectives. We will answer the following questions in this paper: 1) What statistical patterns do the frequency distributions for the different punctuation marks in English follow? 2) Are there differences in punctuation mark use across various English registers? Do punctuation marks show a similar frequency distribution in the global varieties of English? 3) What changes have these punctuation marks undergone in the last five hundred years? Are there any regularities or patterns in these changes? In what follows, some details are given about the electronic resources used for the present study. As the largest freely available corpus of American English, COCA contains more than 520 million words of text and comprises five registers. Containing texts from the 1810s to the 2000s, COHA is the largest corpus of historical American English, consisting of texts from different registers. BNC contains 100 million words of British English text from a wide range of registers. GloWbE is a large corpus of international English collected from the internet, containing about 1.9 billion words of text from twenty different countries. For further information on the corpora used, see https://corpus.byu.edu/.

Data and method
Google Books N-gram Viewer (Google Books, 2010) is an online search engine built on yearly counts of N-grams found in sources printed between 1500 and 2008 in Google's text corpora in English, Chinese and six other languages. It outputs a graph that depicts the historical changes in frequency for a particular phrase (or word). It is also currently the world's largest corpus and the only corpus that enables resolution at a fine temporal scale (yearly) over a long period of time (Michel et al., 2011). The developers of the viewer aimed to create a new approach to humanities research, which would make it possible to rigorously study the evolution of culture using distributional, quantitative data on a grand scale (Bohannon, 2010). Google N-gram Viewer thus enables further understanding of the relationship between language and its culture.
The punctuation marks explored in the current study are the common ones: period, comma, colon, semicolon, hyphen, question mark, dash, exclamation mark, parenthesis, apostrophe and slash. The frequency distributions of these punctuation marks can be obtained using the corpora mentioned above.

Patterns in the frequency distributions of punctuation marks
Three corpora (Brown, COCA and BNC) were searched to collect the frequency data of each punctuation mark. The frequency of punctuation marks calculated in the current study is relative to word count in corpora. The frequencies of punctuation marks per million are shown in Figure 1.
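The normalization described above (frequency relative to the word count of a corpus, expressed per million words) can be sketched as follows. This is a minimal illustration only: the function name `per_million` and the example counts are hypothetical and are not taken from the study.

```python
def per_million(count: int, corpus_words: int) -> float:
    """Return the frequency of a mark per one million corpus words."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    return count * 1_000_000 / corpus_words


# Hypothetical raw counts, for illustration only (not the study's data).
example_counts = {"comma": 58_000, "period": 49_500, "semicolon": 2_100}
example_corpus_words = 2_000_000

# Normalized values make corpora of different sizes comparable.
normalized = {
    mark: per_million(n, example_corpus_words)
    for mark, n in example_counts.items()
}
```

Normalizing in this way is what allows corpora of very different sizes, such as Brown and COCA, to be placed on the same y-axis.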
In Figure 1, the x-axis shows the rank according to the punctuation frequency distribution; the y-axis shows the normalized frequency per million. Statistical analysis shows that the frequency patterns in the use of punctuation marks in COCA and BNC follow a growth regression model, while they follow a power law 1 for Brown. 2 On the one hand, the growth regression model, also called the growth curve model, captures how a particular quantity increases over time. Growth curves are used in statistics to determine the type of growth pattern of the quantity, be it linear, logarithmic, or exponential. Logarithmic regression in growth curves might present a heavy-tailed distributional pattern. On the other hand, the power law distribution captures a phenomenon whereby a small number of items occur very frequently, while the great majority of items occur rarely. The power law distribution can be found in a wide variety of physical, biological, cognitive, social and artificial phenomena (Kello et al., 2010; Clauset, Shalizi & Newman, 2009).
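One simple way to compare the two candidate models on rank-frequency data is to fit each by least squares after a log transform. The sketch below is not the procedure used in the study, which does not state its fitting method or model equations. It assumes that the growth model has the exponential form ln(y) = b0 + b1·rank and that the power law has the form ln(y) = a + b·ln(rank). The function names and the example frequencies are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r2


def compare_models(freqs):
    """Fit a power law and an exponential (growth) curve to ranked frequencies.

    All frequencies must be positive, because both fits use ln(frequency).
    """
    ranked = sorted(freqs, reverse=True)
    ranks = list(range(1, len(ranked) + 1))
    log_f = [math.log(f) for f in ranked]
    log_r = [math.log(r) for r in ranks]
    return {
        "power_law": linear_fit(log_r, log_f),  # ln f = a + b * ln(rank)
        "growth": linear_fit(ranks, log_f),     # ln f = b0 + b1 * rank (assumed form)
    }


# Hypothetical per-million frequencies, for illustration only.
example = [60000, 50000, 9000, 4000, 3500, 2000, 1500, 900, 500]
fits = compare_models(example)
```

The R² values returned here are computed in log space, so they indicate which transformed relationship is closer to linear; they are not a substitute for a formal goodness-of-fit test such as the one proposed by Clauset, Shalizi & Newman (2009).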
In theory, nature, animals (including people), and even well-designed machines will naturally choose the path of least effort so as to reach the best result. The two distributions shown in the frequency of punctuation marks are both heavy-tailed distributional patterns. The power law can explain the behavior that humans make the least effort (Kello et al., 2010), and the growth model also helps explain least-effort behavior. In such a sense, there is very little difference between the two models to demonstrate the least-effort principle.

The frequency distribution of punctuation marks in different registers
After identifying the general statistical pattern of frequency distributions for punctuation marks, we can also ask whether the frequency distributions vary in the different registers. For example, academic English may use few exclamation marks so as to avoid displaying subjectivity and personal emotions in addressing serious topics. This section will examine how the frequencies of punctuation marks are distributed across various registers in English.
COCA provides five different registers for inspection. The data are depicted in Figure 2. Figure 2 shows the normalized frequency per million (y-axis) for different punctuation marks (x-axis).
A statistical test is needed to check whether the differences among these data are significant. ANOVA is a commonly used and effective method for checking this kind of significance. One-way ANOVA tests the null hypothesis that the means of several groups are all equal. In the one-way ANOVA output in Excel, if F > F crit, we reject the null hypothesis, which indicates that the difference among the values is real. This is the case here: F = 116.5 > F crit = 2.124. Therefore, we reject the null hypothesis; that is to say, in COCA, the five registers display different frequency distributions with regard to punctuation marks.
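The F statistic that the test above compares with F crit is the ratio of between-group to within-group variance. The following is a minimal sketch of that computation, not a reproduction of the Excel analysis: the text does not state which values form the groups, so the grouping and the numbers below are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical groups, for illustration only.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

The null hypothesis is rejected when the resulting F exceeds the critical value of the F distribution for (df_between, df_within) at the chosen significance level, which is the comparison that Excel reports as F versus F crit.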
Firstly, academic writing displays the fewest periods 3 , question marks and exclamation marks, but the highest figures for parentheses and semicolons. This demonstrates that sentences in the academic register are longer than in other registers. It is also clear that academic English seldom displays questions or exclamatory sentences. In addition, parentheses are used extensively in academic texts for citations, or occasionally for explanations, while they are seldom employed in other registers, probably so as not to interrupt the flow of text.
Secondly, fiction shows the highest frequencies of periods and exclamation marks. Sentences in fiction tend to be short. Exclamation marks are widely employed in fiction to convey emotions or atmosphere. Additionally, fiction uses the fewest hyphens. 4 In contrast, newspapers tend to combine words through hyphenation into new units to express up-to-date ideas and temporary concepts (see the following paragraphs and sections for further discussion of this point).
Thirdly, question marks and colons are used most frequently in spoken English (written transcriptions of recorded spoken language). In daily communication, people often use questions or exclamations, which is a sign of an informal genre used for pragmatic communication. The use of the colon is a little complicated because its functions cover many aspects, such as the annunciatory, explanatory, appositive and parallel. The colon can therefore be widely used to express complex situations when spoken language is transcribed into written form. That is why the frequency of the colon in transcribed spoken English is much higher than in other registers.

Figure 2. The distribution of punctuation marks for different registers in COCA
The newspaper genre has the highest frequency of apostrophes and hyphens. 'The 'kiss and tell' principle expresses the essence of good journalism: keep it short and simple and tell the story' (Busà, 2014: 96). The principle indicates that news prefers simple and concise language; hence abbreviations (e.g. 'Strategic Health Authority' becomes 'SHA'), apostrophes and hyphens between words (e.g. some verb phrases need hyphens when they are used as nouns, like check-up, break-in, turn-on) tend to be extensively used. As for hyphens, the BBC News style guide (2018) suggests that 'Hyphens are often essential, if the text is to make immediate sense'. Another reason is that hyphenation makes it easy to integrate several words into a unit, yielding a new meaning which allows journalists to express concepts in novel and appealing ways.
It is also of interest to look at the frequency of punctuation marks in British English (see Figure 3). BNC provides two more registers, NON-ACAD and MISC-miscellaneous, that are not included in COCA. A contrastive analysis between COCA and BNC is also useful for understanding the differences between American and British English.
One-way ANOVA shows that the difference is also significant in BNC (F = 57.67 > F crit = 2.04). The frequencies of the punctuation marks included in the analysis in these registers are quite similar to the results obtained for American English. However, British English newspapers display the most colons, whereas American newspapers use the most apostrophes and hyphens. British fiction uses the most periods, which indicates that its sentences are shorter than those in the American English material. In addition, apostrophes and quotation marks are used mostly in fiction in BNC, and in this respect the BNC data differ from American English.

Frequency distributions of punctuation marks in different varieties of English
The differences across varieties of English have been widely discussed from the perspectives of lexicon, syntax, semantics and discourse. However, few studies have investigated the differences in the uses of punctuation (The Punctuation Guide [2018] is an exception). Yet there may be differences in the frequency distributions for punctuation marks in different varieties of English. This section will examine the frequency of punctuation marks attested for 20 English-speaking countries and regions (for the varieties included in the study, see Table 2). The data were acquired through GloWbE.
SPSS was used to examine the degree of dispersion of the data, which is chiefly measured by the values of skewness. Skewness is a measure of asymmetry. A distribution is symmetric if it looks the same to the left and right of the center point.
As shown in Table 1, skewness, as a statistical index, is a fitting measure of the degree of dispersion within a group of numbers. A symmetric distribution such as a normal distribution has a skewness of 0. When data are skewed left, values for the skewness are negative; in contrast, positive values for the skewness indicate that data are skewed right. As positive skewness increases, the degree of asymmetry increases. For example, the skewness value for the comma is 0.597. The values in Table 1 show that the frequencies obtained for parentheses, exclamation marks, apostrophes, hyphens and commas vary greatly. However, the use of periods and question marks varies the least between varieties of English. This implies that English speakers with various backgrounds agree more unanimously on the boundaries of statements and questions than on the use of other punctuation marks, because periods and question marks obey syntactic rules more strictly than other punctuation marks do.
As shown in Table 2, the length of sentences is very similar among the varieties of English. In strong contrast to the use of periods, the varieties of English exhibit great differences in the use of commas, probably due to the many and varied functions of the comma (Partridge (1953: 14-41) distinguishes at least twelve different functions for commas).
The greatest difference concerns parentheses. Pakistani English shows the most parentheses (6448.4/mil), almost double the number found in Nigerian English (3577.14/mil). UK English uses the most apostrophes (4585.83/mil), 2.34 times as many as the Canadian variety, which uses the fewest (1958.05/mil). Interestingly, Singapore English uses the most question marks (5214.09/mil) and exclamation marks (3554.11/mil). In Singapore English, there is an abundance of frequently used sentence-final particles (such as har, hor, leh, lor, meh, siah, wat) to express exclamations and questions (Leimgruber, 2013: 84-95), which might be one important factor behind the high frequency of question marks and exclamation marks in Singapore English.
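For readers who wish to reproduce the skewness index without SPSS, the sketch below implements the adjusted sample skewness that SPSS and Excel (`SKEW`) report. It is an illustration under that assumption, since the study does not spell out the formula, and the example values are not the study's data.

```python
import math


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (the variant reported by SPSS and Excel)."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three values are required")
    mean = sum(values) / n
    # Sample standard deviation (divisor n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined for constant data")
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


# Illustrative values only: a symmetric set and a right-skewed set.
symmetric = sample_skewness([1, 2, 3, 4, 5])      # 0 for symmetric data
right_skewed = sample_skewness([1, 1, 2, 2, 10])  # positive: long right tail
```

Applied to the per-million frequencies of one punctuation mark across the twenty varieties, a value near 0 would indicate that the varieties are spread evenly around the mean, while a large positive value would indicate that a few varieties use the mark far more often than the rest.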

A diachronic perspective
After the synchronic descriptions, a diachronic perspective will provide an interesting picture of the role played by punctuation marks in the development of English. This section will focus on changes in the frequencies of punctuation marks in the history of English from 1500 to 2008. The data are captured through Google N-gram viewer, as shown in Figure 4. 5 In what follows, comments are provided on a number of the punctuation marks included in the analysis.
Period: By way of background to the use of the period in the history of English, Liberman (2011) once launched an American Presidency Project, showing that mean sentence lengths have been falling since the founding of the republic and have undergone a cumulative drop of roughly 50%. Haussamen (1994) found that the printed English sentence had become shorter by comparing the number of words in written sentences from 1600 to the 1980s. Haussamen (1994) accordingly suggested that the printed sentence will continue to develop in a similar direction over the next two centuries. More than 100 years ago, Lewis (1894: 34) concluded that the English sentence had decreased in average length by at least one half in 300 years (prior to the 1890s).
As the number of periods in English has continued to rise, sentences are very likely to have become shorter. These findings and predictions are confirmed by Google N-gram data. The graph on the top left of Figure 4 shows that the percentage of periods has continued to increase steadily over the last three hundred years, rising from 3% to 4.3%. Currently, social networking tools reinforce the tendency to use shorter sentences.
Semicolon: A jagged upward trend can be seen in the use of the semicolon in English, peaking around 1800; afterwards the semicolon experienced a long, smooth decline. This tendency has also been identified in some earlier studies, such as Bruthiaux (1995). Early 17th-century writers used colons, semicolons, and commas interchangeably. 6 The semicolon prospered before the 19th century; however, its frequency of use has fallen in the last two hundred years. It seems that the use of semicolons is currently associated with difficult or abstract topics; hence writers tend to avoid them. Nunberg (1990), author of The Linguistics of Punctuation, holds the view that the semicolon seems to be reserved nowadays for certain kinds of highbrow and high-middlebrow writing (Kelly, 1999). The data in Figure 4 are consistent with the observation that the frequency of the semicolon has declined.
Question mark: After 1800, the frequency of question marks fluctuated mildly, but it increased drastically after the 1970s. This tendency is highly likely to continue in the future because of the rise of social networking media, where question and exclamation marks are used frequently.
Exclamation mark: The frequency of the exclamation mark also kept fluctuating between 0.06% and 0.08% during the 19th century before continuing to increase between the 18th century and the mid-19th century with some fluctuations. However, its frequency decreased in the 20th century, reaching its lowest point in the 1960s. This could indicate a reduction in the number of exclamatory sentences used to express feelings and emotions. However, the last half-century has seen an upward trend in the use of the exclamation mark, most likely because of its wide application in social media.
Apostrophe: The data of the Google corpora show that the frequency of the apostrophe peaked in the year 1712. Afterwards, however, the use of apostrophes underwent a dramatic fall up until 1850. Strangely, it seems that the use of the apostrophe has become fashionable again in the last 60 years, especially in newspapers and magazines. The revival of the apostrophe is also likely to be the result of the rise of social networking, for which more evidence is given in the Discussion.
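The link between the share of periods and sentence length can be made concrete with a rough calculation. The sketch below rests on two simplifying assumptions that are not stated in the study: that the percentages are shares of all tokens in the N-gram data, and that every period ends exactly one sentence (abbreviations such as Mr. and sentences ending in other marks are ignored). The function name is hypothetical.

```python
def tokens_per_period(period_share_percent: float) -> float:
    """Average number of tokens per period, given the period's share of all tokens."""
    if period_share_percent <= 0:
        raise ValueError("share must be positive")
    return 100.0 / period_share_percent


# Shares reported for the start and end of the three-hundred-year span.
earlier = tokens_per_period(3.0)  # roughly 33 tokens per period
later = tokens_per_period(4.3)    # roughly 23 tokens per period
relative_drop = 1 - later / earlier  # roughly 0.30
```

Under these assumptions, the rise from 3% to 4.3% corresponds to the average stretch of text between periods shrinking by about 30%, which points in the same direction as the shortening reported in the earlier studies cited above.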
Dash: The dash referred to here is the so-called m-dash, '-ʼ, different from a spaced en dash (the m-dash is twice as long as the en dash), used as a break in a sentence or to set off parenthetical statements. The use of the dash increased after 1750, then reached its peak (about 0.35%) in 1860, but afterwards continued to drop up until the 1950s before starting to fluctuate between 0.25% and 0.275%.
As for other punctuation marks, in Google N-gram viewer, commas are missing as they are used as dividers; the colon is not available, either. Hence the COHA is used to supplement this missing data (see Figure 5).
The frequency of the comma has slowly dropped in the last two hundred years in American English, whereas the period shows the opposite tendency. The decline in the use of commas might be caused by the fact that their rhetorical function has weakened in the history of English. Partridge (1953: 14) comments: 'In modern usage, the comma is used predominantly for the grammar, the construction or syntax, of a sentence; formerly the comma indicated primarily the rhetorical pauses, as quite often, it still does.' His point will be further discussed in the next section.
Hyphenated expressions are compounds with hyphenation, such as short-term, would-be, decision-making. Hyphenated expressions can express up-to-date ideas and temporary concepts (such as, floor-to-ceiling windows, a back-to-back connection, 1980s-style dancing, industrial-scale organic producers), and the hyphenated use can make a phrase become a word, such as the premodifiers in 'state-of-the-art article' and 'top-of-the-line use'. A lexical-grammatical device of this kind seems to have become popular in contemporary English. For American English, the frequency of hyphenated two-word expressions has increased in the last 200 years, as shown in two graphs (having different perspectives: component numbers vs. which part of speech [POS] the expression as a whole belongs to) at the bottom of Figure 5. The same tendency can also be seen in British English.

Discussion
From the synchronic perspective, the frequency of punctuation marks follows a 'heavy-tailed' distribution, in keeping with the principle that humans make the least effort. In most languages, the phonological system follows the power law; a well-known example of the power law is Zipf's law (Zipf, 1949), which has also been widely attested for word frequencies and some syntactic phenomena. However, the frequency distribution for punctuation marks in English does not always seem to follow the power law. The question remains as to why punctuation differs in this respect from the other subsystems of language.
The frequencies of punctuation marks in different registers are very helpful in distinguishing genres, styles and other linguistic features. The differences in the frequencies of punctuation marks attested in comparisons between COCA and BNC also reflect the differences in usage and styles between American and British English.
The varieties of English differ with regard to the frequencies of specific punctuation marks. While periods and question marks occur with quite similar frequencies across all the varieties of English, the frequency of parentheses, exclamation marks, apostrophes, hyphens and commas varies between varieties of English. The differences in the frequencies for these punctuation marks are likely to be caused by a wide variety of complex socio-cultural factors. Studies (e.g. Crystal, 1985; Kirkpatrick, 2010) have considered differences between varieties of English through a variety of approaches, among them pidgin and creole studies, lingua franca, linguistic futurology, and sociology. These frameworks might help explain the socio-cultural factors potentially influencing the differences in the frequencies attested for punctuation in English varieties.
The present study has also shown that the frequencies of the individual punctuation marks in the history of English have undergone dramatic changes. As mentioned previously, the grammatical and rhetorical functions of punctuation have often been interwoven, but they have not been studied to an equal extent. The weakening rhetorical functions of punctuation have been explored in other studies. For example, Schou (2007: 213) points out that 'from 1800 to the present, the rhetorical aspect of punctuation has dwindled into the possibility of marking asyndetic coordination by a semicolon and the option of one specific rhetorical relation expressed by the colon; the rest is left to the full stop'. Schou's observation and Partridge's point on commas can be supported by our data. The other example is the semicolon and colon. The long-term decline in the use of the semicolon began in 1800 and has continued to the present day (middle top panel in Figure 4); in contrast, the frequency of the colon continued to drop before 1850, but began to rise after 1850 (left panel of Figure 5). This is most likely due to the colon's replacing the semicolon in certain rhetorical functions.
Therefore, as Schou (2007: 213) also proposes, 'the general experience is that syntax has been central at least since 1600, although prosody played and still plays a certain role. Punctuation and its theory have increasingly moved towards a syntactic orientation'. It also seems that English native speakers have been influenced unconsciously by the syntactic orientation of punctuation and started to use fewer commas owing to their redundant rhetorical function. In short, as Schou (2007: 214) puts it, 'this development can be therefore characterized as moving from the modern rhetorical-grammatical punctuation of 1800 to the modernistic stylistic-grammatical punctuation'. This can be observed and supported from the data presented in this study.
Stylistic functions feature in the use of some punctuation marks that have been used creatively for communicative purposes. We will now examine how these stylistic functions of punctuation have been influenced by modern communicative purposes. The use of punctuation has been greatly influenced by writing and communication technologies, particularly social networking tools in the internet era. For example, Twitter has a character limit, which means that its users have to use short sentences. Frequently they strengthen their emotional impact by using exclamation marks, question marks, en dashes for emphasis, and emoji, as demonstrated by President Trump and J. K. Rowling (see Figure 6).
Conversely, the use of punctuation marks in social networking influences contemporary English. Another example can be taken to illustrate this point. It was found that the use of apostrophes in contractions (e.g. can't rather than cannot) was higher in Twitter messages (Denby, 2010) than in either text messaging (TM) or instant messaging (IM), according to the data collected by Ling and Baron (2007). Apostrophes were identified in 97% of the occurrences of contractions in Twitter, as compared to 94% for IM and 32% for TM. This finding can at least partly explain why the frequency of apostrophe use has been increasing recently, as shown in Figure 4. The frequency of sentence-final punctuation in tweets is much higher than in TM and IM, as shown in Table 3 (Denby, 2010). This indicates that sentences in tweets are shorter in comparison with TM and IM, which may have been influenced by the widespread use of social networking.
Written English in the internet era is likely to represent informal speech, just as Baron (2001: 56) suggests, 'The semi-stable grammatical model of the past century is being abandoned. Instead, punctuation increasingly marks the cadences of informal speech in the case of email and other contemporary language media, and helping the eye makes sense in messages that are intended to be viewed quickly'. Indeed, the stylistic-grammatical function of punctuation facilitates the representation of informal speech in writing in the internet era. In particular, the wide use of social networking sites such as Facebook, Twitter and blogs has allowed people to read and write language in electronic media and to freely interact

Conclusion
This study analyzes data on the frequency distribution for English punctuation marks from some large corpora. From both the diachronic and synchronic perspectives, we found that the frequency distribution for English punctuation follows the laws of least effort. The study showed that there were differences in how different punctuation marks were used in different registers. The varieties of English were also found to differ with regard to the frequencies of specific punctuation marks. In the last 300 years, the use of punctuation marks has become more syntactic than rhetorical or prosodic in nature. These changes in the frequencies of punctuation marks are evident in, for instance, shorter sentences and an increase in the use of hyphenated compounds and of apostrophes in contracted forms. These developments show that modern stylistic-grammatical punctuation is developing under the influence of modern writing and communication technologies.

Notes
2 The Brown corpus can be retrieved through the online Sketch Engine (https://the.sketchengine.co.uk/bonito/corpus/first_form?corpname=preloaded/brown_1;align=); however, it is not possible to search for question marks (?) and there is no way to distinguish between the dash and the hyphen. The size of this corpus is small, roughly one million words. All of these factors might influence the fitting result.
3 Periods are also used following abbreviations, such as Mr., Dr., p.m.; however, the number of such instances is not large. Due to the difficulty of identifying their proportion in the data, they are included in the counts with full stops.
4 It is not possible to search for '-' in COCA and BNC. Instead, '*-*' (the asterisk is a wildcard representing any word or morph) works as a search in both corpora. '*-*-*' represents a hyphenated compound with two hyphens, but in COCA this search also counts many irrelevant symbols, and the number of hyphenated compounds of this kind is not large (a little over 100 per million). Hence these are not included in the counts.
5 Although N-gram Viewer can output a graph that depicts the historical changes in frequency for a particular phrase (or word), the quality of the graph is quite low. We therefore made use of the 'ngramr' package (Carmody, 2015) in the R programming language to extract the N-gram data and plotted new graphs based on these data. After that, all graphs were integrated into a page to yield a high-quality chart, which is Figure 4 (the R script we wrote is provided to help those who would like to replicate it; please visit https://github.com/fivehills/punctuation).
6 Mulvey (2016) gave a detailed account of the relationship among punctuation marks and how they evolve in English and in other languages. Parkes (1993) is a good reference on the history of punctuation in western society.

Table 3
Feature | Twitter (Denby, 2010) | TM (Ling & Baron, 2007) | IM (Ling & Baron, 2007)
Standard apostrophe use in contraction (as percentage of total) | 97% | 32% | 94%
Sentence-final punctuation (as percentage of potential sentence-final punctuation) | 81.1% | 39% | 45%
Transmission-final punctuation (as percentage of potential transmission-final punctuation) | 70% | 29% | 35%