Statistics As Measurement: 4 Scales/Levels of Measurement

This article discusses the basics of the "4 scales of measurement" and how they apply to research and to everyday life. After reading it, you will be able to list and describe the four scales of measurement used in quantitative research, provide examples of uses of each scale, and determine the appropriate measurement scale for a research problem. The article presents an overview of statistical methods to help readers better understand research results. Formulas and mathematical computations will not be presented, as the goal of this article is merely to provide a basic understanding of statistical measurement.


Introduction
"Statistics can be fun or at least they don't need to be feared" (1). When the authors first heard this, their reaction was: who are you kidding? Yet statistics are part of our everyday life, and the ability to reason and think statistically should not be considered a luxury, or something that triggers anxiety at the mere mention. For example, statistics can be found in local newspapers, sports, banking statements, and, most recently, media broadcasts addressing pandemic information. As early as 1903, H. G. Wells, ironically a science fiction author, warned that statistical thinking would one day be as necessary as the ability to read and write. Many people are finding that they are ill-prepared in the fundamentals of statistics and are unable to navigate the vast amount of information currently presented in the news and media. In general, statistics doesn't have to be scary, can be fun, and is quite useful in everyday life. Statistics is broadly divided into two categories, descriptive and inferential statistics, each with its own goals and formulas.

Descriptive and Inferential Statistics
Any statistical study requires the collection of information (data) gathered from a group known as the population (3). Descriptive statistics takes the data from the entire population, organizes it in some type of graphical representation, and then summarizes it to gain an overall understanding. In simple terms, descriptive statistics describes the data in a quantifiable manner, such as with measures of center (mean, median, mode), which capture general trends in the data, and measures of spread (range, variance, standard deviation), which describe how the data values are distributed in relation to each other (Taylor, 2018). An example of descriptive statistics is taking the scores of a 7th grade math test and finding the average student score, or finding how far each score deviates from the mean score. It is important to note that descriptive statistics cannot be generalized to other populations; however, the data can be used to make inferences or predictions using inferential statistics.
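As an illustration of the 7th grade example, the short Python sketch below computes the measures of center and spread named above with the standard library's `statistics` module. The scores are made-up values, not data from this article, and the class is treated as the whole population, so the population variance and standard deviation (`pvariance`, `pstdev`) are used.

```python
import statistics

# Hypothetical 7th grade math test scores (illustrative data, not from the article)
scores = [72, 85, 90, 85, 64, 78, 95, 85, 70, 88]

# Measures of center
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of spread, treating this class as the entire population
score_range = max(scores) - min(scores)
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)

# How far each score deviates from the mean score
deviations = [s - mean for s in scores]

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={score_range}, variance={variance:.2f}, sd={std_dev:.2f}")
print("deviations:", [round(d, 1) for d in deviations])
```

Here the mean is 81.2, the median and mode are both 85, and the range is 31; the deviations always sum to zero, which is why the variance squares them before averaging.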
Inferential statistics uses formal methods to draw conclusions from sample data and then make inferences about the larger population. Whereas descriptive statistics has the primary goal of summarizing the entire population, inferential statistics has two goals, estimating and predicting, based on a sample of the population (3, 4; Taylor, 2018). Inferential statistics takes what is known (from descriptive statistics) and "makes assumptions or inferences about what is not known" (Newsome, 2007). If we wanted to determine how well students across the entire state will do on the math test mentioned above, it would be more feasible to analyze a sample or subset (for example, one school district rather than the entire state) and then generalize to the entire population. The measures obtained from the sample are used to estimate the corresponding values for the entire population, which are known as parameters (3). Parameters are a "characteristic of the whole population" (Calkins, 2005). Inferential statistics uses confidence intervals (a range of values, computed from the sample, that is likely to contain a population parameter) and hypothesis testing, both of which rest on probability and therefore carry some uncertainty. As Figure 1 depicts, inferential statistics needs only a sample of the population, whereas descriptive statistics requires the entire population (3, 4; Taylor, 2018). Table 1 lists the main differences between descriptive and inferential statistics; the key difference is generalization. Descriptive statistics are summarized as exact numbers and cannot be generalized to other populations, whereas inferential statistics starts with a sample and generalizes to a population, expressing results as a range of values along with a confidence level. The two are not mutually exclusive (Calkins, 2005).
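To make the idea of estimating a parameter from a sample concrete, here is a minimal Python sketch. It assumes a simulated statewide population of scores and a simple random sample of 200 students (a real district-based sample is usually not random, which complicates inference), and it builds a 95% confidence interval for the population mean using a normal approximation. The population size, sample size, and seed are illustrative choices, not values from the article.

```python
import math
import random
import statistics

random.seed(7)

# Hypothetical statewide population of 100,000 test scores (simulated, not real data)
population = [random.gauss(75, 10) for _ in range(100_000)]

# Assumption: a simple random sample of 200 students instead of testing everyone
sample = random.sample(population, 200)

# The sample mean (a statistic) is the point estimate of the population mean (a parameter)
sample_mean = statistics.mean(sample)
std_err = statistics.stdev(sample) / math.sqrt(len(sample))

# 95% confidence interval using a normal approximation (z is about 1.96)
z = statistics.NormalDist().inv_cdf(0.975)
low, high = sample_mean - z * std_err, sample_mean + z * std_err

print(f"sample mean: {sample_mean:.2f}")
print(f"95% CI for the population mean: ({low:.2f}, {high:.2f})")
print(f"true population mean (known only because it was simulated): {statistics.mean(population):.2f}")
```

The interval changes from sample to sample; the "95%" describes the procedure: across many repeated samples, roughly 95% of intervals built this way would contain the true population mean.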

Table 1: Differences between descriptive and inferential statistics (3, 4; Taylor, 2018).

Descriptive Statistics
• It simply describes or organizes data regarding the population under study; it does not draw conclusions.
• It does not use probabilities.
• The results of descriptive statistics are presented as numbers, graphs, charts, or tables.

Inferential Statistics
• It extrapolates data to a whole population using a smaller representative sample, allowing you to make predictions and draw conclusions.
• It is based on probability theory.
• The results of inferential statistical analyses are presented as a range of potential figures, along with a margin of error.

Measurement
From a historical perspective, measurement systems have long existed in human activities for a variety of purposes, such as simplifying trade or comparing items. Metrology is the science of measurement, whose purpose is to unify accepted standards of common understanding and conformity. Such standards of measurement include the metric system, proposed by France in 1795 and accepted worldwide as the International System of Units (SI) at the 11th Conférence Générale des Poids et Mesures (CGPM) in 1960. Measurement is the cornerstone of science, technology, and research, and how it is applied depends on context and discipline. Everyday use of the term measurement typically refers to the length of something (e.g., feet, miles, meters) or how much something weighs (e.g., grams, ounces, pounds) (1, 5).
According to The Center for Innovation in Research and Teaching, measurement is a "means of assigning numbers or other symbols to characteristics of objects according to certain pre-specified rules" (6). Helmenstine (2019) states that when quantities are compared against a specified rule or standard unit, the measurement cannot be perfect and will "inherently include error, which is how much a measured value deviates from the true value" (para 1). In education, measurement "refers to any device for the general study and practice of testing, scaling, and appraising the outcomes of educational process" (7). Applying methods to data, ranging from simply calculating the mean of a distribution to more complex calculations such as interaction effects, is known as statistics (8). In statistical research, measurement is used to assign symbols, letters, or numbers to variables according to specified rules. These measurement tools are held to standards and can be used to obtain reliable results.

Quantitative and Qualitative Data
The variables used in statistical measurement can be classified into two types: qualitative or quantitative. Qualitative variables, also referred to as categorical variables, describe data that are not numerical and fit into specific categories. Examples include eye color (e.g., brown, blue, hazel), type of automobile (e.g., SUV, compact, mid-size), and gender (e.g., male, female). Qualitative variables cannot be added, subtracted, multiplied, or divided, and many of them, such as eye color, have no natural order.
Quantitative data, on the other hand, have numerical values and can be counted or measured with a tool or scale, for example, test scores, the number of recorded births, or home values. Quantitative data can be expressed as a distribution of values or summarized as an average, and are classified as either discrete, taking whole-number values (e.g., number of t-shirts sold, students in a class, home runs hit), or continuous, taking any value within a range (e.g., distance traveled, weight, height, temperature) (Surbhi, 2017).
Electronic copy available at: https://ssrn.com/abstract=3685215

Qualitative and quantitative variables are further classified into four scales of measurement: nominal, ordinal, interval, and ratio. As Figure 2 shows, qualitative data can be classified either as nominal (by name, such as gender, name, or social security number), with no ranking order, or as ordinal, with some type of order but no mathematical difference between the levels (e.g., level of satisfaction, or spiciness levels such as mild, medium, hot). Quantitative data can be classified as either interval (e.g., temperature measured on an interval scale) or ratio (e.g., height). The authors offer a word of caution about Likert scale surveys: their data can be considered either interval or ordinal, depending on whether the data fulfill the interval scale requirements or merely order the responses. Figure 5 summarizes how each of the four scales relates to qualitative and quantitative data; these scales are the focus of the remainder of this article (6).

Scales of Measurement
In statistical measurement, all variables fall into one of the four scales of measurement, nominal, ordinal, interval, and ratio (Figure 3), which offer an easy way to sub-categorize different types of data. A variable in collected data is classified into one of the four scales depending on how it is defined or categorized, and how it is analyzed (1, 9). As Figure 4 depicts (10), each scale includes the properties of the scale preceding it (Bhat, 2020), so the scales are cumulative. Nominal scales simply name categories based on characteristics, with no specific order. Ordinal scales name categories and then place them in a specific order based on their attributes. Interval scales name, order, and establish equal intervals between values. Finally, ratio scales do everything interval scales do and also have a true zero point, where zero means the complete absence of the quantity being measured.

It is important to understand the scales of measurement, because without this understanding you would not be able to choose appropriate data analysis techniques. In statistical measurement, numbers or values are assigned according to set rules, and those rules determine the scale of measurement. Each scale of measurement has its own property or set of properties, which determine which statistical methods can be used (7). Knowing the level of measurement allows you to interpret the data and then decide which statistical analysis is suitable. The following description of the levels is based on the references (6, 7, 9; Crossman, 2019; Garger, 2010; Regoniel, 2012; Volchok, 2015).
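The cumulative structure of the four scales can be sketched as a small Python lookup, offered only as an illustration: the names `Scale`, `ADDED_PROPERTY`, and `properties` are hypothetical, and the operator lists are a common simplified summary of which comparisons and arithmetic each scale supports, not a definitive rule set.

```python
from enum import IntEnum


class Scale(IntEnum):
    """The four scales, ordered from lowest (nominal) to highest (ratio)."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# The property each scale adds on top of the scale below it,
# with the operations that property makes meaningful (simplified)
ADDED_PROPERTY = {
    Scale.NOMINAL: "named categories (=, !=)",
    Scale.ORDINAL: "rank order (<, >)",
    Scale.INTERVAL: "equal intervals (+, -)",
    Scale.RATIO: "true zero (*, /)",
}


def properties(scale):
    """Return every property of `scale`, including those inherited from lower scales."""
    return [ADDED_PROPERTY[s] for s in Scale if s <= scale]


for scale in Scale:
    print(f"{scale.name:<8} -> {'; '.join(properties(scale))}")
```

Because each level inherits everything below it, a ratio variable can always be analyzed with nominal or ordinal methods, but not the other way around.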

Nominal Scales
At the lowest level and weakest form of measurement, nominal scales are considered the easiest to understand because they are simply used for labeling or arbitrarily categorizing variables. The word nominal comes from the Latin word nomen, which means "name". Nominal scales have no quantitative value or order, and no mathematical operations can be applied to them. Nominal scales are essentially a type of coding based on qualities of type or kind, such as gender, ethnicity, and place of birth. Typical codes are numbers, letters, colors, labels, or any other symbols that distinguish between the categories.
Numbers can be assigned arbitrarily on a nominal scale (e.g., 1 = East, 2 = North, 3 = South), but because there is no order and there are no equal intervals, one cannot perform arithmetic (+, -, *, /) or order comparisons (>, <) on the data; the only meaningful comparison is whether two values belong to the same category (=) or not. Assigning numbers on a nominal scale does not imply order or ranking; the numbers just uniquely name the attribute. Football jerseys are an example of this type of coding: a player wearing jersey number 20 is not twice the player wearing number 10. As the example illustrates, the difference between jersey numbers is only qualitative, not quantitative. An easy way to remember nominal scales is "that 'nominal' sounds a lot like 'name' and nominal scales are kind of like 'names' or labels" (11).
Nominal scales are often used in survey research and questionnaires, for example, when asking students whether their favorite music genre is Rap, Country, or Pop. The researcher could then arbitrarily assign numbers to each category and quantify the preference for each genre using percentages or the mode (the category that received the most votes). Figure 7 provides three examples of nominal scales, each with no overlap between categories (mutually exclusive) and no numerical significance. Notice that "What is your gender" offers only two categories to select from, male or female, so the variable is a dichotomy. Other examples of dichotomous nominal scales include yes/no, hot/cold, and on/off. Categories such as "cold, warm, hot, very hot" do have a natural order; once order matters, the data move beyond the nominal scale to the ordinal scale described next, although the distances between such categories still have no quantifiable significance.
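The music-genre survey can be sketched in Python as follows, assuming a handful of made-up responses and the arbitrary codes 1 = Rap, 2 = Country, 3 = Pop. It shows which summaries make sense for nominal data (counts, percentages, the mode, equality checks) and why averaging the numeric codes does not.

```python
from collections import Counter

# Hypothetical answers to "What is your favorite music genre?" (illustrative data)
responses = ["Rap", "Country", "Pop", "Pop", "Rap", "Pop", "Country", "Pop"]

# Arbitrary numeric codes: labels only, with no order or magnitude
codes = {"Rap": 1, "Country": 2, "Pop": 3}
coded = [codes[r] for r in responses]

# Frequencies, percentages, and the mode are meaningful for nominal data
counts = Counter(responses)
percentages = {genre: 100 * n / len(responses) for genre, n in counts.items()}
mode_genre, mode_count = counts.most_common(1)[0]

print("counts:", dict(counts))
print("percentages:", percentages)
print("mode:", mode_genre)

# Equality is meaningful: responses 0 and 4 fall in the same category
print(coded[0] == coded[4])  # True

# Averaging the codes is not meaningful: 2.25 does not correspond to any genre
meaningless_mean = sum(coded) / len(coded)
print("mean of codes (not interpretable):", meaningless_mean)
```

Recoding the genres with different numbers (say 7, 3, 9) would leave the counts, percentages, and mode unchanged but would change the "mean", which is a quick way to see that the mean carries no information here.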

Ordinal Scales
The second level of measurement is the ordinal scale, which builds on the nominal scale. Both nominal and ordinal scales categorize data; the primary difference is that ordinal data are rank-ordered (e.g., highest to lowest), showing where data points are in relation to each other. An ordinal scale is defined as "a variable measurement scale used to simply depict the order of variables and not the difference between each of the variables" (11), and it typically uses non-numeric categories, for example, low, medium, and high, or Likert items with responses such as never, sometimes, often, and always.
The ordinal scale is best used when rank is important but the intervals between data points are not of the same length. An example is class rank: the GPA gap between the first- and second-ranked students need not be the same as the gap between the second- and third-ranked students. Since the distances between rankings are not equal, no mathematical operations other than comparisons (>, <, =) can be performed on ordinal data; there is no true zero, and the ranks cannot indicate magnitude (e.g., how much faster 1st place was than 2nd or 3rd place). Although the data can be rank-ordered, the distances between the rankings have no meaning and may vary. As with nominal data, an easy way to remember ordinal scales is that "ordinal" sounds much like "order", which is precisely the purpose of this type of scale.
A Likert scale is a good example of ordinal measurement and is typically used by market researchers to determine non-numeric levels of customer satisfaction (Figure 8). In Figure 9, numbers are used to rank the responses, but they represent only order, not the distance between scores (a 4, Happy, does not indicate twice the satisfaction of a 2, Unhappy). With this type of scale only the order is meaningful; the difference, or magnitude, between adjacent responses is not really known, which is the major disadvantage of ordinal scales.
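A Likert item can be summarized in a way that respects its ordinal nature. The Python sketch below assumes the response labels never, sometimes, often, always and a few made-up answers; the names `LEVELS` and `RANK` are illustrative. It uses order comparisons and the median, both of which depend only on rank.

```python
import statistics

# Ordered categories: list position encodes rank only, not distance between levels
LEVELS = ["never", "sometimes", "often", "always"]
RANK = {label: i for i, label in enumerate(LEVELS)}

# Hypothetical Likert-item responses (illustrative data)
responses = ["often", "never", "always", "often", "sometimes", "often", "always"]

# Order comparisons are meaningful for ordinal data
print(RANK["always"] > RANK["sometimes"])  # True

# The median is a suitable center: sort by rank and take the middle value.
# median_low always returns an actual rank instead of averaging two ranks,
# which would quietly assume the levels are equally spaced.
ranks = sorted(RANK[r] for r in responses)
median_label = LEVELS[statistics.median_low(ranks)]
print("median response:", median_label)  # often
```

Computing a mean of the ranks would be possible mechanically, but it would treat the gap between "never" and "sometimes" as equal to the gap between "often" and "always", which an ordinal scale does not guarantee.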

Interval Scales
Building on the two previous levels, the third level, the interval scale, goes beyond categorizing and ordering by establishing consistent distances, known as intervals, between categories or data points. Bhat (2020) defines interval scales as "a numerical scale where the order of the variables is known as well as the difference between these variables". Because numbers are used to express quantities, interval-level data such as temperature (and higher-level data such as the price per gallon of gasoline or miles driven, which, as discussed below, have a true zero and are therefore ratio data) are all-inclusive (every observation has a specific value, such as a dollar amount), mutually exclusive (a single observation cannot take on two different values), and ordered (gas at station 1 can cost more than gas at station 2). The key features of interval scales are equal, meaningful distances between values and the lack of a true zero point.
A classic example of an interval scale is temperature: the distance between 50 and 70 degrees is the same as the distance between 40 and 60 degrees. The zero point on a temperature scale is arbitrary (0 degrees Celsius is the same as 32 degrees Fahrenheit) and does not indicate the absence of something (0 degrees does not mean no temperature), which is also why temperatures can take negative values. Other common examples of interval measurements include IQ scores, personality tests, and aptitude test scores. Strictly speaking, IQ scores should be treated as ordinal data because psychologists have no way of directly quantifying intelligence (a person with an IQ of 120 is not twice as smart as someone with an IQ of 60); in practice, however, IQ scores are treated parametrically as interval (and at times ratio) data. As with the previous two scales, interval scales are easy to remember, since the word interval means "space in between", and a constant distance between values is the key characteristic of interval scales.
In addition to categorizing and ordering data and reporting frequencies and percentages, interval scales allow for more advanced statistical analysis, such as calculating the mode, median, or mean, and, importantly, the standard deviation. Interval data can be added and subtracted; however, interval values cannot be meaningfully multiplied or divided. This is because the zero point is arbitrary, so ratios cannot be calculated in a meaningful way (e.g., 100 degrees is not twice as hot as 50 degrees, even though the number is twice as large).
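Converting between Celsius and Fahrenheit shows why differences are meaningful on an interval scale but ratios are not. This is a small illustrative Python sketch; `c_to_f` is a helper defined here using the standard conversion F = C × 9/5 + 32.

```python
def c_to_f(celsius):
    """Convert Celsius to Fahrenheit: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32

# Equal intervals stay equal after a change of units:
# 40 -> 60 C and 50 -> 70 C are both 20 C apart, and both 36 F apart
print(c_to_f(60) - c_to_f(40), c_to_f(70) - c_to_f(50))  # 36.0 36.0

# Ratios do not survive, because the zero point is arbitrary
a_c, b_c = 50.0, 100.0
a_f, b_f = c_to_f(a_c), c_to_f(b_c)
print(b_c / a_c)  # 2.0 in Celsius
print(b_f / a_f)  # about 1.74 in Fahrenheit (212 / 122)
print(b_f - a_f)  # 90.0: the 50 C difference, rescaled by 9/5
```

If "twice as hot" depends on which unit you happen to use, the statement has no physical meaning; that dependence on an arbitrary zero is exactly what separates interval data from ratio data.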

Ratio Scales
The final level of measurement is the ratio scale, which is considered the highest level of measurement because it has the properties of all four levels, contains the most information about the data values, and has a true zero as its starting point. Ratio scale, as defined by Bhat (2020), is "a variable measurement scale that not only produces the order of variables but also makes the difference between variables known along with information on the value of true zero". Because ratio data start at a true zero, values less than zero are not possible (e.g., you cannot weigh -40 pounds). The true zero also makes multiplication and division meaningful, which allows a wide range of both descriptive and inferential statistics to be applied.
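Because a ratio scale has a true zero, a statement such as "twice as heavy" holds no matter which unit is used. The Python sketch below illustrates this with weights converted from pounds to kilograms (using the defined factor of 0.45359237 kg per pound) and with durations in minutes versus seconds; the specific values are illustrative.

```python
LB_TO_KG = 0.45359237  # kilograms per pound (exact by international definition)

# Weight has a true zero, so "twice as heavy" holds in any unit
heavy_lb, light_lb = 40.0, 20.0
ratio_lb = heavy_lb / light_lb
ratio_kg = (heavy_lb * LB_TO_KG) / (light_lb * LB_TO_KG)
print(ratio_lb, ratio_kg)  # 2.0 2.0

# Durations behave the same way: 10 minutes vs. 5 minutes, in minutes or in seconds
print(10 / 5, (10 * 60) / (5 * 60))  # 2.0 2.0

# Zero stays zero under a change of units, unlike 0 C, which becomes 32 F
print(0.0 * LB_TO_KG)  # 0.0
```

A change of units on a ratio scale only multiplies values by a constant, so ratios are preserved; an interval-scale conversion also adds an offset, which is what breaks ratios there.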
With ratio data, such as weight and height, fractions and ratios are meaningful. Time is an example of comparisons on a ratio scale because it can be divided into equal intervals and compared with another value (e.g., 10 minutes is twice as long as 5 minutes). Likewise, an income of $52,000 is twice a starting salary of $26,000. Ratio and interval scales are often treated interchangeably; however, it is important to keep in mind that interval data cannot be meaningfully multiplied or divided, as discussed previously. Temperature is interval rather than ratio data because zero degrees does not equate to "no temperature"; income, however, is ratio data because zero dollars truly means "no income." As mentioned, the ratio scale satisfies all levels of measurement: it categorizes data, orders the data so that values can be compared with each other, places values at equal intervals, and has a true zero. Ratio data produce the most powerful statistical information, and it is possible to derive lower-level information from them, such as categories or rankings, but not vice versa.

Knowing the level of measurement of your data determines the type of analysis to use, which in turn lets you ask deeper questions about your data. If your data are ordinal rather than interval or ratio, the types of statistical analysis that can be used are limited, so some useful information may not be discovered. For example, a chi-square test of independence is most appropriate for nominal data, and the Mann-Whitney U test is used when the dependent variable is ordinal and the independent variable is nominal. Figure 10 summarizes each of the four levels of measurement with respect to properties, examples, and statistical analyses used, along with appropriate graphical representations.
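As an illustration of an analysis suited to nominal data, the Python sketch below computes a chi-square test of independence for a hypothetical 2×2 table of counts (two groups by a yes/no answer). It uses only the standard library: the statistic is the sum of (observed − expected)² / expected, and for the single degree of freedom of a 2×2 table the p-value equals P(|Z| > √χ²), which `math.erfc` gives directly. No continuity (Yates) correction is applied, and the counts are made up.

```python
import math

# Hypothetical 2x2 table of counts: rows = group (nominal), columns = answer (nominal)
observed = [
    [30, 20],  # group A: yes, no
    [20, 30],  # group B: yes, no
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square statistic: sum of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total under independence
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

# Degrees of freedom = (rows - 1) * (columns - 1) = 1 for a 2x2 table.
# With 1 df, the p-value equals P(|Z| > sqrt(chi2)) for a standard normal Z.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")  # chi-square = 4.00, p = 0.0455
```

The test uses only category counts, which is why it works for nominal variables; it never needs to order or average the categories.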

Measurement Error
No matter which level of measurement is used in statistical research, researchers, statisticians, and data professionals agree that there will always be some measurement error. According to Glen (2020), "Measurement Error (also called Observational Error) is the difference between a measured quantity and its true value" (para 1). These errors, which may be either negative or positive, can occur when there is a difference between a reported test score and a person's actual knowledge or ability, or when outside factors influence performance during data collection (12). Random errors are generally expected and occur naturally in the scientific process. Systematic errors are another type of measurement error and occur due to human error or miscalibrated testing instruments. Regardless of the error type, effort should be made to minimize these errors.
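The difference between random and systematic error can be illustrated with a small Python simulation. It is a sketch under made-up assumptions: a hypothetical true length of 100 mm, normally distributed random error with a standard deviation of 1 mm, and a +2 mm bias standing in for a miscalibrated instrument.

```python
import random
import statistics

random.seed(42)
TRUE_VALUE = 100.0  # hypothetical true length in millimeters
N = 10_000          # number of repeated measurements

# Random error: unpredictable noise that scatters above and below the true value
random_only = [TRUE_VALUE + random.gauss(0, 1.0) for _ in range(N)]

# Systematic error: a miscalibrated instrument adds the same +2 mm bias every time
BIAS = 2.0
biased = [TRUE_VALUE + BIAS + random.gauss(0, 1.0) for _ in range(N)]

# Measurement error = measured value - true value (averaged over all readings)
print("average error, random only:", round(statistics.mean(random_only) - TRUE_VALUE, 3))
print("average error, with bias:  ", round(statistics.mean(biased) - TRUE_VALUE, 3))
```

Averaging many readings makes the random errors largely cancel, so the first average error is close to zero, while the second stays close to the 2 mm bias no matter how many readings are taken; systematic error has to be found and corrected, for example by recalibrating the instrument.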
Good measurement relies on both precision and accuracy. Precision refers to how finely and how consistently something is measured, and accuracy to how close a measurement is to the true value; more precise data reduce one source of error but are not guaranteed to be accurate. With interval and ratio scales, precision is less of an issue because measurements can be made in much finer units (e.g., millimeters rather than centimeters for height, or the nearest millionth of a second rather than the nearest tenth of a second for elapsed time). Precision is much harder to achieve with ordinal scales, and simply adding response options does not solve the problem: on a 29-point scale rather than a five-point scale, participants would have difficulty selecting the option that best represents their response, which introduces measurement error and an illusion of precision (Fife-Schaw, 2006).
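The effect of finer units on a ratio-scale measurement can be sketched in a few lines of Python, assuming a hypothetical true height and modeling an instrument as something that rounds to its resolution (the helper `measure` is invented for illustration). The rounding error is at most half the unit, so moving from centimeters to millimeters shrinks the worst case tenfold; this addresses resolution only, not bias from a miscalibrated instrument.

```python
TRUE_HEIGHT_CM = 172.4637  # hypothetical true height


def measure(value, resolution):
    """Round a value to the instrument's resolution (1.0 = whole cm, 0.1 = mm)."""
    return round(value / resolution) * resolution


for label, resolution in [("centimeter", 1.0), ("millimeter", 0.1)]:
    reading = measure(TRUE_HEIGHT_CM, resolution)
    error = reading - TRUE_HEIGHT_CM
    print(f"{label} ruler: reading = {reading:.1f} cm, error = {error:+.4f} cm")
```

With these values, the centimeter reading is 172.0 cm (off by about 0.46 cm) and the millimeter reading is 172.5 cm (off by about 0.04 cm).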

Application and Conclusion
Measurement is a constant in our society. Are you a sports fan who identifies number 9 on a New Orleans Saints NFL jersey as Drew Brees? Are you a mother comparing the unit prices of boxes of cereal ($0.14 per ounce for a large box compared to $0.11 per ounce for another)? How about checking your child's temperature? We use measurement to identify whether workplace training is effective, to evaluate classroom instruction, to identify knowledge and skills gaps, and, more recently, to determine whether CDC guidelines are effective (Do hand washing and social distancing decrease the rate of infections?). Learning management systems (LMS) use analytical measurements to show which pages were accessed most frequently, and classroom teachers can use the measurement features in their current virtual programs to differentiate instruction. These items are just the tip of the iceberg; the list of things we could measure, and to which we could apply the four levels of measurement, goes on indefinitely. Since we all use statistics and measurement on a daily basis, having a basic understanding of the four levels of measurement is important regardless of vocation. From an everyday consumer's perspective, knowing the levels of measurement helps us make informed decisions. The information we rely on in these daily activities depends on researchers' knowledge of measurement scales to establish the credibility and validity of collected data. As this article has shown, the four levels of measurement provide different amounts of detail, with nominal producing the least and ratio the most. Each level also determines how the data can be treated mathematically and which testing procedures are most appropriate. We also learned that by using a sample of the population, we can make generalized decisions about the entire population.