Categorical Data Analysis Using Logistic Regression

This paper deals with categorical data of the kind that can be analyzed with logistic regression. It first reviews the underlying concepts of random variables and categorical data and then turns to logistic regression and to the combination of the two named in the title. Logistic regression is a widely used technique for categorical data analysis and offers more flexibility than traditional cross-tabulation analysis. A binary outcome can be predicted from one or more categorical variables, continuous variables, or a combination of both.


Introduction
Random variables
A random variable X is any real-valued function defined on the sample space Ω, that is, X: Ω → R. Random variables are denoted by the letters X, Y, Z, etc. (Kastrati, 2017). When studying a random variable, the main problem is finding the probabilities of its different values. The problem is meaningful because the values of the random variable are determined by the outcomes of the experiment, which are governed by a probability P. We denote by H = {x1, x2, x3, …, xn} the set of possible values of X (Kastrati, 2017).
Pierce (2020) explains random variables and their possible values with two examples. In the first, we take an ordinary coin, which has two sides: one side shows the "head" and the other a number. If we code the head as 0 and the number as 1, then X denotes the random variable, 0 and 1 are its possible values, and the coin toss is the experiment whose outcome is uncertain. Only two outcomes can occur: the head or the number. We write X = {0, 1}, where X is the random variable and {0, 1} is the set of its possible values.
In the second example (Pierce, 2020), three coins are tossed at once to see which outcomes can occur. We denote the random variable by X and the sample space by Ω (the set of all possible outcomes after the coins are tossed). The coins can fall in eight different ways. Let X count how many heads fall in each possible outcome, and let K mark the head and N the number. For the outcome K K K the head has fallen three times, so the value of X for this combination is 3.
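The three-coin example can be checked by enumerating the sample space. The following is a minimal sketch in R, not taken from the cited sources; the object names `sides`, `outcomes` and `dist` are illustrative, and the distribution at the end assumes fair coins, so that all eight outcomes are equally likely.

```r
# K = head, N = number, following the notation used in the text.
sides <- c("K", "N")

# All 2^3 = 8 ways in which three coins can fall.
outcomes <- expand.grid(coin1 = sides, coin2 = sides, coin3 = sides,
                        stringsAsFactors = FALSE)

# X = number of heads in each outcome.
outcomes$X <- rowSums(outcomes[, c("coin1", "coin2", "coin3")] == "K")
print(outcomes)

# Distribution of X, assuming each outcome has probability 1/8.
dist <- table(outcomes$X) / nrow(outcomes)
print(dist)
```

The first row of `outcomes` is K K K with X = 3, which matches the combination discussed in the text.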

Discrete and continuous random variables
According to Kastrati (2017), the random variable X is called discrete if the set H of its possible values is at most countable. Examples of discrete random variables are the number of children in a family, the number of students in a class, and the number of times we go to the beach.
Definition: The random variable X is said to be continuous if there exists a function f(x) ≥ 0 (for every x ∈ R) such that for every set B ⊂ R we have P(X ∈ B) = ∫_B f(x) dx.
The random variables X and Y are called independent if for all x, y ∈ R we have P(X < x, Y < y) = P(X < x) · P(Y < y).
For continuous variables we take the following example.
Example 2: It is known that the blood glucose level of persons with diabetes follows a normal distribution with a mean of 106 mg/100 ml and a standard deviation of 8 mg/100 ml. We calculate the probability that a randomly chosen person with diabetes has a glucose level below 120 mg/100 ml.
Standardizing gives z = (120 − 106) / 8 = 1.75, so that P(X ≤ 120) = Φ(1.75) = 0.9599, where Φ is the standard normal distribution function.
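The probability in Example 2 can be reproduced in R with the normal distribution function `pnorm()`. This is a short sketch added for illustration; the variable names are hypothetical and only the mean, the standard deviation and the threshold come from the example.

```r
mu    <- 106   # mean glucose level, mg/100 ml
sigma <- 8     # standard deviation, mg/100 ml
limit <- 120   # threshold, mg/100 ml

# Direct calculation of P(X <= 120) for X ~ N(106, 8^2).
p_direct <- pnorm(limit, mean = mu, sd = sigma)

# The same probability via the standardized value z.
z <- (limit - mu) / sigma
p_standard <- pnorm(z)

print(z)
print(round(p_direct, 4))
```

Both routes give the same value, which rounds to the 0.9599 stated in the example.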

Categorical data
A considerable number of authors have described categorical data in fields such as medicine, economics, sports and others. A categorical variable has a measurement scale consisting of a set of categories (Agresti, 2003). For example, the diagnosis of a disease made with sophisticated modern equipment can give a result such as: normal, harmless, dangerous, probable, suspicious, critical, etc. (James, 2007). The development of methods for categorical data was driven by research in the social and medical sciences. Categorical scales are used today in the social sciences to measure opinions and attitudes, and in medicine to measure different outcomes and to indicate whether a medical treatment is successful (Akturk, 2007; Lipsitz, 1994).
Although categorical data are most common in the social and medical sciences, they are not limited to these areas. They also occur frequently in the behavioral sciences (for example, types of mental illness, with categories such as schizophrenia, depression, nervousness) (Feldman, 2009), in epidemiology and public health (for example, contraceptive methods, with categories such as pills, IUD, etc.) (Preisser, 1997), in zoology (for example, the primary food preference of alligators, with the categories fish, invertebrates, reptiles) (Maddison, 2014), in education (for example, student answers to exam questions, with the categories correct and incorrect) (Chen, 2003), and in marketing (for example, consumer preferences for the main brand of a product, with the categories A, B and C) (Magidson, 1982). Beyond these sciences, categorical data also arise in highly quantitative fields such as engineering and industrial quality control. Examples include classifying items into groups according to how well they meet certain standards, and subjective assessments of some characteristic of an item, for example how durable a garment is or how good and tasty a food product is (Ai, 2002).

Nominal, ordinal and interval variables
Variables whose categories have no natural ordering are called nominal (Agresti, 2003).
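In R, the distinction between unordered and ordered categories is expressed through factors. The sketch below is illustrative only: the data values and the object names `food` and `severity` are made up, with the food categories borrowed from the zoology example above and the ordered scale assumed for demonstration.

```r
# Nominal variable: categories without a natural order.
food <- factor(c("fish", "reptiles", "fish", "invertebrates"))
print(levels(food))       # levels are stored alphabetically, with no ranking
print(is.ordered(food))

# Ordinal variable: categories with a natural order (assumed scale).
severity <- factor(c("low", "high", "medium", "low"),
                   levels = c("low", "medium", "high"),
                   ordered = TRUE)
print(is.ordered(severity))

# Comparisons are meaningful only for ordered factors.
print(severity[2] > severity[1])
```

For a nominal factor a comparison such as `food[1] < food[2]` is not meaningful, and R returns NA with a warning.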

Estimation of parameters in the distribution function of the data
To generate random numbers with the rbinom() function we use the example from "Felix" (2016). The command generates 30 random numbers from a binomial distribution with p = 0.4 and N = 15. Because the data set is simulated, it will change every time the simulation is run.
Results:
    dat1
     4  5  6  7  8  9 10
     4  5  7  3  6  2  3
Example discussion: In contrast to the rbinom() function, which generates random numbers, the dbinom() function gives the probability of x successes for given parameters p and N. The "d" in the name of the function is derived from "density", since for continuous distributions, for example the normal distribution, this function is called the probability density function. For discrete distributions, including the binomial distribution, this function is instead called the probability mass function. Even if this terminology is somewhat confusing, the dbinom() function returns the probability mass function of the binomial distribution. First we visualize this function for a given set of parameter values. The probabilities must sum to 1 by definition. When we toss 15 coins, the number of heads will lie between zero and fifteen.
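The simulation and the probability mass function described above can be sketched in R as follows. This is an illustrative sketch rather than the code of the cited example: the seed value is an assumption added for reproducibility, and the object name `pmf` is hypothetical.

```r
set.seed(123)  # assumed seed; any value works, it only fixes the random draw

# 30 random numbers from a binomial distribution with N = 15 and p = 0.4.
dat1 <- rbinom(n = 30, size = 15, prob = 0.4)
print(table(dat1))  # frequency of each simulated value

# Probability mass function for every possible number of successes, 0 to 15.
pmf <- dbinom(0:15, size = 15, prob = 0.4)
print(round(pmf, 4))

# By definition the probabilities sum to 1.
print(sum(pmf))

# Simple visualization of the probability mass function.
barplot(pmf, names.arg = 0:15,
        xlab = "Number of successes", ylab = "Probability")
```

With a different seed the frequency table changes, while the probability mass function stays the same because it depends only on p and N.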

Results:
The likelihood function is defined as the probability of the data given the model. With dbinom() we get the probability of every single observation, assuming a value for the parameter p. In this case we know the true value of p because we have simulated the data. With real data, however, we do not know the true value of p. The N parameter is determined by the design of the experiment or sampling. The value of the likelihood function for the data we have simulated is obtained with dbinom (… and turns out to be a very small number. To avoid numerical problems, the logarithm of the probabilities can be calculated instead, which gives values that are numerically more manageable (i.e. negative values that are not so close to zero).
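The likelihood and log-likelihood calculation can be sketched in R as below. The call in the source is truncated, so this is not a reconstruction of it but a minimal sketch under stated assumptions: the data are re-simulated with an assumed seed, the observations are treated as independent, and the grid search over p is added only to illustrate how the log-likelihood points to an estimate.

```r
set.seed(123)  # assumed seed for reproducibility
dat1 <- rbinom(n = 30, size = 15, prob = 0.4)

# Likelihood at p = 0.4: product of the probabilities of all observations
# (independence assumed). The result is a very small number.
lik <- prod(dbinom(dat1, size = 15, prob = 0.4))
print(lik)

# Log-likelihood: sum of log-probabilities, numerically better behaved.
loglik <- sum(dbinom(dat1, size = 15, prob = 0.4, log = TRUE))
print(loglik)

# Illustrative grid search: log-likelihood over candidate values of p.
p_grid <- seq(0.01, 0.99, by = 0.01)
ll <- sapply(p_grid, function(p) sum(dbinom(dat1, size = 15, prob = p, log = TRUE)))
p_hat <- p_grid[which.max(ll)]

# For the binomial model the maximum lies at the observed proportion of successes.
print(c(grid = p_hat, analytic = mean(dat1) / 15))
```

Working on the log scale avoids the underflow that can occur when many small probabilities are multiplied together.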

Conclusion
The paper has elaborated on categorical data by analyzing random variables and their categorization in order to arrive at the required results. The understanding of random variables and of the main distributions discussed above, the binomial and the multinomial, is reflected in the results we sought. In addition to examples taken from everyday life and solved in the classical way, the paper also shows how they can be carried out by means of code in the R programming language. The advantage of working with this programming language is that it enables us to quickly obtain the results we seek.
The examples are taken from everyday life, and we have combined the ideas of different authors to better understand what the topic requires. To this analysis of categorical data, logistic regression is added to complete the topic, giving a better understanding of how to use data that need more careful analysis and how to present the results as a whole.