Working paper Open Access

An overview of the elementary statistics of correlation, R-Squared, cosine, sine, Xur, Yur, and regression through the origin, with application to votes and seats for parliament

Thomas Colignatus

The correlation between two vectors is the cosine of the angle between the centered data. While the cosine is a measure of association, the literature has paid little attention to the use of the sine as a measure of distance. A key application of the sine is a new "sine-diagonal inequality / disproportionality" (SDID) measure for votes and their assigned seats for parties in Parliament. This application has nonnegative data and uses regression through the origin (RTO) with non-centered data. Textbooks are advised to discuss this case because the geometry will improve the understanding of both regression and the distinction between descriptive statistics and statistical decision theory. Regression may be better introduced and explained by looking at the angle between a vector and its estimate than by looking at the Euclidean distance and the sum of squared errors. The paper provides an overview of the issues involved. A new relation between the sine and the Euclidean distance is derived. The application to votes and seats shows that a majority of the electorate in the USA and UK, which have District Representation (DR) rather than Equal or Proportional Representation (EPR), still tends to have "taxation without representation".
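The geometric statements in the abstract can be checked numerically. The following is a minimal sketch, not the paper's own code: the vote and seat shares are made-up illustrative numbers, all function names are hypothetical, and the SDID measure itself is not reproduced here because its exact definition is given only in the paper. The sketch shows that the Pearson correlation equals the cosine of the centered vectors, and that in regression through the origin the residual norm relative to the norm of the dependent vector equals the sine of the angle between the non-centered vectors.

```python
import math

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))

def cosine(a, b):
    """Cosine of the angle between two vectors (no centering)."""
    return dot(a, b) / (norm(a) * norm(b))

def centered(a):
    """Subtract the mean from every element."""
    m = sum(a) / len(a)
    return [x - m for x in a]

def pearson(a, b):
    """Pearson correlation from the covariance and variance sums."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def rto_slope(x, y):
    """Least-squares slope of y = b * x with no intercept (RTO)."""
    return dot(x, y) / dot(x, x)

# Hypothetical vote and seat shares for four parties (illustration only).
votes = [0.42, 0.40, 0.12, 0.06]
seats = [0.55, 0.38, 0.05, 0.02]

# Correlation as the cosine of the angle between the centered vectors.
r = pearson(votes, seats)
r_as_cosine = cosine(centered(votes), centered(seats))

# Cosine and sine of the angle between the non-centered vectors.
cos_vs = cosine(votes, seats)
sin_vs = math.sqrt(max(0.0, 1.0 - cos_vs ** 2))

# RTO: the residual is orthogonal to the regressor, so its length
# relative to the length of seats equals the sine of the angle.
b = rto_slope(votes, seats)
residual = [s - b * v for v, s in zip(votes, seats)]
sine_from_residual = norm(residual) / norm(seats)

print(f"r = {r:.6f}, cosine of centered data = {r_as_cosine:.6f}")
print(f"cos = {cos_vs:.6f}, sin = {sin_vs:.6f}")
print(f"RTO slope = {b:.6f}, |residual|/|seats| = {sine_from_residual:.6f}")
```

Plain Python lists and the standard library are used so that the sketch runs without extra packages; with an array library the same quantities reduce to a few inner products.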

This version of the paper is included here for its relevance to education in mathematics and statistics. Update: In the former version 3.0 on Zenodo, the didactics subsection 2.3 was mistakenly omitted from the table of contents. This version also contains some minor edits.
Files (686.9 kB)
Zenodo-2018-02-20-Overview-r-squared-2018-04-25.pdf (686.9 kB)
md5:14fccfb36f1f8f0ce65fa468bf2e1716