Working paper Open Access

An overview of the elementary statistics of correlation, R-Squared, cosine, sine, Xur, Yur, and regression through the origin, with application to votes and seats for parliament

Thomas Colignatus

The correlation between two vectors is the cosine of the angle between the centered data. While the cosine is a measure of association, the literature has paid little attention to the use of the sine as a measure of distance. A key application of the sine is a new "sine-diagonal inequality / disproportionality" (SDID) measure for the votes that parties receive and the seats they are assigned in parliament. This application has nonnegative data and uses regression through the origin (RTO) with non-centered data. Textbooks are advised to discuss this case because the geometry will improve the understanding of both regression and the distinction between descriptive statistics and statistical decision theory. Regression may be better introduced and explained by looking at the angle between a vector and its estimate rather than at the Euclidean distance and the sum of squared errors. The paper provides an overview of the issues involved. A new relation between the sine and the Euclidean distance is derived. The application to votes and seats shows that a majority of the electorate in the USA and UK, which have District Representation (DR) rather than Equal or Proportional Representation (EPR), still tends to have "taxation without representation".
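The geometric quantities named in the abstract can be illustrated with a minimal sketch in plain Python. The function names and the vote and seat shares below are hypothetical, chosen only for illustration, and the sketch does not reproduce the paper's SDID formula, which the abstract does not state. It only shows the correlation as the cosine of centered vectors, the sine of the angle between non-centered vectors, and the slope of a regression through the origin.

```python
import math

def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))

def cosine(x, y):
    """Cosine of the angle between x and y (no centering)."""
    return dot(x, y) / math.sqrt(dot(x, x) * dot(y, y))

def center(x):
    """Subtract the mean from every element."""
    m = sum(x) / len(x)
    return [a - m for a in x]

def correlation(x, y):
    """Pearson correlation: the cosine of the centered vectors."""
    return cosine(center(x), center(y))

def sine(x, y):
    """Sine of the angle between x and y, taken from the cosine.
    The max() guards against tiny negative values from rounding."""
    c = cosine(x, y)
    return math.sqrt(max(0.0, 1.0 - c * c))

def rto_slope(x, y):
    """Least-squares slope b of y = b * x with no intercept."""
    return dot(x, y) / dot(x, x)

# Hypothetical vote and seat shares for three parties (assumed data).
votes = [0.45, 0.35, 0.20]
seats = [0.60, 0.35, 0.05]

print("correlation (centered cosine):", correlation(votes, seats))
print("cosine (non-centered):        ", cosine(votes, seats))
print("sine (non-centered):          ", sine(votes, seats))
print("RTO slope of seats on votes:  ", rto_slope(votes, seats))
```

In this sketch a sine of zero means that seats are exactly proportional to votes, and larger values mean a wider angle between the two vectors. For the regression through the origin, the squared non-centered cosine equals the share of the sum of squares of the seats that is reproduced by the fitted values.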

This version of the paper is included here for its relevance to education in mathematics and statistics. Update: In the former version 3.0 on Zenodo, the didactics subsection 2.3 was erroneously omitted from the table of contents. This version also contains some minor edits.