Overview

Dataset statistics

Number of variables: 6
Number of observations: 3079
Missing cells: 2
Missing cells (%): < 0.1%
Duplicate rows: 2872
Duplicate rows (%): 93.3%
Total size in memory: 144.5 KiB
Average record size in memory: 48.0 B

Variable types

Numeric: 6

Warnings

Dataset has 2872 (93.3%) duplicate rows [Duplicates]
trad_1 is highly correlated with avg and 1 other field [High correlation]
avg is highly correlated with trad_1 and 1 other field [High correlation]
newavg is highly correlated with trad_1 and 1 other field [High correlation]
m47_2 has 44 (1.4%) zeros [Zeros]
trad_1 has 63 (2.0%) zeros [Zeros]
trad_2 has 52 (1.7%) zeros [Zeros]

Reproduction

Analysis started: 2021-02-23 14:35:01.915347
Analysis finished: 2021-02-23 14:35:08.332301
Duration: 6.42 seconds
Software version: pandas-profiling v2.10.1
Download configuration: config.yaml

Variables

m47_1
Real number (ℝ≥0)

Distinct: 6
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.746021436
Minimum: 0
Maximum: 5
Zeros: 12
Zeros (%): 0.4%
Memory size: 24.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 2
Median: 3
Q3: 3
95-th percentile: 4
Maximum: 5
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7696713649
Coefficient of variation (CV): 0.2802860003
Kurtosis: 0.9287021056
Mean: 2.746021436
Median Absolute Deviation (MAD): 0
Skewness: 0.1481270573
Sum: 8455
Variance: 0.5923940099
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Common values

Value  Count  Frequency (%)
3      1618   52.5%
2      993    32.3%
4      301    9.8%
1      91     3.0%
5      64     2.1%
0      12     0.4%

Minimum 5 values

Value  Count  Frequency (%)
0      12     0.4%
1      91     3.0%
2      993    32.3%
3      1618   52.5%
4      301    9.8%

Maximum 5 values

Value  Count  Frequency (%)
5      64     2.1%
4      301    9.8%
3      1618   52.5%
2      993    32.3%
1      91     3.0%

m47_2
Real number (ℝ≥0)

ZEROS

Distinct: 6
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.543033452
Minimum: 0
Maximum: 5
Zeros: 44
Zeros (%): 1.4%
Memory size: 24.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
Median: 3
Q3: 3
95-th percentile: 4
Maximum: 5
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.8610494352
Coefficient of variation (CV): 0.3385914701
Kurtosis: 1.194763562
Mean: 2.543033452
Median Absolute Deviation (MAD): 1
Skewness: 0.2654538662
Sum: 7830
Variance: 0.7414061298
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Common values

Value  Count  Frequency (%)
2      1323   43.0%
3      1265   41.1%
4      194    6.3%
1      163    5.3%
5      90     2.9%
0      44     1.4%

Minimum 5 values

Value  Count  Frequency (%)
0      44     1.4%
1      163    5.3%
2      1323   43.0%
3      1265   41.1%
4      194    6.3%

Maximum 5 values

Value  Count  Frequency (%)
5      90     2.9%
4      194    6.3%
3      1265   41.1%
2      1323   43.0%
1      163    5.3%

trad_1
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 6
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.507632348
Minimum: 0
Maximum: 5
Zeros: 63
Zeros (%): 2.0%
Memory size: 24.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
Median: 3
Q3: 3
95-th percentile: 4
Maximum: 5
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.044116323
Coefficient of variation (CV): 0.4163753604
Kurtosis: 0.01556604178
Mean: 2.507632348
Median Absolute Deviation (MAD): 1
Skewness: 0.03371318092
Sum: 7721
Variance: 1.090178895
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Common values

Value  Count  Frequency (%)
3      1284   41.7%
2      866    28.1%
1      482    15.7%
4      265    8.6%
5      119    3.9%
0      63     2.0%

Minimum 5 values

Value  Count  Frequency (%)
0      63     2.0%
1      482    15.7%
2      866    28.1%
3      1284   41.7%
4      265    8.6%

Maximum 5 values

Value  Count  Frequency (%)
5      119    3.9%
4      265    8.6%
3      1284   41.7%
2      866    28.1%
1      482    15.7%

trad_2
Real number (ℝ≥0)

ZEROS

Distinct: 6
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.422539786
Minimum: 0
Maximum: 5
Zeros: 52
Zeros (%): 1.7%
Memory size: 24.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
Median: 2
Q3: 3
95-th percentile: 4
Maximum: 5
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.125242843
Coefficient of variation (CV): 0.464488901
Kurtosis: -0.5296125938
Mean: 2.422539786
Median Absolute Deviation (MAD): 1
Skewness: 0.1885798583
Sum: 7459
Variance: 1.266171455
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Common values

Value  Count  Frequency (%)
3      1007   32.7%
2      833    27.1%
1      697    22.6%
4      375    12.2%
5      115    3.7%
0      52     1.7%

Minimum 5 values

Value  Count  Frequency (%)
0      52     1.7%
1      697    22.6%
2      833    27.1%
3      1007   32.7%
4      375    12.2%

Maximum 5 values

Value  Count  Frequency (%)
5      115    3.7%
4      375    12.2%
3      1007   32.7%
2      833    27.1%
1      697    22.6%

avg
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 21
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.554806755
Minimum: 0
Maximum: 5
Zeros: 6
Zeros (%): 0.2%
Memory size: 24.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1.25
Q1: 2
Median: 2.5
Q3: 3
95-th percentile: 4
Maximum: 5
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.8209486858
Coefficient of variation (CV): 0.321334944
Kurtosis: 0.4471057768
Mean: 2.554806755
Median Absolute Deviation (MAD): 0.5
Skewness: 0.3041216853
Sum: 7866.25
Variance: 0.6739567446
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=21)

Common values

Value              Count  Frequency (%)
3                  517    16.8%
2.75               382    12.4%
2                  338    11.0%
2.5                332    10.8%
2.25               303    9.8%
1.75               237    7.7%
1.5                211    6.9%
3.25               187    6.1%
3.5                112    3.6%
1.25               92     3.0%
Other values (11)  368    12.0%

Minimum 5 values

Value  Count  Frequency (%)
0      6      0.2%
0.25   3      0.1%
0.5    11     0.4%
0.75   16     0.5%
1      57     1.9%

Maximum 5 values

Value  Count  Frequency (%)
5      32     1.0%
4.75   27     0.9%
4.5    28     0.9%
4.25   41     1.3%
4      64     2.1%

newavg
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 29
Distinct (%): 0.9%
Missing: 2
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.566646084
Minimum: 0
Maximum: 5
Zeros: 14
Zeros (%): 0.5%
Memory size: 24.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1.25
Q1: 2
Median: 2.67
Q3: 3
95-th percentile: 4
Maximum: 5
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.8448029348
Coefficient of variation (CV): 0.3291466401
Kurtosis: 0.5389609469
Mean: 2.566646084
Median Absolute Deviation (MAD): 0.58
Skewness: 0.2161197238
Sum: 7897.57
Variance: 0.7136919986
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=29)

Common values

Value              Count  Frequency (%)
3                  590    19.2%
2                  372    12.1%
2.75               357    11.6%
2.5                250    8.1%
2.25               210    6.8%
1.5                201    6.5%
1.75               173    5.6%
3.25               159    5.2%
3.5                104    3.4%
2.67               89     2.9%
Other values (19)  572    18.6%

Minimum 5 values

Value  Count  Frequency (%)
0      14     0.5%
0.25   3      0.1%
0.33   11     0.4%
0.5    3      0.1%
0.75   5      0.2%

Maximum 5 values

Value  Count  Frequency (%)
5      37     1.2%
4.75   27     0.9%
4.67   16     0.5%
4.5    23     0.7%
4.25   27     0.9%

Interactions

[Figures: pairwise interaction plots between the variables]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a perfectly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
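
The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code the report generator runs; the function name `pearson_r` and the toy data are assumptions made for the example.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and standard deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov / (std_x * std_y)


# Hypothetical toy data, not taken from the profiled dataset.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson_r(x, y))
# Shifting and rescaling either variable leaves r unchanged (up to rounding).
print(pearson_r([10 * v + 3 for v in x], y))
```

Because the same 1/n factor appears in the numerator and the denominator, using sample (n - 1) estimates instead would give the same r.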

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
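
As a rough sketch of that recipe, the snippet below ranks both variables and then applies the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the toy data are assumptions for illustration; giving tied values the average of their positions is one common convention, chosen here because the variables in this report are heavily tied.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical toy data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))  # rank order is identical
print(pearson_r(x, y))     # smaller, because the relationship is not linear
```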

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
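
The pair-counting definition above corresponds to the variant usually called tau-a, and can be sketched as follows. The function name `kendall_tau_a` and the toy data are assumptions for illustration. Library implementations (for example `scipy.stats.kendalltau`) commonly default to the tau-b variant, which adjusts the denominator for ties, so with heavily tied data the two can differ.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # Pairs tied in x or y count as neither concordant nor discordant.
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical toy data: two concordant pairs and one discordant pair.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The double loop is quadratic in the number of observations, which is fine for a sketch but slower than the sort-based algorithms that statistical libraries typically use.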

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the phik package documentation.

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display which lets you quickly pick out patterns in data completion visually.
[Figure] The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

      m47_1  m47_2  trad_1  trad_2  avg   newavg
0     4      4      5       4       4.25  4.25
1     4      4      5       5       4.50  4.50
2     5      4      4       4       4.25  4.25
3     3      2      1       1       1.75  1.33
4     3      3      3       2       2.75  2.75
5     2      2      1       1       1.50  1.50
6     2      2      1       1       1.50  1.50
7     3      2      1       0       1.50  1.50
8     2      2      1       1       1.50  1.50
9     2      2      1       1       1.50  1.50

Last rows

      m47_1  m47_2  trad_1  trad_2  avg   newavg
3069  3      2      2       1       2.00  2.00
3070  2      2      2       1       1.75  1.75
3071  3      3      3       3       3.00  3.00
3072  2      3      2       3       2.50  2.50
3073  3      3      3       3       3.00  3.00
3074  3      2      1       2       2.00  2.00
3075  3      2      2       4       2.75  2.33
3076  3      3      3       3       3.00  3.00
3077  3      2      3       2       2.50  2.50
3078  2      2      1       1       1.50  1.50

Duplicate rows

Most frequent

      m47_1  m47_2  trad_1  trad_2  avg   newavg  count
85    3      3      3       3       3.00  3.00    448
84    3      3      3       2       2.75  2.75    208
37    2      2      2       2       2.00  2.00    193
33    2      2      1       1       1.50  1.50    154
36    2      2      2       1       1.75  1.75    104
66    3      2      2       2       2.25  2.25    85
71    3      2      3       3       2.75  2.75    83
70    3      2      3       2       2.50  2.50    82
38    2      2      2       3       2.25  2.25    81
83    3      3      3       1       2.50  3.00    60