ExpNumStat provides summary statistics for all numerical variables. The function automatically scans each variable and selects only the numeric/integer ones. If a target variable is specified, the function also reports the relationship between the target variable and each independent variable.
ExpNumStat(data, by = "A", gp = NULL, Qnt = NULL, Nlim = 10, MesofShape = 2, Outlier = FALSE, round = 3, dcast = FALSE, val = NULL)
| Argument | Description |
|---|---|
| data | dataframe or matrix |
| by | grouping level: "A" (summary statistics for all observations), "G" (summary statistics by group), "GA" (summary statistics by group and overall) |
| gp | target variable, if any; default NULL |
| Qnt | quantiles to compute, default NULL; for example, c(0.25, 0.75) returns the 25th and 75th percentiles |
| Nlim | numeric variable limit; the default of 10 means only variables of type numeric/integer with more than 10 unique values are considered |
| MesofShape | measures of shape (skewness and kurtosis) |
| Outlier | if TRUE, calculate the lower hinge, upper hinge and number of outliers |
| round | number of decimal places to round to |
| dcast | if TRUE, reshape the output using fast dcast from data.table |
| val | name of the column whose values will be used to fill the cast (see the Details section for the list of column names) |
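
The Nlim rule in the table above can be illustrated with a short base-R sketch. The helper name `select_numeric_vars` is hypothetical and this is not the package's internal code; it only mirrors the documented rule (numeric/integer type and more than Nlim unique values).

```r
# Hypothetical helper (not part of the package): mimic the documented Nlim rule.
select_numeric_vars <- function(data, Nlim = 10) {
  data <- as.data.frame(data)
  keep <- vapply(
    data,
    # is.numeric() is TRUE for both integer and double columns
    function(x) is.numeric(x) && length(unique(x)) > Nlim,
    logical(1)
  )
  names(data)[keep]
}

selected <- select_numeric_vars(mtcars, Nlim = 10)
print(selected)
```

On mtcars this keeps mpg, disp, hp, drat, wt and qsec, the same six variables that appear in the example output; columns such as cyl, gear and carb have too few unique values.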
Summary statistics for numeric independent variables.
Summary by:
Only overall level (by = "A")
Only group level (by = "G")
Both overall and group level (by = "GA")
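
As a rough illustration of the three summary levels listed above, the following base-R sketch computes a single statistic (the mean of mpg) overall, by group, and both, using gear as the target variable. It is only an analogy for the by argument, not the package's implementation.

```r
# Base-R sketch of the three summary levels for one statistic (mean of mpg),
# with gear playing the role of the target variable.
overall  <- c(All = mean(mtcars$mpg))               # analogous to by = "A"
by_group <- tapply(mtcars$mpg, mtcars$gear, mean)   # analogous to by = "G"
both     <- c(overall, by_group)                    # analogous to by = "GA"

print(round(both, 3))
```

The rounded values agree with the mean column for mpg in the example output (20.091 overall; 16.107, 24.533 and 21.380 for gear 3, 4 and 5).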
Column descriptions
Vname - Variable name
Group - Target variable
TN - Total sample (including NA observations)
nNeg - Total negative observations
nPos - Total positive observations
nZero - Total zero observations
NegInf - Negative infinite count
PosInf - Positive infinite count
NA_Value - Count of missing (NA) values
Per_of_Missing - Percentage of missing values
Min - minimum value
Max - maximum value
Mean - average value
Median - median value
SD - Standard deviation
CV - Coefficient of variation (SD/mean); the example output reports it as a ratio, not multiplied by 100
IQR - Inter quartile range
Qnt - quantile values
MesofShape - Skewness and Kurtosis
Outlier - Number of outliers
Cor - Correlation between target and independent variables
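
To make the column descriptions above concrete, here is a base-R sketch that computes several of them for one variable (mtcars$mpg). It is not the package's implementation. The 1.5 × IQR outlier bounds and the moment-based skewness and excess kurtosis are assumptions, chosen because they appear to reproduce the mpg row of the overall example output.

```r
# Base-R sketch (not the package's code) of several summary columns for one variable.
x  <- mtcars$mpg
ok <- x[is.finite(x)]                  # drop NA/NaN/Inf before computing statistics

q   <- quantile(ok, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Assumed outlier rule: points beyond 1.5 * IQR from the quartiles
lb <- q[1] - 1.5 * iqr
ub <- q[2] + 1.5 * iqr

# Assumed shape measures: moment-based skewness and excess kurtosis
d  <- ok - mean(ok)
m2 <- mean(d^2)

stats <- list(
  TN        = length(x),               # total sample, NA included
  nNeg      = sum(ok < 0),
  nZero     = sum(ok == 0),
  nPos      = sum(ok > 0),
  NA_Value  = sum(is.na(x)),
  Mean      = mean(ok),
  Median    = median(ok),
  SD        = sd(ok),
  CV        = sd(ok) / mean(ok),       # ratio, as in the example output
  IQR       = iqr,
  LB        = lb,
  UB        = ub,
  nOutliers = sum(ok < lb | ok > ub),
  Skewness  = mean(d^3) / m2^1.5,
  Kurtosis  = mean(d^4) / m2^2 - 3
)

print(round(unlist(stats), 3))
```

If a different quantile type or outlier rule is wanted, only the `quantile()` call and the `lb`/`ub` lines need to change.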
# Descriptive summary of numeric variables - Summary by Target variables
ExpNumStat(mtcars,by="G",gp="gear",Qnt=c(0.1,0.2),MesofShape=2, Outlier=TRUE,round=3)
#>    Vname  Group TN nNeg nZero nPos NegInf PosInf NA_Value Per_of_Missing
#>  2  disp gear:4 12    0     0   12      0      0        0              0
#>  8  disp gear:3 15    0     0   15      0      0        0              0
#> 14  disp gear:5  5    0     0    5      0      0        0              0
#>  4  drat gear:4 12    0     0   12      0      0        0              0
#> 10  drat gear:3 15    0     0   15      0      0        0              0
#> 16  drat gear:5  5    0     0    5      0      0        0              0
#>  3    hp gear:4 12    0     0   12      0      0        0              0
#>  9    hp gear:3 15    0     0   15      0      0        0              0
#> 15    hp gear:5  5    0     0    5      0      0        0              0
#>  1   mpg gear:4 12    0     0   12      0      0        0              0
#>  7   mpg gear:3 15    0     0   15      0      0        0              0
#> 13   mpg gear:5  5    0     0    5      0      0        0              0
#>  6  qsec gear:4 12    0     0   12      0      0        0              0
#> 12  qsec gear:3 15    0     0   15      0      0        0              0
#> 18  qsec gear:5  5    0     0    5      0      0        0              0
#>  5    wt gear:4 12    0     0   12      0      0        0              0
#> 11    wt gear:3 15    0     0   15      0      0        0              0
#> 17    wt gear:5  5    0     0    5      0      0        0              0
#>         sum     min     max    mean  median      SD    CV     IQR Skewness
#>  2 1476.200  71.100 167.600 123.017 130.900  38.909 0.316  81.075   -0.200
#>  8 4894.500 120.100 472.000 326.300 318.000  94.853 0.291 104.200   -0.267
#> 14 1012.400  95.100 351.000 202.480 145.000 115.491 0.570 180.700    0.408
#>  4   48.520   3.690   4.930   4.043   3.920   0.312 0.077   0.188    1.994
#> 10   46.990   2.760   3.730   3.133   3.080   0.274 0.087   0.145    1.017
#> 16   19.580   3.540   4.430   3.916   3.770   0.390 0.099   0.600    0.386
#>  3 1074.000  52.000 123.000  89.500  94.000  25.893 0.289  44.250   -0.077
#>  9 2642.000  97.000 245.000 176.133 180.000  47.689 0.271  60.000   -0.196
#> 15  978.000  91.000 335.000 195.600 175.000 102.834 0.526 151.000    0.337
#>  1  294.400  17.800  33.900  24.533  22.800   5.277 0.215   7.075    0.611
#>  7  241.600  10.400  21.500  16.107  15.500   3.372 0.209   3.900   -0.082
#> 13  106.900  15.000  30.400  21.380  19.700   6.659 0.311  10.200    0.373
#>  6  227.580  16.460  22.900  18.965  18.755   1.614 0.085   1.113    0.891
#> 12  265.380  15.410  20.220  17.692  17.420   1.350 0.076   0.955    0.437
#> 18   78.200  14.500  16.900  15.640  15.500   1.130 0.072   2.100    0.113
#>  5   31.400   1.615   3.440   2.617   2.700   0.633 0.242   1.026   -0.157
#> 11   58.389   2.465   5.424   3.893   3.730   0.833 0.214   0.508    0.714
#> 17   13.163   1.513   3.570   2.633   2.770   0.819 0.311   1.030   -0.276
#>    Kurtosis     10%     20%   LB.25%  UB.75% nOutliers
#>  2   -1.615  76.000  78.760  -42.688 281.613         0
#>  8   -0.235 238.200 272.240  119.500 536.300         0
#> 14   -1.647 105.180 115.260 -150.750 572.050         0
#>  4    3.640   3.855   3.900    3.619   4.369         1
#> 10    0.707   2.828   2.986    2.818   3.397         4
#> 16   -1.555   3.572   3.604    2.720   5.120         0
#>  3   -1.558  62.300  65.200   -0.625 176.375         0
#>  9   -0.910 107.000 142.000   60.000 300.000         0
#> 15   -1.418  99.800 108.600 -113.500 490.500         0
#>  1   -0.946  19.380  21.000   10.387  38.688         0
#>  7   -0.640  11.560  14.100    8.650  24.250         0
#> 13   -1.457  15.320  15.640    0.500  41.300         0
#>  6    1.323  17.148  18.344   16.796  21.246         2
#> 12   -0.262  16.252  16.990   15.602  19.423         4
#> 18   -1.729  14.540  14.580   11.450  19.850         0
#>  5   -1.300   1.845   1.988    0.594   4.699         0
#> 11   -0.161   3.303   3.439    2.689   4.719         4
#> 17   -1.273   1.764   2.015    0.595   4.715         0

# Descriptive summary of numeric variables - Summary by Overall
ExpNumStat(mtcars,by="A",gp="gear",Qnt=c(0.1,0.2),MesofShape=2, Outlier=TRUE,round=3)
#>   Vname Group TN nNeg nZero nPos NegInf PosInf NA_Value Per_of_Missing      sum
#> 2  disp   All 32    0     0   32      0      0        0              0 7383.100
#> 4  drat   All 32    0     0   32      0      0        0              0  115.090
#> 3    hp   All 32    0     0   32      0      0        0              0 4694.000
#> 1   mpg   All 32    0     0   32      0      0        0              0  642.900
#> 6  qsec   All 32    0     0   32      0      0        0              0  571.160
#> 5    wt   All 32    0     0   32      0      0        0              0  102.952
#>      min     max    mean  median      SD    CV     IQR Skewness Kurtosis    10%
#> 2 71.100 472.000 230.722 196.300 123.939 0.537 205.175    0.400   -1.090 80.610
#> 4  2.760   4.930   3.597   3.695   0.535 0.149   0.840    0.279   -0.565  3.007
#> 3 52.000 335.000 146.688 123.000  68.563 0.467  83.500    0.761    0.052 66.000
#> 1 10.400  33.900  20.091  19.200   6.027 0.300   7.375    0.640   -0.201 14.340
#> 6 14.500  22.900  17.849  17.710   1.787 0.100   2.008    0.387    0.554 15.534
#> 5  1.513   5.424   3.217   3.325   0.978 0.304   1.029    0.444    0.172  1.956
#>       20%   LB.25%  UB.75% nOutliers
#> 2 120.140 -186.938 633.763         0
#> 4   3.072    1.820   5.180         0
#> 3  93.400  -28.750 305.250         1
#> 1  15.200    4.363  33.862         1
#> 6  16.734   13.881  21.911         1
#> 5   2.349    1.038   5.153         3

# Descriptive summary of numeric variables - Summary by Overall and Group
ExpNumStat(mtcars,by="GA",gp="gear",Qnt=seq(0,1,.1),MesofShape=1, Outlier=TRUE,round=2)
#>    Vname    Group TN nNeg nZero nPos NegInf PosInf NA_Value Per_of_Missing
#>  2  disp gear:All 32    0     0   32      0      0        0              0
#>  8  disp   gear:4 12    0     0   12      0      0        0              0
#> 14  disp   gear:3 15    0     0   15      0      0        0              0
#> 20  disp   gear:5  5    0     0    5      0      0        0              0
#>  4  drat gear:All 32    0     0   32      0      0        0              0
#> 10  drat   gear:4 12    0     0   12      0      0        0              0
#> 16  drat   gear:3 15    0     0   15      0      0        0              0
#> 22  drat   gear:5  5    0     0    5      0      0        0              0
#>  3    hp gear:All 32    0     0   32      0      0        0              0
#>  9    hp   gear:4 12    0     0   12      0      0        0              0
#> 15    hp   gear:3 15    0     0   15      0      0        0              0
#> 21    hp   gear:5  5    0     0    5      0      0        0              0
#>  1   mpg gear:All 32    0     0   32      0      0        0              0
#>  7   mpg   gear:4 12    0     0   12      0      0        0              0
#> 13   mpg   gear:3 15    0     0   15      0      0        0              0
#> 19   mpg   gear:5  5    0     0    5      0      0        0              0
#>  6  qsec gear:All 32    0     0   32      0      0        0              0
#> 12  qsec   gear:4 12    0     0   12      0      0        0              0
#> 18  qsec   gear:3 15    0     0   15      0      0        0              0
#> 24  qsec   gear:5  5    0     0    5      0      0        0              0
#>  5    wt gear:All 32    0     0   32      0      0        0              0
#> 11    wt   gear:4 12    0     0   12      0      0        0              0
#> 17    wt   gear:3 15    0     0   15      0      0        0              0
#> 23    wt   gear:5  5    0     0    5      0      0        0              0
#>        sum    min    max   mean median     SD   CV    IQR     0%    10%    20%
#>  2 7383.10  71.10 472.00 230.72 196.30 123.94 0.54 205.18  71.10  80.61 120.14
#>  8 1476.20  71.10 167.60 123.02 130.90  38.91 0.32  81.08  71.10  76.00  78.76
#> 14 4894.50 120.10 472.00 326.30 318.00  94.85 0.29 104.20 120.10 238.20 272.24
#> 20 1012.40  95.10 351.00 202.48 145.00 115.49 0.57 180.70  95.10 105.18 115.26
#>  4  115.09   2.76   4.93   3.60   3.70   0.53 0.15   0.84   2.76   3.01   3.07
#> 10   48.52   3.69   4.93   4.04   3.92   0.31 0.08   0.19   3.69   3.86   3.90
#> 16   46.99   2.76   3.73   3.13   3.08   0.27 0.09   0.14   2.76   2.83   2.99
#> 22   19.58   3.54   4.43   3.92   3.77   0.39 0.10   0.60   3.54   3.57   3.60
#>  3 4694.00  52.00 335.00 146.69 123.00  68.56 0.47  83.50  52.00  66.00  93.40
#>  9 1074.00  52.00 123.00  89.50  94.00  25.89 0.29  44.25  52.00  62.30  65.20
#> 15 2642.00  97.00 245.00 176.13 180.00  47.69 0.27  60.00  97.00 107.00 142.00
#> 21  978.00  91.00 335.00 195.60 175.00 102.83 0.53 151.00  91.00  99.80 108.60
#>  1  642.90  10.40  33.90  20.09  19.20   6.03 0.30   7.38  10.40  14.34  15.20
#>  7  294.40  17.80  33.90  24.53  22.80   5.28 0.22   7.08  17.80  19.38  21.00
#> 13  241.60  10.40  21.50  16.11  15.50   3.37 0.21   3.90  10.40  11.56  14.10
#> 19  106.90  15.00  30.40  21.38  19.70   6.66 0.31  10.20  15.00  15.32  15.64
#>  6  571.16  14.50  22.90  17.85  17.71   1.79 0.10   2.01  14.50  15.53  16.73
#> 12  227.58  16.46  22.90  18.96  18.75   1.61 0.09   1.11  16.46  17.15  18.34
#> 18  265.38  15.41  20.22  17.69  17.42   1.35 0.08   0.96  15.41  16.25  16.99
#> 24   78.20  14.50  16.90  15.64  15.50   1.13 0.07   2.10  14.50  14.54  14.58
#>  5  102.95   1.51   5.42   3.22   3.33   0.98 0.30   1.03   1.51   1.96   2.35
#> 11   31.40   1.62   3.44   2.62   2.70   0.63 0.24   1.03   1.62   1.84   1.99
#> 17   58.39   2.46   5.42   3.89   3.73   0.83 0.21   0.51   2.46   3.30   3.44
#> 23   13.16   1.51   3.57   2.63   2.77   0.82 0.31   1.03   1.51   1.76   2.01
#>       30%    40%    50%    60%    70%    80%    90%   100%  LB.25% UB.75%
#>  2 142.06 160.00 196.30 275.80 303.10 350.80 396.00 472.00 -186.94 633.76
#>  8  87.70 113.20 130.90 144.34 156.01 160.00 166.84 167.60  -42.69 281.61
#> 14 275.80 292.72 318.00 354.00 360.00 408.00 452.00 472.00  119.50 536.30
#> 20 125.24 135.12 145.00 207.40 269.80 311.00 331.00 351.00 -150.75 572.05
#>  4   3.15   3.35   3.70   3.82   3.91   4.05   4.21   4.93    1.82   5.18
#> 10   3.91   3.92   3.92   4.02   4.08   4.10   4.21   4.93    3.62   4.37
#> 16   3.07   3.07   3.08   3.11   3.15   3.21   3.51   3.73    2.82   3.40
#> 22   3.65   3.71   3.77   3.95   4.13   4.26   4.35   4.43    2.72   5.12
#>  3 106.20 110.00 123.00 165.00 178.50 200.00 243.50 335.00  -28.75 305.25
#>  9  66.00  76.80  94.00 103.40 109.70 110.00 121.70 123.00   -0.62 176.38
#> 15 155.00 175.00 180.00 180.00 200.00 218.00 239.00 245.00   60.00 300.00
#> 21 125.40 150.20 175.00 210.60 246.20 278.20 306.60 335.00 -113.50 490.50
#>  1  15.98  17.92  19.20  21.00  21.47  24.08  30.09  33.90    4.36  33.86
#>  7  21.12  21.96  22.80  23.76  26.43  29.78  32.20  33.90   10.39  38.69
#> 13  14.80  15.20  15.50  16.76  17.94  18.80  20.52  21.50    8.65  24.25
#> 19  16.58  18.14  19.70  22.22  24.74  26.88  28.64  30.40    0.50  41.30
#>  6  17.02  17.34  17.71  18.18  18.61  19.33  19.99  22.90   13.88  21.91
#> 12  18.54  18.60  18.75  18.90  19.30  19.81  19.99  22.90   16.80  21.25
#> 18  17.10  17.36  17.42  17.69  17.95  18.29  19.78  20.22   15.60  19.42
#> 24  14.78  15.14  15.50  15.98  16.46  16.74  16.82  16.90   11.45  19.85
#>  5   2.77   3.16   3.33   3.44   3.55   3.77   4.05   5.42    1.04   5.15
#> 11   2.24   2.44   2.70   2.84   3.07   3.18   3.42   3.44    0.59   4.70
#> 17   3.47   3.55   3.73   3.80   3.84   4.31   5.31   5.42    2.69   4.72
#> 23   2.27   2.52   2.77   2.93   3.09   3.25   3.41   3.57    0.60   4.71
#>    nOutliers
#>  2         0
#>  8         0
#> 14         0
#> 20         0
#>  4         0
#> 10         1
#> 16         4
#> 22         0
#>  3         1
#>  9         0
#> 15         0
#> 21         0
#>  1         1
#>  7         0
#> 13         0
#> 19         0
#>  6         1
#> 12         2
#> 18         4
#> 24         0
#>  5         3
#> 11         0
#> 17         4
#> 23         0

# Summary by specific statistics for all numeric variables
ExpNumStat(mtcars,by="GA",gp="gear",Qnt=c(0.1,0.2),MesofShape=2, Outlier=FALSE,round=2,dcast = TRUE,val = "IQR")
#>   Stat Vname gear.All gear.4 gear.3 gear.5
#> 1  IQR  disp   205.18  81.08 104.20 180.70
#> 2  IQR  drat     0.84   0.19   0.14   0.60
#> 3  IQR    hp    83.50  44.25  60.00 151.00
#> 4  IQR   mpg     7.38   7.08   3.90  10.20
#> 5  IQR  qsec     2.01   1.11   0.96   2.10
#> 6  IQR    wt     1.03   1.03   0.51   1.03