Overview

Dataset statistics

Number of variables: 13
Number of observations: 2983
Missing cells: 15487
Missing cells (%): 39.9%
Total size in memory: 451.7 KiB
Average record size in memory: 155.0 B
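
The missing-cell percentage can be cross-checked against the per-variable missing counts listed under Alerts. The sketch below assumes the percentage is simply missing cells divided by (variables × observations); the variable names in the code are illustrative and are not taken from pandas-profiling itself.

```python
# Per-variable missing counts, copied from the Alerts section of this report.
missing_by_variable = {
    "socecon_lvl_cd": 2983,
    "country_origin_cd": 2983,
    "time_dx_to_surgery_nm": 117,
    "time_dx_to_radiotherapy_nm": 1654,
    "time_dx_to_chemotherapy_nm": 2074,
    "time_dx_to_immunotherapy_nm": 2982,
    "time_dx_to_hormonotherapy_nm": 2694,
}

n_variables = 13
n_observations = 2983

# Assumption: "Missing cells (%)" = missing cells / (variables x observations).
total_cells = n_variables * n_observations
missing_cells = sum(missing_by_variable.values())
missing_pct = 100 * missing_cells / total_cells

print(missing_cells)          # 15487
print(f"{missing_pct:.1f}%")  # 39.9%
```

Both figures agree with the dataset statistics above, which suggests the two fully empty columns account for most of the missing cells.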

Variable types

Numeric: 10
Unsupported: 2
Categorical: 1

Alerts

country_cd has constant value "LV" Constant
time_dx_to_immunotherapy_nm has constant value "1035.0" Constant
socecon_lvl_cd has 2983 (100.0%) missing values Missing
country_origin_cd has 2983 (100.0%) missing values Missing
time_dx_to_surgery_nm has 117 (3.9%) missing values Missing
time_dx_to_radiotherapy_nm has 1654 (55.4%) missing values Missing
time_dx_to_chemotherapy_nm has 2074 (69.5%) missing values Missing
time_dx_to_immunotherapy_nm has 2982 (> 99.9%) missing values Missing
time_dx_to_hormonotherapy_nm has 2694 (90.3%) missing values Missing
patient_id has unique values Unique
socecon_lvl_cd is an unsupported type, check if it needs cleaning or further analysis Unsupported
country_origin_cd is an unsupported type, check if it needs cleaning or further analysis Unsupported
time_dx_to_surgery_nm has 1167 (39.1%) zeros Zeros
time_dx_to_chemotherapy_nm has 76 (2.5%) zeros Zeros
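
The percentages in these alerts follow from the counts and the 2983 observations. The sketch below is a guess at the display rule, inferred only from the values in this report: shares are rounded to one decimal, and a non-zero share that rounds to 0.0% (or a partial share that rounds to 100.0%) is shown as a bound instead. `format_share` is a hypothetical helper, not a pandas-profiling function.

```python
def format_share(count: int, total: int) -> str:
    """Format count/total as a percentage, mimicking the report's display."""
    text = f"{100 * count / total:.1f}%"
    # Assumed rule: never show a non-zero share as 0.0%.
    if count > 0 and text == "0.0%":
        return "< 0.1%"
    # Assumed rule: never show an incomplete share as 100.0%.
    if count < total and text == "100.0%":
        return "> 99.9%"
    return text


n_observations = 2983

# Counts copied from the Missing and Zeros alerts above.
alert_counts = {
    "time_dx_to_surgery_nm missing": 117,
    "time_dx_to_radiotherapy_nm missing": 1654,
    "time_dx_to_chemotherapy_nm missing": 2074,
    "time_dx_to_immunotherapy_nm missing": 2982,
    "time_dx_to_hormonotherapy_nm missing": 2694,
    "time_dx_to_surgery_nm zeros": 1167,
    "time_dx_to_chemotherapy_nm zeros": 76,
}

for label, count in alert_counts.items():
    print(f"{label}: {count} ({format_share(count, n_observations)})")
```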

Reproduction

Analysis started: 2022-06-08 06:41:45.164902
Analysis finished: 2022-06-08 06:41:45.355088
Duration: 0.19 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

patient_id
Real number (ℝ≥0)

UNIQUE

Distinct: 2983
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1480195.818
Minimum: 217430
Maximum: 1890273
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 217430
5-th percentile: 373107.4
Q1: 1012769
median: 1759273
Q3: 1822446.5
95-th percentile: 1874265.5
Maximum: 1890273
Range: 1672843
Interquartile range (IQR): 809677.5

Descriptive statistics

Standard deviation: 495571.303
Coefficient of variation (CV): 0.3348011776
Kurtosis: -0.03321233942
Mean: 1480195.818
Median Absolute Deviation (MAD): 83815
Skewness: -1.192814676
Sum: 4415424125
Variance: 2.455909164 × 10^11
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
217430                1       < 0.1%
1815916               1       < 0.1%
1815648               1       < 0.1%
1815651               1       < 0.1%
1815663               1       < 0.1%
1815680               1       < 0.1%
1815822               1       < 0.1%
1815823               1       < 0.1%
1815824               1       < 0.1%
1815826               1       < 0.1%
Other values (2973)   2973    99.7%

Value                 Count   Frequency (%)
217430                1       < 0.1%
222679                1       < 0.1%
223180                1       < 0.1%
224137                1       < 0.1%
228560                1       < 0.1%
228659                1       < 0.1%
231280                1       < 0.1%
231623                1       < 0.1%
233619                1       < 0.1%
233639                1       < 0.1%

Value                 Count   Frequency (%)
1890273               1       < 0.1%
1889974               1       < 0.1%
1889411               1       < 0.1%
1889407               1       < 0.1%
1889108               1       < 0.1%
1889007               1       < 0.1%
1888925               1       < 0.1%
1888840               1       < 0.1%
1888663               1       < 0.1%
1888620               1       < 0.1%

age_nm
Real number (ℝ≥0)

Distinct: 59
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 59.91451559
Minimum: 22
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 22
5-th percentile: 40
Q1: 51
median: 61
Q3: 69
95-th percentile: 78
Maximum: 80
Range: 58
Interquartile range (IQR): 18

Descriptive statistics

Standard deviation: 11.75146724
Coefficient of variation (CV): 0.196137232
Kurtosis: -0.4491068627
Mean: 59.91451559
Median Absolute Deviation (MAD): 9
Skewness: -0.3464300774
Sum: 178725
Variance: 138.0969824
Monotonicity: Not monotonic
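
Several of the age_nm figures are derived from others in the same listing. The following sketch recomputes them from the reported values, assuming the usual definitions (mean = sum / count, range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared). It is a consistency check on the numbers shown here, not a description of how pandas-profiling computes them internally.

```python
# Values copied from the age_nm statistics in this report.
n_observations = 2983  # age_nm has no missing values
total = 178725         # Sum
std = 11.75146724      # Standard deviation
q1, q3 = 51, 69
minimum, maximum = 22, 80

mean = total / n_observations

# Derived statistics under the textbook definitions assumed above.
derived = {
    "mean": mean,
    "range": maximum - minimum,
    "iqr": q3 - q1,
    "cv": std / mean,
    "variance": std ** 2,
}

for name, value in derived.items():
    print(f"{name}: {value:.6g}")
```

The recomputed values match the reported mean, range, IQR, CV and variance to the precision shown.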
Histogram with fixed size bins (bins=50)

Value               Count   Frequency (%)
68                  122     4.1%
66                  114     3.8%
62                  112     3.8%
58                  110     3.7%
56                  103     3.5%
60                  99      3.3%
64                  87      2.9%
63                  83      2.8%
59                  81      2.7%
50                  78      2.6%
Other values (49)   1994    66.8%

Value               Count   Frequency (%)
22                  2       0.1%
23                  2       0.1%
24                  3       0.1%
25                  2       0.1%
26                  2       0.1%
27                  4       0.1%
28                  2       0.1%
29                  5       0.2%
30                  5       0.2%
31                  6       0.2%

Value               Count   Frequency (%)
80                  50      1.7%
79                  52      1.7%
78                  66      2.2%
77                  56      1.9%
76                  62      2.1%
75                  72      2.4%
74                  73      2.4%
73                  61      2.0%
72                  67      2.2%
71                  66      2.2%

socecon_lvl_cd
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 2983
Missing (%): 100.0%
Memory size: 23.4 KiB

country_cd
Categorical

CONSTANT
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 172.0 KiB

LV: 2983

Characters and Unicode

Total characters: 5966
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
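
As a rough illustration of that kind of analysis, the sketch below uses Python's standard `unicodedata` module to count characters and general categories for this column. It assumes every one of the 2983 rows holds the constant value "LV", as the alerts state. The standard library exposes the general category (here `Lu`, uppercase letter) but not script or block names, so those counts are not reproduced.

```python
import unicodedata
from collections import Counter

# Assumption: all 2983 rows contain the constant value "LV".
values = ["LV"] * 2983
text = "".join(values)

# Frequency of each character across the whole column.
char_counts = Counter(text)

# unicodedata.category returns the two-letter general category,
# e.g. "Lu" for an uppercase letter.
category_counts = Counter(unicodedata.category(ch) for ch in text)

print(len(text))              # 5966
print(dict(char_counts))      # {'L': 2983, 'V': 2983}
print(dict(category_counts))  # {'Lu': 5966}
```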

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: LV
2nd row: LV
3rd row: LV
4th row: LV
5th row: LV

Common Values

Value   Count   Frequency (%)
LV      2983    100.0%

Category Frequency Plot

Value   Count   Frequency (%)
lv      2983    100.0%

Most occurring characters

Value   Count   Frequency (%)
L       2983    50.0%
V       2983    50.0%

Most occurring categories

Value              Count   Frequency (%)
Uppercase Letter   5966    100.0%

Most frequent character per category

Uppercase Letter
Value   Count   Frequency (%)
L       2983    50.0%
V       2983    50.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   5966    100.0%

Most frequent character per script

Latin
Value   Count   Frequency (%)
L       2983    50.0%
V       2983    50.0%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   5966    100.0%

Most frequent character per block

ASCII
Value   Count   Frequency (%)
L       2983    50.0%
V       2983    50.0%

country_origin_cd
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 2983
Missing (%): 100.0%
Memory size: 23.4 KiB

hospital_id
Real number (ℝ≥0)

Distinct: 13
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2752.313778
Minimum: 190
Maximum: 4182
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 190
5-th percentile: 351
Q1: 351
median: 3943
Q3: 3943
95-th percentile: 3943
Maximum: 4182
Range: 3992
Interquartile range (IQR): 3592

Descriptive statistics

Standard deviation: 1590.829292
Coefficient of variation (CV): 0.5779970672
Kurtosis: -1.335568712
Mean: 2752.313778
Median Absolute Deviation (MAD): 0
Skewness: -0.7252819208
Sum: 8210152
Variance: 2530737.836
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=13)

Value              Count   Frequency (%)
3943               1807    60.6%
351                840     28.2%
2032               193     6.5%
2806               120     4.0%
3186               10      0.3%
4180               3       0.1%
497                3       0.1%
4182               2       0.1%
2816               1       < 0.1%
535                1       < 0.1%
Other values (3)   3       0.1%

Value              Count   Frequency (%)
190                1       < 0.1%
351                840     28.2%
497                3       0.1%
535                1       < 0.1%
1696               1       < 0.1%
1923               1       < 0.1%
2032               193     6.5%
2806               120     4.0%
2816               1       < 0.1%
3186               10      0.3%

Value              Count   Frequency (%)
4182               2       0.1%
4180               3       0.1%
3943               1807    60.6%
3186               10      0.3%
2816               1       < 0.1%
2806               120     4.0%
2032               193     6.5%
1923               1       < 0.1%
1696               1       < 0.1%
535                1       < 0.1%

ttm_type_cd
Real number (ℝ≥0)

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.209185384
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 3
Maximum: 5
Range: 4
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.63948659
Coefficient of variation (CV): 0.5288573601
Kurtosis: 9.481374767
Mean: 1.209185384
Median Absolute Deviation (MAD): 0
Skewness: 3.101984315
Sum: 3607
Variance: 0.4089430988
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=4)

Value   Count   Frequency (%)
1       2671    89.5%
3       261     8.7%
2       34      1.1%
5       17      0.6%

Value   Count   Frequency (%)
1       2671    89.5%
2       34      1.1%
3       261     8.7%
5       17      0.6%

Value   Count   Frequency (%)
5       17      0.6%
3       261     8.7%
2       34      1.1%
1       2671    89.5%

time_dx_to_surgery_nm
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 199
Distinct (%): 6.9%
Missing: 117
Missing (%): 3.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.4867411
Minimum: 0
Maximum: 1365
Zeros: 1167
Zeros (%): 39.1%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 1
95-th percentile: 156
Maximum: 1365
Range: 1365
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 75.51028962
Coefficient of variation (CV): 3.685812655
Kurtosis: 96.49901427
Mean: 20.4867411
Median Absolute Deviation (MAD): 1
Skewness: 7.905235901
Sum: 58715
Variance: 5701.803838
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                Count   Frequency (%)
0                    1167    39.1%
1                    1140    38.2%
2                    50      1.7%
3                    32      1.1%
4                    15      0.5%
6                    11      0.4%
14                   10      0.3%
15                   10      0.3%
5                    10      0.3%
12                   9       0.3%
Other values (189)   412     13.8%
(Missing)            117     3.9%

Value                Count   Frequency (%)
0                    1167    39.1%
1                    1140    38.2%
2                    50      1.7%
3                    32      1.1%
4                    15      0.5%
5                    10      0.3%
6                    11      0.4%
7                    5       0.2%
8                    7       0.2%
10                   3       0.1%

Value                Count   Frequency (%)
1365                 1       < 0.1%
1308                 1       < 0.1%
960                  1       < 0.1%
955                  1       < 0.1%
729                  1       < 0.1%
723                  1       < 0.1%
679                  1       < 0.1%
676                  1       < 0.1%
614                  1       < 0.1%
589                  1       < 0.1%

time_dx_to_radiotherapy_nm
Real number (ℝ≥0)

MISSING

Distinct: 360
Distinct (%): 27.1%
Missing: 1654
Missing (%): 55.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 181.5221971
Minimum: 0
Maximum: 1683
Zeros: 11
Zeros (%): 0.4%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 37
Q1: 84
median: 167
Q3: 237
95-th percentile: 335
Maximum: 1683
Range: 1683
Interquartile range (IQR): 153

Descriptive statistics

Standard deviation: 156.8715939
Coefficient of variation (CV): 0.8642006124
Kurtosis: 30.79096196
Mean: 181.5221971
Median Absolute Deviation (MAD): 77
Skewness: 4.423784909
Sum: 241243
Variance: 24608.69698
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                Count   Frequency (%)
49                   15      0.5%
78                   14      0.5%
238                  13      0.4%
63                   13      0.4%
42                   13      0.4%
224                  12      0.4%
76                   12      0.4%
91                   11      0.4%
244                  11      0.4%
43                   11      0.4%
Other values (350)   1204    40.4%
(Missing)            1654    55.4%

Value                Count   Frequency (%)
0                    11      0.4%
5                    2       0.1%
6                    1       < 0.1%
7                    4       0.1%
8                    1       < 0.1%
10                   2       0.1%
17                   1       < 0.1%
19                   1       < 0.1%
22                   1       < 0.1%
23                   1       < 0.1%

Value                Count   Frequency (%)
1683                 1       < 0.1%
1619                 1       < 0.1%
1475                 1       < 0.1%
1460                 1       < 0.1%
1456                 1       < 0.1%
1355                 1       < 0.1%
1332                 1       < 0.1%
1219                 1       < 0.1%
1126                 1       < 0.1%
1118                 1       < 0.1%

time_dx_to_chemotherapy_nm
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 193
Distinct (%): 21.2%
Missing: 2074
Missing (%): 69.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 74.82728273
Minimum: 0
Maximum: 1743
Zeros: 76
Zeros (%): 2.5%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 23
median: 41
Q3: 68
95-th percentile: 172
Maximum: 1743
Range: 1743
Interquartile range (IQR): 45

Descriptive statistics

Standard deviation: 158.9419445
Coefficient of variation (CV): 2.124117551
Kurtosis: 42.78523831
Mean: 74.82728273
Median Absolute Deviation (MAD): 21
Skewness: 6.084945686
Sum: 68018
Variance: 25262.54172
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                Count   Frequency (%)
0                    76      2.5%
30                   22      0.7%
1                    21      0.7%
36                   19      0.6%
31                   18      0.6%
23                   17      0.6%
50                   17      0.6%
51                   16      0.5%
42                   16      0.5%
28                   15      0.5%
Other values (183)   672     22.5%
(Missing)            2074    69.5%

Value                Count   Frequency (%)
0                    76      2.5%
1                    21      0.7%
2                    8       0.3%
3                    2       0.1%
4                    9       0.3%
5                    10      0.3%
6                    5       0.2%
7                    12      0.4%
8                    6       0.2%
10                   5       0.2%

Value                Count   Frequency (%)
1743                 1       < 0.1%
1449                 1       < 0.1%
1315                 1       < 0.1%
1295                 1       < 0.1%
1278                 1       < 0.1%
1222                 1       < 0.1%
1188                 1       < 0.1%
1078                 1       < 0.1%
982                  1       < 0.1%
938                  1       < 0.1%

time_dx_to_immunotherapy_nm
Real number (ℝ≥0)

CONSTANT
MISSING
REJECTED

Distinct: 1
Distinct (%): 100.0%
Missing: 2982
Missing (%): > 99.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 1035
Minimum: 1035
Maximum: 1035
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 1035
5-th percentile: 1035
Q1: 1035
median: 1035
Q3: 1035
95-th percentile: 1035
Maximum: 1035
Range: 0
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: nan
Coefficient of variation (CV): nan
Kurtosis: nan
Mean: 1035
Median Absolute Deviation (MAD): 0
Skewness: nan
Sum: 1035
Variance: nan
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=1)

Value       Count   Frequency (%)
1035        1       < 0.1%
(Missing)   2982    > 99.9%

Value       Count   Frequency (%)
1035        1       < 0.1%

Value       Count   Frequency (%)
1035        1       < 0.1%

time_dx_to_hormonotherapy_nm
Real number (ℝ≥0)

MISSING

Distinct: 221
Distinct (%): 76.5%
Missing: 2694
Missing (%): 90.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 227.0103806
Minimum: 0
Maximum: 1441
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 22.8
Q1: 91
median: 159
Q3: 293
95-th percentile: 627.6
Maximum: 1441
Range: 1441
Interquartile range (IQR): 202

Descriptive statistics

Standard deviation: 221.2526284
Coefficient of variation (CV): 0.9746366127
Kurtosis: 7.125520242
Mean: 227.0103806
Median Absolute Deviation (MAD): 91
Skewness: 2.357220451
Sum: 65606
Variance: 48952.72559
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                Count   Frequency (%)
154                  5       0.2%
158                  3       0.1%
103                  3       0.1%
56                   3       0.1%
97                   3       0.1%
77                   3       0.1%
159                  3       0.1%
39                   3       0.1%
188                  3       0.1%
144                  3       0.1%
Other values (211)   257     8.6%
(Missing)            2694    90.3%

Value                Count   Frequency (%)
0                    1       < 0.1%
2                    1       < 0.1%
7                    1       < 0.1%
8                    1       < 0.1%
14                   2       0.1%
15                   1       < 0.1%
17                   2       0.1%
18                   1       < 0.1%
19                   1       < 0.1%
21                   2       0.1%

Value                Count   Frequency (%)
1441                 1       < 0.1%
1315                 1       < 0.1%
1089                 1       < 0.1%
1073                 1       < 0.1%
1031                 1       < 0.1%
983                  1       < 0.1%
969                  1       < 0.1%
921                  1       < 0.1%
880                  1       < 0.1%
827                  1       < 0.1%

period
Real number (ℝ≥0)

Distinct: 49
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18.48642306
Minimum: 1
Maximum: 62
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 10
median: 18
Q3: 27
95-th percentile: 35
Maximum: 62
Range: 61
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 10.44719725
Coefficient of variation (CV): 0.5651281056
Kurtosis: -1.037624847
Mean: 18.48642306
Median Absolute Deviation (MAD): 9
Skewness: 0.09274071826
Sum: 55145
Variance: 109.1439303
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=49)

Value               Count   Frequency (%)
13                  115     3.9%
8                   113     3.8%
10                  101     3.4%
20                  99      3.3%
22                  95      3.2%
25                  95      3.2%
34                  94      3.2%
27                  94      3.2%
11                  92      3.1%
15                  91      3.1%
Other values (39)   1994    66.8%

Value               Count   Frequency (%)
1                   85      2.8%
2                   61      2.0%
3                   84      2.8%
4                   82      2.7%
5                   82      2.7%
6                   84      2.8%
7                   59      2.0%
8                   113     3.8%
9                   88      3.0%
10                  101     3.4%

Value               Count   Frequency (%)
62                  1       < 0.1%
51                  1       < 0.1%
49                  1       < 0.1%
46                  1       < 0.1%
45                  1       < 0.1%
44                  1       < 0.1%
43                  1       < 0.1%
42                  1       < 0.1%
41                  3       0.1%
40                  4       0.1%