Notebook 3 of 5: Pilot Data and Power Calculations

Dependencies

This script requires sas/stat 13.1 or newer to perform repeated measures using proc glmpower. Code below checks to verify version.

In [1]:
 PROC PRODUCT_STATUS;
 run;
Out[1]:

11   ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
12
13 PROC PRODUCT_STATUS;
14 run;
For Base SAS Software ...
Custom version information: 9.4_M4
Image version information: 9.04.01M4P110916
For SAS/STAT ...
Custom version information: 14.2
For SAS/ETS ...
Custom version information: 14.2
For SAS/IML ...
Custom version information: 14.2
For High Performance Suite ...
Custom version information: 2.2_M5
For SAS/ACCESS Interface to PC Files ...
Custom version information: 9.4_M4
NOTE: PROCEDURE PRODUCT_STATUS used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds

15
16 ods html5 close;ods listing;

17

Location of data files

In [2]:
libname files "/mnt/hgfs/myfolders";
Out[2]:

19   ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
20
21 libname files "/mnt/hgfs/myfolders";
NOTE: Libref FILES was successfully assigned as follows:
Engine: V9
Physical Name: /mnt/hgfs/myfolders
22 ods html5 close;ods listing;

23

The following macro was written and employed to perform the sums of squares calculation required to estimate the residual variance data from the pilot study.

In [3]:
%macro SS (A,B);

     %let output_title = &B;

PROC iml;
     use &A;

     read all var {head body tail} into x;
     read all var {head} into head;
     read all var {body} into body;
     read all var {tail} into tail;
 
     meanhead = mean(head);
     meanbody = mean(body);
     meantail = mean(tail);

     rowMeans = x[ ,:];
     colMeans = x[:, ];
     grandMean= x[:];

     *to calculate SSw (within-group SS);
     part1=ssq(head - meanhead);
     part2=ssq(body-meanbody);
     part3=ssq(tail-meantail);
     SSw=(part1+part2+part3);

     *to calculate SSs (within subject SS);
     part4=ssq(rowmeans[1]-grandmean);
     part5=ssq(rowmeans[2]-grandmean);
     part6=ssq(rowmeans[3]-grandmean);
     SSs=3*(part4+part5+part6);

     *to calculate SSe (error term);
     SSe=SSw-SSs;

     *to get SSe variance, where 2 is for degrees of freedom;
     var=Sse/2; 

     *to get SD;
     SD=sqrt(var);

     *to get SScondition;
     SScondition=3*(ssq(meanhead-grandmean)+ssq(meanbody-grandmean)+ssq(meantail-grandmean));

     *to get SStotal (total SS);
     SStotal=SSw+SScondition;

     print SStotal SScondition SSw SSs SSe var SD;
     create SS_&B var {"SStotal" "SScondition" "SSw" "SSs" "SSe" "var" "SD"};

append;
close SS_&B;
quit;
title "&output_title";

%mend ss;
quit;
Out[3]:

25   ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
26
27 %macro SS (A,B);
28
29 %let output_title = &B;
30
31 PROC iml;
32 use &A;
33
34 read all var {head body tail} into x;
35 read all var {head} into head;
36 read all var {body} into body;
37 read all var {tail} into tail;
38
39 meanhead = mean(head);
40 meanbody = mean(body);
41 meantail = mean(tail);
42
43 rowMeans = x[ ,:];
44 colMeans = x[:, ];
45 grandMean= x[:];
46
47 *to calculate SSw (within-group SS);
48 part1=ssq(head - meanhead);
49 part2=ssq(body-meanbody);
50 part3=ssq(tail-meantail);
51 SSw=(part1+part2+part3);
52
53 *to calculate SSs (within subject SS);
54 part4=ssq(rowmeans[1]-grandmean);
55 part5=ssq(rowmeans[2]-grandmean);
56 part6=ssq(rowmeans[3]-grandmean);
57 SSs=3*(part4+part5+part6);
58
59 *to calculate SSe (error term);
60 SSe=SSw-SSs;
61
62 *to get SSe variance, where 2 is for degrees of freedom;
63 var=Sse/2;
64
65 *to get SD;
66 SD=sqrt(var);
67
68 *to get SScondition;
69 SScondition=3*(ssq(meanhead-grandmean)+ssq(meanbody-grandmean)+ssq(meantail-grandmean));
70
71 *to get SStotal (total SS);
72 SStotal=SSw+SScondition;
73
74 print SStotal SScondition SSw SSs SSe var SD;
75 create SS_&B var {"SStotal" "SScondition" "SSw" "SSs" "SSe" "var" "SD"};
76
77 append;
78 close SS_&B;
79 quit;
80 title "&output_title";
81
82 %mend ss;
83 quit;
84 ods html5 close;ods listing;

85

Pilot Study

To perform the power calculations, an estimate of the residual variance (i.e. error variability, where error plus subject variability equals total within-group variability) for each of the three markers was calculated using pilot data from nPOD donors who were NOT used in the main study. The process includes data 1) import, 2) transformation, and 3) analysis.

For CD45, SD was found to be 1.98%

In [4]:
*import;
DATA cd45;
     set files.cd45;
run;

DATA cd45_pilot;
     set cd45;
     if case^=6222 and case^=6092 and case^=6107 then delete; *only need pilot data;
     if case=6222 then match_group=1;
     if case=6107 then match_group=2;
     if case=6092 then match_group=3;
run;

*transformation;
PROC sort data=cd45_pilot;
     by case match_group hospbin;
run;

PROC transpose data=cd45_pilot out=cd45_pilot1 (rename=( a=Head b=Body c=Tail));
     by case match_group hospbin;
     id sample_type2;
     var percent_cd45;
run;

*analysis;
%ss(cd45_pilot1, cd45_ss);
Out[4]:
SAS Output

The SAS System

SStotal SScondition SSw SSs SSe var SD
75.85827 5.4392208 70.419049 62.577603 7.8414461 3.920723 1.9800816

For CD68, SD was found to be 0.55%

In [5]:
*import;
DATA cd68;
     set files.cd68;
run;

DATA cd68_pilot;
     set cd68;
     if case^=6222 and case^=6092 and case^=6107 then delete; *only need pilot data;
     if case=6222 then match_group=1;
     if case=6107 then match_group=2;
     if case=6092 then match_group=3;
run;

*transformation;
PROC sort data=cd45_pilot;
     by case match_group hospbin;
run;

PROC transpose data=cd68_pilot out=cd68_pilot1 (rename=( a=Head b=Body c=Tail));
     by case match_group hospbin;
     id sample_type2;
     var percent_cd68;
run;

*analysis;
%ss(cd68_pilot1, cd68_ss);
Out[5]:
SAS Output

cd45_ss

SStotal SScondition SSw SSs SSe var SD
7.2832909 0.0869857 7.1963052 6.5938813 0.6024239 0.301212 0.5488278

For Ki67, SD was found to be 0.74%

In [6]:
*import;
DATA ki67;
     set files.ki67;
run;

DATA ki67_pilot;
     set ki67;
     if case^=6222 and case^=6092 and case^=6107 then delete; *only need pilot data;
     if case=6222 then match_group=1;
     if case=6107 then match_group=2;
     if case=6092 then match_group=3;
run;

*transformation;
PROC sort data=ki67_pilot;
     by case match_group hospbin;
run;

PROC transpose data=ki67_pilot out=ki67_pilot1 (rename=( a=Head b=Body c=Tail));
     by case match_group hospbin;
     id sample_type2;
     var percent_ki67;
run;

*analysis;
%ss(ki67_pilot1, ki67_ss);
Out[6]:
SAS Output

cd68_ss

SStotal SScondition SSw SSs SSe var SD
14.024796 0.8096687 13.215127 12.117766 1.0973609 0.5486805 0.7407297

Power Calculations

Estimation of percentage of leukocytes in the pancreas

Short (<3 days) duration hospitalization group: To obtain theoretical values: 1) Cell number fractions for islet (0.483), acinar (0.253), duct (0.227), and other cell types (0.038; includes leukocytes, neuronal elements, dead, stromal, and endothelial cells) found in an islet prep were obtained from Pisania et al (2010, Laboratory Investigation); it was assumed that leukocytes contributed 1/5th of the other cell types (i.e. 0.0076), 2) The total cell number fraction was adjusted from 1.001 to 0.774 (i.e. 1.001 minus ductal fraction, since this cell type was not analyzed in the experiment), 3) The individual cell number fractions, minus duct, for leukocytes (0.0076/0.774=0.0098) and islets (0.483/0.774=0.624) was calculated, 4) The number of leukocytes/islet was calculated (0.098/0.624=0.015) in islet preps and compared to 0.1 $\pm$ 0.1 for whole tissue (Bogdani et al, 2014, Diabetes). Therefore, the initial estimate of 0.98% (step 3) was conservatively doubled to 1.96% to account for the possibility of an increased % of leukocytes in whole tissue.

Mid (>=3 to < 6 days) and long (>=6 days) duration hospitalization groups: See main text. In brief, the magnitude of the differences in number of cells counted by In'tVeld and colleagues (2010, Diabetes) were used (Table 5, <=25 year old group) to adjust up the estimates from the short duration group.

Differences in leukocyte % based on pancreas region: The authors are not aware of any data suggesting differences based on region of pancreas, therefore it was assumed that % remained constant in the head, body, and tail, i.e. 1.96%.

CD45

Step 1: Creation of exemplary dataset (mean values)

In [7]:
DATA cd45;
     input hospbin $ head body tail alloc;
     datalines;
     ICU_short       1.96    1.96    1.96    1
     ICU_mid         1.96    1.96    1.96    1
     ICU_long        5.88    5.88    5.88    1
;
run;
Out[7]:

180  ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
181
182 DATA cd45;
183 input hospbin $ head body tail alloc;
184 datalines;
NOTE: The data set WORK.CD45 has 3 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

188 ;
189 run;
190 ods html5 close;ods listing;

191

Step 2: Additional assumptions and analysis

In [8]:
PROC glmpower data = cd45; 
     class hospbin; 
     model head body tail = hospbin; 
     repeated panc_region;
     weight alloc;
     power
     mtest=uncorr gg
         /*available options include uncorr or GG*/  
         /*uncorr used as analysis plan is to report unadjusted univariate p values for within and between subject effects, including interaction term*/  
         /*If sphericity is violated, GG will be used for univariate within-subjects effects*/
     stddev = 1.9800816 
         /*data taken from pilot study*/
     matrix ("mycorrmat")= lear(0.99, 0, 3, 1 2 3) 
        /*base corr, corr-decay, number of levels, level-values (defaul is used if left blank)*/
        /*base corr and corr-decay specified as constant correlation of 0.99 with no decay, i.e. clustered correlation or compound symmetry*/
        /*0.99 calculated as the mean of head body, body tail, and head tail correlations of panc weight in control donors without missing data*/
     corrmat = "mycorrmat"
     alpha = 0.05 
     ntotal =39 
     power = .; 
        /*option to plot x=power min = .6 max =1*/  
run;
Out[8]:
SAS Output

ki67_ss

The GLMPOWER Procedure

F Test for Multivariate Model

Fixed Scenario Elements
Weight Variable alloc
Nominal Alpha 0.05
Error Standard Deviation 1.980082
Correlation Matrix mycorrmat
Total Sample Size 39
Computed Power
Index Transformation Source Test Effect Num DF Den DF Actual Alpha Power
1 panc_region Intercept Uncorrected Univariate Repeated panc_region 2.00 72.0 0.0500 0.050
2 panc_region Intercept Greenhouse-Geisser Univariate Repeated panc_region 1.90 68.3 0.0472 0.047
3 panc_region hospbin Uncorrected Univariate Repeated panc_region*hospbin 4.00 72.0 0.0500 0.050
4 panc_region hospbin Greenhouse-Geisser Univariate Repeated panc_region*hospbin 3.79 68.3 0.0467 0.047
5 Mean(Dep) Intercept Uncorrected Univariate Repeated Intercept 1.00 36.0 0.0500 >.999
6 Mean(Dep) Intercept Greenhouse-Geisser Univariate Repeated Intercept 1.00 36.0 0.0500 >.999
7 Mean(Dep) hospbin Uncorrected Univariate Repeated hospbin 2.00 36.0 0.0500 >.999
8 Mean(Dep) hospbin Greenhouse-Geisser Univariate Repeated hospbin 2.00 36.0 0.0500 >.999

CD68

Step 1: Creation of exemplary dataset (mean values)

In [9]:
DATA cd68;
     input hospbin $ head body tail alloc;
     datalines;
     ICU_short 1.96  1.96  1.96  1
     ICU_mid   2.94  2.94  2.94  1
     ICU_long  5.88  5.88  5.88  1
;
run;
Out[9]:

220  ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
221
222 DATA cd68;
223 input hospbin $ head body tail alloc;
224 datalines;
NOTE: The data set WORK.CD68 has 3 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

228 ;
229 run;
230 ods html5 close;ods listing;

231

Step 2: Additional assumptions and analysis

In [10]:
PROC glmpower data = cd68; 
     class hospbin; 
     model head body tail = hospbin; 
     repeated panc_region;
     weight alloc;
     power
     mtest=uncorr gg
         /*available options include uncorr or GG*/  
         /*uncorr used as analysis plan is to report unadjusted univariate p values for within and between subject effects, including interaction term*/  
         /*If sphericity is violated, GG will be used for univariate within-subjects effects*/
     stddev = 0.5488278 
         /*data taken from pilot study*/
     matrix ("mycorrmat")= lear(0.99, 0, 3, 1 2 3) 
        /*base corr, corr-decay, number of levels, level-values (defaul is used if left blank)*/
        /*base corr and corr-decay specified as constant correlation of 0.99 with no decay, i.e. clustered correlation or compound symmetry*/
        /*0.99 calculated as the mean of head body, body tail, and head tail correlations of panc weight in control donors without missing data*/
     corrmat = "mycorrmat"
     alpha = 0.05 
     ntotal =39 
     power = .; 
        /*option to plot x=power min = .6 max =1*/  
run;
Out[10]:
SAS Output

ki67_ss

The GLMPOWER Procedure

F Test for Multivariate Model

Fixed Scenario Elements
Weight Variable alloc
Nominal Alpha 0.05
Error Standard Deviation 0.548828
Correlation Matrix mycorrmat
Total Sample Size 39
Computed Power
Index Transformation Source Test Effect Num DF Den DF Actual Alpha Power
1 panc_region Intercept Uncorrected Univariate Repeated panc_region 2.00 72.0 0.0500 0.050
2 panc_region Intercept Greenhouse-Geisser Univariate Repeated panc_region 1.90 68.3 0.0472 0.047
3 panc_region hospbin Uncorrected Univariate Repeated panc_region*hospbin 4.00 72.0 0.0500 0.050
4 panc_region hospbin Greenhouse-Geisser Univariate Repeated panc_region*hospbin 3.79 68.3 0.0467 0.047
5 Mean(Dep) Intercept Uncorrected Univariate Repeated Intercept 1.00 36.0 0.0500 >.999
6 Mean(Dep) Intercept Greenhouse-Geisser Univariate Repeated Intercept 1.00 36.0 0.0500 >.999
7 Mean(Dep) hospbin Uncorrected Univariate Repeated hospbin 2.00 36.0 0.0500 >.999
8 Mean(Dep) hospbin Greenhouse-Geisser Univariate Repeated hospbin 2.00 36.0 0.0500 >.999

Ki67

Step 1: Creation of exemplary dataset (See main text for derivation of initial values).

In [11]:
DATA ki67;
     input hospbin $ head body tail alloc;
     datalines;
     ICU_short       0.1     0.1     0.1     1
     ICU_mid         0.6     0.6     0.6     1
     ICU_long        1.59    1.59    1.59    1
;
run;
Out[11]:

260  ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
261
262 DATA ki67;
263 input hospbin $ head body tail alloc;
264 datalines;
NOTE: The data set WORK.KI67 has 3 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

268 ;
269 run;
270 ods html5 close;ods listing;

271

Step 2: Additional assumptions and analysis

In [12]:
PROC glmpower data = ki67; 
     class hospbin; 
     model head body tail = hospbin; 
     repeated panc_region;
     weight alloc;
     power
     mtest=uncorr gg
         /*available options include uncorr or GG*/  
         /*uncorr used as analysis plan is to report unadjusted univariate p values for within and between subject effects, including interaction term*/  
         /*If sphericity is violated, GG will be used for univariate within-subjects effects*/
     stddev = 0.7407297 
         /*data taken from pilot study*/
     matrix ("mycorrmat")= lear(0.99, 0, 3, 1 2 3) 
        /*base corr, corr-decay, number of levels, level-values (defaul is used if left blank)*/
        /*base corr and corr-decay specified as constant correlation of 0.99 with no decay, i.e. clustered correlation or compound symmetry*/
        /*0.99 calculated as the mean of head body, body tail, and head tail correlations of panc weight in control donors without missing data*/
     corrmat = "mycorrmat"
     alpha = 0.05 
     ntotal =39 
     power = .; 
        /*option to plot x=power min = .6 max =1*/  
run;
Out[12]:
SAS Output

ki67_ss

The GLMPOWER Procedure

F Test for Multivariate Model

Fixed Scenario Elements
Weight Variable alloc
Nominal Alpha 0.05
Error Standard Deviation 0.74073
Correlation Matrix mycorrmat
Total Sample Size 39
Computed Power
Index Transformation Source Test Effect Num DF Den DF Actual Alpha Power
1 panc_region Intercept Uncorrected Univariate Repeated panc_region 2.00 72.0 0.0500 0.050
2 panc_region Intercept Greenhouse-Geisser Univariate Repeated panc_region 1.90 68.3 0.0472 0.047
3 panc_region hospbin Uncorrected Univariate Repeated panc_region*hospbin 4.00 72.0 0.0500 0.050
4 panc_region hospbin Greenhouse-Geisser Univariate Repeated panc_region*hospbin 3.79 68.3 0.0467 0.047
5 Mean(Dep) Intercept Uncorrected Univariate Repeated Intercept 1.00 36.0 0.0500 >.999
6 Mean(Dep) Intercept Greenhouse-Geisser Univariate Repeated Intercept 1.00 36.0 0.0500 >.999
7 Mean(Dep) hospbin Uncorrected Univariate Repeated hospbin 2.00 36.0 0.0500 0.997
8 Mean(Dep) hospbin Greenhouse-Geisser Univariate Repeated hospbin 2.00 36.0 0.0500 0.997