This RNotebook contains the full analysis of the study examining CFF, RSVP (attentional blink) and RDM tasks.
First, we’ll load the data set and all required packages
library(readxl)
library(tidyr)
library(car)
library(ggplot2)
library(ggnewscale)
library(dplyr)
library(biotools)
library(svglite)
library(lme4)
library(lmerTest)
library(emmeans)
Dataset <- read_excel("Dataset.xlsx")
** Part 1: RSVP analysis**
This data file contains all the raw recordings that were made during the study. We have 3 different tasks: CFF, RSVP and RDM. We want to compare CFF with performance on both other tasks, but not all participants have both RSVP data and RDM data (due to inadequate performance or technical issues etc). So we’ll split up the RSVP and RDM data and remove missing rows first:
# First take the columns we need for our RSVP analysis
RSVP<-Dataset[c("CFF", "Age", "Sex", "RSVP_T1", "AB_Magnitude", "AB_Intercept", "AB_Slope")]
#Then remove the rows with missing values from the dataset
RSVP<-na.omit(RSVP)
We have 84 data points for this analysis.
Let’s have a quick look at the CFF distribution first
hist(RSVP$CFF, breaks = 30, probability = TRUE)
lines(density(RSVP$CFF), col = "blue", lwd = 2)
qqnorm(RSVP$CFF)
qqline(RSVP$CFF, col = "red", lwd = 2)
Looks mostly normally distributed, save for a couple of potential high outliers. We’ll keep that in mind.
Our main question is for the RSVP analysis is if CFF can predict the overall profile of the attentional blink (AB) effect. We have 3 measures for the AB: magnitude (severity of the AB), baseline performance on T2 (listed as AB_Intercept) and rate of recovery from AB (listed as AB_Slope). Since we have 3 response variables, that are all correlated, we’ll do a multivariate regression. Before we make the model, we have to check if the response variables are not overly correlated with each other.
cor.test(RSVP$AB_Magnitude, RSVP$AB_Intercept, method = "pearson")
Pearson's product-moment correlation
data: RSVP$AB_Magnitude and RSVP$AB_Intercept
t = -5.5566, df = 82, p-value = 3.328e-07
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.6630544 -0.3475876
sample estimates:
cor
-0.5230077
cor.test(RSVP$AB_Magnitude, RSVP$AB_Slope, method = "pearson")
Pearson's product-moment correlation
data: RSVP$AB_Magnitude and RSVP$AB_Slope
t = 4.2674, df = 82, p-value = 5.273e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.2332113 0.5870353
sample estimates:
cor
0.4262921
cor.test(RSVP$AB_Intercept, RSVP$AB_Slope, method = "pearson")
Pearson's product-moment correlation
data: RSVP$AB_Intercept and RSVP$AB_Slope
t = -5.7623, df = 82, p-value = 1.406e-07
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.6737106 -0.3644078
sample estimates:
cor
-0.5368596
Good, correlations aren’t extremely high. Now for the multivariate model. We’ll include age and gender (actually listed as Sex in this data set) in the model as covariates.
# First, create the model
mod1 <- manova(cbind(AB_Magnitude, AB_Intercept, AB_Slope) ~ CFF + Age + Sex, data = RSVP)
# Now get the test statistics
summary(mod1, test = "Pillai") # The test statistic here is "Pillai", which is the most robust but also the most conservative
Df Pillai approx F num Df den Df Pr(>F)
CFF 1 0.109835 3.1669 3 77 0.02908 *
Age 1 0.042214 1.1312 3 77 0.34178
Sex 2 0.165360 2.3434 6 156 0.03401 *
Residuals 79
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Both CFF and Sex have significant effects in our model. Age does not (which is exactly what we expect because we intentionally left the age range small during participant recruitment).
Let’s check the model assumptions:
# Check the residuals
plot(residuals(mod1) ~ fitted(mod1))
qqPlot(mod1$residuals)
[1] 157 141
The heavy tails on the QQ plot indicate we may have some extreme values in the data. Let’s examine the residuals of each response variable when modeled separately:
m1<-lm(AB_Magnitude ~ CFF + Sex + Age, data = RSVP)
qqPlot(m1$residuals)
[1] 78 1
ncvTest(m1)
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 0.07343905, Df = 1, p = 0.78639
influenceIndexPlot(m1, vars="Cook")
influenceIndexPlot(m1, vars="hat")
influenceIndexPlot(m1, vars="Bonf")
outlierTest(m1)
No Studentized residuals with Bonferroni p < 0.05
Largest |rstudent|:
plot(m1)
18 and 43 are highly influential (Cook’s distance) and have high leverage (hat). 73 is an extreme value (Bonferroni). Residuals look good.
m2<-lm(AB_Intercept ~ CFF + Sex + Age, data = RSVP)
qqPlot(m2$residuals)
[1] 73 57
ncvTest(m2)
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 1.010188, Df = 1, p = 0.31486
influenceIndexPlot(m2, vars="Cook")
influenceIndexPlot(m2, vars="hat")
influenceIndexPlot(m2, vars="Bonf")
plot(m2)
18 and 43 have high leverage again. 18 and 3 may be influential points and no extreme values. Residuals good.
m3<-lm(AB_Slope ~ CFF + Sex + Age, data = RSVP)
qqPlot(m3$residuals)
[1] 73 74
ncvTest(m3)
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 1.05463, Df = 1, p = 0.30444
influenceIndexPlot(m3, vars="Cook")
influenceIndexPlot(m3, vars="hat")
influenceIndexPlot(m3, vars="Bonf")
plot(m3)
18 and 43 come up again as influential and high leverage. 73 is indicated to be extreme. Inspection of the data shows that 73 does have some high values, but it is a valid data point. 18 and 43 are data points belonging to the only 2 participants who indicated to be neither male nor female. Thus, this would cause them to be highly influential for the Sex variable. Since Sex had a significant effect in the model, let’s see if anything changes if these points are not included:
# Remove the 2 data points
RSVP2<-RSVP[RSVP$Sex != "O", ]
# And do the same model again
mod2 <- manova(cbind(AB_Magnitude, AB_Intercept, AB_Slope) ~ CFF + Sex + Age, data = RSVP2)
summary(mod2, test = "Pillai")
Df Pillai approx F num Df den Df Pr(>F)
CFF 1 0.123085 3.5558 3 76 0.01817 *
Sex 1 0.118591 3.4085 3 76 0.02173 *
Age 1 0.075017 2.0546 3 76 0.11333
Residuals 78
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Removal of those 2 points did not change the significance of the variables. However, it does change the effect size. Now, it is not Sex, but rather CFF that has the largest effect on the AB-profile. We report on this in the main manuscript.
Now, next issue: the model diagnostics for our multivariate model did not look great. Most of the residuals were clustered right in th middle. However, the residuals for the univariate models looked much better. So it is likely the combined effect of the 3 response variables together that does something weird with the data. The clustering in the residual plot of the manova model shows that the variance may not be equally distributed across the 3 responses. We can scale the response data to see if that fixes our issue.
(The scale function subtracts the column mean and divides by the column SD for each value in that column. This way each column has a mean of 0 and an SD of 1, thus standardizing the data. Handy for when different variables have different magnitudes or a big difference in variance)
y_scaled<- scale(cbind(RSVP$AB_Magnitude, RSVP$AB_Intercept, RSVP$AB_Slope))
# Now try the model again
mod3 <- manova(y_scaled ~ RSVP$CFF + RSVP$Sex + RSVP$Age)
summary(mod3, test = "Pillai")
Df Pillai approx F num Df den Df Pr(>F)
RSVP$CFF 1 0.109835 3.1669 3 77 0.02908 *
RSVP$Sex 2 0.155408 2.1905 6 156 0.04669 *
RSVP$Age 1 0.052977 1.4358 3 77 0.23885
Residuals 79
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Now let’s check the assumptions again:
qqPlot(mod3$residuals)
[1] 241 242
plot(residuals(mod3) ~ fitted(mod3))
Much better! Let’s also check for linearity between our predictor and response variables
crPlots(lm(RSVP$AB_Magnitude ~ RSVP$CFF))
crPlots(lm(RSVP$AB_Intercept ~ RSVP$CFF))
crPlots(lm(RSVP$AB_Slope ~ RSVP$CFF))
Pretty good too. Now just to be complete, let’s do the same thing for the model with the two influential points removed and compare the results
# Make the model without the "O" group again, with our response variables scaled
y_scaled2<- scale(cbind(RSVP2$AB_Magnitude, RSVP2$AB_Intercept, RSVP2$AB_Slope))
mod4 <- manova(y_scaled2 ~ RSVP2$CFF + RSVP2$Sex + RSVP2$Age)
summary(mod4, test = "Pillai")
Df Pillai approx F num Df den Df Pr(>F)
RSVP2$CFF 1 0.123085 3.5558 3 76 0.01817 *
RSVP2$Sex 1 0.118591 3.4085 3 76 0.02173 *
RSVP2$Age 1 0.075017 2.0546 3 76 0.11333
Residuals 78
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# And compare against our main model (with "O" group)
summary(mod3, test = "Pillai")
Df Pillai approx F num Df den Df Pr(>F)
RSVP$CFF 1 0.109835 3.1669 3 77 0.02908 *
RSVP$Sex 2 0.155408 2.1905 6 156 0.04669 *
RSVP$Age 1 0.052977 1.4358 3 77 0.23885
Residuals 79
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Very similar results. We just need to keep in mind that the effect sizes for the two predictors are slightly different depending on whether or not the influential points are kept in the data set.
One last check: since Sex is a categorical variable, we need to check that the assumption for MANOVA is met that the different groups have equal covariance matrices. We’ll do this for the data set that has the two “O” datapoints removed, since we already know these two points made the data set very uneven.
boxM(RSVP2[, c("AB_Magnitude", "AB_Intercept", "AB_Slope")], RSVP2$Sex)
Box's M-test for Homogeneity of Covariance Matrices
data: RSVP2[, c("AB_Magnitude", "AB_Intercept", "AB_Slope")]
Chi-Sq (approx.) = 7.2359, df = 6, p-value = 0.2996
The result is not significant, so the covariance matrices can be considered equal for males and females.
Next part:
Now that we know CFF has an effect on the attentional blink effect, we need to examine which of our 3 AB measures is most affected by it. We’ll do this with the Roy-Bargmann stepdown procedure: we do univariate models in stepwise fashion, starting with our most important response variable, the AB magnitude. Since we have 3 response variables, we need to do Bonferroni correction to account for the repeated testing: our result will be significant if P < 0.05/3 = 0.0167.
RB1<- lm(AB_Magnitude ~ CFF + Sex + Age, data = RSVP)
summary(RB1)
Call:
lm(formula = AB_Magnitude ~ CFF + Sex + Age, data = RSVP)
Residuals:
Min 1Q Median 3Q Max
-0.38758 -0.13847 -0.00396 0.12086 0.41143
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.187486 0.229042 0.819 0.4155
CFF -0.006040 0.002409 -2.508 0.0142 *
SexM -0.020898 0.044025 -0.475 0.6363
SexO -0.139994 0.124695 -1.123 0.2650
Age 0.016777 0.009013 1.861 0.0664 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1723 on 79 degrees of freedom
Multiple R-squared: 0.1237, Adjusted R-squared: 0.07938
F-statistic: 2.789 on 4 and 79 DF, p-value: 0.03191
The P value is smaller than 0.0167, so CFF has a significant effect on the AB magnitude.None of the other variables remain significant.
Now we test the second most important response variable (AB intercept) and add AB Magnitude as a covariate in the model.
RB2<-lm(AB_Intercept ~ CFF + Sex + Age + AB_Magnitude, data = RSVP)
summary(RB2)
Call:
lm(formula = AB_Intercept ~ CFF + Sex + Age + AB_Magnitude, data = RSVP)
Residuals:
Min 1Q Median 3Q Max
-2.76025 -0.62310 -0.01276 0.62178 2.92148
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.754448 1.330924 0.567 0.57244
CFF 0.009748 0.014482 0.673 0.50286
SexM 0.828921 0.255106 3.249 0.00171 **
SexO 0.243783 0.727261 0.335 0.73837
Age -0.049682 0.053283 -0.932 0.35399
AB_Magnitude -3.125443 0.651014 -4.801 7.48e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.997 on 78 degrees of freedom
Multiple R-squared: 0.3775, Adjusted R-squared: 0.3376
F-statistic: 9.46 on 5 and 78 DF, p-value: 4.434e-07
CFF does not have a significant effect on baseline performance on T2 when the AB magnitude is accounted for. thus, we stop our stepdown process here. Intercept does not explain any additional amount of variance in the data (and since we determined our third predictor, Slope, to be a lower priority, we have to assume this one does not offer any additional explanatory power either).
However, for AB intercept, there is a significant effect of gender: Males have higher overall accuracy.
T1 Accuracy analysis
Now let’s move on to the other part of the RSVP task: accuracy on T1. This is pretty straight forward, we can do a simple regression.
# Regression with T1 accuracy as the response, and the same predictors we had for the AB analysis
T1mod<-lm(RSVP_T1 ~ CFF + Age + Sex + Age, data = RSVP)
summary(T1mod)
Call:
lm(formula = RSVP_T1 ~ CFF + Age + Sex + Age, data = RSVP)
Residuals:
Min 1Q Median 3Q Max
-0.22843 -0.02908 0.01075 0.04061 0.07404
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.9141430 0.0741934 12.321 <2e-16 ***
CFF -0.0003172 0.0007803 -0.406 0.6855
Age 0.0002237 0.0029195 0.077 0.9391
SexM 0.0300828 0.0142609 2.109 0.0381 *
SexO 0.0266894 0.0403923 0.661 0.5107
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.05582 on 79 degrees of freedom
Multiple R-squared: 0.05702, Adjusted R-squared: 0.009271
F-statistic: 1.194 on 4 and 79 DF, p-value: 0.3199
CFF and Age have no effect on people’s ability to predict the first target. Just like with AB, sex is a significant factor though. Check means per gender:
mean(RSVP$RSVP_T1[RSVP$Sex == "F"])
[1] 0.9008136
sd(RSVP$RSVP_T1[RSVP$Sex == "F"])
[1] 0.06021584
mean(RSVP$RSVP_T1[RSVP$Sex == "M"])
[1] 0.9294783
sd(RSVP$RSVP_T1[RSVP$Sex == "M"])
[1] 0.0392056
mean(RSVP$RSVP_T1[RSVP$Sex == "O"])
[1] 0.9285
sd(RSVP$RSVP_T1[RSVP$Sex == "O"])
[1] 0.05020458
Check linear model assumptions
qqPlot(T1mod$residuals)
[1] 45 44
ncvTest(T1mod)
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 5.661238, Df = 1, p = 0.017344
residualPlots(T1mod)
Test stat Pr(>|Test stat|)
CFF 0.3759 0.7080
Age -0.6132 0.5416
Sex
Tukey test -1.0642 0.2872
influenceIndexPlot(T1mod, vars = "Cook")
influenceIndexPlot(T1mod, vars = "hat")
outlierTest(T1mod)
The variance isn’t heterogeneous, and there is one significant outlier detected. The two “O” data points in the sex variable of course also still have extreme leverage. Let’s take these points out and see if that improves the model fit.
T1df<- RSVP[-c(18, 43, 45),]
T1mod2<-lm(RSVP_T1 ~ CFF + Sex + Age, data = T1df)
summary(T1mod2)
Call:
lm(formula = RSVP_T1 ~ CFF + Sex + Age, data = T1df)
Residuals:
Min 1Q Median 3Q Max
-0.161825 -0.029045 0.007473 0.038886 0.067953
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.8931665 0.0664516 13.441 <2e-16 ***
CFF -0.0001270 0.0006963 -0.182 0.856
SexM 0.0249317 0.0127398 1.957 0.054 .
Age 0.0009144 0.0026283 0.348 0.729
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.04968 on 77 degrees of freedom
Multiple R-squared: 0.05201, Adjusted R-squared: 0.01507
F-statistic: 1.408 on 3 and 77 DF, p-value: 0.2469
qqPlot(T1mod2$residuals)
[1] 42 47
ncvTest(T1mod2)
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 2.82287, Df = 1, p = 0.09293
residualPlots(T1mod2)
Test stat Pr(>|Test stat|)
CFF 0.0254 0.9798
Sex
Age -1.0734 0.2865
Tukey test -1.5581 0.1192
influenceIndexPlot(T1mod2, vars = "Cook")
influenceIndexPlot(T1mod2, vars = "hat")
outlierTest(T1mod2)
No Studentized residuals with Bonferroni p < 0.05
Largest |rstudent|:
Ok, the model meets all the assumptions, and sex is still a significant factor.
Plots
Now let’s make some nice plots of our results:
# First CFF vs the AB magnitude
plot1<-ggplot(data = RSVP, aes(x = CFF, y = AB_Magnitude))
plot1+geom_point(size = 4, colour = "#4C72B0", alpha = 0.7)+geom_smooth(method = "lm", colour = "#F6BD60", se = FALSE, alpha = 0.15)+theme_classic()+theme(axis.title = element_text(size = 35, face = "bold"), axis.text = element_text(size = 30), plot.margin = margin(40, 40, 40, 40))+labs(x= "CFF in Hz", y = expression(bold("AB magnitude (" * lag[7]-lag[4] * ")")))
ggsave("ABMag4.svg", width = 8, height = 8, device = svglite)
ggsave("ABMag4.png", width = 8, height = 8, dpi = 300)
# Then CFF vs the AB Intercept
plot1<-ggplot(data = RSVP, aes(x = CFF, y = AB_Intercept))
plot1+geom_point(size = 4, colour = "#4C72B0", alpha = 0.7)+geom_smooth(method = "lm", colour = "#F6BD60", se = FALSE, alpha = 0.15)+theme_classic()+theme(axis.title = element_text(size = 35, face = "bold"), axis.text = element_text(size = 30), plot.margin = margin(40, 40, 40, 40))+labs(x= "CFF in Hz", y = expression(bold("Overall T2 accuracy (β"[0]*")")))
ggsave("ABInt4.svg", width = 8, height = 8, device = svglite)
ggsave("ABInt4.png", width = 8, height = 8, dpi = 300)
# And then CFF vs the AB Slope
plot1<-ggplot(data = RSVP, aes(x = CFF, y = AB_Slope))
plot1+geom_point(size = 4, colour = "#4C72B0", alpha = 0.7)+geom_smooth(method = "lm", colour = "#F6BD60", se = FALSE, alpha = 0.15)+theme_classic()+theme(axis.title = element_text(size = 35, face = "bold"), axis.text = element_text(size = 30), plot.margin = margin(40, 40, 40, 40))+labs(x= "CFF in Hz", y = expression(bold("T2 recovery rate (β"[1]*")")))
ggsave("ABSlope4.svg", width = 8, height = 8, device = svglite)
ggsave("ABSlope4.png", width = 8, height = 8, dpi = 300)
Next, a bar plot showing the attentional blink effect
# First take the columns we need
T1Plot<-Dataset[c("RSVP_T1L1", "RSVP_T1L2", "RSVP_T1L3", "RSVP_T1L4", "RSVP_T1L5", "RSVP_T1L6", "RSVP_T1L7")]
T2Plot<-Dataset[c("RSVP_T2L1", "RSVP_T2L2", "RSVP_T2L3", "RSVP_T2L4", "RSVP_T2L5", "RSVP_T2L6", "RSVP_T2L7")]
#Then remove the rows with missing values from the dataset
T1Plot<-na.omit(T1Plot)
T2Plot<-na.omit(T2Plot)
# Now pivot the data into long form
T1_long <- T1Plot %>%
dplyr::select(1:7) %>%
pivot_longer(everything(), names_to = "variable", values_to = "value")
T2_long <- T2Plot %>%
dplyr::select(1:7) %>%
pivot_longer(everything(), names_to = "variable", values_to = "value")
# Now get the T1 means for each lag value, plus SE and confidence intervals for error bars
T1dat <- T1_long %>%
group_by(variable) %>%
summarise(
T1mean = mean(value, na.rm = TRUE),
T1se = sd(value, na.rm = TRUE)/sqrt(n()),
ci_low = T1mean - qt(0.975, df = n() - 1) * T1se,
ci_hi = T1mean + qt(0.975, df = n() - 1) * T1se
)
# and also for T2. We'll also calculate SE and confidence intervals here so we can add error bars
T2dat <- T2_long %>%
group_by(variable) %>%
summarise(
T2mean = mean(value, na.rm = TRUE),
T2se = sd(value, na.rm = TRUE)/sqrt(n()),
ci_low = T2mean - qt(0.975, df = n() - 1) * T2se,
ci_hi = T2mean + qt(0.975, df = n() - 1) * T2se
)
# Then merge the T1 means and T2 means into one dataframe
ABPlot<-data.frame(Lag = c("1","2","3","4","5","6","7"), T1mean = T1dat$T1mean, T1se = T1dat$T1se, T1ci_low = T1dat$ci_low, T1ci_hi = T1dat$ci_hi, T2mean = T2dat$T2mean, T2se = T2dat$T2se, T2ci_low = T2dat$ci_low, T2ci_hi = T2dat$ci_hi)
Now plot
ggplot(ABPlot, aes(x = Lag)) +
geom_col(aes(y = T2mean), fill = "#F6BD60", colour = "black", linewidth = 0.5) +
geom_errorbar(aes(ymin = T2ci_low, ymax = T2ci_hi), width = 0.2) +
geom_line(aes(y = T1mean, group = 1), linewidth = 0.8, colour = "#4C72B0") +
geom_point(aes(y = T1mean), size = 2, colour = "#4C72B0") +
geom_errorbar(aes(ymin = T1ci_low, ymax = T1ci_hi), width = 0.2, colour = "#4C72B0") +
theme_classic() +
labs(y = "Mean accuracy", x = "Lag level") +
theme(axis.title = element_text(size = 18, face = "bold"), axis.text = element_text(size = 18), plot.title =
element_text(hjust = 0.5, size = 16, face = "bold")) +
scale_x_discrete(labels = c(
RSVP_L1 = "lag 1",
RSVP_L2 = "lag 2",
RSVP_L3 = "lag 3",
RSVP_L4 = "lag 4",
RSVP_L5 = "lag 5",
RSVP_L6 = "lag 6",
RSVP_L7 = "lag 7"
))
ggsave("ABPlot.svg", device = svglite)
ggsave("ABPlot.png", width = 8, height = 6, dpi = 300)
Let’s also get the mean and sd for T1 accuracy, both across all lags and just for Lag1 trials (because it is much lower there compared to other lags)
# mean across all lags
mean(RSVP$RSVP_T1)
[1] 0.9093214
sd(RSVP$RSVP_T1)
[1] 0.05607571
# mean of just Lag 1 trials
T1L1<- subset(T1_long, variable == "RSVP_T1L1")$value
mean(T1L1)
[1] 0.6089286
sd(T1L1)
[1] 0.1741002
Now we know AB magnitude is predicted by CFF, but we don’t know exactly how performance across lags may vary depending on someone’s CFF. We can look into this if we divide the data into two groups and plot them: We’ll sort the data by CFF, and the lower half will be a “Low CFF” group and the upper half will be a “High CFF” group.
# First take all the columns we need
RSVP3<-Dataset[c("Participant", "CFF", "AB_Magnitude", "AB_Intercept", "AB_Slope", "RSVP_T1L1", "RSVP_T1L2","RSVP_T1L3","RSVP_T1L4","RSVP_T1L5","RSVP_T1L6","RSVP_T1L7","RSVP_T2L1", "RSVP_T2L2", "RSVP_T2L3", "RSVP_T2L4", "RSVP_T2L5", "RSVP_T2L6", "RSVP_T2L7")]
# Then remove the rows with missing values from the dataset
RSVP3<-na.omit(RSVP3)
# Then add a new column categorizing the data by CFF group
RSVP3 <- RSVP3 %>%
mutate(
CFFGroup = ntile(CFF, 2), # split into two quantiles
CFFGroup = factor(CFFGroup,
labels = c("LowCFF", "HighCFF"))
)
# Sanity check: how many participants are in each group?
table(RSVP3$CFFGroup)
LowCFF HighCFF
42 42
Now prepare data for plotting:
# First make the dataframe in long format (T1 accuracy first)
RSVPT1_long <- RSVP3 %>%
pivot_longer(
cols = 6:12,
names_to = "T1Lag",
values_to = "T1Accuracy",
)
# Then a dataframe in long format for T2 accuracy
RSVPT2_long <- RSVP3 %>%
pivot_longer(
cols = 13:19,
names_to = "T2Lag",
values_to = "T2Accuracy",
)
# Now calculate T1 means and standard errors for bar plot
T1Lag_means <- RSVPT1_long %>%
group_by(T1Lag, CFFGroup) %>%
summarise(
T1Mean = mean(T1Accuracy, na.rm = TRUE),
T1SE = sd(T1Accuracy, na.rm = TRUE) / sqrt(n()),
.groups = "drop"
)
# And the same for T2 means and standard errors
T2Lag_means <- RSVPT2_long %>%
group_by(T2Lag, CFFGroup) %>%
summarise(
T2Mean = mean(T2Accuracy, na.rm = TRUE),
T2SE = sd(T2Accuracy, na.rm = TRUE) / sqrt(n()),
.groups = "drop"
)
# Get the desired columns from those dataframes and combine them into a new dataframe
groupdiffs<-data.frame(T2Lag_means$T2Lag, T2Lag_means$CFFGroup, T2Lag_means$T2Mean, T2Lag_means$T2SE, T1Lag_means$T1Mean, T1Lag_means$T1SE)
# The combined columns will have weird names. Fix it
names(groupdiffs)[names(groupdiffs) == "T2Lag_means.T2Lag"] <- "T2Lag"
names(groupdiffs)[names(groupdiffs) == "T2Lag_means.CFFGroup"] <- "CFFGroup"
names(groupdiffs)[names(groupdiffs) == "T2Lag_means.T2Mean"] <- "T2Mean"
names(groupdiffs)[names(groupdiffs) == "T2Lag_means.T2SE"] <- "T2SE"
names(groupdiffs)[names(groupdiffs) == "T1Lag_means.T1Mean"] <- "T1Mean"
names(groupdiffs)[names(groupdiffs) == "T1Lag_means.T1SE"] <- "T1SE"
Now we’re ready to plot:
# Make the ggplot object
Groupdiff<-ggplot(groupdiffs, aes(x = T2Lag, group = CFFGroup))
# And add all the visuals we want
Groupdiff +
geom_line(linewidth = 1.2, aes(y = T2Mean, color = CFFGroup, linetype = "T2")) +
geom_point(size = 3, aes(y = T2Mean, color = CFFGroup)) +
geom_errorbar(aes(ymin = T2Mean - T2SE, ymax = T2Mean + T2SE, color = CFFGroup), width = 0.2)+
geom_line(linewidth = 0.8, aes(y = T1Mean, color = CFFGroup, linetype = "T1"), alpha = 0.5) +
geom_point(size = 2, aes(y = T1Mean, color = CFFGroup), alpha = 0.5) +
geom_errorbar(aes(ymin = T1Mean - T1SE, ymax = T1Mean + T1SE, color = CFFGroup), width = 0.2, alpha = 0.7)+
theme(panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.major.y = element_line(colour = "grey85", linewidth = 0.3),
panel.grid.minor.y = element_line(colour = "grey85", linewidth = 0.3),
axis.line.x = element_line(color = "black", linewidth = 0.5),
axis.line.y = element_line(color = "black", linewidth = 0.5),
panel.background = element_rect(fill = "white", colour = NA),
axis.title = element_text(size = 18, face = "bold"),
axis.text = element_text(size = 18, colour = "black"),
legend.text = element_text(size = 14),
legend.title = element_text(size = 16, face = "bold"),
plot.margin = margin(t = 20, r = 10, b = 10, l = 10)
) +
labs(x = "Lag level", y = "Accuracy", color = "CFF Group") +
scale_x_discrete(labels = c("RSVP_T2L1" = "1", "RSVP_T2L2" = "2", "RSVP_T2L3" = "3", "RSVP_T2L4" = "4", "RSVP_T2L5" = "5", "RSVP_T2L6" = "6", "RSVP_T2L7" = "7")) +
scale_color_manual(values = c("darkorchid", "green3"), labels = c("Low CFF", "High CFF")) + scale_linetype_manual(name = "Target" , values = c("T2" = "solid", "T1" = "dashed"))
ggsave("2groupdiffLag.svg", device = svglite)
ggsave("2groupdiffLag.png", width = 8, height = 6, dpi = 300)
We can also check if there is a statistically significant difference between the groups, either in the shape of the overall curve, or on any particular lag level.
We’ll do a mixed effects model that has Lag and CFFGroup as predictors, plus their interaction term. If the term is significant, we’ll know if the groups differ in their performance across lags. Participant will be included as random term, since we have multiple measures
lmemod<-lmer(T2Accuracy ~ CFFGroup * T2Lag + (1|Participant), data = RSVPT2_long)
anova(lmemod)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
CFFGroup 0.0296 0.02962 1 82 1.6673 0.2003
T2Lag 5.9375 0.98958 6 492 55.6931 <2e-16 ***
CFFGroup:T2Lag 0.1137 0.01894 6 492 1.0661 0.3819
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Lag is significant, but the interaction term is not. So the overall shape of the two performance plots is not significantly different between the groups. This tells us enough, but we’ll also look at how different performance is on a lag-by-lag level, using the Estimated Marginal Means from the model. We’ll use the emmeans package for this.
T2emm<-emmeans(lmemod, pairwise ~ CFFGroup | T2Lag)
summary(T2emm)
$emmeans
T2Lag = RSVP_T2L1:
CFFGroup emmean SE df lower.CL upper.CL
LowCFF 0.876 0.0318 190 0.813 0.939
HighCFF 0.908 0.0318 190 0.846 0.971
T2Lag = RSVP_T2L2:
CFFGroup emmean SE df lower.CL upper.CL
LowCFF 0.586 0.0318 190 0.523 0.649
HighCFF 0.662 0.0318 190 0.600 0.725
T2Lag = RSVP_T2L3:
CFFGroup emmean SE df lower.CL upper.CL
LowCFF 0.571 0.0318 190 0.508 0.633
HighCFF 0.639 0.0318 190 0.577 0.702
T2Lag = RSVP_T2L4:
CFFGroup emmean SE df lower.CL upper.CL
LowCFF 0.542 0.0318 190 0.480 0.605
HighCFF 0.617 0.0318 190 0.554 0.680
T2Lag = RSVP_T2L5:
CFFGroup emmean SE df lower.CL upper.CL
LowCFF 0.653 0.0318 190 0.590 0.716
HighCFF 0.704 0.0318 190 0.642 0.767
T2Lag = RSVP_T2L6:
CFFGroup emmean SE df lower.CL upper.CL
LowCFF 0.719 0.0318 190 0.657 0.782
HighCFF 0.746 0.0318 190 0.683 0.808
T2Lag = RSVP_T2L7:
CFFGroup emmean SE df lower.CL upper.CL
LowCFF 0.758 0.0318 190 0.696 0.821
HighCFF 0.754 0.0318 190 0.691 0.817
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
$contrasts
T2Lag = RSVP_T2L1:
contrast estimate SE df t.ratio p.value
LowCFF - HighCFF -0.0324 0.0449 190 -0.720 0.4724
T2Lag = RSVP_T2L2:
contrast estimate SE df t.ratio p.value
LowCFF - HighCFF -0.0766 0.0449 190 -1.706 0.0897
T2Lag = RSVP_T2L3:
contrast estimate SE df t.ratio p.value
LowCFF - HighCFF -0.0687 0.0449 190 -1.528 0.1282
T2Lag = RSVP_T2L4:
contrast estimate SE df t.ratio p.value
LowCFF - HighCFF -0.0745 0.0449 190 -1.658 0.0990
T2Lag = RSVP_T2L5:
contrast estimate SE df t.ratio p.value
LowCFF - HighCFF -0.0512 0.0449 190 -1.140 0.2556
T2Lag = RSVP_T2L6:
contrast estimate SE df t.ratio p.value
LowCFF - HighCFF -0.0261 0.0449 190 -0.582 0.5614
T2Lag = RSVP_T2L7:
contrast estimate SE df t.ratio p.value
LowCFF - HighCFF 0.0044 0.0449 190 0.098 0.9220
Degrees-of-freedom method: kenward-roger
Performance is not significantly difference at any lag level, but it approaches significance on Lag 2 and Lag 4.
That’s it for our RSVP analysis.
Part 2: RDM analysis
Now let’s move on to the RDM analysis. Firstly, our experimenters who collected the data used slightly different methodologies to assess the coherence with which a participant could perform with ~70% accuracy. These results for these two methods were kept separate, so we can check if there are no statistically relevant differences between the two sets. Let’s have a look at the means, min/max values and standard deviations:
mean(na.omit(Dataset$RDM_coh_Quest))
[1] 27.52348
mean(na.omit(Dataset$RDM_coh_Verify))
[1] 26.57692
min(na.omit(Dataset$RDM_coh_Quest))
[1] 7.222
min(na.omit(Dataset$RDM_coh_Verify))
[1] 9
max(na.omit(Dataset$RDM_coh_Quest))
[1] 73.764
max(na.omit(Dataset$RDM_coh_Verify))
[1] 50
sd(na.omit(Dataset$RDM_coh_Quest))
[1] 18.32858
sd(na.omit(Dataset$RDM_coh_Verify))
[1] 11.26827
The means are pretty similar. The max values and SD’s are slightly different though. Let’s do a quick t-test
t.test(Dataset$RDM_coh_Quest, Dataset$RDM_coh_Verify, var.equal = FALSE)
Welch Two Sample t-test
data: Dataset$RDM_coh_Quest and Dataset$RDM_coh_Verify
t = 0.28256, df = 72.938, p-value = 0.7783
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-5.729947 7.623052
sample estimates:
mean of x mean of y
27.52348 26.57692
The two sets are not significantly different from each other, so let’s analyse them together. They’re grouped together in the column “Coherence” in the raw dataset, so we’ll just use that
Let’s make a data frame from the raw data:
# First get only the columns we need for our analysis
RDM<-Dataset[c("CFF", "Age", "Sex", "Coherence", "RDM_RT_verify")]
#Then remove the rows with missing values from the dataset
RDM<-na.omit(RDM)
We have 79 observations for this part.
First we’ll see if the coherence at which participants perform with roughly 70% accuracy can be predicted by CFF. Again, we will also put Age and Sex in the model as predictors
RDMmod1<- lm(Coherence ~ CFF + Age + Sex, data = RDM)
summary(RDMmod1)
Call:
lm(formula = Coherence ~ CFF + Age + Sex, data = RDM)
Residuals:
Min 1Q Median 3Q Max
-22.755 -11.855 -3.999 6.974 51.474
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 59.3019 22.3618 2.652 0.00979 **
CFF -0.1879 0.2309 -0.813 0.41856
Age -1.0877 0.8909 -1.221 0.22597
SexM 5.0131 4.3526 1.152 0.25314
SexO 0.9346 16.5276 0.057 0.95506
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 16.38 on 74 degrees of freedom
Multiple R-squared: 0.03926, Adjusted R-squared: -0.01267
F-statistic: 0.7559 on 4 and 74 DF, p-value: 0.5573
Check the model assumptions
qqPlot(RDMmod1$residuals)
[1] 57 52
shapiro.test(RDMmod1$residuals)
Shapiro-Wilk normality test
data: RDMmod1$residuals
W = 0.90118, p-value = 1.529e-05
vif(RDMmod1)
GVIF Df GVIF^(1/(2*Df))
CFF 1.045104 1 1.022303
Age 1.010321 1 1.005147
Sex 1.052731 2 1.012930
residualPlots(RDMmod1)
Test stat Pr(>|Test stat|)
CFF -0.7880 0.4333
Age 0.3861 0.7005
Sex
Tukey test 1.5312 0.1257
ncvTest(RDMmod1)
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 0.3625736, Df = 1, p = 0.54708
influenceIndexPlot(RDMmod1, vars = "Cook")
influenceIndexPlot(RDMmod1, vars = "hat")
outlierTest(RDMmod1)
No Studentized residuals with Bonferroni p < 0.05
Largest |rstudent|:
The residuals aren’t entirely normally distributed, but overall the model looks good. The Cook’s distance plot shows some influential points, but they are not significant outliers in the outlier test. There’s no relationship at all between RDM coherence and any of the predictors. Just to check, we can drop the other predictors and only include the one we’re interested in the most (CFF)
RDMmod2<- lm(Coherence ~ CFF, data = RDM)
summary(RDMmod2)
Call:
lm(formula = Coherence ~ CFF, data = RDM)
Residuals:
Min 1Q Median 3Q Max
-19.778 -10.040 -5.410 7.067 47.335
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 35.6956 13.1861 2.707 0.00836 **
CFF -0.1464 0.2253 -0.650 0.51781
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 16.33 on 77 degrees of freedom
Multiple R-squared: 0.005452, Adjusted R-squared: -0.007464
F-statistic: 0.4221 on 1 and 77 DF, p-value: 0.5178
Makes no difference for the result. Just to be extra sure, let’s separate the data by the slight difference in methodology used by our two experimenters.
# Version 1 of our separated data set
RDMv1<-Dataset[c("CFF", "Age", "Sex", "RDM_coh_Quest")]
RDMv1<-na.omit(RDMv1)
# Version 2 of our separated data set
RDMv2<-Dataset[c("CFF", "Age", "Sex", "RDM_coh_Verify")]
RDMv2<-na.omit(RDMv2)
# Now for the linear models
Questmod<-lm(RDM_coh_Quest ~ CFF + Age + Sex, data = RDMv1)
summary(Questmod)
Call:
lm(formula = RDM_coh_Quest ~ CFF + Age + Sex, data = RDMv1)
Residuals:
Min 1Q Median 3Q Max
-19.804 -13.221 -6.114 9.099 50.669
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 57.5581 28.6633 2.008 0.0502 .
CFF -0.1125 0.3549 -0.317 0.7526
Age -1.1924 1.1099 -1.074 0.2879
SexM 6.3253 6.2075 1.019 0.3132
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 18.5 on 49 degrees of freedom
Multiple R-squared: 0.03962, Adjusted R-squared: -0.01918
F-statistic: 0.6738 on 3 and 49 DF, p-value: 0.5722
Verifymod<-lm(RDM_coh_Verify ~ CFF + Age + Sex, data = RDMv2)
summary(Verifymod)
Call:
lm(formula = RDM_coh_Verify ~ CFF + Age + Sex, data = RDMv2)
Residuals:
Min 1Q Median 3Q Max
-20.4675 -6.5446 -0.9614 6.1976 24.8644
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 79.3400 45.8460 1.731 0.0982 .
CFF -0.2237 0.2582 -0.866 0.3961
Age -2.0233 2.1249 -0.952 0.3518
SexM 3.2735 5.2366 0.625 0.5386
SexO 1.6115 12.1579 0.133 0.8958
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 11.79 on 21 degrees of freedom
Multiple R-squared: 0.07969, Adjusted R-squared: -0.09561
F-statistic: 0.4546 on 4 and 21 DF, p-value: 0.768
Similar results: no relationship between CFF and RDM coherence
RDMplot<-ggplot(data = RDM, aes(x = Coherence, y = CFF))
RDMplot +
geom_point(size = 3, colour = "#4C72B0", alpha = 0.7) +
geom_smooth(method = "lm", colour = "#F6BD60", se = FALSE, alpha = 0.15) +
theme_classic() +
theme(axis.title = element_text(size = 18, face = "bold"), axis.text = element_text(size = 18)) +
labs(x= "RDM Coherence", y = "CFF")
ggsave("RDMPlot.svg", device = svglite)
ggsave("RDMPlot.png", width = 8, height = 6, dpi = 300)
End