Nest success estimates are biased upwards, when they are estimated directly from the raw data, because nests that failed are underrepresented in the raw data. They are underrepresented, because they have a smaller chance to be detected compared to successful nests, because they are available for detection for a shorter time period only.

Mayfield suggested a simple method to account for this bias (for a history of the methods, see Johnson 2007 Methods of estimating nest success: an historical tour, Studies in Avian Biology No. 34). Further methodological developments were made, among others, nest survival models can be fitted in MARK (Cooch & White 2018: Program MARK, A Gentle Introduction; with chap 17 on Nest survival and Appendix C.18 for its implementation in RMark). In a post (https://rpubs.com/bbolker/logregexp), Ben Bolker provided an approach based on ideas also presented by Heisey et al (2007: The ABCS of nest survival: theory and application from a biostatistical persepctive, Studies in Avian Biology 34: 13-33), which formulate nest survival estimates in the framework of survival analyses, using a complementary log-log link function.

In our analyses for Rime et al. “Drivers of nest site selection and breeding success in an Alpine ground-nesting songbird” we used the code provided in the post by Ben Bolker. A reviewer suggested to justify this, given the method is not peer-reviewed (but stating the admiration for Ben Bolkers contributions to biostatistics in general!). We did not use Mark or RMark as we assumed that equally-spaced searches would be needed for that (which we don’t have), while Bolker’s code works directly with exposure time between two visits of any length. Also, Bolker’s code allows us to use the function sim from package arm to get draws from the joint posterior distribution (i.e. Bayesian framework) which allows to easily create (custom) effect plots - in our view the most helpful way to present results from which biological inference can be drawn. Therefore, to justify using Bolker’s code, we simulate nest survival data and analyse it using Mayfield, RMark and Bolker’s code. We find that all produce similar results in the setting we use. We aknowledge that this is not a thorough evaluation for the methods, e.g. we don’t explore what happens when predictors are included, but nevertheless we argue that the code by Bolker is well justified theoretically (though this is not easily proven by a non-statistician ecologist like us) and by our simulation.

We thank Ben Bolker for his code, and him and an anonymous reviewer for pointing out relevant literature.

# simulation of survival data to compare Mayfield, BBolker and RMark:

nrep <- 100
t.res <- data.frame(rep = 1:nrep,
                    truth = NA,   # true nest survival (for entire time = nd days)
                    apparent = NA,
                    Mayfield = NA,
                    rmark = NA, rmark.lwr = NA, rmark.upr = NA,
                    bb = NA, bb.lwr = NA, bb.upr = NA)

for(t.rep in 1:nrep) {

# bird with 30-day nest time; all nests start at the same date 1
nn <- 300       # nr of nests total (including nondetected nests)
nd <- 30        # nr of days for nesting time
t.dsr <- 0.986  # daily survival rate
t.det <- 0.13   # daily detection probability of a nest alive
nocc <- 5       # nr of nest visits: on days spread equally from start to end
t.gap <- floor(nd/nocc)   # days between visits
t.dobs <- sort(nd-2 - t.gap*(0:(nocc-1)))  # days with visits


# create truth of survival of the nn nests across nd days:
t.tru <- matrix(NA, nrow=nn, ncol=nd)
t.tru[,1] <- 1  # every nest alive on day 1
for(n in 1:nn) {
  for(d in 2:nd) {
    if(t.tru[n,d-1]==0) t.tru[n,d] <- 0  # if dead the day before: also dead on day d
    if(t.tru[n,d-1]==1) t.tru[n,d] <- rbinom(1,1,t.dsr)  # if alive: survival with t.dsr probability
  }
}
t.res$truth[t.rep] <- mean(t.tru[,nd])   # proportion of nests surviving to the end
  # This is, on average, equal to:
t.dsr^(nd-1)

## explanatory plots for code development
# plot(1:nd,apply(t.tru,2,mean), type="l", ylim=c(0,1))
#    # true survival curve
# plot(2:nd,apply(apply(t.tru,1,diff),1,FUN=function(x)mean(x==0)))
#    # the observed daily survival rates, compared to the underlaying survival rate
# abline(h=t.dsr, col="red")
# abline(h=mean(apply(apply(t.tru,1,diff),1,FUN=function(x)mean(x==0))),col="orange")

t.tru.od <- t.tru[,t.dobs]   # true state on observation days
# only a nest alive can be found, with t.det; once found, it is checked on each visit until it fails.

# create observation states on observation days:
t.obs.od <- t.tru.od  # observations on observation days: start with the truth, but then put in NA where not observed:

# first, create detections ignoring previous detections and true states:
tt.det.hyp <- matrix(sample(c(0,1),nrow(t.obs.od)*ncol(t.obs.od),replace = T, 
                             prob=c((1-t.det),t.det)),nrow=nrow(t.obs.od),ncol=ncol(t.obs.od))
# first detection per nest:
tt.det.first <- apply(tt.det.hyp,1,FUN=function(x) {
  if(sum(x==1)==0) return(NA)
  if(sum(x==1)>0) return(min(which(x==1)))
})

# nests never observed:
t.obs.od[is.na(tt.det.first),] <- NA
# nests with a "detection", but which is only a real detection if the nest was still alive:
for(n in which(!is.na(tt.det.first))) {
  t.first <- tt.det.first[n]  # first potential visit to that nest
  # if nest was already gone at first visit:
  if(t.tru.od[n,t.first]==0) {
    t.obs.od[n,] <- NA
    next
  }
  # else:
  # NA before first visit:
  if(t.first>1) t.obs.od[n,1:(t.first-1)] <- NA
  # NA after the first zero (i.e. after knowing it failed):
  if(sum(t.obs.od[n,] %in% 0)>1) t.obs.od[n,(which.min(t.obs.od[n,])+1):nocc] <- NA
}

# to continue, keep only nests that were observed at least twice:
d.obs <- t.obs.od[apply(t.obs.od,1,FUN=function(x)sum(!is.na(x))>1),]
# separately store the nests that survived:
d.surv <- t.tru[apply(t.obs.od,1,FUN=function(x)sum(!is.na(x))>1),ncol(t.tru)]

head(d.obs)  # the observed nests
# note that the corresponding observation days are:
t.dobs


## Estimators for nest survival:

# the apparent nest survival estimator:
t.res$apparent[t.rep] <- sum(d.obs[,nocc] %in% 1) / nrow(d.obs)

# Mayfield: n failed nests / total exposure = daily mortality rate, hence:
d.exposure <- apply(d.obs,1,FUN=function(x)sum(x%in%1))*t.gap -   # each observed "1" gets t.gap days, minus:
  t.gap/2 * apply(d.obs,1,FUN=function(x) sum(x%in%0)>0) -   # half a gap-time if the nest was observed to have failed
  t.gap * apply(d.obs,1,FUN=function(x) sum(x%in%0)==0)  # one t.gap for nests that were not observed to have failed (i.e. for the last "1")

( dsr <- 1-(sum(d.surv==0)/sum(d.exposure)) )
t.res$Mayfield[t.rep] <- dsr^nd   # Mayfield nest success estimate

# RMark: we need the following:
#   FirstFound: day the nest was first found
#   LastPresent: last day that a chick was present in the nest
#   LastChecked: last day the nest was checked
#   Fate: fate of the nest; 0=success and 1=depredated

d.rmark <- data.frame(id = paste0("ID",1:nrow(d.obs)),
                      FirstFound = NA, LastPresent = NA, LastChecked = NA, Fate = NA)
for(n in 1:nrow(d.obs)) {
  tt <- d.obs[n,]
  d.rmark$FirstFound[n] <- min(t.dobs[tt %in% 1])
  d.rmark$LastPresent[n] <- max(t.dobs[tt %in% 1])
  d.rmark$LastChecked[n] <- max(t.dobs[tt %in% c(0,1)])
  d.rmark$Fate[n] <- 1-tail(tt[!is.na(tt)],1)  # based on last visit
}

modm <- mark(d.rmark, model="Nest",nocc=nd, output=F, silent=T, delete = T)
t.beta <- modm$results$beta
t.res$rmark[t.rep] <- plogis(t.beta[1,"estimate"])^nd
t.res$rmark.lwr[t.rep] <- plogis(t.beta[1,"estimate"]-2*t.beta[1,"se"])^nd
t.res$rmark.upr[t.rep] <- plogis(t.beta[1,"estimate"]+2*t.beta[1,"se"])^nd

# Ben Bolkers approach: need one row per visit, i.e. per non-NA value in d.obs:
dat.bb <- data.frame(id = rep(paste0("id",1:nrow(d.obs)),each=4),
                     alive = NA,
                     exposure = NA)
for(n in 1:nrow(d.obs)){
  tix <- dat.bb$id==paste0("id",n)
  tt.obs <- d.obs[n,]
  dat.bb$alive[tix] <- tt.obs[2:5]
  dat.bb$exposure[tix] <- diff(t.dobs)
}
dat.bb <- dat.bb[!is.na(dat.bb$alive),]


m_glm_S <- glm(alive ~ 1 + offset(log(exposure)),
               data=dat.bb, family=binomial(link=logexp(dat.bb$exposure)))
nsim <- 5000
bsim <- sim(m_glm_S, n.sims=nsim)
# according to
logexp()$linkinv  # the inverse function for the overall survival seems to be:
  # and in get_expossure(), there is an object ..exposure, which is set to 
  # meran(dat$exposure) in the example of Ben Bolker
f.inv <- function(eta) plogis(eta)^mean(dat.bb$exposure)
t.res$bb[t.rep] <- as.numeric(f.inv(coef(m_glm_S)))

## coefficient estimates with 95% uncertainty interval:
tt <- f.inv(round(t(apply(bsim@coef,2,quantile,probs=c(0.025,0.975))),3))
t.res$bb.lwr[t.rep] <- tt[1]
t.res$bb.upr[t.rep] <- tt[2]

}

save(t.res,file="t.res.cmpMMB")

Plots to compare the simulation results: each dot in the following plots is one simulation (one of 100). Each “simulation” corresponds to the simulation of one data set (using the same expected values for survival and detection) that is then analysed using Mayfield, Mark and Ben Bolkers method.

Fig. 1. Comparing the true (simulated) nest survival with the apparent survival from raw data.

Fig. 1. Comparing the true (simulated) nest survival with the apparent survival from raw data.

Of course: apparent survival is always an over-estimation of the true nest survival (Fig. 1).

Fig. 2. Comparing the true (simulated) nest survival with the Mayfield estimate.

Fig. 2. Comparing the true (simulated) nest survival with the Mayfield estimate.

Mayfield seems overall roughly unbiased (Fig. 2).

Fig. 3. Comparing the estimate from RMark and using Ben Bolker's code (red vs blue) along the true values (on the x-axis). Position along the x-axis slightly shifted to reduce overlap.

Fig. 3. Comparing the estimate from RMark and using Ben Bolker’s code (red vs blue) along the true values (on the x-axis). Position along the x-axis slightly shifted to reduce overlap.

Comparing the estimates using RMark and Bolker’s method, we see very similar results (Fig. 3) that are also overall similar to Mayfield.

Fig. 4. Comparing the estimate from RMark and from using Ben Bolker's code (with 95% credible intervals)

Fig. 4. Comparing the estimate from RMark and from using Ben Bolker’s code (with 95% credible intervals)

Estimates from Bolkers method correlate very strongly with the estimates from RMark. In our simulation, the average bias of the estimates using RMark is -0.63%, those using Bolker’s method 1.66%.