Bob O'Hara
Sept 11 2015
“Buprestidae - Naturmuseum Senckenberg - DSC02118” by Daderot - Daderot. Licensed under CC0 via Wikimedia Commons - https://commons.wikimedia.org/wiki/File:Buprestidae_-_Naturmuseum_Senckenberg_-_DSC02118.JPG#/media/File:Buprestidae_-_Naturmuseum_Senckenberg_-_DSC02118.JPG
Go to a site, look for the species. If you see it, it's present. If not, it may be absent
For the moment, assume you are brilliant & if you don't see it, it's definitely not there
Museums have specimens with their locations recorded
So, we only have presences but no records of absences
Presence/Absence and Presence Only
(If we have several visits to a site, then we can use occupancy models)
For each site we observe presence (\( Y \)=1) or absence (\( Y \)=0).
We can model \( p = Pr(Y=1) \), e.g. with a logistic regression:
\[ \log\left(\frac{p}{1-p}\right) = X \beta \]
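A minimal sketch of what the logistic model implies for the likelihood, assuming a single covariate. The data are simulated and the coefficient values (-0.5 and 1.5) and function names are illustrative choices, not from the slides.

```python
import math
import random

def inv_logit(eta):
    """Turn the linear predictor X*beta into a probability p."""
    return 1.0 / (1.0 + math.exp(-eta))

def log_likelihood(beta0, beta1, x, y):
    """Bernoulli log-likelihood under log(p/(1-p)) = beta0 + beta1*x."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = inv_logit(beta0 + beta1 * xi)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total

# Simulated presence/absence data (assumed coefficients, for illustration only)
random.seed(1)
x = [random.uniform(-2, 2) for _ in range(500)]
y = [1 if random.random() < inv_logit(-0.5 + 1.5 * xi) else 0 for xi in x]

ll_true = log_likelihood(-0.5, 1.5, x, y)  # model with the covariate
ll_null = log_likelihood(0.0, 0.0, x, y)   # p = 0.5 everywhere
print(ll_true, ll_null)
```

Fitting the model means finding the \( \beta \) that maximises this log-likelihood.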
Need to somehow model where the species might not be
Two and a half methods:
Assume sampling is uniform across space
We have presences
Select some pseudo-absences at random
Treat the problem as a presence-absence analysis
“Contaminated controls”
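The pseudo-absence recipe above can be sketched as follows. This is a simulation under stated assumptions: one environmental covariate that is uniform over the region, a presence density proportional to \( e^{x} \), and a hand-written Newton-Raphson fit. All names and numbers are hypothetical.

```python
import math
import random

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(x, y, n_iter=25):
    """Fit log(p/(1-p)) = b0 + b1*x by Newton-Raphson (one covariate)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = inv_logit(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient, intercept
            g1 += (yi - p) * xi     # gradient, slope
            h00 += w                # negative Hessian entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

random.seed(42)
TRUE_SLOPE = 1.0  # assumed effect of the environment on log density

def random_site():
    """Environmental value at a location chosen uniformly at random."""
    return random.uniform(-2, 2)

# Presence-only records: keep a random site with probability
# proportional to exp(TRUE_SLOPE * x) (rejection sampling).
max_density = math.exp(TRUE_SLOPE * 2)
presences = []
while len(presences) < 1000:
    site = random_site()
    if random.random() < math.exp(TRUE_SLOPE * site) / max_density:
        presences.append(site)

# Pseudo-absences: random sites, some of which may really be occupied
pseudo_absences = [random_site() for _ in range(5000)]

x_all = presences + pseudo_absences
y_all = [1] * len(presences) + [0] * len(pseudo_absences)
intercept, slope = fit_logistic(x_all, y_all)
print(intercept, slope)
```

In this setup the slope should come out close to the assumed value, while the intercept depends on how many pseudo-absences were drawn and has no direct interpretation.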
Again, select a lot of pseudo-absences at random
Rather than logistic regression, maximise the entropy:
\[ E = -\sum p \log p + C \]
where \( C \) represents the constraints
Assume that there is some density (\( e^{\eta(s)} \)) across space, and the higher the density, the more likely there is to be a presence
Then in any area \( A \), the number of observations follows a Poisson distribution with mean \( \lambda \):
\[ \lambda = \int_A e^{\eta(s)}ds \]
The likelihood for the data (X) has to integrate over the whole area. This is calculated numerically by choosing some integration points at which to calculate the density.
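A rough sketch of the numerical integration, assuming a one-dimensional region \( A = [0, 1] \) and a linear \( \eta(s) = b_0 + b_1 s \). The midpoint rule, the parameter values and the three observed points are illustrative choices, not from the slides.

```python
import math

def eta(s, b0, b1):
    """Log density at location s (assumed linear form)."""
    return b0 + b1 * s

def expected_count(b0, b1, n_points=200, lower=0.0, upper=1.0):
    """Midpoint-rule approximation to lambda = integral over A of exp(eta(s)) ds."""
    width = (upper - lower) / n_points
    total = 0.0
    for k in range(n_points):
        s = lower + (k + 0.5) * width  # integration point
        total += math.exp(eta(s, b0, b1)) * width
    return total

def log_likelihood(points, b0, b1):
    """Poisson process log-likelihood: sum of eta at the observations minus lambda."""
    return sum(eta(s, b0, b1) for s in points) - expected_count(b0, b1)

b0, b1 = 3.0, 1.0
lam_numeric = expected_count(b0, b1)
# For this simple eta the integral is available in closed form, as a check
lam_exact = math.exp(b0) * (math.exp(b1) - 1.0) / b1

points = [0.2, 0.5, 0.9]  # hypothetical observed locations
print(lam_numeric, lam_exact, log_likelihood(points, b0, b1))
```

More integration points give a better approximation at the cost of more computation.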
It turns out that MaxEnt and the point process approach are the same, except for the intercept
Pseudo-absence approach is an approximation to the same process
If we do not know \( Pr(X=1|I=1) \) but it is constant, say \( p \), then
\[ Pr(X=1) = Pr(X=1|I=1)Pr(I=1) = p f(env) \]
So we can still estimate differences in occupancy, but can't calibrate
If \( f(env) \) is higher, it is more likely that the species is present
Sometimes we want to say which sites are occupied. Do this by setting a threshold for \( f(env) \): if it is above this, say the species is present, else absent.
Can set threshold to minimise cost of error. Have to decide which is more costly: false positives or false negatives
Confusion matrix
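A small sketch of choosing a threshold by cost, assuming we have model scores and known presence/absence at a set of validation sites. The scores, the truth values and the cost weights below are made up for illustration.

```python
def confusion_matrix(scores, truth, threshold):
    """Count true/false positives/negatives when predicting presence if score > threshold."""
    tp = fp = fn = tn = 0
    for score, present in zip(scores, truth):
        predicted = score > threshold
        if predicted and present:
            tp += 1
        elif predicted and not present:
            fp += 1
        elif not predicted and present:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

def best_threshold(scores, truth, cost_fp=1.0, cost_fn=1.0):
    """Return (threshold, cost) minimising cost_fp*FP + cost_fn*FN.

    Candidates are 0.0 plus the observed scores; ties keep the lowest threshold.
    """
    best = None
    for thr in [0.0] + sorted(set(scores)):
        cm = confusion_matrix(scores, truth, thr)
        cost = cost_fp * cm["FP"] + cost_fn * cm["FN"]
        if best is None or cost < best[1]:
            best = (thr, cost)
    return best

# Hypothetical validation data: model score and true status per site
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
truth = [0, 0, 1, 0, 1, 0, 1, 1]

print(confusion_matrix(scores, truth, 0.4))
print(best_threshold(scores, truth, cost_fp=1.0, cost_fn=5.0))  # missing the species is costly
print(best_threshold(scores, truth, cost_fp=5.0, cost_fn=1.0))  # false alarms are costly
```

When false negatives are weighted more heavily the chosen threshold drops, and when false positives are weighted more heavily it rises.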