This repository contains four datasets:

1) Bloodbank.csv: The longitudinal cohort containing the tested blood samples used to estimate the seroprevalence in the eight cities.
2) repeat_blood_donors.csv: The cohort of repeat blood donors used to estimate the probability distribution of the time-to-seroreversion.
3) convalescent_plasma_longitudinal_roche.csv: Convalescent plasma donors used to estimate the sensitivity of the assay.
4) prepandemic_cohort.csv: The pre-pandemic blood donor cohort, containing samples tested in February 2020 in São Paulo.

In all files, each row represents a tested blood sample. Information such as exact age, education level, and declared race was removed to ensure the data are anonymized. For the same reason, dates of sample collection were replaced with the corresponding week numbers. In the convalescent plasma donors dataset, the date of onset was replaced with the time interval between the date of sample collection and the date of onset.

See data_dictionary.pdf for the data dictionary.
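
As a quick check after downloading, the four files can be read and their sample counts listed. The following is a minimal sketch, not part of the original analysis code. It assumes the files are comma-delimited with a header row, UTF-8 encoded, and located in the current directory. The `load_datasets` helper is hypothetical, and no column names are assumed (see data_dictionary.pdf for those).

```python
import csv
from pathlib import Path

# File names as listed in this README; their location on disk is an assumption.
DATASET_FILES = [
    "Bloodbank.csv",
    "repeat_blood_donors.csv",
    "convalescent_plasma_longitudinal_roche.csv",
    "prepandemic_cohort.csv",
]


def load_datasets(data_dir="."):
    """Read each listed CSV found in data_dir into a list of dicts.

    Each dict is one row, i.e. one tested blood sample. All values are kept
    as strings; files that are not present are skipped.
    """
    datasets = {}
    for name in DATASET_FILES:
        path = Path(data_dir) / name
        if not path.exists():
            continue
        # Assumption: comma-delimited with a header row; utf-8-sig also
        # tolerates a leading byte-order mark.
        with path.open(newline="", encoding="utf-8-sig") as handle:
            datasets[name] = list(csv.DictReader(handle))
    return datasets


if __name__ == "__main__":
    for name, rows in load_datasets().items():
        columns = list(rows[0].keys()) if rows else []
        print(f"{name}: {len(rows)} samples, columns: {columns}")
```

Run from the directory containing the CSV files, this prints one line per file found, with the number of rows (samples) and the column names read from the header.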