Using the Data Browser and AllofUs Database
Source:vignettes/web_only/data-browser.Rmd
Using the AllofUs data browser and R together
This vignette goes over how to use the AllofUs Research Data Browser to identify questions of interest, and then how to pull that data from the OMOP-CDM structured database. The Data Browser is a useful tool for quickly finding data (survey questions, conditions in the electronic health record, Fitbit measurements, etc.). It is publicly available at https://databrowser.researchallofus.org/. However, it’s not always clear exactly how to query the OMOP-CDM database to find the data that you can preview in the Data Browser.
Later Edit: Nearly all the key information you need to find survey data can be found here: https://support.researchallofus.org/hc/en-us/articles/6085114880148.
Survey Data
The Basics
Finding the question “In what country were you born?” and participants’ answers
We can see in the image above that the “Concept Code” for Birthplace USA is 1586136. This is a non-standard, AllofUs-specific code used to identify Birthplace: USA. We need to find the code for the question. An easy way to do this is to search for this code in Athena (https://athena.ohdsi.org/search-terms/start).
Clicking on Birthplace: USA reveals that it is in the Observation domain, that it is an answer to a question, and that it is in the PPI vocabulary (the vocabulary for AllofUs-specific, non-standard codes). We also see that it has the PPI parent code 1586135, “The Basics: Birthplace”. This is the non-standard code for the question. We can also see that the non-standard to standard map (OMOP) for the Birthplace code is 3005917. This is the standard code for “Birthplace”.
To successfully find AllofUs data from the Data Browser, it is really important to understand the difference between these non-standard and standard codes.
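The same lookup that Athena performs can be sketched against the OMOP vocabulary tables. The example below is a minimal illustration on toy data, not a query against the real database: `concept_toy` and `concept_relationship_toy` are hypothetical stand-ins that hold only the three codes named above, and it assumes the question code 1586135 is linked to 3005917 through a "Maps to" relationship (the relationship Athena displays as the non-standard to standard map).

```r
library(dplyr)
library(tibble)

# Toy stand-ins for the OMOP `concept` and `concept_relationship` tables.
# Only the codes named in the text are included; nothing here is real data.
concept_toy <- tribble(
  ~concept_id, ~concept_name,
  1586135,     "The Basics: Birthplace",
  1586136,     "Birthplace: USA",
  3005917,     "Birthplace"
)
concept_relationship_toy <- tribble(
  ~concept_id_1, ~relationship_id, ~concept_id_2,
  1586135,       "Maps to",        3005917  # assumed mapping, see lead-in
)

# Follow the "Maps to" relationship from the non-standard question code
# to its standard concept, then attach the standard concept's name.
standard_map <- concept_relationship_toy %>%
  filter(concept_id_1 == 1586135, relationship_id == "Maps to") %>%
  inner_join(concept_toy, by = c("concept_id_2" = "concept_id")) %>%
  select(
    non_standard_id = concept_id_1,
    standard_id     = concept_id_2,
    standard_name   = concept_name
  )

standard_map
```

On the Researcher Workbench, the same pipeline would presumably start from `dplyr::tbl(con, "concept_relationship")` and `dplyr::tbl(con, "concept")` and end with `dplyr::collect()`.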
- Non-standard codes are specific to the AllofUs database. Standard codes are codes that could be found in any OMOP database.
- Non-standard codes are found in the **{table}_source_concept_id** columns. Standard codes are found in the **{table}_concept_id** columns.
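One detail of the naming convention is easy to trip over: in the OMOP CDM, the column prefix is the domain name, which is not always the full table name (for example, `condition_occurrence` uses `condition_source_concept_id`). The sketch below spells this out for a few common tables. `omop_prefix` and `concept_columns()` are hypothetical helpers written for this illustration and are not part of the allofus package.

```r
# Column prefixes for a few OMOP CDM clinical tables. The prefix is the
# domain name, which is shorter than the table name for some tables.
omop_prefix <- c(
  observation          = "observation",
  measurement          = "measurement",
  condition_occurrence = "condition",
  drug_exposure        = "drug",
  procedure_occurrence = "procedure"
)

# Hypothetical helper: return the standard and non-standard concept
# column names for one of the tables listed above.
concept_columns <- function(table) {
  stopifnot(table %in% names(omop_prefix))
  prefix <- omop_prefix[[table]]
  c(
    standard     = paste0(prefix, "_concept_id"),
    non_standard = paste0(prefix, "_source_concept_id")
  )
}

concept_columns("observation")
concept_columns("condition_occurrence")
```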
Let’s look at this survey question example. We can find our question, non-standard code 1586135, in the observation table. This is the AllofUs-specific code for The Basics: Birthplace question, stored in the observation_source_concept_id column. Here’s a glimpse at the distinct responses for this code (note that I’ve omitted other important columns like person_id and observation_date and shown an aggregate row to comply with the AllofUs data sharing restrictions).
# Start by loading R packages
library(allofus)
# You can also use library(tidyverse) to load all of these at once
library(dplyr)
library(tidyr)
library(tibble)

# Connect to the AllofUs database
con <- allofus::aou_connect()

# Look at the first few rows for the Birthplace question, then keep the
# distinct combinations of question and answer codes
dplyr::tbl(con, "observation") %>%
  dplyr::filter(observation_source_concept_id == 1586135) %>%
  head() %>%
  dplyr::distinct(
    observation_concept_id, observation_source_value,
    observation_source_concept_id, value_source_concept_id, value_source_value
  ) %>%
  dplyr::collect()
observation_concept_id | observation_source_value | observation_source_concept_id | value_source_concept_id | value_source_value |
---|---|---|---|---|
3005917 | TheBasics_Birthplace | 1586135 | 1586136 | Birthplace_USA |
These columns match the values we saw in Athena. We can see the standard concept ID for the Birthplace question in the observation_concept_id column. The observation_source_value and observation_source_concept_id columns hold the related non-standard codes for the question. Finally, we see where the survey responses are: the non-standard concept code we saw in the Data Browser is in the value_source_concept_id column, and the text version of this code is in value_source_value.
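Once the question and answer columns are identified, a natural next step is to tally how many responses fall under each answer. The sketch below shows one way to do that on toy data. `observation_toy` is a hypothetical stand-in for the observation table: the Birthplace codes come from the text above, while the person_id values and the unrelated third row (codes 0, 1 and 2) are made up purely for illustration.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the observation table (not real participant data).
observation_toy <- tribble(
  ~person_id, ~observation_concept_id, ~observation_source_concept_id,
  ~value_source_concept_id, ~value_source_value,
  1, 3005917, 1586135, 1586136, "Birthplace_USA",
  2, 3005917, 1586135, 1586136, "Birthplace_USA",
  3, 0,       1,       2,       "Hypothetical_Answer"  # unrelated question
)

# Keep rows for the Birthplace question (non-standard code), then count
# responses per answer code and answer text.
birthplace_counts <- observation_toy %>%
  filter(observation_source_concept_id == 1586135) %>%
  count(value_source_concept_id, value_source_value, name = "n_responses")

birthplace_counts
```

On the Researcher Workbench, the same steps should work with `dplyr::tbl(con, "observation")` in place of `observation_toy`, followed by `dplyr::collect()`. Any counts shared outside the workbench still need to comply with the AllofUs data sharing restrictions.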