To download data from cBioPortal, you must first request a token from the cBioPortal website, which will prompt you to log in with your MSKCC credentials. Then navigate to “Web API” in the top bar menu and download a token. Copy the token, then run the following command in R:
usethis::edit_r_environ()
Paste your token into the .Renviron file that opens, in the following form, then save the file and restart R so that the change takes effect:
CBIOPORTAL_TOKEN = 'YOUR_TOKEN'
You can test your connection using:
get_cbioportal_token()
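Because .Renviron is only read when R starts, it can help to confirm that the variable is visible in the current session before querying the API. The sketch below uses base R only; token_is_set() is a hypothetical helper written for this illustration and is not part of gnomeR.

```r
# Hypothetical helper (not part of gnomeR): TRUE when the environment
# variable named by `var` holds a non-empty value in this session.
token_is_set <- function(var = "CBIOPORTAL_TOKEN") {
  nzchar(Sys.getenv(var, unset = ""))
}

if (!token_is_set()) {
  message("CBIOPORTAL_TOKEN is not set; restart R after editing .Renviron.")
}
```

This only checks that a value is present; it does not verify that cBioPortal accepts the token.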
Multiple datasets are hosted on cBioPortal and are accessible through our API:
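The get_genetics() function lets the user download mutation, fusion and copy-number alteration data in Mutation Annotation Format (MAF) for either the sample DMP IDs provided or a specific study. Its arguments, as used in the examples in this section, include:
- sample_ids: a character vector of sample IDs to query.
- sample_list_id: the identifier of a study whose samples should all be retrieved, e.g. "mskimpact_Colorectal_Cancer".
- database: the database to query, e.g. "msk_impact" or "tcga".
- mutations, fusions, cna, seg: logical flags selecting which data types to download.
- …: refers to the impact_gene_info dataset.
This function returns a MAF file, a copy-number alterations summary file, or both, as the elements mut and/or cna of the result.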
Even though cBioPortal mainly focuses on IMPACT data, other genomic studies are also available on it. One notable example is The Cancer Genome Atlas (TCGA), which performed whole-exome sequencing on large cohorts across 33 different cancer sites. This dataset is a public resource and does not require a token to be accessed through the gnomeR API, so we use it to illustrate the functionality of the API. In addition to mutation and copy-number alteration data, cBioPortal also gives access to other data types such as RNA-seq or RPPMs. In this section we show how to use the API to retrieve this data. Note that the lists of available samples and cancer sites, and of available genes, can be found in tcga_samples and tcga_genes, respectively.
To retrieve the TCGA mutational data, set the argument mutations to TRUE and database to "tcga":
# Mutations only
# df <- get_genetics(sample_ids = c("TCGA-17-Z023-01", "TCGA-02-0003-01", "TCGA-02-0055-01"),
#                    mutations = TRUE, fusions = FALSE, cna = FALSE,
#                    database = "tcga")
# df$mut

# Mutations and fusions for the first 100 samples with a known cancer code
# df <- get_genetics(sample_ids = as.character(tcga_samples$patient_id[!is.na(tcga_samples$Cancer_Code)][1:100]),
#                    mutations = TRUE, fusions = TRUE, cna = FALSE,
#                    database = "tcga")
# df$mut %>%
#   filter(Variant_Classification == "Fusion")

# Copy-number alterations only
# df <- get_genetics(sample_ids = c("TCGA-17-Z023-01", "TCGA-02-0003-01", "TCGA-02-0055-01"),
#                    mutations = FALSE, fusions = FALSE, cna = TRUE,
#                    database = "tcga")
# df$cna
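To make the Variant_Classification filter concrete without calling the API, here is a minimal sketch on a made-up table. The three rows are invented for illustration and only mimic the standard MAF columns Tumor_Sample_Barcode, Hugo_Symbol and Variant_Classification; a real df$mut would carry many more columns.

```r
# Toy stand-in for df$mut (invented rows, not real TCGA data).
toy_mut <- data.frame(
  Tumor_Sample_Barcode   = c("S1", "S1", "S2"),
  Hugo_Symbol            = c("TP53", "ALK", "KRAS"),
  Variant_Classification = c("Missense_Mutation", "Fusion", "Missense_Mutation"),
  stringsAsFactors = FALSE
)

# Split fusions from the other alterations, mirroring the dplyr filter.
is_fusion <- toy_mut$Variant_Classification == "Fusion"
fusions   <- toy_mut[is_fusion, ]
mutations <- toy_mut[!is_fusion, ]

# Count the non-fusion mutations per sample.
table(mutations$Tumor_Sample_Barcode)
```

Base R subsetting is used here so the sketch runs without dplyr; the logic is the same as the filter() call shown in the examples.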
The copy-number alteration data covered so far is a discrete estimate of the alterations that occurred. More nuanced and accurate data exist, however, for the copy-number alterations observed in a tumor. In gnomeR we include an example segmentation file and relevant functions from the facets package, which provides an allele-specific copy number and clonal heterogeneity analysis tool for high-throughput DNA sequencing. We show below how to download segmentation data through the API in gnomeR.
# df <- get_genetics(sample_ids = c("TCGA-17-Z023-01", "TCGA-02-0003-01", "TCGA-02-0055-01"),
#                    mutations = FALSE, fusions = FALSE, cna = FALSE, seg = TRUE,
#                    database = "tcga")
# df$seg
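To illustrate how continuous segmentation data relates to discrete calls, the sketch below thresholds segment means into loss, neutral and gain labels. Everything in it is an assumption made for illustration: the rows are invented, the column names follow the common SEG layout (ID, chrom, loc.start, loc.end, num.mark, seg.mean) and may differ from what df$seg actually contains, and the 0.2 cutoff is arbitrary rather than a value used by gnomeR or facets.

```r
# Toy segmentation table (invented values, assumed SEG-style columns).
toy_seg <- data.frame(
  ID        = c("S1", "S1", "S1"),
  chrom     = c(1, 1, 2),
  loc.start = c(1e6, 5e7, 1e6),
  loc.end   = c(4e7, 9e7, 8e7),
  num.mark  = c(120, 80, 200),
  seg.mean  = c(-0.45, 0.03, 0.62)
)

# Arbitrary illustrative cutoff on the segment mean.
cutoff <- 0.2
toy_seg$call <- cut(
  toy_seg$seg.mean,
  breaks = c(-Inf, -cutoff, cutoff, Inf),
  labels = c("loss", "neutral", "gain")
)

toy_seg[, c("ID", "chrom", "seg.mean", "call")]
```

The discretized column discards the magnitude of each segment mean, which is the information the segmentation file preserves.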
As mentioned previously, IMPACT genomic data is protected and requires a token to be accessed. The get_genetics() function works in the same way as in the TCGA examples shown above.
# Mutations only
# df.mut <- get_genetics("P-0000062-T01-IM3", database = "msk_impact",
#                        mutations = TRUE, fusions = FALSE, cna = FALSE)
# df.mut

# Fusions only
# df.fus <- get_genetics("P-0000062-T01-IM3", database = "msk_impact",
#                        mutations = FALSE, fusions = TRUE, cna = FALSE)
# df.fus

# Copy-number alterations only
# df.cna <- get_genetics("P-0000062-T01-IM3", database = "msk_impact",
#                        mutations = FALSE, fusions = FALSE, cna = TRUE)
# df.cna

# All three data types
# df.gen <- get_genetics("P-0000062-T01-IM3", database = "msk_impact",
#                        mutations = TRUE, fusions = TRUE, cna = TRUE)
# df.gen
The example below retrieves all the samples in a study. A complete list of studies can be found on the cBioPortal website. Note that this feature is not working yet.
# df.gen <- get_genetics(sample_list_id = "mskimpact_Colorectal_Cancer", database = "msk_impact",
#                        mutations = TRUE, fusions = TRUE, cna = TRUE)