Get normalized RNA-seq expression data

Usage

rnaSeqData(
  cancerStudy,
  geneNames,
  zscore = c("all samples", "diploid samples")
)

Arguments

cancerStudy

character(1). Cancer study identifier (e.g. "acc_tcga_pan_can_atlas_2018"). See cancerStudies() for details.

geneNames

character. HUGO gene symbols (e.g. "TP53").

zscore

character.

"all samples": mRNA expression z-Scores relative to all samples (log RNA Seq RPKM). Log-transformed mRNA z-Scores compared to the expression distribution of all samples (RNA Seq RPKM).
"diploid samples": mRNA expression z-Scores relative to diploid samples (RNA Seq RPKM). mRNA z-Scores (RNA Seq RPKM) compared to the expression distribution of each gene in tumors that are diploid for this gene.

Value

SummarizedExperiment. Samples (e.g. patient tumors) in the columns and genes in the rows.

Details

Examples of cancer studies with different mRNA data types:

RNA-seq v2: gbm_tcga_pub2013_rna_seq_v2_mrna.
RNA-seq v1: nbl_target_2018_pub_rna_seq_mrna.
Microarray: gbm_tcga_pub_mrna.

Note

Updated 2021-09-03.

The cBioPortal Z-Score calculation method

cBioPortal currently generates two z-score profiles using two different base populations:

Distribution based on diploid samples only: The expression distribution for unaltered copies of the gene is estimated by calculating the mean and variance of the expression values for samples in which the gene is diploid (i.e. value is "0" as reported by discrete CNA data). We call this the unaltered distribution. If the gene has no diploid samples, then its normalized expression is reported as NA.

Distribution based on all samples: The expression distribution of the gene is estimated by calculating the mean and variance of all samples with expression values (excluding zeros and non-numeric values such as NA, NULL or NaN). If the gene's expression values are all zeros or non-numeric, then its normalized expression is reported as NA.

Otherwise, for every sample, the gene's normalized expression in both profiles is reported as:

z = (r - mu) / sigma

where r is the raw expression value, and mu and sigma are the mean and standard deviation of the base population, respectively.
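
The two base populations described above can be sketched in base R. This is a minimal illustration, not cBioPortal's implementation: the helper name zscoreSketch and its arguments are hypothetical, and the use of the sample standard deviation (sd(), which divides by n - 1) is an assumption, since the text does not say which estimator cBioPortal uses.

```r
## Hypothetical helper for illustration only; not part of this package
## or of cBioPortal.
## expr:    numeric vector of raw expression values for one gene.
## diploid: optional logical vector marking samples in which the gene is
##          diploid (discrete CNA value "0"). NULL selects the
##          all-samples profile.
zscoreSketch <- function(expr, diploid = NULL) {
    if (is.null(diploid)) {
        ## All-samples profile: drop zeros and non-numeric values.
        keep <- is.finite(expr) & expr != 0
    } else {
        ## Diploid profile: keep diploid samples with usable values.
        keep <- is.finite(expr) & diploid
    }
    base <- expr[keep]
    ## An empty base population is reported as NA for every sample.
    if (length(base) == 0L) {
        return(rep(NA_real_, length(expr)))
    }
    ## Assumption: sample standard deviation (n - 1 denominator).
    (expr - mean(base)) / sd(base)
}

raw <- c(2, 4, 6, 0, NA)
zAll <- zscoreSketch(raw)
zDiploid <- zscoreSketch(raw, diploid = c(TRUE, TRUE, FALSE, FALSE, FALSE))
zEmpty <- zscoreSketch(c(0, 0, NA))
```

Here zAll is -1, 0, 1, -2, NA: the base population is 2, 4, 6 (mean 4, standard deviation 2), and the zero is excluded from the base population but still receives a score. zEmpty is all NA because no usable values remain.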

Examples

geneNames <- c("MYC", "TP53")

## ACC TCGA 2018 ====
cancerStudy <- "acc_tcga_pan_can_atlas_2018"
x <- rnaSeqData(cancerStudy = cancerStudy, geneNames = geneNames)
#> → Importing RNA-seq data: `acc_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_Zscores`.
print(x)
#> class: SummarizedExperiment 
#> dim: 2 78 
#> metadata(5): caseLists date geneticProfiles sessionInfo wd
#> assays(1): counts
#> rownames(2): MYC TP53
#> rowData names(0):
#> colnames(78): TCGA_OR_A5J1_01 TCGA_OR_A5J2_01 ... TCGA_PK_A5HA_01
#>   TCGA_PK_A5HB_01
#> colData names(51): age ajccPathologicTumorStage ... tumorTissueSite
#>   tumorType

## CCLE Broad 2019 ====
cancerStudy <- "ccle_broad_2019"
x <- rnaSeqData(cancerStudy = cancerStudy, geneNames = geneNames)
#> → Importing RNA-seq data: `ccle_broad_2019_rna_seq_mrna_median_all_sample_Zscores`.
print(x)
#> class: SummarizedExperiment 
#> dim: 2 1570 
#> metadata(5): caseLists date geneticProfiles sessionInfo wd
#> assays(1): counts
#> rownames(2): MYC TP53
#> rowData names(0):
#> colnames(1570): A101D_SKIN A1207_CENTRAL_NERVOUS_SYSTEM ...
#>   ZR751_BREAST ZR7530_BREAST
#> colData names(43): age annotationSource ... tumorType typeRefined