Get normalized RNA-seq expression data
Usage
rnaSeqData(
cancerStudy,
geneNames,
zscore = c("all samples", "diploid samples")
)
Arguments
- cancerStudy
character(1). Cancer study identifier (e.g. "acc_tcga_pan_can_atlas_2018"). See cancerStudies() for details.
- geneNames
character. HUGO gene symbols (e.g. "TP53").
- zscore
character. Base population used for the z-score profile:
"all samples": mRNA expression z-Scores relative to all samples (log RNA Seq RPKM). Log-transformed mRNA z-Scores compared to the expression distribution of all samples (RNA Seq RPKM).
"diploid samples": mRNA expression z-Scores relative to diploid samples (RNA Seq RPKM). mRNA z-Scores (RNA Seq RPKM) compared to the expression distribution of each gene in tumors that are diploid for this gene.
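The zscore default in the usage above is a vector of both choices. A minimal sketch of how such a default typically resolves in R, assuming the function handles it with match.arg() (the source does not confirm this, and resolveZscore is a hypothetical stand-in, not part of the package):

```r
# Hypothetical stand-in for the argument handling; not the package's source.
resolveZscore <- function(zscore = c("all samples", "diploid samples")) {
    # match.arg() returns the first choice when the argument is left at
    # its default, and errors on values outside the allowed set.
    match.arg(zscore)
}

# Leaving the argument unset selects the first choice.
defaultChoice <- resolveZscore()

# Passing a choice explicitly selects it.
diploidChoice <- resolveZscore("diploid samples")
```

Under that assumption, omitting zscore selects "all samples", which agrees with the `all_sample_Zscores` profiles imported in the examples.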
Details
Examples of cancer studies with different mRNA data types:
- RNA-seq v2: gbm_tcga_pub2013_rna_seq_v2_mrna
- RNA-seq v1: nbl_target_2018_pub_rna_seq_mrna
- Microarray: gbm_tcga_pub_mrna
The cBioPortal Z-Score calculation method
cBioPortal currently generates two z-score profiles using two different base populations:
- Distribution based on diploid samples only: The expression distribution
for unaltered copies of the gene is estimated by calculating the mean and
variance of the expression values for samples in which the gene is diploid
(i.e. value is "0" as reported by discrete CNA data). We call this the
unaltered distribution. If the gene has no diploid samples, then its
normalized expression is reported as NA.
- Distribution based on all samples: The expression distribution of the
gene is estimated by calculating the mean and variance of all samples with
expression values (excludes zeros and non-numeric values like NA, NULL
or NaN). If the gene has samples whose expression values are all zeros or
non-numeric, then its normalized expression is reported as NA.
Otherwise, for every sample, the gene's normalized expression for both profiles is reported as:
(r - mu) / sigma
where r is the raw expression value, and mu and sigma are the mean and
standard deviation of the base population, respectively.
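The two base populations can be illustrated on toy data in base R. This is a minimal sketch, not cBioPortal's implementation: the expression values, CNA calls, and the helper zscoreAgainst are all made up for illustration, and no log transformation is applied.

```r
# Toy data for a single gene (all values are made up for illustration).
expr <- c(s1 = 2, s2 = 4, s3 = 6, s4 = 0, s5 = 8)  # raw expression values
cna  <- c(s1 = 0, s2 = 0, s3 = 0, s4 = 2, s5 = 1)  # discrete CNA calls

# Hypothetical helper: z-score every sample against a base population.
zscoreAgainst <- function(r, base) {
    base <- base[is.finite(base)]
    # An empty base population yields NA for every sample.
    if (length(base) == 0L) {
        return(rep(NA_real_, length(r)))
    }
    mu <- mean(base)
    sigma <- sd(base)
    (r - mu) / sigma
}

# "all samples": the base population excludes zeros and non-numeric values.
baseAll <- expr[is.finite(expr) & expr != 0]
zAll <- zscoreAgainst(expr, baseAll)

# "diploid samples": the base population is samples with a CNA call of 0.
baseDiploid <- expr[cna == 0]
zDiploid <- zscoreAgainst(expr, baseDiploid)
```

With these toy values the diploid base population is 2, 4, 6 (mean 4, standard deviation 2), so zDiploid is -1, 0, 1, -2, 2 for s1 through s5. The same sample gets a different score in zAll because the base population changes.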
See also:
https://github.com/cBioPortal/cbioportal/blob/master/docs/Z-Score-normalization-script.md
Examples
geneNames <- c("MYC", "TP53")
## ACC TCGA 2018 ====
cancerStudy <- "acc_tcga_pan_can_atlas_2018"
x <- rnaSeqData(cancerStudy = cancerStudy, geneNames = geneNames)
#> → Importing RNA-seq data: `acc_tcga_pan_can_atlas_2018_rna_seq_v2_mrna_median_all_sample_Zscores`.
print(x)
#> class: SummarizedExperiment
#> dim: 2 78
#> metadata(5): caseLists date geneticProfiles sessionInfo wd
#> assays(1): counts
#> rownames(2): MYC TP53
#> rowData names(0):
#> colnames(78): TCGA_OR_A5J1_01 TCGA_OR_A5J2_01 ... TCGA_PK_A5HA_01
#> TCGA_PK_A5HB_01
#> colData names(51): age ajccPathologicTumorStage ... tumorTissueSite
#> tumorType
## CCLE Broad 2019 ====
cancerStudy <- "ccle_broad_2019"
x <- rnaSeqData(cancerStudy = cancerStudy, geneNames = geneNames)
#> → Importing RNA-seq data: `ccle_broad_2019_rna_seq_mrna_median_all_sample_Zscores`.
print(x)
#> class: SummarizedExperiment
#> dim: 2 1570
#> metadata(5): caseLists date geneticProfiles sessionInfo wd
#> assays(1): counts
#> rownames(2): MYC TP53
#> rowData names(0):
#> colnames(1570): A101D_SKIN A1207_CENTRAL_NERVOUS_SYSTEM ...
#> ZR751_BREAST ZR7530_BREAST
#> colData names(43): age annotationSource ... tumorType typeRefined