This function is a utility wrapper for SummarizedExperiment
that provides
automatic subsetting for row and column data, as well as automatic handling
of transgenes and spike-ins.
Usage
makeSummarizedExperiment(
assays = S4Vectors::SimpleList(),
rowRanges = GenomicRanges::GRanges(),
rowData = NULL,
colData = S4Vectors::DataFrame(),
metadata = list(),
transgeneNames = NULL,
denylist = TRUE,
sort = TRUE,
sessionInfo = TRUE
)
Arguments
- assays
SimpleList
. Count matrices, which must have matching dimensions. Counts can be passed in as either a dense matrix (matrix
) or sparse matrix (sparseMatrix
).- rowRanges
GenomicRanges
orGenomicRangesList
. Genomic ranges (e.g. genome annotations). Metadata describing the assay rows.- rowData
DataFrame
. Metadata describing the assay rows, if genomic ranges are not available. UserowRanges
(GenomicRanges
) instead, if possible.- colData
DataFrame
. Metadata describing the assay columns. For bulk RNA-seq, this data describes the samples. For single-cell RNA-seq, this data describes the cells.- metadata
list
. Metadata.- transgeneNames
character
. Vector indicating which assay rows denote transgenes (e.g. EGFP, TDTOMATO).- denylist
logical(1)
. Apply a denylist check on illegal column names defined incolData
. This is useful for catching names that are not considered best practice, and other values that may conflict with Acid Genomics packages. Refer tometadataDenylist
for the current list of offending values.- sort
logical(1)
. Ensure all row and column names are sorted alphabetically. This includes columns insiderowData
andcolData
, andmetadata
slot names. Assay names are required to containcounts
as the first assay.- sessionInfo
logical(1)
. Slot session information intometadata
.
Session information
This function improves upon the standard constructor by slotting useful
session information into the metadata
slot by default:
date
: Today's date, returned fromSys.Date
.sessionInfo
:sessioninfo::session_info()
return. This behavior can be disabled by settingsessionInfo = FALSE
.wd
: Working directory, returned fromgetwd
.
Examples
## Rows (genes)
genes <- c(
sprintf("gene%02d", seq_len(3L)),
"EGFP" # transgene
)
print(genes)
#> [1] "gene01" "gene02" "gene03" "EGFP"
## Columns (samples)
samples <- sprintf("sample%02d", seq_len(4L))
print(samples)
#> [1] "sample01" "sample02" "sample03" "sample04"
## Counts (assay)
counts <- matrix(
data = seq_len(length(genes) * length(samples)),
nrow = length(genes),
ncol = length(samples),
dimnames = list(genes, samples)
)
## Primary assay must be named "counts".
assays <- S4Vectors::SimpleList("counts" = counts)
print(assays)
#> List of length 1
#> names(1): counts
## Row data (genomic ranges)
## Note that we haven't defined the transgene here.
## It will be handled automatically in the function call.
rowRanges <- AcidGenomes::emptyRanges(head(genes, n = length(genes) - 1L))
print(rowRanges)
#> GRanges object with 3 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> gene01 unknown 1-100 *
#> gene02 unknown 101-200 *
#> gene03 unknown 201-300 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
## Column data
colData <- S4Vectors::DataFrame(
"age" = rep(
x = c(3L, 6L),
times = length(samples) / 2L
),
"genotype" = rep(
x = c("wildtype", "knockout"),
times = 1L,
each = length(samples) / 2L
),
row.names = samples
)
print(colData)
#> DataFrame with 4 rows and 2 columns
#> age genotype
#> <integer> <character>
#> sample01 3 wildtype
#> sample02 6 wildtype
#> sample03 3 knockout
#> sample04 6 knockout
## Minimal mode.
x <- makeSummarizedExperiment(assays = assays)
print(x)
#> class: SummarizedExperiment
#> dim: 4 4
#> metadata(3): date sessionInfo wd
#> assays(1): counts
#> rownames(4): EGFP gene01 gene02 gene03
#> rowData names(0):
#> colnames(4): sample01 sample02 sample03 sample04
#> colData names(0):
x <- makeSummarizedExperiment(
assays = assays,
rowRanges = rowRanges,
colData = colData,
transgeneNames = "EGFP"
)
print(x)
#> class: RangedSummarizedExperiment
#> dim: 4 4
#> metadata(3): date sessionInfo wd
#> assays(1): counts
#> rownames(4): EGFP gene01 gene02 gene03
#> rowData names(0):
#> colnames(4): sample01 sample02 sample03 sample04
#> colData names(2): age genotype