Make genomic ranges (GRanges) from Ensembl
Source: R/makeGRangesFromEnsembl.R
makeGRangesFromEnsembl.RdQuickly obtain gene and transcript annotations from Ensembl using AnnotationHub and ensembldb.
Arguments
- organism
character(1). Full Latin organism name (e.g."Homo sapiens").- level
character(1). Return as genes or transcripts.- genomeBuild
character(1). Ensembl genome build assembly name (e.g."GRCh38"). If setNULL, defaults to the most recent build available. Note: don't pass in UCSC build IDs (e.g."hg38").- release
integer(1). Ensembl release version (e.g.100). We recommend setting this value if possible, for improved reproducibility. When left unset, the latest release available via AnnotationHub/ensembldb is used. Note that the latest version available can vary, depending on the versions of AnnotationHub and ensembldb in use.- ignoreVersion
logical(1). Ignore identifier (e.g. transcript, gene) versions. When applicable, the identifier containing version numbers will be stored intxIdVersionandgeneIdVersion, and the variants without versions will be stored intxId,txIdNoVersion,geneId, andgeneIdNoVersion.- extraMcols
logical(1). Include extra metadata columns (e.g."broadClass").- object
EnsDborcharacter(1).EnsDbobject or name of specific annotation package containing a versioned EnsDb object (e.g. "EnsDb.Hsapiens.v75").
Details
Simply specify the desired organism, using the full Latin name. For example,
we can obtain human annotations with Homo sapiens. Optionally, specific
Ensembl genome builds (e.g. GRCh38) and release versions (e.g. 87) are
supported.
Under the hood, this function fetches annotations from AnnotationHub using the ensembldb package. AnnotationHub supports versioned Ensembl releases, back to version 87.
Genome build: use "GRCh38" instead of "hg38" for the genome build, since
we're querying Ensembl and not UCSC.
Functions
makeGRangesFromEnsembl(): Obtain annotations from Ensembl by querying AnnotationHub.makeGRangesFromEnsDb(): Use a specificEnsDbobject as the annotation source. Alternatively, can pass in an EnsDb package name as acharacter(1).
Broad class definitions
For gene and transcript annotations, a broadClass column is added, which
generalizes the gene types into a smaller number of semantically-meaningful
groups:
coding.noncoding.pseudo.small.decaying.ig(immunoglobulin).tcr(T cell receptor).other.
GRCh37 (hg19) legacy annotations
makeGRangesFromEnsembl() supports the legacy Homo sapiens GRCh37 (release
75) build by internally querying the EnsDb.Hsapiens.v75 package.
Alternatively, the corresponding GTF/GFF file can be loaded directly from
GENCODE or Ensembl.
Examples
## Get annotations from Ensembl via AnnotationHub query.
genes <- makeGRangesFromEnsembl(
organism = "Homo sapiens",
level = "genes"
)
#> → Making <GRanges> from Ensembl.
#> → Getting <EnsDb> from AnnotationHub 3.14.0 (2024-10-28).
#> ℹ "AH119325": Ensembl 113 EnsDb for Homo sapiens.
#> → Making <GRanges> from <EnsDb>.
#> Organism: Homo sapiens
#> Genome build: GRCh38
#> Release: 113
#> Level: genes
#> → Defining names by `geneId` column in `mcols()`.
summary(genes)
#> [1] "EnsemblGenes object with 87726 ranges and 9 metadata columns"
transcripts <- makeGRangesFromEnsembl(
organism = "Homo sapiens",
level = "transcripts"
)
#> → Making <GRanges> from Ensembl.
#> → Getting <EnsDb> from AnnotationHub 3.14.0 (2024-10-28).
#> ℹ "AH119325": Ensembl 113 EnsDb for Homo sapiens.
#> → Making <GRanges> from <EnsDb>.
#> Organism: Homo sapiens
#> Genome build: GRCh38
#> Release: 113
#> Level: transcripts
#> → Defining names by `txId` column in `mcols()`.
summary(transcripts)
#> [1] "EnsemblTranscripts object with 413674 ranges and 22 metadata columns"
## Get annotations from specific EnsDb object or package.
## > if (goalie::isInstalled("EnsDb.Hsapiens.v75")) {
## > genes <- makeGRangesFromEnsDb(
## > object = "EnsDb.Hsapiens.v75",
## > level = "genes"
## > )
## > summary(genes)
## > }