Make genomic ranges (GRanges
) from Ensembl
Source: R/makeGRangesFromEnsembl.R
makeGRangesFromEnsembl.Rd
Quickly obtain gene and transcript annotations from Ensembl using AnnotationHub and ensembldb.
Arguments
- organism
character(1)
. Full Latin organism name (e.g."Homo sapiens"
).- level
character(1)
. Return as genes or transcripts.- genomeBuild
character(1)
. Ensembl genome build assembly name (e.g."GRCh38"
). If setNULL
, defaults to the most recent build available. Note: don't pass in UCSC build IDs (e.g."hg38"
).- release
integer(1)
. Ensembl release version (e.g.100
). We recommend setting this value if possible, for improved reproducibility. When left unset, the latest release available via AnnotationHub/ensembldb is used. Note that the latest version available can vary, depending on the versions of AnnotationHub and ensembldb in use.- ignoreVersion
logical(1)
. Ignore identifier (e.g. transcript, gene) versions. When applicable, the identifier containing version numbers will be stored intxIdVersion
andgeneIdVersion
, and the variants without versions will be stored intxId
,txIdNoVersion
,geneId
, andgeneIdNoVersion
.- extraMcols
logical(1)
. Include extra metadata columns (e.g."broadClass"
).- object
EnsDb
orcharacter(1)
.EnsDb
object or name of specific annotation package containing a versioned EnsDb object (e.g. "EnsDb.Hsapiens.v75").
Details
Simply specify the desired organism, using the full Latin name. For example,
we can obtain human annotations with Homo sapiens
. Optionally, specific
Ensembl genome builds (e.g. GRCh38
) and release versions (e.g. 87
) are
supported.
Under the hood, this function fetches annotations from AnnotationHub using the ensembldb package. AnnotationHub supports versioned Ensembl releases, back to version 87.
Genome build: use "GRCh38"
instead of "hg38"
for the genome build, since
we're querying Ensembl and not UCSC.
Functions
makeGRangesFromEnsembl()
: Obtain annotations from Ensembl by querying AnnotationHub.makeGRangesFromEnsDb()
: Use a specificEnsDb
object as the annotation source. Alternatively, can pass in an EnsDb package name as acharacter(1)
.
Broad class definitions
For gene and transcript annotations, a broadClass
column is added, which
generalizes the gene types into a smaller number of semantically-meaningful
groups:
coding
.noncoding
.pseudo
.small
.decaying
.ig
(immunoglobulin).tcr
(T cell receptor).other
.
GRCh37 (hg19) legacy annotations
makeGRangesFromEnsembl()
supports the legacy Homo sapiens GRCh37 (release
75) build by internally querying the EnsDb.Hsapiens.v75 package.
Alternatively, the corresponding GTF/GFF file can be loaded directly from
GENCODE or Ensembl.
Examples
## Get annotations from Ensembl via AnnotationHub query.
genes <- makeGRangesFromEnsembl(
organism = "Homo sapiens",
level = "genes"
)
#> → Making <GRanges> from Ensembl.
#> → Getting <EnsDb> from AnnotationHub 3.10.0 (2023-10-20).
#> ℹ "AH113665": Ensembl 110 EnsDb for Homo sapiens.
#> → Making <GRanges> from <EnsDb>.
#> Organism: Homo sapiens
#> Genome build: GRCh38
#> Release: 110
#> Level: genes
#> → Downloading extra gene-level metadata from Ensembl.
#> → Importing /Users/mike/.cache/R/AcidGenomes/BiocFileCache/89e57efaa427_gene.txt.gz using base::`read.table()`.
#> → Importing /Users/mike/.cache/R/AcidGenomes/BiocFileCache/89e5c7338bf_external_synonym.txt.gz using base::`read.table()`.
#> → Importing /Users/mike/.cache/R/AcidGenomes/BiocFileCache/89e534a0ef45_Homo_sapiens.GRCh38.110.entrez.tsv.gz using base::`read.table()`.
#> → Defining names by `geneId` column in `mcols()`.
summary(genes)
#> [1] "EnsemblGenes object with 71440 ranges and 11 metadata columns"
transcripts <- makeGRangesFromEnsembl(
organism = "Homo sapiens",
level = "transcripts"
)
#> → Making <GRanges> from Ensembl.
#> → Getting <EnsDb> from AnnotationHub 3.10.0 (2023-10-20).
#> ℹ "AH113665": Ensembl 110 EnsDb for Homo sapiens.
#> → Making <GRanges> from <EnsDb>.
#> Organism: Homo sapiens
#> Genome build: GRCh38
#> Release: 110
#> Level: transcripts
#> → Downloading extra gene-level metadata from Ensembl.
#> → Importing /Users/mike/.cache/R/AcidGenomes/BiocFileCache/89e57efaa427_gene.txt.gz using base::`read.table()`.
#> → Importing /Users/mike/.cache/R/AcidGenomes/BiocFileCache/89e5c7338bf_external_synonym.txt.gz using base::`read.table()`.
#> → Importing /Users/mike/.cache/R/AcidGenomes/BiocFileCache/89e534a0ef45_Homo_sapiens.GRCh38.110.entrez.tsv.gz using base::`read.table()`.
#> → Defining names by `txId` column in `mcols()`.
summary(transcripts)
#> [1] "EnsemblTranscripts object with 278545 ranges and 24 metadata columns"
## Get annotations from specific EnsDb object or package.
## > if (goalie::isInstalled("EnsDb.Hsapiens.v75")) {
## > genes <- makeGRangesFromEnsDb(
## > object = "EnsDb.Hsapiens.v75",
## > level = "genes"
## > )
## > summary(genes)
## > }