Quickly obtain gene and transcript annotations from Ensembl using AnnotationHub and ensembldb.

Usage

makeGRangesFromEnsembl(
  organism,
  level = c("genes", "transcripts"),
  genomeBuild = NULL,
  release = NULL,
  ignoreVersion = FALSE,
  extraMcols = TRUE
)

makeGRangesFromEnsDb(
  object,
  level = c("genes", "transcripts"),
  ignoreVersion = FALSE,
  extraMcols = TRUE
)

Arguments

organism

character(1). Full Latin organism name (e.g. "Homo sapiens").

level

character(1). Return as genes or transcripts.

genomeBuild

character(1). Ensembl genome build assembly name (e.g. "GRCh38"). If NULL, defaults to the most recent build available. Note: do not pass in UCSC build IDs (e.g. "hg38").

release

integer(1). Ensembl release version (e.g. 100). We recommend setting this value if possible, for improved reproducibility. When left unset, the latest release available via AnnotationHub/ensembldb is used. Note that the latest version available can vary, depending on the versions of AnnotationHub and ensembldb in use.

ignoreVersion

logical(1). Ignore identifier (e.g. transcript, gene) versions. When applicable, identifiers containing version numbers are stored in txIdVersion and geneIdVersion, and the variants without versions are stored in txId, txIdNoVersion, geneId, and geneIdNoVersion.

extraMcols

logical(1). Include extra metadata columns (e.g. "broadClass").

object

EnsDb or character(1). EnsDb object or name of specific annotation package containing a versioned EnsDb object (e.g. "EnsDb.Hsapiens.v75").
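
To illustrate the versioned and unversioned identifier forms that ignoreVersion distinguishes, here is a minimal base R sketch. The stripVersion() helper and the example identifiers are hypothetical and for illustration only; this is not the package's internal implementation.

```r
## Hypothetical helper: drop a trailing ".<digits>" version suffix.
stripVersion <- function(x) {
    sub(pattern = "\\.[0-9]+$", replacement = "", x = x)
}

## Illustrative identifiers only (not taken from a real release).
txIdVersion <- c("ENST00000000001.2", "ENST00000000002.11")
txIdNoVersion <- stripVersion(txIdVersion)
print(txIdNoVersion)
```

Identifiers that carry no version suffix pass through unchanged, so the helper is safe to apply to mixed input.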

Value

EnsemblGenes or EnsemblTranscripts.

Details

Simply specify the desired organism, using the full Latin name. For example, we can obtain human annotations with "Homo sapiens". Optionally, a specific Ensembl genome build (e.g. "GRCh38") and release version (e.g. 87) can be requested.

Under the hood, this function fetches annotations from AnnotationHub using the ensembldb package. AnnotationHub supports versioned Ensembl releases, back to version 87.

Genome build: use "GRCh38" instead of "hg38" for the genome build, since we're querying Ensembl and not UCSC.
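
For reproducibility, the genome build and release can be pinned together, as recommended for the release argument. The following is a minimal sketch, assuming the function is exported by the AcidGenomes package (inferred from the cache path shown in the example output) and that release 100 is available from the AnnotationHub version in use. The call needs network access, so it is wrapped in a hypothetical helper, getPinnedGenes(), rather than run at top level.

```r
## Hypothetical wrapper; the name getPinnedGenes is for illustration only.
getPinnedGenes <- function(release = 100L) {
    ## Assumption: makeGRangesFromEnsembl() is exported by AcidGenomes.
    stopifnot(requireNamespace("AcidGenomes", quietly = TRUE))
    AcidGenomes::makeGRangesFromEnsembl(
        organism = "Homo sapiens",
        level = "genes",
        ## Ensembl assembly name, not the UCSC "hg38" identifier.
        genomeBuild = "GRCh38",
        release = release
    )
}

## Requires network access and a populated AnnotationHub cache:
## genes <- getPinnedGenes(release = 100L)
```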

Functions

  • makeGRangesFromEnsembl(): Obtain annotations from Ensembl by querying AnnotationHub.

  • makeGRangesFromEnsDb(): Use a specific EnsDb object as the annotation source. Alternatively, an EnsDb package name can be passed in as a character(1).

Note

Updated 2023-12-04.

Broad class definitions

For gene and transcript annotations, a broadClass column is added, which generalizes the gene types into a smaller number of semantically meaningful groups:

  • coding.

  • noncoding.

  • pseudo.

  • small.

  • decaying.

  • ig (immunoglobulin).

  • tcr (T cell receptor).

  • other.
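
A quick way to inspect the groups listed above is to tabulate them. The sketch below uses toy values in place of a real broadClass column so that it runs without any downloads. With a real result, the values could presumably be pulled with S4Vectors::mcols(genes)[["broadClass"]], assuming the returned object behaves like a standard GRanges.

```r
## Broad class labels as listed in this section.
broadClassLevels <- c(
    "coding", "noncoding", "pseudo", "small",
    "decaying", "ig", "tcr", "other"
)

## Toy values standing in for the broadClass metadata column.
toy <- c("coding", "coding", "pseudo", "small", "other", "coding")

## Fixing the factor levels keeps empty classes visible as zero counts.
counts <- table(factor(toy, levels = broadClassLevels))
print(counts)
```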

GRCh37 (hg19) legacy annotations

makeGRangesFromEnsembl() supports the legacy Homo sapiens GRCh37 (release 75) build by internally querying the EnsDb.Hsapiens.v75 package. Alternatively, the corresponding GTF/GFF file can be loaded directly from GENCODE or Ensembl.
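
A minimal sketch of requesting the legacy build follows. It assumes that passing genomeBuild = "GRCh37" is what selects the EnsDb.Hsapiens.v75 path (the text above states the support but not the exact arguments) and that the function is exported by AcidGenomes. The getLegacyGenes() wrapper is hypothetical.

```r
## Hypothetical wrapper; the name getLegacyGenes is for illustration only.
getLegacyGenes <- function() {
    ## Both packages must be installed for the legacy path to work.
    stopifnot(
        requireNamespace("AcidGenomes", quietly = TRUE),
        requireNamespace("EnsDb.Hsapiens.v75", quietly = TRUE)
    )
    ## Assumption: genomeBuild = "GRCh37" triggers the release 75 lookup.
    AcidGenomes::makeGRangesFromEnsembl(
        organism = "Homo sapiens",
        level = "genes",
        genomeBuild = "GRCh37"
    )
}

## genes <- getLegacyGenes()
```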

Examples

## Get annotations from Ensembl via AnnotationHub query.
genes <- makeGRangesFromEnsembl(
    organism = "Homo sapiens",
    level = "genes"
)
#> → Making <GRanges> from Ensembl.
#> → Getting <EnsDb> from AnnotationHub 3.10.0 (2023-10-20).
#>  "AH113665": Ensembl 110 EnsDb for Homo sapiens.
#> → Making <GRanges> from <EnsDb>.
#> Organism: Homo sapiens
#> Genome build: GRCh38
#> Release: 110
#> Level: genes
#> → Downloading extra gene-level metadata from Ensembl.
#> → Importing /Users/mike/.cache/R/AcidGenomes/BiocFileCache/89e57efaa427_gene.txt.gz using base::`read.table()`.
#> → Importing /Users/mike/.cache/R/AcidGenomes/BiocFileCache/89e5c7338bf_external_synonym.txt.gz using base::`read.table()`.
#> → Importing /Users/mike/.cache/R/AcidGenomes/BiocFileCache/89e534a0ef45_Homo_sapiens.GRCh38.110.entrez.tsv.gz using base::`read.table()`.
#> → Defining names by `geneId` column in `mcols()`.
summary(genes)
#> [1] "EnsemblGenes object with 71440 ranges and 11 metadata columns"
transcripts <- makeGRangesFromEnsembl(
    organism = "Homo sapiens",
    level = "transcripts"
)
#> → Making <GRanges> from Ensembl.
#> → Getting <EnsDb> from AnnotationHub 3.10.0 (2023-10-20).
#>  "AH113665": Ensembl 110 EnsDb for Homo sapiens.
#> → Making <GRanges> from <EnsDb>.
#> Organism: Homo sapiens
#> Genome build: GRCh38
#> Release: 110
#> Level: transcripts
#> → Downloading extra gene-level metadata from Ensembl.
#> → Importing /Users/mike/.cache/R/AcidGenomes/BiocFileCache/89e57efaa427_gene.txt.gz using base::`read.table()`.
#> → Importing /Users/mike/.cache/R/AcidGenomes/BiocFileCache/89e5c7338bf_external_synonym.txt.gz using base::`read.table()`.
#> → Importing /Users/mike/.cache/R/AcidGenomes/BiocFileCache/89e534a0ef45_Homo_sapiens.GRCh38.110.entrez.tsv.gz using base::`read.table()`.
#> → Defining names by `txId` column in `mcols()`.
summary(transcripts)
#> [1] "EnsemblTranscripts object with 278545 ranges and 24 metadata columns"

## Get annotations from specific EnsDb object or package.
## > if (goalie::isInstalled("EnsDb.Hsapiens.v75")) {
## >     genes <- makeGRangesFromEnsDb(
## >         object = "EnsDb.Hsapiens.v75",
## >         level = "genes"
## >     )
## >     summary(genes)
## > }