
Make a GeneToSymbol object

Usage

makeGeneToSymbolFromEnsembl(
  organism,
  genomeBuild = NULL,
  release = NULL,
  ignoreVersion = FALSE,
  format = c("makeUnique", "1:1", "unmodified")
)

makeGeneToSymbolFromEnsDb(
  object,
  ignoreVersion = FALSE,
  format = c("makeUnique", "1:1", "unmodified")
)

makeGeneToSymbolFromGff(
  file,
  ignoreVersion = FALSE,
  format = c("makeUnique", "1:1", "unmodified")
)

Arguments

organism

character(1). Full Latin organism name (e.g. "Homo sapiens").

genomeBuild

character(1). Ensembl genome build assembly name (e.g. "GRCh38"). If set to NULL, defaults to the most recent build available. Note: do not pass in UCSC build IDs (e.g. "hg38").

release

integer(1). Ensembl release version (e.g. 100). We recommend setting this value if possible, for improved reproducibility. When left unset, the latest release available via AnnotationHub/ensembldb is used. Note that the latest version available can vary, depending on the versions of AnnotationHub and ensembldb in use.

ignoreVersion

logical(1). Ignore identifier (e.g. transcript, gene) versions. When applicable, the identifier containing version numbers will be stored in txIdVersion and geneIdVersion, and the variants without versions will be stored in txId, txIdNoVersion, geneId, and geneIdNoVersion.
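As a rough illustration of what a versioned versus unversioned identifier looks like, the following base R sketch strips the trailing version suffix from Ensembl gene identifiers. The regular expression is an assumption made for illustration only; the package's internal handling is not shown on this page and may differ.

```r
## Versioned Ensembl gene identifiers, as they would appear in geneIdVersion.
geneIdVersion <- c("ENSG00000000003.16", "ENSG00000000005.6")

## Assumption: the version is the trailing ".<digits>" suffix.
## Removing it gives the form that would be stored in geneIdNoVersion.
geneIdNoVersion <- sub(
    pattern = "\\.[0-9]+$",
    replacement = "",
    x = geneIdVersion
)

print(geneIdNoVersion)
#> [1] "ENSG00000000003" "ENSG00000000005"
```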

format

character(1). Formatting method to apply:

  • "makeUnique": Recommended. Apply make.unique to the geneName column. Gene names are made unique, while the identifiers remain unmodified. NA gene names will be renamed to "unannotated".

  • "1:1": For gene names that map to multiple gene identifiers, select only the first annotated gene identifier. Incomplete elements with NA gene name will be removed with an internal complete.cases call.

  • "unmodified": Return geneId and geneName columns unmodified, in long format. Incomplete elements with NA gene name will be removed with an internal complete.cases call.
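The three formatting methods can be approximated in base R on a toy table. This is a minimal sketch, not the package's implementation: the identifiers and gene names are made up, and the order of operations (renaming NA values before calling make.unique, and removing incomplete rows before selecting the first identifier) is an assumption.

```r
## Toy gene-to-symbol table. Identifiers and names are hypothetical.
df <- data.frame(
    geneId = c("gene1", "gene2", "gene3", "gene4"),
    geneName = c("ABC", "ABC", NA, "XYZ"),
    stringsAsFactors = FALSE
)

## "makeUnique": rename NA gene names to "unannotated", then make the
## gene names unique. The identifiers are left untouched.
makeUnique <- df
makeUnique[["geneName"]][is.na(makeUnique[["geneName"]])] <- "unannotated"
makeUnique[["geneName"]] <- make.unique(makeUnique[["geneName"]])

## "unmodified": keep the long format, dropping only incomplete rows.
unmodified <- df[complete.cases(df), , drop = FALSE]

## "1:1": drop incomplete rows, then keep only the first gene identifier
## seen for each gene name.
oneToOne <- unmodified[!duplicated(unmodified[["geneName"]]), , drop = FALSE]

print(makeUnique[["geneName"]])
#> [1] "ABC"         "ABC.1"       "unannotated" "XYZ"
print(oneToOne[["geneId"]])
#> [1] "gene1" "gene4"
```

The ".1" suffix added by make.unique is the same pattern visible in gene names such as "FUBP1.1" in the example output at the end of this page.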

object

EnsDb object, or the name of an installed EnsDb annotation package (e.g. "EnsDb.Hsapiens.v75").

file

character(1). File path to a GFF/GTF file. Remote URLs and compressed files are supported.

Value

GeneToSymbol.

Functions

  • makeGeneToSymbolFromEnsembl(): Make a GeneToSymbol object from Ensembl using an AnnotationHub lookup.

  • makeGeneToSymbolFromEnsDb(): Make a GeneToSymbol object from an EnsDb object or annotation package.

  • makeGeneToSymbolFromGff(): Make a GeneToSymbol object from a GFF file.

Note

Updated 2021-08-03.

GFF/GTF file

Remote URLs and compressed files are supported.

Examples

## makeGeneToSymbolFromEnsembl ====
x <- makeGeneToSymbolFromEnsembl(
    organism = "Homo sapiens",
    ignoreVersion = FALSE
)
#> → Making <GRanges> from Ensembl.
#> → Getting <EnsDb> from AnnotationHub 3.10.0 (2023-10-20).
#>  "AH113665": Ensembl 110 EnsDb for Homo sapiens.
#> → Making <GRanges> from <EnsDb>.
#> Organism: Homo sapiens
#> Genome build: GRCh38
#> Release: 110
#> Level: genes
#> → Downloading extra gene-level metadata from Ensembl.
#> → Importing /Users/mike/.cache/R/AcidGenomes/BiocFileCache/89e57efaa427_gene.txt.gz using base::`read.table()`.
#> → Importing /Users/mike/.cache/R/AcidGenomes/BiocFileCache/89e5c7338bf_external_synonym.txt.gz using base::`read.table()`.
#> → Importing /Users/mike/.cache/R/AcidGenomes/BiocFileCache/89e534a0ef45_Homo_sapiens.GRCh38.110.entrez.tsv.gz using base::`read.table()`.
#> → Defining names by `geneId` column in `mcols()`.
#>  4414 non-unique gene symbols detected: "5S_rRNA", "5_8S_rRNA", "7SK", "A2M", "A2MP1", "A4GALT", "AAAS", "AACSP1", "AADACL2", "AADACL2-AS1"....
print(x)
#> GeneToSymbol with 71440 rows and 2 columns
#>                                geneId    geneName
#>                           <character> <character>
#> ENSG00000000003.16 ENSG00000000003.16      TSPAN6
#> ENSG00000000005.6   ENSG00000000005.6        TNMD
#> ENSG00000000419.14 ENSG00000000419.14        DPM1
#> ENSG00000000457.14 ENSG00000000457.14       SCYL3
#> ENSG00000000460.17 ENSG00000000460.17    C1orf112
#> ...                               ...         ...
#> LRG_995.1                   LRG_995.1     FUBP1.1
#> LRG_996.1                   LRG_996.1     ERBB3.1
#> LRG_997.1                   LRG_997.1      ROS1.1
#> LRG_998.1                   LRG_998.1     CCND3.1
#> LRG_999.1                   LRG_999.1       CIC.1

## makeGeneToSymbolFromEnsDb ====
## > if (goalie::isInstalled("EnsDb.Hsapiens.v75")) {
## >     x <- makeGeneToSymbolFromEnsDb("EnsDb.Hsapiens.v75")
## >     print(x)
## > }

## makeGeneToSymbolFromGff ====
## > file <- AcidBase::pasteUrl(
## >     "ftp.ensembl.org",
## >     "pub",
## >     "release-102",
## >     "gtf",
## >     "homo_sapiens",
## >     "Homo_sapiens.GRCh38.102.gtf.gz",
## >     protocol = "ftp"
## > )
## > x <- makeGeneToSymbolFromGff(
## >     file = file,
## >     ignoreVersion = FALSE
## > )
## > print(x)