Make a Gene2Symbol object

makeGene2SymbolFromEnsembl(
  organism,
  genomeBuild = NULL,
  release = NULL,
  ignoreVersion = TRUE,
  format = c("makeUnique", "unmodified", "1:1")
)

makeGene2SymbolFromEnsDb(
  object,
  ignoreVersion = TRUE,
  format = c("makeUnique", "unmodified", "1:1")
)

makeGene2SymbolFromGFF(
  file,
  ignoreVersion = TRUE,
  format = c("makeUnique", "unmodified", "1:1")
)

Arguments

organism

character(1). Full Latin organism name (e.g. "Homo sapiens").

genomeBuild

character(1). Ensembl genome build assembly name (e.g. "GRCh38"). If NULL, defaults to the most recent build available. Note: do not pass UCSC build IDs (e.g. "hg38").

release

integer(1). Ensembl release version (e.g. 90). We recommend setting this value if possible, for improved reproducibility. When left unset, the latest release available via AnnotationHub/ensembldb is used. Note that the latest version available can vary, depending on the versions of AnnotationHub and ensembldb in use.

ignoreVersion

logical(1). Ignore identifier (e.g. transcript, gene) versions. When applicable, the identifier containing version numbers will be stored in txIdVersion and geneIdVersion, and the variants without versions will be stored in txId, txIdNoVersion, geneId, and geneIdNoVersion.
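To make the notion of an identifier version concrete, the base R sketch below strips the trailing version suffix from Ensembl-style gene identifiers. The stripVersion() helper is hypothetical and is not part of this package. It only illustrates the difference between versioned and unversioned identifiers, and the package's internal handling may differ.

```r
## Hypothetical helper (not part of this package): drop a trailing
## ".<digits>" version suffix from Ensembl-style identifiers.
stripVersion <- function(x) {
    sub(pattern = "\\.[0-9]+$", replacement = "", x = x)
}

## The first two identifiers carry a version; the third does not.
ids <- c("ENSG00000228572.7", "ENSG00000182378.15", "ENSG00000226179")
noVersion <- stripVersion(ids)
print(noVersion)
```

Identifiers that carry no version suffix pass through unchanged.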

format

character(1). Formatting method to apply:

  • "makeUnique": Recommended. Apply make.unique() to the geneName column. Gene symbols are made unique, while the gene identifiers remain unmodified.

  • "unmodified": Return geneId and geneName columns unmodified, in long format.

  • "1:1": For gene symbols that map to multiple gene identifiers, select
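As a minimal sketch of the "makeUnique" behavior described above, the base R snippet below applies make.unique() to a toy gene-to-symbol table. The data frame and its values are made up for illustration and the snippet does not call this package. It only shows how duplicated symbols receive a numeric suffix while the identifiers stay as they are.

```r
## Toy data (made up for illustration): two identifiers share one symbol.
g2s <- data.frame(
    geneId = c("gene1", "gene2", "gene3"),
    geneName = c("ABC", "ABC", "XYZ"),
    stringsAsFactors = FALSE
)

## Duplicated symbols get a ".1", ".2", ... suffix; the first occurrence
## and the geneId column are left untouched.
g2s[["geneName"]] <- make.unique(g2s[["geneName"]])
print(g2s)
```

This matches the suffixed symbols such as "CYP2C19.1" that appear in the example output further down the page.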

object

Object. An EnsDb object, or the name of an installed EnsDb annotation package (e.g. "EnsDb.Hsapiens.v75").

file

character(1). File path.

Value

Gene2Symbol.

Functions

  • makeGene2SymbolFromEnsembl: Make a Gene2Symbol object from Ensembl using an AnnotationHub lookup.

  • makeGene2SymbolFromEnsDb: Make a Gene2Symbol object from an EnsDb object or annotation package.

  • makeGene2SymbolFromGFF: Make a Gene2Symbol object from a GFF file.

Note

Updated 2021-03-10.

GFF/GTF file

Remote URLs and compressed files are supported.

Examples

## makeGene2SymbolFromEnsembl ====
x <- makeGene2SymbolFromEnsembl(
    organism = "Homo sapiens",
    ignoreVersion = FALSE
)
#> → Making `GRanges` from Ensembl.
#> → Getting `EnsDb` from AnnotationHub 2.22.0 (2020-10-27).
#> AH89426: Ensembl 103 EnsDb for Homo sapiens.
#> → Making `GRanges` from `EnsDb`.
#> Organism: Homo sapiens
#> Genome build: GRCh38
#> Release: 103
#> Level: genes
#> → Defining names by `geneId` column in `mcols`.
#> 3220 non-unique gene symbols detected.
print(x)
#> Gene2Symbol with 67992 rows and 2 columns
#>                                geneId    geneName
#>                           <character> <character>
#> ENSG00000228572.7   ENSG00000228572.7  AL954722.1
#> ENSG00000182378.15 ENSG00000182378.15      PLCXD1
#> ENSG00000226179.6   ENSG00000226179.6   LINC00685
#> ENSG00000281849.3   ENSG00000281849.3  AL732314.6
#> ENSG00000280767.3   ENSG00000280767.3  AL732314.4
#> ...                               ...         ...
#> LRG_584.1                   LRG_584.1   CYP2C19.1
#> LRG_721.1                   LRG_721.1      AKT1.1
#> LRG_741.1                   LRG_741.1     ALMS1.1
#> LRG_763.1                   LRG_763.1       HTT.1
#> LRG_93.1                     LRG_93.1     ORAI1.1
## makeGene2SymbolFromEnsDb ====
if ("EnsDb.Hsapiens.v75" %in% rownames(installed.packages())) {
    x <- makeGene2SymbolFromEnsDb("EnsDb.Hsapiens.v75")
    print(x)
}
#> → Making `GRanges` from `EnsDb`.
#> Organism: Homo sapiens
#> Genome build: GRCh37
#> Release: 75
#> Level: genes
#> → Defining names by `geneId` column in `mcols`.
#> 3075 non-unique gene symbols detected.
#> Gene2Symbol with 64102 rows and 2 columns
#>                          geneId        geneName
#>                     <character>     <character>
#> ENSG00000228572 ENSG00000228572 LL0YNC03-29C1.1
#> ENSG00000182378 ENSG00000182378          PLCXD1
#> ENSG00000226179 ENSG00000226179       LINC00685
#> ENSG00000185960 ENSG00000185960            SHOX
#> ENSG00000237531 ENSG00000237531   RP11-309M23.1
#> ...                         ...             ...
#> LRG_187                 LRG_187         LRG_187
#> LRG_239                 LRG_239         LRG_239
#> LRG_311                 LRG_311         LRG_311
#> LRG_415                 LRG_415         LRG_415
#> LRG_93                   LRG_93          LRG_93
## makeGene2SymbolFromGFF ====
file <- pasteURL(
    "ftp.ensembl.org",
    "pub",
    "release-102",
    "gtf",
    "homo_sapiens",
    "Homo_sapiens.GRCh38.102.gtf.gz",
    protocol = "ftp"
)
x <- makeGene2SymbolFromGFF(
    file = file,
    ignoreVersion = FALSE
)
#> → Making `GRanges` from GFF file (Homo_sapiens.GRCh38.102.gtf.gz).
#> → Getting GFF metadata for Homo_sapiens.GRCh38.102.gtf.gz.
#> → Importing 104a6592d9a95_Homo_sapiens.GRCh38.102.gtf.gz at /opt/koopa/opt/r/cache/AcidGenomes using rtracklayer::`import()`.
#> → Defining names by `geneId` column in `mcols`.
#> 67 non-unique gene symbols detected.
print(x)
#> Gene2Symbol with 60675 rows and 2 columns
#>                              geneId    geneName
#>                         <character> <character>
#> ENSG00000223972.5 ENSG00000223972.5     DDX11L1
#> ENSG00000243485.5 ENSG00000243485.5 MIR1302-2HG
#> ENSG00000284332.1 ENSG00000284332.1   MIR1302-2
#> ENSG00000268020.3 ENSG00000268020.3      OR4G4P
#> ENSG00000240361.2 ENSG00000240361.2     OR4G11P
#> ...                             ...         ...
#> ENSG00000276017.1 ENSG00000276017.1  AC007325.1
#> ENSG00000278817.1 ENSG00000278817.1  AC007325.4
#> ENSG00000277196.4 ENSG00000277196.4  AC007325.2
#> ENSG00000278625.1 ENSG00000278625.1       U6.36
#> ENSG00000277374.1 ENSG00000277374.1        U1.5