Make a GeneToSymbol object
Usage
makeGeneToSymbolFromEnsembl(
organism,
genomeBuild = NULL,
release = NULL,
ignoreVersion = FALSE,
format = c("makeUnique", "1:1", "unmodified")
)
makeGeneToSymbolFromEnsDb(
object,
ignoreVersion = FALSE,
format = c("makeUnique", "1:1", "unmodified")
)
makeGeneToSymbolFromGff(
file,
ignoreVersion = FALSE,
format = c("makeUnique", "1:1", "unmodified")
)
Arguments
- organism
character(1). Full Latin organism name (e.g. "Homo sapiens").
- genomeBuild
character(1). Ensembl genome build assembly name (e.g. "GRCh38"). If NULL, defaults to the most recent build available. Note: don't pass in UCSC build IDs (e.g. "hg38").
- release
integer(1). Ensembl release version (e.g. 100). We recommend setting this value whenever possible, for improved reproducibility. When left unset, the latest release available via AnnotationHub/ensembldb is used. Note that the latest release available can vary, depending on the versions of AnnotationHub and ensembldb in use.
- ignoreVersion
logical(1). Ignore identifier (e.g. transcript, gene) versions. When applicable, the identifiers containing version numbers will be stored in txIdVersion and geneIdVersion, and the variants without versions will be stored in txId, txIdNoVersion, geneId, and geneIdNoVersion.
- format
character(1). Formatting method to apply:
  - "makeUnique": Recommended. Apply make.unique to the geneName column. Gene names are made unique, while the identifiers remain unmodified. NA gene names will be renamed to "unannotated".
  - "1:1": For gene names that map to multiple gene identifiers, select only the first annotated gene identifier. Incomplete elements with an NA gene name will be removed with an internal complete.cases call.
  - "unmodified": Return the geneId and geneName columns unmodified, in long format. Incomplete elements with an NA gene name will be removed with an internal complete.cases call.
- object
Object.
- file
character(1). File path.
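To illustrate the identifier handling described for ignoreVersion, here is a minimal base R sketch. It assumes that an identifier version is a trailing "." followed by digits (as in "ENSG00000000003.16" in the example output below). The helper stripIdVersion() is hypothetical and not an exported function of this package; the code only mimics the documented column layout on a plain data.frame.

```r
## Hypothetical helper (not part of this package): drop a trailing
## ".<digits>" version suffix from each identifier.
stripIdVersion <- function(x) {
    sub(pattern = "\\.[0-9]+$", replacement = "", x = x)
}

ids <- c("ENSG00000000003.16", "ENSG00000000005.6", "LRG_995.1")

## Mirror the documented layout: versioned identifiers in geneIdVersion,
## unversioned identifiers in geneId and geneIdNoVersion.
genes <- data.frame(
    geneIdVersion = ids,
    geneId = stripIdVersion(ids),
    geneIdNoVersion = stripIdVersion(ids),
    stringsAsFactors = FALSE
)
print(genes)
```

Anchoring the pattern with "$" removes only the final version suffix, so identifiers without a version are returned unchanged.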
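The three format modes can be approximated in base R on a small toy table. This is a sketch under stated assumptions, not the package's implementation. The helper names (makeUniqueSketch(), oneToOneSketch(), unmodifiedSketch()) and the toy identifiers are hypothetical. For "makeUnique", NA names are assumed to be renamed to "unannotated" before make.unique() is applied.

```r
## Toy gene-to-symbol table (hypothetical identifiers, not Ensembl output).
df <- data.frame(
    geneId = c("G1", "G2", "G3", "G4", "G5"),
    geneName = c("A2M", "A2M", "TNMD", NA, "DPM1"),
    stringsAsFactors = FALSE
)

## "makeUnique": keep every row; rename NA names, then de-duplicate names.
makeUniqueSketch <- function(df) {
    name <- df[["geneName"]]
    name[is.na(name)] <- "unannotated"
    df[["geneName"]] <- make.unique(name)
    df
}

## "1:1": drop incomplete rows, then keep the first geneId per geneName.
oneToOneSketch <- function(df) {
    df <- df[complete.cases(df), , drop = FALSE]
    df[!duplicated(df[["geneName"]]), , drop = FALSE]
}

## "unmodified": drop incomplete rows; leave both columns as-is (long format).
unmodifiedSketch <- function(df) {
    df[complete.cases(df), , drop = FALSE]
}

print(makeUniqueSketch(df))
print(oneToOneSketch(df))
print(unmodifiedSketch(df))
```

With this toy table, "makeUnique" keeps all five rows with the names "A2M", "A2M.1", "TNMD", "unannotated", and "DPM1" (the same ".1" suffix style seen in the example output, e.g. "FUBP1.1"). "1:1" keeps G1, G3, and G5, and "unmodified" keeps the four rows that have a non-NA gene name.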
Functions
- makeGeneToSymbolFromEnsembl(): Make a GeneToSymbol object from Ensembl using an AnnotationHub lookup.
- makeGeneToSymbolFromEnsDb(): Make a GeneToSymbol object from an EnsDb object or annotation package.
- makeGeneToSymbolFromGff(): Make a GeneToSymbol object from a GFF file.
Examples
## makeGeneToSymbolFromEnsembl ====
x <- makeGeneToSymbolFromEnsembl(
organism = "Homo sapiens",
ignoreVersion = FALSE
)
#> → Making <GRanges> from Ensembl.
#> → Getting <EnsDb> from AnnotationHub 3.10.0 (2023-10-20).
#> ℹ "AH113665": Ensembl 110 EnsDb for Homo sapiens.
#> → Making <GRanges> from <EnsDb>.
#> Organism: Homo sapiens
#> Genome build: GRCh38
#> Release: 110
#> Level: genes
#> → Downloading extra gene-level metadata from Ensembl.
#> → Importing /Users/mike/.cache/R/AcidGenomes/BiocFileCache/89e57efaa427_gene.txt.gz using base::`read.table()`.
#> → Importing /Users/mike/.cache/R/AcidGenomes/BiocFileCache/89e5c7338bf_external_synonym.txt.gz using base::`read.table()`.
#> → Importing /Users/mike/.cache/R/AcidGenomes/BiocFileCache/89e534a0ef45_Homo_sapiens.GRCh38.110.entrez.tsv.gz using base::`read.table()`.
#> → Defining names by `geneId` column in `mcols()`.
#> ℹ 4414 non-unique gene symbols detected: "5S_rRNA", "5_8S_rRNA", "7SK", "A2M", "A2MP1", "A4GALT", "AAAS", "AACSP1", "AADACL2", "AADACL2-AS1"....
print(x)
#> GeneToSymbol with 71440 rows and 2 columns
#> geneId geneName
#> <character> <character>
#> ENSG00000000003.16 ENSG00000000003.16 TSPAN6
#> ENSG00000000005.6 ENSG00000000005.6 TNMD
#> ENSG00000000419.14 ENSG00000000419.14 DPM1
#> ENSG00000000457.14 ENSG00000000457.14 SCYL3
#> ENSG00000000460.17 ENSG00000000460.17 C1orf112
#> ... ... ...
#> LRG_995.1 LRG_995.1 FUBP1.1
#> LRG_996.1 LRG_996.1 ERBB3.1
#> LRG_997.1 LRG_997.1 ROS1.1
#> LRG_998.1 LRG_998.1 CCND3.1
#> LRG_999.1 LRG_999.1 CIC.1
## makeGeneToSymbolFromEnsDb ====
## > if (goalie::isInstalled("EnsDb.Hsapiens.v75")) {
## > x <- makeGeneToSymbolFromEnsDb("EnsDb.Hsapiens.v75")
## > print(x)
## > }
## makeGeneToSymbolFromGff ====
## > file <- AcidBase::pasteUrl(
## > "ftp.ensembl.org",
## > "pub",
## > "release-102",
## > "gtf",
## > "homo_sapiens",
## > "Homo_sapiens.GRCh38.102.gtf.gz",
## > protocol = "ftp"
## > )
## > x <- makeGeneToSymbolFromGff(
## > file = file,
## > ignoreVersion = FALSE
## > )
## > print(x)