GWAS‑to‑CRISPR: streamlined extraction of significant GWAS SNPs, metadata aggregation and optional FASTA/BED/CSV export for downstream CRISPR design (GRCh38/hg38).
Genome‑wide association studies (GWAS) link traits to genetic variants, but raw summary statistics are not directly usable for guide design. gwas2crispr bridges this gap. It retrieves significant single‑nucleotide polymorphisms (SNPs) for a given Experimental Factor Ontology (EFO) trait, annotates them with gene and study metadata, and returns in‑memory summaries. When requested, it also writes ready‑to‑use CSV, BED and FASTA files for high‑throughput CRISPR target design. All genomic coordinates are mapped to GRCh38/hg38.
fetch_gwas(efo_id, p_cut = 5e-8): fetches significant associations for an EFO trait via gwasrapidd, with a REST API fallback.
run_gwas2crispr(efo_id, p_cut = 5e-8, flank_bp = 200, out_prefix = NULL): end‑to‑end pipeline that calls fetch_gwas(), aggregates variant/gene/study metadata, and returns an object with summaries. If you provide out_prefix, it also writes CSV, BED and optional FASTA files.
CRAN‑safe examples: the package does not write files by default. Examples that perform network operations or file writing are wrapped in \donttest{}. When you supply out_prefix, outputs are written only to paths you specify; in documentation we use tempdir().
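The primary-then-fallback behaviour described for fetch_gwas() can be sketched in base R. This is a minimal illustration, not the package's implementation: with_fallback() and the three toy source functions are hypothetical names that stand in for the gwasrapidd query and the REST request, and no network access is involved.

```r
# Hypothetical helper: call `primary`; if it errors or returns zero rows,
# call `fallback` instead. Both arguments are zero-argument functions
# that return a data frame.
with_fallback <- function(primary, fallback) {
  res <- tryCatch(primary(), error = function(e) NULL)
  if (is.null(res) || nrow(res) == 0) {
    res <- fallback()
  }
  res
}

# Toy stand-ins for the two data sources (made-up values).
failing_source <- function() stop("service unavailable")
empty_source   <- function() data.frame(rsid = character(0))
backup_source  <- function() data.frame(rsid = c("rs1", "rs2"))

a <- with_fallback(failing_source, backup_source)  # error path
b <- with_fallback(empty_source, backup_source)    # zero-row path
```

Both calls end up with the rows from backup_source(), which mirrors the two conditions named above: an error, or an empty result.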
gwas2crispr depends on the following packages.
Imported (pulled automatically): httr, dplyr, purrr, readr, tibble, tidyr, methods, utils.
Optional, for FASTA export: Biostrings, BSgenome.Hsapiens.UCSC.hg38. If these are missing, CSV/BED are still produced; FASTA is skipped gracefully.
Optional, for the command‑line script: optparse.
install.packages("gwasrapidd")
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c("Biostrings", "BSgenome.Hsapiens.UCSC.hg38"))
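Before running the pipeline, it can be useful to check whether the optional FASTA dependencies are present. The following is a minimal sketch using base R only; the variable names fasta_pkgs and have_pkg are ours, and the package's own internal check may differ.

```r
# Packages this README lists as needed for FASTA export.
fasta_pkgs <- c("Biostrings", "BSgenome.Hsapiens.UCSC.hg38")

# requireNamespace() reports TRUE/FALSE without attaching the package.
have_pkg <- vapply(fasta_pkgs, requireNamespace, logical(1), quietly = TRUE)

if (all(have_pkg)) {
  message("FASTA export is available.")
} else {
  message("FASTA export will be skipped; missing: ",
          paste(fasta_pkgs[!have_pkg], collapse = ", "))
}
```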
Until the package is on CRAN, install the development version directly:
if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")
devtools::install_github("leopard0ly/gwas2crispr")
After CRAN release you will be able to run:
install.packages("gwas2crispr")
Use a clear prefix and write outputs (CSV/BED/FASTA) to your current working directory:
library(gwas2crispr)
run_gwas2crispr(
  efo_id = "EFO_0000707", # lung disease (example)
  p_cut = 1e-6,
  flank_bp = 300,
  out_prefix = "lung" # produces: lung_snps_full.csv / lung_snps_hg38.bed / lung_snps_flank300.fa
)
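To illustrate what flank_bp controls, the sketch below turns a SNP position into a window with that many bases on each side. It assumes 1-based SNP positions and writes the window in the BED convention (0-based start, exclusive end). The helper flank_window() and the positions are hypothetical, and the exact intervals the package writes may differ.

```r
# Hypothetical helper: 1-based SNP position -> BED-style window
# (0-based start, exclusive end) covering the SNP plus `flank_bp`
# bases on each side. Starts are clipped at 0; chromosome ends are
# not checked in this sketch.
flank_window <- function(chrom, pos, flank_bp = 200) {
  data.frame(
    chrom = chrom,
    start = pmax(pos - flank_bp - 1, 0),
    end   = pos + flank_bp,
    stringsAsFactors = FALSE
  )
}

# Made-up positions for illustration only.
win <- flank_window(c("chr1", "chr2"), c(1000, 150), flank_bp = 300)
win$width <- win$end - win$start
print(win)
```

With flank_bp = 300, an unclipped window spans 601 bases (300 + 1 + 300); the second toy SNP sits close to the chromosome start, so its window is shorter.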
Outputs
lung_snps_full.csv: harmonised SNP metadata from the GWAS Catalog (GRCh38).
lung_snps_hg38.bed: intervals suitable for genomic intersection.
lung_snps_flank300.fa: sequences for CRISPR guide design (requires Biostrings + BSgenome.Hsapiens.UCSC.hg38).
library(gwas2crispr)
res <- run_gwas2crispr(
  efo_id = "EFO_0001663", # Prostate cancer
  p_cut = 5e-8,
  flank_bp = 200,
  out_prefix = NULL # no writing; returns objects only
)
res$summary # one‑row tibble: n_SNPs, SNPs_w_gene, unique_genes, n_studies
res$chr_freq # table of chromosomes by SNP count
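The shape of these two objects can be pictured with a toy table. The sketch below is an assumption-laden illustration in base R: the column names (rsid, chr, gene, study), the values, and the way each summary field is computed are inferred from the field names above, not taken from the package source.

```r
# Toy association table; column names and values are made up.
snps <- data.frame(
  rsid  = c("rs1", "rs2", "rs3", "rs4"),
  chr   = c("1", "1", "8", "17"),
  gene  = c("GENE_A", NA, "GENE_B", "GENE_A"),
  study = c("GCST_X", "GCST_X", "GCST_Y", "GCST_Z"),
  stringsAsFactors = FALSE
)

# One-row summary with the four fields named in the README.
summary_tbl <- data.frame(
  n_SNPs       = length(unique(snps$rsid)),
  SNPs_w_gene  = sum(!is.na(snps$gene)),
  unique_genes = length(unique(na.omit(snps$gene))),
  n_studies    = length(unique(snps$study))
)

# SNP count per chromosome, largest first.
chr_freq <- sort(table(snps$chr), decreasing = TRUE)
```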
out <- file.path(tempdir(), "prostate") # CRAN‑friendly
res <- run_gwas2crispr(
  efo_id = "EFO_0001663",
  p_cut = 5e-8,
  flank_bp = 200,
  out_prefix = out
)
res$csv # path to <prefix>_snps_full.csv
res$bed # path to <prefix>_snps_hg38.bed
res$fasta # path to <prefix>_snps_flank<bp>.fa (only if BSgenome installed)
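The returned paths follow the naming pattern shown in the comments above. As a rough sketch, the names can be derived from the prefix and flank size as follows; output_paths() is a hypothetical helper written for this illustration, not a function exported by the package.

```r
# Hypothetical helper reproducing the documented naming pattern.
output_paths <- function(out_prefix, flank_bp) {
  list(
    csv   = paste0(out_prefix, "_snps_full.csv"),
    bed   = paste0(out_prefix, "_snps_hg38.bed"),
    fasta = paste0(out_prefix, "_snps_flank", flank_bp, ".fa")
  )
}

paths <- output_paths(file.path(tempdir(), "prostate"), flank_bp = 200)
basename(unlist(paths))
```

This can help downstream scripts locate the files without parsing the returned object, provided the same prefix and flank size are used.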
Output file names
<prefix>_snps_full.csv: unified metadata table
<prefix>_snps_hg38.bed: BED intervals
<prefix>_snps_flank<bp>.fa: FASTA sequences (requires BSgenome.Hsapiens.UCSC.hg38)
A portable Rscript ships in the package source under inst/scripts/gwas2crispr.R and is installed as scripts/gwas2crispr.R. Use it to run the pipeline from the shell. The script relies on the optparse package; install it if missing.
Windows (cmd), version-agnostic (recommended):
Rscript -e "cat(system.file('scripts','gwas2crispr.R', package='gwas2crispr'))" ^
| Rscript -- -e EFO_0001663 -p 5e-8 -f 200 -o "%CD%\prostate"
Fixed output (current folder):
"C:\Program Files\R\R-4.4.1\bin\Rscript.exe" ^
"C:\Users\ZAD ECT\AppData\Local\R\win-library\4.4\gwas2crispr\scripts\gwas2crispr.R" ^
-e EFO_0001663 -p 5e-8 -f 200 -o "%CD%\prostate"
Temporary output (system temp):
Rscript -e "cat(system.file('scripts','gwas2crispr.R', package='gwas2crispr'))" ^
| Rscript -- -e EFO_0001663 -p 5e-8 -f 200 -o "%TEMP%\prostate"
Linux/macOS (POSIX shell), fixed output (current folder):
Rscript "$(Rscript -e 'cat(system.file("scripts","gwas2crispr.R", package="gwas2crispr"))')" -e EFO_0001663 -p 5e-8 -f 200 -o "$PWD/prostate"
Temporary output (system temp):
Rscript "$(Rscript -e 'cat(system.file("scripts","gwas2crispr.R", package="gwas2crispr"))')" -e EFO_0001663 -p 5e-8 -f 200 -o "$(mktemp -d)/prostate"
-e, --efo (required): EFO trait ID, e.g. EFO_0001663
-p, --pthresh: P‑value cut‑off (default 5e-8)
-f, --flank: number of flanking bases for FASTA (default 200)
-o, --out: output file prefix (optional; omit to run object‑only without writing files)
-v, --verbose: print progress messages and, when --out is omitted, a concise summary
If you omit the -o/--out option, no files are written. Use -v/--verbose to emit a concise summary of the run.
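For readers who want to see how such flags map onto optparse, here is a minimal sketch. It mirrors the options listed above, but the help strings, the variable names and the explicit argument vector are ours; the bundled script may define its options differently.

```r
library(optparse)

# Option list mirroring the documented flags; help strings are illustrative.
option_list <- list(
  make_option(c("-e", "--efo"), type = "character",
              help = "EFO trait ID (required)"),
  make_option(c("-p", "--pthresh"), type = "double", default = 5e-8,
              help = "P-value cut-off [default %default]"),
  make_option(c("-f", "--flank"), type = "integer", default = 200L,
              help = "Flanking bases for FASTA [default %default]"),
  make_option(c("-o", "--out"), type = "character", default = NULL,
              help = "Output file prefix (omit to write no files)"),
  make_option(c("-v", "--verbose"), action = "store_true", default = FALSE,
              help = "Print progress messages")
)
parser <- OptionParser(option_list = option_list)

# Parse an explicit argument vector instead of the real command line.
opts <- parse_args(parser, args = c("-e", "EFO_0001663", "-f", "300"))

if (is.null(opts$efo)) stop("--efo is required")
write_files <- !is.null(opts$out)  # FALSE here: no -o/--out was given
```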
fetch_gwas(efo_id, p_cut = 5e-8)
Fetch significant associations for an EFO trait. Tries gwasrapidd::get_associations() first; if no rows or an error is returned, falls back to the EBI GWAS REST API.
Returns: an object of class "associations" with slots associations and risk_alleles (compatible with gwasrapidd).
run_gwas2crispr(efo_id, p_cut = 5e-8, flank_bp = 200, out_prefix = NULL)
Runs the full pipeline: fetches GWAS data, merges gene and study annotations, and returns a list with summary and chr_freq. When out_prefix is provided, the list also contains file paths to the written csv, bed and optional fasta files.
Optional dependency: BSgenome.Hsapiens.UCSC.hg38 (for FASTA export).
Returns: summary, chr_freq and, if writing, csv, bed, fasta paths.
Large outputs are not bundled in the package tarball; they are excluded via .Rbuildignore.
Small example files (if needed) should live under inst/extdata/ and can be accessed with:
system.file("extdata", "your_example.csv", package = "gwas2crispr")
Automated tests live in tests/testthat/ and avoid network calls on CRAN via skip_on_cran(). To run the test suite locally:
devtools::test()
If BSgenome.Hsapiens.UCSC.hg38 is not installed, the FASTA step is skipped gracefully; CSV and BED files will still be produced.
gwasrapidd must be installed to ensure smooth data retrieval from the GWAS Catalog.
Progress output is emitted with message() so that you can silence it with suppressMessages() when running scripts or examples.
Please cite gwas2crispr and the resources it builds upon. To see the formatted citation:
citation("gwas2crispr")
Additional background: Sudlow et al. (2015) UK Biobank: An open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Medicine 12(3): e1001779. doi:10.1371/journal.pmed.1001779.
MIT © Othman S. I. Mohammed; see the LICENSE file for details.
This package builds upon gwasrapidd and the EBI GWAS REST API. Sequence handling and genome data are powered by Biostrings and BSgenome.