| Title: | A Collection of Proteome Panels and Meta-Data |
| Version: | 0.5 |
| Date: | 2025-3-5 |
| Description: | It aggregates protein panel data and metadata for protein quantitative trait locus (pQTL) analysis using 'pQTLtools' (https://jinghuazhao.github.io/pQTLtools/). The package includes data from affinity-based panels such as 'Olink' (https://olink.com/) and 'SomaScan' (https://somalogic.com/), as well as mass spectrometry-based panels from 'CellCarta' (https://cellcarta.com/) and 'Seer' (https://seer.bio/). The metadata encompasses updated annotations and publication details. |
| License: | MIT + file LICENSE |
| URL: | https://jinghuazhao.github.io/pQTLdata/ |
| Depends: | R (≥ 3.5.0) |
| Imports: | knitr, Rdpack |
| RdMacros: | Rdpack |
| Suggests: | dplyr, grid, EnsDb.Hsapiens.v75, ensembldb, IRanges, org.Hs.eg.db, S4Vectors, VennDiagram |
| VignetteBuilder: | knitr |
| LazyData: | Yes |
| LazyLoad: | Yes |
| LazyDataCompression: | xz |
| NeedsCompilation: | no |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| Packaged: | 2025-03-05 16:14:38 UTC; jhz22 |
| Author: | Jing Hua Zhao |
| Maintainer: | Jing Hua Zhao <jinghuazhao@hotmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-03-07 11:30:02 UTC |
A summary of datasets
Description
It aggregates protein panel data and metadata for protein quantitative trait locus (pQTL) analysis using 'pQTLtools' (https://jinghuazhao.github.io/pQTLtools/). The package includes data from affinity-based panels such as 'Olink' (https://olink.com/) and 'SomaScan' (https://somalogic.com/), as well as mass spectrometry-based panels from 'CellCarta' (https://cellcarta.com/) and 'Seer' (https://seer.bio/). The metadata encompasses updated annotations and publication details.
Details
Available data are listed in the following table.
| Objects | Description |
| Datasets | |
| caprion | Caprion panel |
| inf1 | Olink/INF panel |
| Olink_Explore_1536 | Olink/NGS 1472 panels |
| Olink_Explore_3072 | Olink/Explore 3072 panels |
| Olink_Explore_HT | Olink/Explore HT panels |
| Olink_Target_96 | Olink/Target 96 panels |
| Olink_qPCR | Olink/qPCR panels |
| SomaScan160410 | SomaScan panel |
| SomaScanV4.1 | SomaScan v4.1 panel |
| SomaScan11k | SomaScan 11k panel |
| scallop_inf1 | SCALLOP/INF meta-analysis results |
| seer1980 | ST1 from Suhre et al. (2024) bioRxiv |
| swath_ms | SWATH-MS panel |
| Installations | |
| EndNote/ | Proteogenomics references |
| Olink/ | Olink-COVID analysis by MGH |
Generic descriptions of the variables in the datasets are as follows.
chr Chromosome.
start Start position.
end End position.
gene Gene name.
UniProt UniProt ID.
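The chr, start and end variables are typically what is needed to define a cis region around a protein-coding gene. The following is a minimal sketch on a toy data frame with those column names. The helper name cis_window, the 1 Mb flank and all values are assumptions made for illustration and are not part of the package.

```r
# Toy annotation using the generic column names above (all values are made up).
panel <- data.frame(
  gene = c("GENE1", "GENE2"),
  chr = c("1", "X"),
  start = c(500000, 2500000),
  end = c(520000, 2600000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: extend each gene by a flanking distance (default 1 Mb),
# truncating the lower bound at position 1.
cis_window <- function(d, flank = 1e6) {
  d$cis_start <- pmax(1, d$start - flank)
  d$cis_end <- d$end + flank
  d
}

w <- cis_window(panel)
w[c("gene", "chr", "cis_start", "cis_end")]
```

The same call could be applied to any dataset in the package that carries chr, start and end columns, after checking which genome build the positions refer to.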
Usage
Vignettes on package usage:
An Overview of pQTLdata.
vignette("pQTLdata").
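Besides the vignette, the datasets shipped with an installed package can be listed from R itself. The sketch below wraps utils::data() in a hypothetical helper, list_datasets, which returns an empty vector when the package is not installed, so it runs safely either way.

```r
# Hypothetical helper: names of the datasets provided by an installed package.
list_datasets <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(character(0))
  }
  # data(package = ) returns a list whose "results" matrix has an "Item" column.
  utils::data(package = pkg)$results[, "Item"]
}

# Empty if pQTLdata is not installed; otherwise the dataset names.
list_datasets("pQTLdata")
```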
Author(s)
Jing Hua Zhao in collaboration with other colleagues.
See Also
Useful links:
https://jinghuazhao.github.io/pQTLdata/
Examples
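library(pQTLdata)
# Olink-SomaScan panel overlap, excluding P23560 from both panels
p <- list(setdiff(inf1$uniprot, "P23560"),
          setdiff(SomaScan160410$UniProt[!is.na(SomaScan160410$UniProt)], "P23560"))
cnames <- c("INF1", "SomaScan")
# Build the Venn diagram as a grid object; filename = NULL avoids writing a file
os <- VennDiagram::venn.diagram(x = p, category.names = cnames, filename = NULL,
                                disable.logging = TRUE, height = 8, width = 8, units = "in")
grid::grid.newpage()
grid::grid.draw(os)
# UniProt ids present on both panels
m <- merge(inf1, SomaScan160410, by.x = "uniprot", by.y = "UniProt")
u <- setdiff(with(m, unique(uniprot)), "P23560")
# Overlapping entries from the Olink/INF panel
o <- subset(inf1, uniprot %in% u)
dim(o)
# Overlapping entries from the SomaScan panel, selected columns only
vars <- c("UniProt", "chr", "start", "end", "extGene", "Target", "TargetFullName")
s <- subset(SomaScan160410[vars], UniProt %in% u)
dim(s)
# Drop duplicated rows
us <- s[!duplicated(s), ]
dim(us)
us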
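When only the counts behind the Venn diagram are needed, base R set operations are sufficient. The following is a minimal sketch. The helper name overlap_counts is hypothetical and the identifiers are made up, standing in for the UniProt columns of two panels.

```r
# Hypothetical helper: sizes of the three regions of a two-set Venn diagram,
# after dropping missing values and duplicates.
overlap_counts <- function(a, b) {
  a <- unique(a[!is.na(a)])
  b <- unique(b[!is.na(b)])
  c(only_a = length(setdiff(a, b)),
    both = length(intersect(a, b)),
    only_b = length(setdiff(b, a)))
}

# Made-up identifiers for illustration only.
x <- c("A00001", "A00002", "A00003")
y <- c("A00002", "A00003", "A00004", NA, "A00004")
cnt <- overlap_counts(x, y)
cnt
```

With the package loaded, the equivalent call would be overlap_counts(inf1$uniprot, SomaScan160410$UniProt).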
Olink/Explore 1536 panel
Description
Information based on pilot studies
Usage
Olink_Explore_1536
Format
A data frame with 1,472 rows and 3 variables:
UniProt UniProt id
Assay Experimental assay
Panel Olink panel
Details
Curated from R.
Olink/Explore 3072 panels
Description
Information on all Explore 3072 panels
Usage
Olink_Explore_3072
Format
A data frame with 2,945 rows and 4 variables:
UniProt.ID UniProt id
Protein.name Protein name
Gene.name Gene name
Explore.384.panel Explore 384 panel
Details
Curated from Excel.
Olink/Explore HT panels
Description
Information on all Explore HT panels
Usage
Olink_Explore_HT
Format
A data frame with 5,416 rows and 4 variables:
Olink.ID Olink id
UniProt.ID UniProt id
Protein.name Protein name
Gene.name Gene name
Details
Curated from Excel.
Olink/Target 96 panels
Description
Information on all Target 96 panels. Individual panels are also available from the companion xlsx in the Olink/ directory.
Usage
Olink_Target_96
Format
A data frame with 1,116 rows and 3 variables:
UniProt UniProt id
Protein Protein
Panel Panel
Details
Curated from Excel.
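A long-format table with UniProt, Protein and Panel columns lends itself to simple per-panel summaries. The sketch below uses a toy data frame with the same column names. The panel labels and identifiers are made up, and whether any protein actually appears on more than one panel in the real data is not assumed here.

```r
# Toy data with the same column names as Olink_Target_96 (values are made up).
target96 <- data.frame(
  UniProt = c("A00001", "A00002", "A00001", "A00003"),
  Protein = c("P1", "P2", "P1", "P3"),
  Panel = c("PanelA", "PanelA", "PanelB", "PanelB"),
  stringsAsFactors = FALSE
)

# Number of rows (assays) per panel.
per_panel <- table(target96$Panel)

# Number of distinct panels each UniProt id appears on.
n_panels <- tapply(target96$Panel, target96$UniProt,
                   function(p) length(unique(p)))

# UniProt ids found on more than one panel.
shared <- names(n_panels)[n_panels > 1]
per_panel
shared
```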
Olink/qPCR panels
Description
Information on all qPCR panels
Usage
Olink_qPCR
Format
A data frame with 1,112 rows and 7 variables:
UniProt UniProt id
Panel Panels
Target Protein
gene HGNC symbol
chr Chromosome
start Start position
end End position
Details
Curated from Excel.
SomaScan 11k
Description
This is the latest SomaScan panel.
Usage
SomaScan11k
Format
A data frame with 10,776 rows and 5 variables:
Sequence.ID Sequence ID
Full.Name Full name
Target.Name Target name
UniProt.ID UniProt ID
Entrez.Gene.Name Entrez gene name
Details
Curated from the SomaLogic website.
Source
https://somalogic.com/somascan-11k-assay/
SomaScan panel
Description
This is based on the panel used in Sun et al. (2018).
Usage
SomaScan160410
Format
A data frame with 5,178 rows and 10 variables:
SOMAMER_ID Somamer id
UniProt UniProt id
Target Protein target
TargetFullName Protein target full name
chr Chromosome (1-22,X,Y)
start Start position
end End position
entGene Entrez gene
ensGene ENSEMBL gene
extGene External gene
Details
From the INTERVAL study.
References
Sun BB, Maranville JC, Peters JE, Stacey D, Staley JR, Blackshaw J, Burgess S, Jiang T, Paige E, Surendran P, Oliver-Williams C, Kamat MA, Prins BP, Wilcox SK, Zimmerman ES, Chi A, Bansal N, Spain SL, Wood AM, Morrell NW, Bradley JR, Janjic N, Roberts DJ, Ouwehand WH, Todd JA, Soranzo N, Suhre K, Paul DS, Fox CS, Plenge RM, Danesh J, Runz H, Butterworth AS (2018). “Genomic atlas of the human plasma proteome.” Nature, 558(7708), 73-79. ISSN 1476-4687 (Electronic) 0028-0836 (Linking), doi:10.1038/s41586-018-0175-2.
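Because chr takes the values 1-22, X and Y, sorting it as plain text would place 10 before 2. A minimal sketch of ordering by genomic position follows, assuming chr is stored without a "chr" prefix. The toy rows are made up and use only a subset of the documented columns.

```r
# Toy rows with made-up ids and positions.
soma <- data.frame(
  SOMAMER_ID = c("s1", "s2", "s3", "s4"),
  chr = c("X", "10", "2", "2"),
  start = c(100, 300, 900, 200),
  stringsAsFactors = FALSE
)

# Encode chromosomes as a factor in karyotype order so that order() respects it.
chr_levels <- c(1:22, "X", "Y")
soma$chr <- factor(soma$chr, levels = chr_levels)

# Sort by chromosome, then by start position within chromosome.
sorted <- soma[order(soma$chr, soma$start), ]
sorted
```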
SomaScan v4.1
Description
This is the 7k panel
Usage
SomaScanV4.1
Format
A data frame with 7,288 rows and 6 variables:
# A serial number
SeqID SeqID
Human.Target.or.Analyte Human target/analyte
UniProt.ID UniProt id
GeneID HGNC symbol
Type "Protein"
Details
Obtained directly from SomaLogic.
Caprion panel
Description
Information based on Caprion pilot studies
Usage
caprion
Format
A data frame with 987 rows and 12 variables:
Gene HGNC symbols simplified in four instances
Gene.orig HGNC symbol
Protein Protein name as in UniProt
Accession UniProt id
Protein.Description Detailed information on protein
GO.Cellular.Component GO cellular component
GO.Function GO function
GO.Process GO process
ensGenes Ensembl genes
chrom chromosome
chr chromosome
starts start positions
ends end positions
start minimum start
end maximum end
Details
See the Caprion repository for an example of its use.
Olink/INF1 panel
Description
The panel is based on SCALLOP-INF (Zhao et al., 2023).
Usage
inf1
Format
A data frame with 92 rows and 9 variables:
uniprot UniProt id
prot Protein
target Protein target name
target.short Protein target short name
gene HGNC symbol
chr chromosome (1-13,16-17,19-22)
start start
end end
chromosome updated chromosomes
start38 start position under build 38
end38 end position under build 38
ensGene Ensembl gene name
ensembl_gene_id ENSEMBL gene
alt_name recent name from www.uniprot.org
Details
Assembled for SCALLOP-INF
References
Zhao JH, Stacey D, Eriksson N, Macdonald-Dunlop E, Hedman ÅK, Kalnapenkis A, Enroth S, Cozzetto D, Digby-Bell J, Marten J, Folkersen L, Herder C, Jonsson L, Bergen SE, Gieger C, Needham EJ, Surendran P, Team EBR, Paul DS, Polasek O, Thorand B, Grallert H, Roden M, Võsa U, Esko T, Hayward C, Johansson Å, Gyllensten U, Powell N, Hansson O, Mattsson-Carlgren N, Joshi PK, Danesh J, Padyukov L, Klareskog L, Landén M, Wilson JF, Siegbahn A, Wallentin L, Mälarstig A, Butterworth AS, Peters JE (2023). “Genetics of circulating inflammatory proteins identifies drivers of immune-mediated disease risk and therapeutic targets.” Nature Immunology, 24(9), 1540-1551. doi:10.1038/s41590-023-01588-w.
Supplementary table 3
Description
Supplementary information for Zhao et al. (2023).
Usage
scallop_inf1
Format
A data frame with 180 rows and 19 variables:
- UniProt
UniProt ID
- Protein
Protein name
- Protein_gene_symbol
Gene symbol
- Chromosome
Chromosome
- Position
Position
- cistrans
cis/trans
- rsid
reference sequence ID
- Effect_allele
Effect allele
- Other_allele
Reference allele
- EAF
Effect allele frequency
- b
Effect size estimate
- SE
Standard error
- log10P
log10(P)
- Direction
Direction field in METAL output
- HetISq
I^2
- HetChiSq
Heterogeneity chi-square
- HetDf
degrees of freedom
- logHetP
Heterogeneity log10(P)
- N
Sample size
References
Zhao JH, Stacey D, Eriksson N, Macdonald-Dunlop E, Hedman ÅK, Kalnapenkis A, Enroth S, Cozzetto D, Digby-Bell J, Marten J, Folkersen L, Herder C, Jonsson L, Bergen SE, Gieger C, Needham EJ, Surendran P, Team EBR, Paul DS, Polasek O, Thorand B, Grallert H, Roden M, Võsa U, Esko T, Hayward C, Johansson Å, Gyllensten U, Powell N, Hansson O, Mattsson-Carlgren N, Joshi PK, Danesh J, Padyukov L, Klareskog L, Landén M, Wilson JF, Siegbahn A, Wallentin L, Mälarstig A, Butterworth AS, Peters JE (2023). “Genetics of circulating inflammatory proteins identifies drivers of immune-mediated disease risk and therapeutic targets.” Nature Immunology, 24(9), 1540-1551. doi:10.1038/s41590-023-01588-w.
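Summary statistics of this shape are often tabulated by cis/trans status and checked through a z statistic. The sketch below runs on a toy data frame that reuses the column names rsid, cistrans, b and SE. All values are made up, and the coding of cistrans as "cis" and "trans" is an assumption. The log10 P value is recomputed from z under a two-sided normal approximation, on the log scale to avoid underflow; it need not match the stored log10P column exactly, and the sign convention of that column should be checked before comparing.

```r
# Toy summary statistics (made-up values; column names follow scallop_inf1).
res <- data.frame(
  rsid = c("rs1", "rs2", "rs3"),
  cistrans = c("cis", "trans", "cis"),
  b = c(0.5, -0.2, 0.1),
  SE = c(0.05, 0.02, 0.02),
  stringsAsFactors = FALSE
)

# z statistic from effect size and standard error.
res$z <- res$b / res$SE

# log10 of the two-sided P value, computed on the log scale:
# log(2 * pnorm(-|z|)) = pnorm(-|z|, log.p = TRUE) + log(2), then convert to base 10.
res$log10P_z <- (pnorm(-abs(res$z), log.p = TRUE) + log(2)) / log(10)

# Number of cis and trans associations.
counts <- table(res$cistrans)
res
counts
```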
Seer 1980 panel
Description
ST1 from Suhre et al. (2024).
Usage
seer1980
Format
A data frame with 1,980 rows:
PID.NP PID.NP
protein_ids protein_ids
protein_names protein_names
mapped.UniProtID mapped.UniProtID
mapped_gene_id mapped_gene_id
gene_name gene_name
description description
chr chr
start start
end end
Details
As above.
References
Suhre K, Chen Q, Halama A, Mendez K, Dahlin A, Stephan N, Thareja G, Sarwath H, Guturu H, Dwaraka VB, Batzoglou S, Schmidt F, Lasky-Su JA (2024). “A genome-wide association study of mass spectrometry proteomics using the Seer Proteograph platform.” BioRxiv. doi:10.1101/2024.05.27.596028.
SWATH-MS panel
Description
Curated during the INTERVAL pilot study.
Usage
swath_ms
Format
A data frame with 684 rows and 5 variables:
AccessionUniProt id
accListList of UniProt ids
uniprotNameProtein
ensGeneENSEMBL gene
geneNameHGNC symbol
Details
As above.