Type: | Package |
Title: | Parallel and Memory-Efficient Ecological Diversity Metrics |
Version: | 1.0.0 |
Date: | 2025-08-18 |
Description: | Computes alpha and beta diversity metrics using concurrent 'C' threads. Metrics include 'UniFrac', Faith's phylogenetic diversity, Bray-Curtis dissimilarity, Shannon diversity index, and many others. Also parses newick trees into 'phylo' objects and rarefies feature tables. |
URL: | https://cmmr.github.io/ecodive/, https://github.com/cmmr/ecodive |
BugReports: | https://github.com/cmmr/ecodive/issues |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
Depends: | R (≥ 3.6.0) |
RoxygenNote: | 7.3.2 |
Config/Needs/website: | rmarkdown |
Config/testthat/edition: | 3 |
Imports: | parallel, utils |
Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
VignetteBuilder: | knitr |
NeedsCompilation: | yes |
Packaged: | 2025-08-18 22:41:52 UTC; Daniel |
Author: | Daniel P. Smith |
Maintainer: | Daniel P. Smith <dansmith01@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-08-22 17:40:15 UTC |
Bray-Curtis
Description
Bray-Curtis beta diversity metric.
Usage
bray_curtis(counts, weighted = TRUE, pairs = NULL, cpus = n_cpus())
Arguments
counts |
An OTU abundance matrix where each column is a sample, and
each row is an OTU. Any object coercible with |
weighted |
If |
pairs |
Which combinations of samples should distances be
calculated for? The default value ( |
cpus |
How many parallel processing threads should be used. The
default, |
Value
A dist
object.
Calculation
In the formulas below, x
and y
are two columns (samples) from counts
.
n
is the number of rows (OTUs) in counts
.
D = \displaystyle \frac{\sum_{i = 1}^{n} |x_i - y_i|}{\sum_{i = 1}^{n} (x_i + y_i)}
x <- c(4, 0, 3, 2, 6) y <- c(0, 8, 0, 0, 5) sum(abs(x-y)) / sum(x+y) #> 0.6428571
References
Sorenson T 1948. A method of establishing groups of equal amplitude in plant sociology based on similarity of species content. Kongelige Danske Videnskabernes Selskab, 5.
Bray JR and Curtis JT 1957. An ordination of the upland forest communities of southern Wisconsin. Ecological Monographs, 27(4). doi:10.2307/1942268
See Also
Other beta_diversity:
canberra()
,
euclidean()
,
generalized_unifrac()
,
gower()
,
jaccard()
,
kulczynski()
,
manhattan()
,
unweighted_unifrac()
,
variance_adjusted_unifrac()
,
weighted_normalized_unifrac()
,
weighted_unifrac()
Examples
# Example counts matrix
ex_counts
# Bray-Curtis weighted distance matrix
bray_curtis(ex_counts)
# Bray-Curtis unweighted distance matrix
bray_curtis(ex_counts, weighted = FALSE)
# Only calculate distances for A vs all.
bray_curtis(ex_counts, pairs = 1:3)
Canberra
Description
Canberra beta diversity metric.
Usage
canberra(counts, weighted = TRUE, pairs = NULL, cpus = n_cpus())
Arguments
counts |
An OTU abundance matrix where each column is a sample, and
each row is an OTU. Any object coercible with |
weighted |
If |
pairs |
Which combinations of samples should distances be
calculated for? The default value ( |
cpus |
How many parallel processing threads should be used. The
default, |
Value
A dist
object.
Calculation
In the formulas below, x
and y
are two columns (samples) from counts
.
n
is the number of rows (OTUs) in counts
.
OTUs must be removed if they are absent from both samples.
D = \displaystyle \frac{1}{n}\sum_{i = 1}^{n} \frac{|x_i - y_i|}{x_i + y_i}
x <- c(4, 0, 3, 0, 6)[-4] y <- c(0, 8, 0, 0, 5)[-4] sum(abs(x-y) / (x+y)) / length(x) #> 0.7727273
References
Lance GN and Williams WT 1967. A general theory of classificatory sorting strategies II. Clustering systems. The computer journal, 10(3). doi:10.1093/comjnl/10.3.271
See Also
Other beta_diversity:
bray_curtis()
,
euclidean()
,
generalized_unifrac()
,
gower()
,
jaccard()
,
kulczynski()
,
manhattan()
,
unweighted_unifrac()
,
variance_adjusted_unifrac()
,
weighted_normalized_unifrac()
,
weighted_unifrac()
Examples
# Example counts matrix
ex_counts
# Gower weighted distance matrix
canberra(ex_counts)
# Gower unweighted distance matrix
canberra(ex_counts, weighted = FALSE)
# Only calculate distances for A vs all.
canberra(ex_counts, pairs = 1:3)
Chao1
Description
Chao1 alpha diversity metric.
A non-parametric estimator of the number of unobserved species in a sample.
The Chao1 index estimates total species richness based on the number of
species that occur only once (singletons) and twice (doubletons) in the
sample.
Usage
chao1(counts, cpus = n_cpus())
Arguments
counts |
An OTU abundance matrix where each column is a sample, and
each row is an OTU. Any object coercible with |
cpus |
How many parallel processing threads should be used. The
default, |
Value
A numeric vector.
Calculation
Prerequisite: all counts are whole numbers.
In the formulas below, x
is a single column (sample) from counts
.
n
is the total number of non-zero OTUs, a
is the number of
singletons, and b
is the number of doubletons.
D = \displaystyle n + \frac{a^{2}}{2b}
x <- c(1, 0, 3, 2, 6) sum(x>0) + (sum(x==1) ^ 2) / (2 * sum(x==2)) #> 4.5
Note that when x
does not have any singletons or doubletons
(a = 0, b = 0
), the result will be NaN
. When x
has singletons
but no doubletons (a > 0, b = 0
), the result will be Inf
.
References
Chao A 1984. Non-parametric estimation of the number of classes in a population. Scandinavian Journal of Statistics, 11:265-270.
See Also
Other alpha_diversity:
faith()
,
inv_simpson()
,
shannon()
,
simpson()
Examples
# Example counts matrix
ex_counts
# Chao1 diversity values
chao1(ex_counts)
# Low diversity
chao1(c(100, 1, 1, 1, 1)) # Inf
# High diversity
chao1(c(20, 20, 20, 20, 20)) # NaN
# Low richness
chao1(1:3) # 3.5
# High richness
chao1(1:100) # 100.5
documentation
Description
documentation
Arguments
counts |
An OTU abundance matrix where each column is a sample, and
each row is an OTU. Any object coercible with |
weighted |
If |
normalized |
For weighted UniFrac only, normalize distances by the
total branch length. Options: |
alpha |
How much weight to give to relative abundances; a value
between 0 and 1, inclusive. Setting |
tree |
A |
pairs |
Which combinations of samples should distances be
calculated for? The default value ( |
cpus |
How many parallel processing threads should be used. The
default, |
Euclidean
Description
Euclidean beta diversity metric.
Usage
euclidean(counts, weighted = TRUE, pairs = NULL, cpus = n_cpus())
Arguments
counts |
An OTU abundance matrix where each column is a sample, and
each row is an OTU. Any object coercible with |
weighted |
If |
pairs |
Which combinations of samples should distances be
calculated for? The default value ( |
cpus |
How many parallel processing threads should be used. The
default, |
Value
A dist
object.
Calculation
In the formulas below, x
and y
are two columns (samples) from counts
.
n
is the number of rows (OTUs) in counts
.
D = \displaystyle \sqrt{\sum_{i = 1}^{n} (x_i - y_i)^{2}}
x <- c(4, 0, 3, 2, 6) y <- c(0, 8, 0, 0, 5) sqrt(sum((x-y)^2)) #> 9.69536
References
Gower JC, Legendre P 1986. Metric and Euclidean Properties of Dissimilarity Coefficients. Journal of Classification. 3. doi:10.1007/BF01896809
Legendre P, Caceres M 2013. Beta diversity as the variance of community data: dissimilarity coefficients and partitioning. Ecology Letters. 16(8). doi:10.1111/ele.12141
See Also
Other beta_diversity:
bray_curtis()
,
canberra()
,
generalized_unifrac()
,
gower()
,
jaccard()
,
kulczynski()
,
manhattan()
,
unweighted_unifrac()
,
variance_adjusted_unifrac()
,
weighted_normalized_unifrac()
,
weighted_unifrac()
Examples
# Example counts matrix
ex_counts
# Euclidean weighted distance matrix
euclidean(ex_counts)
# Euclidean unweighted distance matrix
euclidean(ex_counts, weighted = FALSE)
# Only calculate distances for A vs all.
euclidean(ex_counts, pairs = 1:3)
Example counts matrix
Description
Genera found on four human body sites.
Usage
ex_counts
Format
A matrix of 4 samples (columns) x 6 genera (rows).
Source
Derived from The Human Microbiome Project dataset. https://commonfund.nih.gov/hmp
Example phylogenetic tree
Description
Companion tree for ex_counts
.
Usage
ex_tree
Format
A phylo
object.
Details
ex_tree
encodes this tree structure:
+----------44---------- Haemophilus +-2-| | +----------------68---------------- Bacteroides | | +---18---- Streptococcus | +--12--| | | +--11-- Staphylococcus +--11--| | +-----24----- Corynebacterium +--12--| +--13-- Propionibacterium
Faith's PD
Description
Faith's phylogenetic diversity metric.
A higher value indicates a greater amount of evolutionary history
represented within the community, suggesting higher biodiversity in terms of
evolutionary relationships.
Usage
faith(counts, tree = NULL, cpus = n_cpus())
Arguments
counts |
An OTU abundance matrix where each column is a sample, and
each row is an OTU. Any object coercible with |
tree |
A |
cpus |
How many parallel processing threads should be used. The
default, |
Value
A numeric vector.
Calculation
Given n
branches with lengths L
and a sample's
abundances on each of those branches coded as 1 for present or 0 for absent:
\sum_{i = 1}^{n} P_i \times L_i
References
Faith DP 1992. Conservation evaluation and phylogenetic diversity. Biological Conservation, 61:1-10. doi:10.1016/0006-3207(92)91201-3
See Also
Other alpha_diversity:
chao1()
,
inv_simpson()
,
shannon()
,
simpson()
Examples
# Example counts matrix
ex_counts
# Faith diversity values
faith(ex_counts, tree = ex_tree)
Generalized UniFrac
Description
Generalized UniFrac beta diversity metric.
Usage
generalized_unifrac(
counts,
tree = NULL,
alpha = 0.5,
pairs = NULL,
cpus = n_cpus()
)
Arguments
counts |
An OTU abundance matrix where each column is a sample, and
each row is an OTU. Any object coercible with |
tree |
A |
alpha |
How much weight to give to relative abundances; a value
between 0 and 1, inclusive. Setting |
pairs |
Which combinations of samples should distances be
calculated for? The default value ( |
cpus |
How many parallel processing threads should be used. The
default, |
Value
A dist
object.
Calculation
Given n
branches with lengths L
, a pair of samples'
abundances (A
and B
) on each of those branches, and
abundance weighting 0 \le \alpha \le 1
:
D = \displaystyle \frac{\sum_{i = 1}^{n} L_i(\frac{A_i}{A_T} + \frac{B_i}{B_T})^{\alpha}|\displaystyle \frac{\frac{A_i}{A_T} - \frac{B_i}{B_T}}{\frac{A_i}{A_T} + \frac{B_i}{B_T}} |}{\sum_{i = 1}^{n} L_i(\frac{A_i}{A_T} + \frac{B_i}{B_T})^{\alpha}}
See vignette('unifrac')
for details and a worked example.
References
Chen J, Bittinger K, Charlson ES, Hoffmann C, Lewis J, Wu GD, Collman RG, Bushman FD, Li H 2012. Associating microbiome composition with environmental covariates using generalized UniFrac distances. Bioinformatics, 28(16). doi:10.1093/bioinformatics/bts342
See Also
Other beta_diversity:
bray_curtis()
,
canberra()
,
euclidean()
,
gower()
,
jaccard()
,
kulczynski()
,
manhattan()
,
unweighted_unifrac()
,
variance_adjusted_unifrac()
,
weighted_normalized_unifrac()
,
weighted_unifrac()
Examples
# Example counts matrix
ex_counts
# Generalized UniFrac distance matrix
generalized_unifrac(ex_counts, tree = ex_tree)
# Only calculate distances for A vs all.
generalized_unifrac(ex_counts, tree = ex_tree, pairs = 1:3)
Gower
Description
Gower beta diversity metric.
Usage
gower(counts, weighted = TRUE, pairs = NULL, cpus = n_cpus())
Arguments
counts |
An OTU abundance matrix where each column is a sample, and
each row is an OTU. Any object coercible with |
weighted |
If |
pairs |
Which combinations of samples should distances be
calculated for? The default value ( |
cpus |
How many parallel processing threads should be used. The
default, |
Value
A dist
object.
Calculation
Each row (OTU) of counts
is rescaled to the range 0-1. In cases where a
row is all the same value, those values are replaced with 0
.
counts scaled recounts A B C D A B C D OTU1 0 0 0 0 -> OTU1 0.0 0.0 0.0 0 OTU2 0 8 9 10 -> OTU2 0.0 0.8 0.9 1 OTU3 5 5 5 5 -> OTU3 0.0 0.0 0.0 0 OTU4 2 0 0 0 -> OTU4 1.0 0.0 0.0 0 OTU5 4 6 4 1 -> OTU5 0.6 1.0 0.6 0
In the formulas below, x
and y
are two columns (samples) from the scaled
counts. n
is the number of rows (OTUs) in counts
.
D = \displaystyle \frac{1}{n}\sum_{i = 1}^{n} |x_i - y_i|
x <- c(0, 0, 0, 1, 0.6) y <- c(0, 0.8, 0, 0, 1) sum(abs(x-y)) / length(x) #> 0.44
References
Gower JC 1971. A general coefficient of similarity and some of its properties. Biometrics. 27(4). doi:10.2307/2528823
Gower JC, Legendre P 1986. Metric and Euclidean Properties of Dissimilarity Coefficients. Journal of Classification. 3. doi:10.1007/BF01896809
See Also
Other beta_diversity:
bray_curtis()
,
canberra()
,
euclidean()
,
generalized_unifrac()
,
jaccard()
,
kulczynski()
,
manhattan()
,
unweighted_unifrac()
,
variance_adjusted_unifrac()
,
weighted_normalized_unifrac()
,
weighted_unifrac()
Examples
# Example counts matrix
ex_counts
# Gower weighted distance matrix
gower(ex_counts)
# Gower unweighted distance matrix
gower(ex_counts, weighted = FALSE)
# Only calculate distances for A vs all.
gower(ex_counts, pairs = 1:3)
Inverse Simpson
Description
Inverse Simpson alpha diversity metric.
Usage
inv_simpson(counts, cpus = n_cpus())
Arguments
counts |
An OTU abundance matrix where each column is a sample, and
each row is an OTU. Any object coercible with |
cpus |
How many parallel processing threads should be used. The
default, |
Value
A numeric vector.
Calculation
Pre-transformation: drop all OTUs with zero abundance.
In the formulas below, x
is a single column (sample) from counts
.
p
are the relative abundances.
p_{i} = \displaystyle \frac{x_i}{\sum x}
D = \displaystyle 1 / \sum_{i = 1}^{n} p_{i}\times\ln(p_{i})
x <- c(4, 0, 3, 2, 6)[-2] p <- x / sum(x) 1 / sum(p * log(p)) #> -0.7636352
References
Simpson EH 1949. Measurement of diversity. Nature, 163. doi:10.1038/163688a0
See Also
Other alpha_diversity:
chao1()
,
faith()
,
shannon()
,
simpson()
Examples
# Example counts matrix
ex_counts
# Inverse Simpson diversity values
inv_simpson(ex_counts)
# Low diversity
inv_simpson(c(100, 1, 1, 1, 1)) # 1.08
# High diversity
inv_simpson(c(20, 20, 20, 20, 20)) # 5
# Low richness
inv_simpson(1:3) # 2.57
# High richness
inv_simpson(1:100) # 75.37
Jaccard
Description
Jaccard beta diversity metric.
Usage
jaccard(counts, weighted = TRUE, pairs = NULL, cpus = n_cpus())
Arguments
counts |
An OTU abundance matrix where each column is a sample, and
each row is an OTU. Any object coercible with |
weighted |
If |
pairs |
Which combinations of samples should distances be
calculated for? The default value ( |
cpus |
How many parallel processing threads should be used. The
default, |
Value
A dist
object.
Calculation
In the formulas below, x
and y
are two columns (samples) from counts
.
n
is the number of rows (OTUs) in counts
.
b = \displaystyle \frac{\sum_{i = 1}^{n} |x_i - y_i|}{\sum_{i = 1}^{n} x_i + y_i}
D = \displaystyle \frac{2b}{1 + b}
x <- c(4, 0, 3, 2, 6) y <- c(0, 8, 0, 0, 5) bray <- sum(abs(x-y)) / sum(x+y) 2 * bray / (1 + bray) #> 0.7826087
References
Jaccard P 1908. Nouvellesrecherches sur la distribution florale. Bulletin de la Societe Vaudoise des Sciences Naturelles, 44(163). doi:10.5169/seals-268384
See Also
Other beta_diversity:
bray_curtis()
,
canberra()
,
euclidean()
,
generalized_unifrac()
,
gower()
,
kulczynski()
,
manhattan()
,
unweighted_unifrac()
,
variance_adjusted_unifrac()
,
weighted_normalized_unifrac()
,
weighted_unifrac()
Examples
# Example counts matrix
ex_counts
# Jaccard weighted distance matrix
jaccard(ex_counts)
# Jaccard unweighted distance matrix
jaccard(ex_counts, weighted = FALSE)
# Only calculate distances for A vs all.
jaccard(ex_counts, pairs = 1:3)
Kulczynski
Description
Kulczynski beta diversity metric.
Usage
kulczynski(counts, weighted = TRUE, pairs = NULL, cpus = n_cpus())
Arguments
counts |
An OTU abundance matrix where each column is a sample, and
each row is an OTU. Any object coercible with |
weighted |
If |
pairs |
Which combinations of samples should distances be
calculated for? The default value ( |
cpus |
How many parallel processing threads should be used. The
default, |
Value
A dist
object.
Calculation
In the formulas below, x
and y
are two columns (samples) from counts
.
n
is the number of rows (OTUs) in counts
.
t = \displaystyle \sum_{i = 1}^{n} min(x_i,y_i)
D = \displaystyle 1 - 0.5(\frac{t}{\sum_{i = 1}^{n} x_i} + \frac{t}{\sum_{i = 1}^{n} y_i})
x <- c(4, 0, 3, 2, 6) y <- c(0, 8, 0, 0, 5) t <- sum(pmin(x,y)) 1 - (t/sum(x) + t/sum(y)) / 2 #> 0.6410256
References
Kulcynski S 1927. Die Pflanzenassoziationen der Pieninen. Bulletin International de l'Académie Polonaise des Sciences et des Lettres, Classe des Sciences Mathématiques et Naturelles, Série B: Sciences Naturelles.
See Also
Other beta_diversity:
bray_curtis()
,
canberra()
,
euclidean()
,
generalized_unifrac()
,
gower()
,
jaccard()
,
manhattan()
,
unweighted_unifrac()
,
variance_adjusted_unifrac()
,
weighted_normalized_unifrac()
,
weighted_unifrac()
Examples
# Example counts matrix
ex_counts
# Kulczynski weighted distance matrix
kulczynski(ex_counts)
# Kulczynski unweighted distance matrix
kulczynski(ex_counts, weighted = FALSE)
# Only calculate distances for A vs all.
kulczynski(ex_counts, pairs = 1:3)
Manhattan
Description
Manhattan beta diversity metric.
Usage
manhattan(counts, weighted = TRUE, pairs = NULL, cpus = n_cpus())
Arguments
counts |
An OTU abundance matrix where each column is a sample, and
each row is an OTU. Any object coercible with |
weighted |
If |
pairs |
Which combinations of samples should distances be
calculated for? The default value ( |
cpus |
How many parallel processing threads should be used. The
default, |
Value
A dist
object.
Calculation
In the formulas below, x
and y
are two columns (samples) from counts
.
n
is the number of rows (OTUs) in counts
.
D = \displaystyle \sum_{i = 1}^{n} |x_i - y_i|
x <- c(4, 0, 3, 2, 6) y <- c(0, 8, 0, 0, 5) sum(abs(x-y)) #> 18
References
Paul EB 2006. Manhattan distance. Dictionary of Algorithms and Data Structures. https://xlinux.nist.gov/dads/HTML/manhattanDistance.html
See Also
Other beta_diversity:
bray_curtis()
,
canberra()
,
euclidean()
,
generalized_unifrac()
,
gower()
,
jaccard()
,
kulczynski()
,
unweighted_unifrac()
,
variance_adjusted_unifrac()
,
weighted_normalized_unifrac()
,
weighted_unifrac()
Examples
# Example counts matrix
ex_counts
# Manhattan weighted distance matrix
manhattan(ex_counts)
# Manhattan unweighted distance matrix
manhattan(ex_counts, weighted = FALSE)
# Only calculate distances for A vs all.
manhattan(ex_counts, pairs = 1:3)
Number of CPU Cores
Description
A thin wrapper around
parallel::detectCores(all.tests = TRUE, logical = TRUE)
which falls back
to 1
when the number of CPU cores cannot be detected, or when the system
does not support pthreads
. Consider using parallely::availableCores()
in place of n_cpus()
for more advanced interrogation of system resources.
Usage
n_cpus()
Value
A scalar integer, guaranteed to be at least 1
.
Examples
n_cpus()
Rarefy OTU counts.
Description
Sub-sample OTU observations such that all samples have an equal number.
If called on data with non-integer abundances, values will be re-scaled to
integers between 1 and depth
such that they sum to depth
.
Usage
rarefy(
counts,
depth = 0.1,
n_samples = NULL,
seed = 0,
times = NULL,
cpus = n_cpus()
)
Arguments
counts |
An OTU abundance matrix where each column is a sample, and
each row is an OTU. Any object coercible with |
depth |
How many observations to keep per sample. When
|
n_samples |
The number of samples to keep. When |
seed |
An integer seed for randomizing which observations to keep or drop. If you need to create different random rarefactions of the same data, set the seed to a different number each time. |
times |
How many independent rarefactions to perform. If set,
|
cpus |
How many parallel processing threads should be used. The
default, |
Value
An integer matrix.
Examples
# Create an OTU matrix with 4 samples (A-D) and 5 OTUs.
counts <- matrix(
data = c(4,0,3,2,6,0,8,0,0,5,0,9,0,0,7,0,10,0,0,1),
nrow = 5,
dimnames = list(paste0('OTU', 1:5), LETTERS[1:4]) )
counts
colSums(counts)
counts <- rarefy(counts, depth = 14)
counts
colSums(counts)
Read a newick formatted phylogenetic tree.
Description
A phylogenetic tree is required for computing UniFrac distance matrices. You can load a tree from a file or by providing the tree string directly. This tree must be in Newick format, also known as parenthetic format and New Hampshire format.
Usage
read_tree(newick, underscores = FALSE)
Arguments
newick |
Input data as either a file path, URL, or Newick string. Compressed (gzip or bzip2) files are also supported. |
underscores |
If |
Value
A phylo
class object representing the tree.
Examples
tree <- read_tree("
(A:0.99,((B:0.87,C:0.89):0.51,(((D:0.16,(E:0.83,F:0.96)
:0.94):0.69,(G:0.92,(H:0.62,I:0.85):0.54):0.23):0.74,J:0.1
2):0.43):0.67);")
class(tree)
Shannon
Description
Shannon alpha diversity metric.
The index considers both the number of different OTUs (richness) and how
evenly the observations are distributed among those OTUs (evenness).
Usage
shannon(counts, cpus = n_cpus())
Arguments
counts |
An OTU abundance matrix where each column is a sample, and
each row is an OTU. Any object coercible with |
cpus |
How many parallel processing threads should be used. The
default, |
Value
A numeric vector.
Calculation
Pre-transformation: drop all OTUs with zero abundance.
In the formulas below, x
is a single column (sample) from counts
.
p_i
is the proportion of the i
-th OTU in the total community.
p_{i} = \displaystyle \frac{x_i}{\sum x}
D = \displaystyle -\sum_{i = 1}^{n} p_{i}\times\ln(p_{i})
x <- c(4, 0, 3, 2, 6)[-2] p <- x / sum(x) -sum(p * log(p)) #> 1.309526
References
Shannon CE, Weaver W 1949. The Mathematical Theory of Communication. University of Illinois Press.
See Also
Other alpha_diversity:
chao1()
,
faith()
,
inv_simpson()
,
simpson()
Examples
# Example counts matrix
ex_counts
# Shannon diversity values
shannon(ex_counts)
# Low diversity
shannon(c(100, 1, 1, 1, 1)) # 0.22
# High diversity
shannon(c(20, 20, 20, 20, 20)) # 1.61
# Low richness
shannon(1:3) # 1.01
# High richness
shannon(1:100) # 4.42
Simpson
Description
Simpson alpha diversity metric.
Gauges the uniformity of species within a community. A Simpson index of 0
indicates that one or a few high abundance OTUs dominate the community,
which is indicative of low diversity.
Usage
simpson(counts, cpus = n_cpus())
Arguments
counts |
An OTU abundance matrix where each column is a sample, and
each row is an OTU. Any object coercible with |
cpus |
How many parallel processing threads should be used. The
default, |
Value
A numeric vector.
Calculation
Pre-transformation: drop all OTUs with zero abundance.
In the formulas below, x
is a single column (sample) from counts
.
p
are the relative abundances.
p_{i} = \displaystyle \frac{x_i}{\sum x}
D = \displaystyle 1 - \sum_{i = 1}^{n} p_{i}\times\ln(p_{i})
x <- c(4, 0, 3, 2, 6)[-2] p <- x / sum(x) 1 - sum(p * log(p)) #> 2.309526
References
Simpson EH 1949. Measurement of diversity. Nature, 163. doi:10.1038/163688a0
See Also
Other alpha_diversity:
chao1()
,
faith()
,
inv_simpson()
,
shannon()
Examples
# Example counts matrix
ex_counts
# Simpson diversity values
simpson(ex_counts)
# Low diversity
simpson(c(100, 1, 1, 1, 1)) # 0.075
# High diversity
simpson(c(20, 20, 20, 20, 20)) # 0.8
# Low richness
simpson(1:3) # 0.61
# High richness
simpson(1:100) # 0.99
Unweighted UniFrac
Description
Unweighted UniFrac beta diversity metric.
Usage
unweighted_unifrac(counts, tree = NULL, pairs = NULL, cpus = n_cpus())
Arguments
counts |
An OTU abundance matrix where each column is a sample, and
each row is an OTU. Any object coercible with |
tree |
A |
pairs |
Which combinations of samples should distances be
calculated for? The default value ( |
cpus |
How many parallel processing threads should be used. The
default, |
Value
A dist
object.
Calculation
Given n
branches with lengths L
and a pair of samples'
abundances (A
and B
) on each of those branches:
D = \displaystyle \frac{\sum_{i = 1}^{n} L_i(|A_i - B_i|)}{\sum_{i = 1}^{n} L_i(max(A_i,B_i))}
Abundances in A
and B
are coded as 1
or 0
to indicate their
presence or absence, respectively, on each branch.
See https://cmmr.github.io/ecodive/articles/unifrac.html for details and a worked example.
References
Lozupone C, Knight R 2005. UniFrac: A new phylogenetic method for comparing microbial communities. Applied and Environmental Microbiology, 71(12). doi:10.1128/AEM.71.12.8228-8235.2005
See Also
Other beta_diversity:
bray_curtis()
,
canberra()
,
euclidean()
,
generalized_unifrac()
,
gower()
,
jaccard()
,
kulczynski()
,
manhattan()
,
variance_adjusted_unifrac()
,
weighted_normalized_unifrac()
,
weighted_unifrac()
Examples
# Example counts matrix
ex_counts
# Unweighted UniFrac distance matrix
unweighted_unifrac(ex_counts, tree = ex_tree)
# Only calculate distances for A vs all.
unweighted_unifrac(ex_counts, tree = ex_tree, pairs = 1:3)
Variance Adjusted UniFrac
Description
Variance Adjusted UniFrac beta diversity metric.
Usage
variance_adjusted_unifrac(counts, tree = NULL, pairs = NULL, cpus = n_cpus())
Arguments
counts |
An OTU abundance matrix where each column is a sample, and
each row is an OTU. Any object coercible with |
tree |
A |
pairs |
Which combinations of samples should distances be
calculated for? The default value ( |
cpus |
How many parallel processing threads should be used. The
default, |
Value
A dist
object.
Calculation
Given n
branches with lengths L
and a pair of samples'
abundances (A
and B
) on each of those branches:
D = \displaystyle \frac{\sum_{i = 1}^{n} L_i\displaystyle \frac{|\frac{A_i}{A_T} - \frac{B_i}{B_T}|}{\sqrt{(A_i + B_i)(A_T + B_T - A_i - B_i)}} }{\sum_{i = 1}^{n} L_i\displaystyle \frac{\frac{A_i}{A_T} + \frac{B_i}{B_T}}{\sqrt{(A_i + B_i)(A_T + B_T - A_i - B_i)}} }
See vignette('unifrac')
for details and a worked example.
References
Chang Q, Luan Y, Sun F 2011. Variance adjusted weighted UniFrac: a powerful beta diversity measure for comparing communities based on phylogeny. BMC Bioinformatics, 12. doi:10.1186/1471-2105-12-118
See Also
Other beta_diversity:
bray_curtis()
,
canberra()
,
euclidean()
,
generalized_unifrac()
,
gower()
,
jaccard()
,
kulczynski()
,
manhattan()
,
unweighted_unifrac()
,
weighted_normalized_unifrac()
,
weighted_unifrac()
Examples
# Example counts matrix
ex_counts
# Variance Adjusted UniFrac distance matrix
variance_adjusted_unifrac(ex_counts, tree = ex_tree)
# Only calculate distances for A vs all.
variance_adjusted_unifrac(ex_counts, tree = ex_tree, pairs = 1:3)
Normalized UniFrac
Description
Normalized UniFrac beta diversity metric.
Usage
weighted_normalized_unifrac(counts, tree = NULL, pairs = NULL, cpus = n_cpus())
Arguments
counts |
An OTU abundance matrix where each column is a sample, and
each row is an OTU. Any object coercible with |
tree |
A |
pairs |
Which combinations of samples should distances be
calculated for? The default value ( |
cpus |
How many parallel processing threads should be used. The
default, |
Value
A dist
object.
Calculation
Given n
branches with lengths L
and a pair of samples'
abundances (A
and B
) on each of those branches:
D = \displaystyle \frac{\sum_{i = 1}^{n} L_i|\frac{A_i}{A_T} - \frac{B_i}{B_T}|}{\sum_{i = 1}^{n} L_i(\frac{A_i}{A_T} + \frac{B_i}{B_T})}
See vignette('unifrac')
for details and a worked example.
References
Lozupone CA, Hamady M, Kelley ST, Knight R 2007.
Quantitative and Qualitative \beta
Diversity Measures Lead to Different Insights into Factors That Structure Microbial Communities.
Applied and Environmental Microbiology, 73(5).
doi:10.1128/AEM.01996-06
See Also
Other beta_diversity:
bray_curtis()
,
canberra()
,
euclidean()
,
generalized_unifrac()
,
gower()
,
jaccard()
,
kulczynski()
,
manhattan()
,
unweighted_unifrac()
,
variance_adjusted_unifrac()
,
weighted_unifrac()
Examples
# Example counts matrix
ex_counts
# UniFrac weighted distance matrix
weighted_normalized_unifrac(ex_counts, tree = ex_tree)
# Only calculate distances for A vs all.
weighted_normalized_unifrac(ex_counts, tree = ex_tree, pairs = 1:3)
Weighted UniFrac
Description
Weighted UniFrac beta diversity metric.
Usage
weighted_unifrac(counts, tree = NULL, pairs = NULL, cpus = n_cpus())
Arguments
counts |
An OTU abundance matrix where each column is a sample, and
each row is an OTU. Any object coercible with |
tree |
A |
pairs |
Which combinations of samples should distances be
calculated for? The default value ( |
cpus |
How many parallel processing threads should be used. The
default, |
Value
A dist
object.
Calculation
Given n
branches with lengths L
and a pair of samples'
abundances (A
and B
) on each of those branches:
D = \sum_{i = 1}^{n} L_i|\frac{A_i}{A_T} - \frac{B_i}{B_T}|
See vignette('unifrac')
for details and a worked example.
References
Lozupone CA, Hamady M, Kelley ST, Knight R 2007.
Quantitative and Qualitative \beta
Diversity Measures Lead to Different Insights into Factors That Structure Microbial Communities.
Applied and Environmental Microbiology, 73(5).
doi:10.1128/AEM.01996-06
See Also
Other beta_diversity:
bray_curtis()
,
canberra()
,
euclidean()
,
generalized_unifrac()
,
gower()
,
jaccard()
,
kulczynski()
,
manhattan()
,
unweighted_unifrac()
,
variance_adjusted_unifrac()
,
weighted_normalized_unifrac()
Examples
# Example counts matrix
ex_counts
# Weighted UniFrac distance matrix
weighted_unifrac(ex_counts, tree = ex_tree)
# Only calculate distances for A vs all.
weighted_unifrac(ex_counts, tree = ex_tree, pairs = 1:3)