| Title: | Distributed Multinomial Regression |
| Version: | 1.0.2 |
| Depends: | R (≥ 2.15), Matrix, gamlr, parallel, methods, stats |
| Suggests: | MASS, textir |
| Description: | Fast distributed/parallel estimation for multinomial logistic regression via Poisson factorization and the 'gamlr' package. For details see: Taddy (2015, AoAS), Distributed Multinomial Regression, <doi:10.48550/arXiv.1311.6139>. |
| Maintainer: | Nelson Rayl <nelsonrayl14@gmail.com> |
| License: | GPL-3 |
| URL: | https://github.com/TaddyLab/distrom |
| NeedsCompilation: | no |
| Packaged: | 2025-09-04 01:06:56 UTC; nelsonrayl |
| Author: | Matt Taddy [aut], Nelson Rayl [cre] |
| Repository: | CRAN |
| Date/Publication: | 2025-10-03 12:30:02 UTC |
Data checking and binning
Description
Collapses counts along equal levels of binned covariates.
Usage
collapse(v,counts,mu=NULL,bins=NULL)
Arguments
v |
Either matrix or |
counts |
Either matrix or |
mu |
Possible pre-specified fixed effects for |
bins |
The number of quantile bins into which we collapse |
Details
For each column of v, aggregates the observations into quantile bins, each represented by its average value. Both v and counts are then collapsed according to levels of the interaction across the implied bin-factors, and the number of observations in each bin is recorded as nbin. See the code of the dmr function for collapse used in practice.
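The binning idea described above can be sketched in base R. This is an illustration only, not the package's own implementation: the simulated data and the names binned, cell, counts_collapsed, v_collapsed, and nbin_sketch are hypothetical, and the exact quantile breaks used by collapse may differ.

```r
set.seed(1)
n <- 200
bins <- 4

# Simulated covariates and counts (assumed data, for illustration only)
v <- cbind(a = rnorm(n), b = runif(n))
counts <- matrix(rpois(n * 3, lambda = 2), n, 3,
                 dimnames = list(NULL, c("x1", "x2", "x3")))

# Assign each column of v to one of `bins` quantile bins
binned <- apply(v, 2, function(col) {
  breaks <- unique(quantile(col, probs = seq(0, 1, length.out = bins + 1)))
  cut(col, breaks = breaks, include.lowest = TRUE, labels = FALSE)
})

# One factor level per observed combination of bins across columns
cell <- interaction(as.data.frame(binned), drop = TRUE)

# Sum counts within each cell and record how many observations fell in it
counts_collapsed <- rowsum(counts, group = cell)
nbin_sketch <- as.vector(table(cell)[rownames(counts_collapsed)])

# Represent each cell by the average covariate value of its observations
v_collapsed <- rowsum(v, group = cell) / nbin_sketch

# Plug-in fixed effects on the collapsed data
mu <- log(rowSums(counts_collapsed))
```

Collapsing preserves the total counts while reducing the number of rows passed to each Poisson regression, which is the point of binning before a fit.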
Value
A list containing collapsed and formatted v, counts, and nbin, along with mu = log(rowSums(counts)), the plug-in fixed effect estimates for dmr.
Author(s)
Matt Taddy mataddy@gmail.com
Distributed Multinomial Regression
Description
Gamma-lasso path estimation for a multinomial logistic regression factorized into independent Poisson log regressions.
Usage
dmr(cl, covars, counts, mu=NULL, bins=NULL, verb=0, cv=FALSE, ...)
## S3 method for class 'dmr'
coef(object, ...)
## S3 method for class 'dmr'
predict(object, newdata,
type=c("link","response","class"), ...)
Arguments
cl |
A |
covars |
A dense |
counts |
A dense |
mu |
Pre-specified fixed effects for each observation in the Poisson regression linear equation. If |
bins |
Number of bins into which we will attempt to collapse each column of |
verb |
Whether to print some info. |
cv |
A flag for whether to use |
type |
For |
newdata |
A |
... |
Additional arguments to |
object |
A |
Details
dmr fits multinomial logistic regression by assuming that, unconditionally on the ‘size’ (total count across categories), each individual category count has been generated as a Poisson
x_{ij} \sim \mathrm{Po}(\exp[\mu_i + \alpha_j + v_i'\beta_j]).
By default we plug in the estimate \hat\mu_i = \log(m_i), where m_i = \sum_{j=1}^{p} x_{ij} and p is the dimension of x_i. Then each individual category is outsourced to Poisson regression in the gamlr package via the parLapply function of the parallel library. The output from dmr is a list of gamlr fitted models.
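The link between the independent Poisson model and the multinomial logit can be written out as follows. This is the standard Poisson-to-multinomial conditioning argument, written here with \beta_j for the coefficient vector of category j and \eta_{ij} as shorthand introduced for this sketch.

```latex
\begin{aligned}
&\text{Let } \lambda_{ij} = \exp[\mu_i + \eta_{ij}], \quad
  \eta_{ij} = \alpha_j + v_i'\beta_j, \quad
  \Lambda_i = \sum_{l=1}^{p} \lambda_{il}. \\
&\text{If } x_{ij} \sim \mathrm{Po}(\lambda_{ij}) \text{ independently, then }
  m_i = \sum_{j=1}^{p} x_{ij} \sim \mathrm{Po}(\Lambda_i), \text{ and} \\
&p(x_i \mid m_i)
  = \frac{\prod_{j} \lambda_{ij}^{x_{ij}} e^{-\lambda_{ij}} / x_{ij}!}
         {\Lambda_i^{m_i} e^{-\Lambda_i} / m_i!}
  = \frac{m_i!}{\prod_{j} x_{ij}!} \prod_{j} q_{ij}^{x_{ij}},
  \qquad
  q_{ij} = \frac{\lambda_{ij}}{\Lambda_i}
         = \frac{e^{\eta_{ij}}}{\sum_{l=1}^{p} e^{\eta_{il}}}.
\end{aligned}
```

The fixed effect \mu_i cancels in q_{ij}, so conditional on the total m_i the counts follow a multinomial logit in \alpha_j and \beta_j. Plugging in \hat\mu_i = \log(m_i) is what allows each category to be fit separately.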
coef.dmr builds a matrix of multinomial logistic regression coefficients from the length(object) list of gamlr fits. Default selection under cv=FALSE uses the AICc information criterion, computed on the Poisson deviance, for each individual response dimension (see gamlr). Combined coefficients across all dimensions are then returned as a dmrcoef S4-class object.
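As a rough illustration of AICc selection for a single response dimension, the sketch below scores a few nested Poisson regressions and keeps the one with the smallest AICc. It uses stats::glm as a stand-in for points along a gamlr regularization path, which is an assumption made for illustration; the helper aicc and the simulated data are hypothetical and not part of this package.

```r
# Corrected AIC: -2*logLik + 2*k*n/(n - k - 1), with k fitted parameters
aicc <- function(fit) {
  ll <- logLik(fit)
  k <- attr(ll, "df")
  n <- nobs(fit)
  as.numeric(ll) * -2 + 2 * k * n / (n - k - 1)
}

set.seed(2)
n <- 100
d <- data.frame(v1 = rnorm(n), v2 = rnorm(n), v3 = rnorm(n))
d$x <- rpois(n, exp(0.5 + 0.8 * d$v1))  # counts for one category

# Candidate models of increasing size (stand-ins for path segments)
fits <- list(
  glm(x ~ 1, data = d, family = poisson),
  glm(x ~ v1, data = d, family = poisson),
  glm(x ~ v1 + v2 + v3, data = d, family = poisson)
)

scores <- sapply(fits, aicc)
best <- which.min(scores)   # index of the AICc-selected candidate
coef(fits[[best]])
```

In dmr this kind of selection would happen independently for every column of counts, and the selected coefficient vectors would then be bound together column by column.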
predict.dmr takes either a dmr or dmrcoef object and returns predicted values for newdata on the scale defined by the type argument.
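The three prediction scales can be illustrated with a small base R sketch of what they would typically mean for a multinomial logit. This is not the package's code: the coefficient matrix B, the covariates V, and the assumption that the first row of B holds the intercepts are all hypothetical choices made for this example.

```r
set.seed(3)

# Hypothetical new covariates: 5 observations, 2 covariates
V <- matrix(rnorm(5 * 2), 5, 2, dimnames = list(NULL, c("v1", "v2")))

# Hypothetical coefficients: intercept row plus one row per covariate,
# one column per response category
B <- rbind(intercept = c(0.1, -0.2, 0.0),
           v1        = c(1.0,  0.0, -1.0),
           v2        = c(0.0,  0.5,  0.0))
colnames(B) <- c("a", "b", "c")

# "link" scale: linear predictor for every category
eta <- cbind(1, V) %*% B

# "response" scale: row-wise softmax (row maximum subtracted for stability)
eta_c <- eta - apply(eta, 1, max)
P <- exp(eta_c) / rowSums(exp(eta_c))

# "class" scale: most probable category for each observation
cls <- colnames(B)[max.col(P, ties.method = "first")]
```

Each row of P sums to one, and cls holds one category label per row of V.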
Value
dmr returns a dmr S3 object: an ncol(counts)-length list of fitted gamlr objects, with the added attributes nlambda, mu, and nobs.
Author(s)
Matt Taddy mataddy@gmail.com
References
Taddy (2015 AoAS) Distributed Multinomial Regression
Taddy (2017 JCGS) One-step Estimator Paths for Concave Regularization
Taddy (2013 JASA) Multinomial Inverse Regression for Text Analysis
See Also
dmrcoef-class, cv.gamlr, AICc, and the gamlr and textir packages.
Examples
library(distrom)
library(MASS)
data(fgl)
## make your cluster
## FORK is faster but memory heavy, and doesn't work on Windows.
cl <- makeCluster(2,type=ifelse(.Platform$OS.type=="unix","FORK","PSOCK"))
print(cl)
## fit in parallel
fits <- dmr(cl, fgl[,1:9], fgl$type, verb=1)
## it's good practice to stop the cluster once you're done
stopCluster(cl)
## Individual Poisson model fits and AICc selection
par(mfrow=c(3,2))
for (j in 1:6) {
  plot(fits[[j]])
  mtext(names(fits)[j], font=2, line=2)
}
## AICc model selection
B <- coef(fits)
## Fitted probability by true response
par(mfrow=c(1,1))
P <- predict(B, fgl[,1:9], type="response")
boxplot(P[cbind(1:214,fgl$type)]~fgl$type,
ylab="fitted prob of true class")
Class "dmrcoef"
Description
The extended Matrix class for output from coef.dmr.
Details
This is the class for a coefficient matrix from dmr regression; it inherits the Matrix class as defined in the Matrix library.
In particular, this is the ncol(covars) by ncol(counts) matrix of logistic regression coefficients chosen in coef.dmr from the regularization paths for each category.
Objects from the Class
Objects can be created only by a call to the coef.dmr function.
Slots
- i: From Matrix: the row indices.
- p: From Matrix: the column pointers.
- Dim: From Matrix: the dimensions.
- Dimnames: From Matrix: the list of labels.
- x: From Matrix: the nonzero entries.
- factors: From Matrix.
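The slot names listed here match the compressed sparse column layout used by the Matrix package. As a hedged illustration of what those slots hold, the sketch below builds a small sparse matrix with Matrix::sparseMatrix and inspects it. The matrix M and its labels are hypothetical, and it is an assumption that a dmrcoef object stores its entries the same way.

```r
library(Matrix)

# A 3 x 3 sparse matrix with three nonzero entries (hypothetical values)
M <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 3),
                  x = c(0.5, -1.2, 2.0), dims = c(3, 3),
                  dimnames = list(c("intercept", "v1", "v2"),
                                  c("a", "b", "c")))

M@i         # zero-based row index of each nonzero entry: 0 2 1
M@p         # column pointers into i and x: 0 2 2 3
M@x         # the nonzero values, column by column: 0.5 -1.2 2.0
M@Dim       # dimensions: 3 3
M@Dimnames  # row and column labels
```

Column b has no nonzero entries, which shows up as the repeated value in the column pointers rather than as stored zeros.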
Extends
Class Matrix, directly.
Methods
- predict
signature(object = "dmrcoef"): Prediction for a given dmrcoef matrix. Takes the same arguments as predict.dmr, but will be faster (since coef.dmr is called inside predict.dmr).
Author(s)
Matt Taddy mataddy@gmail.com
Examples
showClass("dmrcoef")