MOM

Produymna Ghose Majumdar, Sangbartta Banerjee

2025-08-01

Introduction

This package is mainly intended to make some basic statistical inference more convenient. In this package, we cover some basic estimation and testing of hypotheses. So far, we have added the method of moments, a simulated sampling interval, and the most powerful test by the Neyman-Pearson lemma. The package is very simple to use and easy for beginners. In this vignette, we discuss the theory behind the functions and utilities, along with examples in some cases.

Method of Moments:

Theory:

In this method of estimation, we equate the \(r\)th-order sample moment(s) with the \(r\)th-order population moment(s). In brief, a method-of-moments estimate of a vector-valued parameter \(\theta=(\theta_1,\theta_2,\theta_3,\ldots,\theta_s),\ s\ge1\), of a probability distribution \(f_\theta(x)\) is any solution of the moment equations \[E(X^r)=\frac1n \sum_ix_i^r\]

or \[E\left[(X-E(X))^r\right]=\frac1n\sum_i(x_i-\bar{x})^r\]

for \(r=1,2,3,\ldots\),

provided the expectations exist (at least up to the \(s\)th order) and a solution exists.

Using this underlying theory, we have built functions that compute the method-of-moments estimates for several popular distributions.
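For instance, for the exponential distribution with rate \(\lambda\), the first population moment is \(E(X)=1/\lambda\); equating it to the sample mean gives \(\hat\lambda=1/\bar{x}\). A minimal base-R illustration of this computation (done by hand, not with the package's own estimator functions):

# method-of-moments estimate of an exponential rate, done by hand
set.seed(42)
x <- rexp(1000, rate = 2)     # simulated data with true rate 2
lambda_hat <- 1 / mean(x)     # E(X) = 1/lambda  =>  lambda_hat = 1/mean(x)
lambda_hat                    # should be close to 2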

Warning:

The method requires a large sample size; otherwise, it may give inaccurate results.

Functions:

beta_est, binom_est, chisq_test, exp_est, gamma_est, geom_est, lnorm_est, logarithmic_est, nbinom_est, norm_est, pois_est

Utility:

These functions calculate the method-of-moments estimates of the parameters and can also plot a histogram of the data together with the density curve evaluated at the estimated parameters.

For more, see the help files.

Sampling Interval:

Motivation:

In practice, we may construct a statistic that is useful for drawing meaningful inferences, but whose sampling distribution is unknown. In such cases, it can be useful to know a sampling interval for the statistic. For that, we introduce a simulated sampling interval.

Function:

sim_sam_int

The function asks the user to specify a distribution from which a random sample is drawn and a function of the sample for which an approximate sampling interval is required. It then uses Monte Carlo simulation to provide an approximate sampling interval for that statistic.

Although this function is cruder than more sophisticated techniques for this problem, it might come in handy for a beginner.

By increasing the parameters n and sim.size, the user can get more accuracy, though it will take more time.
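Conceptually, the simulation behind sim_sam_int can be sketched in a few lines of base R. The snippet below is only an illustrative reimplementation for the sample mean under N(0,1), not the package's actual code:

# approximate 95% sampling interval of the sample mean under N(0, 1)
set.seed(42)
n        <- 100     # sample size
sim.size <- 1000    # number of simulated samples
stat <- replicate(sim.size, mean(rnorm(n, mean = 0, sd = 1)))
quantile(stat, probs = c(0.025, 0.975))   # two-sided 95% interval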

Example:

  • 95% Sampling interval of sample mean from standard normal
library(MOM)
sim_sam_int(dist="normal",pop.par=c(0,1),FUN=mean,side="both")
#> Sampling Interval for Statistic 
#> 
#> Population Distribution:  normal 
#> Parameter:  0 1 
#> Sample Size:  100 
#> Confidence Coefficient:  0.95 
#> Sampling Interval:  -0.1938635 0.1804689
  • 99% Sampling interval of sample sum from Bin(5,0.5) with sample size 1000 and more accuracy
sim_sam_int(dist="binomial",pop.par=c(5,0.5),FUN=sum,side="both",conf.coeff=0.99,n=1000,sim.size=2000)
#> Sampling Interval for Statistic 
#> 
#> Population Distribution:  binomial 
#> Parameter:  5 0.5 
#> Sample Size:  1000 
#> Confidence Coefficient:  0.99 
#> Sampling Interval:  2421.828 2582.099

You can also create a custom function using sample data and obtain the interval.

  • The sampling interval (with upper bound only) of the minimum order statistic for Uniform(0,1) distribution
ord <- function(x) min(x)
sim_sam_int(dist="unif",pop.par=c(0,1),FUN=ord,side="upper")
#> Sampling Interval for Statistic 
#> 
#> Population Distribution:  uniform 
#> Parameter:  0 1 
#> Sample Size:  100 
#> Confidence Coefficient:  0.95 
#> Sampling Interval:  -Inf 0.02733347

Most Powerful Test:

Theory:

Two renowned statisticians, Jerzy Neyman and Egon Pearson, proposed the Neyman-Pearson lemma, which is used to determine the most powerful test (MP test). The primary condition for applying this criterion is that the hypotheses must be simple versus simple.

Suppose we have a sample \(x=(x_1,x_2,x_3,...,x_n)\). We want to test whether the data comes from the density \(f_\theta(.)\) or \(g_\theta(.)\).

So, the null hypothesis is \(H_0\): The data comes from \(f_\theta(.)\)

vs the alternative hypothesis \(H_1\): The data comes from \(g_\theta(.)\).

So, the MP critical region is given by: \[W=\left\{x:\frac{f_{H_1}(x)}{f_{H_0}(x)} >k\right\} \]

We want a \(k\) such that \(P_{H_0}(W)=\alpha\), where \(\alpha\) is the level of significance of the test.

Now we will use a simulation to choose the value of \(k\).

And the power of the test is given by \(P_{H_1}(W)\).

Function:

sim_mp_test

This function takes the sample and the user's choice of null and alternative distributions. It then generates samples from the null distribution, calculates their likelihood ratios, and takes the 95th percentile of those ratios as \(k\).

It then calculates the likelihood ratio for the user-given sample; if it exceeds \(k\), the null hypothesis is rejected, otherwise it is not. If required, the function can also calculate the power of the test, which is the maximum power that can be obtained in that case.
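To make the procedure concrete, here is a rough base-R sketch for the Normal(5,1) versus Cauchy(5,1) case of the first example below; it imitates the idea behind sim_mp_test but is not the package's actual implementation:

set.seed(42)
x <- c(2.5, 5.8, 8.5, 3.6, 6.7)   # observed sample
n <- length(x); sim.size <- 1000

# log likelihood ratio log(f_H1 / f_H0) of a sample s
llr <- function(s) sum(dcauchy(s, location = 5, scale = 1, log = TRUE)) -
  sum(dnorm(s, mean = 5, sd = 1, log = TRUE))

# simulate the ratio under H0 and take its 95th percentile as log(k)
null.llr <- replicate(sim.size, llr(rnorm(n, mean = 5, sd = 1)))
log.k <- quantile(null.llr, probs = 0.95)

llr(x) > log.k   # TRUE means reject the null hypothesis

# approximate power: proportion of H1 samples falling in the critical region
alt.llr <- replicate(sim.size, llr(rcauchy(n, location = 5, scale = 1)))
mean(alt.llr > log.k)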

Example:

  • Test whether the data comes from Normal(5,1) or Cauchy(5,1)
sim_mp_test(c(2.5,5.8,8.5,3.6,6.7),null.dist="normal",null.par=c(5,1),alter.dist="cauchy",alter.par=c(5,1))
#> The MP Test by NP Lemma 
#> 
#> Data:  2.5 5.8 8.5 3.6 6.7 
#> Null Distribution:  normal 
#> Null Parameter:  5 1 
#> Alternative Distribution:  cauchy 
#> Alternative Parameter:  5 1 
#> Sample Size:  5 
#> Significance Level:  0.95 
#> Decision:  Reject null hypothesis 
#> Power:  1
  • Test the data to see whether it comes from Uniform(0.5,1) or Uniform(0,1) at 90% level
sim_mp_test(runif(100),null.dist="uniform",null.par=c(0.5,1),alter.dist="uniform",alter.par=c(0,1),test.level=0.9,sim.size=10)
#> The MP Test by NP Lemma 
#> 
#> Data:  0.373562 0.2738595 0.6113875 0.1778065 0.467409 0.5865435 
#> Null Distribution:  uniform 
#> Null Parameter:  0.5 1 
#> Alternative Distribution:  uniform 
#> Alternative Parameter:  0 1 
#> Sample Size:  100 
#> Significance Level:  0.9 
#> Decision:  Reject null hypothesis 
#> Power:  1
  • Test whether the data comes from Poisson(2) or Poisson(3) at the 95% level
sim_mp_test(c(2,1,3,0,2),null.dist="pois",null.par=c(2),alter.dist="pois",alter.par=c(3))
#> The MP Test by NP Lemma 
#> 
#> Data:  2 1 3 0 2 
#> Null Distribution:  poisson 
#> Null Parameter:  2 
#> Alternative Distribution:  poisson 
#> Alternative Parameter:  3 
#> Sample Size:  5 
#> Significance Level:  0.95 
#> Decision:  Reject null hypothesis 
#> Power:  1

Warning:

Increasing sim.size gives more accuracy, but increasing it too much in the discrete case may lead to failure of the test. In the discrete case, the power may not be accurate.

Conclusion:

We are continuing to develop the package; it is currently very basic and aimed at beginners. We have kept lay users in mind, so we want to keep things as simple and user-friendly as possible. We plan to add more features to make it a more sophisticated tool for estimation and hypothesis testing. If you find any mistakes or have any suggestions, you can reach out by email (addresses are given in the package description).