This package is designed to make some common statistical inference tasks more convenient. It covers basic estimation and hypothesis testing. So far, it includes method of moments estimation, simulated sampling intervals, and the most powerful test given by the Neyman-Pearson lemma. The package is intended to be simple enough for beginners to use. This vignette discusses the theory behind the functions and utilities, with examples in some cases.
In this method of estimation, we equate the \(r\)th sample moment with the \(r\)th order population moment. In brief, the method of moments estimate of a vector-valued parameter \(\theta=(\theta_1,\theta_2,\theta_3,...,\theta_s), s\ge1\), of a particular probability distribution \(f_\theta(x)\) is any solution to the moment equations \[E(X^r)=\frac1n \sum_ix_i^r\]
or \[E(X-E(X))^r=\frac1n\sum_i(x_i-\bar{x})^r\]
for \(r=1,2,3,\ldots\) (typically the first \(s\) orders),
provided the expectations exist (at least up to the \(s\)th order) and a solution exists.
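As a small illustration of the moment equations, the sketch below works through the gamma distribution in base R. It equates the first raw moment and the second central moment with \(E(X)=\text{shape}/\text{rate}\) and \(Var(X)=\text{shape}/\text{rate}^2\). The helper `mom_gamma` is a hypothetical name used only here; it is not one of the package's functions and may differ from how they are implemented.

```r
# Method of moments sketch for a gamma(shape, rate) sample, base R only.
# mom_gamma is a hypothetical helper, not a function from the package.
mom_gamma <- function(x) {
  m1  <- mean(x)              # first raw sample moment
  m2c <- mean((x - m1)^2)     # second central sample moment (divisor n)
  # Population moments: E(X) = shape / rate, Var(X) = shape / rate^2
  rate  <- m1 / m2c
  shape <- m1^2 / m2c
  c(shape = shape, rate = rate)
}

set.seed(1)
x <- rgamma(50000, shape = 3, rate = 2)   # simulated data with known parameters
est <- mom_gamma(x)
print(est)
```

With a sample this large, the estimates should land close to the true values of 3 and 2.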
Using this underlying theory, we have written functions that compute the method of moments estimates for several popular distributions.
The method requires a large sample; otherwise, the estimates may be inaccurate.
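The effect of sample size can be seen in a quick simulation. The sketch below uses the exponential distribution, where \(E(X)=1/\text{rate}\) gives the moment estimate \(1/\bar{x}\), and compares the average absolute error at two sample sizes. The names `exp_rate_mom` and `mean_abs_error` are hypothetical helpers for this illustration, and the sample sizes and replication count are arbitrary choices.

```r
# Moment estimate of the exponential rate: solve E(X) = 1 / rate for rate.
# exp_rate_mom and mean_abs_error are hypothetical helpers, not package functions.
exp_rate_mom <- function(x) 1 / mean(x)

set.seed(42)
true_rate <- 2

# Average absolute estimation error over repeated samples of size n
mean_abs_error <- function(n, reps = 500) {
  est <- replicate(reps, exp_rate_mom(rexp(n, rate = true_rate)))
  mean(abs(est - true_rate))
}

err_small <- mean_abs_error(10)
err_large <- mean_abs_error(1000)
print(c(n10 = err_small, n1000 = err_large))
```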
beta_est, binom_est, chisq_test, exp_est, gamma_est, geom_est, lnorm_est, logarithmic_est, nbinom_est, norm_est, pois_est
These functions calculate the method of moments estimators of the parameters, and can also plot a histogram of the data with the density curve for the estimated parameters.
For more, see the help files.
In practice, we may construct a statistic that is useful for drawing meaningful inferences, but whose sampling distribution is unknown. In that case, it may be useful to know a sampling interval for the statistic. For that, we introduce a simulated sampling interval.
sim_sam_int
The function asks the user to specify a distribution from which a random sample is drawn and a function of the random variables for which an approximate sampling interval is required. It then uses Monte Carlo simulation to provide an approximate sampling interval for the statistic.
Although this function is inferior to other sophisticated techniques for dealing with this problem, it might come in handy for a beginner.
By increasing the parameters n and sim.size, the user can get more accuracy, though it will take more time.
library(MOM)
sim_sam_int(dist="normal",pop.par=c(0,1),FUN=mean,side="both")
#> Sampling Interval for Statistic
#>
#> Population Distribution: normal
#> Parameter: 0 1
#> Sample Size: 100
#> Confidence Coefficient: 0.95
#> Sampling Interval: -0.1938635 0.1804689
sim_sam_int(dist="binomial",pop.par=c(5,0.5),FUN=sum,side="both",conf.coeff=0.99,n=1000,sim.size=2000)
#> Sampling Interval for Statistic
#>
#> Population Distribution: binomial
#> Parameter: 5 0.5
#> Sample Size: 1000
#> Confidence Coefficient: 0.99
#> Sampling Interval: 2421.828 2582.099
You can also define a custom function of the sample data, pass it as FUN, and obtain the interval.
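The Monte Carlo idea can be sketched in base R for a custom statistic. The helper `sim_interval` below is hypothetical and is not the package's implementation. It assumes a two-sided interval taken from the empirical quantiles of the simulated statistics, and its argument names and the default of 1000 simulations are illustrative choices.

```r
# Monte Carlo sampling interval sketch, base R only.
# sim_interval is a hypothetical helper; the package may work differently.
sim_interval <- function(rdist, stat, n = 100, sim.size = 1000, conf.coeff = 0.95) {
  # Draw sim.size samples of size n and evaluate the statistic on each
  stats <- replicate(sim.size, stat(rdist(n)))
  alpha <- 1 - conf.coeff
  # Two-sided interval from the empirical quantiles
  unname(quantile(stats, probs = c(alpha / 2, 1 - alpha / 2)))
}

# Custom statistic: the sample coefficient of variation
cv <- function(x) sd(x) / mean(x)

set.seed(123)
int <- sim_interval(function(n) rnorm(n, mean = 10, sd = 2), cv)
print(int)
```

Since the population coefficient of variation here is 0.2, the interval should sit around that value.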
Two renowned statisticians, Jerzy Neyman and Egon Pearson, proposed the “Neyman-Pearson Lemma”, which is used to determine the most powerful test (MP test). The primary condition for applying it is that both the null and the alternative hypothesis must be simple (simple vs simple).
Suppose we have a sample \(x=(x_1,x_2,x_3,...,x_n)\). We want to test whether the data comes from the density \(f_\theta(.)\) or \(g_\theta(.)\).
So, the null hypothesis is \(H_0:\)The data comes from \(f_\theta(.)\)
vs the alternative hypothesis \(H_1\):The data comes from \(g_\theta(.)\)
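So, the MP critical region is given by: \[W=\left\{x:\frac{g_\theta(x)}{f_\theta(x)} >k\right\} \] where the ratio is the likelihood of the sample under \(H_1\) divided by its likelihood under \(H_0\).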
We want a k such that \(P_{H_0}(W)=\alpha\) , where \(\alpha\) is the level of significance of the test.
We use simulation to choose the value of \(k\).
The power of the test is then given by \(P_{H_1}(W)\).
sim_mp_test
This function takes the sample and the choice of null and alternative distributions. It then generates samples from the null distribution, calculates the likelihood ratio for each, and takes the 95th percentile of these values as \(k\).
It then calculates the likelihood ratio for the user-supplied sample; if it exceeds \(k\), the null hypothesis is rejected, otherwise it is not. If required, the function can also calculate the power of the test, which is the maximum power attainable in that case.
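The procedure described above can be sketched in base R as follows. This is only an illustration of the idea under stated assumptions, not the package's code: `np_sim_test` and its arguments are hypothetical, the log likelihood ratio is used in place of the ratio itself for numerical stability (the ordering is unchanged), and the power is estimated by simulating from the alternative.

```r
# Simulated Neyman-Pearson test sketch, base R only.
# np_sim_test is a hypothetical helper, not the package's sim_mp_test.
np_sim_test <- function(x, dnull, dalt, rnull, ralt, alpha = 0.05, sim.size = 5000) {
  # log of (likelihood under H1 / likelihood under H0) for one sample
  log_lr <- function(s) sum(dalt(s, log = TRUE)) - sum(dnull(s, log = TRUE))
  n <- length(x)
  # Cutoff: upper alpha quantile of the ratio simulated under H0
  lr_null <- replicate(sim.size, log_lr(rnull(n)))
  k <- unname(quantile(lr_null, 1 - alpha))
  # Power: share of samples simulated under H1 that fall in the critical region
  lr_alt <- replicate(sim.size, log_lr(ralt(n)))
  list(reject = log_lr(x) > k, log_k = k, power = mean(lr_alt > k))
}

set.seed(2024)
x <- c(2.5, 5.8, 8.5, 3.6, 6.7)
res <- np_sim_test(
  x,
  dnull = function(s, log) dnorm(s, mean = 5, sd = 1, log = log),
  dalt  = function(s, log) dcauchy(s, location = 5, scale = 1, log = log),
  rnull = function(n) rnorm(n, mean = 5, sd = 1),
  ralt  = function(n) rcauchy(n, location = 5, scale = 1)
)
print(res)
```

Passing the density and random-generation functions as arguments keeps the sketch independent of any particular pair of distributions.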
sim_mp_test(c(2.5,5.8,8.5,3.6,6.7),null.dist="normal",null.par=c(5,1),alter.dist="cauchy",alter.par=c(5,1))
#> The MP Test by NP Lemma
#>
#> Data: 2.5 5.8 8.5 3.6 6.7
#> Null Distribution: normal
#> Null Parameter: 5 1
#> Alternative Distribution: cauchy
#> Alternative Parameter: 5 1
#> Sample Size: 5
#> Significance Level: 0.95
#> Decision: Reject null hypothesis
#> Power: 1
sim_mp_test(runif(100),null.dist="uniform",null.par=c(0.5,1),alter.dist="uniform",alter.par=c(0,1),test.level=0.9,sim.size=10)
#> The MP Test by NP Lemma
#>
#> Data: 0.373562 0.2738595 0.6113875 0.1778065 0.467409 0.5865435
#> Null Distribution: uniform
#> Null Parameter: 0.5 1
#> Alternative Distribution: uniform
#> Alternative Parameter: 0 1
#> Sample Size: 100
#> Significance Level: 0.9
#> Decision: Reject null hypothesis
#> Power: 1
sim_mp_test(c(2,1,3,0,2),null.dist="pois",null.par=c(2),alter.dist="pois",alter.par=c(3))
#> The MP Test by NP Lemma
#>
#> Data: 2 1 3 0 2
#> Null Distribution: poisson
#> Null Parameter: 2
#> Alternative Distribution: poisson
#> Alternative Parameter: 3
#> Sample Size: 5
#> Significance Level: 0.95
#> Decision: Reject null hypothesis
#> Power: 1
Increasing sim.size gives more accuracy, but increasing it too far in the discrete case may cause the test to fail. In the discrete case, the reported power may also be inaccurate.
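One general reason discrete cases are awkward is that the likelihood ratio takes only a limited set of values, so a cutoff may not hit the intended level exactly. The base R sketch below illustrates this with exact Poisson probabilities rather than simulation. It is our own illustration, not a description of the package's internals, and it relies on the fact that for a Poisson null against a larger rate the likelihood ratio increases with the sample total. The sample size and null rate are arbitrary choices.

```r
# Attainable test sizes in a discrete case, using exact Poisson probabilities.
# Setup (illustrative): n = 5 observations, H0 rate 1, H1 any larger rate.
n <- 5
lambda0 <- 1
alpha <- 0.05

# Under H0 the sample total is Poisson(n * lambda0), and the likelihood
# ratio is an increasing function of that total.
crit <- qpois(1 - alpha, lambda = n * lambda0)     # 95th percentile of the null total
size_strict <- 1 - ppois(crit, n * lambda0)        # P(total >  crit) under H0
size_weak   <- 1 - ppois(crit - 1, n * lambda0)    # P(total >= crit) under H0
print(c(crit = crit, size_strict = size_strict, size_weak = size_weak))
```

Neither rejection rule has size exactly 0.05: one falls below the level and the other above it.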
We are still developing the package; at present it is very basic and aimed at beginners. We have kept lay users in mind, so we want to keep things as simple and user-friendly as possible. We plan to add more features to make it a more capable tool for estimation and hypothesis testing. If you find any mistakes or have any suggestions, you can reach out by email (IDs are given in the package description).