Maximum likelihood estimator

From DDWiki

Maximum likelihood estimation (MLE) is a statistical method for fitting the parameters of an assumed functional form in a probabilistic model to observed data; the resulting estimate is called the maximum likelihood estimator.

The maximum likelihood approach is commonly used to fit simple discrete choice models such as the logit model; however, it can be impractical for fitting discrete choice models with greater complexity, and Bayesian estimation is typically called upon for such cases.

Likelihood

The likelihood measures how well a specific distribution function, at given parameter values, fits the observed sample data. Assume the probability density function of the distribution is f(y; θ), indexed by the parameter θ, and the observed data set has values y1, y2,...,yN from independent observations. The likelihood function is the joint density of the sample viewed as a function of θ, and under independence it factors into a product:

 L(\theta) = f(y_1, \ldots, y_N ; \theta) = \prod_{i=1}^{N} f(y_i ; \theta)


Log likelihood

For mathematical convenience, the log likelihood is usually used in place of the likelihood itself. The logarithm is a monotonic transformation, so the log likelihood reaches its maximum at the same θ as the likelihood. It also turns the product into a sum, which simplifies computation and greatly reduces numerical underflow and roundoff error.

 LL(\theta) = \log L(\theta) = \sum_{i=1}^{N} LL_i(\theta), \quad LL_i(\theta) = \log f(y_i ; \theta)
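
As a rough illustration of the numerical point, the sketch below evaluates both the product form and the sum-of-logs form for a standard normal density on a made-up data set of 2000 values. The density, the data, and the function names are assumptions chosen for the example and do not come from any particular model.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def likelihood(data, mu, sigma):
    """Product of the individual densities (the likelihood itself)."""
    value = 1.0
    for y in data:
        value *= normal_pdf(y, mu, sigma)
    return value

def log_likelihood(data, mu, sigma):
    """Sum of the individual log densities (the log likelihood)."""
    return sum(math.log(normal_pdf(y, mu, sigma)) for y in data)

# Made-up sample: 2000 values cycling through -1.5, -1.0, ..., 1.5
data = [((i % 7) - 3) * 0.5 for i in range(2000)]

print(likelihood(data, 0.0, 1.0))      # raw product over 2000 terms
print(log_likelihood(data, 0.0, 1.0))  # sum of logs over the same terms
```

With this many observations the raw product underflows to zero in double precision, while the sum of logs stays an ordinary finite number. On a short sample the logarithm of the product and the sum of logs agree.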


Maximum likelihood estimator

The basic idea of MLE is to apply optimization techniques, treating the distribution parameter θ as the decision variable and the log likelihood as the objective function. The value of θ that maximizes the log likelihood is the one under which the assumed distribution fits the sample data best.

Near its maximum, the log likelihood is generally approximately quadratic as a function of the distribution parameters. The maximization can therefore be carried out with unconstrained nonlinear programming algorithms that use first-order gradient information, such as the steepest ascent method (steepest descent applied to the negative log likelihood) and the BHHH and BFGS algorithms.
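
A minimal sketch of gradient-based maximization, assuming an exponential model f(y; λ) = λ exp(-λy) for a made-up sample of durations. The model, the data, the fixed step size, and the function name are all assumptions for illustration; in practice a library implementation of BFGS or BHHH would normally be used instead of this hand-written steepest ascent loop. The rate is written as λ = exp(θ) so that the search over θ is unconstrained.

```python
import math

def fit_exponential_rate(durations, step=0.5, tol=1e-10, max_iter=1000):
    """Steepest ascent on the average log likelihood of an exponential model.

    The rate is parameterized as lam = exp(theta), so theta is unconstrained.
    """
    ybar = sum(durations) / len(durations)
    theta = 0.0  # arbitrary starting point (lam = 1)
    for _ in range(max_iter):
        # Average log likelihood is theta - exp(theta) * ybar;
        # its derivative with respect to theta is the gradient below.
        gradient = 1.0 - math.exp(theta) * ybar
        if abs(gradient) < tol:
            break  # first-order condition satisfied
        theta += step * gradient  # move uphill along the gradient
    return math.exp(theta)

durations = [0.8, 2.5, 1.1, 3.9, 0.4, 2.2, 1.6, 3.5]  # made-up sample
lam_hat = fit_exponential_rate(durations)
closed_form = len(durations) / sum(durations)  # analytic MLE: 1 / sample mean

print(lam_hat, closed_form)
```

For this model the maximizer is known in closed form (the reciprocal of the sample mean), which gives a check on the iterative result. The fixed step size works for this sample but is not a general recommendation.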


Asymptotic properties

  • Consistency
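The expected value of the log likelihood is maximized at the true parameter value θ0. This is the key condition behind consistency: as the sample size grows, the estimator converges in probability to θ0. The mathematical expression is:
 \mathbf E_0[(1/N) \log L(\theta_0)] > \mathbf E_0[(1/N) \log L(\theta)] \quad \text{for any } \theta \neq \theta_0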
  • Asymptotic normality
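At the maximum likelihood estimate, the gradient of the log likelihood equals zero: g(θ̂) = 0. The asymptotic distribution of the maximum likelihood estimator, where I(θ0) is the Fisher information evaluated at the true parameter value, is:
 \hat{\theta} \overset{a}{\sim} N[\theta_0,(I(\theta_0))^{-1}]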
  • Asymptotic efficiency
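The Cramér-Rao lower bound limits the variance of any unbiased estimator of θ. The maximum likelihood estimator attains this bound asymptotically, with asymptotic variance (I(θ0))^{-1}.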
  • Invariance
The MLE is invariant to one-to-one transformations of θ: if θ̂ is the MLE of θ and c(θ) is a one-to-one function, then the MLE of c(θ) is c(θ̂).
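
The asymptotic normality and efficiency properties can be checked informally by simulation. The sketch below assumes an exponential model with a true rate of 1, for which the MLE of the rate is the reciprocal of the sample mean and the inverse Fisher information for N observations is rate^2 / N. The model, sample sizes, seed, and function name are assumptions chosen for the example.

```python
import random
import statistics

def simulate_mle(true_rate, n_obs, n_reps, seed=0):
    """Draw n_reps samples of size n_obs and return the MLE from each one."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    estimates = []
    for _ in range(n_reps):
        sample = [rng.expovariate(true_rate) for _ in range(n_obs)]
        estimates.append(n_obs / sum(sample))  # MLE of the rate: 1 / sample mean
    return estimates

true_rate, n_obs = 1.0, 400
estimates = simulate_mle(true_rate, n_obs, n_reps=2000)

empirical_mean = statistics.mean(estimates)
empirical_var = statistics.variance(estimates)
# Inverse Fisher information for n_obs exponential observations
asymptotic_var = true_rate ** 2 / n_obs

print(empirical_mean, empirical_var, asymptotic_var)
```

Across replications the estimates should centre near the true rate, and their sample variance should be close to the inverse Fisher information, which is the variance the asymptotic distribution predicts.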