# Maximum Likelihood Estimation

Every observation contains some degree of noise, which makes our measurements uncertain. When we draw conclusions from noisy observations, we have to separate the dynamics of the signal from the noise. This is the point where estimation starts. Any time we analyse noisy observations to make decisions, we are estimating parameters. Parameters are mainly used to simplify the description of a dynamic system.

Noise, by definition, is a sequence of data whose dynamics we cannot formulate. This is why every model includes a residual term, commonly assumed Gaussian, that is separated from the deterministic dynamics.

The main question that arises, then, is how accurately we can estimate the parameters when the dynamics are polluted by noise. The best solution means the value of the parameter that is optimal in some well-defined sense.

To answer this question, we first have to find a measure that reflects the amount of noise in the parameter estimate. The next step is to minimize this measure, which is proportional to the estimation error.

Maximum likelihood is a particularly sensitive measure of the effect of noise. In ML estimation, we maximize the probability of the observations given the parameter value. Put another way, ML selects the parameter value under which the observed data are most probable.

The likelihood function that we maximize in ML is the probability (density) of the data, viewed as a function of the parameters. The main challenge in ML is finding this distribution. Once we have it, we can find the optimal values by setting its derivatives with respect to the parameters equal to zero.
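As a minimal sketch of this zero-derivative step, consider Gaussian samples with unknown mean: differentiating the log-likelihood with respect to the mean and setting it to zero gives the sample mean as the closed-form ML estimate. The numbers below (true mean 3.0, standard deviation 1.5) are illustrative assumptions, not from the article:

```python
import numpy as np

# Hypothetical example: ML estimate of the mean of Gaussian samples.
# For x_i ~ N(mu, sigma^2), setting d/dmu of the log-likelihood to zero
# yields mu_hat = (1/N) * sum(x_i), i.e. the sample mean.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.5, size=10_000)  # noisy observations

mu_hat = x.mean()  # closed-form ML solution from the zero-derivative condition
print(mu_hat)      # close to the true mean 3.0
```

The same pattern applies whenever the likelihood has a tractable derivative; otherwise the maximization is done numerically.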

In most cases, the likelihood function is Gaussian, which in turn reduces the problem to mean-squared-error minimization. The solution is then the least-squares solution, proposed long ago by the great mathematician Carl Friedrich Gauss.
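To illustrate this equivalence, the sketch below fits a line to data corrupted by Gaussian noise: maximizing the Gaussian likelihood of the residuals is the same as minimizing the squared error, so the ML fit is the least-squares fit. The model `y = a*t + b` and its coefficients are illustrative assumptions:

```python
import numpy as np

# With Gaussian noise, maximizing the likelihood of y = a*t + b + noise
# is equivalent to minimizing the squared error, so the ML estimate is
# the least-squares solution.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 200)
y = 2.0 * t + 0.5 + rng.normal(scale=0.1, size=t.size)  # noisy line

A = np.column_stack([t, np.ones_like(t)])            # design matrix
(a_hat, b_hat), *_ = np.linalg.lstsq(A, y, rcond=None)
print(a_hat, b_hat)  # close to the true values 2.0 and 0.5
```

Here `np.linalg.lstsq` solves the least-squares problem directly; under the Gaussian-noise assumption this is exactly the ML estimate.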

In other cases, we simplify the joint distribution by assuming that the observations are IID (independent and identically distributed). The joint density is then the product of the marginal PDFs.
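Under the IID assumption, taking the log turns that product of marginals into a sum, which is what we maximize in practice. The sketch below does this numerically for Gaussian samples with unknown mean and standard deviation, using a coarse grid search purely for illustration (the true values 1.0 and 2.0 are assumed, not from the article):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=1.0, scale=2.0, size=2_000)  # IID observations

def gaussian_loglik(x, mu, sigma):
    # Sum of log marginal N(mu, sigma^2) densities == log of the joint IID density.
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2))

# Coarse grid search over (mu, sigma) just to illustrate the maximization.
mus = np.linspace(0.0, 2.0, 101)
sigmas = np.linspace(1.0, 3.0, 101)
ll = np.array([[gaussian_loglik(x, m, s) for s in sigmas] for m in mus])
i, j = np.unravel_index(ll.argmax(), ll.shape)
mu_hat, sigma_hat = mus[i], sigmas[j]
print(mu_hat, sigma_hat)  # near the true values 1.0 and 2.0
```

In real problems one would use a numerical optimizer rather than a grid, but the objective, the sum of log marginal densities, is the same.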

Generally, the main bottleneck in ML is assigning a joint PDF to the observations and then maximizing it with respect to the parameters.
