
\subsection{State Estimation}
\label{sec:estimation}
% 1/2 bis 3/4 Seite
% particles describe posterior as samples
% (MAP) estimate => max particle
%   very jumpy
% MMSE estimate => weighted average
%   most of the time very good
%   goes out of the window with multi modalities
% estimation of the pdf could help
% computational cheap methods are based on a parametric family
% not neccesserly given in our case
% non parametric => slow
% solution boxKDE
% Problems: larger error compared to WA and bandwidth selection


Each particle is a realization of one possible system state, here the position of a pedestrian within a building.
The set of all particles represents the posterior of the system.
In other words, the particle filter naturally generates a sample-based representation of the posterior.
With this representation, a point estimator can be applied directly to the sample data to derive a sample statistic serving as a \qq{best guess}.

A popular point estimate, which can be directly obtained from the sample set, is the minimum mean squared error (MMSE) estimate.
In the case of particle filters, the MMSE estimate equals the weighted average over all samples, \ie{} the sample mean
\begin{equation}
    \hat{\mStateVec}_t := \frac{1}{W_t} \sum_{i=1}^{N} w^i_t \vec{X}^i_{t} \, \text{,}
\end{equation}
%\commentByMarkus{Is the notation OK like this?}
%\commentByFrank{at first glance this looks to me like a correct weighted average of all particles. what bothers you?}
where $W_t=\sum_{i=1}^{N}w^i_t$ is the sum of all weights.
While the MMSE estimate produces good results in many situations, it fails when the posterior is multimodal.
In these situations the weighted average lies somewhere between the modes.
\del{Clearly}\add{It is expected that} such a position between modes is extremely unlikely to be the position of the pedestrian.
The real position is more likely to be found at the position of one of the modes, but virtually never somewhere between.
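
As a minimal illustration (the numbers are chosen for this example only and are not measured data), consider a one-dimensional posterior represented by $N=2$ particles with equal weights $w^1_t = w^2_t = 0.5$, located at $\vec{X}^1_t = 0$ and $\vec{X}^2_t = 10$.
With $W_t = 1$ the weighted average becomes
\begin{equation}
    \hat{\mStateVec}_t = \frac{1}{1} \left( 0.5 \cdot 0 + 0.5 \cdot 10 \right) = 5 \, \text{,}
\end{equation}
which lies exactly midway between the two modes, at a location that is supported by no particle at all.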

In the case of a multimodal posterior the system should estimate the position based on the highest mode.
Therefore, the maximum a posteriori (MAP) estimate is a suitable choice for such a situation.
A straightforward approach is to select the particle with the highest weight.
However, this is in fact not necessarily a valid MAP estimate, because only the weight of the particle is taken into account.
In order to compute the true MAP estimate the local density of the particles needs to be considered as well \cite{cappe2007overview}.
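
One way to see this is the following sketch, which assumes standard importance sampling; the symbols $\pi$ for the posterior density and $q$ for the proposal density from which the particles are drawn are introduced here for illustration only.
Under this assumption the weight of a particle is proportional to the ratio of both densities,
\begin{equation}
    w^i_t \propto \frac{\pi(\vec{X}^i_t)}{q(\vec{X}^i_t)}
    \quad \Longleftrightarrow \quad
    \pi(\vec{X}^i_t) \propto w^i_t \, q(\vec{X}^i_t) \, \text{.}
\end{equation}
A particle with a large weight in a sparsely populated region, \ie{} where $q$ is small, can therefore have a lower posterior density than a particle with a moderate weight in a densely populated region.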

\del{It is obvious,} A computation of the probability density function of the posterior could solve this problem, but finding such an analytical solution is \del{clearly} intractable, which is the reason for using a sample representation in the first place.
A feasible alternative is to estimate the parameters of a specific parametric model based on the sample set, assuming that the unknown distribution is approximately a parametric distribution or a mixture of parametric distributions, \eg{} Gaussian mixture distributions.
Given the estimated parameters the most probable state can be obtained from the parameterised density function.
%In the case of multi-modalities several parametric distributions can be combined into a mixture distribution.
However, parametric models fail when the assumption does not fit the underlying model.
For our application, assuming a parametric distribution is too limiting, as the posterior changes in an unpredictable way over time.
%As a result, those techniques are not able to provide an accurate statement about the most probable state, rather causing misleading or false outcomes.

In contrast, a non-parametric approach directly obtains an estimate of the entire density function, driven by the structure of the data.
A classic non-parametric method is the kernel density estimator (KDE), where a kernel function with a given bandwidth is placed at each particle to approximate the posterior.
While the kernel estimate inherits all the properties of the kernel, choosing a non-optimal kernel is usually not critical.
As a matter of fact, the quality of the kernel estimate is primarily determined by the bandwidth. % TODO \cite{scott2015} ?
For our system we choose the Gaussian kernel in favour of computational efficiency.
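
For reference, a weighted KDE over the particle set can be written as follows.
This is a sketch in the notation used above; the isotropic bandwidth $h$, the state dimension $d$ and the evaluation point $\vec{x}$ are assumptions introduced here and are not defined elsewhere in this section.
\begin{equation}
    \hat{p}_h(\vec{x}) = \frac{1}{W_t} \sum_{i=1}^{N} w^i_t \, K_h\!\left(\vec{x} - \vec{X}^i_t\right)
    \quad \text{with} \quad
    K_h(\vec{u}) = \frac{1}{(2 \pi h^2)^{d/2}} \exp\!\left( - \frac{\lVert \vec{u} \rVert^2}{2 h^2} \right) \, \text{.}
\end{equation}
The bandwidth $h$ controls the amount of smoothing: a small $h$ yields a spiky estimate with one peak per particle, whereas a large $h$ merges neighbouring modes.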

The great flexibility of the KDE comes at the cost of a high computation time, which renders it impractical for real-time scenarios.
The complexity of a naive implementation of the KDE is \landau{MN} for $M$ evaluation points and $N$ particles.
A fast approximation of the KDE can be applied if the data is stored in equidistant bins as suggested by \cite{silverman1982algorithm}.
Computing the KDE with a Gaussian kernel on the binned data becomes analogous to applying a Gaussian filter, which can be approximated by iterated box filters in \landau{N} \cite{Bullmann-18}.
Our \del{rapid computation} \add{approximation} scheme of the KDE is fast enough to estimate the density of the posterior in each time step.
This allows us to recover the most probable state from a multimodal posterior.
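
The link between the iterated box filter and the Gaussian filter mentioned above can be sketched with a standard variance argument.
The symbols $n$ (number of box filter passes), $L$ (box width in bins) and $\sigma$ (standard deviation of the target Gaussian in bins) are assumptions introduced here, and the parameterisation used in \cite{Bullmann-18} may differ.
A box filter of odd width $L$ has variance $(L^2 - 1)/12$, and variances add under convolution, so that $n$ passes yield
\begin{equation}
    \sigma^2 = n \, \frac{L^2 - 1}{12}
    \quad \Longleftrightarrow \quad
    L = \sqrt{\frac{12 \sigma^2}{n} + 1} \, \text{.}
\end{equation}
For example, $\sigma = 3$ bins and $n = 3$ passes give $L = \sqrt{37} \approx 6.1$, which in practice has to be rounded to a nearby odd integer such as $5$ or $7$.
By the central limit theorem, the shape of the repeatedly convolved box approaches that of a Gaussian as $n$ grows.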