
\section{Usage}

\subsection{Extension to multi-dimensional data}
\todo{Paragraph on the 2D case: extension to multi-dimensional data}

So far, only univariate sample sets have been considered.
This is because the equations of the KDE \eqref{eq:kde}, the BKDE \eqref{eq:binKde}, the Gaussian filter \eqref{eq:gausFilt}, and the box filter \eqref{eq:boxFilt} are easily extended to multi-dimensional input.
Each method can be seen as several one-dimensional problems combined into a multi-dimensional result.
In the following, the generalizations to multi-dimensional input are briefly outlined.


In order to estimate a multivariate density using the KDE or the BKDE, a multivariate kernel needs to be used.
Multivariate kernel functions can be constructed in various ways; a popular choice is the product kernel.
Such a kernel is the product of several univariate kernels, one per dimension, each with a possibly different bandwidth.

Let $X=(x_1,\dots ,x_d)$ be a multivariate random variable in $d$ dimensions.
The sample $\bm{X}$ is an $n\times d$ matrix defined as \cite[162]{scott2015}
\begin{equation}
    \bm{X}=
    \begin{pmatrix}
        X_1    \\
        \vdots \\
        X_n    \\
    \end{pmatrix}
    =
    \begin{pmatrix}
        x_{11} & \dots & x_{1d} \\
        \vdots & \ddots & \vdots \\
        x_{n1} & \dots & x_{nd}
    \end{pmatrix} \text{.}
\end{equation}

The multivariate KDE $\hat{f}$, which defines the estimate pointwise at $\bm{x}=(x_1, \dots, x_d)^T$, is given as \cite[162]{scott2015}
\begin{equation}
\label{eq:mvKDE}
    \hat{f}(\bm{x}) = \frac{1}{W} \sum_{i=1}^{n} \frac{w_i}{h_1 \dots h_d} \left[  \prod_{j=1}^{d} K\left( \frac{x_j-x_{ij}}{h_j} \right)  \right]  \text{,}
\end{equation}
where the bandwidth is given as a vector $\bm{h}=(h_1, \dots, h_d)$.
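
The following Python sketch evaluates \eqref{eq:mvKDE} directly, using a Gaussian as the univariate kernel $K$.
It is meant as an illustration only: the function names \texttt{gauss} and \texttt{product\_kde} are hypothetical, and $W$ is assumed to be the sum of the weights $w_i$.

\begin{verbatim}
```python
import math


def gauss(u):
    """Univariate standard Gaussian kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def product_kde(x, samples, weights, h):
    """Evaluate the weighted product-kernel KDE at the point x.

    x: query point of length d; samples: n points of length d;
    weights: n non-negative weights w_i; h: d bandwidths h_j.
    """
    W = sum(weights)       # assumption: W is the sum of all weights
    norm = math.prod(h)    # h_1 * ... * h_d
    total = 0.0
    for xi, wi in zip(samples, weights):
        k = 1.0
        for xj, xij, hj in zip(x, xi, h):
            k *= gauss((xj - xij) / hj)  # one univariate kernel per dimension
        total += wi / norm * k
    return total / W
```
\end{verbatim}

A single evaluation costs $n \cdot d$ kernel evaluations, which illustrates why the direct form becomes expensive when the estimate is needed at many points.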

Note that \eqref{eq:mvKDE} does not cover all possible multivariate kernels, for example spherically symmetric kernels, which are obtained by rotating a univariate kernel.
In general, a product kernel and a spherically symmetric kernel based on the same univariate kernel differ.
The only exception is the Gaussian kernel, which is spherically symmetric and has independent marginals. % TODO scott cite?!
In addition, only smoothing in the direction of the axes is possible.
If smoothing in other directions is necessary, the computation needs to be done on a pre-rotated sample set, and the estimate needs to be rotated back to fit the original coordinate system \cite{wand1994fast}.

For the multivariate BKDE, the grid and the binning rules need to be extended to multivariate data in addition to the kernel function.
\todo{Is text sufficient here, or are formulas needed?}


In general, multi-dimensional filters are multi-dimensional convolution operations.
However, by utilizing the separability property of convolution, a straightforward and more efficient implementation can be found.
Convolution is separable if the filter kernel is separable, i.e. if it can be split into successive convolutions with several one-dimensional kernels.
Likewise, digital filters based on such kernels are called separable filters.
They are easily applied to multi-dimensional signals, because the input signal can be filtered in each dimension separately by a one-dimensional filter.

The Gaussian filter is separable, because $e^{-(x^2+y^2)} = e^{-x^2}\cdot e^{-y^2}$.
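
Written out for a bivariate Gaussian with the same standard deviation $\sigma$ in both dimensions, the factorization reads as follows.
This is a sketch: the normalization follows the usual definition of the Gaussian, and the symbol $g$ is introduced only here.
\begin{equation}
    g(x,y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2+y^2}{2\sigma^2}\right)
    = \left[\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2}{2\sigma^2}\right)\right] \cdot \left[\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{y^2}{2\sigma^2}\right)\right] \text{.}
\end{equation}
Filtering a two-dimensional signal with $g$ is therefore equivalent to filtering all rows with the one-dimensional Gaussian and then all columns of the intermediate result.
For a kernel of width $k$, this reduces the cost per grid point from $k^2$ to $2k$ multiplications.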


% KDE:
%So far only the univariate case was considered.
%This is due to the fact, that univariate kernel estimators can quite easily be extended to multivariate distributions.
%A common approach is to apply an univariate kernel with a possibly different bandwidth in each dimension.
%These kind of multivariate kernel is called product kernel as the multivariate kernel result is the product of each individual univariate kernel.
%
%Given a multivariate random variable $X=(x_1,\dots ,x_d)$ in $d$ dimensions.
%The sample $\bm{X}$ is a $n\times d$ matrix defined as \cite[162]{scott2015}
%\begin{equation}
%    \bm{X}=
%    \begin{pmatrix}
%        X_1    \\
%        \vdots \\
%        X_n    \\
%    \end{pmatrix}
%    =
%    \begin{pmatrix}
%        x_{11} & \dots & x_{1d} \\
%        \vdots & \ddots & \vdots \\
%        x_{n1} & \dots & x_{nd}
%    \end{pmatrix} \text{.}
%\end{equation}
%
%The multivariate kernel density estimator $\hat{f}$ which defines the estimate pointwise at $\bm{x}=(x_1, \dots, x_d)^T$ is given as \cite[162]{scott2015}
%\begin{equation}
%    \hat{f}(\bm{x}) = \frac{1}{nh_1 \dots h_d} \sum_{i=1}^{n} \left[  \prod_{j=1}^{d} K\left( \frac{x_j-x_{ij}}{h_j} \right)  \right]  \text{.}
%\end{equation}
%where the bandwidth is given as a vector $\bm{h}=(h_1, \dots, h_d)$.

% Product kernel allows our method
% Spherically symmetric kernel not supported, but Gaussian kernel == product & spehrically symmetric
% smoothing not in the direction of the axes -> rotate data, kde, rotate back

%Multivariate Gauss-Kernel
%\begin{equation}
%K(u)=\frac{1}{(2\pi)^{d/2}} \expp{-\frac{1}{2} \bm{x}^T \bm{x}}
%\end{equation}

% Gaus:
%If the filter kernel is separable, the convolution is also separable i.e. multi-dimensional convolution can be computed as individual one-dimensional convolutions with a one-dimensional kernel.
%Because of $e^{x^2+y^2} = e^{x^2}\cdot e^{y^2}$ the Gaussian filter is separable and can be easily applied to multi-dimensional signals. \todo{source}


% How do we use all of this now? What needs to be considered?

% Using 2D data as an example
% Create the histogram (== bin the data)
% This requires min/max
% Then filter the histogram with the box filter
% - In parallel if possible (SIMD, GPU)
% - separately in each dimension
% Take the maximum of the filter result


\subsection{Our method}
The objective of our method is to allow a reliable recovery of the most probable state from a time-sequential Monte Carlo sensor fusion system.
Assuming a sample-based representation, our method estimates the density of the unknown distribution of the state space within a narrow time frame.
Such systems are often used to obtain an estimate of the most probable state in near real time.
As the density estimation is only a single step in the whole process, its computation needs to be as fast as possible.
% not taking too much time from the frame

%Consider a set of two-dimensional samples, presumably generated from e.g. particle filter system.
Assuming that the generated samples are stored in a sequential list, the first step is to create a grid representation.
In order to compute the grid efficiently and to allocate the required memory, the extrema of the samples need to be known in advance.
These limits might be given by the application; for example, the position of a pedestrian within a building is limited by the physical dimensions of the building.
Such knowledge should be integrated into the system to avoid a linear search over the sample set, which reduces the computation time.

The second parameter to be defined by the application is the size of the grid, which can be set directly or defined in terms of bin sizes.
As the number of grid points directly affects both computation time and accuracy, a suitable grid should be as coarse as possible to keep the computation fast, yet fine enough to keep the approximation error acceptable.

Given the extreme values of the samples and the number of grid points $G$, the computation of the grid has a linear complexity of $\landau{N}$, where $N$ is the number of samples.
If the extreme values are unknown, an additional $\landau{N}$ search is required.
The grid is stored as a linear array in memory, thus its space complexity is $\landau{G}$.
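
A minimal sketch of this binning step for two-dimensional samples is given below.
It rests on several assumptions that are not part of the method description: simple binning (each sample is assigned to exactly one bin) is used instead of a more elaborate binning rule, the grid is stored in row-major order, and the function name \texttt{bin\_samples\_2d} is hypothetical.

\begin{verbatim}
```python
def bin_samples_2d(samples, weights, lo, hi, shape):
    """Simple binning of weighted 2D samples in a single O(N) pass.

    lo, hi: known extrema (min, max) per dimension;
    shape: number of bins (gx, gy).
    Returns a flat, row-major list of length gx * gy.
    """
    gx, gy = shape
    grid = [0.0] * (gx * gy)        # O(G) memory
    sx = gx / (hi[0] - lo[0])       # bins per unit length in x
    sy = gy / (hi[1] - lo[1])       # bins per unit length in y
    for (x, y), w in zip(samples, weights):
        # clamp so that samples on the upper boundary fall into the last bin
        ix = max(0, min(int((x - lo[0]) * sx), gx - 1))
        iy = max(0, min(int((y - lo[1]) * sy), gy - 1))
        grid[ix * gy + iy] += w
    return grid
```
\end{verbatim}

Because the extrema are passed in, no search over the sample set is needed; each sample is touched exactly once.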

Next, the binned data is filtered with a Gaussian using the box filter approximation.
The box filter width is derived from the standard deviation of the approximated Gaussian, which in turn equals the bandwidth of the KDE.
However, the bandwidth $h$ needs to be scaled according to the grid size.
This is necessary because $h$ is defined in the input space of the KDE, i.e. in relation to the sample data.
In contrast, the bandwidth of a BKDE is defined in the context of the binned data, which differs from the unbinned data due to the discretisation of the samples.
For this reason, $h$ needs to be divided by the bin size to account for the discrepancy between the two sampling spaces.
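
As a purely illustrative example with assumed numbers, consider a bandwidth of $h = 0.5\,\text{m}$ and a bin size of $\delta = 0.1\,\text{m}$.
The symbols $\delta$ for the bin size and $h_{\text{grid}}$ for the scaled bandwidth are introduced only for this sketch.
\begin{equation}
    h_{\text{grid}} = \frac{h}{\delta} = \frac{0.5\,\text{m}}{0.1\,\text{m}} = 5 \text{.}
\end{equation}
The Gaussian to be approximated on the grid thus has a standard deviation of five bins, independent of the unit in which the samples are given.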

Given the scaled bandwidth the required box filter width can be computed. % as in \eqref{label}
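
The sketch below shows one way to obtain such a width.
It assumes the common relation $\sigma^2 = p\,(w^2-1)/12$ between the variance of $p$ successive box filter passes and the box width $w$; whether this matches the relation used in this paper should be checked against the corresponding equation.
The function name \texttt{box\_width} and the default of three passes are hypothetical choices.

\begin{verbatim}
```python
import math


def box_width(sigma, passes=3):
    """Box width for approximating a Gaussian of std sigma (in bins).

    Returns the ideal real-valued width and the nearest odd integer
    width, assuming sigma^2 = passes * (w^2 - 1) / 12.
    """
    ideal = math.sqrt(12.0 * sigma * sigma / passes + 1.0)
    odd = 2 * round((ideal - 1.0) / 2.0) + 1   # nearest odd integer
    return ideal, odd
```
\end{verbatim}

For a scaled bandwidth of five bins and three passes, the ideal width is $\sqrt{101} \approx 10.05$, which is rounded to an odd width of $11$ bins.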
Because it offers the best runtime performance, the recursive box filter implementation is used.
If multivariate data is processed, the algorithm is easily extended due to its separability.
Each filter pass is computed in $\landau{G}$ operations; however, an additional memory buffer is required.
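
A single one-dimensional pass of a recursive (running-sum) box filter can be sketched as follows.
The zero padding at the borders, the parametrization by a radius $r$ (box width $2r+1$), and the function name \texttt{box\_filter\_1d} are assumptions made for this illustration.

\begin{verbatim}
```python
def box_filter_1d(src, r):
    """Running-sum box filter of width 2*r + 1 with zero padding.

    Each output value costs one addition and one subtraction,
    so a full pass over G grid points is O(G).
    """
    n = len(src)
    dst = [0.0] * n                  # the additional output buffer
    scale = 1.0 / (2 * r + 1)
    acc = sum(src[:r + 1])           # window centred on index 0
    for i in range(n):
        dst[i] = acc * scale
        if i + r + 1 < n:
            acc += src[i + r + 1]    # value entering the window
        if i - r >= 0:
            acc -= src[i - r]        # value leaving the window
    return dst
```
\end{verbatim}

Applying the function several times in succession approximates a Gaussian filter; for multivariate data, it would be applied along each dimension in turn.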

While the integer-sized box filter requires the fewest operations, it causes a larger approximation error because its width has to be rounded to an integer.
Depending on the required accuracy, the extended box filter algorithm can further improve the estimation results with only a small additional overhead.
Due to its simple indexing scheme, the recursive box filter can easily be computed in parallel using SIMD operations or multiple computation cores.

Finally, the most likely state can be obtained from the filtered data, i.e. from the estimated discrete density, by searching the filtered data for its maximum value.
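
For a two-dimensional grid, this final step might look like the sketch below.
It assumes a flat, row-major grid whose bins evenly cover the range between the known extrema, and it reports the centre of the maximal bin; the function name \texttt{most\_likely\_state} is hypothetical.

\begin{verbatim}
```python
def most_likely_state(grid, lo, hi, shape):
    """Return the sample-space coordinates of the grid maximum.

    grid: flat, row-major list of gx * gy density values;
    lo, hi: extrema per dimension; shape: (gx, gy).
    """
    gx, gy = shape
    # linear O(G) search for the index of the largest value
    k = max(range(len(grid)), key=grid.__getitem__)
    ix, iy = divmod(k, gy)
    # map the bin index back to the bin centre in sample coordinates
    x = lo[0] + (ix + 0.5) * (hi[0] - lo[0]) / gx
    y = lo[1] + (iy + 0.5) * (hi[1] - lo[1]) / gy
    return x, y
```
\end{verbatim}

The resolution of the recovered state is limited by the bin size, which is one more reason to choose the grid with the required accuracy in mind.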

\commentByToni{Overall a nice chapter, but we need to build a stronger connection to the preceding sections, i.e. cite the formulas and refer back to them so that no reader gets lost.

Would it make sense to present the above algorithmically, i.e. with pseudocode? Somewhere we need a clear statement of ``this is our approach''}.