\section{Usage}

\subsection{Extension to multi-dimensional data}

So far, only the univariate case has been considered. Univariate kernel estimators can, however, be extended to multivariate distributions in a straightforward manner. A common approach is to apply a univariate kernel, with a possibly different bandwidth, in each dimension. This kind of multivariate kernel is called a product kernel, as the multivariate result is the product of the individual univariate kernels.

Consider a multivariate random variable $X=(x_1, \dots, x_d)$ in $d$ dimensions. The sample $\bm{X}$ is an $n\times d$ matrix defined as \cite[162]{scott2015}
\begin{equation}
	\bm{X}=
	\begin{pmatrix}
		X_1 \\
		\vdots \\
		X_n \\
	\end{pmatrix}
	=
	\begin{pmatrix}
		x_{11} & \dots & x_{1d} \\
		\vdots & \ddots & \vdots \\
		x_{n1} & \dots & x_{nd}
	\end{pmatrix} \text{.}
\end{equation}

The multivariate kernel density estimator $\hat{f}$, which defines the estimate pointwise at $\bm{x}=(x_1, \dots, x_d)^T$, is given as \cite[162]{scott2015}
\begin{equation}
	\hat{f}(\bm{x}) = \frac{1}{nh_1 \dots h_d} \sum_{i=1}^{n} \left[ \prod_{j=1}^{d} K\left( \frac{x_j-x_{ij}}{h_j} \right) \right] \text{,}
\end{equation}
where the bandwidth is given as a vector $\bm{h}=(h_1, \dots, h_d)$.

The multivariate Gaussian kernel is given as
\begin{equation}
	K(\bm{u})=\frac{1}{(2\pi)^{d/2}} \expp{-\frac{1}{2} \bm{u}^T \bm{u}} \text{,}
\end{equation}
which is exactly the product kernel of $d$ univariate Gaussian kernels, as the exponential factorises over the components of $\bm{u}$.
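
For illustration, the following is a minimal C++ sketch of the product-kernel estimator defined above, evaluated pointwise with the Gaussian kernel. All names are illustrative and not part of our implementation; the samples are assumed to be stored row-major as an $n\times d$ matrix.
\begin{verbatim}
#include <cmath>
#include <cstddef>
#include <vector>

// univariate standard Gaussian kernel K(u)
double gauss(double u) {
    const double invSqrt2Pi = 0.3989422804014327;
    return invSqrt2Pi * std::exp(-0.5 * u * u);
}

// product-kernel KDE at point x; X is n x d, row-major,
// h holds one bandwidth per dimension
double kde(const std::vector<double>& X, std::size_t n, std::size_t d,
           const std::vector<double>& h, const std::vector<double>& x) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double prod = 1.0;                  // product over the d dimensions
        for (std::size_t j = 0; j < d; ++j)
            prod *= gauss((x[j] - X[i * d + j]) / h[j]);
        sum += prod;
    }
    double scale = static_cast<double>(n);  // n * h_1 * ... * h_d
    for (std::size_t j = 0; j < d; ++j)
        scale *= h[j];
    return sum / scale;
}
\end{verbatim}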

If a filter kernel is separable, the convolution is separable as well, i.e. a multi-dimensional convolution can be computed as a sequence of one-dimensional convolutions with one-dimensional kernels. Because of $e^{-(x^2+y^2)} = e^{-x^2}\cdot e^{-y^2}$, the Gaussian filter is separable and can therefore easily be applied to multi-dimensional signals. \todo{source}
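
The following C++ sketch illustrates this separable application on a two-dimensional grid: one one-dimensional pass per dimension replaces the full two-dimensional convolution. The three-point smoothing pass is only a placeholder standing in for a one-dimensional box filter pass, and all names are illustrative.
\begin{verbatim}
#include <algorithm>
#include <vector>

// placeholder 1-D smoothing pass (stands in for a 1-D box filter)
void pass1d(std::vector<double>& v) {
    std::vector<double> src = v;
    int n = static_cast<int>(v.size());
    for (int i = 0; i < n; ++i) {
        double acc = src[i], cnt = 1.0;
        if (i > 0)     { acc += src[i - 1]; cnt += 1.0; }
        if (i < n - 1) { acc += src[i + 1]; cnt += 1.0; }
        v[i] = acc / cnt;
    }
}

// separable filtering of a gx x gy grid stored row-major
void filterSeparable(std::vector<double>& grid, int gx, int gy) {
    std::vector<double> line;
    for (int y = 0; y < gy; ++y) {       // pass along dimension 1 (rows)
        line.assign(grid.begin() + y * gx, grid.begin() + (y + 1) * gx);
        pass1d(line);
        std::copy(line.begin(), line.end(), grid.begin() + y * gx);
    }
    line.resize(gy);
    for (int x = 0; x < gx; ++x) {       // pass along dimension 2 (columns)
        for (int y = 0; y < gy; ++y) line[y] = grid[y * gx + x];
        pass1d(line);
        for (int y = 0; y < gy; ++y) grid[y * gx + x] = line[y];
    }
}
\end{verbatim}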

%How do we use all of this now? What do we have to pay attention to?
%Using 2-D data as an example:
% create the histogram (i.e. bin the data)
% - this requires the min/max of the samples
% then filter the histogram with the box filter
% - in parallel if possible (SIMD, GPU)
% - separated, one dimension at a time
% take the maximum of the filter result

The objective of our method is to allow a reliable recovery of the most probable state from a time-sequential Monte Carlo sensor fusion system. Assuming a sample-based representation, our method makes it possible to estimate the density of the unknown distribution of the state space within a narrow time frame. Such systems are often used to obtain an estimate of the most probable state in near real time. As the density estimation constitutes only a single step in the whole process, its computation needs to be as fast as possible.
% not taking too much time from the frame

%Consider a set of two-dimensional samples, presumably generated by e.g. a particle filter system.
As the generated samples are typically stored in a sequential list, the first step is to create a grid representation. In order to compute the grid efficiently and to allocate the required memory, the extrema of the samples need to be known in advance. These limits might be given by the application; for example, the position of a pedestrian within a building is limited by the physical dimensions of the building. Such knowledge should be integrated into the system to avoid a linear search over the sample set, naturally reducing the computation time.

The second parameter to be defined by the application is the size of the grid, which can be set directly or defined in terms of bin sizes. As the number of grid points directly affects both computation time and accuracy, a suitable grid should be as coarse as possible, so that the estimate is produced sufficiently fast, but at the same time fine enough to keep the approximation error acceptable.

Given the extreme values of the samples and the number of grid points $G$, the computation of the grid has a linear complexity of \landau{N}, where $N$ is the number of samples. If the extreme values are unknown, an additional $\landau{N}$ search is required. The grid is stored as a linear array in memory; its space complexity is therefore $\landau{G}$.
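
A minimal C++ sketch of this binning step for two-dimensional samples with known extrema is given below (variable names are illustrative, not part of our implementation). Each sample increments the count of the grid cell it falls into, so the loop runs in \landau{N}.
\begin{verbatim}
#include <cstddef>
#include <vector>

// bin 2-D samples (xs, ys) into a gx x gy grid, row-major
std::vector<unsigned> bin2d(const std::vector<double>& xs,
                            const std::vector<double>& ys,
                            double minX, double maxX,
                            double minY, double maxY,
                            int gx, int gy) {
    std::vector<unsigned> grid(static_cast<std::size_t>(gx) * gy, 0);
    double sx = gx / (maxX - minX);     // input units -> bin index
    double sy = gy / (maxY - minY);
    for (std::size_t i = 0; i < xs.size(); ++i) {
        int bx = static_cast<int>((xs[i] - minX) * sx);
        int by = static_cast<int>((ys[i] - minY) * sy);
        if (bx == gx) bx = gx - 1;      // clamp samples lying on the maximum
        if (by == gy) by = gy - 1;
        ++grid[static_cast<std::size_t>(by) * gx + bx];
    }
    return grid;
}
\end{verbatim}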

Next, the binned data is filtered with a Gaussian using the box filter approximation. The box filter width is derived from the standard deviation of the approximated Gaussian, which is in turn equal to the bandwidth of the KDE. However, the bandwidth $h$ needs to be scaled according to the grid size. This is necessary as $h$ is defined in the input space of the KDE, i.e. in relation to the sample data. In contrast, the bandwidth of a BKDE is defined in the context of the binned data, which differs from the unbinned data due to the discretisation of the samples. For this reason, $h$ needs to be divided by the bin size to account for the discrepancy between the different sampling spaces.
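
As a brief illustration of this rescaling (names are again illustrative): a bandwidth given in the input space of the samples is converted to bin units before the filter widths are derived from it.
\begin{verbatim}
// rescale the KDE bandwidth h from input units to bin units,
// e.g. h = 0.5 m on a grid with 0.1 m bins yields 5 bins
double scaleBandwidth(double h, double minX, double maxX, int gx) {
    double binSize = (maxX - minX) / gx; // width of one bin in input units
    return h / binSize;
}
\end{verbatim}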

Given the scaled bandwidth, the required box filter width can be computed. % as in \eqref{label}
Due to its superior runtime performance, the recursive box filter implementation is used. If multivariate data is processed, the algorithm is easily extended due to its separability. Each filter pass is computed in $\landau{G}$ operations; however, an additional memory buffer is required.
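
A sketch of a single recursive box filter pass is given below (illustrative; border handling is simplified to zero padding). Each output value is obtained from its predecessor with one addition and one subtraction, so the cost of a pass is \landau{G} independent of the filter width.
\begin{verbatim}
#include <vector>

// one recursive box pass of radius r over a 1-D signal;
// out must be pre-sized to in.size()
void boxPass(const std::vector<double>& in, std::vector<double>& out, int r) {
    int n = static_cast<int>(in.size());
    double w = 2.0 * r + 1.0;            // box width
    double sum = 0.0;
    for (int i = -r; i <= r; ++i)        // initial window around index 0
        if (i >= 0 && i < n) sum += in[i];
    out[0] = sum / w;
    for (int i = 1; i < n; ++i) {        // slide: add entering, drop leaving
        if (i + r < n)      sum += in[i + r];
        if (i - r - 1 >= 0) sum -= in[i - r - 1];
        out[i] = sum / w;
    }
}
\end{verbatim}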

While the integer-sized box filter requires the fewest operations, it causes a larger approximation error due to the rounding of the filter width. Depending on the required accuracy, the extended box filter algorithm can further improve the estimation results with only a small additional overhead. Due to its simple indexing scheme, the recursive box filter can easily be computed in parallel using SIMD operations or multiple processor cores.

Finally, the most likely state can be obtained from the filtered data, i.e. from the estimated discrete density, by searching the filtered data for its maximum value.
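
For completeness, a small illustrative sketch of this final step: the grid cell holding the maximum of the filtered density is mapped back to input-space coordinates (two-dimensional case, names illustrative).
\begin{verbatim}
#include <algorithm>
#include <utility>
#include <vector>

// return the centre of the grid cell with the largest filtered value
std::pair<double, double> mostLikelyState(const std::vector<double>& grid,
                                          int gx, double minX, double binX,
                                          double minY, double binY) {
    auto it = std::max_element(grid.begin(), grid.end());
    int idx = static_cast<int>(it - grid.begin());
    int bx = idx % gx;                   // column of the maximum
    int by = idx / gx;                   // row of the maximum
    return { minX + (bx + 0.5) * binX,   // bin centre in input units
             minY + (by + 0.5) * binY };
}
\end{verbatim}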

\commentByToni{Quite a nice chapter as such, but we need to strengthen the connection to the material above, i.e. cite the formulas and reference back somehow, so that nobody gets lost.

Would it make sense to present the above algorithmically, i.e. with pseudocode? Because somewhere we need to state "THIS IS OUR APPROACH".}
|