Box & boxKDE algos + notation fixes

This commit is contained in:
2018-02-20 14:08:58 +01:00
parent e49c7a1cbf
commit 0bbc43e269
4 changed files with 78 additions and 39 deletions

View File

@@ -1,22 +1,60 @@
\section{Usage}
The objective of our method is to allow a reliable recovery of the most probable state from a time-sequential Monte Carlo sensor fusion system.
Assuming a sample-based representation, our method estimates the density of the unknown distribution over the state space within a narrow time frame.
Such systems are often used to obtain an estimate of the most probable state in near real time.
As the density estimation is only a single step in the whole process, its computation needs to be as fast as possible.
\begin{algorithm}[ht]
\caption{Bivariate \textsc{boxKDE}}
\label{alg:boxKDE}
\begin{algorithmic}[1]
\Statex \textbf{Input:} Samples $\bm{X}_1, \dots, \bm{X}_N$ and weights $w_1, \dots, w_N$
\Statex \textbf{Output:} Approximate density estimate $\hat{f}$ on $G_1 \times G_2$
\Statex
\For{$i=1 \textbf{ to } N$} \Comment{Data binning}
\State Find the $4$ nearest grid points to $\bm{X}_i$
\State Compute bin count $C_{i,j}$ as recommended by \cite{wand1994fast}
\EndFor
\Statex
\State $\tilde{\bm{h}} := \bm{\delta}^{-1} \bm{h}$ \Comment{Scaled bandwidth}
\State $\bm{L} := \floor{\sqrt{12\tilde{\bm{h}}^2n^{-1}+\bm{1}}}$ \Comment{\eqref{eq:boxidealwidth}}
% \State $l := \floor{(L-1)/2}$
\Statex
%\For{$1 \textbf{ to } n$}
\Loop{ $n$ \textbf{times}} \Comment{$n$ box filter iterations}
\For{$ i=1 \textbf{ to } G_1$}
\State Compute $\hat{f}_{i,1:G_2} \gets B_{L_2} * C_{i,1:G_2}$ \Comment{Alg. \ref{alg:naiveboxalgo}}
\EndFor
\For{$ j=1 \textbf{ to } G_2$}
\State Compute $\hat{f}_{1:G_1,j} \gets B_{L_1} * C_{1:G_1,j}$ \Comment{Alg. \ref{alg:naiveboxalgo}}
\EndFor
\EndLoop
\end{algorithmic}
\end{algorithm}
Consider a set of two-dimensional samples with associated weights, for example generated by a particle filter system.
The overall process for bivariate data is described in Algorithm~\ref{alg:boxKDE}.
Assuming that the given $N$ samples are stored in a sequential list, the first step is to create a grid representation.
In order to efficiently construct the grid and to allocate the required memory, the extrema of the samples need to be known in advance.
These limits might be given by the application, for example, the position of a pedestrian within a building is limited by the physical dimensions of the building.
Such knowledge should be integrated into the system to avoid a linear search over the sample set, naturally reducing the computation time.
The second parameter to be defined by the application is the size of the grid, which can be set directly or defined in terms of bin sizes.
Given the extreme values of the samples and grid sizes $G_1$ and $G_2$ defined by the user, a $G_1\times G_2$ grid can be constructed, using a binning rule from \eqref{eq:simpleBinning} or \eqref{eq:linearBinning}.
As the number of grid points directly affects both computation time and accuracy, a suitable grid should be as coarse as possible to produce an estimate sufficiently fast, yet fine enough to keep the approximation error acceptable.
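As an illustration of the binning step, linear binning in the sense of \eqref{eq:linearBinning} distributes each weighted sample onto its four nearest grid points. The following is a minimal NumPy sketch, not the paper's implementation; the function name, the grid layout ($G$ points per axis with spacing $\bm{\delta} = (\text{hi}-\text{lo})/(G-1)$), and the clipping at the borders are assumptions:

```python
import numpy as np

def linear_binning_2d(samples, weights, lo, hi, G1, G2):
    """Distribute each weighted sample onto its 4 nearest grid points
    (linear binning). Grid layout and border handling are illustrative."""
    delta = (hi - lo) / (np.array([G1, G2]) - 1)  # bin size per axis
    C = np.zeros((G1, G2))
    for x, w in zip(samples, weights):
        t = (x - lo) / delta                       # fractional grid coords
        i0 = np.clip(np.floor(t).astype(int), 0, [G1 - 2, G2 - 2])
        f = t - i0                                 # offset to lower grid point
        # bilinear weights onto the 4 surrounding grid points
        C[i0[0],     i0[1]]     += w * (1 - f[0]) * (1 - f[1])
        C[i0[0] + 1, i0[1]]     += w * f[0]       * (1 - f[1])
        C[i0[0],     i0[1] + 1] += w * (1 - f[0]) * f[1]
        C[i0[0] + 1, i0[1] + 1] += w * f[0]       * f[1]
    return C
```

Note that the bilinear weights of each sample sum to one, so the total sample weight is preserved by the binning.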
If the extreme values are known in advance, the computation of the grid is $\landau{N}$; otherwise an additional $\landau{N}$ search is required.
The grid is stored as a linear array in memory, thus its space complexity is $\landau{G_1\cdot G_2}$.
Next, the binned data is filtered with a Gaussian using the box filter approximation.
The box filter width is derived from the standard deviation of the approximated Gaussian, which is in turn equal to the bandwidth of the KDE.
@@ -28,7 +66,7 @@ For this reason, $h$ needs to be divided by the bin size to account the discrepa
Given the scaled bandwidth, the required box filter width can be computed. % as in \eqref{label}
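A minimal sketch of this width computation, assuming the rule $L = \lfloor\sqrt{12\tilde{h}^2 n^{-1} + 1}\rfloor$ used in Algorithm~\ref{alg:boxKDE} (the function name is hypothetical; $\delta$ is the bin size, so $h/\delta$ is the bandwidth in grid units):

```python
import math

def box_width(h, delta, n):
    """Width (in grid cells) of each of the n successive box filters whose
    combined variance matches a Gaussian of bandwidth h, following
    L = floor(sqrt(12 * h_tilde^2 / n + 1)) with h_tilde = h / delta."""
    h_scaled = h / delta
    return math.floor(math.sqrt(12.0 * h_scaled**2 / n + 1.0))
```

The flooring to an integer width is what introduces the rounding error mentioned below; the extended box filter compensates for exactly this discrepancy.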
Owing to its superior runtime performance, the recursive box filter implementation is used.
If multivariate data is processed, the algorithm is easily extended due to its separability.
Each filter pass is computed in $\landau{G}$ operations, however, an additional memory buffer is required.
While the integer-sized box filter requires the fewest operations, it causes a larger approximation error due to rounding.
Depending on the required accuracy, the extended box filter algorithm can further improve the estimation results with only a small additional overhead.
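The $\landau{G}$ cost per pass, independent of the filter width, is the key property of the recursive implementation. A one-dimensional running-sum sketch (zero padding at the borders and the helper name are assumptions, not the paper's implementation):

```python
import numpy as np

def box_pass(signal, L):
    """One box-filter pass of odd width L via a running sum: O(G) operations
    regardless of L. The cumulative sum acts as the extra memory buffer
    mentioned in the text; borders are zero-padded here."""
    r = (L - 1) // 2
    padded = np.concatenate([np.zeros(r), signal, np.zeros(r)])
    csum = np.concatenate([[0.0], np.cumsum(padded)])
    # window sum = difference of two cumulative sums, then normalize
    return (csum[L:] - csum[:-L]) / L
```

Applying this pass $n$ times per axis, rows then columns, yields the separable bivariate approximation of the Gaussian used in Algorithm~\ref{alg:boxKDE}.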
@@ -40,4 +78,3 @@ Finally, the most likely state can be obtained from the filtered data, i.e. from
Would it make sense to present the above algorithmically, i.e. as pseudocode? Because somewhere we need to state "THIS IS OUR APPROACH"}.