Intro & related work

2018-02-13 17:30:32 +01:00
parent 4aa3ff5e30
commit 7b82b8247c
4 changed files with 122 additions and 6 deletions


@@ -1,2 +1,27 @@
\section{Experiments}
We now empirically evaluate the accuracy of our method and compare its runtime performance with other state-of-the-art approaches.
To conclude our findings, we present a real-world example from an indoor localisation system.
All tests are performed on an Intel Core \mbox{i5-7600K} CPU running at $4.5\,\mathrm{GHz}$, which supports the AVX2 instruction set and hence provides 256-bit wide SIMD registers.
We compare our C++ implementation of the box filter based KDE to the KernSmooth R package and the \qq{FastKDE} implementation \cite{fastKDE}.
The KernSmooth package provides an FFT-based BKDE implementation built on optimized C functions at its core.
\subsection{Error}
In order to quantify the accuracy of our method, the mean integrated squared error (MISE) is used.
The ground truth is given as a synthetic data set drawn from a normal mixture density.
Clearly, the choice of the ground truth distribution affects the resulting error.
However, as our method approximates the KDE, it is only of interest to evaluate the closeness to the KDE and not to the ground truth itself.
Therefore, the particular choice of the ground truth is only of minor importance here.
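For completeness, we recall the standard definition (notation ours): for an estimate $\hat f$ of a density $f$,
\begin{equation*}
\operatorname{MISE}\bigl(\hat f\bigr) = \mathbb{E}\left[\int \bigl(\hat f(x) - f(x)\bigr)^2 \,\mathrm{d}x\right].
\end{equation*}
Since we measure closeness to the exact KDE rather than to the ground truth, $f$ is replaced by the exact KDE in our comparisons.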
At first we evaluate the accuracy of our method as a function of the bandwidth $h$ in comparison to the exact KDE and the BKDE.
% kde, box filter, exbox as a function of h (figure)
% sample size and grid size text
% fastKDE error comparison makes no sense because kernel and bandwidth differ
\subsection{Performance}
\subsection{Real World}


@@ -28,14 +28,19 @@ We formalize this ...
Our experiments support our ..
}
In this paper, a novel approximation approach for rapid computation of the KDE is presented.
%Therefore, this paper presents a novel approximation approach for rapid computation of the KDE.
%In this paper, a well known approximation of the Gaussian filter is used to speed up the computation of the KDE.
The basic idea is to interpret the estimation problem as a filtering operation.
We show that computing the KDE with a Gaussian kernel on pre-binned data is equivalent to applying a Gaussian filter to the binned data.
This allows us to use a well-known approximation scheme for Gaussian filters based on the box filter.
Repeated application of a box filter yields an approximate Gaussian filter \cite{kovesi2010fast}.
This process converges quickly to a reasonably close approximation of the ideal Gaussian.
In addition, a box filter can be computed extremely fast, due to its intrinsic simplicity.
While the idea of using several box filter passes to approximate a Gaussian has been around for a long time, its application to obtain a fast KDE is new.
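A useful rule of thumb from the filtering literature (notation ours): a discrete box filter of odd width $w$ has variance $(w^2-1)/12$, so $n$ successive passes accumulate a variance of
\begin{equation*}
\sigma^2 = n\,\frac{w^2-1}{12}
\qquad\Longrightarrow\qquad
w \approx \sqrt{\frac{12\sigma^2}{n}+1},
\end{equation*}
which gives the box width needed to approximate a Gaussian of standard deviation $\sigma$ with $n$ passes.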
% time sequential, fixed computation time, pre binned data!!
% KDE wellknown nonparametic estimation method
% Flexibility is paid with slow speed


@@ -1,5 +1,47 @@
\section{Related work}
% original work rosenblatt/parzen
% langsam
% other approaches Fast Gaussian Transform
% binned version silverman, scott, härdle
% -> Fourier transfom
Kernel density estimation is a well-known non-parametric estimator, originally described independently by Rosenblatt \cite{rosenblatt1956remarks} and Parzen \cite{parzen1962estimation}.
It has been the subject of extensive research, and its theoretical properties are well understood.
A comprehensive reference is given by Scott \cite{scott2015}.
Although classified as non-parametric, the KDE has two free parameters: the kernel function and its bandwidth.
The selection of a \qq{good} bandwidth is still an open problem and heavily researched.
However, the automatic selection of the bandwidth is not the subject of this work and we refer to the literature \cite{turlach1993bandwidth}.
The great flexibility of the KDE renders it very useful for many applications.
However, this flexibility comes at the cost of a relatively slow computation speed.
A naive implementation of the KDE requires \landau{NM} evaluations of the kernel function, given $N$ data samples and $M$ points of the estimate.
Therefore, a lot of effort has been put into reducing the computation time of the KDE.
Various methods have been proposed, which can be grouped by the underlying technique.
% k-nearest neighbor searching
An obvious way to speed up the computation is to reduce the number of kernel evaluations.
One possible optimization is based on k-nearest neighbour search performed on spatial data structures.
These algorithms reduce the number of evaluated kernels by taking the spatial distance between clusters of data points into account \cite{gray2003nonparametric}.
% fast multipole method & Fast Gauss Transform
Another approach is to reduce the algorithmic complexity of the sum of Gaussian functions by employing a specialized variant of the fast multipole method.
The term fast Gauss transform was coined by Greengard \cite{greengard1991fast}, who suggested this approach to reduce the complexity of the KDE to \landau{N+M}.
% However, the complexity grows exponentially with dimension. \cite{Improved Fast Gauss Transform and Efficient Kernel Density Estimation}
% FastKDE, based on ECF and nuFFT
Recent methods based on the \qq{self-consistent} KDE proposed by Bernacchia and Pigolotti allow obtaining an estimate without any assumptions.
They define a Fourier-based filter on the empirical characteristic function of a given data set.
The computation time was further reduced by \etal{O'Brien} \cite{fastKDE} using a non-uniform FFT algorithm to efficiently transform the data into Fourier space.
Therefore, the data is not required to lie on a grid.
% binning => FFT
In general, it is desirable to omit a grid, as the data points do not necessarily fall onto equally spaced points.
However, reducing the sample size by distributing the data on an equidistant grid can significantly reduce the computation time, if an approximative KDE is acceptable.
Silverman \cite{silverman1982algorithm} originally suggested combining adjacent data points into bins and applying an FFT to quickly compute the estimate.
This approximation scheme was later called the binned KDE and was extensively studied \cite{fan1994fast,wand1994fast,hall1996accuracy,holmstrom2000accuracy}.
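Concretely (notation ours), with bin counts $c_l$ on an equidistant grid $g_1,\dots,g_M$, the binned KDE takes the form
\begin{equation*}
\hat f(g_j) = \frac{1}{N}\sum_{l=1}^{M} c_l\, K_h(g_j - g_l),
\end{equation*}
a discrete convolution of the counts with the sampled kernel, which is precisely what makes FFT-based evaluation possible.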
The idea of approximating a Gaussian filter using several box filters was first formulated by Wells \cite{wells1986efficient}.
Kovesi \cite{kovesi2010fast} suggested using two box filters with different widths to increase accuracy while maintaining the same complexity.
To eliminate the approximation error completely, \etal{Gwosdek} \cite{gwosdek2011theoretical} proposed a new approach called the extended box filter.
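To make the box filter recursion concrete, the following is a minimal C++ sketch (the function names `boxFilterPass` and `approxGaussian` are ours, not taken from any cited implementation). One pass replaces each bin with the mean over a window of odd width, computed with a running sum so the pass costs \landau{M} regardless of the window size; repeating the pass approximates a Gaussian filter on the binned data.

```cpp
#include <vector>
#include <cstddef>
#include <cmath>
#include <cassert>

// One box-filter pass over binned counts: each output bin becomes the
// mean of a window of `width` bins (width assumed odd), computed with a
// running sum. Values outside the grid are treated as zero.
std::vector<double> boxFilterPass(const std::vector<double>& bins, std::size_t width) {
    const std::size_t m = bins.size();
    const std::size_t r = width / 2;  // window radius
    std::vector<double> out(m, 0.0);
    double sum = 0.0;
    for (std::size_t i = 0; i < m + r; ++i) {
        if (i < m) sum += bins[i];               // element entering the window
        if (i >= width) sum -= bins[i - width];  // element leaving the window
        if (i >= r) out[i - r] = sum / static_cast<double>(width);
    }
    return out;
}

// Repeated passes converge toward a Gaussian filter on the binned data.
std::vector<double> approxGaussian(std::vector<double> bins, std::size_t width, int passes) {
    for (int p = 0; p < passes; ++p) bins = boxFilterPass(bins, width);
    return bins;
}
```

Each pass does two additions and one division per bin, independent of the window width, which is the intrinsic simplicity exploited above.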


@@ -2890,4 +2890,48 @@ year = {2003}
}
@inproceedings{kovesi2010fast,
title={Fast almost-gaussian filtering},
author={Kovesi, Peter},
booktitle={Proceedings of the 2010 International Conference on Digital Image Computing: Techniques and Applications},
pages={121--125},
year={2010},
publisher={IEEE}
}
@book{turlach1993bandwidth,
title={Bandwidth selection in kernel density estimation: A review},
author={Turlach, Berwin A.},
year={1993},
publisher={CORE and Institut de Statistique Universit{\'e} catholique de Louvain Louvain-la-Neuve}
}
@inproceedings{gray2003nonparametric,
title={Nonparametric density estimation: Toward computational tractability},
author={Gray, Alexander G and Moore, Andrew W},
booktitle={Proceedings of the 2003 SIAM International Conference on Data Mining},
pages={203--211},
year={2003},
organization={SIAM}
}
@article{greengard1991fast,
title={The fast Gauss transform},
author={Greengard, Leslie and Strain, John},
journal={SIAM Journal on Scientific and Statistical Computing},
volume={12},
number={1},
pages={79--94},
year={1991},
publisher={SIAM}
}
@article{wells1986efficient,
title={Efficient synthesis of Gaussian filters by cascaded uniform filters},
author={Wells, William M.},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
number={2},
pages={234--239},
year={1986},
publisher={IEEE}
}