From 7b82b8247cdfac9ae30ea1e0819e450e2b7a28e8 Mon Sep 17 00:00:00 2001
From: Markus Bullmann
Date: Tue, 13 Feb 2018 17:30:32 +0100
Subject: [PATCH] Intro & related work

---
 tex/chapters/experiments.tex  | 25 ++++++++++++++++++++
 tex/chapters/introduction.tex | 15 ++++++++----
 tex/chapters/relatedwork.tex  | 44 ++++++++++++++++++++++++++++++++++-
 tex/egbib.bib                 | 44 +++++++++++++++++++++++++++++++++++
 4 files changed, 122 insertions(+), 6 deletions(-)

diff --git a/tex/chapters/experiments.tex b/tex/chapters/experiments.tex
index 14d2650..d8712cd 100644
--- a/tex/chapters/experiments.tex
+++ b/tex/chapters/experiments.tex
@@ -1,2 +1,27 @@
 \section{Experiments}
 
+We now empirically evaluate the accuracy of our method and compare its runtime performance with other state-of-the-art approaches.
+To conclude, we present a real-world example from an indoor localisation system.
+All tests are performed on an Intel Core \mbox{i5-7600K} CPU with a frequency of $4.5 \text{GHz}$, which supports the AVX2 instruction set, so 256-bit wide SIMD registers are available.
+We compare our C++ implementation of the box filter based KDE to the KernSmooth R package and the \qq{FastKDE} implementation \cite{fastKDE}.
+The KernSmooth package provides an FFT-based BKDE implementation built on optimized C functions.
+
+\subsection{Error}
+To quantify the accuracy of our method, the mean integrated squared error (MISE) is used.
+The ground truth is given as a synthetic data set drawn from a normal mixture density.
+Clearly, the choice of the ground truth distribution affects the resulting error.
+However, as our method approximates the KDE, it is only of interest to evaluate the closeness to the KDE and not to the ground truth itself.
+Therefore, the particular choice of the ground truth is only of minor importance here.
+
+First, we evaluate the accuracy of our method as a function of the bandwidth $h$ in comparison to the exact KDE and the BKDE.
+
+
+
+% KDE, box filter, exbox as a function of h (figure)
+% sample size and grid size (text)
+% fastKDE error comparison makes no sense because kernel and bandwidth differ
+
+
+\subsection{Performance}
+
+\subsection{Real World}
diff --git a/tex/chapters/introduction.tex b/tex/chapters/introduction.tex
index 429a531..51e8acf 100644
--- a/tex/chapters/introduction.tex
+++ b/tex/chapters/introduction.tex
@@ -28,14 +28,19 @@ We formalize this ...
 Our experiments support our ..
 }
 
-In this paper, a novel approximation approach for rapid computation of the KDE is presented.
+
 %Therefore, this paper presents a novel approximation approach for rapid computation of the KDE.
 %In this paper, a well known approximation of the Gaussian filter is used to speed up the computation of the KDE.
+In this paper, a novel approximation approach for rapid computation of the KDE is presented.
+The basic idea is to interpret the estimation problem as a filtering operation.
+We show that computing the KDE with a Gaussian kernel on pre-binned data is equivalent to applying a Gaussian filter to the binned data.
+This allows us to use a well-known approximation scheme for Gaussian filters based on the box filter.
+Applying a box filter recursively several times yields an approximate Gaussian filter \cite{kovesi2010fast}.
 
-
-
-
-
+This process converges quite fast to a reasonably close approximation of the ideal Gaussian.
+In addition, a box filter can be computed extremely fast due to its intrinsic simplicity.
+While the idea of using several box filter passes to approximate a Gaussian has been around for a long time, its application to obtain a fast KDE is new.
+% time sequential, fixed computation time, pre binned data!!
 
 % KDE wellknown nonparametic estimation method
 % Flexibility is paid with slow speed
diff --git a/tex/chapters/relatedwork.tex b/tex/chapters/relatedwork.tex
index 41b3d19..2316419 100644
--- a/tex/chapters/relatedwork.tex
+++ b/tex/chapters/relatedwork.tex
@@ -1,5 +1,47 @@
 \section{Related work}
 % original work rosenblatt/parzen
+% slow
+% other approaches Fast Gaussian Transform
 % binned version silverman, scott, härdle
 % -> Fourier transfom
-% other approaches Fast Gaussian Transform
+
+
+Kernel density estimation is a well-known non-parametric estimator, originally described independently by Rosenblatt \cite{rosenblatt1956remarks} and Parzen \cite{parzen1962estimation}.
+It has been the subject of extensive research, and its theoretical properties are well understood.
+A comprehensive reference is given by Scott \cite{scott2015}.
+Although classified as non-parametric, the KDE has two free parameters: the kernel function and its bandwidth.
+The selection of a \qq{good} bandwidth is still an open problem and an active area of research.
+However, automatic bandwidth selection is not the subject of this work, and we refer to the literature \cite{turlach1993bandwidth}.
+
+The great flexibility of the KDE makes it useful for many applications.
+However, this flexibility comes at the cost of a relatively slow computation speed.
+A naive implementation of the KDE requires \landau{NM} evaluations of the kernel function, given $N$ data samples and $M$ points of the estimate.
+Therefore, a lot of effort has been put into reducing the computation time of the KDE.
+Various methods have been proposed, which can be grouped by the underlying technique.
+
+% k-nearest neighbor searching
+An obvious way to speed up the computation is to reduce the number of evaluated kernel functions.
+One possible optimization is based on k-nearest neighbour search performed on spatial data structures.
+These algorithms reduce the number of evaluated kernels by taking the spatial distance between clusters of data points into account \cite{gray2003nonparametric}.
+
+% fast multipole method & Fast Gauss Transform
+Another approach is to reduce the algorithmic complexity of the sum over Gaussian functions by employing a specialized variant of the fast multipole method.
+The term fast Gauss transform was coined by Greengard and Strain \cite{greengard1991fast}, who suggested this approach to reduce the complexity of the KDE to \landau{N+M}.
+% However, the complexity grows exponentially with dimension. \cite{Improved Fast Gauss Transform and Efficient Kernel Density Estimation}
+
+% FastKDE, based on ECF and nuFFT
+Recent methods based on the \qq{self-consistent} KDE proposed by Bernacchia and Pigolotti make it possible to obtain an estimate without any assumptions.
+They define a Fourier-based filter on the empirical characteristic function of a given dataset.
+The computation time was further reduced by \etal{O'Brien} using a non-uniform FFT algorithm to efficiently transform the data into Fourier space.
+Therefore, the data is not required to be on a grid.
+
+% binning => FFT
+In general, it is desirable to omit a grid, as the data points do not necessarily fall onto equally spaced points.
+However, reducing the sample size by distributing the data on an equidistant grid can significantly reduce the computation time, if an approximate KDE is acceptable.
+Silverman \cite{silverman1982algorithm} originally suggested combining adjacent data points into data bins and applying an FFT to quickly compute the estimate.
+This approximation scheme was later called binned KDE and was extensively studied \cite{fan1994fast} \cite{wand1994fast} \cite{hall1996accuracy} \cite{holmstrom2000accuracy}.
+
+The idea to approximate a Gaussian filter using several box filters was first formulated by Wells \cite{wells1986efficient}.
+Kovesi \cite{kovesi2010fast} suggested using two box filters with different widths to increase accuracy while maintaining the same complexity.
+To eliminate the approximation error completely, \etal{Gwosdek} \cite{gwosdek2011theoretical} proposed a new approach called the extended box filter.
+
diff --git a/tex/egbib.bib b/tex/egbib.bib
index 938d8f5..d8b58ad 100644
--- a/tex/egbib.bib
+++ b/tex/egbib.bib
@@ -2890,4 +2890,48 @@
 year = {2003}
 }
 
+@inproceedings{kovesi2010fast,
+ title={Fast almost-gaussian filtering},
+ author={Kovesi, Peter},
+ booktitle={Proceedings of the 2010 International Conference on Digital Image Computing: Techniques and Applications},
+ pages={121--125},
+ year={2010},
+ publisher={IEEE}
+}
+@book{turlach1993bandwidth,
+ title={Bandwidth selection in kernel density estimation: A review},
+ author={Turlach, Berwin A.},
+ year={1993},
+ publisher={CORE and Institut de Statistique Universit{\'e} catholique de Louvain Louvain-la-Neuve}
+}
+
+@inproceedings{gray2003nonparametric,
+ title={Nonparametric density estimation: Toward computational tractability},
+ author={Gray, Alexander G and Moore, Andrew W},
+ booktitle={Proceedings of the 2003 SIAM International Conference on Data Mining},
+ pages={203--211},
+ year={2003},
+ organization={SIAM}
+}
+
+@article{greengard1991fast,
+ title={The fast Gauss transform},
+ author={Greengard, Leslie and Strain, John},
+ journal={SIAM Journal on Scientific and Statistical Computing},
+ volume={12},
+ number={1},
+ pages={79--94},
+ year={1991},
+ publisher={SIAM}
+}
+
+@article{wells1986efficient,
+ title={Efficient synthesis of Gaussian filters by cascaded uniform filters},
+ author={Wells, William M.},
+ journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
+ number={2},
+ pages={234--239},
+ year={1986},
+ publisher={IEEE}
+}
\ No newline at end of file