Intro & related work

2018-02-13 17:30:32 +01:00
parent 4aa3ff5e30
commit 7b82b8247c
4 changed files with 122 additions and 6 deletions
--- a/tex/chapters/relatedwork.tex
+++ b/tex/chapters/relatedwork.tex
@@ -1,5 +1,47 @@
 \section{Related work}
 % original work rosenblatt/parzen
+% slow
+% other approaches Fast Gaussian Transform
 % binned version silverman, scott, härdle
 % -> Fourier transfom
-% other approaches Fast Gaussian Transform
+
+
+Kernel density estimation is a well-known non-parametric estimator, originally described independently by Rosenblatt \cite{rosenblatt1956remarks} and Parzen \cite{parzen1962estimation}.
+It has been the subject of extensive research and its theoretical properties are well understood.
+A comprehensive reference is given by Scott \cite{scott2015}.
+Although classified as non-parametric, the KDE has two free parameters: the kernel function and its bandwidth.
+The selection of a \qq{good} bandwidth is still an open problem and heavily researched.
+However, the automatic selection of the bandwidth is not the subject of this work and we refer to the literature \cite{turlach1993bandwidth}.
+
+The great flexibility of the KDE renders it very useful for many applications.
+However, its flexibility comes at the cost of a relatively slow computation speed.
+The complexity of a naive implementation of the KDE is \landau{NM} evaluations of the kernel function, given $N$ data samples and $M$ points of the estimate.
+Therefore, a lot of effort was put into reducing the computation time of the KDE.
+Various methods have been proposed, which can be grouped by the technique they rely on.
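+
+For concreteness, the naive estimator can be sketched in the univariate case as follows; the symbols $\hat{f}$, $K$, $h$, $x_i$ and $y_j$ are notation assumed here for illustration.
+\begin{equation}
+  \hat{f}(y_j) = \frac{1}{Nh} \sum_{i=1}^{N} K\!\left(\frac{y_j - x_i}{h}\right), \qquad j = 1, \dots, M
+\end{equation}
+Each of the $M$ evaluation points requires a sum over all $N$ samples, which yields the $NM$ kernel evaluations stated above.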
+
+%  k-nearest neighbor searching
+An obvious way to speed up the computation is to reduce the number of evaluated kernel functions.
+One possible optimization is based on k-nearest neighbour search performed on spatial data structures.
+These algorithms reduce the number of evaluated kernels by taking the spatial distance between clusters of data points into account \cite{gray2003nonparametric}.
+
+%  fast multipole method & Fast Gauss Transform
+Another approach is to reduce the algorithmic complexity of the sum over Gaussian functions, by employing a specialized variant of the fast multipole method.
+The term fast Gauss transform was coined by Greengard \cite{greengard1991fast}, who suggested this approach to reduce the complexity of the KDE to \landau{N+M}.
+% However, the complexity grows exponentially with dimension. \cite{Improved Fast Gauss Transform and Efficient Kernel Density Estimation}
+
+% FastKDE, based on ECF and nuFFT
+Recent methods based on the \qq{self-consistent} KDE proposed by Bernacchia and Pigolotti make it possible to obtain an estimate without any assumptions.
+They define a Fourier-based filter on the empirical characteristic function of a given dataset.
+The computation time was further reduced by \etal{O'Brien} using a non-uniform FFT algorithm to efficiently transform the data into Fourier space.
+Therefore, the data is not required to be on a grid.
+
+% binning => FFT
+In general, it is desirable to omit a grid, as the data points do not necessarily fall onto equally spaced points.
+However, reducing the sample size by distributing the data on an equidistant grid can significantly reduce the computation time, if an approximate KDE is acceptable.
+Silverman \cite{silverman1982algorithm} originally suggested combining adjacent data points into data bins and applying an FFT to quickly compute the estimate.
+This approximation scheme was later called binned KDE and was extensively studied \cite{fan1994fast,wand1994fast,hall1996accuracy,holmstrom2000accuracy}.
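+
+A minimal sketch of why binning pairs well with the FFT, assuming simple binning onto $G$ equidistant grid points $g_k$ with spacing $\delta$ and bin counts $c_k$ (all of which are notation introduced here for illustration):
+\begin{equation}
+  \tilde{f}(g_j) = \frac{1}{Nh} \sum_{k=1}^{G} c_k \, K\!\left(\frac{g_j - g_k}{h}\right), \qquad j = 1, \dots, G
+\end{equation}
+Because $g_j - g_k = (j-k)\,\delta$ depends only on the index difference, the sum is a discrete convolution of the bin counts with sampled kernel values, which an FFT can evaluate in \landau{G \log G} operations.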
+
+The idea to approximate a Gaussian filter using several box filters was first formulated by Wells \cite{wells1986efficient}.
+Kovesi \cite{kovesi2010fast} suggested using two box filters with different widths to increase accuracy while maintaining the same complexity.
+To eliminate the approximation error completely, \etal{Gwosdek} \cite{gwosdek2011theoretical} proposed a new approach called extended box filter.
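+
+The variance argument behind these box filter approximations can be sketched as follows, assuming discrete box filters of odd width $w$ applied $n$ times; the symbols $w$, $n$, $m$, $w_l$ and $w_u$ are notation introduced here for illustration.
+\begin{equation}
+  \sigma^2 = \frac{n\,(w^2 - 1)}{12} \quad\Longleftrightarrow\quad w = \sqrt{\frac{12\sigma^2}{n} + 1}
+\end{equation}
+Since this $w$ is rarely an odd integer, combining $m$ passes of width $w_l$ with $n-m$ passes of width $w_u$ gives $\sigma^2 = \left(m\,w_l^2 + (n-m)\,w_u^2 - n\right)/12$, which can match a wider range of target standard deviations.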
+