first draft introduction finished

first draft related work finished
toni
2018-02-14 18:16:29 +01:00
parent a5fc1628e6
commit df18cc87ee
3 changed files with 42 additions and 22 deletions

@@ -9,13 +9,16 @@
Kernel density estimation is a well-known non-parametric estimator, originally described independently by Rosenblatt \cite{rosenblatt1956remarks} and Parzen \cite{parzen1962estimation}.
It has been the subject of extensive research and its theoretical properties are well understood.
A comprehensive reference is given by Scott \cite{scott2015}.
Although classified as non-parametric, the KDE depends on two free parameters, the kernel function and its bandwidth.
The selection of a \qq{good} bandwidth is still an open problem and heavily researched.
An extensive overview of automatic bandwidth selection is given by \cite{heidenreich2013bandwith}.
%However, the automatic selection of the bandwidth is not subject of this work and we refer to the literature \cite{turlach1993bandwidth}.
The great flexibility of the KDE renders it very useful for many applications.
However, this comes at the cost of a relatively slow computation speed.
%
The complexity of a naive implementation of the KDE is \landau{MN}, since the kernel function is evaluated for all $N$ data samples at each of the $M$ points of the estimate.
%The complexity of a naive implementation of the KDE is \landau{NM} evaluations of the kernel function, given $N$ data samples and $M$ points of the estimate.
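For illustration, the naive estimator can be written out explicitly.
The notation is an assumption made for this sketch and is not fixed by the text above: a univariate sample $x_1, \dots, x_N$, a kernel function $K$, a bandwidth $h > 0$, and evaluation points $y_1, \dots, y_M$.
\[
  \hat{f}(y_j) = \frac{1}{N h} \sum_{i=1}^{N} K\left(\frac{y_j - x_i}{h}\right), \qquad j = 1, \dots, M .
\]
Each of the $M$ evaluation points requires a sum over all $N$ samples, which gives the $M \cdot N$ kernel evaluations counted above.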
Therefore, a lot of effort has been put into reducing the computation time of the KDE.
Various methods have been proposed, which can be clustered based on different techniques.
@@ -32,16 +35,26 @@ The term fast Gauss transform was coined by Greengard \cite{greengard1991fast} w
% FastKDE, based on ECF and nuFFT
Recent methods based on the \qq{self-consistent} KDE proposed by Bernacchia and Pigolotti allow an estimate to be obtained without any assumptions.
They define a Fourier-based filter on the empirical characteristic function of a given dataset.
The computation time was further reduced by \etal{O'Brien} using a non-uniform fast Fourier transform (FFT) algorithm to efficiently transform the data into Fourier space.
Therefore, the data is not required to be on a grid.
% binning => FFT
In general, it is desirable to omit a grid, as the data points do not necessarily fall onto equally spaced points.
However, reducing the sample size by distributing the data on an equidistant grid can significantly reduce the computation time, if an approximate KDE is acceptable.
Silverman \cite{silverman1982algorithm} originally suggested combining adjacent data points into data bins, which results in a discrete convolution structure of the KDE.
This structure allows the estimate to be computed efficiently using an FFT algorithm.
This approximation scheme was later called binned KDE (BKDE) and was extensively studied \cite{fan1994fast,wand1994fast,hall1996accuracy,holmstrom2000accuracy}.
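The discrete convolution structure of the binned estimate can be sketched as follows.
The notation is assumed for illustration only and does not appear in the text above: an equidistant grid $g_j = g_0 + j\delta$ with spacing $\delta$, bin counts $c_k$ obtained by assigning the $N$ samples to the grid points, a kernel function $K$, and a bandwidth $h$.
\[
  \hat{f}(g_j) \approx \frac{1}{N h} \sum_{k} c_k \, K\left(\frac{g_j - g_k}{h}\right)
  = \sum_{k} c_k \, \kappa_{j-k},
  \qquad
  \kappa_{l} = \frac{1}{N h} K\left(\frac{l \delta}{h}\right) .
\]
Because the weights $\kappa_{j-k}$ depend only on the index difference $j - k$, the sum is a discrete convolution of the bin counts with the sampled kernel, which is the form that an FFT can evaluate efficiently.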
The idea to approximate a Gaussian filter using several box filters was first formulated by Wells \cite{wells1986efficient}.
Kovesi \cite{kovesi2010fast} suggested using two box filters with different widths to increase accuracy while maintaining the same complexity.
To eliminate the approximation error completely, \etal{Gwosdek} \cite{gwosdek2011theoretical} proposed a new approach called the extended box filter.
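The box filter approximation of a Gaussian can be motivated by a short variance argument.
The symbols are assumptions made for this sketch and are not fixed by the text above: $\sigma$ denotes the standard deviation of the target Gaussian, $n$ the number of box filter passes, and $w$ the width of each box filter in samples.
A box filter of width $w$ with uniform weights has variance $(w^2 - 1)/12$, and variances add under repeated convolution, so matching the variance of the Gaussian gives
\[
  \sigma^2 = n \, \frac{w^2 - 1}{12}
  \quad \Longleftrightarrow \quad
  w = \sqrt{\frac{12 \sigma^2}{n} + 1} .
\]
In general this $w$ is not an odd integer and has to be rounded, which is one source of approximation error in such schemes.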
This work highlights the discrete convolution structure of the BKDE and elaborates its connection to digital signal processing, especially the Gaussian filter.
Accordingly, this results in an equivalence relation between the BKDE and the Gaussian filter.
It follows that the above-mentioned box filter approach is also an approximation of the BKDE, resulting in the efficient computation scheme presented in this paper.
This approach has a lower complexity than comparable FFT-based algorithms and adds only a negligibly small error, while improving the performance significantly.