\section{Related work}
% original work rosenblatt/parzen
% langsam
% other approaches Fast Gaussian Transform
% binned version silverman, scott, härdle
% -> Fourier transfom
The kernel density estimator (KDE) is a well-known non-parametric estimator, originally described independently by Rosenblatt \cite{rosenblatt1956remarks} and Parzen \cite{parzen1962estimation}.
It has been the subject of extensive research, and its theoretical properties are well understood.
A comprehensive reference is given by Scott \cite{scott2015}.
Although classified as non-parametric, the KDE depends on two free parameters: the kernel function and its bandwidth.
The selection of a \qq{good} bandwidth remains an open problem and an active area of research.
An extensive overview of automatic bandwidth selection is given by \cite{heidenreich2013bandwidth}.
%However, the automatic selection of the bandwidth is not subject of this work and we refer to the literature \cite{turlach1993bandwidth}.
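In the common notation, given samples $x_1, \dots, x_N$, a kernel $K$, and a bandwidth $h > 0$, the estimate reads
\[
  \hat{f}_h(x) = \frac{1}{Nh} \sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right).
\]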
The great flexibility of the KDE makes it very useful for many applications.
However, this comes at the cost of a slow computation speed.
%
The complexity of a naive implementation of the KDE is \landau{MN}, given $N$ data samples and $M$ points at which the estimate is evaluated.
%The complexity of a naive implementation of the KDE is \landau{NM} evaluations of the kernel function, given $N$ data samples and $M$ points of the estimate.
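This quadratic cost is apparent in a direct implementation. The following minimal sketch (function and variable names are ours, purely for illustration) evaluates a Gaussian kernel for every pair of evaluation point and sample:

```python
import numpy as np

def naive_kde(samples, grid, h):
    """Naive Gaussian KDE: the M x N matrix of pairwise differences
    makes the O(MN) kernel evaluations explicit."""
    diffs = (grid[:, None] - samples[None, :]) / h            # M x N differences
    kernel = np.exp(-0.5 * diffs**2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel
    return kernel.sum(axis=1) / (len(samples) * h)            # average over N

samples = np.array([0.0, 1.0, 2.0])     # N = 3 data points
grid = np.linspace(-4.0, 6.0, 101)      # M = 101 evaluation points, spacing 0.1
estimate = naive_kde(samples, grid, h=0.5)
```

Every entry of the $M \times N$ difference matrix corresponds to one kernel evaluation, which is exactly the \landau{MN} cost stated above.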
Therefore, a lot of effort was put into reducing the computation time of the KDE.
Various methods have been proposed, which can be grouped by their underlying techniques.
% k-nearest neighbor searching
An obvious way to speed up the computation is to reduce the number of evaluated kernel functions.
One possible optimization is based on k-nearest neighbour search, performed on spatial data structures.
These algorithms reduce the number of evaluated kernels by taking the distance between clusters of data points into account \cite{gray2003nonparametric}.
% fast multipole method & Fast Gaus Transform
Another approach is to reduce the algorithmic complexity of the sum over Gaussian functions by employing a specialized variant of the fast multipole method.
The term fast Gauss transform was coined by \etal{Greengard} \cite{greengard1991fast}, who suggested this approach to reduce the complexity of the KDE to \landau{N+M}.
% However, the complexity grows exponentially with dimension. \cite{Improved Fast Gauss Transform and Efficient Kernel Density Estimation}
% FastKDE, passed on ECF and nuFFT
Recent methods based on the self-consistent KDE proposed by Bernacchia and Pigolotti \cite{bernacchia2011self} make it possible to obtain an estimate without prior assumptions, \ie{} both the kernel and the bandwidth are derived during the estimation.
They define a Fourier-based filter on the empirical characteristic function of a given dataset.
The computation time was further reduced by \etal{O'Brien} using a non-uniform fast Fourier transform (FFT) algorithm to efficiently transform the data into Fourier space \cite{oBrien2016fast}.
% binning => FFT
In general, it is desirable to compute the estimate directly from the sample set.
However, reducing the sample size by distributing the data on an equidistant grid can significantly reduce the computation time if an approximate KDE is acceptable.
Silverman \cite{silverman1982algorithm} originally suggested combining adjacent data points into bins, which gives the KDE a discrete convolution structure and allows the estimate to be computed efficiently with an FFT algorithm.
This approximation scheme was later called the binned KDE (BKDE) and has been studied extensively \cite{fan1994fast,wand1994fast,hall1996accuracy}.
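With bin counts $c_k$ on an equidistant grid of spacing $\delta$ (notation ours, for a one-dimensional sketch), the binned estimator takes the form of a discrete convolution of the counts with sampled kernel weights:
\[
  \hat{f}(x_j) \approx \frac{1}{Nh} \sum_{k} c_k \, K\!\left(\frac{(j-k)\,\delta}{h}\right),
\]
so the kernel weights need to be evaluated only once per grid offset, and the convolution can be carried out with an FFT.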
While the FFT constitutes an efficient algorithm for large sample sets, it adds a noticeable overhead for smaller ones.
The idea of approximating a Gaussian filter by several box filters was first formulated by Wells \cite{wells1986efficient}.
Kovesi \cite{kovesi2010fast} suggested using two box filters of different widths to increase accuracy while maintaining the same complexity.
To eliminate the approximation error completely, \etal{Gwosdek} \cite{gwosdek2011theoretical} proposed a new approach called extended box filter.
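The principle behind these methods can be illustrated with a small sketch (our own, not the extended box filter itself): repeated moving averages of an impulse converge towards a Gaussian profile by the central limit theorem.

```python
import numpy as np

def box_filter(signal, width):
    """One pass of a normalized moving average (box) of odd width."""
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel, mode='same')

impulse = np.zeros(31)
impulse[15] = 1.0                  # unit impulse in the middle
gaussian_like = impulse
for _ in range(3):                 # three passes of a width-5 box
    gaussian_like = box_filter(gaussian_like, 5)
```

Each pass of a width-$w$ box adds variance $(w^2-1)/12$, so $n$ passes mimic a Gaussian with $\sigma^2 = n(w^2-1)/12$; the extended box filter refines this construction to remove the remaining approximation error.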
This work highlights the discrete convolution structure of the BKDE and elaborates its connection to digital signal processing, in particular to the Gaussian filter.
This results in an equivalence between the BKDE and the Gaussian filter.
It follows that the box filter approach mentioned above also approximates the BKDE, which leads to the efficient computation scheme presented in this paper.
This approach has a lower complexity than comparable FFT-based algorithms and adds only a negligibly small error while improving performance significantly.