\section{Kernel Density Estimator}
% KDE by rosenblatt and parzen
% general KDE
% Gauss Kernel
%The histogram is a simple and for a long time the most used non-parametric estimator.
%However, its inability to produce a continuous estimate dismisses it for many applications where a smooth distribution is assumed.
%In contrast,
The KDE is often the preferred tool to estimate a density function from discrete data samples because of its flexibility and ability to produce a continuous estimate.
%
Given a univariate random sample set $X=\{X_1, \dots, X_N\}$ drawn from a density function $f$, let $w_1, \dots, w_N$ be associated weights.
The kernel estimator $\hat{f}$, which estimates $f$ at the point $x$, is given as
\begin{equation}
\label{eq:kde}
\hat{f}(x) = \frac{1}{W} \sum_{i=1}^{N} \frac{w_i}{h} K \left(\frac{x-X_i}{h}\right) \text{,}
\end{equation}
where $W=\sum_{i=1}^{N}w_i$ and $h\in\R^+$ is an arbitrary smoothing parameter called the bandwidth.
$K$ is a kernel function such that $\int K(u) \dop{u} = 1$.
In general, any kernel can be used; however, common advice is to choose a symmetric, low-order polynomial kernel.
Thus, several popular kernel functions are used in practice, like the Uniform, Gaussian, Epanechnikov, or Silverman kernel \cite{scott2015}.

While the kernel estimate inherits all the properties of the kernel, a non-optimal kernel choice is usually not of crucial importance.

\begin{equation}
\label{eq:gausKern}
K_G(u)=\frac{1}{\sqrt{2\pi}} \expp{- \frac{u^2}{2} } \text{.}
\end{equation}

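To make \eqref{eq:kde} concrete, the following sketch evaluates the weighted KDE with the Gaussian kernel by direct summation in Python; the function name and the array-based interface are illustrative choices for this example, not part of the cited literature.
\begin{verbatim}
import numpy as np

def kde_naive(x, samples, weights, h):
    """Directly evaluate the weighted KDE with the Gaussian
    kernel at the points x; cost is O(M*N) kernel evaluations."""
    W = weights.sum()                        # normalization W
    u = (x[:, None] - samples[None, :]) / h  # (M, N) scaled distances
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)  # K_G(u)
    return (weights * K).sum(axis=1) / (W * h)
\end{verbatim}
The quadratic cost of this direct evaluation is exactly what the faster schemes discussed below avoid.
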
The flexibility of the KDE comes at the expense of computational efficiency, which has led to the development of more efficient computation schemes.
The computation time depends, besides the number of calculated points $M$, on the input size, namely the number of data points $N$.
In general, reducing the size of the sample negatively affects the accuracy of the estimate.
Still, the sample size is a suitable parameter to speed up the computation.

Since each sample is combined with its adjacent samples into bins, the BKDE approximates the KDE.
Each bin represents the count of the sample set at a given point of an equidistant grid with spacing $\delta$.
A binning rule distributes a sample among the grid points $g_j=j\delta$, indexed by $j\in\Z$.
% and can be represented as a set of functions $\{ w_j(x,\delta), j\in\Z \}$.
Computation requires a finite grid on the interval $[a,b]$ containing the data, thus the number of grid points is $G=(b-a)/\delta+1$.

Given a binning rule $r_j$, the BKDE $\tilde{f}$ of a density $f$ computed pointwise at the grid point $g_x$ is given as
\begin{equation}
\label{eq:binKde}
\tilde{f}(g_x) = \frac{1}{W} \sum_{j=1}^{G} \frac{C_j}{h} K \left(\frac{g_x-g_j}{h}\right) \text{,}
\end{equation}
where $G$ is the number of grid points and
\begin{equation}
\label{eq:gridCnts}
C_j=\sum_{i=1}^{N} w_i \, r_j(X_i,\delta)
\end{equation}
is the weighted count at grid point $g_j$, such that $\sum_{j=1}^{G} C_j = W$ \cite{hall1996accuracy}.
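For illustration, a direct evaluation of \eqref{eq:binKde} with the Gaussian kernel of \eqref{eq:gausKern} could look as follows in Python; the bin counts are assumed to be precomputed, all names are chosen for this sketch, and the vectorized double loop makes the quadratic cost discussed below explicit.
\begin{verbatim}
import numpy as np

def bkde_naive(counts, delta, h):
    """Evaluate the BKDE at all G grid points from the bin
    counts C_j on a grid with spacing delta (O(G^2) cost)."""
    G = counts.size
    W = counts.sum()                 # sum of counts equals W
    g = np.arange(G) * delta         # grid points g_j = j*delta
    u = (g[:, None] - g[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)  # Gaussian kernel
    return (counts * K).sum(axis=1) / (W * h)
\end{verbatim}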

However, for many applications it is recommended to use the simple binning rule
\begin{align}
r_j(x,\delta) &=
\begin{cases}
1 & -\frac{1}{2} \le \frac{x}{\delta} - j < \frac{1}{2} \\
0 & \text{else}
\end{cases}
\end{align}
or the common linear binning rule, which divides the sample into two fractional weights shared by the nearest grid points
\begin{align}
\label{eq:linearBinning}
r_j(x,\delta) &=
\begin{cases}
1 - \left|\frac{x}{\delta} - j\right| & \left|\frac{x}{\delta} - j\right| < 1 \\
0 & \text{else}
\end{cases}
\end{align}
An advantage is that their impact on the approximation error has been extensively investigated and is well understood \cite{hall1996accuracy}.
Both methods can be computed with a fast $\landau{N}$ algorithm, as simple binning is essentially the quotient of an integer division and the fractional weights of linear binning are given by the remainder of the division.
As linear binning is more precise, it is often preferred over simple binning \cite{fan1994fast}.
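A minimal sketch of the $\landau{N}$ linear binning pass, assuming a grid $g_j = a + j\delta$ with $G$ points covering the data: the quotient of the division selects the left grid point and the remainder yields the fractional weights. All names are chosen for this example.
\begin{verbatim}
import numpy as np

def bin_linear(samples, weights, a, delta, G):
    """O(N) linear binning: split each weighted sample between
    its two nearest grid points g_j = a + j*delta."""
    counts = np.zeros(G)
    t = (samples - a) / delta          # sample position in grid units
    j = np.floor(t).astype(int)        # quotient: left grid index
    frac = t - j                       # remainder: right-point share
    np.add.at(counts, j, weights * (1.0 - frac))
    np.add.at(counts, np.minimum(j + 1, G - 1), weights * frac)
    return counts
\end{verbatim}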

While linear binning improves the accuracy of the estimate, the choice of the grid size is of greater importance.
The number of grid points $G$ determines the trade-off between the approximation error caused by the binning and the computational speed of the algorithm.
Clearly, a large value of $G$ produces an estimate close to the regular KDE, but requires more evaluations of the kernel compared to a coarser grid.
However, it is unknown which particular $G$ gives the best trade-off for any given sample set.
In general, there is no definite answer because the amount of binning depends on the structure of the unknown density and the sample size \cite{hall1996accuracy}.

A naive implementation of \eqref{eq:binKde} reduces the number of kernel evaluations to $\landau{G^2}$, assuming that $G<N$ \cite{fan1994fast}.
However, due to the fixed grid spacing, several kernel evaluations are identical and can be reused.
This reduces the number of kernel evaluations to $\landau{G}$, but the number of additions and multiplications required is still $\landau{G^2}$.
Using the FFT to perform the discrete convolution, the complexity can be further reduced to $\landau{G\log{G}}$ \cite{silverman1982algorithm}.%, which is currently the fastest exact BKDE algorithm.
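As a sketch of the FFT route, the bin counts can be convolved with the Gaussian kernel sampled on the same grid; here SciPy's fftconvolve stands in for a hand-rolled FFT convolution, and truncating the kernel at a few bandwidths is an assumption of this example rather than part of the exact algorithm.
\begin{verbatim}
import numpy as np
from scipy.signal import fftconvolve

def bkde_fft(counts, delta, h, cutoff=4.0):
    """O(G log G) BKDE: FFT-convolve the bin counts with the
    Gaussian kernel sampled on the grid (truncated at cutoff*h)."""
    W = counts.sum()
    radius = int(np.ceil(cutoff * h / delta))   # half-width in bins
    u = np.arange(-radius, radius + 1) * delta / h
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return fftconvolve(counts, kernel, mode="same") / (W * h)
\end{verbatim}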

The \mbox{FFT-convolution} approach is usually highlighted as the striking computational benefit of the BKDE.
However, for this work the key is to recognize the discrete convolution structure of \eqref{eq:binKde}, as this allows one to interpret the computation of a density estimate as a signal filtering problem.
This makes it possible to apply a wide range of well-studied techniques from the broad field of digital signal processing (DSP).
Using the Gaussian kernel from \eqref{eq:gausKern} in conjunction with \eqref{eq:binKde} results in the following equation
\begin{equation}
\label{eq:bkdeGaus}
\tilde{f}(g_x)=\frac{1}{W\sqrt{2\pi}} \sum_{j=1}^{G} \frac{C_j}{h} \expp{-\frac{(g_x-g_j)^2}{2h^2}} \text{.}
\end{equation}

The above formula is a convolution of the data with the Gaussian kernel.
More precisely, it is a discrete convolution of the finite data grid with the Gaussian function.
In terms of DSP, this is analogous to filtering the binned data with a Gaussian filter.
This finding makes it possible to speed up the computation of the density estimate by using a fast approximation scheme based on iterated box filters.
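To indicate the idea, the following sketch (a generic construction, not the specific scheme developed in this work) replaces the Gaussian filter by three passes of a box filter: repeated box filtering converges to a Gaussian shape by the central limit theorem, and the box width is chosen so that the accumulated variance of the passes matches the Gaussian variance $h^2/\delta^2$ in grid units.
\begin{verbatim}
import numpy as np

def box_blur(signal, width):
    """One moving-average (box filter) pass of odd width."""
    return np.convolve(signal, np.ones(width) / width, mode="same")

def bkde_box(counts, delta, h, passes=3):
    """Approximate the Gaussian filtering of the bin counts by
    iterated box filters; variance matching gives the box width."""
    sigma = h / delta                              # bandwidth in bins
    width = int(np.sqrt(12.0 * sigma**2 / passes + 1.0))
    width += (width + 1) % 2                       # force odd width
    dens = counts.astype(float)
    for _ in range(passes):
        dens = box_blur(dens, width)
    return dens / (counts.sum() * delta)  # estimate integrates to one
\end{verbatim}
Each box pass costs only $\landau{G}$ additions, which is what makes this direction attractive compared to the exact convolution.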