fixed kde chapter

toni
2018-02-24 13:53:06 +01:00
parent 0d4cd0ff31
commit 1b6445fa65

@@ -17,15 +17,15 @@ Given a univariate random sample set $X=\{X_1, \dots, X_N\}$, where $X$ has the
 The kernel estimator $\hat{f}$ which estimates $f$ at the point $x$ is given as
 \begin{equation}
 \label{eq:kde}
-\hat{f}(x) = \frac{1}{W} \sum_{i=1}^{n} \frac{w_i}{h} K \left(\frac{x-X_i}{h}\right)
+\hat{f}(x) = \frac{1}{W} \sum_{i=1}^{N} \frac{w_i}{h} K \left(\frac{x-X_i}{h}\right)
 \end{equation}
-where $W=\sum_{i=1}^{n}w_i$ and $h\in\R^+$ is an arbitrary smoothing parameter called bandwidth.
-$K$ is a kernel function such that $\int K(u) \dop{u} = 1$ \cite[138]{scott2015}.
+where $W=\sum_{i=1}^{N}w_i$ and $h\in\R^+$ is an arbitrary smoothing parameter called the bandwidth.
+$K$ is a kernel function such that $\int K(u) \dop{u} = 1$.
 In general, any kernel can be used; however, the usual advice is to choose a symmetric, low-order polynomial kernel.
-Thus, several popular kernel functions are used in practice, like the Uniform, Gaussian, Epanechnikov, or Silverman kernel \cite[152.]{scott2015}.
+Thus, several popular kernel functions are used in practice, such as the Uniform, Gaussian, Epanechnikov, or Silverman kernel \cite{scott2015}.
-While the kernel estimate inherits all the properties of the kernel, usually it is not of crucial matter if a non-optimal kernel was chosen \cite[151f.]{scott2015}.
-As a matter of fact, the quality of the kernel estimate is primarily determined by the smoothing parameter $h$ \cite[145]{scott2015}.
+While the kernel estimate inherits all the properties of the kernel, it usually matters little if a non-optimal kernel is chosen.
+In fact, the quality of the kernel estimate is primarily determined by the smoothing parameter $h$ \cite{scott2015}.
 %In theory, an optimal bandwidth $h^*$ with respect to the asymptotic mean integrated squared error can be calculated.
 %However, in order to do so, the density function to be estimated needs to be known, which is obviously not the case in practice.
 %
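
As an aside, the corrected estimator is easy to sketch in code; the following Python snippet is a minimal illustration of the weighted KDE above, assuming a Gaussian kernel, with all names (gaussian_kernel, weighted_kde) being illustrative rather than part of the chapter:

import numpy as np

def gaussian_kernel(u):
    # Gaussian kernel K with unit integral, one of the popular choices listed above.
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def weighted_kde(x, samples, weights, h, kernel=gaussian_kernel):
    # f_hat(x) = (1/W) * sum_{i=1}^{N} (w_i / h) * K((x - X_i) / h)
    W = weights.sum()              # W = sum_{i=1}^{N} w_i
    u = (x - samples) / h          # scaled distances (x - X_i) / h
    return (weights / h * kernel(u)).sum() / W

# Usage: 1000 standard-normal samples with unit weights and bandwidth h = 0.3;
# the estimate at x = 0 should be close to the true density 1/sqrt(2*pi) ~ 0.399.
rng = np.random.default_rng(0)
X = rng.standard_normal(1000)
print(weighted_kde(0.0, X, np.ones_like(X), h=0.3))
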
@@ -56,12 +56,12 @@ In general, reducing the size of the sample negatively affects the accuracy of t
 Still, the sample size is a suitable parameter to speed up the computation.
 Since each sample is combined with its adjacent samples into bins, the BKDE approximates the KDE.
-Each bin represents the \qq{count} of the sample set at a given point of a equidistant grid with spacing $\delta$.
+Each bin represents the count of the sample set at a given point of an equidistant grid with spacing $\delta$.
 A binning rule distributes a sample $x$ among the grid points $g_j=j\delta$, indexed by $j\in\Z$.
 % and can be represented as a set of functions $\{ w_j(x,\delta), j\in\Z \}$.
 Computation requires a finite grid on the interval $[a,b]$ containing the data; thus the number of grid points is $G=(b-a)/\delta+1$.
-Given a binning rule $b_j$ the BKDE $\tilde{f}$ of a density $f$ computed pointwise at the grid point $g_x$ is given as
+Given a binning rule $r_j$, the BKDE $\tilde{f}$ of a density $f$, computed pointwise at the grid point $g_x$, is given as
 \begin{equation}
 \label{eq:binKde}
 \tilde{f}(g_x) = \frac{1}{W} \sum_{j=1}^{G} \frac{C_j}{h} K \left(\frac{g_x-g_j}{h}\right)
@@ -69,7 +69,7 @@ Given a binning rule $b_j$ the BKDE $\tilde{f}$ of a density $f$ computed pointw
 where $G$ is the number of grid points and
 \begin{equation}
 \label{eq:gridCnts}
-C_j=\sum_{i=1}^{n} b_j(x_i,\delta)
+C_j=\sum_{i=1}^{N} r_j(X_i,\delta)
 \end{equation}
 is the count at grid point $g_j$, such that $\sum_{j=1}^{G} C_j = W$ \cite{hall1996accuracy}.
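
A corresponding sketch of the direct evaluation of the BKDE from precomputed grid counts (again Python with a Gaussian kernel; bkde_on_grid is an illustrative name, and the G x G matrix approach is just the naive strategy):

import numpy as np

def bkde_on_grid(counts, grid, h):
    # f_tilde(g_x) = (1/W) * sum_{j=1}^{G} (C_j / h) * K((g_x - g_j) / h),
    # evaluated at all grid points at once via a G x G matrix of scaled distances.
    kernel = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    W = counts.sum()                           # sum_j C_j = W by construction
    U = (grid[:, None] - grid[None, :]) / h    # (g_x - g_j) / h for all pairs
    return kernel(U) @ counts / (h * W)

Since g_x - g_j depends only on the index difference between the two grid points, this double sum is a discrete convolution, which is what makes the speedup discussed later in the chapter possible.
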
@@ -77,7 +77,7 @@ In theory, any function which determines the count at grid points is a valid bin
 However, for many applications it is recommended to use the simple binning rule
 \begin{align}
 \label{eq:simpleBinning}
-b_j(x,\delta) &=
+r_j(x,\delta) &=
 \begin{cases}
 w_j & \text{if } x \in ((j-\frac{1}{2})\delta, (j+\frac{1}{2})\delta] \\
 0 & \text{else}
@@ -86,7 +86,7 @@ However, for many applications it is recommend to use the simple binning rule
 or the common linear binning rule, which divides the sample into two fractional weights shared by the two nearest grid points
 \begin{align}
 \label{eq:linearBinning}
-b_j(x,\delta) &=
+r_j(x,\delta) &=
 \begin{cases}
 w_j(1-|\delta^{-1}x-j|) & \text{if } |\delta^{-1}x-j|\le1 \\
 0 & \text{else.}
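
A sketch of the linear binning rule, producing the grid counts C_j consumed above (Python; the grid is anchored at a, i.e. g_j = a + j*delta, a simplification of the g_j = j*delta convention, and linear_bin is an illustrative name):

import numpy as np

def linear_bin(samples, weights, a, b, delta):
    # Split each sample's weight between its two nearest grid points,
    # proportional to proximity, following the linear binning rule.
    G = int(round((b - a) / delta)) + 1        # number of grid points
    counts = np.zeros(G)
    t = (samples - a) / delta                  # fractional grid position of each sample
    j = np.floor(t).astype(int)                # index of the left neighbour
    frac = t - j                               # distance to the left neighbour, in [0, 1)
    np.add.at(counts, j, weights * (1.0 - frac))                   # left share
    np.add.at(counts, np.minimum(j + 1, G - 1), weights * frac)    # right share
    return counts                              # counts.sum() == weights.sum() == W

Feeding these counts into bkde_on_grid above reproduces the BKDE of the weighted sample.
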
@@ -140,4 +140,3 @@ This finding allows to speedup the computation of the density estimate by using
 %\begin{equation}
 %\hat{f}_n = \frac{1}{nh\sqrt{2\pi}} \sum_{i=1}^{n} \expp{-\frac{(x-X_i)^2}{2h^2}}
 %\end{equation}