fixed kde chapter

toni
2018-02-24 13:53:06 +01:00
parent 0d4cd0ff31
commit 1b6445fa65


@@ -17,15 +17,15 @@ Given a univariate random sample set $X=\{X_1, \dots, X_N\}$, where $X$ has the
 The kernel estimator $\hat{f}$ which estimates $f$ at the point $x$ is given as
 \begin{equation}
 \label{eq:kde}
-\hat{f}(x) = \frac{1}{W} \sum_{i=1}^{n} \frac{w_i}{h} K \left(\frac{x-X_i}{h}\right)
+\hat{f}(x) = \frac{1}{W} \sum_{i=1}^{N} \frac{w_i}{h} K \left(\frac{x-X_i}{h}\right)
 \end{equation}
-where $W=\sum_{i=1}^{n}w_i$ and $h\in\R^+$ is an arbitrary smoothing parameter called bandwidth.
-$K$ is a kernel function such that $\int K(u) \dop{u} = 1$ \cite[138]{scott2015}.
+where $W=\sum_{i=1}^{N}w_i$ and $h\in\R^+$ is an arbitrary smoothing parameter called the bandwidth.
+$K$ is a kernel function such that $\int K(u) \dop{u} = 1$.
 In general any kernel can be used; however, the general advice is to choose a symmetric, low-order polynomial kernel.
-Thus, several popular kernel functions are used in practice, like the Uniform, Gaussian, Epanechnikov, or Silverman kernel \cite[152.]{scott2015}.
-While the kernel estimate inherits all the properties of the kernel, usually it is not of crucial matter if a non-optimal kernel was chosen \cite[151f.]{scott2015}.
-As a matter of fact, the quality of the kernel estimate is primarily determined by the smoothing parameter $h$ \cite[145]{scott2015}.
+Thus, several popular kernel functions are used in practice, such as the Uniform, Gaussian, Epanechnikov, or Silverman kernel \cite{scott2015}.
+While the kernel estimate inherits all the properties of the kernel, it is usually not crucial if a non-optimal kernel is chosen.
+As a matter of fact, the quality of the kernel estimate is primarily determined by the smoothing parameter $h$ \cite{scott2015}.
 %In theory it is possible to calculate an optimal bandwidth $h^*$ with respect to the asymptotic mean integrated squared error.
 %However, in order to do so, the density function to be estimated would need to be known, which is obviously not the case in practice.
 %
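
As an aside, eq:kde translates directly into code. Below is a minimal Python/NumPy sketch of the weighted estimator with a Gaussian kernel; the function name, the synthetic sample, and the choice of $h$ are illustrative assumptions, not part of the chapter.

import numpy as np

def kde(x, samples, weights, h):
    # Weighted kernel estimate of eq:kde with a Gaussian kernel K.
    W = weights.sum()                               # W = sum_i w_i
    u = (x - samples) / h                           # (x - X_i) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)    # K(u), integrates to 1
    return float((weights / h * K).sum() / W)

rng = np.random.default_rng(0)
X = rng.standard_normal(1000)   # N = 1000 synthetic samples
w = np.ones_like(X)             # unit weights, so W = N
print(kde(0.0, X, w, h=0.3))    # roughly 0.4, the standard normal density at 0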
@@ -56,12 +56,12 @@ In general, reducing the size of the sample negatively affects the accuracy of t
 Still, the sample size is a suitable parameter to speed up the computation.
 Since each single sample is combined with its adjacent samples into bins, the BKDE approximates the KDE.
-Each bin represents the \qq{count} of the sample set at a given point of a equidistant grid with spacing $\delta$.
+Each bin represents the count of the sample set at a given point of an equidistant grid with spacing $\delta$.
 A binning rule distributes a sample $x$ among the grid points $g_j=j\delta$, indexed by $j\in\Z$.
 % and can be represented as a set of functions $\{ w_j(x,\delta), j\in\Z \}$.
 Computation requires a finite grid on the interval $[a,b]$ containing the data; thus the number of grid points is $G=(b-a)/\delta+1$.
-Given a binning rule $b_j$ the BKDE $\tilde{f}$ of a density $f$ computed pointwise at the grid point $g_x$ is given as
+Given a binning rule $r_j$, the BKDE $\tilde{f}$ of a density $f$ computed pointwise at the grid point $g_x$ is given as
 \begin{equation}
 \label{eq:binKde}
 \tilde{f}(g_x) = \frac{1}{W} \sum_{j=1}^{G} \frac{C_j}{h} K \left(\frac{g_x-g_j}{h}\right)
@@ -69,7 +69,7 @@ Given a binning rule $b_j$ the BKDE $\tilde{f}$ of a density $f$ computed pointw
 where $G$ is the number of grid points and
 \begin{equation}
 \label{eq:gridCnts}
-C_j=\sum_{i=1}^{n} b_j(x_i,\delta)
+C_j=\sum_{i=1}^{N} r_j(X_i,\delta)
 \end{equation}
 is the count at grid point $g_j$, such that $\sum_{j=1}^{G} C_j = W$ \cite{hall1996accuracy}.
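
For concreteness, a sketch of evaluating eq:binKde on the whole grid from precomputed grid counts (Python/NumPy, Gaussian kernel; the function name is hypothetical). The point of binning shows up in the cost: the computation runs over the $G$ grid points only, independent of the sample size $N$.

import numpy as np

def bkde(grid, counts, h):
    # BKDE of eq:binKde, evaluated at every grid point g_x at once.
    # counts[j] holds C_j from eq:gridCnts, so counts.sum() equals W.
    W = counts.sum()
    u = (grid[:, None] - grid[None, :]) / h        # (g_x - g_j) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # Gaussian kernel
    return K @ (counts / h) / W                    # O(G^2) instead of O(N*G)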
@@ -77,7 +77,7 @@ In theory, any function which determines the count at grid points is a valid bin
 However, for many applications it is recommended to use the simple binning rule
 \begin{align}
 \label{eq:simpleBinning}
-b_j(x,\delta) &=
+r_j(x,\delta) &=
 \begin{cases}
 w_j & \text{if } x \in ((j-\frac{1}{2})\delta, (j+\frac{1}{2})\delta] \\
 0 & \text{else}
@@ -86,7 +86,7 @@ However, for many applications it is recommend to use the simple binning rule
 or the common linear binning rule, which divides the sample into two fractional weights shared by the nearest grid points
 \begin{align}
 \label{eq:linearBinning}
-b_j(x,\delta) &=
+r_j(x,\delta) &=
 \begin{cases}
 w_j(1-|\delta^{-1}x-j|) & \text{if } |\delta^{-1}x-j|\le1 \\
 0 & \text{else.}
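
To make the binning rules concrete, here is a sketch of computing the grid counts $C_j$ under the linear binning rule eq:linearBinning (Python/NumPy; it assumes every sample lies in $[a,b]$ and shifts the grid so that $g_0=a$; the helper name is my own). The resulting counts can be fed straight into a BKDE evaluation like the one sketched above.

import numpy as np

def linear_bin_counts(samples, weights, a, b, delta):
    # Grid counts C_j under the linear binning rule eq:linearBinning:
    # each sample splits its weight between its two nearest grid points.
    G = int(round((b - a) / delta)) + 1   # G = (b - a)/delta + 1
    counts = np.zeros(G)
    t = (samples - a) / delta             # sample position in grid units
    j = np.floor(t).astype(int)           # index of the left neighbour
    frac = t - j                          # distance to the left neighbour
    np.add.at(counts, j, weights * (1 - frac))                   # share for g_j
    np.add.at(counts, np.minimum(j + 1, G - 1), weights * frac)  # share for g_{j+1}
    return counts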
@@ -140,4 +140,3 @@ This finding allows to speedup the computation of the density estimate by using
 %\begin{equation}
 %\hat{f}_n = \frac{1}{nh\sqrt{2\pi}} \sum_{i=1}^{n} \expp{-\frac{(x-X_i)^2}{2h^2}}
 %\end{equation}