\section{Binned Kernel Density Estimation}
% KDE by rosenblatt and parzen
% general KDE
% Gauss Kernel
% Formula Gauss KDE
% -> complexity/operation count
% Binned KDE
% Binned Gauss KDE
% -> complexity/operation count

The histogram is a simple non-parametric estimator and was, for a long time, the most widely used one.
However, it cannot produce a continuous estimate, which rules it out for many applications where a smooth distribution is assumed.
In contrast, the kernel density estimator (KDE) is often the preferred tool because it is flexible and produces a continuous estimate.
Given $n$ independently observed realizations $X=(X_1,\dots,X_n)$, the kernel density estimate $\hat{f}_n$ of the density function $f$ of the underlying distribution is given by
\begin{equation}
\label{eq:kde}
\hat{f}_n(x) = \frac{1}{nh} \sum_{i=1}^{n} K \left( \frac{x-X_i}{h} \right) \text{,} %= \frac{1}{n} \sum_{i=1}^{n} K_h(x-x_i)
\end{equation}
where $K$ is the kernel function and $h\in\R^+$ is a freely chosen smoothing parameter called the bandwidth.
Any density function can serve as the kernel function $K$ (so that $\int K(u) \dop{u} = 1$), and a variety of popular choices exist.
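The role of the bandwidth can be made more explicit by absorbing the factor $1/h$ into the kernel.
As a short sketch, with the scaled kernel $K_h$ being notation introduced here for illustration only:
\begin{equation}
K_h(u) = \frac{1}{h} K \left( \frac{u}{h} \right)
\quad\Rightarrow\quad
\hat{f}_n(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x-X_i) \text{.}
\end{equation}
Since the substitution $v=u/h$ gives $\int K_h(u) \dop{u} = \int K(v) \dop{v} = 1$ for every $h>0$, the estimate is an average of $n$ densities and therefore integrates to one itself.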
In practice, the Gaussian kernel is commonly used:
\begin{equation}
K(u)=\frac{1}{\sqrt{2\pi}} \expp{- \frac{u^2}{2} } \text{.}
\end{equation}
Inserting the Gaussian kernel into \eqref{eq:kde} yields
\begin{equation}
\hat{f}_n(x) = \frac{1}{nh\sqrt{2\pi}} \sum_{i=1}^{n} \expp{-\frac{(x-X_i)^2}{2h^2}} \text{.}
\end{equation}
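A rough operation count for the direct evaluation of this sum can be sketched as follows.
Assume, purely for illustration, that the estimate is required at $G$ evaluation points; the symbol $G$ is introduced here and is not part of the definitions above.
Each evaluation point needs one exponential per sample, so the total number of kernel evaluations is
\begin{equation}
\underbrace{G}_{\text{evaluation points}} \cdot \underbrace{n}_{\text{samples}} = \mathcal{O}(nG) \text{.}
\end{equation}
The cost thus grows with the product of the sample size and the number of evaluation points, which becomes expensive when both are large.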