
\section{Binned Kernel Density Estimation}
% KDE by rosenblatt and parzen
% general KDE
% Gauss Kernel
% Formula Gauss KDE
% -> complexity/operation count
% Binned KDE
% Binned Gauss KDE
% -> complexity/operation count

The histogram is a simple non-parametric estimator and was, for a long time, the most widely used one.
However, it cannot produce a continuous estimate, which rules it out for many applications where a smooth distribution is assumed.
In contrast, the KDE is often the preferred tool because it produces a continuous estimate and is flexible.
Given a set of $n$ independent observations $X=(X_1,\dots,X_n)$, the kernel density estimate $\hat{f}_n$ of the density function $f$ of the underlying distribution at a point $x$ is given by
\begin{equation}
\label{eq:kde}
\hat{f}_n(x) = \frac{1}{nh} \sum_{i=1}^{n} K \left(  \frac{x-X_i}{h} \right)  \text{,} %= \frac{1}{n} \sum_{i=1}^{n} K_h(x-X_i)
\end{equation}
where $K$ is the kernel function and $h\in\R^+$ is an arbitrary smoothing parameter called the bandwidth.
While any density function can be used as the kernel function $K$, i.e., any non-negative function with $\int K(u) \dop{u} = 1$, a variety of popular choices exist.
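The normalization condition on $K$ is what makes the estimate itself a density.
As a short sketch, assuming all integrals run over $\R$ and using the substitution $u = (x-X_i)/h$ with $\dop{x} = h \dop{u}$ in (\ref{eq:kde}), one obtains
\begin{equation}
\int \hat{f}_n(x) \dop{x} = \frac{1}{nh} \sum_{i=1}^{n} \int K \left( \frac{x-X_i}{h} \right) \dop{x} = \frac{1}{n} \sum_{i=1}^{n} \int K(u) \dop{u} = 1 \text{.}
\end{equation}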
In practice, the Gaussian kernel is commonly used:
\begin{equation}
K(u)=\frac{1}{\sqrt{2\pi}} \expp{- \frac{u^2}{2} } \text{.}
\end{equation}
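Inserting the Gaussian kernel into (\ref{eq:kde}) yields the Gaussian kernel density estimate
\begin{equation}
\hat{f}_n(x) = \frac{1}{nh\sqrt{2\pi}} \sum_{i=1}^{n} \expp{-\frac{(x-X_i)^2}{2h^2}} \text{.}
\end{equation}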
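As a small worked illustration with purely hypothetical values (they are not taken from any data set used here), consider the observations $X=(0,1,3)$ with bandwidth $h=1$ and the evaluation point $x=1$.
The Gaussian form of the estimate above then evaluates to
\begin{equation}
\hat{f}_3(1) = \frac{1}{3\sqrt{2\pi}} \left( \expp{-\frac{1}{2}} + \expp{0} + \expp{-2} \right) \approx \frac{1.7419}{7.5199} \approx 0.2316 \text{.}
\end{equation}
Each evaluation point requires one kernel evaluation per observation, so evaluating the estimate directly at $m$ points costs $n \cdot m$ kernel evaluations, i.e., $\mathcal{O}(nm)$ operations.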