\section{Extension to multi-dimensional data}
So far, only univariate sample sets have been considered.
This is because the equations of the KDE \eqref{eq:kde}, the BKDE \eqref{eq:binKde}, the Gaussian filter \eqref{eq:gausFilt}, and the box filter \eqref{eq:boxFilt} extend readily to multi-dimensional input.
Each method can be seen as a combination of several one-dimensional problems whose results together yield a multi-dimensional one.
%However, with an increasing number of dimensions the computation time significantly increases.
In the following, the generalization to multi-dimensional input is briefly outlined.

In order to estimate a multivariate density using the KDE or BKDE, a multivariate kernel needs to be used.
Multivariate kernel functions can be constructed in various ways; a popular choice is the product kernel.
Such a kernel combines several univariate kernels into a product, where one kernel is applied per dimension, each with a possibly different bandwidth.
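In its simplest form, with the same univariate kernel $K$ used in every dimension, such a product kernel reads
\begin{equation*}
	K_P(\bm{u}) = \prod_{j=1}^{d} K(u_j) \text{,}
\end{equation*}
which reappears, with per-dimension scaling, as the bracketed factor in \eqref{eq:mvKDE} below.
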
Consider a multivariate random variable $\bm{X}=(x_1,\dots ,x_d)$ in $d$ dimensions, for which the sample set $\mathcal{X}$ is an $n\times d$ matrix \cite{scott2015}.
The multivariate KDE $\hat{f}$, which defines the estimate pointwise at $\bm{u}=(u_1, \dots, u_d)^T$, is given as
\begin{equation}
	\label{eq:mvKDE}
	\hat{f}(\bm{u}) = \frac{1}{W} \sum_{i=1}^{n} \frac{w_i}{h_1 \dots h_d} \left[ \prod_{j=1}^{d} K\left( \frac{u_j-x_{i,j}}{h_j} \right) \right] \text{,}
\end{equation}
where the bandwidth is given as a vector $\bm{h}=(h_1, \dots, h_d)$.

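As an illustration, the following minimal Python sketch evaluates \eqref{eq:mvKDE} at a single query point with a univariate Gaussian kernel; it assumes that $W$ is the sum of the weights $w_i$, as in the univariate case, and all function and variable names are merely illustrative.
\begin{verbatim}
import numpy as np

def product_kernel_kde(u, samples, h, weights=None):
    """Weighted product-kernel KDE, evaluated at one point u.

    u:       query point, shape (d,)
    samples: sample set, shape (n, d)
    h:       bandwidth vector, shape (d,)
    weights: optional weights w_i, shape (n,)
    """
    n, d = samples.shape
    w = np.ones(n) if weights is None else np.asarray(weights)
    # Scaled differences (u_j - x_ij) / h_j, shape (n, d).
    t = (u - samples) / h
    # Univariate Gaussian kernel, applied dimension-wise.
    k = np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)
    # Product over dimensions, weighted sum over samples.
    return np.sum(w * np.prod(k, axis=1)) / (np.sum(w) * np.prod(h))
\end{verbatim}
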
Note that \eqref{eq:mvKDE} does not include all possible multivariate kernels, such as spherically symmetric kernels, which are obtained by rotating a univariate kernel.
In general, a multivariate product kernel and a spherically symmetric kernel based on the same univariate kernel will differ.
The only exception is the Gaussian kernel, which is spherically symmetric and has independent marginals. % TODO scott cite?!
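This can be verified directly, since the product of $d$ univariate standard Gaussian kernels is exactly the spherically symmetric multivariate Gaussian kernel:
\begin{equation*}
	\prod_{j=1}^{d} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} u_j^2} = \frac{1}{(2\pi)^{d/2}} e^{-\frac{1}{2} \bm{u}^T \bm{u}} \text{.}
\end{equation*}
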
In addition, a product kernel only smooths in the directions of the coordinate axes.
If smoothing in other directions is necessary, the computation needs to be carried out on a pre-rotated sample set, and the estimate has to be rotated back into the original coordinate system \cite{wand1994fast}.
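As a brief sketch of this idea: a rotation $R$ has unit Jacobian, so if $\hat{g}$ denotes the estimate computed from the rotated samples $R\bm{x}_i$, the estimate in the original coordinate system is recovered as $\hat{f}(\bm{u}) = \hat{g}(R\bm{u})$.
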
For the multivariate BKDE, in addition to the kernel function, the grid and the binning rules need to be extended to multivariate data.
These extensions are rather straightforward, as the grid is easily defined in multiple dimensions.
Likewise, the ideas behind the common and linear binning rules carry over to higher dimensionality \cite{wand1994fast}.

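The following minimal Python sketch illustrates multivariate common binning on an equally spaced grid; it assumes the grid is described by its corners and the number of grid points per dimension, assigns every sample to its nearest grid point as in the univariate case, and uses illustrative names throughout.
\begin{verbatim}
import numpy as np

def common_binning(samples, grid_min, grid_max, grid_size, weights=None):
    """Assign d-dimensional samples to their nearest grid points.

    samples:   shape (n, d)
    grid_min:  lower grid corner, shape (d,)
    grid_max:  upper grid corner, shape (d,)
    grid_size: grid points per dimension, e.g. (64, 64)
    """
    n, d = samples.shape
    w = np.ones(n) if weights is None else np.asarray(weights)
    gmin = np.asarray(grid_min, dtype=float)
    size = np.asarray(grid_size)
    step = (np.asarray(grid_max) - gmin) / (size - 1)
    # Index of the nearest grid point in every dimension.
    idx = np.clip(np.rint((samples - gmin) / step).astype(int), 0, size - 1)
    # Accumulate the sample weights on the multi-dimensional grid.
    counts = np.zeros(tuple(size))
    np.add.at(counts, tuple(idx.T), w)
    return counts
\end{verbatim}
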
In general, multi-dimensional filters are multi-dimensional convolution operations.
However, by utilizing the separability property of convolution, a straightforward and more efficient implementation can be found.
A convolution is separable if its filter kernel is separable, \ie{} the kernel can be split into successive convolutions of several smaller kernels.
For example, the Gaussian filter is separable because of $e^{-(x^2+y^2)} = e^{-x^2}\cdot e^{-y^2}$.
Digital filters based on such kernels are likewise called separable filters.
They are easily applied to multi-dimensional signals, because the input signal can be filtered in each dimension individually by a one-dimensional filter \cite{dspGuide1997}.

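A minimal sketch of such a separable implementation in Python, using SciPy's one-dimensional Gaussian filter (the helper's name is illustrative):
\begin{verbatim}
import numpy as np
from scipy.ndimage import gaussian_filter1d

def separable_gaussian_filter(signal, sigmas):
    """Apply a d-dimensional Gaussian filter as d one-dimensional
    filters, one per axis, exploiting separability.

    signal: d-dimensional array, e.g. binned sample counts
    sigmas: filter width per dimension
    """
    out = np.asarray(signal, dtype=float)
    for axis, sigma in enumerate(sigmas):
        out = gaussian_filter1d(out, sigma=sigma, axis=axis)
    return out
\end{verbatim}
Up to floating-point error, the result matches a direct multi-dimensional Gaussian filter with the same widths; the same axis-by-axis scheme applies to the box filter, whose rectangular kernel is separable as well.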