kde & moving avg draft
@@ -109,6 +109,9 @@
\newcommand{\Z} {\ensuremath{ \mathbf{Z} }}
\newcommand{\expp} [1]{\ensuremath{ \exp \left( #1 \right) }}
\newcommand{\landau}[1]{\ensuremath{ \mathcal{O}\left( #1 \right) }}
\newcommand{\wideal} {\ensuremath{ w_{\text{ideal}} }}
\newcommand{\floor} [1]{\ensuremath{ \lfloor #1 \rfloor }}
\newcommand{\etal} [1]{#1~et~al.}

\newcommand{\qq} [1]{``#1''}

@@ -13,6 +13,7 @@ While such methods are computational fast and suitable most of the time, it is n
Time-sequential, non-linear and non-Gaussian state spaces in particular, which depend on a large number of different sensor types, frequently exhibit a multimodal posterior distribution.
As a result, such techniques cannot provide an accurate statement about the most probable state and instead produce misleading or false outcomes.
For example, in a localization scenario where the current posterior is bimodal, a reliable position estimate is more likely to lie at one of the modes rather than somewhere in between.
\commentByMarkus{Maybe also discuss that averaging ends up right in the middle?}
Additionally, in most practical scenarios the sample size, and therefore the resolution, is limited, causing the variance of the sample-based estimate to be high \cite{Verma2003}.

Obviously, computing the full posterior could resolve the issues above, but finding such an analytical solution is intractable, which is the reason for applying a sample representation in the first place.

@@ -12,6 +12,48 @@ The histogram is a simple and for a long time the most used non-parametric estim
However, its inability to produce a continuous estimate rules it out for many applications where a smooth distribution is assumed.
In contrast, KDE is often the preferred tool because of its ability to produce a continuous estimate and its flexibility.

Given a univariate random sample $X=\{X_1, X_2, \dots, X_n\}$, the kernel estimator $\hat{f}$, which defines the estimate at a point $x$, is given as
\begin{equation}
\label{eq:kde}
\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x-X_i)
\end{equation}
where $K_h(t)=K(t/h)/h$ is the normalized kernel \cite[138]{scott2015} and $h\in\R^+$ is an arbitrary smoothing parameter called the bandwidth.
%, and $h=h_n$ is a function of the sample size $n$ with $h\rightarrow0$ as $n\rightarrow\infty$ \cite{rosenblatt1956remarks}.
Any function which satisfies $\int K_h(u) \dop{u} = 1$ is a valid kernel.
In general any kernel can be used; the common advice, however, is to choose a symmetric, low-order polynomial kernel.
Several popular kernel functions are used in practice, such as the uniform, Gaussian, Epanechnikov, or Silverman kernel \cite[152]{scott2015}.

While the kernel estimate inherits all properties of the kernel, it is usually not crucial if a non-optimal kernel is chosen \cite[151f.]{scott2015}.
As a matter of fact, the quality of the kernel estimate is primarily determined by the smoothing parameter $h$ \cite[145]{scott2015}.
In theory it is possible to calculate an optimal bandwidth $h^*$ with respect to the asymptotic mean integrated squared error.
However, doing so requires knowledge of the very density that is to be estimated, which is obviously unavailable in practice.

Any non-optimal bandwidth causes undersmoothing or oversmoothing.
A small $h$ leads to undersmoothing, i.e. an estimator with a large variance.
A large $h$, on the other hand, increases the bias, which leads to oversmoothing \cite[7]{Cybakov2009}.
Clearly, with an adverse choice of the bandwidth, crucial information such as modality might get smoothed out.
All in all, a good choice of the bandwidth is not obvious.

This is aggravated by the fact that the structure of the data may vary significantly.
In such a situation it is beneficial to adapt the bandwidth to the neighbourhood of the given data point.
As a result, a lot of research has been put into data-driven bandwidth selection algorithms which obtain an adequate value of $h$ directly from the data.
% TODO for reasons the bandwidth is assumed to be given here

As mentioned above, the particular choice of the kernel is of minor importance, as it affects the overall result only in a negligible way.
It is common practice to assume that the data is approximately Gaussian, and therefore the Gaussian kernel is frequently used.
Note that this assumption is different from assuming a concrete distribution family such as a Gaussian or a mixture distribution.
In this work we choose the Gaussian kernel in favour of computational efficiency, as our approach is based on an approximation of the Gaussian filter.
The Gaussian kernel is given as
\begin{equation}
\label{eq:gausKern}
K_G(u)=\frac{1}{\sqrt{2\pi}} \expp{- \frac{u^2}{2} } \text{.}
\end{equation}

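As an illustration, \eqref{eq:kde} with the Gaussian kernel can be evaluated directly in a few lines of NumPy.
The following sketch is purely illustrative; the function and variable names are assumptions of the example and not part of this work.
\begin{verbatim}
import numpy as np

def gaussian_kde(x_eval, samples, h):
    # f_hat(x) = 1/(n*h) * sum_i K((x - X_i)/h), evaluated on a set of points
    x = np.asarray(x_eval, dtype=float)[:, None]    # shape (m, 1)
    X = np.asarray(samples, dtype=float)[None, :]   # shape (1, n)
    u = (x - X) / h                                 # pairwise scaled differences
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)  # Gaussian kernel values
    return K.sum(axis=1) / (X.shape[1] * h)

# usage: density of 200 samples on a regular evaluation grid
# density = gaussian_kde(np.linspace(-4, 4, 101), np.random.normal(size=200), h=0.3)
\end{verbatim}
Note that this direct evaluation requires one kernel evaluation per pair of evaluation point and sample, which is what motivates the binned approximation discussed below.
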
So far only the univariate case has been considered.
This is because univariate kernel estimators can quite easily be extended to multivariate distributions.
A common approach is to apply a univariate kernel with a possibly different bandwidth in each dimension.
This kind of multivariate kernel is called a product kernel, as the multivariate result is the product of the individual univariate kernels.

Consider a multivariate random variable $X=(x_1,\dots ,x_d)$ in $d$ dimensions.
The sample $\bm{X}$ is an $n\times d$ matrix defined as \cite[162]{scott2015}
\begin{equation}
@@ -29,101 +71,91 @@ The sample $\bm{X}$ is a $n\times d$ matrix defined as \cite[162]{scott2015}
\end{pmatrix} \text{.}
\end{equation}

The multivariate kernel density estimator $\hat{f}$, which defines the estimate pointwise at $\bm{x}=(x_1, \dots, x_d)^T$, is given as \cite[162]{scott2015}
\begin{equation}
\hat{f}(\bm{x}) = \frac{1}{nh_1 \dots h_d} \sum_{i=1}^{n} \left[ \prod_{j=1}^{d} K\left( \frac{x_j-x_{ij}}{h_j} \right) \right] \text{,}
\end{equation}
where the bandwidth is given as a vector $\bm{h}=(h_1, \dots, h_d)$, i.e. a possibly different smoothing parameter is used in each dimension.

%Multivariate Gauss-Kernel
%\begin{equation}
%K(u)=\frac{1}{(2\pi)^{d/2}} \expp{-\frac{1}{2} \bm{x}^T \bm{x}}
%\end{equation}

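For illustration, a minimal sketch of the product kernel estimator with a univariate Gaussian kernel in every dimension is given below; the names and array shapes are assumptions of the example only.
\begin{verbatim}
import numpy as np

def product_kernel_kde(x_eval, samples, h):
    # x_eval: (m, d) evaluation points, samples: (n, d) data, h: (d,) bandwidth vector
    x = np.asarray(x_eval, dtype=float)[:, None, :]   # (m, 1, d)
    X = np.asarray(samples, dtype=float)[None, :, :]  # (1, n, d)
    h = np.asarray(h, dtype=float)
    u = (x - X) / h                                   # scaled differences per dimension
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)    # univariate Gaussian kernel per dimension
    n = X.shape[1]
    return K.prod(axis=2).sum(axis=1) / (n * h.prod())  # product over dims, sum over samples
\end{verbatim}
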
The flexibility of the KDE comes at the expense of computational efficiency, which has led to the development of more efficient computation schemes.
The computation time depends, besides the number of evaluation points, on the number of data points $n$.
In general, reducing the size of the sample negatively affects the accuracy of the estimate.
Still, the sample size is a suitable parameter to speed up the computation.
Silverman \cite{silverman1982algorithm} suggested reducing the number of individual data points by combining adjacent points into data bins.
This approximation is called the binned kernel density estimate (BKDE) and has been analysed extensively \cite{fan1994fast} \cite{wand1994fast} \cite{hall1996accuracy} \cite{holmstrom2000accuracy}.

Usually the data is binned over an equidistant grid.
Due to the equally spaced grid many kernel evaluations are identical and can be reused, which greatly reduces the number of evaluated kernels and naturally leads to a reduced computation time \cite{fan1994fast}.

First the data, i.e. a random sample $X$, has to be assigned to a grid.
A binning rule distributes a sample point $x$ among the grid points $g_j=j\delta$ for $j\in\Z$ and can be represented as a set of functions $\{ w_j(x,\delta), j\in\Z \}$.
For the computation a finite grid on an interval $[a,b]$ containing the data is used; thus the number of grid points is $G=(b-a)/\delta+1$.

While the estimate can be computed efficiently, it is not obvious how large the grid should be chosen.
Because the computation time depends heavily on the grid size, it is desirable to choose a grid as small as possible without losing too much accuracy.
In general, there is no definite answer because the amount of binning depends on the structure of the unknown density and the sample size.
The roughness of the unknown density directly affects the required grid size.
Coarser grids allow a greater speedup but at the same time might conceal important details of the unknown density \cite{wand1994fast}.

Given a binning rule $w_j$, the BKDE $\tilde{f}$ of a density $f$, computed pointwise at the grid point $g_x$, is given as
\begin{equation}
\label{eq:binKde}
\tilde{f}(g_x) = \frac{1}{n} \sum_{j=1}^{G} N_j K_h(g_x-g_j)
\end{equation}
where $G$ is the number of grid points and
\begin{equation}
\label{eq:gridCnts}
N_j=\sum_{i=1}^{n} w_j(x_i,\delta)
\end{equation}
is the count at grid point $g_j$ \cite{hall1996accuracy}.

So far the binned estimator has introduced two additional parameters which affect the accuracy of the estimation.
The choice of the number of grid points is crucial, because it determines the trade-off between the approximation error introduced by the binning and the computational speed of the algorithm \cite{wand1994fast}.
Also, the choice of the binning rule affects the accuracy of the approximation.
There are several possible binning rules \cite{hall1996accuracy}, but the most commonly used are the simple binning rule
\begin{align}
\label{eq:simpleBinning}
w_j(x,\delta) &=
\begin{cases}
1 & \text{if } x \in ((j-\frac{1}{2})\delta, (j+\frac{1}{2})\delta ] \\
0 & \text{else}
\end{cases}
\end{align}
and the common linear binning rule
\begin{align}
\label{eq:linearBinning}
w_j(x,\delta) &=
\begin{cases}
1-|\delta^{-1}x-j| & \text{if } |\delta^{-1}x-j|\le1 \\
0 & \text{else.}
\end{cases}
\end{align}
An advantage of these frequently used binning rules is that their effect on the approximation has been investigated extensively and is well understood \cite{wand1994fast} \cite{hall1996accuracy} \cite{holmstrom2000accuracy}.

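As an illustration of the grid counts \eqref{eq:gridCnts}, the following sketch applies the linear binning rule on an equidistant grid covering the data; the helper name, its parameters and the clipping at the boundary are assumptions of the example.
\begin{verbatim}
import numpy as np

def linear_bin_counts(samples, a, b, G):
    # grid g_j = a + j*delta for j = 0..G-1 with spacing delta = (b-a)/(G-1);
    # assumes all samples lie inside [a, b]
    delta = (b - a) / (G - 1)
    p = (np.asarray(samples, dtype=float) - a) / delta  # fractional grid position
    j0 = np.clip(np.floor(p).astype(int), 0, G - 2)     # lower neighbouring grid point
    frac = p - j0                                       # distance to the lower grid point
    N = np.zeros(G)
    np.add.at(N, j0,     1.0 - frac)  # weight 1 - |p - j| goes to the lower point
    np.add.at(N, j0 + 1, frac)        # the remainder goes to the upper point
    return N                          # N_j = sum_i w_j(x_i, delta), sums to n
\end{verbatim}
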
As already stated, the computational savings are achieved by reducing the number of evaluated kernels.
A naive implementation of \eqref{eq:binKde} reduces the number of evaluations to $\landau{G^2}$ \cite{fan1994fast}.
Because of the fixed grid spacing $\delta$ most of the kernel evaluations are the same, as each difference $g_j-g_{j-k}=k\delta$ is independent of $j$ \cite{fan1994fast}.
Therefore, many evaluated kernels can be reused, so that the number of kernel evaluations is reduced to $\landau{G}$ \cite{fan1994fast}.

However, more important for this work is the fact that the BKDE can be seen as a convolution operation.
Once the grid counts $N_j$ in \eqref{eq:gridCnts} and the kernel values are computed, they need to be combined, which is, in fact, a discrete convolution \cite{wand1994fast}.
This makes it possible to apply a wide range of well-studied techniques from the field of digital signal processing (DSP).
Often an FFT-convolution based computation scheme is used to compute the estimate efficiently \cite{silverman1982algorithm}\cite[210ff.]{scott2015}.

Using the Gaussian kernel from \eqref{eq:gausKern} in conjunction with the BKDE results in the following equation
\begin{equation}
\tilde{f}(g_x)=\frac{1}{nh\sqrt{2\pi}} \sum_{j=1}^{G} N_j \expp{-\frac{(g_x-g_j)^2}{2h^2}} \text{.}
\end{equation}

As already stated, the above formula is a convolution of the binned data and the kernel.
More precisely, it is a discrete convolution of the finite data grid and the Gaussian function.
In terms of DSP this is analogous to filtering the binned data with a Gaussian filter.

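To make this filtering view concrete, the following sketch computes the BKDE as a discrete convolution of the grid counts with a sampled Gaussian; the truncation of the kernel at four bandwidths is an assumption of the example, not a prescribed choice.
\begin{verbatim}
import numpy as np

def binned_kde(N, delta, h):
    # N: grid counts N_j, delta: grid spacing, h: bandwidth;
    # assumes the sampled kernel is shorter than the grid
    n = N.sum()                                  # number of samples
    r = int(np.ceil(4.0 * h / delta))            # kernel radius in grid points
    t = np.arange(-r, r + 1) * delta             # kernel support g_x - g_j = k*delta
    K = np.exp(-0.5 * (t / h)**2) / (h * np.sqrt(2.0 * np.pi))
    return np.convolve(N, K, mode="same") / n    # f_tilde(g_x) = 1/n sum_j N_j K_h(g_x - g_j)
\end{verbatim}
For large grids the call to \texttt{np.convolve} can be replaced by an FFT-based convolution.
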
@@ -4,6 +4,38 @@
% Gauss Blur Filter
% Repetitive Box filter to approx Gauss
% Simple multipass, n/m approach, extended box filter

The Gaussian filter is a popular filter which smooths a signal by convolving the input signal with the Gaussian function
\begin{equation}
\label{eq:gausfx}
g(x) = \frac{1}{\sigma \sqrt{2\pi}} \expp{-\frac{x^2}{2\sigma^2}} \text{,}
\end{equation}
where $\sigma$ is a smoothing parameter called the standard deviation.

In the discrete case the Gaussian filter is easily computed with the sliding-window algorithm in the time domain.
A convolution is separable if the filter kernel is separable, i.e. a multidimensional convolution can be computed as successive one-dimensional convolutions with one-dimensional kernels.
Because of $\operatorname{e}^{-(x^2+y^2)} = \operatorname{e}^{-x^2}\cdot\operatorname{e}^{-y^2}$ the Gaussian filter is separable and can easily be applied to multi-dimensional signals.
Separability of a filter can also be used to reduce the number of operations required to compute the filter result.

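To illustrate how separability is exploited, the following sketch blurs a two-dimensional signal with two one-dimensional passes of the same Gaussian kernel; truncating the kernel at three standard deviations is an assumption of the example.
\begin{verbatim}
import numpy as np

def gaussian_filter_2d(img, sigma):
    # separable Gaussian blur: one 1-d convolution per axis instead of a single 2-d convolution
    r = int(np.ceil(3.0 * sigma))
    t = np.arange(-r, r + 1)
    k = np.exp(-0.5 * (t / sigma)**2)
    k /= k.sum()                                 # normalized 1-d Gaussian kernel
    blur = lambda v: np.convolve(v, k, mode="same")
    rows = np.apply_along_axis(blur, 1, img)     # filter along the rows
    return np.apply_along_axis(blur, 0, rows)    # then along the columns
\end{verbatim}
For a $K\times K$ kernel the separable computation needs roughly $2K$ instead of $K^2$ operations per output value.
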
% TODO similarity between Gaussian and KDE -> fast Gaussian = fast KDE

Computing the filter with a naive implementation of the sliding-window algorithm costs $\landau{NK}$ operations, where $N$ is the length of the input signal and $K$ is the size of the filter kernel.
Note that in the case of the Gaussian filter $K$ depends on $\sigma$.
In order to capture all significant values of the Gaussian function the kernel size $K$ must be adapted to the standard deviation of the Gaussian.

Another approach to compute a filter result efficiently is FFT convolution, which is an $\landau{N\log N}$ operation.
For large values of $\sigma$ the computation time of the Gaussian filter might be reduced by applying the filter in the frequency domain.
Both signals are transformed into the frequency domain using the FFT.
The filtered result is obtained by point-wise multiplication of the transformed signals and a subsequent inverse transform.
In the case of the Gaussian filter the forward transform of the kernel can be saved, as the Gaussian is an eigenfunction of the Fourier transform.

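A sketch of the frequency-domain variant is given below; since the Fourier transform of a Gaussian is again a Gaussian, the transfer function is written down analytically instead of transforming a sampled kernel.
The sketch assumes a periodic signal, because FFT filtering performs a circular convolution.
\begin{verbatim}
import numpy as np

def gaussian_filter_fft(signal, sigma):
    # FFT, point-wise multiplication with the Gaussian transfer function, inverse FFT
    x = np.asarray(signal, dtype=float)
    f = np.fft.rfftfreq(x.size)                        # frequencies in cycles per sample
    transfer = np.exp(-2.0 * (np.pi * f * sigma)**2)   # analytic transform of the Gaussian kernel
    return np.fft.irfft(np.fft.rfft(x) * transfer, n=x.size)
\end{verbatim}
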
While the above-mentioned algorithms pose efficient schemes to compute an exact filter result, approximative algorithms can speed up the computation even further.
A well-known rapid approximation of the Gaussian filter is given by the moving average filter.

\subsection{Moving Average Filter}
The moving average filter is a simple filter which takes an input function $x$ and produces a second function $y$.
A single output value is computed by taking the average of a number of input values symmetric around a single point.
The number of values in the average can also be seen as the width $w=2r+1$, where $r$ is the \qq{radius} of the filter.
@@ -13,10 +45,65 @@ The computation of an output value using a moving average filter of radius $r$ i
y[i]=\frac{1}{2r+1} \sum_{j=-r}^{r}x[i+j] \text{.}
\end{equation}

% TODO O(N) complexity

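Regarding the TODO above, \eqref{eq:symMovAvg} can be computed in $\landau{N}$ independently of the radius by means of a cumulative (running) sum; clamping the signal at the borders is an assumption of this sketch.
\begin{verbatim}
import numpy as np

def moving_average(x, r):
    # O(N) box filter of width w = 2r+1 using a cumulative sum
    x = np.asarray(x, dtype=float)
    pad = np.concatenate((np.repeat(x[0], r), x, np.repeat(x[-1], r)))  # clamp the borders
    csum = np.concatenate(([0.0], np.cumsum(pad)))
    return (csum[2 * r + 1:] - csum[:-(2 * r + 1)]) / (2 * r + 1)
\end{verbatim}
Each output value then costs only a constant number of additions and subtractions, regardless of $r$.
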
It is well known that a moving average filter can approximate a Gaussian filter by repeated application.
Equation~\eqref{eq:symMovAvg} is equal to a convolution of the input signal with a rectangular function.
By the central limit theorem, repeated convolution of a rectangular function with itself yields a Gaussian in the limit.
Hence, filtering a signal with the moving average filter in several passes approximately corresponds to a Gaussian filter.
In practice three to five iterations are usually enough to obtain a reasonably close Gaussian approximation.
This allows rapid approximations of the Gaussian filter, as the moving average can be computed with a few additions and multiplications, as opposed to the Gaussian filter, where an exponential function has to be evaluated for every output value.

As given in \eqref{eq:gausfx}, the sole parameter of the Gaussian kernel is the standard deviation $\sigma$.
In contrast, the moving average filter is parametrized by its width $w$.
Therefore, in order to approximate the Gaussian filter for a given $\sigma$, a corresponding value of $w$ must be found.
Given $n$ iterations of moving average filters with identical widths, the ideal width $\wideal$, as suggested by Wells~\cite{wells1986efficient}, is
\begin{equation}
\label{eq:boxidealwidth}
\wideal = \sqrt{\frac{12\sigma^2}{n}+1} \text{.}
\end{equation}

In general $\wideal$ can be any real number, but the moving average filter in \eqref{eq:symMovAvg} is restricted to odd integer widths.
Hence the set of possible approximated standard deviations is limited, because the ideal width has to be rounded to the next valid value.
In order to reduce the rounding error, Kovesi~\cite{kovesi2010fast} proposes to perform the passes with box filters of two different widths
\begin{align}
\label{eq:boxwidthtwo}
\begin{split}
w_l &=
\begin{cases}
\floor{\wideal} & \text{if } \floor{\wideal} \text{ is odd} \\
\floor{\wideal} - 1 & \text{else }
\end{cases} \\
w_u &= w_l + 2 \text{.}
\end{split}
\end{align}

Given $w_l$ and $w_u$, the approximation is done by computing $m$ box filters of width $w_l$ followed by $(n-m)$ box filters of width $w_u$, where $0\le m\le n$.
As all other values are known, $m$ can be computed with
\begin{equation}
\label{eq:boxrepeatm}
m=\frac{12\sigma^2-nw_l^2-4nw_l-3n}{-4w_l-4} \text{,}
\end{equation}
rounded to the nearest integer since it counts filter passes.

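Putting \eqref{eq:boxidealwidth} to \eqref{eq:boxrepeatm} together, an approximate Gaussian filter for a given $\sigma$ could be sketched as follows; it reuses the \texttt{moving\_average} sketch from above, and the default of $n=3$ passes is an assumption of the example.
\begin{verbatim}
import numpy as np

def approx_gaussian_filter(x, sigma, n=3):
    # n passes of box filters whose widths are derived from sigma
    w_ideal = np.sqrt(12.0 * sigma**2 / n + 1.0)   # ideal width w_ideal
    w_l = int(np.floor(w_ideal))
    if w_l % 2 == 0:
        w_l -= 1                                   # lower width must be odd
    w_u = w_l + 2
    m = round((12.0 * sigma**2 - n * w_l**2 - 4 * n * w_l - 3 * n)
              / (-4.0 * w_l - 4.0))                # number of passes with the lower width
    y = np.asarray(x, dtype=float)
    for i in range(n):
        r = (w_l if i < m else w_u) // 2           # radius of the current pass
        y = moving_average(y, r)                   # m passes of w_l, then n-m passes of w_u
    return y
\end{verbatim}
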
The approximated sigma as a function of the integer width has a staircase-shaped graph.
By reducing the rounding error the step size of the function is reduced.
However, the overall shape will not change.
\etal{Gwosdek}~\cite{gwosdek2011theoretical} proposed an approach which allows approximating any real-valued $\sigma$.
This is achieved by

@@ -1,2 +1,10 @@
\section{Usage}
% how do we use all of this now? what do we have to pay attention to?

% Using 2D data as an example
% Build a histogram (== bin the data)
% This requires min/max of the data
% Then filter the histogram with a box filter
% - in parallel if possible (SIMD, GPU)
% - separated, in each dimension individually
% Take the maximum of the filter result
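For two-dimensional data the outline above could be sketched as follows; the grid size, the number of passes, the filter radius and all names are illustrative assumptions rather than fixed choices of this work.
\begin{verbatim}
import numpy as np

def most_probable_state(samples, bins=128, n_passes=3, radius=2):
    # samples: array of shape (n, 2)
    # 1) bin the samples into a histogram (min/max per dimension is handled by histogram2d)
    counts, xedges, yedges = np.histogram2d(samples[:, 0], samples[:, 1], bins=bins)
    # 2) smooth the histogram with repeated box filters, separately in each dimension
    w = 2 * radius + 1
    kernel = np.ones(w) / w
    box = lambda v: np.convolve(v, kernel, mode="same")
    for _ in range(n_passes):
        counts = np.apply_along_axis(box, 0, counts)
        counts = np.apply_along_axis(box, 1, counts)
    # 3) take the maximum of the filter result as the mode estimate
    ix, iy = np.unravel_index(np.argmax(counts), counts.shape)
    return 0.5 * (xedges[ix] + xedges[ix + 1]), 0.5 * (yedges[iy] + yedges[iy + 1])
\end{verbatim}
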
@@ -112,7 +112,15 @@
}%
}%
}
\newcommand{\commentByMarkus}[1]{%
\noindent%
\fcolorbox{black}{red}{%
\parbox[t]{0.45\textwidth}{%
\footnotesize%
{\bf Markus:} #1%
}%
}%
}
\newcommand{\docRSSI}{RSSI}
\newcommand{\docTX}{TX}
\newcommand{\docLogDist}{log-distance}