added todos to all chapters

This commit is contained in:
toni
2018-02-14 19:30:33 +01:00
parent 040572cfe6
commit a6bf369407
5 changed files with 52 additions and 38 deletions

View File

@@ -36,6 +36,8 @@ Especially in time critical and time sequential sensor fusion scenarios, the her
In addition, it requires only a few elementary operations and is highly parallelizable.
\todo{Also cover the multi-dimensional case; make clear that it is covered!}
%linear complexity and easy parall
%is always equally fast.
%others use huge amounts of data, we use less data.

View File

@@ -8,10 +8,11 @@
% Binned Gauss KDE
% -> complexity/operation count
The histogram is a simple estimator and was for a long time the most widely used non-parametric estimator.
However, its inability to produce a continuous estimate rules it out for many applications where a smooth distribution is assumed.
In contrast, KDE is often the preferred tool because of its ability to produce a continuous estimate and its flexibility.
%The histogram is a simple and for a long time the most used non-parametric estimator.
%However, its inability to produce a continuous estimate dismisses it for many applications where a smooth distribution is assumed.
%In contrast,
The KDE is often the preferred tool to estimate a density function from discrete data samples because of its ability to produce a continuous estimate and its flexibility.
%
Given a univariate random sample $X=\{X_1, X_2, \dots, X_n\}$, the kernel estimator $\hat{f}$ which defines the estimate at the point $x$ is given as
\begin{equation}
\label{eq:kde}
@@ -19,29 +20,29 @@ Given a univariate random sample $X=\{X_1, X_2, \dots, X_n\}$, the kernel estima
\end{equation}
where $K_h(t)=K(t/h)/h$ is the normalized kernel \cite[138]{scott2015} and $h\in\R^+$ is an arbitrary smoothing parameter called bandwidth.
%, and $h=h_n$ is a function of the sample size $n$ with $h\rightarrow0$ as $n\rightarrow\infty$ \cite{rosenblatt1956remarks}.
Any function which satisfy $\int K_h(u) \dop{u} = 1$ is a valid kernel.
Any function which satisfies $\int K_h(u) \dop{u} = 1$ is a valid kernel.
In general any kernel can be used; however, the common advice is to choose a symmetric, low-order polynomial kernel.
Accordingly, several popular kernel functions are used in practice, such as the uniform, Gaussian, Epanechnikov, or Silverman kernel \cite[152.]{scott2015}.
While the kernel estimate inherits all the properties of the kernel, it is usually not crucial if a non-optimal kernel is chosen \cite[151f.]{scott2015}.
As a matter of fact, the quality of the kernel estimate is primarily determined by the smoothing parameter $h$ \cite[145]{scott2015}.
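For illustration only, the estimator in \eqref{eq:kde} can be evaluated directly; the following minimal Python sketch assumes a Gaussian kernel, and all function and variable names are illustrative rather than part of this work.

import numpy as np

def kde_naive(x, data, h):
    """Directly evaluate the KDE at the points x for the sample `data` and
    bandwidth h, using the Gaussian kernel K(t) = exp(-t^2/2)/sqrt(2*pi)."""
    t = (np.asarray(x, dtype=float)[:, None] - np.asarray(data, dtype=float)[None, :]) / h
    k = np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)   # kernel evaluations K((x - X_i)/h)
    return k.sum(axis=1) / (h * len(data))           # (1/(n*h)) * sum_i K((x - X_i)/h)

Every evaluation point requires a pass over all samples, which already hints at the computational cost discussed below.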
In theory it is possible to calculate an optimal bandwidth $h^*$ with respect to the asymptotic mean integrated squared error.
However, doing so requires knowledge of the very density function that is to be estimated, which is obviously unavailable in practice.
Any non-optimal bandwidth causes undersmoothing or oversmoothing.
A small $h$ results in a large variance of the estimator and hence leads to undersmoothing.
On the other hand, given a large $h$ the bias increases, which leads to oversmoothing \cite[7]{Cybakov2009}.
Clearly, with an adverse choice of the bandwidth crucial information such as modality may be smoothed out.
All in all, determining a good choice of the bandwidth is not straightforward.
This is aggravated by the fact that the structure of the data may vary significantly.
Given such a situation it is beneficial to adapt the bandwidth to the neighbourhood of the given data point.
As a result, a lot of research has been put into developing data-driven bandwidth selection algorithms to obtain an adequate value of $h$ directly from the data.
%In theory it is possible to calculate an optimal bandwidth $h^*$ regarding to the asymptotic mean integrated squared error.
%However, in order to do so the density function to be estimated needs to be known which is obviously unknown in practice.
%
%Any non-optimal bandwidth causes undersmoothing or oversmoothing.
%An undersmoothing estimator has a large variance and hence a small $h$ leads to undersmoothing.
%On the other hand given a large $h$ the bias increases, which leads to oversmoothing \cite[7]{Cybakov2009}.
%Clearly with an adverse choice of the bandwidth crucial information like modality might get smoothed out.
%All in all it is not obvious to determine a good choice of the bandwidth.
%
%This is aggravated by the fact that the structure of the data may vary significantly.
%Given such a situation it is beneficial to adapt the bandwidth to the neighbourhood of the given data point.
%As a result, a lot of research is put into developing data-driven bandwidth selections algorithms to obtain an adequate value of $h$ directly from the data.
% TODO for these reasons the bandwidth is assumed to be given here
As mentioned above, the particular choice of the kernel is only of minor importance as it affects the overall result in a negligible way.
%
%As mentioned above the particular choice of the kernel is only of minor importance as it affects the overall result in an negligible way.
It is common practice to assume that the data is approximately Gaussian, and therefore the Gaussian kernel is frequently used.
Note that this assumption is different compared to assuming a concrete distribution family like a Gaussian distribution or mixture distribution.
%Note that this assumption is different compared to assuming a concrete distribution family like a Gaussian distribution or mixture distribution.
In this work we choose the Gaussian kernel for reasons of computational efficiency, as our approach is based on the approximation of the Gaussian filter.
The Gaussian kernel is given as
\begin{equation}
@@ -83,25 +84,21 @@ where the bandwidth is given as a vector $\bm{h}=(h_1, \dots, h_d)$.
%\end{equation}
The flexibility of the KDE comes at the expense of computational efficiency, which has led to the development of more efficient computation schemes.
The computation time depends, besides the number of calculated points, on the number of data points $n$.
The computation time depends, besides the number of calculated points, on the number of data points $N$.
In general, reducing the size of the sample negatively affects the accuracy of the estimate.
Still, the sample size is a suitable parameter to speed up the computation.
\todo{rewrite}
Silverman \cite{silverman1982algorithm} suggested reducing the number of single data points by combining adjacent points into data bins.
This approximation is called binned kernel density estimate (BKDE) and was extensively analysed \cite{fan1994fast} \cite{wand1994fast} \cite{hall1996accuracy} \cite{holmstrom2000accuracy}.
Usually the data is binned over an equidistant grid.
Due to the equally-spaced grid many kernel evaluations are almost the same and can be saved, which greatly reduces the number of evaluated kernels and naturally leads to a reduced computation time \cite{fan1994fast}.
\todo{introduce a variable for the bin size}
First, the data, i.e. a random sample $X$, has to be assigned to a grid.
A binning rule distributes a sample $x$ among the grid points $g_j=j\delta$ for $j\in\Z$ and can be represented as a set of functions $\{ w_j(x,\delta), j\in\Z \}$.
For computation a finite grid is used on the interval $[a,b]$ containing the data; thus the number of grid points is $G=(b-a)/\delta+1$.
While the estimate can be computed efficiently, it is not obvious how large the grid should be chosen.
Because the computation time heavily depends on the grid size, it is desirable to choose a grid as small as possible without losing too much accuracy.
In general, there is no definite answer because the amount of binning depends on the structure of the unknown density and the sample size.
The roughness of the unknown density directly affects the grid size.
Coarser grids allow a greater speedup but at the same time might conceal important details of the unknown density \cite{wand1994fast}.
Given a binning rule $w_j$ the BKDE $\tilde{f}$ of a density $f$ computed pointwise at the grid point $g_x$ is given as
\begin{equation}
@@ -138,12 +135,21 @@ and the common linear binning rule
\end{align}
An advantage of these frequently used binning rules is that their effect on the approximation has been extensively investigated and is well understood \cite{wand1994fast} \cite{hall1996accuracy} \cite{holmstrom2000accuracy}.
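As a hedged sketch of such a binning step (the grid offset at $a$, the boundary handling, and all names are illustrative assumptions), the grid counts of the linear binning rule can be accumulated as follows.

import numpy as np

def linear_bin(data, a, b, delta):
    """Distribute the sample over the equidistant grid points g_j = a + j*delta
    on [a, b] with linear binning; returns the grid counts N_j."""
    G = int(round((b - a) / delta)) + 1                    # number of grid points
    counts = np.zeros(G)
    pos = np.clip((np.asarray(data, dtype=float) - a) / delta, 0.0, G - 1.0)
    j = np.floor(pos).astype(int)                          # left neighbouring grid point
    w = pos - j                                            # fraction assigned to the right neighbour
    np.add.at(counts, j, 1.0 - w)                          # left neighbour receives 1 - w
    np.add.at(counts, np.minimum(j + 1, G - 1), w)         # right neighbour receives w
    return counts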
\todo{text flow}
While the estimate can be computed efficiently, it is not obvious how large the grid should be chosen.
Because the computation time heavily depends on the grid size, it is desirable to choose a grid as small as possible without losing too much accuracy.
In general, there is no definite answer because the amount of binning depends on the structure of the unknown density and the sample size.
The roughness of the unknown density directly affects the grid size.
Coarser grids allow a greater speedup but at the same time might conceal important details of the unknown density \cite{wand1994fast}.
As already stated, the computational savings are achieved by reducing the number of evaluated kernels.
A naive implementation of \eqref{eq:binKde} reduces the number of evaluations to $\landau{G^2}$ \cite{fan1994fast}.
Because of the fixed grid spacing $\delta$ most of the kernel evaluations are the same, as each $g_j-g_{j-k}=k\delta$ is independent of $j$ \cite{fan1994fast}.
Therefore, many evaluated kernels can be reused, so that the number of kernel evaluations is reduced to $\landau{G}$ \cite{fan1994fast}.
However, more important for this work is the fact that the BKDE can be seen as a convolution operation.
\todo{sentence}
Once the grid counts $N_j$ in \eqref{eq:gridCnts} and kernel values are computed they need to be combined, which is, in fact, a discrete convolution \cite{wand1994fast}.
This makes it possible to apply a wide range of well studied techniques from the DSP field.
Often an FFT-convolution-based computation scheme is used to efficiently compute the estimate \cite{silverman1982algorithm}\cite[210ff.]{scott2015}.
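A hedged sketch of this convolution view (the truncation of the kernel at four standard deviations and all names are illustrative assumptions): the grid counts are convolved with the Gaussian kernel sampled at multiples of the grid spacing.

import numpy as np

def bkde(counts, h, delta, n):
    """Binned KDE at the grid points: discrete convolution of the grid counts N_j
    with the Gaussian kernel sampled at k*delta, normalised by the sample size n."""
    L = int(np.ceil(4.0 * h / delta))                      # truncate the kernel support
    k = np.arange(-L, L + 1) * delta                       # kernel argument g_j - g_{j-k} = k*delta
    kern = np.exp(-0.5 * (k / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
    # for large grids, scipy.signal.fftconvolve(counts, kern, mode="same") is an FFT-based alternative
    return np.convolve(counts, kern, mode="same") / n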
@@ -153,6 +159,7 @@ Using the Gaussian kernel from \eqref{eq:gausKern} in conjunction with the BKDE
\tilde{f}(g_x)=\frac{1}{nh\sqrt{2\pi}} \sum_{j=1}^{G} N_j \expp{-\frac{(g_x-g_j)^2}{2h^2}} \text{.}
\end{equation}
\todo{change capital N to capital C and use it in the text below to make this clearer}
As already stated, the above formula is a convolution of the binned data and the kernel.
More precisely it is a discrete convolution of the finite data grid and the Gaussian function.
In terms of DSP this is analogous to filtering the binned data with a Gaussian filter.

View File

@@ -5,6 +5,7 @@
% Repetitive Box filter to approx Gauss
% Simple multipass, n/m approach, extended box filter
\todo{describe the normalization factor, sigma vs. h}
The Gaussian filter is a widely used smoothing filter.
It is defined as the convolution of an input signal and the Gaussian function
\begin{equation}
@@ -13,26 +14,27 @@ g(x) = \frac{1}{\sigma \sqrt{2\pi}} \expp{-\frac{x^2}{2\sigma^2}} \text{,}
\end{equation}
where $\sigma$ is a smoothing parameter called standard deviation.
In the discrete case the Gaussian filter is easily computed with the sliding window algorithm in time domain.
It is easily extended to multi-dimensional signals, as convolution is separable if the filter kernel is separable, i.e. multidimensional convolution can be computed as individual one-dimensional convolutions with a one-dimensional kernel.
Because of $\operatorname{e}^{x^2+y^2} = \operatorname{e}^{x^2}\cdot\operatorname{e}^{y^2}$ the Gaussian filter is separable and can be easily applied to multi-dimensional signals.
%In the discrete case the Gaussian filter is easily computed with the sliding window algorithm in time domain.
If the filter kernel is separable, the convolution is also separable, i.e. multi-dimensional convolution can be computed as individual one-dimensional convolutions with a one-dimensional kernel.
Because of $\operatorname{e}^{x^2+y^2} = \operatorname{e}^{x^2}\cdot\operatorname{e}^{y^2}$ the Gaussian filter is separable and can be easily applied to multi-dimensional signals. \todo{add source}
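A hedged sketch of this separability for a two-dimensional signal (the truncation of the kernel at three standard deviations and all names are illustrative assumptions): the same one-dimensional kernel is applied along each axis in turn.

import numpy as np

def gaussian_kernel_1d(sigma, radius):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()                                     # normalised one-dimensional Gaussian kernel

def gaussian_filter_2d(img, sigma):
    """Two-dimensional Gaussian filter computed as two one-dimensional passes."""
    k = gaussian_kernel_1d(sigma, int(np.ceil(3 * sigma)))
    rows = np.apply_along_axis(np.convolve, 1, np.asarray(img, dtype=float), k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")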
% TODO similarity between Gaussian filter and KDE -> fast Gaussian filter = fast KDE
Computation of a filter using the a naive implementation of the sliding window algorithm yields $\landau{NK}$, where $N$ is the length of the input signal and $K$ is the size of the filter kernel.
Computing a filter with a naive implementation of the discrete convolution yields $\landau{NK}$, where $N$ is the length of the input signal and $K$ is the size of the filter kernel.
Note that in the case of the Gaussian filter $K$ depends on $\sigma$.
In order to capture all significant values of the Gaussian function the kernel size $K$ must be adapted to the standard deviation of the Gaussian.
A popular approach to efficiently compute a filter result is the FFT-convolution algorithm, which is $\landau{N\log(N)}$.
For large values of $\sigma$ the computation time of the Gaussian filter might be reduced by applying the filter in frequency domain.
In order to do so, both signals are transformed into frequency domain using the FFT.
The convoluted time signal is equal to the point-wise multiplication of the signals in frequency domain.
In case of the Gaussian filter the computation of the Fourier transform of the kernel can be saved, as the Gaussian is an eigenfunction of the Fourier transform \cite{?}.
%A popular approach to efficiently compute a filter result is the FFT-convoultion algorithm which is $\landau{N\log(N)}$.
%For large values of $\sigma$ the computation time of the Gaussian filter might be reduced by applying the filter in frequency domain.
%In order to do so, both signals are transformed into frequency domain using the FFT.
%The convoluted time signal is equal to the point-wise multiplication of the signals in frequency domain.
%In case of the Gaussian filter the computation of the Fourier transform of the kernel can be saved, as the Gaussian is a eigenfunction for the Fourier transform \cite{?}.
While the FFT-convolution algorithm is efficient for large signals, it adds a noticeable overhead for small signals.
%While the FFT-convolution algorithm poses an efficient algorithm for large signals, it adds an noticeable overhead for small signals.
While the above-mentioned algorithms are efficient schemes for computing an exact filter result, approximative algorithms can speed up the computation further.
%While the above mentions algorithms poses efficient computations schemes to compute an exact filter result, approximative algorithms can further speed up the computation.
\todo{O(NK) is bad and we want O(N), hence the box filter}
A well-known rapid approximation of the Gaussian filter is given by the moving average filter.
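The following subsection discusses this filter in detail; as a hedged preview (the zero padding at the borders and all names are illustrative assumptions), a moving average of width $w$ can be computed with a running sum in $\landau{N}$ independently of $w$, and repeated passes approximate the Gaussian shape.

import numpy as np

def box_filter(signal, w):
    """Moving average of odd width w in O(N) using a running (prefix) sum."""
    r = w // 2
    padded = np.pad(np.asarray(signal, dtype=float), r, mode="constant")   # zero-pad the borders
    csum = np.cumsum(np.insert(padded, 0, 0.0))                            # prefix sums
    return (csum[w:] - csum[:-w]) / w                                      # each window sum is a difference of two prefix sums

def approx_gaussian_filter(signal, w, passes=3):
    """Approximate a Gaussian filter by several box filter passes."""
    out = np.asarray(signal, dtype=float)
    for _ in range(passes):
        out = box_filter(out, w)
    return out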
\subsection{Moving Average Filter}

View File

@@ -44,6 +44,7 @@ However, reducing the sample size by distributing the data on a equidistant grid
Silverman \cite{silverman1982algorithm} originally suggested combining adjacent data points into data bins, which results in a discrete convolution structure of the KDE.
This allows the estimate to be computed efficiently using an FFT algorithm.
This approximation scheme was later called binned KDE (BKDE) and was extensively studied \cite{fan1994fast} \cite{wand1994fast} \cite{hall1996accuracy} \cite{holmstrom2000accuracy}.
While the FFT algorithm is efficient for large sample sets, it adds a noticeable overhead for smaller ones.
The idea to approximate a Gaussian filter using several box filters was first formulated by Wells \cite{wells1986efficient}.
Kovesi \cite{kovesi2010fast} suggested using two box filters with different widths to increase accuracy while maintaining the same complexity.

View File

@@ -9,6 +9,8 @@
% - separated in each dimension individually
% take the maximum of the filter result
\todo{add a paragraph on the 2D case}
The objective of our method is to reliably recover the most probable state from a time-sequential Monte Carlo sensor fusion system.
Assuming a sample-based representation, our method makes it possible to estimate the density of the unknown distribution over the state space within a narrow time frame.
Such systems are often used to obtain an estimation of the most probable state in near real time.
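A hedged end-to-end sketch of this idea for a single state dimension, reusing the hypothetical helpers linear_bin and approx_gaussian_filter from the earlier sketches (grid limits, bin width, and box width are placeholders): bin the samples, smooth the counts with repeated box filters as an approximation of the Gaussian-smoothed density, and take the grid point with the maximal response as the most probable state.

import numpy as np

def most_probable_state(samples, a, b, delta, w, passes=3):
    """Estimate the mode of the sample density on an equidistant grid."""
    counts = linear_bin(samples, a, b, delta)              # grid counts (see binning sketch above)
    smoothed = approx_gaussian_filter(counts, w, passes)   # box-filter approximation of the Gaussian smoothing
    grid = a + np.arange(counts.size) * delta              # grid points g_j
    return grid[np.argmax(smoothed)]                       # most probable state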