This commit is contained in:
2018-02-20 15:51:43 +01:00
parent 62d891b5ba
commit 0b842a2f62
4 changed files with 31 additions and 30 deletions

@@ -113,6 +113,8 @@
\newcommand{\Lideal} {\ensuremath{ L_{\text{ideal}} }}
\newcommand{\floor} [1]{\ensuremath{ \lfloor #1 \rfloor }}
\newcommand{\etal} [1]{#1~et~al.}
\newcommand{\G} [2]{\ensuremath{ \mathcal{N} \left(#1,#2\right) }}
\newcommand{\VecTwo}[2]{\ensuremath{\left[\begin{smallmatrix} #1 \\ #2 \end{smallmatrix}\right] }}
\newcommand{\qq} [1]{``#1''}

@@ -1,21 +1,34 @@
\section{Experiments}
We now empirically evaluate the accuracy of our method and compare its runtime performance with other state-of-the-art approaches.
To conclude our findings, we present a real-world example from an indoor localisation system.
\subsection{Error}
In order to quantify the accuracy of our method, the mean integrated squared error (MISE) is used.
The ground truth is given as a synthetic data set drawn from a normal mixture density.
Clearly, the choice of the ground truth distribution affects the resulting error.
However, as our method approximates the KDE, it is only of interest to evaluate the closeness to the KDE and not to the ground truth itself.
\subsection{Mean Integrated Squared Error}
We now empirically evaluate the accuracy of our method, using the mean integrated squared error (MISE).
The ground truth is given as $N=1000$ synthetic samples drawn from a bivariate normal mixture density $f$,
\begin{equation}
\begin{split}
\bm{X} \sim \tfrac{1}{5} \Big( &\G{\VecTwo{0}{0}}{0.5\bm{I}} + \G{\VecTwo{3}{0}}{\bm{I}} \\
&+ \G{\VecTwo{0}{3}}{\bm{I}} + \G{\VecTwo{-3}{0}}{\bm{I}} + \G{\VecTwo{0}{-3}}{\bm{I}} \Big)
\end{split}
\end{equation}
where the majority of the probability mass lies in the range $[-6; 6]^2$.
Clearly, the structure of the ground truth affects the error in the estimate, but as our method approximates the KDE, only the closeness to the KDE is of interest.
Therefore, the particular choice of the ground truth is of minor importance here.
First, we evaluate the accuracy of our method as a function of the bandwidth $h$, in comparison to the exact KDE and the BKDE.
Both the BKDE and the extended box filter estimate resemble the error curve of the KDE well and remain stable.
They are rather close to each other, with a tendency to diverge for larger $h$.
In contrast, the error curve of the box filter estimate has noticeable jumps at $h=(0.4; 0.252; 0.675; 0.825)$.
These jumps are caused by the rounding of the integer-valued box width given by \eqref{eq:boxidealwidth}.
As the extended box filter is able to approximate an exact $\sigma$, it lacks these discontinuities.
The exact KDE, evaluated at $50^2$ points, is compared to the BKDE, box filter, and extended box filter approximations, which are evaluated on a coarser grid with $30^2$ points.
The MISE between $f$ and the estimates is evaluated as a function of $h$; the resulting plot is given in figure~\ref{fig:evalBandwidth}.
\begin{figure}
\label{fig:evalBandwidth}
\end{figure}
Other test cases of theoretical relevance are the error as a function of the grid size $G$ and of the sample size $N$.
However, neither case gives deeper insight into the error behaviour of our method, as it closely mimics the error curve of the KDE and merely confirms the theoretical expectations.
% kde, box filter, exbox as a function of h (figure)
% sample size and grid size text
@@ -23,5 +36,8 @@ At first we evaluate the accuracy of our method as a function of the bandwidth $
\subsection{Performance}
All tests are performed on an Intel Core \mbox{i5-7600K} CPU with a frequency of $4.5\,\text{GHz}$, which supports the AVX2 instruction set; hence, 256-bit wide SIMD registers are available.
We compare our C++ implementation of the box filter based KDE to the KernSmooth R package and the \qq{FastKDE} implementation \cite{oBrien2016fast}.
The KernSmooth package provides an FFT-based BKDE implementation with optimized C functions at its core.
\subsection{Real-World Example}

@@ -116,18 +116,5 @@ Just like the conventional box filter, the extended version has a uniform value
This extension introduces only marginal computational overhead over conventional box filtering.
\commentByToni{Why do we not use the extended box filter? Or do we? It reads as if it were the holy grail.
Apart from that: I like this chapter. Maybe a few sentences here and there could be merged to save some space.}
\commentByToni{But somehow I am still missing a paragraph or section that puts the mathematics together. The usage part starts rather abruptly again. And another small discussion of how the pieces fit together. Maybe this could be combined with the 2D part? Because at the moment we have discussed both methods, but our own contribution is somehow not apparent. What exactly was your achievement here? That has to become clear through a good discussion.}

@@ -51,7 +51,7 @@ These limits might be given by the application, for example, the position of a p
Such knowledge should be integrated into the system to avoid a linear search over the sample set, naturally reducing the computation time.
Given the extreme values of the samples and grid sizes $G_1$ and $G_2$ defined by the user, a $G_1\times G_2$ grid can be constructed, using a binning rule from \eqref{eq:simpleBinning} or \eqref{eq:linearBinning}.
As the number of grid points directly affects both computation time and accuracy, a suitable grid should be as coarse as possible, but at the same time narrow enough to produce an estimate sufficiently fast with an acceptable approximation error.
If the extreme values are known in advance, the computation of the grid is $\landau{N}$; otherwise, an additional $\landau{N}$ search is required.
The grid is stored as a linear array in memory, thus its space complexity is $\landau{G_1\cdot G_2}$.
@@ -74,7 +74,3 @@ Due to its simple indexing scheme, the recursive box filter can easily be comput
Finally, the most likely state can be obtained from the filtered data, i.e., from the estimated discrete density, by searching the filtered data for its maximum value.
\commentByToni{Quite a cool chapter in itself, but we have to build a stronger connection back to the parts above. That is, cite the formulas, reference back somehow, so that nobody gets lost.
Would it make sense to present the above algorithmically, i.e. as pseudocode? Because somewhere we have to state "THIS IS OUR APPROACH"}.