Fixed FE 1
\newcommand{\VecTwo}[2]{\ensuremath{\left[\begin{smallmatrix} #1 \\ #2 \end{smallmatrix}\right] }}
\newcommand{\qq}[1]{``#1''}
\newcommand{\eg}{e.\,g.}
\newcommand{\ie}{i.\,e.}

% missing math operators
\DeclareMathOperator*{\argmin}{arg\,min}
\begin{abstract}
It is common practice to use a sample-based representation to solve problems having a probabilistic interpretation.
In many real-world scenarios one is then interested in finding a \qq{best estimate} of the underlying problem, \eg{} the position of a robot.
This is often done by means of simple parametric point estimators, providing the sample statistics.
However, in complex scenarios this frequently results in a poor representation, due to multimodal densities and limited sample sizes.

Recovering the probability density function using a kernel density estimation yields a promising approach to solving the state estimation problem, \ie{} finding the \qq{real} most probable state, but comes with high computational costs.
Especially in time-critical and time-sequential scenarios, this turns out to be impractical.
Therefore, this work uses techniques from digital signal processing in the context of estimation theory to allow rapid computation of kernel density estimates.
The gains in computational efficiency are realized by substituting the Gaussian filter with an approximate filter based on the box filter.
Our approach outperforms other state-of-the-art solutions, due to a fully linear complexity and a negligible overhead, even for small sample sets.
Finally, our findings are evaluated and tested within a real-world sensor fusion system.
\end{abstract}
\section{Conclusion}

Within this paper a novel approach for rapid approximation of the KDE was presented.
This is achieved by considering the discrete convolution structure of the BKDE, thus elaborating its connection to digital signal processing, especially the Gaussian filter.
Using a box filter as an appropriate approximation results in an efficient computation scheme with a fully linear complexity and a negligible overhead, as confirmed by our experiments.

The analysis of the error showed that the method exhibits a similar error behaviour to the BKDE.
In terms of calculation time, our approach outperforms other state-of-the-art implementations.
Despite being more efficient than other methods, the algorithmic complexity still grows exponentially with the number of dimensions.
\subsection{Mean Integrated Squared Error}

We now empirically evaluate the accuracy of our BoxKDE method, using the mean integrated squared error (MISE).
The ground truth is given by $N=1000$ synthetic samples drawn from a bivariate normal mixture density $f$
\begin{equation}
\begin{split}
Therefore, the particular choice of the ground truth is only of minor importance.
\begin{figure}[t]
\input{gfx/error.tex}
\caption{MISE relative to the ground truth as a function of $h$. While the error curves of the BKDE (red) and the BoxKDE based on the extended box filter (orange dotted line) resemble the overall course of the error of the exact KDE (green), the regular BoxKDE (orange) exhibits noticeable jumps due to rounding.} \label{fig:errorBandwidth}
\end{figure}

Evaluated at $50^2$ points, the exact KDE is compared to the BKDE, BoxKDE, and extended box filter approximation, which are evaluated on a smaller grid with $30^2$ points.
The MISE between $f$ and the estimates as a function of $h$ is evaluated, and the resulting plot is given in fig.~\ref{fig:errorBandwidth}.
A minimum error is obtained with $h=0.35$; for larger values oversmoothing occurs and the modes gradually fuse together.

Both the BKDE and the extended box filter estimate resemble the error curve of the KDE quite well and remain stable.
They are rather close to each other, with a tendency to diverge for larger $h$.
In contrast, the error curve of the BoxKDE has noticeable jumps at $h=(0.4; 0.252; 0.675; 0.825)$.
These jumps are caused by the rounding of the integer-valued box width given by \eqref{eq:boxidealwidth}.

As the extended box filter is able to approximate an exact $\sigma$, these discontinuities do not appear.
Consequently, it reduces the overall error of the approximation, but only marginally in this scenario.
The global average MISE over all values of $h$ is $0.0049$ for the regular box filter and $0.0047$ in case of the extended version.
Likewise, the maximum MISE is $0.0093$ and $0.0091$, respectively.
The choice between the extended and regular box filter algorithm depends on the acceptable error and thus on the particular application.
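For reference, the MISE figures above can be approximated on the evaluation grid by summing squared pointwise differences scaled by the cell area; the following C++ sketch is our own illustration of the metric, not the exact evaluation code used for the experiments:
\begin{verbatim}
#include <cstddef>
#include <vector>

// Approximate integrated squared error between an estimate fhat and
// the ground-truth density f, both sampled on the same equidistant
// grid: ISE ~ sum_i (fhat_i - f_i)^2 * cellArea. Averaging this over
// repeated sample draws yields the empirical MISE.
double integratedSquaredError(const std::vector<double>& fhat,
                              const std::vector<double>& f,
                              double cellArea) {
    double ise = 0.0;
    for (std::size_t i = 0; i < fhat.size(); ++i) {
        const double d = fhat[i] - f[i];
        ise += d * d;
    }
    return ise * cellArea;
}
\end{verbatim}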
However, both cases do not give a deeper insight into the error behavior of our method.

\begin{figure}[t]
%\includegraphics[width=\textwidth,height=6cm]{gfx/tmpPerformance.png}
\input{gfx/perf.tex}
\caption{Logarithmic plot of the runtime performance with increasing grid size $G$ and bivariate data. The weighted-average estimate (blue) performs fastest, followed by the BoxKDE (orange) approximation. Both the BKDE (red) and the FastKDE (green) are orders of magnitude slower, especially for $G<10^3$.}\label{fig:performance}
\end{figure}

% kde, box filter, exbox as a function of h (figure)
\subsection{Performance}
In the following, we underpin the promising theoretical linear time complexity of our method with empirical time measurements compared to other methods.
All tests are performed on an Intel Core \mbox{i5-7600K} CPU with a frequency of \SI{4.2}{\giga\hertz} and \SI{16}{\giga\byte} main memory.
We compare our C++ implementation of the BoxKDE approximation as shown in algorithm~\ref{alg:boxKDE} to the \texttt{ks} R package and the FastKDE Python implementation \cite{oBrien2016fast}.
The \texttt{ks} package provides an FFT-based BKDE implementation based on optimized C functions at its core.
With state estimation problems in mind, we additionally provide a C++ implementation of a weighted-average estimator.
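For illustration, such a weighted-average estimate reduces to a single pass over the samples; the following minimal C++ sketch (our own illustration, not the exact implementation used in the experiments) computes it for bivariate data:
\begin{verbatim}
#include <vector>

struct Sample { double x, y, w; };

// Weighted-average point estimate: sum(w_i * s_i) / sum(w_i).
// Minimal sketch; assumes a non-empty sample set with positive
// total weight.
Sample weightedAverage(const std::vector<Sample>& samples) {
    double sx = 0.0, sy = 0.0, W = 0.0;
    for (const Sample& s : samples) {
        sx += s.w * s.x;
        sy += s.w * s.y;
        W  += s.w;
    }
    return { sx / W, sy / W, 1.0 };
}
\end{verbatim}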
As neither method uses a grid, an equivalent input sample set was used for the weighted-average and the FastKDE.

The results of the performance comparison are presented in fig.~\ref{fig:performance}.
% O(N) clearly visible for the box KDE and the weighted average
The linear complexity of the BoxKDE and the weighted average is clearly visible.
% Especially for small G up to 10^3 the box KDE is faster than R and FastKDE, but the WA is clearly faster than all others
Especially for small $G$ up to $10^3$ the BoxKDE is much faster than the BKDE and FastKDE.
% With increasingly large G, the gap between the box KDE and the WA grows.
Nevertheless, the simple weighted-average approach performs the fastest, and with increasing $G$ its lead over the BoxKDE grows steadily.
However, it is obvious that this comes with major disadvantages, like being prone to multimodalities, as discussed in section \ref{sec:intro}.
% (This may also be because the binning becomes slower with larger G, which I cannot explain! Maybe cache effects)
The termination of the BKDE graph at $G=4406^2$ is caused by an out-of-memory error.

% Both the box filter and the extended box filter have very similar runtime behavior and thus a very similar curve progression.
% While the average runtime over all values of G for the box filter is 0.4092s, the extended box filter takes 0.4169s on average.
Both discussed Gaussian filter approximations, namely the box filter and the extended box filter, yield a similar runtime behavior and therefore a similar curve progression.
While the average runtime over all values of $G$ for the standard box filter is \SI{0.4092}{\second}, the extended one has an average of \SI{0.4169}{\second}.
To keep the arrangement of fig.~\ref{fig:performance} clear, we only illustrate the results of the BoxKDE with the regular box filter.

The weighted-average has the great advantage of being independent of the dimensionality of the input and can be implemented effortlessly.
In contrast, the computation time of the BoxKDE approach grows exponentially with the number of dimensions.
However, due to the linear time complexity and the very simple computation scheme, the overall computation time is still sufficiently fast for many applications and much smaller compared to other methods.
The BoxKDE approach presents a reasonable alternative to the weighted-average and is easily integrated into existing systems.

In addition, modern CPUs benefit from the recursive computation scheme of the box filter, as the data exhibits a high degree of spatial locality in memory and the accesses are reliably predictable.
Furthermore, the computation is easily parallelized, as there is no data dependency between the one-dimensional filter passes in algorithm~\ref{alg:boxKDE}.
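For instance, the independent row passes distribute across cores without synchronisation; a hedged C++ sketch using OpenMP (the helper name and the per-row filter signature are our assumptions):
\begin{verbatim}
#include <cstddef>
#include <vector>

// The one-dimensional passes over different rows of a row-major
// G1 x G2 grid have no data dependencies, so each thread can filter
// a disjoint set of rows. Sketch only.
void filterRowsParallel(std::vector<double>& grid, int G1, int G2,
                        void (*filterRow)(double*, int len)) {
    #pragma omp parallel for
    for (int i = 0; i < G1; ++i)
        filterRow(&grid[static_cast<std::size_t>(i) * G2], G2);
}
\end{verbatim}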

Sensor fusion approaches are often based upon probabilistic descriptions like particle filters, using samples to represent the distribution of a dynamical system.
To update the system recursively in time, probabilistic sensor models process the noisy measurements and a state transition function provides the system's dynamics.
Therefore a sample or particle is a representation of one possible system state, \eg{} the position of a pedestrian within a building.
In most real-world scenarios one is then interested in finding the most probable state within the state space, to provide the best estimate of the underlying problem; generally speaking, solving the state estimation problem.
In the discrete setting of a sample representation this is often done by providing a single value, also known as a sample statistic, to serve as a \qq{best guess}.
This value is then calculated by means of simple parametric point estimators, \eg{} the weighted-average over all samples, the sample with the highest weight, or by assuming other parametric statistics like normal distributions \cite{Fetzer2016OMC}.
% there must be other methods... darn it... but fundamentally a weighted average is a point estimator, right? (https://www.statlect.com/fundamentals-of-statistics/point-estimation)
% For related work we definitely need sources here. Some also compute https://en.wikipedia.org/wiki/Sample_mean_and_covariance or assume a certain distribution for the sample set and compute its parameters

It is obvious that a computation of the full posterior could solve the above, but finding such an analytical solution is an intractable problem, which is the reason for applying a sample representation in the first place.
Another promising way is to recover the probability density function from the sample set itself, by using a non-parametric estimator like a kernel density estimation (KDE).
With this, the \qq{real} most probable state is given by the maxima of the density estimate, thus avoiding the aforementioned drawbacks.
However, non-parametric estimators tend to consume a large amount of computation time, which renders them impractical for real-time scenarios.
Nevertheless, the availability of a fast density estimate might improve the accuracy of today's sensor fusion systems without sacrificing their real-time capability.

%Therefore, this paper presents a novel approximation approach for rapid computation of the KDE.
%In this paper, a well known approximation of the Gaussian filter is used to speed up the computation of the KDE.
In this paper, a novel approximation approach for rapid computation of the KDE is presented.
The basic idea is to interpret the estimation problem as a filtering operation.
We show that computing the KDE with a Gaussian kernel on binned data is equivalent to applying a Gaussian filter to the binned data.
This allows us to use a well known approximation scheme for Gaussian filters: the box filter.
By the central limit theorem, repeated application of a box filter yields an approximate Gaussian filter \cite{kovesi2010fast}.

%In contrast,
The KDE is often the preferred tool to estimate a density function from discrete data samples because of its flexibility and ability to produce a continuous estimate.
%
Given a univariate random sample set $X=\{X_1, \dots, X_N\}$, where $X$ has the density function $f$, let $w_1, \dots, w_N$ be associated weights.
The kernel estimator $\hat{f}$ which estimates $f$ at the point $x$ is given as
\begin{equation}
\label{eq:kde}

As a matter of fact, the quality of the kernel estimate is primarily determined by the bandwidth.
%
%Any non-optimal bandwidth causes undersmoothing or oversmoothing.
%An undersmoothing estimator has a large variance and hence a small $h$ leads to undersmoothing.
%On the other hand given a large $h$ the bias increases, which leads to oversmoothing \cite{Cybakov2009}.
%Clearly with an adverse choice of the bandwidth crucial information like modality might get smoothed out.
%All in all it is not obvious to determine a good choice of the bandwidth.
%
The Gaussian kernel is given as
\begin{equation}
	K_G(u)=\frac{1}{\sqrt{2\pi}} \expp{- \frac{u^2}{2} } \text{.}
\end{equation}
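Read together with \eqref{eq:kde}, a direct evaluation of the weighted KDE with this kernel looks as follows; a naive C++ reference sketch, assuming the weighted form $\hat{f}(x) = W^{-1}\sum_i (w_i/h)\, K_G((x-X_i)/h)$ with $W=\sum_i w_i$, consistent with the multivariate form \eqref{eq:mvKDE}:
\begin{verbatim}
#include <cmath>
#include <cstddef>
#include <vector>

// Naive weighted univariate KDE with a Gaussian kernel, evaluated at
// a single point x: O(N) per query point. Reference sketch only.
double kdeGauss(const std::vector<double>& X,
                const std::vector<double>& w,
                double h, double x) {
    const double norm = 0.3989422804014327;  // 1/sqrt(2*pi)
    double sum = 0.0, W = 0.0;
    for (std::size_t i = 0; i < X.size(); ++i) {
        const double u = (x - X[i]) / h;
        sum += w[i] * norm * std::exp(-0.5 * u * u);
        W   += w[i];
    }
    return sum / (W * h);
}
\end{verbatim}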

The flexibility of the KDE comes at the expense of computation speed, which has led to the development of more efficient computation schemes.
The computation time depends, besides the number of calculated points $M$, on the input size, namely the sample size $N$.
In general, reducing the size of the sample set negatively affects the accuracy of the estimate.
Still, $N$ is a suitable parameter to speed up the computation.

The BKDE reduces $N$ by combining each single sample with its adjacent samples into bins, and thus approximates the KDE.
%Since each single sample is combined with its adjacent samples into bins, the BKDE approximates the KDE.
Each bin represents the count of the sample set at a given point of an equidistant grid with spacing $\delta$.
A binning rule distributes each sample among the grid points $g_j=j\delta$, indexed by $j\in\Z$.
% and can be represented as a set of functions $\{ w_j(x,\delta), j\in\Z \}$.
Computation requires a finite grid on the interval $[a,b]$ containing the data, thus the number of grid points is $G=(b-a)/\delta+1$ \cite{hall1996accuracy}.
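As an illustration, a linear binning rule splits each sample's weight between its two neighbouring grid points in proportion to proximity; the following C++ sketch shows one common rule, not necessarily identical to \eqref{eq:linearBinning}:
\begin{verbatim}
#include <cmath>
#include <cstddef>
#include <vector>

// Linear binning onto an equidistant grid over [a, b] with G points
// and spacing delta = (b - a) / (G - 1). Each sample's weight is
// split between its two neighbouring grid points in proportion to
// proximity. Sketch of one common rule.
std::vector<double> linearBinning(const std::vector<double>& X,
                                  const std::vector<double>& w,
                                  double a, double delta, int G) {
    std::vector<double> grid(G, 0.0);
    for (std::size_t i = 0; i < X.size(); ++i) {
        const double pos = (X[i] - a) / delta;  // fractional index
        const int j = static_cast<int>(std::floor(pos));
        const double frac = pos - j;
        if (j >= 0 && j < G)      grid[j]     += w[i] * (1.0 - frac);
        if (j >= -1 && j + 1 < G) grid[j + 1] += w[i] * frac;
    }
    return grid;
}
\end{verbatim}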

Given a binning rule $r_j$, the BKDE $\tilde{f}$ of a density $f$ computed pointwise at the grid point $g_x$ is given as
\begin{equation}

Multivariate kernel functions can be constructed in various ways; however, a popular choice is the product kernel.
Such a kernel is constructed by combining several univariate kernels into a product, where each kernel is applied in each dimension with a possibly different bandwidth.

Given a multivariate random variable $\bm{X}=(x_1,\dots ,x_d)$ in $d$ dimensions, the sample set $\mathcal{X}$ is an $n\times d$ matrix \cite{scott2015}.
The multivariate KDE $\hat{f}$ which defines the estimate pointwise at $\bm{u}=(u_1, \dots, u_d)^T$ is given as
\begin{equation}
\label{eq:mvKDE}
	\hat{f}(\bm{u}) = \frac{1}{W} \sum_{i=1}^{n} \frac{w_i}{h_1 \dots h_d} \left[ \prod_{j=1}^{d} K\left( \frac{u_j-x_{i,j}}{h_j} \right) \right] \text{,}
\end{equation}
where the bandwidth is given as a vector $\bm{h}=(h_1, \dots, h_d)$.
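A direct transcription of \eqref{eq:mvKDE} for reference; a naive C++ sketch, with samples stored row-major so that \texttt{X[i*d + j]} holds $x_{i,j}$, and the generic kernel $K$ instantiated as the Gaussian:
\begin{verbatim}
#include <cmath>
#include <cstddef>
#include <vector>

// Naive evaluation of the multivariate product-kernel KDE at a query
// point u: (1/W) sum_i w_i/(h_1...h_d) prod_j K((u_j - x_{i,j})/h_j),
// here with a Gaussian K. Reference sketch only.
double kdeProduct(const std::vector<double>& X,  // n*d, row-major
                  const std::vector<double>& w,
                  const std::vector<double>& h,
                  const std::vector<double>& u,
                  std::size_t n, std::size_t d) {
    const double norm = 0.3989422804014327;  // 1/sqrt(2*pi)
    double hprod = 1.0;
    for (std::size_t j = 0; j < d; ++j) hprod *= h[j];
    double sum = 0.0, W = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double prod = 1.0;
        for (std::size_t j = 0; j < d; ++j) {
            const double t = (u[j] - X[i * d + j]) / h[j];
            prod *= norm * std::exp(-0.5 * t * t);
        }
        sum += w[i] / hprod * prod;
        W   += w[i];
    }
    return sum / W;
}
\end{verbatim}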

Note that \eqref{eq:mvKDE} does not include all possible multivariate kernels, such as spherically symmetric kernels, which are based on rotation of a univariate kernel.
In general, a multivariate product and spherically symmetric kernel based on the same univariate kernel will differ.
The only exception is the Gaussian kernel, which is spherically symmetric and has independent marginals. % TODO scott cite?!
In addition, only smoothing in the direction of the axes is possible.

Likewise, the ideas of the common and linear binning rules scale with dimensionality.

In general, multi-dimensional filters are multi-dimensional convolution operations.
However, by utilizing the separability property of convolution, a straightforward and more efficient implementation can be found.
Convolution is separable if the filter kernel is separable, \ie{} it can be split into successive convolutions of several kernels.
For example, the Gaussian filter is separable, since $e^{-(x^2+y^2)} = e^{-x^2}\cdot e^{-y^2}$.
Likewise, digital filters based on such kernels are called separable filters.
They are easily applied to multi-dimensional signals, because the input signal can be filtered in each dimension individually by a one-dimensional filter \cite{dspGuide1997}.
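In code, separable filtering of a bivariate grid thus reduces to one-dimensional passes along rows and columns; a minimal C++ sketch, assuming some in-place 1D filter with stride support:
\begin{verbatim}
#include <cstddef>
#include <vector>

// Apply a 1D filter along both axes of a row-major G1 x G2 grid.
// By separability, filtering rows and then columns equals the full
// 2D convolution with the corresponding product kernel. filter1d is
// any in-place 1D filter over len elements spaced by stride.
void filterSeparable2D(std::vector<double>& grid, int G1, int G2,
                       void (*filter1d)(double*, int len, int stride)) {
    for (int i = 0; i < G1; ++i)    // rows
        filter1d(&grid[static_cast<std::size_t>(i) * G2], G2, 1);
    for (int j = 0; j < G2; ++j)    // columns
        filter1d(&grid[j], G1, G2);
}
\end{verbatim}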

%These kind of multivariate kernel is called product kernel as the multivariate kernel result is the product of each individual univariate kernel.
%
%Given a multivariate random variable $X=(x_1,\dots ,x_d)$ in $d$ dimensions.
%The sample $\bm{X}$ is a $n\times d$ matrix defined as \cite{scott2015}
%\begin{equation}
%	\bm{X}=
%	\begin{pmatrix}
%	\end{pmatrix} \text{.}
%\end{equation}
%
%The multivariate kernel density estimator $\hat{f}$ which defines the estimate pointwise at $\bm{x}=(x_1, \dots, x_d)^T$ is given as \cite{scott2015}
%\begin{equation}
%	\hat{f}(\bm{x}) = \frac{1}{nh_1 \dots h_d} \sum_{i=1}^{n} \left[ \prod_{j=1}^{d} K\left( \frac{x_j-x_{ij}}{h_j} \right) \right] \text{.}
%\end{equation}
%\end{equation}

% Gaus:
%If the filter kernel is separable, the convolution is also separable \ie{} multi-dimensional convolution can be computed as individual one-dimensional convolutions with a one-dimensional kernel.
%Because of $e^{x^2+y^2} = e^{x^2}\cdot e^{y^2}$ the Gaussian filter is separable and can be easily applied to multi-dimensional signals. \todo{source}

% Gauss Blur Filter
% Repetitive Box filter to approx Gauss
% Simple multipass, n/m approach, extended box filter
Digital filters are implemented by convolving the input signal with a filter kernel, \ie{} the digital filter's impulse response.
Consequently, the filter kernel of a Gaussian filter is a Gaussian with finite support \cite{dspGuide1997}.
Assuming a finite-support Gaussian filter kernel of size $M$ and an input signal $x$, discrete convolution produces the smoothed output signal
\begin{equation}

where $\sigma$ is a smoothing parameter called the standard deviation.

Note that \eqref{eq:bkdeGaus} has the same structure as \eqref{eq:gausFilt}, except for the different symbol for the smoothing parameter and the different factor in front of the sum.
While in both equations the constant factor of the Gaussian is removed from the inner sum, \eqref{eq:bkdeGaus} has an additional normalization factor $W^{-1}$.
This factor is necessary to ensure that the estimate is a valid density function, \ie{} that it integrates to one.
Such a restriction is superfluous in the context of digital filters, so the normalization factor is omitted.

Computation of a digital filter using the naive implementation of the discrete convolution algorithm yields $\landau{NM}$, where $N$ is again the input size given by the length of the input signal and $M$ is the size of the filter kernel.
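For reference, this naive $\landau{NM}$ scheme looks as follows; a C++ sketch with boundaries handled by zero padding and the kernel assumed odd-sized and centred:
\begin{verbatim}
#include <cstddef>
#include <vector>

// Naive discrete convolution of input x (length N) with kernel k
// (length M, odd, centred): O(N*M) operations. Out-of-range input
// samples are treated as zero (zero padding). Reference sketch.
std::vector<double> convolve(const std::vector<double>& x,
                             const std::vector<double>& k) {
    const int N = static_cast<int>(x.size());
    const int M = static_cast<int>(k.size());
    const int r = M / 2;
    std::vector<double> y(N, 0.0);
    for (int i = 0; i < N; ++i)
        for (int m = 0; m < M; ++m) {
            const int j = i + m - r;
            if (j >= 0 && j < N) y[i] += k[m] * x[j];
        }
    return y;
}
\end{verbatim}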

The overall algorithm to efficiently compute \eqref{eq:boxFilt} is listed in Alg.
\end{algorithm}

Given a fast approximation scheme, it is necessary to construct a box filter analogous to a given Gaussian filter.
As seen in \eqref{eq:gausFilt}, the sole parameter of the Gaussian kernel is the standard deviation $\sigma$.
In contrast, the box function \eqref{eq:boxFx} is parametrized by its width $L$.
Therefore, in order to approximate the Gaussian filter of a given $\sigma$, a corresponding value of $L$ must be found.
Given $n$ iterations of box filters with identical sizes, the ideal size $\Lideal$, as suggested by Wells~\cite{wells1986efficient}, is

The approximated $\sigma$ as a function of the integer width has a staircase shape.
By reducing the rounding error, the step size of the function is reduced.
However, the overall shape will not change.
\etal{Gwosdek}~\cite{gwosdek2011theoretical} proposed an approach which allows the approximation of any real-valued $\sigma$.
Just like the conventional box filter, the extended version has a uniform value in the range $[-l; l]$, but unlike the conventional one, the extended box filter has different values at its edges.
This extension introduces only marginal computational overhead over conventional box filtering.
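To make the construction concrete, the following hedged C++ sketch derives an odd box width from $\sigma$ using a width rule of the form $L=\sqrt{12\sigma^2/n+1}$ (our reading of \eqref{eq:boxidealwidth}) and performs one $\landau{N}$ running-sum box filter pass; $n$ repetitions approximate the Gaussian:
\begin{verbatim}
#include <cmath>
#include <vector>

// Box width approximating a Gaussian of std. dev. sigma with n box
// passes, L = sqrt(12*sigma^2/n + 1), rounded to an odd integer so
// the box has a well-defined centre. Hedged reading of the rule.
int boxWidth(double sigma, int n) {
    const double L = std::sqrt(12.0 * sigma * sigma / n + 1.0);
    const int w = static_cast<int>(L);
    return (w % 2 == 0) ? w + 1 : w;
}

// One box filter pass of odd width L, using a running sum so each
// output costs O(1) regardless of L; zero padding at the boundaries.
// Repeating the pass n times approximates a Gaussian filter.
void boxFilterPass(std::vector<double>& x, int L) {
    const int N = static_cast<int>(x.size());
    const int r = L / 2;
    std::vector<double> out(N);
    double sum = 0.0;
    for (int j = 0; j <= r && j < N; ++j) sum += x[j];
    for (int i = 0; i < N; ++i) {
        out[i] = sum / L;
        const int add = i + r + 1, sub = i - r;
        if (add < N)  sum += x[add];   // slide window right
        if (sub >= 0) sum -= x[sub];
    }
    x.swap(out);
}
\end{verbatim}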

To demonstrate the real-time capabilities of the proposed method a real-world scenario was chosen, namely indoor localization.
The given problem is to localize a pedestrian walking inside a building.
Ebner et al. proposed a method which incorporates multiple sensors, \eg{} Wi-Fi, barometer, step-detection and turn-detection \cite{Ebner-15}.
At a given time $t$ the system estimates a state providing the most probable position of the pedestrian.
It is implemented using a particle filter with sample importance resampling and \num{5000} particles.
The dynamics are modelled realistically, which constrains the movement according to walls, doors and stairs.

We arranged a \SI{223}{\meter} long walk within the first floor of a \SI{2500}{\meter\squared} museum, which was built in the 13th century and therefore offers non-optimal conditions for localization.
%The measurements for the walks were recorded using a Motorola Nexus 6 at 2.4 GHz band only.
%
Since this work only focuses on processing a given sample set, further details of the localisation system and the described scenario can be looked up in \cite{Ebner17} and \cite{Fetzer17}.

The bivariate state estimation was calculated whenever a step was recognized.

\begin{figure}
\input{gfx/walk.tex}
\caption{Occurring bimodal distribution caused by uncertain measurements in the first \SI{13.4}{\second} of the walk. After \SI{20.8}{\second}, the distribution becomes unimodal. The weighted-average estimation (blue) exhibits a high error compared to the ground truth (solid black), while the BoxKDE approach (orange) does not.}
\label{fig:realWorldMulti}
\end{figure}
%
Fig.~\ref{fig:realWorldMulti} illustrates a frequently occurring situation, where the particle set splits apart, due to uncertain measurements and multiple possible walking directions.
This results in a bimodal posterior distribution, which reaches its maximum distance between the modes at \SI{13.4}{\second} (black dotted line).
Thus estimating the most probable state over time using the weighted-average results in the blue line, describing the pedestrian's position to be somewhere outside the building (light green area).
In contrast, the proposed method (orange line) is able to retrieve a good estimate compared to the ground truth path shown by the black solid line.
Due to a right turn, the distribution becomes unimodal after \SI{20.8}{\second}.
This happens since the lower red particles are walking against a wall and are punished with a low weight.

This example highlights the main benefits of using our approach.
While being fast enough to be computed in real time, the proposed method reduces the estimation error of the state in this situation, as it is possible to distinguish the two modes of the density.
It is clearly visible that this enables the system to recover the real state if multimodalities arise.
However, in situations with highly uncertain measurements, the estimation error could further increase since the real estimate is not equal to the best estimate, \ie{} the real position of the pedestrian.

The error over time for different estimation methods of the complete walk can be seen in fig.~\ref{fig:realWorldTime}.
It is given by calculating the distance between estimation and ground truth at a specific time $t$.
Estimates provided by simply choosing the maximum particle stand out the most.
As one could have expected beforehand, this method produces many strong peaks through continuous jumping between single particles.
Additionally, in most real-world scenarios many particles share the same weight and thus multiple highest-weighted particles exist.

\begin{figure}

\label{fig:realWorldTime}
\end{figure}

Further investigating fig.~\ref{fig:realWorldTime}, the BoxKDE performs slightly better than the weighted-average.
However, after deploying \num{100} Monte Carlo runs, the difference becomes insignificant.
The main reason for this is again multimodality, caused by faulty or delayed measurements, especially when entering or leaving rooms.
Within our experiments the problem occurred due to slow and attenuated Wi-Fi signals inside thick-walled rooms.
While the system's dynamics are moving the particles outside, the faulty Wi-Fi readings are holding back a majority by assigning corresponding weights.
Therefore, the average between the modes of the distribution is often closer to the ground truth than the real estimate, which is located on the \qq{wrong} mode.
With new measurements coming from the hallway or other parts of the building, the distribution and thus the estimation are able to recover.

Nevertheless, it can be seen that our approach is able to resolve multimodalities even under real-world conditions.
It does not always provide the lowest error, since it depends more on an accurate sensor model than a weighted-average approach, but it is very suitable as a good indicator of the real performance of a sensor fusion system.
In the examples shown here we only searched for a global maximum, even though the BoxKDE approach opens a wide range of other possibilities for finding a best estimate.

%does not jump as much as the maximum
%very similar to the weighted-average. in 1000 mc runs, average and std are very similar.
The term fast Gauss transform was coined by Greengard \cite{greengard1991fast}.
% However, the complexity grows exponentially with dimension. \cite{Improved Fast Gauss Transform and Efficient Kernel Density Estimation}

% FastKDE, based on ECF and nuFFT
Recent methods based on the self-consistent KDE proposed by Bernacchia and Pigolotti \cite{bernacchia2011self} allow an estimate to be obtained without any assumptions, \ie{} the kernel and bandwidth are both derived during the estimation.
They define a Fourier-based filter on the empirical characteristic function of a given dataset.
The computation time was further reduced by \etal{O'Brien} using a non-uniform fast Fourier transform (FFT) algorithm to efficiently transform the data into Fourier space \cite{oBrien2016fast}.

% binning => FFT
In general, it is desirable to compute the estimate directly from the sample set.
However, reducing the sample size by distributing the data on an equidistant grid can significantly reduce the computation time, if an approximative KDE is acceptable.
Silverman \cite{silverman1982algorithm} originally suggested to combine adjacent data points into data bins, which results in a discrete convolution structure of the KDE, allowing the estimate to be computed efficiently using an FFT algorithm.

%As the density estimation poses only a single step in the whole process, its computation needs to be as fast as possible.
% not taking too much time from the frame

Consider a set of two-dimensional samples with associated weights, \eg{} generated by a particle filter system.
The overall process for bivariate data is described in Algorithm~\ref{alg:boxKDE}.

Assuming that the given $N$ samples are stored in a sequential list, the first step is to create a grid representation.

Such knowledge should be integrated into the system to avoid a linear search over the samples.
\Statex

%\For{$1 \textbf{ to } n$}
\Loop{ $n$ \textbf{times}} \Comment{$n$ separated box filter iterations}

\For{$ i=1 \textbf{ to } G_1$}

\end{algorithm}

Given the extreme values of the samples and grid sizes $G_1$ and $G_2$ defined by the user, a $G_1\times G_2$ grid can be constructed, using a binning rule from \eqref{eq:simpleBinning} or \eqref{eq:linearBinning}.
As the number of grid points directly affects both computation time and accuracy, a suitable grid should be as coarse as possible, to produce an estimate sufficiently fast, but at the same time fine enough to keep the approximation error acceptable.

If the extreme values are known in advance, the computation of the grid is $\landau{N}$, otherwise an additional $\landau{N}$ search is required.
The grid is stored as a linear array in memory, thus its space complexity is $\landau{G_1\cdot G_2}$.

Next, the binned data is filtered with a Gaussian using the box filter approximation.
The box filter's width is derived by \eqref{eq:boxidealwidth} from the standard deviation of the approximated Gaussian, which is in turn equal to the bandwidth of the KDE.
However, the bandwidth $h$ needs to be scaled according to the grid size.
This is necessary as $h$ is defined in the input space of the KDE, \ie{} in relation to the sample data.
In contrast, the bandwidth of a BKDE is defined in the context of the binned data, which differs from the unbinned data due to the discretisation of the samples.
For this reason, $h$ needs to be divided by the bin size to account for the discrepancy between the different sampling spaces.
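As a brief illustration of this scaling, assuming the grid spacing $\delta=(b-a)/(G-1)$ from above:
\begin{verbatim}
// Scale the KDE bandwidth h from sample space into grid units: h is
// defined relative to the data, while the filter operates on bins of
// size delta = (b - a) / (G - 1). The scaled value then plays the
// role of sigma when constructing the box filter. Sketch only.
double scaledBandwidth(double h, double a, double b, int G) {
    const double delta = (b - a) / (G - 1);
    return h / delta;
}
\end{verbatim}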

Given the scaled bandwidth, the required box filter's width can be computed. % as in \eqref{label}
Due to its superior runtime performance, the recursive box filter implementation is used.
If multivariate data is processed, the algorithm is easily extended due to its separability.
Each filter pass is computed in $\landau{G}$ operations, however, an additional memory buffer is required \cite{dspGuide1997}.

While the integer-sized box filter requires the fewest operations, it causes a larger approximation error due to rounding.
Depending on the required accuracy, the extended box filter algorithm can further improve the estimation results, with only a small additional overhead \cite{gwosdek2011theoretical}.
Due to its simple indexing scheme, the recursive box filter can easily be computed in parallel using SIMD operations and parallel computation cores.

Finally, the most likely state can be obtained from the filtered data, \ie{} from the estimated discrete density, by searching the filtered data for its maximum value.
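This final step is a linear scan; a C++ sketch mapping the flat index of the maximum back to state-space coordinates (names and grid layout are our assumptions):
\begin{verbatim}
#include <algorithm>
#include <utility>
#include <vector>

// Most likely state: maximum of the filtered row-major G1 x G2 grid,
// mapped back to the state space spanned by [a1,b1] x [a2,b2].
std::pair<double, double>
mostLikelyState(const std::vector<double>& grid, int G1, int G2,
                double a1, double b1, double a2, double b2) {
    const auto it = std::max_element(grid.begin(), grid.end());
    const int idx = static_cast<int>(it - grid.begin());
    const int i = idx / G2, j = idx % G2;       // grid indices
    const double x = a1 + i * (b1 - a1) / (G1 - 1);
    const double y = a2 + j * (b2 - a2) / (G2 - 1);
    return { x, y };
}
\end{verbatim}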