diff --git a/tex/chapters/abstract.tex b/tex/chapters/abstract.tex
index 5c5c6fe..3f741fb 100644
--- a/tex/chapters/abstract.tex
+++ b/tex/chapters/abstract.tex
@@ -1,7 +1,7 @@
\begin{abstract}
It is common practice to use a sample-based representation to solve problems having a probabilistic interpretation. In many real world scenarios one is then interested in finding a \qq{best estimate} of the underlying problem, e.g. the position of a robot.
-This is often done by means of simple parametric point estimator, providing the sample statistics.
+This is often done by means of simple parametric point estimators, providing the sample statistics.
However, in complex scenarios this frequently results in a poor representation, due to multimodal densities and limited sample sizes.
Recovering the probability density function using a kernel density estimation yields a promising approach to solve the state estimation problem, i.e. finding the \qq{real} most probable state, but comes with high computational costs.
diff --git a/tex/chapters/experiments.tex b/tex/chapters/experiments.tex
index 5eafd9b..187282e 100644
--- a/tex/chapters/experiments.tex
+++ b/tex/chapters/experiments.tex
@@ -4,7 +4,7 @@
We now empirically evaluate the accuracy of our boxKDE method, using the mean integrated squared error (MISE).
-The ground truth is given as $N=1000$ synthetic samples drawn from a bivariate mixture normal density $f$
+The ground truth is given by $N=1000$ synthetic samples drawn from a bivariate mixture normal density $f$
\begin{equation}
\begin{split}
\bm{X} \sim & ~\G{\VecTwo{0}{0}}{0.5\bm{I}} + \G{\VecTwo{3}{0}}{\bm{I}} + \G{\VecTwo{0}{3}}{\bm{I}} \\
@@ -21,7 +21,7 @@ Therefore, the particular choice of the ground truth is only of minor importance
\end{figure}
Evaluated at $50^2$ points, the exact KDE is compared to the BKDE, boxKDE, and extended box filter approximations, which are evaluated on a smaller grid with $30^2$ points.
-The MISE between $f$ and the estimates as a function of $h$ are evaluated, and the resulting plot is given in figure~\ref{fig:errorBandwidth}.
+The MISE between $f$ and the estimates is evaluated as a function of $h$, and the resulting plot is given in fig.~\ref{fig:errorBandwidth}.
A minimum error is obtained at $h=0.35$; for larger values, oversmoothing occurs and the modes gradually fuse together.
Both the BKDE and the extended box filter estimate resemble the error curve of the KDE quite well and remain stable.
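For reference, a minimal sketch of how the integrated squared error between the ground truth and an estimate can be approximated on the evaluation grid is given below; the function and parameter names are illustrative and not part of the described implementation. Averaging this quantity over repeated sample draws yields the MISE.
\begin{verbatim}
#include <cstddef>
#include <vector>

// Riemann-sum approximation of the integrated squared error between a
// ground-truth density and an estimate, both evaluated on the same
// equidistant grid. cellArea is the area of one grid cell.
double integratedSquaredError(const std::vector<double>& truth,
                              const std::vector<double>& estimate,
                              double cellArea) {
    double sum = 0.0;
    for (std::size_t i = 0; i < truth.size(); ++i) {
        const double d = truth[i] - estimate[i];
        sum += d * d;
    }
    return sum * cellArea;
}
\end{verbatim}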
@@ -42,7 +42,7 @@ However, both cases do not give a deeper insight of the error behavior of our me
\begin{figure}[t]
%\includegraphics[width=\textwidth,height=6cm]{gfx/tmpPerformance.png}
\input{gfx/perf.tex}
-\caption{Logarithmic plot of the runtime performance with increasing grid size $G$ and bivariate data. The weighted average estimate (blue) performs fastest followed by the boxKDE (orange) approximation. Both the BKDE (red), and the fastKDE (green) are magnitudes slower, especially for $G<10^4$.}\label{fig:performance}
+\caption{Logarithmic plot of the runtime performance with increasing grid size $G$ and bivariate data. The weighted-average estimate (blue) performs fastest, followed by the boxKDE (orange) approximation. Both the BKDE (red) and the fastKDE (green) are orders of magnitude slower, especially for $G<10^3$.}\label{fig:performance}
\end{figure}
% kde, box filter, exbox as a function of h (figure)
@@ -53,18 +53,18 @@ However, both cases do not give a deeper insight of the error behavior of our me
\subsection{Performance}
In the following, we underpin the promising theoretical linear time complexity of our method with empirical time measurements, comparing it to other methods.
All tests are performed on an Intel Core \mbox{i5-7600K} CPU with a frequency of \SI{4.2}{\giga\hertz} and \SI{16}{\giga\byte} of main memory.
-We compare our C++ implementation of the box filter based KDE approximation based on algorithm~\ref{alg:boxKDE} to the \texttt{ks} R package and the fastKDE Python implementation \cite{oBrien2016fast}.
-The \texttt{ks} packages provides a FFT-based BKDE implementation based on optimized C functions at its core.
-With state estimation problems in mind, we additionally provide a C++ implementation of a weighted average estimator.
-As both methods are not using a grid, an equivalent input sample set was used for the weighted average and the fastKDE.
+We compare our C++ implementation of the boxKDE approximation as given in algorithm~\ref{alg:boxKDE} to the \texttt{ks} R package and the fastKDE Python implementation \cite{oBrien2016fast}.
+The \texttt{ks} package provides an FFT-based BKDE implementation with optimized C functions at its core.
+With state estimation problems in mind, we additionally provide a C++ implementation of a weighted-average estimator.
+As neither method uses a grid, an equivalent input sample set was used for the weighted-average estimator and the fastKDE.

-The results for performance comparison are presented in plot \ref{fig:performance}.
+The results of the performance comparison are presented in fig.~\ref{fig:performance}.
% O(N) clearly visible for box KDE and weighted average
The linear complexity of the boxKDE and the weighted average is clearly visible.
% Especially for small G up to 10^3 the box KDE is faster than R and fastKDE, but the WA is clearly faster than all others
Especially for small $G$ up to $10^3$ the boxKDE is much faster than the BKDE and fastKDE.
% With increasingly large G the gap between box KDE and WA grows.
-Nevertheless, the simple weighted average approach performs the fastest and with increasing $G$ the distance to the boxKDE grows constantly.
+Nevertheless, the simple weighted-average approach performs the fastest, and with increasing $G$ the gap to the boxKDE grows steadily.
However, it is obvious that this comes with major disadvantages, such as being prone to multimodalities, as discussed in section \ref{sec:intro}.
% (This could also be because the binning becomes slower with larger G, which I cannot explain! Maybe cache effects)
@@ -74,7 +74,7 @@ Further looking at fig. \ref{fig:performance}, the runtime performance of the BK
% This comes from the FFT. The input for the FFT must always be rounded up to the next power of two.
This behavior is caused by the underlying FFT algorithm.
% Hence the runtime slows down abruptly whenever the input is padded to a new power of two, otherwise it stays constant.
-The FFT approach requires the input to be always rounded up to a power of two, what then causes a constant runtime behavior within those boundaries and a strong performance deterioration at corresponding manifolds.
+The FFT approach requires the input to always be padded up to the next power of two, which causes a constant runtime within those boundaries and a sharp performance drop whenever a new power of two is reached.
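This step-like behavior can be illustrated by the padding rule alone; the following minimal sketch (an illustrative helper, not part of the \texttt{ks} package) computes the length the FFT input is padded to.
\begin{verbatim}
#include <cstdint>

// Smallest power of two >= n: the length the FFT input is padded to.
// All grid sizes between two consecutive powers of two share the same
// FFT cost, so the runtime jumps only when a power of two is crossed.
std::uint64_t nextPowerOfTwo(std::uint64_t n) {
    std::uint64_t p = 1;
    while (p < n) p <<= 1;
    return p;
}
\end{verbatim}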
% The BKDE graph ends at G=4406^2 because for larger G an out-of-memory error is triggered.
The termination of the BKDE graph at $G=4406^2$ is caused by an out-of-memory error in the \texttt{ks} package for even larger $G$.
@@ -85,10 +85,10 @@ Both discussed Gaussian filter approximations, namely box filter and extended bo
While the average runtime over all values of $G$ for the standard box filter is \SI{0.4092}{\second}, the extended one provides an average of \SI{0.4169}{\second}.
To keep the arrangement of fig.~\ref{fig:performance} clear, we only illustrate the results of the boxKDE with the regular box filter.

-The weighted average has the great advantage of being independent of the dimensionality of the input and effortlessly implemented.
+The weighted-average estimator has the great advantage of being independent of the dimensionality of the input and can be implemented effortlessly.
In contrast, the computational cost of the boxKDE approach increases exponentially with the number of dimensions.
However, due to the linear time complexity and the very simple computation scheme, the overall computation time is still sufficiently fast for many applications and much lower than that of other methods.
-The boxKDE approach presents a reasonable alternative to the weighted average and is easily integrated into existing systems.
+The boxKDE approach presents a reasonable alternative to the weighted-average estimator and is easily integrated into existing systems.
In addition, modern CPUs benefit from the recursive computation scheme of the box filter, as the data exhibits a high degree of spatial locality in memory and the accesses are reliably predictable.
Furthermore, the computation is easily parallelized, as there is no data dependency between the one-dimensional filter passes in algorithm~\ref{alg:boxKDE}.
diff --git a/tex/chapters/introduction.tex b/tex/chapters/introduction.tex
index 181b574..f915131 100644
--- a/tex/chapters/introduction.tex
+++ b/tex/chapters/introduction.tex
@@ -4,8 +4,7 @@
Sensor fusion approaches are often based upon probabilistic descriptions like particle filters, using samples to represent the distribution of a dynamical system.
To update the system recursively in time, probabilistic sensor models process the noisy measurements and a state transition function provides the system's dynamics.
Therefore, a sample or particle is a representation of one possible system state, e.g. the position of a pedestrian within a building.
-In most real world scenarios one is then interested in finding the most probable state within the state space, to provide the \qq{best estimate} of the underlying problem.
-Generally speaking, solving the state estimation problem.
+In most real world scenarios one is then interested in finding the most probable state within the state space, to provide the best estimate of the underlying problem; generally speaking, this means solving the state estimation problem.
In the discrete manner of a sample representation this is often done by providing a single value, also known as a sample statistic, to serve as a \qq{best guess}.
This value is then calculated by means of simple parametric point estimators, e.g. the weighted-average over all samples, the sample with the highest weight, or by assuming other parametric statistics like normal distributions \cite{Fetzer2016OMC}.
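As a concrete illustration of such a point estimator, a minimal sketch of a weighted-average estimate over a particle set follows; the types and names are illustrative and not taken from the cited implementations.
\begin{verbatim}
#include <vector>

struct Particle {
    double x, y;    // state, e.g. a 2-D position
    double weight;  // importance weight
};

// Weighted-average point estimate over all particles.
// A minimal sketch; a real system would also guard against W == 0.
Particle weightedAverage(const std::vector<Particle>& particles) {
    double W = 0.0, x = 0.0, y = 0.0;
    for (const Particle& p : particles) {
        W += p.weight;
        x += p.weight * p.x;
        y += p.weight * p.y;
    }
    return {x / W, y / W, W};
}
\end{verbatim}
For a bimodal posterior this estimate can fall between the modes, which is exactly the drawback discussed next.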
%there have to be other methods... darn it... but fundamentally a weighted average is a point estimator, right? (https://www.statlect.com/fundamentals-of-statistics/point-estimation)
@@ -17,9 +16,9 @@ As a result, those techniques are not able to provide an accurate statement abou
For example, in a localization scenario where a bimodal distribution represents the current posterior, a reliable position estimate is more likely to be at one of the modes, instead of somewhere in-between, as provided by a simple weighted-average estimation.
Additionally, in most practical scenarios the sample size and therefore the resolution are limited, causing the variance of the sample based estimate to be high \cite{Verma2003}.
-It is obvious, that a computation of the full posterior could solve the above, but finding such an analytical solution is an intractable problem, what is the reason for applying a sample representation in the first place.
+It is obvious that a computation of the full posterior could solve the above, but finding such an analytical solution is an intractable problem, which is the reason for applying a sample representation in the first place.
Another promising way is to recover the probability density function from the sample set itself, by using a non-parametric estimator such as the kernel density estimator (KDE).
-With this, it is easy to find the \qq{real} most probable state and thus to avoid the aforementioned drawbacks.
+With this, it is easy to recover the \qq{real} most probable state and thus to avoid the aforementioned drawbacks.
However, non-parametric estimators tend to consume a large amount of computational time, which renders them impractical for real-time scenarios.
Nevertheless, the availability of a fast density estimation method might improve the accuracy of today's sensor fusion systems without sacrificing their real-time capability.
@@ -34,7 +33,7 @@ By the central limit theorem, multiple recursion of a box filter yields an appro
This process converges quite fast to a reasonably close approximation of the ideal Gaussian.
In addition, a box filter can be computed extremely fast by a computer, due to its intrinsic simplicity.
While the idea to use several box filter passes to approximate a Gaussian has been around for a long time, the application to obtain a fast KDE is new.
-Especially in time critical and time sequential sensor fusion scenarios, the here presented approach outperforms other state of the art solutions, due to a fully linear complexity \landau{N} and a negligible overhead, even for small sample sets.
+Especially in time-critical and time-sequential sensor fusion scenarios, the approach presented here outperforms other state-of-the-art solutions, due to its fully linear complexity and negligible overhead, even for small sample sets.
In addition, it requires only a few elementary operations and is highly parallelizable.
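To make the core idea tangible, the following minimal sketch shows a single box filter pass over a one-dimensional signal together with its repeated application; it illustrates the principle only, not the boxKDE procedure of algorithm~\ref{alg:boxKDE}, and it assumes zero values outside the signal.
\begin{verbatim}
#include <cstddef>
#include <vector>

// One pass of a centered box filter of odd width w over a 1-D signal.
// The running sum makes each pass O(n), independent of the width w.
std::vector<double> boxPass(const std::vector<double>& in, long w) {
    const long n = static_cast<long>(in.size());
    const long r = w / 2;
    std::vector<double> out(in.size(), 0.0);
    double sum = 0.0;
    for (long i = 0; i <= r && i < n; ++i) sum += in[i]; // window at i = 0
    for (long i = 0; i < n; ++i) {
        out[i] = sum / w;                        // normalize to unit gain
        if (i + r + 1 < n) sum += in[i + r + 1]; // sample entering the window
        if (i - r >= 0)    sum -= in[i - r];     // sample leaving the window
    }
    return out;
}

// By the central limit theorem, iterating the pass approaches a Gaussian
// filter; a small number of passes already gives a close approximation.
std::vector<double> approxGaussian(std::vector<double> s, long w, int passes) {
    for (int k = 0; k < passes; ++k) s = boxPass(s, w);
    return s;
}
\end{verbatim}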
diff --git a/tex/chapters/kde.tex b/tex/chapters/kde.tex
index 5e49921..00f035b 100644
--- a/tex/chapters/kde.tex
+++ b/tex/chapters/kde.tex
@@ -1,4 +1,4 @@
-\section{Kernel Density Estimation}
+\section{Kernel Density Estimator}
% KDE by rosenblatt and parzen
% general KDE
% Gauss Kernel
@@ -11,17 +11,17 @@
%The histogram is a simple and for a long time the most used non-parametric estimator.
%However, its inability to produce a continuous estimate dismisses it for many applications where a smooth distribution is assumed.
%In contrast,
-The KDE is often the preferred tool to estimate a density function from discrete data samples because of its ability to produce a continuous estimate and its flexibility.
+The KDE is often the preferred tool to estimate a density function from discrete data samples because of its flexibility and ability to produce a continuous estimate.
%
Let $X=\{X_1, \dots, X_N\}$ be a univariate random sample drawn from the density function $f$, and let $w_1, \dots, w_N$ be associated weights.
The kernel estimator $\hat{f}$, which estimates $f$ at the point $x$, is given as
\begin{equation}
\label{eq:kde}
-\hat{f}(x) = \frac{1}{W} \sum_{i=1}^{N} \frac{w_i}{h} K \left(\frac{x-X_i}{h}\right)
+\hat{f}(x) = \frac{1}{W} \sum_{i=1}^{N} \frac{w_i}{h} K \left(\frac{x-X_i}{h}\right) \text{,}
\end{equation}
where $W=\sum_{i=1}^{N}w_i$ and $h\in\R^+$ is an arbitrary smoothing parameter called bandwidth.
$K$ is a kernel function such that $\int K(u) \dop{u} = 1$.
-In general any kernel can be used, however the general advice is to chose a symmetric and low-order polynomial kernel.
+In general, any kernel can be used; however, common advice is to choose a symmetric, low-order polynomial kernel.
Several popular kernel functions are used in practice, such as the Uniform, Gaussian, Epanechnikov, or Silverman kernel \cite{scott2015}.
While the kernel estimate inherits all the properties of the kernel, it is usually not of crucial importance if a non-optimal kernel is chosen.
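For illustration, a direct evaluation of \eqref{eq:kde} with the Gaussian kernel can be sketched as follows; this naive form costs $\landau{NM}$ for $M$ evaluation points and serves only as a reference, not as the proposed method.
\begin{verbatim}
#include <cmath>
#include <cstddef>
#include <vector>

// Naive weighted KDE of eq. (eq:kde) with a Gaussian kernel,
// evaluated at a single point x in O(N).
double kdeAt(double x, const std::vector<double>& samples,
             const std::vector<double>& weights, double h) {
    const double SQRT_2PI = 2.5066282746310002; // sqrt(2*pi)
    double W = 0.0, sum = 0.0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const double u = (x - samples[i]) / h;
        sum += weights[i] / h * std::exp(-0.5 * u * u) / SQRT_2PI;
        W += weights[i];
    }
    return sum / W;
}
\end{verbatim}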
@@ -51,25 +51,25 @@
K_G(u)=\frac{1}{\sqrt{2\pi}} \expp{- \frac{u^2}{2} } \text{.}
\end{equation}
The flexibility of the KDE comes at the expense of computational efficiency, which led to the development of more efficient computation schemes.
-The computation time depends, besides the number of calculated points, on the number of data points $N$.
+Besides the number of evaluation points $M$, the computation time depends on the input size, namely the number of data points $N$.
In general, reducing the size of the sample negatively affects the accuracy of the estimate.
-Still, the sample size is a suitable parameter to speedup the computation.
+Still, the sample size is a suitable parameter to speed up the computation.
The BKDE approximates the KDE by combining each sample with its adjacent samples into bins.
-Each bin represents the count of the sample set at a given point of a equidistant grid with spacing $\delta$.
-A binning rule distributes a sample $x$ among the grid points $g_j=j\delta$, indexed by $j\in\Z$.
+Each bin represents the count of the sample set at a given point of an equidistant grid with spacing $\delta$.
+A binning rule distributes a sample among the grid points $g_j=j\delta$, indexed by $j\in\Z$.
% and can be represented as a set of functions $\{ w_j(x,\delta), j\in\Z \}$.
Computation requires a finite grid on the interval $[a,b]$ containing the data, thus the number of grid points is $G=(b-a)/\delta+1$.
Given a binning rule $r_j$, the BKDE $\tilde{f}$ of a density $f$ computed pointwise at the grid point $g_x$ is given as
\begin{equation}
\label{eq:binKde}
-\tilde{f}(g_x) = \frac{1}{W} \sum_{j=1}^{G} \frac{C_j}{h} K \left(\frac{g_x-g_j}{h}\right)
+\tilde{f}(g_x) = \frac{1}{W} \sum_{j=1}^{G} \frac{C_j}{h} K \left(\frac{g_x-g_j}{h}\right) \text{,}
\end{equation}
where $G$ is the number of grid points and
\begin{equation}
\label{eq:gridCnts}
- C_j=\sum_{i=1}^{n} r_j(x_i,\delta)
+ C_j=\sum_{i=1}^{N} r_j(x_i,\delta)
\end{equation}
is the count at grid point $g_j$, such that $\sum_{j=1}^{G} C_j = W$ \cite{hall1996accuracy}.
@@ -83,7 +83,7 @@ However, for many applications it is recommend to use the simple binning rule
0 & \text{else}
\end{cases}
\end{align}
-or the common linear binning rule which divides the sample into two fractional weights shared by the nearest grid points
+or the common linear binning rule, which divides the sample into two fractional weights shared by the nearest grid points
\begin{align}
\label{eq:linearBinning}
r_j(x,\delta) &=
@@ -94,32 +94,32 @@ or the common linear binning rule which divides the sample into two fractional w
\end{align}
An advantage of these rules is that their impact on the approximation error has been extensively investigated and is well understood \cite{hall1996accuracy}.
Both methods can be computed with a fast $\landau{N}$ algorithm, as simple binning is essentially the quotient of an integer division, and the fractional weights of the linear binning are given by the remainder of the division.
-As linear binning is more precise it is often preferred over simple binning \cite{fan1994fast}.
+As linear binning is more precise, it is often preferred over simple binning \cite{fan1994fast}.
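As an illustration, linear binning can be sketched as follows for samples within $[a,b]$; the remainder of the division directly yields the fractional weights (the names are illustrative, not from the described implementation).
\begin{verbatim}
#include <cstddef>
#include <vector>

// Linear binning: each sample splits its weight between the two
// nearest grid points g_j = a + j*delta. Samples are assumed to lie
// in [a, b], so pos is in [0, G-1].
std::vector<double> linearBinning(const std::vector<double>& samples,
                                  const std::vector<double>& weights,
                                  double a, double delta, std::size_t G) {
    std::vector<double> counts(G, 0.0);
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const double pos = (samples[i] - a) / delta;      // fractional index
        const std::size_t j = static_cast<std::size_t>(pos);
        const double frac = pos - static_cast<double>(j); // remainder
        counts[j] += (1.0 - frac) * weights[i];           // lower grid point
        if (j + 1 < G) counts[j + 1] += frac * weights[i]; // upper grid point
    }
    return counts;
}
\end{verbatim}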
-While linear binning improves the accuracy of the estimate the choice of the grid size is of more importance.
+While linear binning improves the accuracy of the estimate, the choice of the grid size is of greater importance.
The number of grid points $G$ determines the trade-off between the approximation error caused by the binning and the computational speed of the algorithm.
-Clearly, a large value of $G$ produces a estimate close to the regular KDE, but requires more evaluations of the kernel compared to a coarser grid.
+Clearly, a large value of $G$ produces an estimate close to the regular KDE, but requires more evaluations of the kernel compared to a coarser grid.
However, it is unknown which particular $G$ gives the best trade-off for any given sample set.
In general, there is no definite answer, because the amount of binning depends on the structure of the unknown density and the sample size \cite{hall1996accuracy}.
A naive implementation of \eqref{eq:binKde} reduces the number of kernel evaluations to $\landau{G^2}$, assuming that $G<N$.
% => Fourier transform
-Kernel density estimation is well known non-parametric estimator, originally described independently by Rosenblatt \cite{rosenblatt1956remarks} and Parzen \cite{parzen1962estimation}.
+The kernel density estimator is a well-known non-parametric estimator, originally described independently by Rosenblatt \cite{rosenblatt1956remarks} and Parzen \cite{parzen1962estimation}.
It has been the subject of extensive research and its theoretical properties are well understood.
A comprehensive reference is given by Scott \cite{scott2015}.
Although classified as non-parametric, the KDE depends on two free parameters, the kernel function and its bandwidth.
@@ -24,7 +24,7 @@ Various methods have been proposed, which can be clustered based on different te
% k-nearest neighbor searching
An obvious way to speed up the computation is to reduce the number of evaluated kernel functions.
-One possible optimization is based on k-nearest neighbour search performed on spatial data structures.
+One possible optimization is based on k-nearest neighbour search, performed on spatial data structures.
These algorithms reduce the number of evaluated kernels by taking the distance between clusters of data points into account \cite{gray2003nonparametric}.
% fast multipole method & Fast Gauss Transform
@@ -38,16 +38,16 @@ They define a Fourier-based filter on the empirical characteristic function of a
The computation time was further reduced by \etal{O'Brien} using a non-uniform fast Fourier transform (FFT) algorithm to efficiently transform the data into Fourier space \cite{oBrien2016fast}.
% binning => FFT
-In general, it is desirable to omit a grid, as the data points do not necessary fall onto equally spaced points.
+In general, it is desirable to omit a grid, as the data points do not necessarily fall onto equally spaced points.
-However, reducing the sample size by distributing the data on a equidistant grid can significantly reduce the computation time, if an approximative KDE is acceptable.
+However, reducing the sample size by distributing the data on an equidistant grid can significantly reduce the computation time, if an approximative KDE is acceptable.
Silverman \cite{silverman1982algorithm} originally suggested to combine adjacent data points into data bins, which results in a discrete convolution structure of the KDE, allowing the estimate to be computed efficiently using an FFT algorithm.
This approximation scheme was later called binned KDE (BKDE) and was extensively studied \cite{fan1994fast} \cite{wand1994fast} \cite{hall1996accuracy}.
-While the FFT algorithm poses an efficient algorithm for large sample sets, it adds an noticeable overhead for smaller ones.
+While the FFT algorithm constitutes an efficient method for large sample sets, it adds a noticeable overhead for smaller ones.
The idea to approximate a Gaussian filter using several box filters was first formulated by Wells \cite{wells1986efficient}.
Kovesi \cite{kovesi2010fast} suggested to use two box filters with different widths to increase accuracy while maintaining the same complexity.
-To eliminate the approximation error completely \etal{Gwosdek} \cite{gwosdek2011theoretical} proposed a new approach called extended box filter.
+To eliminate the approximation error completely, \etal{Gwosdek} \cite{gwosdek2011theoretical} proposed a new approach called the extended box filter.
This work highlights the discrete convolution structure of the BKDE and elaborates its connection to digital signal processing, especially the Gaussian filter.
Accordingly, this results in an equivalence relation between the BKDE and the Gaussian filter.
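To make this equivalence concrete, the following minimal sketch evaluates the BKDE of \eqref{eq:binKde} as a discrete convolution of the bin counts with a sampled Gaussian kernel; it illustrates the relation only and is not the proposed boxKDE algorithm, which replaces the sampled Gaussian with iterated box filter passes.
\begin{verbatim}
#include <cmath>
#include <cstddef>
#include <vector>

// BKDE of eq. (eq:binKde) as a discrete convolution: the bin counts C_j
// are convolved with the Gaussian kernel sampled on the same grid.
// Naive O(G^2) form; iterated box filters turn this into linear time.
std::vector<double> bkdeConvolution(const std::vector<double>& counts,
                                    double delta, double h, double W) {
    const double SQRT_2PI = 2.5066282746310002; // sqrt(2*pi)
    const std::size_t G = counts.size();
    std::vector<double> estimate(G, 0.0);
    for (std::size_t x = 0; x < G; ++x) {
        double sum = 0.0;
        for (std::size_t j = 0; j < G; ++j) {
            const double u = (static_cast<double>(x) -
                              static_cast<double>(j)) * delta / h;
            sum += counts[j] * std::exp(-0.5 * u * u);
        }
        estimate[x] = sum / (W * h * SQRT_2PI);
    }
    return estimate;
}
\end{verbatim}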