diff --git a/tex/chapters/kde.tex b/tex/chapters/kde.tex
index 4eba3fd..d89a942 100644
--- a/tex/chapters/kde.tex
+++ b/tex/chapters/kde.tex
@@ -13,19 +13,19 @@
 %In contrast, The KDE is often the preferred tool to estimate a density function from discrete data samples because of its ability to produce a continuous estimate and its flexibility.
 %
-Given a univariate random sample $X=\{X_1, \dots, X_N\}$, where $X$ has the density function $f$ and let $w_1, \dots w_N$ be associated weights.
+Let $X=\{X_1, \dots, X_n\}$ be a univariate random sample drawn from a density $f$, and let $w_1, \dots, w_n$ be associated weights.
 The kernel estimator $\hat{f}$ which estimates $f$ at the point $x$ is given as
 \begin{equation}
 \label{eq:kde}
-\hat{f}(x) = \frac{1}{W} \sum_{i=1}^{N} \frac{w_i}{h} K \left(\frac{x-X_i}{h}\right)
+\hat{f}(x) = \frac{1}{W} \sum_{i=1}^{n} \frac{w_i}{h} K \left(\frac{x-X_i}{h}\right)
 \end{equation}
-where $W=\sum_{i=1}^{N}w_i$ and $h\in\R^+$ is an arbitrary smoothing parameter called bandwidth.
-$K$ is a kernel function such that $\int K(u) \dop{u} = 1$.
+where $W=\sum_{i=1}^{n}w_i$ and $h\in\R^+$ is an arbitrary smoothing parameter called the bandwidth.
+$K$ is a kernel function such that $\int K(u) \dop{u} = 1$ \cite[138]{scott2015}.
 In principle any kernel can be used; however, the general advice is to choose a symmetric, low-order polynomial kernel.
-Thus, several popular kernel functions are used in practice, like the Uniform, Gaussian, Epanechnikov, or Silverman kernel \cite{scott2015}.
+Thus, several popular kernel functions are used in practice, such as the Uniform, Gaussian, Epanechnikov, or Silverman kernel \cite[152]{scott2015}.
-While the kernel estimate inherits all the properties of the kernel, usually it is not of crucial matter if a non-optimal kernel was chosen.
-As a matter of fact, the quality of the kernel estimate is primarily determined by the smoothing parameter $h$ \cite{scott2015}.
+While the kernel estimate inherits all the properties of the kernel, the choice of a non-optimal kernel is usually not of crucial importance \cite[151f.]{scott2015}.
+In fact, the quality of the kernel estimate is primarily determined by the smoothing parameter $h$ \cite[145]{scott2015}.
 %In theory it is possible to calculate an optimal bandwidth $h^*$ with regard to the asymptotic mean integrated squared error.
 %However, in order to do so the density function to be estimated needs to be known, which is obviously unknown in practice.
 %
@@ -56,12 +56,12 @@ In general, reducing the size of the sample negatively affects the accuracy of t
 Still, the sample size is a suitable parameter to speed up the computation.
 Since each sample is combined with its adjacent samples into bins, the BKDE approximates the KDE.
-Each bin represents the count of the sample set at a given point of a equidistant grid with spacing $\delta$.
+Each bin represents the \qq{count} of the sample set at a given point of an equidistant grid with spacing $\delta$.
 A binning rule distributes a sample $x$ among the grid points $g_j=j\delta$, indexed by $j\in\Z$.
 % and can be represented as a set of functions $\{ w_j(x,\delta), j\in\Z \}$.
 Computation requires a finite grid on the interval $[a,b]$ containing the data; thus the number of grid points is $G=(b-a)/\delta+1$.
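As a quick illustration of the weighted estimator in Equation \eqref{eq:kde} (a minimal sketch, not part of the patched thesis text): the code below assumes a Gaussian kernel and toy sample, weight, and bandwidth values chosen only for the demo.

```python
import math

def gaussian_kernel(u):
    # Standard Gaussian kernel; integrates to one over the real line.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def weighted_kde(x, samples, weights, h):
    # Eq. (kde): f_hat(x) = (1/W) * sum_i (w_i / h) * K((x - X_i) / h),
    # with W the sum of all weights.
    W = sum(weights)
    return sum(w / h * gaussian_kernel((x - xi) / h)
               for xi, w in zip(samples, weights)) / W

# Hypothetical demo data: three samples, the middle one counted twice.
samples = [0.0, 1.0, 2.5]
weights = [1.0, 2.0, 1.0]
density = weighted_kde(1.0, samples, weights, h=0.5)
```

Because $K$ integrates to one and the weights are normalised by $W$, the estimate itself integrates to one, which a crude Riemann sum over a wide grid confirms numerically.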
-Given a binning rule $r_j$ the BKDE $\tilde{f}$ of a density $f$ computed pointwise at the grid point $g_x$ is given as
+Given a binning rule $b_j$, the BKDE $\tilde{f}$ of a density $f$ computed pointwise at the grid point $g_x$ is given as
 \begin{equation}
 \label{eq:binKde}
 \tilde{f}(g_x) = \frac{1}{W} \sum_{j=1}^{G} \frac{C_j}{h} K \left(\frac{g_x-g_j}{h}\right)
@@ -69,7 +69,7 @@ Given a binning rule $r_j$ the BKDE $\tilde{f}$ of a density $f$ computed pointw
 where $G$ is the number of grid points and
 \begin{equation}
 \label{eq:gridCnts}
-	C_j=\sum_{i=1}^{n} r_j(x_i,\delta)
+	C_j=\sum_{i=1}^{n} b_j(x_i,\delta)
 \end{equation}
 is the count at grid point $g_j$, such that $\sum_{j=1}^{G} C_j = W$ \cite{hall1996accuracy}.
@@ -77,7 +77,7 @@ In theory, any function which determines the count at grid points is a valid bin
 However, for many applications it is recommended to use the simple binning rule
 \begin{align}
 \label{eq:simpleBinning}
-	r_j(x,\delta) &=
+	b_j(x_i,\delta) &=
 	\begin{cases}
 		w_i & \text{if } x_i \in ((j-\frac{1}{2})\delta, (j+\frac{1}{2})\delta ] \\
 		0 & \text{else}
@@ -86,7 +86,7 @@ or the common linear binning rule which divides the sample into two fractional weights shared by the nearest grid points
 \begin{align}
 \label{eq:linearBinning}
-	r_j(x,\delta) &=
+	b_j(x_i,\delta) &=
 	\begin{cases}
 		w_i(1-|\delta^{-1}x_i-j|) & \text{if } |\delta^{-1}x_i-j|\le1 \\
 		0 & \text{else.}
diff --git a/tex/chapters/multivariate.tex b/tex/chapters/multivariate.tex
index bdab24c..23447f4 100644
--- a/tex/chapters/multivariate.tex
+++ b/tex/chapters/multivariate.tex
@@ -9,12 +9,12 @@ In order to estimate a multivariate density using KDE or BKDE a multivariate ker
 Multivariate kernel functions can be constructed in various ways; however, a popular way is given by the product kernel.
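The linear binning rule of Equation \eqref{eq:linearBinning} can be sketched as follows (again an illustrative assumption-laden demo, not the thesis implementation): each sample splits its weight between the two nearest grid points, and the grid counts $C_j$ must sum to the total weight $W$. The grid origin `a` below corresponds to the left end of the finite grid on $[a,b]$.

```python
def linear_bin_counts(samples, weights, a, delta, G):
    # Linear binning (eq. linearBinning): sample x_i with weight w_i
    # contributes w_i * (1 - frac) to its left neighbour grid point and
    # w_i * frac to its right neighbour, where frac is the fractional
    # position of x_i between the two grid points g_j = a + j * delta.
    C = [0.0] * G
    for x, w in zip(samples, weights):
        t = (x - a) / delta      # position of x in grid units
        j = int(t)               # index of the left neighbour
        frac = t - j
        C[j] += w * (1.0 - frac)
        if j + 1 < G:
            C[j + 1] += w * frac
    return C

# Hypothetical demo data on the grid 0, 1, 2, 3, 4 (delta = 1, G = 5).
samples = [0.2, 1.7, 3.4]
weights = [1.0, 1.0, 2.0]
C = linear_bin_counts(samples, weights, a=0.0, delta=1.0, G=5)
```

The resulting counts can then be plugged into Equation \eqref{eq:binKde}, where the kernel is evaluated only at the $G$ grid points instead of at all $n$ samples.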
 Such a kernel is constructed by combining several univariate kernels into a product, where each kernel is applied to one dimension with a possibly different bandwidth.
-Given a multivariate random variable $X=(x_1,\dots ,x_d)$ in $d$ dimensions.
-The sample $\bm{X}$ is a $n\times d$ matrix defined as \cite[162]{scott2015}.
-The multivariate KDE $\hat{f}$ which defines the estimate pointwise at $\bm{x}=(x_1, \dots, x_d)^T$ is given as \cite[162]{scott2015}
+Let $\bm{X}=(x_1,\dots ,x_d)$ be a multivariate random variable in $d$ dimensions.
+The sample set $\mathcal{X}$ is an $n\times d$ matrix \cite[162]{scott2015}.
+The multivariate KDE $\hat{f}$, which defines the estimate pointwise at $\bm{u}=(u_1, \dots, u_d)^T$, is given as
 \begin{equation}
 \label{eq:mvKDE}
-	\hat{f}(\bm{x}) = \frac{1}{W} \sum_{i=1}^{n} \frac{w_i}{h_1 \dots h_d} \left[ \prod_{j=1}^{d} K\left( \frac{x_j-x_{ij}}{h_j} \right) \right] \text{.}
+	\hat{f}(\bm{u}) = \frac{1}{W} \sum_{i=1}^{n} \frac{w_i}{h_1 \dots h_d} \left[ \prod_{j=1}^{d} K\left( \frac{u_j-x_{ij}}{h_j} \right) \right] \text{,}
 \end{equation}
 where the bandwidth is given as a vector $\bm{h}=(h_1, \dots, h_d)$.
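The product-kernel estimator of Equation \eqref{eq:mvKDE} translates almost literally into code. The sketch below again assumes a Gaussian univariate kernel and hypothetical demo data; `data` plays the role of the $n\times d$ sample matrix $\mathcal{X}$ and `h` the bandwidth vector $\bm{h}$.

```python
import math

def gaussian_kernel(u):
    # Univariate Gaussian kernel used in every dimension of the product.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def product_kde(u, data, weights, h):
    # Eq. (mvKDE): for each sample row x_i, multiply the univariate
    # kernels across the d dimensions (each with its own bandwidth h_j),
    # then form the weighted, normalised sum over all n samples.
    d = len(u)
    W = sum(weights)
    total = 0.0
    for row, w in zip(data, weights):
        prod = 1.0
        for j in range(d):
            prod *= gaussian_kernel((u[j] - row[j]) / h[j])
        total += w * prod / math.prod(h)
    return total / W

# Hypothetical demo: two 2-d samples with equal weights.
data = [[0.0, 0.0], [1.0, 1.0]]
weights = [1.0, 1.0]
h = [0.5, 0.5]
estimate = product_kde([0.2, 0.2], data, weights, h)
```

With equal weights and this symmetric sample pair, the estimate at $(0.2,0.2)$ equals the estimate at the mirrored point $(0.8,0.8)$, a cheap sanity check on the implementation.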