\section{Experiments}
As explained at the very beginning of this work, we wanted to explore the limits of the localization system presented here.
By applying it to a historic building from the 13th century, we created a challenging scenario, not only because of the various architectural factors, but also because of its function as a museum.
During all experiments, the museum was open to the public, with \SI{10}{} to \SI{50}{} visitors present while recording.
The \SI{2500}{\square\meter} building consists of \SI{6}{} different levels, which are grouped into 4 floors (see fig. \ref{fig:apfingerprint}).
Thus, the ceiling height is not constant over one floor and varies between \SI{2.6}{\meter} and \SI{3.6}{\meter}.
In the middle of the building is an outdoor area, which is only accessible from one side.
While most of the exterior and ground level walls are made of massive stones, the floors above are half-timbered constructions.
Due to different objects like exhibits, cabinets or signs not all positions within the building were walkable.
For the sake of simplicity we did not incorporate such knowledge into the floorplan.
Thus, the floorplan consists only of walls, ceilings, doors, windows and stairs.
It was created using our 3D map editor software based on architectural drawings from the 1980s.
Sensor measurements are recorded using a simple mobile application that implements the standard Android sensor functionalities.
As smartphones we used either a Samsung Note 2, a Google Pixel One or a Motorola Nexus 6.
The computation of the state estimation as well as the \docWIFI{} optimization are done offline using an Intel Core i7-4702HQ CPU with \SI{8}{cores} at \SI{2.2}{GHz} and \SI{16}{GB} of main memory.
However, similar to our previous, award-winning system, the setup is written in C++ and is also able to run completely on commercial smartphones \cite{torres2017smartphone}.
%Sensor measurements are recorded using a simple mobile application that implements the standard Android SensorManager.
The experiments are separated into four sections:
At first, we discuss the performance of the novel transition model and compare it to a grid-based approach.
In section \ref{sec:exp:opti} we have a look at the \docWIFI{} optimization and how the real \docAPshort{} positions differ from the optimized ones.
Next, we conducted several test walks throughout the building to examine the estimation accuracy (in \si{\meter}) of the localization system and discuss the solutions for sample impoverishment presented here.
Finally, the respective estimation methods are discussed in section \ref{sec:eval:est}.
\subsection{Transition}
To assess the performance of our novel transition model presented in section \ref{}, we chose a simple scenario in which a tester walks up and down a staircase three times.
\todo{Our stair-climbing scenario: compare the old and the new movement model.}
\subsection{\docWIFI{} Optimization}
\label{sec:exp:opti}
%how many APs are there in total?
The \docAPshort{} positions as well as the fingerprints used for optimization can be seen in fig. \ref{fig:apfingerprint}.
As described in section \ref{sec:wifi} we used \SI{42}{} WEMOS D1 mini modules to provide a \docWIFI{} infrastructure throughout the building.
The position of every installed beacon was measured using a laser scanner.
This allows a comparison with the optimized \docAPshort{} positions.
Within all Wi-Fi observations, we only consider the beacons that are identified by their known MAC addresses.
Other transmitters like smart TVs or smartphone hotspots are ignored, as they might cause estimation errors.
Fig. \ref{fig:apfingerprint} compares the optimized and the real \docAPshort{} positions for the ground level; therefore, we only illustrate those optimized \docAPshort{}'s that are actually installed there. Red markers result from the global optimization scheme, blue markers from an optimization restricted to the rectangular ground floor.
%how the fingerprints were recorded, how many ...
\todo{From the 2017 journal paper, add the red optimization graphic (fig. 5); that would be quite useful, together with the values "results from the (absolute) difference between model predictions and real-world values for each reference measurement".}
\begin{figure}[bt]
\centering
\includegraphics[width=0.9\textwidth]{gfx/floorplanDummy.png}
\caption{Real \docAPshort{} positions compared to the positions optimized globally and per floor.}
\label{fig:apfingerprint}
\end{figure}
%short description of everything we want to test.
%what the optimization yields; compare with the ground truth and contrast the errors.
%one should see that nothing works without the optimization.
\subsection{Localization Error}
\begin{figure}[ht]
\centering
\includegraphics[width=0.9\textwidth]{gfx/floorplanDummy.png}
\caption{All conducted walks.}
\label{fig:floorplan}
\end{figure}
%
The 4 chosen walking paths can be seen in fig. \ref{fig:floorplan}.
Walk 0 is \SI{152}{\meter} long and took about \SI{2.30}{\minute} to walk.
Walk 1 has a length of \SI{223}{\meter} and walk 2 a length of \SI{231}{\meter}; both required about \SI{6}{\minute} to walk.
Finally, walk 3 is \SI{310}{\meter} long and needs \SI{10}{\minute} to walk.
All walks were carried out by 4 different male testers using either a Samsung Note 2, a Google Pixel One or a Motorola Nexus 6 for recording the measurements.
All in all, we recorded \SI{28}{} distinct measurement series, \SI{7}{} for each walk.
The picked walks intentionally contain erroneous situations in which many of the problems treated above occur.
This allows us to discuss them in detail.
A walk is indicated by a set of numbered markers, fixed to the ground.
Small icons on those markers give the direction of the next marker and in some cases provide instructions to pause walking for a certain time.
The intervals for pausing vary between \SI{10}{\second} to \SI{60}{\second}.
The ground truth is then measured by recording a timestamp while passing a marker.
For this, the tester clicks a button on the smartphone application.
Between two consecutive points, a constant movement speed is assumed.
Thus, the ground truth might not be \SI{100}{\percent} accurate, but it is sufficient for error measurements.
The approximation error is then calculated by comparing the interpolated ground truth position with the current estimation \cite{Fetzer-16}.
An estimation on the wrong floor has a great impact on the location awareness of a pedestrian, but yields only a relatively small error.
Therefore, errors in the $z$-direction are penalized by tripling the $z$-value.
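The ground-truth interpolation and the error metric described above can be condensed into a few lines. The following is a minimal Python sketch; the function names and the array layout are our own illustration, not the actual implementation:

```python
import numpy as np

def interpolate_ground_truth(markers, t):
    """Ground-truth position at time t, assuming constant movement
    speed between two consecutive markers.

    markers: list of (timestamp, x, y, z) tuples, recorded when the
    tester clicked the button while passing a numbered floor marker.
    """
    ts = np.array([m[0] for m in markers], dtype=float)
    ps = np.array([m[1:] for m in markers], dtype=float)
    t = np.clip(t, ts[0], ts[-1])
    i = min(np.searchsorted(ts, t, side="right") - 1, len(ts) - 2)
    alpha = (t - ts[i]) / (ts[i + 1] - ts[i])
    return ps[i] + alpha * (ps[i + 1] - ps[i])

def localization_error(estimate, truth, z_penalty=3.0):
    """Euclidean distance with the z-component tripled, penalizing
    estimates on the wrong floor."""
    d = np.asarray(estimate, dtype=float) - np.asarray(truth, dtype=float)
    d[2] *= z_penalty
    return float(np.linalg.norm(d))
```

A pure floor error of \SI{3}{\meter} thus contributes \SI{9}{\meter} to the reported error, which reflects its impact on location awareness.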
%computation and Monte Carlo runs
For each walk we performed 100 Monte Carlo runs using \SI{5000}{particles} and set $N_{\text{eff}} = 0.85$ as resampling threshold.
Instead of an initial position and heading, all walks start with a uniform distribution (random position and heading) as prior.
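Assuming that $N_{\text{eff}} = 0.85$ denotes a normalized effective-sample-size threshold, the resampling decision can be sketched as below. This is a hypothetical Python illustration; we use systematic (low-variance) resampling as a stand-in for the cumulative scheme described in the text:

```python
import numpy as np

def effective_sample_size(weights):
    """N_eff = 1 / sum(w_i^2) for normalized weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def systematic_resample(rng, weights):
    """Systematic (low-variance) resampling; returns particle indices."""
    n = len(weights)
    w = np.asarray(weights, dtype=float)
    positions = (rng.random() + np.arange(n)) / n
    return np.searchsorted(np.cumsum(w / w.sum()), positions)

def maybe_resample(rng, particles, weights, threshold=0.85):
    """Resample only when the normalized ESS drops below the threshold."""
    n = len(particles)
    if effective_sample_size(weights) / n < threshold:
        idx = systematic_resample(rng, weights)
        return particles[idx], np.full(n, 1.0 / n)
    return particles, weights
```

With uniform weights the normalized ESS equals 1 and no resampling occurs; as the weights degenerate the ESS drops towards $1/n$ and the step triggers.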
The overall localization results can be seen in table \ref{table:overall}.
Here, we distinguish between the respective anti-impoverishment techniques presented in chapter \ref{sec:impo}.
The simple anti-impoverishment method is added to the resampling step and thus uses the transition method presented in chapter \ref{sec:transition}.
In contrast, the $D_\text{KL}$-based method extends the transition and thus uses a standard cumulative resampling step.
We set $l_\text{max} =$ \SI{-75}{dBm} and $l_\text{min} =$ \SI{-90}{dBm}.
For a better overview, we only list the KDE-based estimation, as its errors differ from the weighted-average estimation by only a few centimetres.
\newcommand{\STAB}[1]{\begin{tabular}{@{}c@{}}#1\end{tabular}}
\begin{table}[t]
\centering
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|}
\hline
Method & \multicolumn{3}{c|}{none} & \multicolumn{3}{c|}{simple} & \multicolumn{3}{c|}{$D_\text{KL}$}\\
\hline
& $\bar{x}$ & $\bar{\sigma}$ & $\tilde{x}_{75}$ & $\bar{x}$ & $\bar{\sigma}$ & $\tilde{x}_{75}$ & $\bar{x}$ & $\bar{\sigma}$ & $\tilde{x}_{75}$ \\
\hline \hline
Walk 0 & \SI{1340}{\centi\meter} & \SI{1115}{\centi\meter} & \SI{2265}{\centi\meter} & \SI{715}{\centi\meter} & \SI{660}{\centi\meter} & \SI{939}{\centi\meter} & \SI{576}{\centi\meter} & \SI{494}{\centi\meter} & \SI{734}{\centi\meter} \\ \hline
Walk 1 & \SI{320}{\centi\meter} & \SI{242}{\centi\meter} & \SI{406}{\centi\meter} & \SI{322}{\centi\meter} & \SI{258}{\centi\meter} & \SI{404}{\centi\meter} & \SI{379}{\centi\meter} & \SI{317}{\centi\meter} & \SI{463}{\centi\meter} \\ \hline
Walk 2 & \SI{834}{\centi\meter} & \SI{412}{\centi\meter} & \SI{1092}{\centi\meter} & \SI{356}{\centi\meter} & \SI{232}{\centi\meter} & \SI{486}{\centi\meter} & \SI{362}{\centi\meter} & \SI{234}{\centi\meter} & \SI{484}{\centi\meter} \\ \hline
Walk 3 & \SI{704}{\centi\meter} & \SI{589}{\centi\meter} & \SI{1350}{\centi\meter} & \SI{538}{\centi\meter} & \SI{469}{\centi\meter} & \SI{782}{\centi\meter} & \SI{476}{\centi\meter} & \SI{431}{\centi\meter} & \SI{648}{\centi\meter} \\
\hline
\end{tabular}
\caption{Overall localization results using the different anti-impoverishment methods. The error is given by the mean $\bar{x}$, the standard deviation $\bar{\sigma}$ and the \SI{75}{\percent}-quantile $\tilde{x}_{75}$. Only the KDE-based estimation is listed, since KDE and weighted-average estimation differ by less than \SI{10}{\centi\meter} on average; the latter is omitted for clarity.}
\label{table:overall}
\end{table}
All walks, except for walk 1, suffer in some way from sample impoverishment.
We discuss the single results of table \ref{table:overall} starting with walk 0.
Here, the pedestrians started at the topmost level, walking down to the lowest point of the building.
The first critical situation occurs immediately after the start.
While walking down the small staircase, many particles are getting dragged into the room to the right due to erroneous Wi-Fi readings.
At this point, the activity "walking down" is recognized, however only for a very short period.
This is caused by the short length of the stairs.
After this period, only a small number of particles has changed the floor correctly, while the majority is stuck within the right-hand room.
The activity based evaluation $p(\vec{o}_t \mid \vec{q}_t)_\text{act}$ prevents particles from further walking down the stairs, while the resampling step mainly draws particles in already populated areas.
In \SI{10}{\percent} of the runs using none of the anti-impoverishment methods, the system is unable to recover and thus fails to finish the walk anywhere near the correct position, or even on the correct floor.
The remaining \SI{90}{\percent} of runs suffer from a very high error.
Only by using one of the methods presented here is the system able to recover in \SI{100}{\percent} of the cases.
Fig. \ref{fig:errorOverTimeWalk0} compares the error over time between the different methods for an exemplary run.
The situation described above, which causes the system to get stuck after \SI{10}{\second}, is clearly visible.
Both the simple and the $D_\text{KL}$ method are able to recover early and thus decrease the overall error dramatically.
Between \SI{65}{\second} and \SI{74}{\second} the simple method produces high errors due to some uncertain Wi-Fi measurements coming from an \docAP{} below, causing those particles that are randomly drawn near this \docAPshort{} to be rewarded with a very high weight.
This leads to newly sampled particles in this area and therefore a jump of the estimation.
The situation is resolved after entering another room, which is shielded by stone walls instead of wooden ones.
Walking down the stairs at \SI{80}{\second} also recovers the localization system that uses none of the methods.
%
\begin{figure}
\centering
\input{gfx/errorOverTimeWalk0/errorOverTime.tex}
\caption{Error development over time of a single Monte Carlo run of walk 0. Between \SI{10}{\second} and \SI{24}{\second} the Wi-Fi signal was highly attenuated, causing the system to get stuck and to produce high errors. Both the simple and the $D_\text{KL}$ anti-impoverishment method are able to recover early. However, between \SI{65}{\second} and \SI{74}{\second} the simple method produces high errors due to the high random factor involved.}
\label{fig:errorOverTimeWalk0}
\end{figure}
A behaviour similar to the above can be seen in walk 3.
Without a method to recover from impoverishment, the system lost track in \SI{100}{\percent} of the runs due to an undetected floor change in the last third of the walk.
By using the simple method, the overall error can be reduced and the impoverishment resolved. Nevertheless, unpredictable jumps of the estimation cause the system to be highly uncertain in some situations, even if those jumps do not last too long.
Only the $D_\text{KL}$ method is able to produce reasonable results.
As described in chapter \ref{sec:wifi}, we use a Wi-Fi model optimized for each floor instead of a single global one.
A good example of why we do this can be seen in fig. \ref{fig:wifiopt}, considering a small section of walk 3.
Here, the system using the global Wi-Fi model makes a big jump into the right-hand corridor and requires \SI{5}{\second} to recover.
This happens through a combination of environmental factors, like the many different materials and thus attenuation factors, as well as the limitation of the Wi-Fi model used here, which only considers ceilings and ignores walls.
Consequently, \docAPshort{}'s on the same floor level, which are highly attenuated by \SI{2}{\meter} thick stone walls, are neglected, while \docAPshort{}'s from the floor above, which are only separated by a thin wooden ceiling, have a greater influence within the state evaluation process.
Of course, we optimize the attenuation per floor, but in the end this is just an average value summing up the materials surrounding the \docAPshort{}'s.
Therefore, the calculated signal strength predictions do not fit the measurements received from above in an optimal way.
In contrast, the model optimized for each floor only considers the respective \docAPshort{}'s on that floor, allowing to calculate better fitting parameters.
A major disadvantage of the method is the reduced number of visible \docAPshort{}'s and thus measurements within an area.
This could lead to an underrepresentation of \docAPshort{}'s for triangulation.
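To illustrate the per-floor fitting idea, the following Python sketch fits the two parameters of a plain log-distance path-loss model to reference measurements via least squares. This is only an illustration under simplifying assumptions: the model used in the paper additionally accounts for ceiling attenuation, which is omitted here, and the function names are hypothetical.

```python
import numpy as np

def fit_path_loss(distances, rssi):
    """Least-squares fit of P0 and gamma in the log-distance model
       rssi(d) = P0 - 10 * gamma * log10(d),
    using the reference measurements of a single floor."""
    A = np.column_stack([np.ones(len(distances)),
                         -10.0 * np.log10(np.asarray(distances, float))])
    (p0, gamma), *_ = np.linalg.lstsq(A, np.asarray(rssi, float), rcond=None)
    return float(p0), float(gamma)

def predict_rssi(p0, gamma, distance):
    """Predicted signal strength in dBm at the given distance."""
    return p0 - 10.0 * gamma * np.log10(distance)
```

Fitting this model separately per floor, using only the reference measurements and \docAPshort{}'s of that floor, yields attenuation parameters that match the local materials better than a single global fit.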
\begin{figure}[t!]
\centering
\includegraphics[width=0.9\textwidth]{gfx/wifiOptGlobalFloor/combined_dummy.png}
\caption{A small section of walk 3. Optimizing the system with a global Wi-Fi optimization scheme (blue) causes a big jump and thus high errors. This happens due to highly attenuated Wi-Fi signals and inappropriate Wi-Fi parameters. We compare this to a system optimized for each floor individually (red), which resolves the situation and produces reasonable results.}
\label{fig:wifiopt}
\end{figure}
%walk 1
Looking at the results of table \ref{table:overall} again, it can be seen that the $D_\text{KL}$ method is able to improve the results in three of the four walks.
These walks have in common that they suffer in some way from sample impoverishment or other problems causing the system to get stuck.
The only exception is walk 1.
It was set up to provide a challenging scenario, leading to as many multimodalities as possible.
We intentionally searched for situations with a great chance that the particle set would separate, e.g. crossings providing multiple possible whereabouts, or a straight path blocked and thus separated by objects like movable walls.
Similar to the other walks, we added different pausing intervals of \SI{10}{\second} to \SI{60}{\second}.
This helps to analyse how the particles behave in such situations, especially in this multimodal setting.
Besides uncertain measurements, one of the main sources for multimodalities are restrictive transition models, e.g. no walking through walls.
As shown in section \ref{sec:impo}, the $D_\text{KL}$ method compares the current posterior $p(\mStateVec_{t} \mid \mObsVec_{1:t})$ with the probability grid $\probGrid_{t, \text{wifi}}$ using the Kullback-Leibler divergence and a Wi-Fi quality factor.
Environmental restrictions like walls are not considered while creating $\probGrid_{t, \text{wifi}}$, which is why the grid is not affected by a transition-based multimodal setting.
Given accurate Wi-Fi measurements, it is therefore very likely that $\probGrid_{t, \text{wifi}}$ represents a unimodal distribution, even if the particles got separated by an obstacle or wall.
This leads to a situation, in which posterior and grid differ.
As a result, the radius $r_\text{sub}$ increases and thus the diversity of particles.
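This comparison between posterior and grid can be sketched as follows. The histogramming of the particle set onto the Wi-Fi grid and all function names are our own illustration; the exact mapping from the divergence and the Wi-Fi quality factor to $r_\text{sub}$ is defined in section \ref{sec:impo} and only indicated in a comment here.

```python
import numpy as np

def particles_to_grid(positions, weights, grid_shape, cell_size):
    """Histogram the weighted particle set onto the cells of the
    Wi-Fi probability grid."""
    hist = np.zeros(grid_shape)
    idx = (np.asarray(positions, float) / cell_size).astype(int)
    for (ix, iy), w in zip(idx, weights):
        hist[ix, iy] += w
    return hist / hist.sum()

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) between two discrete distributions on the same grid;
    eps avoids log(0) on empty cells."""
    p = np.asarray(p, float).ravel() + eps
    q = np.asarray(q, float).ravel() + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# r_sub is then derived from the divergence, scaled by the Wi-Fi
# quality factor: the larger the disagreement, the larger the radius.
```

A particle set split by a wall produces a bimodal histogram, while an accurate Wi-Fi grid stays unimodal; the divergence between the two grows, and with it the sampling radius and thus the particle diversity.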
We are able to confirm the above by examining the different scenarios integrated into walk 1.
For this, we compared the error development with the corresponding radius $r_\text{sub}$ over time.
In situations where the errors given by the $D_\text{KL}$ method and the simple method differ the most, $r_\text{sub}$ also increases the most.
Here, the radius grows to a maximum of $r_\text{sub} = $ \SI{8.4}{\meter}, using the same measurement series as in fig. \ref{fig:walk1:kdeovertime}.
In contrast, a real sample impoverishment scenario, as seen in walk 0 (cf. fig. \ref{fig:errorOverTimeWalk0}), has a maximum radius of \SI{19.6}{\meter}.
Nevertheless, such a slightly increased diversity is enough to negatively influence the estimation error of the $D_\text{KL}$ method (cf. walk 1 in table \ref{table:overall}).
Ironically, this is again a type of sample impoverishment, caused by the aforementioned environmental restrictions that do not allow particles inside walls or other unreachable areas.
%%estimation
\subsection{Estimation}
\label{sec:eval:est}
\todo{boxkde 0.2 point2(1,1);}
As mentioned before, the individual estimation methods (cf. chapter \ref{sec:estimation}) only vary by a few centimetres in the overall localization error.
That means they differ mainly in the representation of the estimated locations.
Put more simply, in which way the estimated path is drawn and thus presented to the user.
Regarding the underlying particle set, different shapes of probability distributions need to be considered, especially those with multimodalities.
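The difference between the two estimators can be made concrete with a small self-contained Python sketch. The fixed kernel bandwidth and the function names are our own assumptions; the actual KDE implementation is described in chapter \ref{sec:estimation}.

```python
import numpy as np

def weighted_average_estimate(positions, weights):
    """Weighted mean of the particle positions; smooth, but may lie
    between the modes (e.g. outside the building) when the posterior
    is bimodal."""
    return np.average(np.asarray(positions, float), axis=0,
                      weights=np.asarray(weights, float))

def kde_estimate(positions, weights, bandwidth=0.5):
    """Particle with the highest Gaussian kernel density; picks a
    point inside the dominant mode of a multimodal posterior."""
    pts = np.asarray(positions, float)
    w = np.asarray(weights, float) / np.sum(weights)
    # pairwise squared distances, then density_i = sum_j w_j K(x_i, x_j)
    d2 = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=-1)
    density = (w * np.exp(-d2 / (2.0 * bandwidth ** 2))).sum(axis=1)
    return pts[int(np.argmax(density))]
```

For a bimodal particle set the weighted average lands between the clusters, while the KDE estimate stays inside the heavier one.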
%
\begin{figure}[t]
\centering
\begin{subfigure}{0.48\textwidth}
\resizebox{1\textwidth}{!}{\input{gfx/walk.tex}}
\caption{}
\label{fig:walk1:kde}
\end{subfigure}
\begin{subfigure}{0.50\textwidth}
\resizebox{1\textwidth}{!}{\input{gfx/errorOverTimeWalk1/errorOverTime.tex}}
\caption{}
\label{fig:walk1:kdeovertime}
\end{subfigure}
\caption{(a) Occurring bimodal distribution caused by uncertain measurements in the first \SI{13.4}{\second} of walk 1. After \SI{20.8}{\second}, the distribution becomes unimodal. The weighted-average estimation (blue) provides a high error compared to the ground truth (solid black), while the KDE approach (orange) does not. (b) Error development over time for the complete walk. From \SI{230}{\second} to \SI{290}{\second} the pedestrian was not moving.}
\label{fig:walk1}
\end{figure}
%
The main advantage of a KDE-based estimation is that it provides the "correct" mode of a density, even under a multimodal setting (cf. section \ref{sec:estimation}).
That is why we again have a look at walk 1.
A situation in which the system highly benefits from this is illustrated in fig. \ref{fig:walk1:kde}.
Here, a set of particles splits apart due to uncertain measurements and multiple possible walking directions.
Indicated by the black dotted line, the resulting bimodal posterior reaches its maximum distance between the modes at \SI{13.4}{\second}.
Thus, a weighted average estimation (blue line) results in a position of the pedestrian somewhere outside the building (light green area).
The ground truth is given by the black solid line.
The KDE-based estimation (orange line) is able to provide reasonable results by choosing the "correct" mode of the density.
After \SI{20.8}{\second} the setting becomes unimodal again.
Due to a right turn, the lower red particles walk against a wall and are thus punished with a low weight.
Although situations as displayed in fig. \ref{fig:walk1:kde} frequently occur, the KDE estimation is not able to improve the overall estimation results.
This can be seen in the corresponding error development over time, given by fig. \ref{fig:walk1:kdeovertime}.
Here, the KDE estimation performs slightly better than the weighted average; however, after \SI{100}{} Monte Carlo runs the difference becomes insignificant.
Obviously, the above-mentioned "correct" mode does not always provide the lowest error.
In some situations the weighted-average estimation is closer to the ground truth.
Within our experiments this happened especially when entering or leaving thick-walled rooms, causing slow and attenuated Wi-Fi signals.
While the system's dynamics move the particles outside, the faulty Wi-Fi readings hold back a majority of them by assigning corresponding weights.
Only with new measurements coming from the hallway or other parts of the building are the distribution, and thus the KDE estimation, able to recover.
This leads to the conclusion that a weighted-average approach provides a smoother representation of the estimated locations and thus a higher robustness.
\todo{Add a figure of the complete walk 2 showing the difference between the weighted-average estimation and the KDE estimation, i.e. how it affects the estimated path: one path jumps a lot while the other is smoother. Maybe remove fig. 8 to make room for it.}
In contrast, a KDE-based approach for estimation is able to resolve multimodalities.
It does not always provide the lowest error, since it depends more on an accurate sensor model than a weighted-average approach does, but it is very suitable as an indicator of the real performance of a sensor fusion system.
In the end, in the examples shown here we only searched for a global maximum, even though this approach opens a wide range of other possibilities for finding a best estimate.
\begin{figure}[bt]
\centering
\includegraphics[width=0.9\textwidth]{gfx/estimationPath2/combined_dummy.png}
\caption{Estimation results of walk 2 using the KDE method (orange) and the weighted-average (blue).}
\label{fig:estimationpath2}
\end{figure}