
\section{Experiments}

	\todo{
		everything in the FHWS building [correct size for both buildings!] with a Nexus 6
	}

	Within all \docWIFI{} observations (offline and online) we only consider the \docAP{}s that are permanently installed
	within the building. Temporary and movable transmitters are ignored as they might cause estimation errors.


	model optimized directly for the walked path (i.e. every wifi measurement fitted directly against the ground truth):
	the error becomes smaller, but is still clearly noticeable. this suggests that the model is simply not
	well suited.

	optimization input: all 4 walks including ground truth
	then, for the 4 types [fixed, all same par, each par, each par pos]:
	log probability 50, 75;	meter 50, 75


	% -------------------------------- optimization -------------------------------- %

	\subsection{Model optimization}

		As the signal strength prediction model is the heart of the absolute positioning component
		described in section \ref{sec:system}, we start with the model parameter estimation (see section \ref{sec:optimization}) for
		\mTXP, \mPLE and \mWAF based on a set of reference measurements. We compare the results
		of various optimization strategies against a basic empirical choice of \mTXP = \SI{-40}{\decibel{}m} @ \SI{1}{\meter}
		(defined by the usual \docAPshort{} transmit power for Europe), a path loss exponent $\mPLE \approx $ \num{2.5} and
		$\mWAF \approx$ \SI{-8}{\decibel} per floor/ceiling (made of reinforced concrete)  \todo{cite for these values}.
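
		To give a feeling for these empirical values, the following worked example evaluates the prediction for a single receiver position.
		It is only a sketch: it assumes that the model from section \ref{sec:sigStrengthModel} has the common log-distance form
		with a reference distance of \SI{1}{\meter}. The distance $d$ to the transmitter and the number $k$ of intersected ceilings
		are hypothetical symbols introduced for illustration only.
		For $d = \SI{10}{\meter}$ and $k = 1$ this would yield
		\begin{equation}
			\mTXP - 10\,\mPLE\,\log_{10}\left(\frac{d}{\SI{1}{\meter}}\right) + k\,\mWAF
			\approx -40 - 10 \cdot 2.5 \cdot 1 - 8
			= \SI{-73}{\decibel{}m}
			.
		\end{equation}
		Under these assumptions, a single reinforced concrete ceiling costs about as much signal as roughly doubling the distance.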

		Figure \ref{fig:referenceMeasurements} depicts the locations of the 121 reference measurements used.
		Each location was scanned 30 times ($\approx$ \SI{25}{\second} scan time),
		non-permanent \docAP{}s were removed, the values were grouped per physical transmitter (see section \ref{sec:vap})
		and aggregated to form the average signal strength per transmitter.

		% used reference measurements
		\begin{figure}
			{
				\centering
				\input{gfx/all_fingerprints.tex}
			}
			\caption{
				Locations of the 121 reference measurements.
				The size of each square denotes the number of permanently installed \docAPshort{}s
				that are visible at this location,
				and ranges between 2 and 22 with an average of 9.
			}
			\label{fig:referenceMeasurements}
		\end{figure}

		% visible APs:
		% cnt(121)	min(2.000000)	max(22.000000)	range(20.000000)	med(8.000000)	avg(9.322314)	stdDev(4.386709)

		\begin{figure}[b]
			\centering
			\input{gfx/compare-wifi-in-out.tex}
			\caption{
				Measured signal strengths of a test \docAPshort{} (black dot).
				While the signal diminishes slowly along the corridor (upper rectangle)
				the metallised windows (dashed outline) attenuate the signal by over \SI{30}{\decibel} (lower rectangle).
			}
			\label{fig:wifiIndoorOutdoor}
		\end{figure}

		Figure \ref{fig:wifiIndoorOutdoor} illustrates the issues to be expected by examining the signal strength
		values of the reference measurements for one \docAP{}.
		Even though the transmitter is only \SI{5}{\meter} away from the reference
		measurement (small box), the metallised windows attenuate the signal as much as \SI{50}{\meter}
		of corridor (wide box). The model described in section \ref{sec:sigStrengthModel} will not be able
		to match such situations, as it lacks any obstacle information.
		%
		We will thus look at various optimization strategies and the error between
		the resulting estimation model and our reference measurements:

		{\em\noOptEmpiric{}} uses the same three empirical parameters \mTXP{}, \mPLE{}, \mWAF{} for each \docAPshort{} in combination
		with its position, which is well known from the floorplan.

		{\em\optParamsAllAP{}} is the same as above, except that the three parameters are optimized
		using the reference measurements.

		{\em\optParamsEachAP{}} optimizes the three parameters per \docAP{} instead of using the same
		parameters for all.

		{\em\optParamsPosEachAP{}} does not need any prior knowledge and will optimize all six parameters
		(3D position, \mTXP, \mPLE, \mWAF) based on the reference measurements.

		{\em\optPerFloor{}} and {\em\optPerRegion{}} are just like \optParamsPosEachAP{} except that
		there are several sub-models that are optimized for one floor / region instead of the whole building.

		\todo{figure showing the regions???}

		Figure \ref{fig:wifiModelError} shows the optimization results for all strategies, which are as expected:
		The estimation error decreases as the number of optimized parameters increases.
		However, even with {\em \optPerRegion{}} the maximal error is relatively high due to some locations that do
		not fit the model at all. Looking at the optimization results for \mTXP{}, \mPLE{} and \mWAF{} supports
		this finding. While the median of those values over all optimized transmitters is plausible
		(\SI{-42}{\decibel{}m}, \num{2.4}, \SI{-6.0}{\decibel}), the minimum and maximum values are clearly outside of the physically possible range
		(e.g. a maximum \mTXP{} of \SI{4.3}{\decibel{}m} and a positive \mWAF{} of \SI{5.2}{\decibel}).

		The same holds for the estimated transmitter position when using {\em \optParamsPosEachAP{}}: The median
		distance between estimated and real position is $\sim$\SI{8}{\meter} and the maximum $\sim$\SI{27}{\meter}.
		For \SI{68}{\percent} of all installed transmitters, the estimated floor number matched the real one.

		\begin{figure}
			\input{gfx/wifi_model_error_0_95.tex}
			%\input{gfx/wifi_model_error_95_100.tex}
			\caption{
				Comparison between different optimization strategies by examining the error (in \si{\decibel}) at each reference measurement.
				The higher the number of variable parameters, the better the model resembles real-world conditions.
			}
			\label{fig:wifiModelError}
		\end{figure}

		% statds:
		%TXP:	cnt(34)	min(-67.698959)	max(4.299183)	range(71.998146)	med(-41.961170)	avg(-41.659286)	stdDev(17.742294)
		%EXP:	cnt(34)	min(0.932817)	max(4.699000)	range(3.766183)	med(2.380410)	avg(2.546959)	stdDev(1.074687)
		%WAF:	cnt(34)	min(-27.764957)	max(5.217187)	range(32.982143)	med(-5.921916)	avg(-7.579522)	stdDev(5.840527)
		%Pos:	cnt(34)	min(3.032438)	max(26.767128)	range(23.734690)	med(7.342710)	avg(8.571227)	stdDev(4.801449)

		While {\em \optPerRegion{}} is able to overcome the indoor vs. outdoor issues depicted in
		figure \ref{fig:wifiIndoorOutdoor}, e.g. by using a separate bounding box just for the outdoor area,
		it obviously requires profound prior knowledge when selecting the individual regions for the sub-models.
		%Such issues can only be fixed using more appropriate models that consider walls and other obstacles.

		% this is probably too much
		%\begin{figure}
		%	\centering
		%	\input{gfx/wifiOptApPosDifference.tex}
		%	\caption{too much, isn't it?}
		%\end{figure}


	% -------------------------------- number of fingerprints -------------------------------- %

		As we try to minimize the system's setup time as much as possible, we need to determine
		the number of reference measurements required for the optimization to produce viable model parameters.
		The more parameters the chosen model requires to be optimized, the more measurements are needed.

		While there was almost no difference between using 121 or 30 reference measurements for
		{\em \optParamsAllAP{}} and {\em \optParamsEachAP{}}
		(average error of \SIrange{5.3}{5.4}{\decibel} and \SIrange{4.5}{5.0}{\decibel}, respectively),
		{\em \optPerRegion{}} is highly affected
		(average error of \SIrange{2.0}{6.2}{\decibel}), as it needs at least a certain number of measurements for each
		of its regions for the optimization to converge.

		\begin{figure}[b]
			\input{gfx/wifi_model_error_num_fingerprints_method_5_0_90.tex}
			\input{gfx/wifi_model_error_num_fingerprints_method_5_90_100.tex}
			\caption{%
				Impact of reducing the number of reference measurements during optimization on {\em \optPerRegion{}}.
				The model's cumulative error distribution is determined by comparing its signal strength prediction against all 121 measurements.
				While using only \SI{50}{\percent} of the 121 scans barely has an impact on the error,
				30 measurements (\SI{25}{\percent}) are clearly insufficient.
			}%
			\label{fig:wifiNumFingerprints}%
		\end{figure}

		Figure \ref{fig:wifiNumFingerprints} depicts the impact of reducing the number of reference measurements
		during the optimization process for the {\em \optPerRegion{}} strategy.
		The error is determined by using the (absolute) difference between expected signal strength and
		the optimized model's corresponding prediction for all of the 121 reference measurements.
		%
		Considering only 60 of the 121 scans (\SI{50}{\percent}) slightly increases the model error but still provides good results.
		While using only \SI{25}{\percent} of the reference measurements increases the error rapidly,
		for \SI{75}{\percent} of the 121 considered cases the estimation is still better than using just empirical values without optimization.
		The extremely large outlier depicted in the lower half of figure \ref{fig:wifiNumFingerprints} (red line) relates to one
		sub-model with only one assigned reference measurement, where the optimized result is unable to predict values
		for the rest of the sub-model's region. \todo{is this understandable?}
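
		The error measure used here can be sketched as follows, where $\overline{\mRssi}_{j,i}$ (the average signal strength measured for
		\docAP{} $i$ at reference location $j$) and $\mu_{j,i}$ (the corresponding model prediction) are notation introduced for illustration only:
		\begin{equation}
			e_{j,i} = \left| \overline{\mRssi}_{j,i} - \mu_{j,i} \right|
			.
		\end{equation}
		The cumulative distributions are then presumably formed over all pairs $(j,i)$ for which \docAP{} $i$ was visible at location $j$.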

		Additionally, we examined the impact of skipping reference measurements for difficult locations
		like staircases, which are surrounded by reinforced concrete. While this slightly decreases the
		estimation error for all other positions (hallways, etc.) as expected, the error within the skipped locations
		increases dramatically (see lower half of figure \ref{fig:wifiNumFingerprints}). It is thus highly recommended
		to also perform reference measurements for locations whose signal strength is expected to deviate strongly
		from their surroundings.


		%leaving out fingerprints for model 1
		%	 25%: cnt(1128)	min(0.007439)	max(27.804710)	range(27.797272)	med(4.404236)	avg(5.449720)	stdDev(4.470373)
		%	 50%: cnt(1128)	min(0.006027)	max(27.732193)	range(27.726166)	med(4.367859)	avg(5.437861)	stdDev(4.475426)
		%	 100%: cnt(1128)	min(0.000282)	max(27.705376)	range(27.705093)	med(4.272881)	avg(5.411202)	stdDev(4.493495)
		%	 noStair%: cnt(1128)	min(0.000801)	max(27.209221)	range(27.208420)	med(4.333328)	avg(5.459918)	stdDev(4.459484)

		%leaving out fingerprints for model 2
		%	 25%: cnt(1128)	min(0.000320)	max(29.752560)	range(29.752239)	med(3.837357)	avg(5.027578)	stdDev(4.617191)
		%	 50%: cnt(1128)	min(0.015305)	max(34.152130)	range(34.136826)	med(3.627090)	avg(4.635868)	stdDev(4.135866)
		%	 100%: cnt(1128)	min(0.000488)	max(25.687740)	range(25.687252)	med(3.319756)	avg(4.441193)	stdDev(3.912525)
		%	 noStair%: cnt(1128)	min(0.017693)	max(25.687740)	range(25.670048)	med(3.304321)	avg(4.507620)	stdDev(3.957071)

		%leaving out fingerprints for model 3
		%	 25%: cnt(1128)	min(0.003242)	max(39.470978)	range(39.467735)	med(3.371758)	avg(4.977330)	stdDev(5.213937)
		%	 50%: cnt(1128)	min(0.002808)	max(30.113415)	range(30.110607)	med(2.941238)	avg(4.015042)	stdDev(3.696969)
		%	 100%: cnt(1128)	min(0.000557)	max(16.813850)	range(16.813293)	med(3.056915)	avg(3.813013)	stdDev(3.062580)
		%	 noStair%: cnt(1128)	min(0.002518)	max(30.370636)	range(30.368118)	med(3.016884)	avg(3.983101)	stdDev(3.508327)

		%leaving out fingerprints for model 4
		%	 25%: cnt(1128)	min(0.000000)	max(62.233345)	range(62.233345)	med(2.502831)	avg(5.432897)	stdDev(8.664582)
		%	 50%: cnt(1128)	min(0.000000)	max(56.843803)	range(56.843803)	med(1.543137)	avg(2.937506)	stdDev(4.417061)
		%	 100%: cnt(1128)	min(0.000046)	max(33.175812)	range(33.175766)	med(1.537933)	avg(2.441976)	stdDev(2.793499)
		%	 noStair%: cnt(1128)	min(0.000000)	max(62.233345)	range(62.233345)	med(1.493668)	avg(2.744918)	stdDev(4.428092)

		%leaving out fingerprints for model 5
		%	 25%: cnt(1128)	min(0.000000)	max(62.620842)	range(62.620842)	med(2.140709)	avg(6.257105)	stdDev(11.638572)
		%	 50%: cnt(1128)	min(0.000000)	max(57.371948)	range(57.371948)	med(1.357452)	avg(2.982217)	stdDev(5.877471)
		%	 100%: cnt(1128)	min(0.000000)	max(14.837151)	range(14.837151)	med(1.251358)	avg(1.989277)	stdDev(2.189072)
		%	 noStair%: cnt(1128)	min(0.000000)	max(62.233345)	range(62.233345)	med(1.143669)	avg(2.316189)	stdDev(4.164822)


	% -------------------------------- wifi walk error -------------------------------- %

	\subsection{Location estimation error}

		\todo{transition is bumpy}

		%Using the optimized model setups and the measurements $\mRssiVec$ determined by scanning for nearby \docAPshort{}s,
		%we can directly perform a location estimation by rewriting \refeq{eq:wifiProb}:
		For each of the discussed optimization strategies we can now determine the resulting localization accuracy.
		The position within the building that best fits the signal strength measurements $\mRssiVec$ received by the smartphone
		is the one that maximizes $p(\mPosVec \mid \mRssiVec)$, which can be rewritten using Bayes' rule as:

		\begin{equation}
			p(\mPosVec \mid \mRssiVec) =
				\frac{p(\mRssiVec \mid \mPosVec) p(\mPosVec)}{p(\mRssiVec)}
			\propto p(\mRssiVec \mid \mPosVec),\enskip
			p(\mPosVec) = p(\mRssiVec) = \text{const}
			.
			\label{eq:wifiBayes}
		\end{equation}

		Following \refeq{eq:wifiObs} and \refeq{eq:wifiProb}, the best
		location $\mPosVec^*$ given $\mRssiVec$ is the one that satisfies

		\begin{equation}
			\mPosVec^* = \argmax_{\mPosVec}
			\prod_{\mRssi_{i} \in \mRssiVec{}}
				\mathcal{N}(\mRssi_i \mid \mu_{i,\mPosVec}, \sigma^2)
			\label{eq:bestWiFiPos}
		\end{equation}

		where $\mu_{i,\mPosVec}$ is the signal strength for \docAP{} $i$
		at location $\mPosVec$ as predicted by the model under examination.
		For all comparisons we use a constant uncertainty $\sigma = $\SI{8}{\decibel}.
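
		As $\sigma$ is constant and the logarithm is monotonic, the maximization in \refeq{eq:bestWiFiPos} can equivalently be carried out in the log domain.
		The following reformulation is a sketch added for illustration and only uses the symbols defined above:
		\begin{equation}
			\mPosVec^*
			= \argmax_{\mPosVec} \sum_{\mRssi_{i} \in \mRssiVec{}} \log \mathcal{N}(\mRssi_i \mid \mu_{i,\mPosVec}, \sigma^2)
			= \argmax_{\mPosVec} \left( - \sum_{\mRssi_{i} \in \mRssiVec{}} \left( \mRssi_i - \mu_{i,\mPosVec} \right)^2 \right)
			.
		\end{equation}
		Under these assumptions the best location is the one with the smallest sum of squared differences between the scan and the model prediction.
		Working with sums instead of products also avoids numerical underflow when many \docAPshort{}s contribute.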

		The quality of the estimated location is determined by the Euclidean distance between the estimation
		$\mPosVec^*$ and the pedestrian's ground truth position at the time the scan $\mRssiVec$
		was received.


		To this end, we conducted 10 walks on 5 different paths within our building,
		each of which is defined by connecting marker points at well-known positions
		(see figure \ref{fig:allWalks}).
		Whenever the pedestrian reached such a marker, the current time was recorded.
		Assuming a constant walking speed between adjacent markers, the ground truth for any timestamp can be approximated
		using linear interpolation between them.
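
		Written out, this interpolation can be sketched as follows, where the marker positions $\mPosVec_k$ and the recorded timestamps $t_k$
		are notation introduced here for illustration only. For a scan received at time $t$ with $t_k \le t \le t_{k+1}$, the ground truth is approximated by
		\begin{equation}
			\mPosVec(t) \approx \mPosVec_k + \frac{t - t_k}{t_{k+1} - t_k} \left( \mPosVec_{k+1} - \mPosVec_k \right)
			.
		\end{equation}
		A scan received exactly halfway in time between two markers is thus assigned the midpoint of the connecting line segment.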

		% walked paths
		\begin{figure}[t]
			{
				\centering
				\input{gfx/all_walks.tex}
			}
			\caption{
				Overview of all walked paths.
				Outdoor areas are marked in green.
			}
			\label{fig:allWalks}
		\end{figure}

		\begin{figure}[b]
			\input{gfx/modelPerformance_meter.tex}
			\caption{
				Error between ground truth and estimation using \refeq{eq:bestWiFiPos} depending
				on the underlying signal strength prediction model.
				Extremely high errors within the upper \SIrange{90}{100}{\percent} of the distribution are related to bad \docWIFI{}
				coverage within outdoor areas (see figure \ref{fig:wifiIndoorOutdoor}).
			}
			\label{fig:modelPerformance}
		\end{figure}

		%To estimate the overall performance of the prediction models, we compare the position estimation
		%for each \docWIFI{} measurement within the recorded paths (3756 \docAPshort{} scans in total)
		%against the corresponding ground-truth, which indicates the absolute 3D error in meter.
		The position estimation for each \docWIFI{} measurement within the recorded walks (3756 scans in total)
		is compared against its corresponding ground truth, yielding the 3D error.
		The resulting cumulative error distribution can be seen in figure \ref{fig:modelPerformance}.
		The quality of the location estimation directly scales with the quality of the signal strength prediction model.
		However, as discussed earlier, the maximal estimation error might increase for some setups.
		%
		This is either due to multimodalities, where more than one area is plausible based on the most recent
		\docWIFI{} observation, or due to overfitting during optimization, where the average signal
		strength prediction error is small, but the maximum error is dramatically increased for some regions.


	% -------------------------------- plots indicating walk issues -------------------------------- %

		\begin{figure}[t]
			\input{gfx/wifiMultimodality.tex}
			\caption{
				Location probability \refeq{eq:bestWiFiPos} for three scans. Higher color intensities denote higher probabilities.
				Ideally, places near the ground truth (black) are highly probable (green).
				Often, other locations are just as likely as the ground truth (blue),
				or the location with the highest probability does not match at all (red).
			}
			\label{fig:wifiMultimodality}
		\end{figure}

		Figure \ref{fig:wifiMultimodality} depicts the aforementioned issues of multimodal (blue) or wrong (red) location
		estimations. Filtering (\refeq{eq:recursiveDensity}) is thus highly recommended, as minor errors are compensated
		by other sensors and/or a movement model that prevents the estimation from leaping within the building.
		However, if wrong sensor values (red) are observed over longer time periods, even filtering will produce erroneous
		results and might get stuck (the density is trapped, e.g., within a room),
		as the movement model is constrained by the actual floorplan.


	% -------------------------------- other distributions, unseen APs, etc -------------------------------- %

		To reduce the number of such misclassifications, where other locations within the building are
		as likely as the pedestrian's actual location, we examined various approaches.
		Unfortunately, none of them provided a viable enhancement under all conditions for the performed walks.

		The misclassification rate is determined by counting the number of (random) locations within
		the building that produce a similar probability \refeq{eq:wifiProb} as the actual ground truth
		position.

		One way to resolve such an equal \docWIFI{} likelihood between two (or more) locations is
		to consider not only the \docAPshort{}s seen by the smartphone, but also the \docAPshort{}s not seen
		by the smartphone. This additional information can be used to rule out all locations where such an
		unseen \docAP{} should have been received (high signal strength according to the prediction model).
		% There might be an \docAP{} that should be visible at the other locations. However,
		%as the Smartphone did not see this \docAPshort{} the other location can be ruled out.
		While this works in theory, evaluations revealed several issues:

		There is a chance that even a nearby \docAPshort{} is not seen during a scan due to packet collisions or
		temporary effects within the surroundings. It thus might make sense to rule out other locations
		only if at least two \docAPshort{}s are missing. On the other hand, this obviously requires (at least)
		two \docAPshort{}s to actually differ between the two locations, and thus a large number of permanently
		installed transmitters.

		Furthermore, this requires the signal strength prediction model to be fairly accurate. Within our test
		walks, several places are surrounded by concrete walls, which cause a sharp, local drop in signal strength.
		The models used within this work will not accurately predict the signal strength for such locations.
		%%Including \docAPshort{}s unseen by the Smartphone thus often increases the estimation error instead
		%%of fixing the multimodality.

		To sum up, while some situations, e.g. outdoors, could be improved greatly,
		many other situations deteriorate, especially when some transmitters are (temporarily)
		attenuated by ambient conditions like concrete walls.


		We therefore also examined variations of the probability calculation from \refeq{eq:wifiProb}.
		Removing the strongest/weakest \docAPshort{} from $\mRssiVec{}$ yielded similar results.
		While some estimations were improved, the overall estimation error increased for our walks,
		as there are many situations where only a handful of \docAP{}s can be seen. Removing (valid)
		information greatly increases the error in such situations.

		Using a stricter exponential distribution for the model vs. scan comparison in \refeq{eq:wifiProb}
		had a positive effect on the misclassification error for some of the walks, but slightly increased
		the estimation error (see figure \ref{fig:normalVsExponential}) and thus produced negative side effects.

		\begin{figure}
			\input{gfx/wifiCompare_normalVsExp_cross.tex}
			\input{gfx/wifiCompare_normalVsExp_meter.tex}
			\caption{
				Comparison between the normal (black) and the exponential (red) distribution for \refeq{eq:wifiProb}.
				While misclassifications are slightly reduced (upper chart),
				the median error between ground-truth and estimation (lower chart) increases by
				about \SI{1}{\meter}.
			}
			\label{fig:normalVsExponential}
		\end{figure}


	\todo{
		mention??? adapting sigma depending on the signal strength unfortunately does not help either. but if one does so,
		then: a large sigma for high signal strengths! the other way round backfires!
	}


	% -------------------------------- final system -------------------------------- %

	% REAL WALKS
		\todo{although the adapted model runs quite well and the error becomes rather small, there are still places
		where it simply does not fit well, where unfavorable ambiguities exist, or where regions simply do not match as they should.
		this is partly because the fingerprints were recorded while rotating, whereas during walking the signal from behind is
		shielded by the human body. temporal delay can also be a problem.}

		\todo{GPS is unfortunately hardly any help. either no reception due to roofing or shadowing, or
		too briefly outside to obtain a good GPS fix.}

		\todo{
			walk1 has an issue shortly before entering through the door to the lecture hall building.
			depending on the resampling, this wifi error may kill all particles!
		}

		\todo{
			the bbox model has problems at the transitions between bboxes, as there are sometimes strong jumps
			that do not always exist like this in reality. e.g. z-changes sometimes cause problems.
			a continuous model, or interpolation in the border regions, would help here
		}

		\todo{
			if an AP was NOT seen at a location during fingerprinting,
			this is also information for the model optimization.. the model then certainly must not yield a signal strength > -90 at that location
		}

		\todo{gps does not warm up quickly enough and thus fails as an aid in the courtyard}


without the grid model, the outdoor part would perform really poorly,
because the wifi is absolutely inaccurate there.. but since the particles are already quite
close together due to the preceding walk, this patches things up very well.
this can be tested by e.g. resampling less and keeping more old particles:
it breaks immediately as soon as one leaves the building

limiting the signal strength, e.g. clipping everything in the model or scan that is < -90 to -90, helps
at some places, but by and large it leads to errors rather than improvements.
moreover, this number can be expected to depend strongly on the device/hardware

omitting the lowest wifi probability during weighting [thus a different AP depending on the particle]
does not always help either.. it occasionally kills floor changes. moreover, only very few
APs are available in the end. ignoring one of them breaks even more

an attempt like discarding all APs from the phone scan that are below -90 carries the same risks.
it really seems most sensible to simply take the scan data 1:1 as they are


shortly before the end of path 2 the estimation refuses to enter the cafeteria, because a few particles
walk up the stairs towards h.1.5 and receive a very high weight from the wifi.
the mean-based estimation fails here


% what is this??
%\input{gfx/wifi-opt-error-hist-methods.tex}
%\input{gfx/wifi-opt-error-hist-stair-outdoor.tex}
%overall, outdoor does not have much influence, as most APs
%are hardly seen at the outdoor points. for individual APs, however,
%the influence can be quite large, see the fingerprint plot of
%the one selected AP


\todo{an initially wrong heading is poison because of the relative heading, as everything then goes astray. fix: allow a large heading variation initially}

\todo{NO LONGER UP TO DATE: abs-head is better in the observation, because it helps more during resampling and ensures that the right ones are deleted!}

\todo{ make clear
	if only the fingerprints of the floor being walked on are used, everything is fine
as soon as other floors above/below are added, it is no longer quite as good, or gets worse
this suggests that the model does not fit well
could be shown by plotting the average error per fingerprint???
}