IPIN2018/tex_review/chapters/transition.tex

\section{Transition}
\label{sec:transition}

	\begin{figure}[t]
		\centering
		\begin{subfigure}{0.325\textwidth}
			\centering
			\includegraphics[width=5.1cm]{gfx/transition/museumMap.pdf}
			\caption{3D Floorplan}
			\label{fig:museumMap}
		\end{subfigure}
		\begin{subfigure}{0.325\textwidth}
			\centering
			\includegraphics[width=5.1cm]{gfx/transition/museumMapGrid.pdf}
			\caption{Navigation graph}
			\label{fig:museumMapGrid}
		\end{subfigure}
		\begin{subfigure}{0.325\textwidth}
			\centering
			\includegraphics[width=5.1cm]{gfx/transition/museumMapMesh.pdf}
			\caption{Navigation mesh}
			\label{fig:museumMapMesh}
		\end{subfigure}
		\caption{
			Floorplan and automatically generated transition data structures for the ground floor of the historic building (\SI{71}{\meter}~x~\SI{53}{\meter}).
			\add{
				To reach every nook and cranny, the generated graph (b) requires many nodes and edges.
				The depicted version uses a coarse node-spacing of \SI{90}{\centi\meter} (1700 nodes) but barely reaches all doors and stairs.
				A navigation mesh generated for the same building required only 320 triangles (c) and reaches every corner within the building.
			}
		}
		\label{fig:transition}
	\end{figure}

	Within previous works, we used a graph of equidistant nodes (see \reffig{fig:museumMapGrid})
	to model the building's floorplan, representing the basis for the transition step \cite{Ebner-15, Ebner-16}.
	\add{
		It is created \emph{automatically}, based on the building's floorplan,
		which, in turn, results from \emph{manually} tracing available blueprint pictures within our editing software.
	}
	% in 15 und 16 haben wir stueckweise den graph eingefuhert
	%
	The resulting graph equals a grid, where each node constitutes the center of a grid-cell.
	\add{
		Each cell uses an empiric choice of \SI{30} x \SI{30}{\centi\meter} in size.
		The algorithm thus places a new cell every \SI{30}{\centi\meter},
		but only in regions that are actually walkable,
		and where the cell's outline does not intersect any wall or other obstacles.
		As cells are equidistant and axis aligned for performance reasons,
		the algorithm works reasonably well for rectangular buildings,
		matching the graph's coordinate system.
		For skewed floorplans, however, many periphery cells will intersect
		with walls and are thus omitted, reducing the quality of the representation.
		While smaller cells thus allow for a more accurate representation of the building,
		more cells are needed in total, increasing memory requirements for the smartphone.
	}
	After placement, each cell is connected with their, up to 8, potential
	neighbors in the plane \add{(left,right,top,bottom and diagonals)}.
	\add{
		Those connections are only added, if the neighbor is actually available,
		and the connection itself does not intersect any obstacles.
	}
	Doing so creates a walkable graph \add{of nodes and edges} for each floor.
	The graphs for each floor are hereafter connected via stairs or elevators,
	to form the final, walkable data structure for the whole building.
	This allows for (semi-)random walks along the graph, \add{modeling potential pedestrian movements}.
	\add{
		By assigning probabilities to each edge, using prior knowledge \eg{} provided by sensors,
		the random walk along those weighted edges denotes the transition probability
		$p(\mStateVec_{t} \mid \mStateVec_{t-1}, \mObsVec_{t-1})$ \cite{Ebner-16}.
	}

	Due to the equidistant spacing \add{every \SI{30}{\centi\meter}},
	the resulting graph is rather rigid and only well-suited for rectangular buildings.
	For more contorted buildings, like many historic ones,
	the node-spacing needs to be small, to reliably reach every door, stair
	and corner of the building.
	\add{For demonstration, we used a \SI{90}{\centi\meter} spacing} within \reffig{fig:museumMapGrid}.
	\add{As can be seen, the automatically generated grid} is barely able to reach all places within
	the lower floors of the building, and fails to connect the upper floors reliably.
	While using smaller spacings remedies the problem, it requires huge amounts of memory:
	\add{Up to hundred megabytes and millions of nodes and edges are realistic for larger buildings.}
	% musuem aus figure: 90cm grid : ca 2000 nodes, ca 6500 edges
	% museum aus figure: 30cm grid : ca 32k nodes und 120k edges
	% museum ganz, 20cm grid : ca 75k nodes, 280k edges

	Because of both, required memory amounts and inaccuracies of the graph-based
	model \add{depending on the spacing},
	we developed a new basis for the transition step, that is still able to answer
	$p(\mStateVec_{t} \mid \mStateVec_{t-1}, \mObsVec_{t-1})$,
	\add{but has a much smaller memory footprint while representing the real floorplan
	more accurately.}
	%
	The new foundation is provided by well-known navigation meshes \cite{navMesh1},
	where the walkable area is spanned by convex polygons.
	\add{
		As the polygons are neither axis-aligned nor fixed in shape and size,
		they can accurately represent skewed architectures, where
		the grid-based approach suffers from aforementioned flaws.
	}
	\add{
		Another main idea behind navigation meshes
		is presented by shared outline edges between adjacent polygons.
		It thus is always possible to walk from one polygon into another,
		if they are adjacent.
		Similar to the graph-based approach, adjacent polygons thus
		denote some sort of walkable surface.
		Just as before, the navigation mesh can be \emph{automatically}
		generated from the building's floorplan, based on
		various algorithms \cite{navMeshAlg1}.
	}
	Using variably shaped/sized elements instead of rigid grid-cells
	provides both, higher accuracy for reaching every corner, and a reduced
	memory footprint as a single polygon is able to cover arbitrarily
	large regions.
	\add{
		For a rectangular room, the number of elements needed for the graph-based approach depends
		on the room's size and the chosen spacing. The navigation mesh, however, only needs
		a single, four-sided polygon, to fully cover the rectangular shape of the room, independent of its size.
	}

	However, polygons impose several drawbacks on
	common operations used within the transition step, like checking whether
	a point is contained within some region. This is much more costly for polygons
	\add{ compared to axis-aligned, rectangular grid-cells.}
	% museum aus figure: 305 3-ecke
	% museum ganz : 789 fuer alles
	%
	Such issues can be mitigated by using triangles instead of polygons, depicted within \reffig{fig:museumMapMesh}.
	Doing so, each element within the mesh has exactly three edges and a maximum of three neighbors.
	\add{
		This usually requires some additional memory, as more triangles are need compared to polygons.
		For the example of the rectangular room, two adjacent triangles are required to form
		a rectangular shape.
		However, using triangles, operations such as aforementioned contains-check, can now easily be performed,
		\eg{} by using barycentric coordinates, yielding noticeable speedups compared to polygons.
	}

	\newcommand{\turnNoise}{\mathcal{T}}
	\newcommand{\stepSize}{\mathcal{S}}
	This data structure yields room for various strategies to be applied within the transition step.
	The most simple approach uses an average pedestrian step size together with the
	number of detected steps $\mObsSteps$ and change in heading $\mObsHeading$
	gathered from sensor observations $\mObsVec_{t-1}$.
	Combined with previously estimated position $(x,y,z)^T$ and heading $\mStateHeading$
	from $\mStateVec_{t-1}$
	, including uncertainties for step-size $\stepSize$
	and turn-angle $\turnNoise$,
	this directly defines new potential whereabouts
	$p(\mStateVec_{t} \mid \mStateVec_{t-1}, \mObsVec_{t-1})$:

	\begin{equation}
		\begin{aligned}
			x_t &=&		\overbrace{x_{t-1}}^{\text{old pos.}}&	&	&+& \overbrace{\mObsSteps \cdot \stepSize}^{\text{distance}}&	&	&\cdot& \overbrace{\cos(\mStateHeading_{t})}^{\text{direction}}&		&			,\enskip \turnNoise &\sim \mathcal{N}(\mObsHeading, \sigma_\text{turn}^2)				  \\
			y_t &=&		y_{t-1}\phantom{.}&						&	&+& \mObsSteps \cdot \stepSize&									&	&\cdot& \sin(\mStateHeading_{t})&										&			,\enskip \stepSize &\sim \mathcal{N}(\SI{70}{\centi\meter}, \sigma_\text{step}^2)	\\
			z_t &=&		\text{interpolated}	\\
			\mStateHeading_{t} &=& \mStateHeading_{t-1} + \turnNoise\\
		\end{aligned}
		\label{eq:navMeshTrans}
	\end{equation}
	\noindent{}with
	%
	\add{
	\begin{equation*}
			\mObsSteps,\mObsHeading	\in	\mObsVec_{t-1}
			\quad
			\text{,}
			\quad
			x_{t-1},y_{t-1},z_{t-1},\mStateHeading_{t-1}	\in	\mStateVec_{t-1}
			\quad
			\text{and}
			\quad
			x_{t},y_{t},z_{t},\mStateHeading_{t}	\in	\mStateVec_{t}
			\enskip.
	\end{equation*}
	}

	\add{
		The walk's starting triangle is the triangle that contains $(x_{t-1}, y_{t-1}, z_{t-1})^T$.
		The $z$-component is hereafter ignored, as it is defined by the mesh's triangles, which denote the walkable floor.
		Hereafter, \refeq{eq:navMeshTrans} is used to determine the walk's destination in $x$ and $y$ only.
		Whether the newly obtained destination $(x_t, y_t)^T$ is actually reachable from the start $(x_{t-1}, y_{t-1})^T$ can be determined
		by checking if there is a way from the starting triangle towards some other, nearby triangle that contains these coordinates.
		If so, the discarded $z$-component $z_t$ is determined using the barycentric coordinates of $(x_t, y_t)^T$
		within a 2D projection of the triangle the position belongs to, and applying them to the original 3D triangle.
		This can be though of walking along a 2D floor, and determining the floor's altitude for the 2D destination.
	}
	\add{
		If this destination is unreachable,
		\eg{} due to walls or other obstacles, different handling strategies are required.
	}
	Simply trying again might
	be a viable solution, as uncertainty induced by $\turnNoise$ and $\stepSize$ will yield a slightly different destination
	that might be reachable. Increasing $\sigma_\text{step}$ and $\sigma_\text{turn}$ for those cases might also be a viable choice.
	Likewise, just using some random position, omitting heading/steps might be viable as well.

	The detected steps $\mObsSteps$ and the heading change $\mObsHeading$ \add{used within above transition} are obtained by the smartphone's IMU.
	For the change in heading, we first need to rotate the gyroscope's readings onto the east-north-up frame using a suitable rotation matrix,
	\add{
		to determined, what the readings would look like, if the smartphone was placed parallel to the ground.
		The matrix is thus used to undo the rotation introduced by the pedestrian holding the phone.
		This rotation matrix is given by the matrix that rotates the current gravity readings from the accelerometer to
		the $(0,0,\SI{9.81}{\meter\per\square\second})^T$ vector.
		After applying the matrix to the gyroscope's readings,
		the pedestrian's change in heading (yaw) is given by integrating over the gyroscope's $z$-axis \cite{Ebner-15}.
	}
	It should be noted, that especially for cheap IMUs, as they can be found in most smartphones,
	the matrix has to be updated at very short intervals of one or two seconds to preserve good results \cite{davidson2017survey}.

To receive the number of steps, we use a very simple step detection based on the accelerometer's magnitude.
For this, we calculated the difference between the average magnitude over the last \SI{200}{\milli\second} and the gravity vector.
If this difference is above a certain threshold ($> \SI{0.32}{\m\per\square\s}$), a step is detected.
To prevent multiple detections within an unrealistic short interval, we block the complete process for \SI{250}{\milli\second} \cite{Koeping14}.
Of course, there are much more advanced methods as surveyed in \cite{davidson2017survey}, however this simple method has served us very well in the past.

	%\commentByFrank{es gaebe noch ganz andere ansaetze etc. aber wir haben wohl nicht mehr genug platz :P}
	%\commentByToni{ich denke aber auch, es langt.}