Inferring movement patterns from geometric similarity

Spatial movement data is becoming ubiquitously available, including data on animals, vehicles, and people. This data allows us to analyze the underlying movement. In particular, it allows us to infer movement patterns, such as recurring places and routes. Many methods for doing so rely on a notion of similarity of places or routes. Here we briefly survey how research on this topic has developed in the past 15 years and outline challenges for future work.


Introduction
Due to technological advances, more and more precise movement data is being collected. Movement data often comes in the form of tracking data using GPS, but can also be recorded in other ways. This data is collected and analysed in different application domains, such as ecology, sports, and urban planning. In everyday life, too, it is becoming more common to track one's movement, e.g., with a smartphone or watch. Such data allows us to gain insights into the underlying movement, in particular to infer patterns of movement. A pattern generally refers to a recurring theme or aspect. In the context of movement, the most common patterns are a place repeatedly visited or a route repeatedly traveled. Other patterns depend on the interaction of two or more entities, such as a follower-leader pattern or joint movement [16,17,19,24,35].
Extracting movement patterns is a challenge that has been addressed by researchers from different disciplines in recent years. In particular, this has been addressed in application domains, such as movement ecology, in geographical information science, and in computer science. Two main questions arise for these patterns: how to effectively model them and how to efficiently detect them. Many patterns of movement, such as frequently visited places or routes, are based on geometric similarity of places or routes. That is, a frequently visited place or route is one that is similar to a place or route taken by many in the observed data. Hence these patterns can be determined by quantifying and detecting the underlying similarity.
Several different measures of similarity have been proposed for places, routes, and other aspects of movement, or, in geometric terms, for points, curves, and other objects. How similarity is defined determines the type of movement pattern that is detected. For instance, it makes a difference whether and how the similarity of places or routes takes into account the temporal component or the geographic context of the movement. The definition of similarity also determines how efficiently the movement patterns based on it can be computed.
In the past 15 years, computational movement analysis [18,23], which more broadly studies ways of analysing movement data, has been an active research area. However, many challenges remain, including how to handle uncertainty, how to include context or data from multiple sources, and how to deal with very large amounts of data [13-15, 25, 26, 29].

Similarity measures
Many similarity measures have been proposed for different geometric objects. For point sets, a popular measure is the Hausdorff distance, which matches each point to its closest neighbor in the other set. Another measure is the Earth mover's distance, which finds a global matching of both point sets. Whereas the former can be computed efficiently (in log-linear time), for the latter only approximation algorithms (with fairly high running times) are known. Beyond that, several other measures exist [5,32].
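To make the Hausdorff distance described above concrete, the following is a minimal Python sketch. It uses a brute-force comparison of all point pairs, which takes quadratic time and is therefore not the log-linear algorithm referred to in the text. The function names and the sample points are assumptions made for illustration only.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest neighbor in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets.

    Brute force: compares every pair of points (quadratic time).
    """
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Hypothetical sample data: b has one outlier far away from a.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(0.0, 1.0), (1.0, 1.0), (4.0, 1.0)]
print(hausdorff(a, b))
```

In this example the outlier (4, 1) determines the result: its nearest neighbor in the first set is (1, 0), so a single far-away point dominates the distance.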
For curves, many different distance measures exist as well [6,32]. These differ, among other things, in how they treat the time component of a curve resulting from a movement path. A simple measure is again the Hausdorff distance applied to the underlying point set of the curve. However, this ignores the course of the curves, which in contrast is taken into account by the Fréchet distance. Another popular similarity measure is the equal-time distance, which compares distances at equal time stamps. Of these three, the last can be computed most efficiently (in linear time) and the first nearly as efficiently (in log-linear time), while the Fréchet distance is more costly to compute (in log-quadratic time). Further distance measures have also been considered, for instance variants of the measures above, of edit distance, and of dynamic time warping [30].
For geometric graphs, measuring similarity becomes even more complex. In particular, the question arises of how to take into account both the geometry and the topology of the graph in the comparison. Hence many different approaches to graph similarity exist [2]. Measures that consider only topology, such as graph isomorphism, have little meaning for geometric graphs. Ideally, both the geometry and the topology should be taken into account. Two measures that do this are a geometric edit distance for graphs [12] and graph similarity based on graph mappings [4].
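The difference between these curve measures can be illustrated with a small Python sketch. Two assumptions are made here: the equal-time distance is taken to be the maximum distance over synchronized samples (an average would be another plausible choice), and the Fréchet distance is replaced by its discrete variant, which considers only the vertices of the curves and is computed by a simple quadratic-time dynamic program. The function names and sample curves are hypothetical.

```python
import math

def equal_time_distance(p, q):
    """Maximum distance between positions at the same time stamps.

    Assumes both trajectories consist of the same number of
    synchronized samples.
    """
    if len(p) != len(q):
        raise ValueError("trajectories must share time stamps")
    return max(math.dist(a, b) for a, b in zip(p, q))

def discrete_frechet(p, q):
    """Discrete Frechet distance via dynamic programming, O(len(p) * len(q))."""
    n, m = len(p), len(q)
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                dp[i][j] = d
            elif i == 0:
                dp[i][j] = max(dp[0][j - 1], d)
            elif j == 0:
                dp[i][j] = max(dp[i - 1][0], d)
            else:
                # Advance on p, on q, or on both; keep the cheapest option.
                prev = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
                dp[i][j] = max(prev, d)
    return dp[n - 1][m - 1]

# The same path, traversed in opposite directions.
p = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
q = [(2.0, 0.0), (1.0, 0.0), (0.0, 0.0)]
print(equal_time_distance(p, q), discrete_frechet(p, q))
```

Both curves have the same underlying point set, so their Hausdorff distance is 0, whereas the two measures above report a distance of 2 because they respect the direction of travel.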

Movement patterns
In recent years, many patterns of movement have been considered [16,17,24,35]. Here we focus on those that inherently build on geometric similarity of the places visited.
Perhaps the most common movement pattern is that of recurring or popular places. This pattern has applications for many types of movement data, such as detecting home places and hot spots. A popular place can be modeled as a place that is often visited; more specifically, it can be modeled as a cluster of similar trajectory points, or as a disc or rectangle often visited by the given trajectories. Note that this pattern differs from the notion of a stay point in a trajectory, i.e., a location and time span in which one entity remained in one place for some time. Detecting and removing stay points is often used in the preprocessing of trajectory data [28]. For detecting hot spots in trajectory data, several methods exist [21,27], and for the more general problem of clustering arbitrary point sets, a multitude of methods exists [33].
A second common movement pattern is that of recurring routes. These occur, for instance, as commuting or migration routes, or as parts of a road network. A natural model of recurring routes in a set of trajectories is a set of similar subtrajectories. Here in particular, the computational complexity (i.e., how efficiently these can be computed) depends heavily on the similarity measure used and on whether one seeks similar subtrajectories or whole trajectories. For instance, the Fréchet distance is a good model for subtrajectory similarity, but costly to compute. Moreover, similar subtrajectories are harder to find than similar whole trajectories, as one also needs to find the start and end points of the subtrajectories. Hence several different approaches exist, which apply in different scenarios [34].
For recurring routes, several related problems occur: one may want to determine a single recurring route or a set of recurring routes, covering all or only part of the data. Finding a good (i.e., simple but realistic) representation for such a recurring route is a problem in itself [3,9].
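The disc model of a popular place can be sketched in a few lines of Python. This sketch rests on several simplifying assumptions that are not taken from the cited methods: popularity is counted as the number of distinct trajectories with at least one sample inside the disc, candidate centers are restricted to the sample points themselves, and all candidates are checked by brute force. All names and data are hypothetical.

```python
import math

def visitors(center, radius, trajectories):
    """Number of trajectories with at least one sample inside the disc."""
    return sum(
        any(math.dist(center, p) <= radius for p in traj)
        for traj in trajectories
    )

def most_popular_place(trajectories, radius):
    """Return (center, count) for the sample point whose disc is visited
    by the most trajectories (the first such point in case of ties)."""
    candidates = [p for traj in trajectories for p in traj]
    best = max(candidates, key=lambda c: visitors(c, radius, trajectories))
    return best, visitors(best, radius, trajectories)

# Hypothetical data: all three trajectories pass close to (5, 5).
trajectories = [
    [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)],
    [(5.2, 5.1), (0.0, 9.0)],
    [(9.0, 9.0), (4.9, 4.8)],
]
center, count = most_popular_place(trajectories, radius=0.5)
print(center, count)
```

Counting distinct trajectories rather than individual samples keeps a single entity that lingers in one spot (a stay point) from being reported as a popular place.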
Often, one does not want to find a single recurring route, or a set of such routes, but rather the entire underlying travel network. This could be the street network used by cars in a city, or the migration network of one (or more) animal species. Such a network can be modeled, for instance, by a graph that represents all the given trajectories. In recent years, much research has focused on the so-called map construction problem, where the goal is to reconstruct the underlying road network, and several different methods have been proposed for it [1,2]. In map construction, the aim is to construct both the geometry and the topology of the underlying network. Given the large amount and the uncertainty of the data, this quickly becomes a hard problem. Furthermore, for smaller inputs, e.g., groups of animals travelling together, one may want to extract further information, e.g., who was travelling with whom and when [10,11].
Another important type of movement pattern comprises those defined by the interaction of individuals. This includes the follower-leader pattern and the joint movement pattern. The follower-leader pattern can be modeled by requiring that the follower takes a similar route with a slight delay [8,22], or that the follower is always behind (and sees) its leader [7]. A number of approaches have been proposed for detecting joint movement of several entities. In particular, the flock pattern [20,31], which models joint movement as a group of entities moving within a disk for some time, has received considerable attention.
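The flock pattern can be illustrated with a simplified Python sketch. It is not one of the cited algorithms: it assumes a fixed candidate group with synchronized samples, and it replaces the exact test for a common disk with a conservative one that checks whether all entities lie within the given radius of their centroid. This test is sufficient but not necessary, so it may miss some flocks. All names and data are hypothetical.

```python
import math

def in_disk(points, radius):
    """Conservative test: all points lie within `radius` of their centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return all(math.dist((cx, cy), p) <= radius for p in points)

def longest_flock(trajectories, radius):
    """Longest run of consecutive time steps passing the disk test."""
    best = run = 0
    # zip(*trajectories) yields the positions of all entities per time step.
    for positions in zip(*trajectories):
        run = run + 1 if in_disk(positions, radius) else 0
        best = max(best, run)
    return best

def is_flock(trajectories, radius, min_steps):
    """True if the group stays together for at least `min_steps` steps."""
    return longest_flock(trajectories, radius) >= min_steps

# Two hypothetical entities that travel side by side, then split up.
a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 5.0)]
b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0), (4.0, -5.0)]
print(longest_flock([a, b], radius=1.0), is_flock([a, b], 1.0, 3))
```

Detecting flocks in practice also requires choosing which subset of entities forms the group, which this sketch leaves out.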

Challenges
Several challenges apply to movement data in general. These include data storage, availability of algorithms, anonymisation of data, and linking data from different sources. Although many methods for detecting movement patterns have been proposed, many of these are not yet efficient, effective, or reliable enough to deal with today's large and varying data. Specifically, the following challenges remain:
• Efficiency: Can we devise more efficient algorithms for very large data sets? Current algorithms for computing movement patterns have relatively high running times, and some of the underlying problems are even NP-hard. Possible algorithmic approaches include sampling, parallelization, and approximation.
• Modeling and semantics: Can we include more semantics in the modeling and analysis of patterns? Movement patterns could be enriched with more information on the type of movement, the motivation for movement, and/or its geographical context. To do so, we could rely not only on the movement data itself, but also on other data types such as social data.
• Data sources: How can other or multiple data sources be used to infer movement patterns? Geospatial tweets or check-in data, for instance, provide movement data at a much coarser level. Also, data providers often make only aggregate tracking data available due to privacy or other concerns. How can such aggregate data be used to infer movement patterns? And are there ways to aggregate data so that movement patterns can still be inferred?
• Behavioral patterns and ethics: How can we model and infer behavioral movement patterns? Behavioral patterns for both humans and animals typically involve interaction with each other or with the geographic context, and hence are more complex to model. While detecting behavioral patterns has important applications, for example in emergency management in urban areas, detecting such patterns in human data raises ethical issues that need to be addressed.
Solving these problems requires, in particular, close interdisciplinary collaboration between computer scientists and domain scientists from various disciplines, including ecology, geography, urban studies, and the social and behavioral sciences.