\documentclass[acmsmall,natbib=false]{acmart}
\input{area_agg_seq_tsas_setting}
\begin{document}
\title
[Finding Optimal Sequences for Area Aggregation] %short title
{Finding Optimal Sequences for Area Aggregation---\\
\texorpdfstring
{\Astar} %title of the paper
{A*} %title of the pdf
vs.\ Integer Linear Programming%avoid the space in ACM Reference Format
}
%\titlenote{Produces the permission block, and
% copyright information}
%\subtitle{Extended Abstract}
%\subtitlenote{The full version of the author's guide is available as
% \texttt{acmart.pdf} document}
\author{Dongliang Peng}
\authornote{Dongliang Peng did his main contribution
when he was a PhD student at University of W\"urzburg.}
\orcid{0000-0001-6848-3545}
\affiliation{%
\institution{GIS Technology,
Delft University of Technology}
\streetaddress{Julianalaan 134}
\city{Delft}
\state{South Holland}
\postcode{2628 BL}
\country{The Netherlands}
}
\email{http://www.gdmc.nl/staff/}
\affiliation{%
\institution{Chair of Computer Science I,
University of W\"urzburg}
\streetaddress{Am Hubland}
\city{W\"urzburg}
\state{Bavaria}
\postcode{D-97074}
\country{Germany}
}
\email{http://www1.informatik.uni-wuerzburg.de/peng}
\author{Alexander Wolff}
\orcid{0000-0001-5872-718X}
\affiliation{%
\institution{Chair of Computer Science I,
University of W\"urzburg}
\streetaddress{Am Hubland}
\city{W\"urzburg}
\state{Bavaria}
\postcode{D-97074}
\country{Germany}
}
\email{www1.informatik.uni-wuerzburg.de/wolff}
\author{Jan-Henrik Haunert}
\orcid{0000-0001-8005-943X}
\affiliation{%
\institution{Institute of Geodesy and Geoinformation,
University of Bonn}
\streetaddress{Meckenheimer Allee 172}
\city{Bonn}
\state{North Rhine-Westphalia}
\postcode{D-53115}
\country{Germany}
}
\email{www.geoinfo.uni-bonn.de/haunert}
\begin{abstract}
To provide users with maps of different scales and
to allow them to zoom in and out without losing context,
automatic methods for map generalization are needed.
We approach this problem for land-cover maps.
Given two land-cover maps at two different scales,
we want to find a sequence of small incremental
changes that gradually transforms one map into the other.
We assume that the two input maps consist of polygons
each of which belongs to a given land-cover type.
Every polygon on the smaller-scale map
is the union of a set of adjacent polygons
on the larger-scale map.
In each step of the \mine{computed} sequence,
the smallest area is merged with one of its neighbors.
We do not select that neighbor according to a prescribed rule
but compute the whole sequence of pairwise merges at once,
based on global optimization.
We have proved that this problem is NP-hard.
We formalize this optimization problem as that of
finding a shortest path in a (very large) graph.
We present the \Astar algorithm and integer linear programming
to solve this optimization problem.
To avoid long computing times, we allow the two methods to
return non-optimal results.
In addition, we present a greedy algorithm as a benchmark.
We tested the three methods with a
dataset of the official German topographic database ATKIS.
Our main result is that
\Astar finds optimal aggregation sequences for more instances
than the other two methods
within a given time frame.
%
%
\end{abstract}
%
% The code below should be generated by the tool at
% http://dl.acm.org/ccs.cfm
% Please copy and paste the code instead of the example below.
%
\begin{CCSXML}
<ccs2012>
<concept>
<concept_id>10010405.10010476.10010479</concept_id>
<concept_desc>Applied computing~Cartography</concept_desc>
<concept_significance>500</concept_significance>
</concept>
</ccs2012>
\end{CCSXML}
\ccsdesc[500]{Applied computing~Cartography}
\setcopyright{othergov}
\acmYear{2020} \acmVolume{1} \acmNumber{1} \acmArticle{1} \acmMonth{1} \acmPrice{15.00}\acmDOI{10.1145/3409290}
\keywords{Continuous map generalization, Land-cover area,
Type change, Compactness, Shortest path
% , Optimization, Exponential, NP-hard
}
\maketitle
\section{Introduction}
\label{sec:Introduction}
Maps are the main tool to represent geographical information.
As geographical information is usually scale-dependent
\parencite{Muller1995Generalization,Weibel1997},
users need to have access to maps at different scales.
In order to generate these maps,
national mapping agencies produce a base map
and then derive maps at smaller scales by
\emph{map generalization}.
More specifically,
map generalization is the process of extracting and arranging
geographical information from detailed data in order to produce maps
at smaller scales.
A requirement of map generalization is to emphasize the
essential while suppressing the unimportant
and, at the same time, to maintain logical relationships between
objects \parencite{Weibel1997}.
As manual generalization is labor-intensive
\parencite{Duchene2014},
automating map generalization is a promising way
to produce up-to-date maps at high speed and low cost
\parencite{Mackaness2017Generalization}.
In our digital age, people interactively read maps
on computers and mobile phones.
An often used interaction is zooming.
When users zoom in or out,
a map must be changed to provide information
appropriate to the corresponding zoom level.
However, large discrete changes may distract users.
\emph{On-the-fly generalization}~\parencite{Weibel2017Fly},
which generalizes map features (e.g., polylines and polygons)
in real time, can mitigate this problem.
Still, large discrete changes can be introduced.
In a usability test, \textcite{Midtbo2007} have shown that
a map is easier to follow
if the map extent changes smoothly rather than stepwise.
In addition to a smoothly changing map extent,
we also want to change map features smoothly
when users are zooming.
We believe that this strategy will allow users
to follow maps even more easily and thus keep their context better.
The process of producing maps \mine{with a smooth scale transition}
is known as \emph{continuous map generalization} (CMG),
or simply \emph{continuous generalization}.
Ideally, there should be no discrete change in CMG.
However, the term is also used when
the discrete changes are small enough not to be noticed
\parencite[e.g.,][]{Suba2016Road}.
% as,
%e.g., \textcite{Suba2016Road} stated.
%We consider CMG as the process of producing maps
%covering a continuous interval of scales,
%where CMG considers constraints and quality measures
%that act across different scales.
%In our work, for example, the quality of an aggregation sequence
%is measured as a whole;
%whereas in classical generalization approaches,
%only the quality of a map at \mine{an} output scale is measured.
%We consider CMG possible, desirable, and realistic.
%In CMG, it is not necessary that
%all changes that occur during zooming are animated in a continuous fashion.
%That fashion may generally be possible.
A simple way \mine{of realizing CMG} is to fade in and fade out raster maps,
as done by \textcite{Pantazis2009b},
but it is not at all clear
whether such animations would improve the experience of map users.
\mine{Furthermore, although it is possible to generalize raster maps,
we consider it more straightforward to work on vector maps.
For example, \textcite{Jaakkola1998Generalization}
generalized raster maps, but he used essentially the same object-based operators
that are common in the map generalization of vector data.
He even converted raster data to vector data
when he wanted to simplify the boundaries of the land-cover areas.
Therefore, our paper will work on a vector map of land-cover areas.}
%As , we would like to carry out the changes in a more elegant way;
%we would like to change map features gradually.
%He generalizes raster maps, using essentially the same object-based map generalization operators that are common in the generalization of vector data. I.e., patches consisting of pixels of equal land-cover types are considered as objects which can be merged with neighboring patches.
\mypar{Problem definition}
In this paper, we deal with CMG of land-cover areas.
Land-cover areas are of significant importance on maps.
When users zoom out, some land-cover areas become
too tiny to be seen, which results in visual clutter.
In order to provide users
with a good visual experience during zooming operations,
we propose to \mine{aggregate the tiny areas}
into their neighboring areas.
A \emph{land-cover map} is a planar subdivision in which
each area belongs to a land-cover \emph{class} or \emph{type}.
Suppose that there are two land-cover maps of different scales
that cover the same spatial region.
We consider the problem of finding a sequence
of small incremental changes that gradually transforms
the larger-scale map (the \emph{start map}) to
the smaller-scale map (the \emph{goal map}).
We use this sequence to generate and show land-cover maps at
intermediate scales (see \fig\ref{fig:AreaAgg_example}).
In this way, we try to avoid large and discrete changes
during zooming.
\begin{figure}[tb]
\centering
\includegraphics[page=1]{AreaAgg_Preliminaries}
\caption{The input and a possible output for an instance of
our problem.}
\label{fig:AreaAgg_example}
\end{figure}
With the same motivation,
a strategy of hierarchical schemes has been proposed.
This strategy generalizes a more-detailed representation
to obtain a less-detailed representation
based on small incremental changes,
e.g., the Generalized Area Partitioning tree (GAP-tree).
That tree can be constructed if only the larger-scale map is given
\citep{vanOosterom2005}
or if both the larger-scale map and the smaller-scale map
are given \citep{HaunertDilo2009}.
Typically, the next change in such a sequence
is determined locally, in a greedy fashion.
If we insist on finding a sequence that is optimal
according to some global measures,
the problem becomes complicated.
We assume that there exist many-to-one relationships
between the areas of the start map and the areas of the goal map.
This assumption is based on the fact that many algorithms
\parencite[e.g.,][]{HaunertWolff2010AreaAgg,
vanSmaalen2003,Oehrlein2017Aggregation}
result in many-to-one relationships
when aggregating land-cover areas.
Their inputs and generalized results together
can be used as our inputs.
However, we should not use those algorithms
to generate a sequence of maps at different scales
because those algorithms do not take into account
the consistency between the generated maps.
We use both a start map and a goal map instead of using only the start map
because our generated maps at intermediate scales should
benefit from a high-quality goal map.
We term the areas of the goal map \emph{regions}.
That is, every region is the union of a set of areas
on the start map.
The type of a region may differ from the types of its components.
For example, a small water area together with
multiple adjacent forest areas may constitute
a large forest region on the \mine{goal} map.
However, we assume that every region on the goal map
contains at least one area of the same type on the start map.
Our assumptions hold if the goal map has been produced
with an automatic method for area aggregation,
for example, by the method of \citet{HaunertWolff2010AreaAgg}.
That method used a land-cover map at a larger scale as input and
produced a land-cover map at a single smaller scale.
Although \textcite{HaunertWolff2010AreaAgg}
attained results of high quality,
they did not produce a sequence of maps.
Our method can also be extended to find an aggregation sequence
for two maps (a start map and a goal map) that are from different sources.
In that case, one could compute a map overlay of the two maps
and use the result
(with combined boundaries from both input maps and
with land-cover classes
from the given larger-scale map)
as the start map.
In this paper, we try to find an optimal sequence
to aggregate the land-cover areas on the start map
one by one until we arrive at the goal map.
We first independently deal with each region of the goal map
(with its components on the start map).
Once we have found an aggregation sequence for each region,
we integrate all the sequences into an overall sequence,
which transforms the start map into the goal map
(see \fig\ref{fig:AreaAgg_IntegrateSequence}).
Our aggregation sequence may be incorporated into
the GAP-face tree \citep{vanOosterom2005},
the map cube model \citep{Timpf1998},
or ScaleMaster \citep{Brewer2007Guidelines,Touya2013ScaleMaster},
to support on-the-fly visualization.
Smoothly (dis-)appearing areas can be realized
by integrating our results into the \emph{space-scale cube}
\parencite{vanOosterom2014Support,vanOosterom2014tGAPSSC}.
\begin{figure*}[tb]
\centering
\includegraphics[page=2]{AreaAgg_Preliminaries}
\caption{Integrating two aggregation sequences
of different regions:
the resultant sequence contains the given sequences
as subsequences and
always takes the subdivision with the smallest patch next.
The gray arrows show the integration of the two regions.
}
\label{fig:AreaAgg_IntegrateSequence}
\end{figure*}
\mypar{Contribution}
We \mine{first} review \mine{some} related work (\sect\ref{sec:AreaAgg_RelatedWork}).
Then, we model our problem, present some basic concepts,
analyze the size of our model in a worst-case scenario,
and introduce our methods
(\sect\ref{sec:AreaAgg_Preliminaries}).
We define our cost functions
(\sect\ref{sec:AreaAgg_CostFunctions}).
We prove that our problem is NP-hard
(\sect\ref{sec:AreaAgg_NP-hardness}).
Then, we develop and compare three methods
for finding aggregation sequences.
First, we present
a greedy algorithm (\sect\ref{sec:AreaAgg_Greedy}).
Second, we develop a new global optimization approach
based on the \Astar algorithm
(\sect\ref{sec:AreaAgg_AStar}).
Third, we model our pathfinding problem as
an \emph{integer linear program} (ILP),
and we solve this ILP by minimizing our cost function
(\sect\ref{sec:AreaAgg_ILP}).
Our ILP uses binary (0--1) variables.
These variables help us to model our problem,
but in general, it is NP-hard
to solve an ILP optimally.
By comparing with the greedy algorithm,
which is used as a benchmark,
we are able to see whether \Astar or the ILP-based algorithm,
which are more complex and slower, indeed perform better.
Our case study
uses a dataset of the German topographic database ATKIS
(\sect\ref{sec:AreaAgg_CaseStudy}).
In the concluding remarks, we show possible ways to improve our
methods (\sect\ref{sec:AreaAgg_Conclusions}).
We do not simplify polylines in this paper.
The simplification can be handled separately from the
aggregation of areas, for example,
by using one of the methods of
\textcite{Douglas1973},
\textcite{Saalfeld1999},
or \textcite{Wu2004DP}.
Those methods can be used to set up
the binary line generalisation tree (BLG-tree) of
\textcite{vanOosterom1995Development},
which is a hierarchical data structure that
defines a gradual line simplification process.
\mine{Note that simplifying polylines will not speed up
our three algorithms.
The reason is that we compute the areas of the polygons
and the lengths of the boundaries in a preprocessing step.
These values will be associated with
a graph representation of the dataset,
where polygons are represented as vertices and
a boundary separating two polygons (no matter how detailed it is)
is represented with a single edge.
During computation, the three algorithms do not need to go through
the vertices of the land-cover areas.
}
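The preprocessing just described can be sketched as follows. This is a minimal sketch under assumed data structures, not the authors' code: each polygon becomes a graph vertex that stores its precomputed area, and each shared boundary becomes a single edge that stores its length, no matter how many line segments the boundary consists of. Afterwards the algorithms only read these aggregate values and never revisit the detailed geometry.

```python
def shoelace_area(coords):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:] + coords[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def boundary_length(coords):
    """Length of a polyline given as a list of (x, y) vertices."""
    return sum(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
               for (x1, y1), (x2, y2) in zip(coords, coords[1:]))

def build_graph(polygons, shared_boundaries):
    """polygons: {id: vertex list}; shared_boundaries: {(id1, id2): polyline}.
    Returns vertex weights (areas) and edge weights (boundary lengths);
    this is why later simplification of the polylines cannot speed up
    the aggregation algorithms -- they never traverse the vertices again."""
    area = {p: shoelace_area(c) for p, c in polygons.items()}
    length = {e: boundary_length(c) for e, c in shared_boundaries.items()}
    return area, length
```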
Although splitting polygons is a useful step
in generalizing land-cover areas,
we do not integrate it into our method at the moment.
Some examples of splitting are as follows.
\textcite{Smith2007MasterMap,Thiemann2018LandCover}
proposed to split tiny polygons
and then to merge the split parts into the neighboring polygons.
\textcite{Meijers2016Split} developed an algorithm
to split a polygon (the splittee)
based on a constrained Delaunay triangulation.
During splitting, their algorithm is capable of
taking into account
the attractiveness between the splittee and its neighbors;
when merging, a more attractive neighbor
gets a larger portion of the splittee.
\section{Related Work}
\label{sec:AreaAgg_RelatedWork}
\subsection{Continuous map generalization}
\emph{Continuous map generalization} (CMG)
has received a lot of attention
from cartographers and computer scientists.
\Textcite{vanKreveld2001} proposed five gradual changes
to support the continuous zooming of maps,
which are \emph{moving}, \emph{rotating}, \emph{morphing},
\emph{fading}, and \emph{appearing}.
He suggested using these gradual changes
to adapt discrete generalization operators for CMG.
\textcite{Sester2005_CG} suggested simplifying building
footprints based on small incremental steps
to animate each step smoothly.
\textcite{Li2012Continuous} built hierarchies of road segments,
which they then used to omit road segments
from lower levels of the hierarchy.
Moreover, they evaluated similarities
between their results and existing maps.
\textcite{Peng2017Building} gradually grew
buildings into built-up areas by aggregating buildings
whenever they became too close.
\textcite{Touya2017Progressive} progressively replaced
buildings with blocks.
In addition, their method automatically inferred landmarks
and put the landmarks on top of the blocks.
\textcite{Suba2016Road} continuously generalized road networks
which are represented as a set of areas.
Their method repeatedly finds the least-important area
and then either merges it with an adjacent area
or collapses it to a line segment.
\textcite{Danciger2009} investigated the growing of regions,
while preserving topology, area ratios, and
relative positions.
The strategy of using two maps at different scales
to generate intermediate-scale maps has been studied in multiple
representations, e.g., with respect to the selection of roads or
rivers~\parencite{Peng2012River,Girres2014}.
Actually, that strategy is the key idea of the
morphing-based methods for CMG.
In order to morph from one polyline to another polyline,
which respectively represent, say, roads on a larger-scale map
and a smaller-scale map, we first need to compute
corresponding points between them
\parencite[e.g.,][]{Cecconi2003,Noellenburg2008,Chazal2010BallMap,
Deng2015,Li2017_Building,Li2017Annealing}.
Then morphing can be realized by interpolating a set of
intermediate polylines.
% Among the methods of computing corresponding points,
\textcite{Noellenburg2008} computed an optimum
correspondence between two given polylines
according to some cost functions.
While straight-line trajectories
are often used for interpolation
\parencite[e.g.,][]{Cecconi2003,Deng2015},
\textcite{Whited2011BallMorph} considered four other alternatives,
i.e., \emph{hat}, \emph{tangent}, \emph{circular},
and \emph{parabolic} paths
based on so-called \emph{ball-map}
\parencite{Chazal2010BallMap}.
\textcite{Peng2016Admin} morphed county boundaries
to provincial boundaries.
For county boundaries that do not have corresponding
provincial boundaries, they generated the correspondences based
on \emph{compatible triangulations}.
\Textcite{vanOosterom2014Support} used
a data structure called
\emph{smooth topological generalized area partitioning}
to support visualizing CMG.
One of their contributions is that
one polygon can merge with another continuously
by expanding over it.
\textcite{Huang2017Matrix} proposed a matrix-based structure
to support CMG,
using a river network as an example.
For a given scale,
their structure yields the rivers that should be kept
as well as how much these rivers should be simplified.
\subsection{Optimization in map generalization}
Map generalization generally specifies
and takes into account requirements
in order to produce maps of high quality
\parencite{Stoter2009Requirements}.
We categorize requirements as hard and soft constraints.
For example,
when we aggregate a land-cover area into another,
the type of the former is changed to the type of the latter.
In this problem, a hard constraint could be that
we aggregate only two areas at each step
in order to keep changes small
(see for example \fig\ref{fig:AreaAgg_example}).
A soft constraint could be that
we wish to minimize the changes of types, e.g.,
we prefer aggregating a grass area into a farm area
rather than into a settlement area.
This is a typical \emph{optimization} problem,
where we stick to hard constraints and
try to fulfill soft ones as well as possible.
Optimization for map generalization is important
not only because it finds optimal solutions,
but also because it helps us evaluate the quality of a model
\parencite{Haunert2017Label,
Haunert2008Assuring,Haunert2016Optimization}.
When we wish to minimize the changes of types
in aggregating areas one by one,
a model could be to minimize
the \emph{greatest} change over all the steps.
Using a greedy algorithm,
we can minimize the change at each step,
but the result does not necessarily minimize the greatest change
over all the steps.
If a result is bad,
we cannot tell if the bad result comes from the model
or from the greedy algorithm.
Using optimization,
we are able to find optimal solutions
of the model at least for small instances.
If even an optimal solution is bad,
then we can rule out the possibility
that the bad result comes from the greedy algorithm.
That is to say,
the bad result must be due to the model.
In this case, we should improve the model;
we may want to minimize
the \emph{average} change over all the steps.
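The argument above can be illustrated with invented numbers: a greedy algorithm that minimizes the change at each step may still end up with a larger greatest change than a globally chosen sequence.

```python
# Toy illustration (costs invented for the example): at step 1 the
# algorithm chooses between options "a" and "b"; the choice fixes the
# cost of step 2. Greedily picking the locally cheapest first step
# forces an expensive second step.
step1 = {"a": 1, "b": 2}           # type-change cost of step 1
step2_after = {"a": 9, "b": 3}     # step-2 cost forced by the step-1 choice

greedy_choice = min(step1, key=step1.get)               # picks "a"
greedy_max = max(step1[greedy_choice], step2_after[greedy_choice])

# A global method minimizes the greatest change over all steps instead.
optimal_max = min(max(step1[c], step2_after[c]) for c in step1)

print(greedy_max, optimal_max)     # prints: 9 3
```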
Moreover, optimization is useful for evaluating heuristics.
We need heuristics because
many optimization problems cannot be solved efficiently
\parencite[e.g.,][]{HaunertWolff2010AreaAgg,Haunert2016Partition}.
While heuristics can find some solutions in reasonable time,
it is important to know the quality of
these solutions.
Fortunately, we can often find an optimal solution when
the size of an instance is sufficiently small.
Consequently, we are able to evaluate
the quality of a heuristic
by comparing its results with optimal solutions
on small instances.
Optimization has been widely used in map generalization.
For example, \textcite{Harrie1999} displaced objects
based on least-squares adjustments (LSA)
to solve spatial conflicts.
In his problem, the soft constraints
for shapes and locations may contradict each other.
Therefore, it is necessary
to mediate between these constraints,
which can be done by LSA.
\textcite{Sester2005Optimization} used LSA not only for
displacing objects but also for simplifying buildings.
She required that
the output boundaries should be
as close to the original buildings as possible.
\textcite{Tong2015AreaLSA} generalized land-cover areas,
where LSA was used to preserve the sizes of
the land-cover areas.
\textcite{Regnauld2001} grouped buildings based on
minimum spanning trees in order to typify
the buildings in a group.
\textcite{Burghardt2005Snakes} smoothed lines based on
energy minimization.
According to his setting, a line contains less energy
if it is smooth and close to the original line.
He repeatedly displaced the line
until it reached a stable state
that minimizes his energy function.
\textcite{HaunertWolff2010AreaAgg} aggregated land-cover areas
based on mixed-integer linear programming
to generate a map at a target scale.
Their method is based on a global optimization.
They minimize a combination of type changes and cost for non-compact shapes
while satisfying constraints on the sizes of the output regions.
\textcite{Haunertwolff2010Building} simplified building
footprints by solving an integer linear program (ILP).
They aimed at minimizing the number of edges in the output
under the restriction that
the simplified buildings must be topologically safe,
that is, the selected and extended edges must not intersect with
each other.
\textcite{Oehrlein2017Aggregation} aggregated the departments
of France according to unemployment rates based on integer
linear programming; they used a cutting-plane method to
speed up solving their ILP.
\textcite{Funke2017Simplification} simplified
administrative boundaries based on an ILP.
Their aim was to minimize the number of edges
while keeping the resultant boundaries close to the original ones
and avoiding any intersection.
At the same time, they required that every city,
represented by a point,
stays in the same face as before the generalization.
\subsection{Optimization in \emph{continuous} map generalization}
Optimization becomes more delicate
when we deal with CMG.
In this field, there are requirements
not only for a specific map
but also for the relations between maps at different scales.
%
Some optimization techniques have been applied to CMG.
In the aforementioned article,
\textcite{Noellenburg2008} used dynamic programming
to match points of two polylines to support morphing
according to some matching costs.
\textcite{sahw-oarps-ICAGW13} used
mixed-integer linear programming
to select points of interest.
They required that a point, once it has disappeared,
should not show up again during zooming out.
They also required that any two points should be
sufficiently far away from each other.
Based on these requirements,
they wanted to show as many points as possible
for a given scale interval.
\textcite{Chimani2014Eat} computed a deletion sequence
for a road network by integer linear programming
and efficient approximation algorithms.
They wanted to delete a stroke,
which is a sequence of edges, at each step
while keeping the remaining network connected.
They assigned each edge a weight,
and their objective was to maximize the total weight
over all the road networks of all the steps.
\textcite{Peng2013LSA} defined trajectories
based on LSA for morphing between polylines.
Their method required that both the angles at vertices and the edge lengths
change linearly.
As those requirements may not agree with each other,
their method mediates between them using LSA.
\section{Preliminaries}
\label{sec:AreaAgg_Preliminaries}
We show how to compute an aggregation sequence
for a single region, $R$.
For a goal map with many regions,
we ``interleave'' the sequences for each of them
with respect to the order of the smallest patches
(see for example \fig\ref{fig:AreaAgg_IntegrateSequence}).
This integration is similar to the merge step in the
Mergesort algorithm;
see \textcite[\sect2.3]{Cormen2009}.
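This merge-like integration can be sketched as follows. The representation is an assumption made for the sketch, not the authors' data structure: each region's aggregation sequence is a list of steps, each annotated with the size of the smallest patch it removes, and sizes increase within a sequence, so the overall sequence is obtained exactly like the merge step of Mergesort.

```python
import heapq

def interleave(sequences):
    """sequences: per-region step lists [(patch_size, step), ...],
    each sorted by patch_size. Always take the step whose smallest
    patch is smallest overall, as in the merge step of Mergesort."""
    return [step for _, step in
            heapq.merge(*sequences, key=lambda s: s[0])]

# Two hypothetical regions with two aggregation steps each:
seq_r1 = [(1, "merge u1"), (4, "merge u2")]
seq_r2 = [(2, "merge v1"), (3, "merge v2")]
print(interleave([seq_r1, seq_r2]))
# prints: ['merge u1', 'merge v1', 'merge v2', 'merge u2']
```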
To allow us to describe our method more easily,
below we assume that the goal map has only one region.
This region consists of~$n$ land-cover
areas (components) on the start map.
In other words, the union of the~$n$ land-cover areas
is the only region on the goal map.
To find a sequence of small changes
that transforms the start map into the goal map,
we require that
every change involves only two areas of the current map.
More precisely, in each step the smallest area~$u$ is aggregated
with one of its neighbors~$v$
($v$ does not have to be the smallest neighbor)
such that~$u$ and~$v$ are replaced by their union.
The type of the union is restricted to be
the type of either~$u$ or~$v$.
If the union uses the type of~$u$,
we say that area~$v$ is \emph{aggregated into} area~$u$,
and vice versa.
How to aggregate exactly is decided by
optimizing a global cost function
(see \sect\ref{sec:AreaAgg_CostFunctions}).
This requirement ensures that the
size of the smallest area on the map increases in each step.
Hence, the sequence reflects a gradual reduction of the
map's scale.
From another perspective,
we consider the smallest area as the least important,
rather than involving more rules for (non-)importance.
Even though the requirement reduces
the number of possible solutions,
there is still enough room for optimization
since we leave open with
which of its neighbors the smallest area is aggregated.
We term a sequence of changes
that adheres to our smallest-first requirement simply
an \emph{aggregation sequence}.
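A single aggregation step, as defined above, can be sketched with a minimal assumed data model (neighbor bookkeeping after a merge is omitted for brevity). The smallest patch~$u$ must take part in the merge, but both the neighbor~$v$ and the type of the union are left open; this is exactly the freedom that the optimization exploits.

```python
def successors(patches, neighbors):
    """patches: {id: (size, type)}; neighbors: {id: set of adjacent ids}.
    Yields every subdivision reachable by one aggregation step.
    The union keeps id v in this sketch; if the two types coincide,
    the two yielded subdivisions coincide as well."""
    u = min(patches, key=lambda p: patches[p][0])   # the smallest patch
    for v in neighbors[u]:
        for new_type in (patches[u][1], patches[v][1]):
            merged = dict(patches)
            merged[v] = (patches[u][0] + patches[v][0], new_type)
            del merged[u]
            yield merged

# Hypothetical region with three patches, all pairwise adjacent:
patches = {"u": (1.0, "water"), "v": (5.0, "forest"), "w": (4.0, "farm")}
neighbors = {"u": {"v", "w"}, "v": {"u", "w"}, "w": {"u", "v"}}
print(len(list(successors(patches, neighbors))))
# prints: 4  (2 neighbors of u, times 2 possible types of the union)
```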
\subsection{Model}
\label{sec:AreaAgg_Model}
We consider a directed graph $G_\mathrm{S}$,
which we call the \emph{subdivision graph}
(see \fig\ref{fig:AreaAgg_SubdivisionName}).
The node set $V_\mathrm{S}$ of~$G_\mathrm{S}$
contains nodes for all the
possible maps (or \emph{subdivisions}), including the start map,
all possible intermediate-scale maps, and the goal map.
The arc set~$E_\mathrm{S}$ of~$G_\mathrm{S}$
contains an arc $(\Pnode, P_{t+1,j})$
between any two maps~$\Pnode$ and~$P_{t+1,j}$ in~$V_\mathrm{S}$
if~$P_{t+1,j}$ can be reached from~$\Pnode$
with a single aggregation operation
that involves a smallest area.
On this basis, any directed path in~$G_\mathrm{S}$
from the start map to the goal map
defines a possible aggregation sequence.
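Since $G_\mathrm{S}$ is never built explicitly, the arcs leaving a node must be generated on demand. The following sketch illustrates this for one node: the smallest patch is merged with each of its neighbors, and the adjacency structure is contracted accordingly. The data layout and function name are ours for illustration; for brevity, the choice of the union's type is left out (it would double the number of successors).

```python
def out_arcs(patches, adj):
    """Successor subdivisions of one node of G_S.

    patches: {name: area}; adj: {name: set of neighbor names}.
    The smallest patch u is merged with each neighbor v; the
    union keeps v's name here (the type choice is omitted)."""
    u = min(patches, key=patches.get)
    succ = []
    for v in adj[u]:
        merged = dict(patches)
        merged[v] = merged.pop(u) + merged[v]   # u is aggregated into v
        # Contract the adjacency: every former neighbor of u or v
        # becomes a neighbor of the union (named v); u disappears.
        new_adj = {w: (nbrs - {u, v}) | ({v} if nbrs & {u, v} else set())
                   for w, nbrs in adj.items() if w != u}
        new_adj[v] = (adj[u] | adj[v]) - {u, v}
        succ.append((merged, new_adj))
    return succ
```

On a row of three patches with areas $1 < 2 < 3$, the smallest patch has a single neighbor, so the node has exactly one outgoing arc.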
\begin{figure}[tb]
\centering
\includegraphics[page=3]{AreaAgg_Preliminaries}
\caption{The subdivision graph, $G_\mathrm{S}$.
The nodes of the graph are the subdivisions.
There is an arc from subdivision~\Pnode
to subdivision~${P}_{t+1,j}$
if~${P}_{t+1,j}$ is the result of
an aggregation step from~\Pnode.}
\label{fig:AreaAgg_SubdivisionName}
\end{figure}
\subsection{Notation}
\label{sec:AreaAgg_Notation}
We represent each land-cover area by a polygon with a type.
We denote by~$P$ the set of polygons on the start map.
We use variables~$p$, $q$, $r$, or~$o$ to denote polygons.
A \emph{patch} is a set of polygons whose union is connected.
Each patch also has a unique land-cover type.
We use variables~$u$ or~$v$ to denote the patches.
Recall that we are dealing with a single region and
there are~$n$ land-cover areas on the start map in this region.
Hence, the desired aggregation sequence consists of~$n-1$ steps.
There are~$n$ subdivisions on a path
from the start map to the goal map.
We use~$t \in T =\{1,2,\dots,n\}$ to denote \emph{time}.
When~$t=1$, the subdivision consists of~$n$ patches,
and there is only one patch remaining when~$t=n$.
The subdivision graph consists of layers~${L}_1,\dots,{L}_n$,
where layer~${L}_t=\{{P}_{t,1},\dots,{P}_{t,n_t}\}$
contains every possible subdivision~\Pnode\ with~$n-t+1$ patches
(see \fig\ref{fig:AreaAgg_ExponentialSize}).
\begin{figure}[tb]
\centering
\includegraphics[page=4]{AreaAgg_Preliminaries}
\caption{An example showing that
the size of subdivision graph~$G_\mathrm{S}$
has an exponential lower bound.}
\label{fig:AreaAgg_ExponentialSize}
\end{figure}
Sometimes, we need a graph that represents the adjacencies of
the land-cover areas in a subdivision;
we call such a graph~$G_\mathrm{A}$
(see \fig\ref{fig:AreaAgg_Variables_Graph}).
\begin{figure}[tb]
\centering
\includegraphics[page=3]{AreaAgg_ILP}
\caption{The graph of a subdivision.
Each polygon of the subdivision is represented as a node
in the graph.
There is an edge between two nodes
if the corresponding two polygons are adjacent.
}
\label{fig:AreaAgg_Variables_Graph}
\end{figure}
\subsection{Exponential lower bound}
\label{sec:AreaAgg_Exponential}
We now analyze the size of subdivision graph~$G_\mathrm{S}$.
Our analysis, which is inspired by \textcite{Keane1975Size},
uses a row of~$n$ land-cover areas.
In our instance
(see \fig\ref{fig:AreaAgg_ExponentialSize}),
the start map consists of~$n=2k+1$ rectangular patches,
and the goal map is simply the union of the~$n$ patches.
From left to right on the start map,
the patches have area sizes~$
%\left(
100 + \frac{1}{n}, 99+ \frac{2}{n},
100 + \frac{3}{n}, 99+ \frac{4}{n}, \ldots,
99 + \frac{n-1}{n}, \text{and~} 101
%\right)
$.
According to our rule, we always
aggregate the smallest patch with one of its neighbors.
Therefore, in the first~$k$ steps,
we aggregate every other patch with one of its neighbors.
However, we do not know in advance which choice
at each of these steps
minimizes our costs (see \sect\ref{sec:AreaAgg_CostFunctions}).
We have to try both choices:
aggregating with the left patch or with the right one.
As a result, there are~$2^k = 2^{(n-1)/2}$ subdivisions
in layer~$L_{k+1}$.
That is to say, the size of subdivision graph~$G_\mathrm{S}$
has an exponential lower bound.
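For small instances, this doubling can be verified by brute force. The following sketch (with function names of our own choosing) enumerates the layers of~$G_\mathrm{S}$ for the row instance, representing each subdivision as a left-to-right tuple of patches; for $n=2k+1$, layer~$L_{k+1}$ indeed contains $2^k$ distinct subdivisions.

```python
def start_areas(n):
    # Patch areas of the instance: position i (1-based) has area
    # 100 + i/n if i is odd and 99 + i/n if i is even;
    # the last patch has area 101.
    return [(100 if i % 2 == 1 else 99) + i / n
            for i in range(1, n)] + [101.0]

def successors(state):
    # state: tuple of (area, frozenset of original indices), ordered
    # left to right; merge the smallest patch with its left and with
    # its right neighbor (where these exist).
    s = min(range(len(state)), key=lambda i: state[i][0])
    result = []
    for j in (s - 1, s + 1):
        if 0 <= j < len(state):
            lo, hi = min(s, j), max(s, j)
            merged = (state[lo][0] + state[hi][0],
                      state[lo][1] | state[hi][1])
            result.append(state[:lo] + (merged,) + state[hi + 1:])
    return result

def layer_sizes(n):
    # Breadth-first enumeration of the layers L_1, ..., L_n of G_S.
    layer = {tuple((a, frozenset([i]))
                   for i, a in enumerate(start_areas(n)))}
    sizes = [len(layer)]
    while len(next(iter(layer))) > 1:
        layer = {t for s in layer for t in successors(s)}
        sizes.append(len(layer))
    return sizes
```

For $n=7$ (i.e., $k=3$), the enumeration reports $2^3=8$ subdivisions in layer~$L_4$.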
\subsection{Methods}
\label{sec:AreaAgg_Methods}
Our idea is to obtain an optimal aggregation sequence through
computing a path with minimum weight, from the start to the goal
(see \fig\ref{fig:AreaAgg_SubdivisionName}).
This idea obviously requires that the arc weights are
set such that a minimum-weight start--goal
path does actually correspond to an aggregation sequence of
maximum cartographic quality.
Moreover, putting the idea into practice is far from trivial
since graph~$G_\mathrm{S}$ can be huge.
We compare a greedy algorithm, \Astar, and an ILP-based algorithm
in finding such paths.
Note that our inputs are only
subdivisions~$\Pstart$ and~$\Pgoal$
(see \fig\ref{fig:AreaAgg_SubdivisionName}).
We generate an intermediate subdivision (node) only
when we want to visit it.
In directed acyclic graphs, shortest paths can be found slightly faster
than in general directed or undirected graphs.
An off-the-shelf shortest-path algorithm for directed acyclic graphs
%\parencite[e.g.,][\sect24.2]{Cormen2009},
(e.g., \textcite[\sect24.2]{Cormen2009}),
however, will explore the whole graph, which has exponential size.
The \Astar algorithm can be seen as
a refinement of Dijkstra's algorithm.
For a given source, Dijkstra's algorithm
computes shortest paths to all other nodes in an edge-weighted graph
\parencite{Dijkstra1959}.
Even when using Dijkstra's algorithm
to compute only a single shortest path to a user-specified destination,
a large number of nodes need to be explored.
The same holds for shortest-path algorithms
that make use of a topological order of the nodes
in a directed acyclic graph.
Compared to these algorithms,
the \Astar algorithm can greatly reduce the number of explored nodes.
The challenge in our work is to tune the \Astar algorithm such that
it explores only a small fraction of the graph.
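The overall search scheme can be sketched generically. The code below is a minimal \Astar{} over an implicitly given graph: nodes are generated only upon expansion via a \texttt{successors} callback, mirroring how we generate intermediate subdivisions only when visited. The function names are placeholders, not our actual implementation; optimality requires that the heuristic \texttt{h} never overestimates the remaining cost.

```python
import heapq
from itertools import count

def astar(start, is_goal, successors, h):
    """Minimum-weight path search over an implicit graph.

    successors(state) yields (next_state, arc_weight) pairs and is
    called only when a node is expanded, so the graph is never
    materialized.  With an admissible h, the returned cost is the
    weight of a minimum-weight start--goal path (None if none exists)."""
    tie = count()                        # breaks ties among equal f-values
    open_heap = [(h(start), next(tie), 0.0, start)]
    best_g = {start: 0.0}
    while open_heap:
        _, _, g, state = heapq.heappop(open_heap)
        if is_goal(state):
            return g
        if g > best_g.get(state, float("inf")):
            continue                     # stale heap entry, skip
        for nxt, w in successors(state):
            g_new = g + w
            if g_new < best_g.get(nxt, float("inf")):
                best_g[nxt] = g_new
                heapq.heappush(open_heap,
                               (g_new + h(nxt), next(tie), g_new, nxt))
    return None                          # goal unreachable
```

With $h \equiv 0$ the procedure degenerates to Dijkstra's algorithm; the quality of $h$ determines how small a fraction of the graph is explored.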
\section{Cost Functions}
\label{sec:AreaAgg_CostFunctions}
\fig\ref{fig:AreaAgg_SubdivisionName} shows that there are many
ways to aggregate from the start map to the goal map.
Clearly, some of the ways are more reasonable than others.
In our example, we consider
sequence~$(P_{1,1}, P_{2,1},P_{3,1},P_{4,1})$
more reasonable than
sequence~$(P_{1,1}, P_{2,4},P_{3,5},P_{4,1})$.
This is because the dark area should not expand
when the target color is light gray.
We want to provide map users with a most reasonable sequence
because we believe that an unreasonable sequence irritates users.
To find a most reasonable sequence, we introduce cost functions.
In the cost functions, we charge a higher penalty
when an aggregation step is less reasonable.
As a result, by minimizing the overall cost of an aggregation
sequence, we find a most reasonable sequence.
It is difficult to define \emph{reasonableness}
because users may have different preferences.
Four preferences have been discussed by
\textcite{Cheng2006}; see \fig\ref{fig:AreaAgg_Preferences}.
A small land-cover area can be aggregated into another area
that isolates the area
(\fig\ref{fig:AreaAgg_Preferences}b),
that is the largest neighbor
(\fig\ref{fig:AreaAgg_Preferences}c),
that shares the longest boundary
(\fig\ref{fig:AreaAgg_Preferences}d), or
that has the most similar type
(\fig\ref{fig:AreaAgg_Preferences}e).
To keep our aggregation problem independent of users' preferences,
our cost function takes two aspects into account:
one based on semantics and
the other based on shape.
%
In terms of semantics, we require that
the \emph{type} of a land-cover area changes
as little as possible.
This requirement means that we prefer, for example,
aggregating an area with type \emph{swamp}
into an area with type \emph{wet ground}
rather than into an area with type \emph{city}.
%
In terms of shape, we hope to have areas
which are as \emph{compact} as possible.
Our argument is that a more compact area
is easier for a human to identify
(less clutter).
%
We also consider the total \emph{length}
of the interior boundaries as an alternative compactness measure;
we consider subdivision~\Pnode
more compact than subdivision~$P_{t,i'}$
if the total length of the interior boundaries of~\Pnode
is less than that of~$P_{t,i'}$.
We add this alternative because we want to make a comparison
involving an ILP,
where a \emph{linear} cost function must be used.
Note that most compactness measures
are \emph{not} linear;
for example, see \textcite{Maceachren1985,Li2013Compactness}.
Although the length of interior boundaries
is not sufficiently precise
to describe compactness \parencite{Young1988},
it is often used as a fair alternative
when compactness is considered in an ILP
\parencite[e.g.,][]{Minas2016,Wright1983}.
The ILP of \textcite{HaunertWolff2010AreaAgg} employed
the centroids of a set of land-cover areas.
One of their costs is the sum of the distances
from the centroids to a \emph{reference point}.
The reference point is one of the centroids
that minimizes the sum.
The sum of the distances can be expressed as a linear function.
We use the length of interior boundaries
instead of the distance of centroids because the former
is more relevant to the shapes of the patches.
Furthermore, \textcite{Harrie2015Readability} showed that
longer lines generally yielded lower map readability.
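As an aside, the total interior boundary length need not be measured edge by edge: every interior boundary is shared by exactly two patches and is thus counted twice in the sum of the patch perimeters, whereas the region's outer boundary is counted once. A minimal sketch of this identity (the helper name is ours):

```python
def interior_length(patch_perimeters, region_perimeter):
    # Every interior boundary separates exactly two patches and is
    # therefore counted twice in the sum of patch perimeters, while
    # the region's outer boundary is counted once.
    return (sum(patch_perimeters) - region_perimeter) / 2
```

For example, a $2 \times 1$ region consisting of two unit squares has patch perimeters $4 + 4 = 8$ and region perimeter $6$, giving an interior boundary length of $1$.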
\begin{figure}[tb]
\centering
\includegraphics[page=1]{AreaAgg_CostsAndEstimations}
\caption{Aggregating land-cover areas
according to different preferences
proposed by \textcite{Cheng2006}:
Aggregating a small land-cover area into another one
that isolates the area (b),
that is the largest neighbor (c),
that shares the longest boundary (d), or
that has the most similar type (e).}
\label{fig:AreaAgg_Preferences}
\end{figure}
\subsection{Cost of type change}
\label{sec:AreaAgg_f_type}
Suppose that we are at the step of aggregating
from subdivision~$P_{s,i}$ to subdivision~$P_{s+1,j}$.
In this step, patch~$u$ is aggregated into patch~$v$
(see \figs\ref{fig:AreaAgg_FirstStep}a
and \ref{fig:AreaAgg_FirstStep}b).
We denote the types of the two patches by~$T(u)$ and~$T(v)$.
We define the cost of type change of this step by
\begin{equation}
\label{eq:f_type}
f_\mathrm{type}(P_{s,i},P_{s+1,j})=\frac{A_{u}}{A_R}
\cdot
\frac{d_\mathrm{type}(T(u),T(v))}{d_\mathrm{type\_max}},
\end{equation}
where~$A_u$ is the area of patch~$u$,
and~$A_R$ is the area of region~$R$
(see \sect\ref{sec:AreaAgg_Preliminaries}).
We use~$A_R$ and~$d_\mathrm{type\_max}$
to normalize the cost of type change.
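For illustration, the cost of \eq\eqref{eq:f_type} can be evaluated for every candidate neighbor of the smallest patch as follows. This is a sketch with data structures and names of our own choosing; the type distance~$d_\mathrm{type}$ is assumed to be given as a lookup table.

```python
def type_change_cost(patches, adj, area_R, d_type, d_type_max):
    """Evaluate f_type for aggregating the smallest patch u
    into each of its neighbors v.

    patches:    {name: (area, land-cover type)}
    adj:        {name: set of neighboring patch names}
    d_type:     {(type, type): semantic distance}"""
    u = min(patches, key=lambda p: patches[p][0])
    area_u, type_u = patches[u]
    # Both factors are normalized: by the region's area A_R and by
    # the maximum type distance d_type_max.
    return {v: (area_u / area_R)
               * (d_type[(type_u, patches[v][1])] / d_type_max)
            for v in adj[u]}
```

Aggregating a small \emph{swamp} patch into \emph{wet ground} then costs less than aggregating it into \emph{city}, in line with our semantic requirement.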