<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8">
<title>LLVM Programmer's Manual</title>
<link rel="stylesheet" href="llvm.css" type="text/css">
</head>
<body>
<h1>
LLVM Programmer's Manual
</h1>
<ol>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#general">General Information</a>
<ul>
<li><a href="#stl">The C++ Standard Template Library</a></li>
<!--
<li>The <tt>-time-passes</tt> option</li>
<li>How to use the LLVM Makefile system</li>
<li>How to write a regression test</li>
-->
</ul>
</li>
<li><a href="#apis">Important and useful LLVM APIs</a>
<ul>
<li><a href="#isa">The <tt>isa<></tt>, <tt>cast<></tt>
and <tt>dyn_cast<></tt> templates</a> </li>
<li><a href="#string_apis">Passing strings (the <tt>StringRef</tt>
and <tt>Twine</tt> classes)</a>
<ul>
<li><a href="#StringRef">The <tt>StringRef</tt> class</a> </li>
<li><a href="#Twine">The <tt>Twine</tt> class</a> </li>
</ul>
</li>
<li><a href="#DEBUG">The <tt>DEBUG()</tt> macro and <tt>-debug</tt>
option</a>
<ul>
<li><a href="#DEBUG_TYPE">Fine grained debug info with <tt>DEBUG_TYPE</tt>
and the <tt>-debug-only</tt> option</a> </li>
</ul>
</li>
<li><a href="#Statistic">The <tt>Statistic</tt> class & <tt>-stats</tt>
option</a></li>
<!--
<li>The <tt>InstVisitor</tt> template
<li>The general graph API
-->
<li><a href="#ViewGraph">Viewing graphs while debugging code</a></li>
</ul>
</li>
<li><a href="#datastructure">Picking the Right Data Structure for a Task</a>
<ul>
<li><a href="#ds_sequential">Sequential Containers (std::vector, std::list, etc)</a>
<ul>
<li><a href="#dss_arrayref">llvm/ADT/ArrayRef.h</a></li>
<li><a href="#dss_fixedarrays">Fixed Size Arrays</a></li>
<li><a href="#dss_heaparrays">Heap Allocated Arrays</a></li>
<li><a href="#dss_tinyptrvector">"llvm/ADT/TinyPtrVector.h"</a></li>
<li><a href="#dss_smallvector">"llvm/ADT/SmallVector.h"</a></li>
<li><a href="#dss_vector">&lt;vector&gt;</a></li>
<li><a href="#dss_deque">&lt;deque&gt;</a></li>
<li><a href="#dss_list">&lt;list&gt;</a></li>
<li><a href="#dss_ilist">llvm/ADT/ilist.h</a></li>
<li><a href="#dss_packedvector">llvm/ADT/PackedVector.h</a></li>
<li><a href="#dss_other">Other Sequential Container Options</a></li>
</ul></li>
<li><a href="#ds_string">String-like containers</a>
<ul>
<li><a href="#dss_stringref">llvm/ADT/StringRef.h</a></li>
<li><a href="#dss_twine">llvm/ADT/Twine.h</a></li>
<li><a href="#dss_smallstring">llvm/ADT/SmallString.h</a></li>
<li><a href="#dss_stdstring">std::string</a></li>
</ul></li>
<li><a href="#ds_set">Set-Like Containers (std::set, SmallSet, SetVector, etc)</a>
<ul>
<li><a href="#dss_sortedvectorset">A sorted 'vector'</a></li>
<li><a href="#dss_smallset">"llvm/ADT/SmallSet.h"</a></li>
<li><a href="#dss_smallptrset">"llvm/ADT/SmallPtrSet.h"</a></li>
<li><a href="#dss_denseset">"llvm/ADT/DenseSet.h"</a></li>
<li><a href="#dss_FoldingSet">"llvm/ADT/FoldingSet.h"</a></li>
<li><a href="#dss_set">&lt;set&gt;</a></li>
<li><a href="#dss_setvector">"llvm/ADT/SetVector.h"</a></li>
<li><a href="#dss_uniquevector">"llvm/ADT/UniqueVector.h"</a></li>
<li><a href="#dss_otherset">Other Set-Like Container Options</a></li>
</ul></li>
<li><a href="#ds_map">Map-Like Containers (std::map, DenseMap, etc)</a>
<ul>
<li><a href="#dss_sortedvectormap">A sorted 'vector'</a></li>
<li><a href="#dss_stringmap">"llvm/ADT/StringMap.h"</a></li>
<li><a href="#dss_indexedmap">"llvm/ADT/IndexedMap.h"</a></li>
<li><a href="#dss_densemap">"llvm/ADT/DenseMap.h"</a></li>
<li><a href="#dss_valuemap">"llvm/ADT/ValueMap.h"</a></li>
<li><a href="#dss_intervalmap">"llvm/ADT/IntervalMap.h"</a></li>
<li><a href="#dss_map">&lt;map&gt;</a></li>
<li><a href="#dss_inteqclasses">"llvm/ADT/IntEqClasses.h"</a></li>
<li><a href="#dss_othermap">Other Map-Like Container Options</a></li>
</ul></li>
<li><a href="#ds_bit">BitVector-like containers</a>
<ul>
<li><a href="#dss_bitvector">A dense bitvector</a></li>
<li><a href="#dss_smallbitvector">A "small" dense bitvector</a></li>
<li><a href="#dss_sparsebitvector">A sparse bitvector</a></li>
</ul></li>
</ul>
</li>
<li><a href="#common">Helpful Hints for Common Operations</a>
<ul>
<li><a href="#inspection">Basic Inspection and Traversal Routines</a>
<ul>
<li><a href="#iterate_function">Iterating over the <tt>BasicBlock</tt>s
in a <tt>Function</tt></a> </li>
<li><a href="#iterate_basicblock">Iterating over the <tt>Instruction</tt>s
in a <tt>BasicBlock</tt></a> </li>
<li><a href="#iterate_institer">Iterating over the <tt>Instruction</tt>s
in a <tt>Function</tt></a> </li>
<li><a href="#iterate_convert">Turning an iterator into a
class pointer</a> </li>
<li><a href="#iterate_complex">Finding call sites: a more
complex example</a> </li>
<li><a href="#calls_and_invokes">Treating calls and invokes
the same way</a> </li>
<li><a href="#iterate_chains">Iterating over def-use &
use-def chains</a> </li>
<li><a href="#iterate_preds">Iterating over predecessors &
successors of blocks</a></li>
</ul>
</li>
<li><a href="#simplechanges">Making simple changes</a>
<ul>
<li><a href="#schanges_creating">Creating and inserting new
<tt>Instruction</tt>s</a> </li>
<li><a href="#schanges_deleting">Deleting <tt>Instruction</tt>s</a> </li>
<li><a href="#schanges_replacing">Replacing an <tt>Instruction</tt>
with another <tt>Value</tt></a> </li>
<li><a href="#schanges_deletingGV">Deleting <tt>GlobalVariable</tt>s</a> </li>
</ul>
</li>
<li><a href="#create_types">How to Create Types</a></li>
<!--
<li>Working with the Control Flow Graph
<ul>
<li>Accessing predecessors and successors of a <tt>BasicBlock</tt>
<li>
<li>
</ul>
-->
</ul>
</li>
<li><a href="#threading">Threads and LLVM</a>
<ul>
<li><a href="#startmultithreaded">Entering and Exiting Multithreaded Mode
</a></li>
<li><a href="#shutdown">Ending execution with <tt>llvm_shutdown()</tt></a></li>
<li><a href="#managedstatic">Lazy initialization with <tt>ManagedStatic</tt></a></li>
<li><a href="#llvmcontext">Achieving Isolation with <tt>LLVMContext</tt></a></li>
<li><a href="#jitthreading">Threads and the JIT</a></li>
</ul>
</li>
<li><a href="#advanced">Advanced Topics</a>
<ul>
<li><a href="#SymbolTable">The <tt>ValueSymbolTable</tt> class</a></li>
<li><a href="#UserLayout">The <tt>User</tt> and owned <tt>Use</tt> classes' memory layout</a></li>
</ul></li>
<li><a href="#coreclasses">The Core LLVM Class Hierarchy Reference</a>
<ul>
<li><a href="#Type">The <tt>Type</tt> class</a> </li>
<li><a href="#Module">The <tt>Module</tt> class</a></li>
<li><a href="#Value">The <tt>Value</tt> class</a>
<ul>
<li><a href="#User">The <tt>User</tt> class</a>
<ul>
<li><a href="#Instruction">The <tt>Instruction</tt> class</a></li>
<li><a href="#Constant">The <tt>Constant</tt> class</a>
<ul>
<li><a href="#GlobalValue">The <tt>GlobalValue</tt> class</a>
<ul>
<li><a href="#Function">The <tt>Function</tt> class</a></li>
<li><a href="#GlobalVariable">The <tt>GlobalVariable</tt> class</a></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li><a href="#BasicBlock">The <tt>BasicBlock</tt> class</a></li>
<li><a href="#Argument">The <tt>Argument</tt> class</a></li>
</ul>
</li>
</ul>
</li>
</ol>
<div class="doc_author">
<p>Written by <a href="mailto:[email protected]">Chris Lattner</a>,
<a href="mailto:[email protected]">Dinakar Dhurjati</a>,
<a href="mailto:[email protected]">Gabor Greif</a>,
<a href="mailto:[email protected]">Joel Stanley</a>,
<a href="mailto:[email protected]">Reid Spencer</a> and
<a href="mailto:[email protected]">Owen Anderson</a></p>
</div>
<!-- *********************************************************************** -->
<h2>
<a name="introduction">Introduction </a>
</h2>
<!-- *********************************************************************** -->
<div>
<p>This document is meant to highlight some of the important classes and
interfaces available in the LLVM source-base. This manual is not
intended to explain what LLVM is, how it works, or what LLVM code looks
like. It assumes that you know the basics of LLVM and are interested
in writing transformations or otherwise analyzing or manipulating the
code.</p>
<p>This document should get you oriented so that you can find your
way in the continuously growing source code that makes up the LLVM
infrastructure. Note that this manual is not intended to serve as a
replacement for reading the source code, so if you think there should be
a method in one of these classes to do something, but it's not listed,
check the source. Links to the <a href="/doxygen/">doxygen</a> sources
are provided to make this as easy as possible.</p>
<p>The first section of this document describes general information that is
useful to know when working in the LLVM infrastructure, and the second describes
the Core LLVM classes. In the future this manual will be extended with
information describing how to use extension libraries, such as dominator
information, CFG traversal routines, and useful utilities like the <tt><a
href="/doxygen/InstVisitor_8h-source.html">InstVisitor</a></tt> template.</p>
</div>
<!-- *********************************************************************** -->
<h2>
<a name="general">General Information</a>
</h2>
<!-- *********************************************************************** -->
<div>
<p>This section contains general information that is useful if you are working
in the LLVM source-base, but that isn't specific to any particular API.</p>
<!-- ======================================================================= -->
<h3>
<a name="stl">The C++ Standard Template Library</a>
</h3>
<div>
<p>LLVM makes heavy use of the C++ Standard Template Library (STL),
perhaps much more than you are used to, or have seen before. Because of
this, you might want to do a little background reading in the
techniques used and capabilities of the library. There are many good
pages that discuss the STL, and several books on the subject that you
can get, so it will not be discussed in this document.</p>
<p>Here are some useful links:</p>
<ol>
<li><a href="http://www.dinkumware.com/manuals/#Standard C++ Library">Dinkumware
C++ Library reference</a> - an excellent reference for the STL and other parts
of the standard C++ library.</li>
<li><a href="http://www.tempest-sw.com/cpp/">C++ In a Nutshell</a> - This is an
O'Reilly book in the making. It has a decent Standard Library
Reference that rivals Dinkumware's, and is unfortunately no longer free since the
book has been published.</li>
<li><a href="http://www.parashift.com/c++-faq-lite/">C++ Frequently Asked
Questions</a></li>
<li><a href="http://www.sgi.com/tech/stl/">SGI's STL Programmer's Guide</a> -
Contains a useful <a
href="http://www.sgi.com/tech/stl/stl_introduction.html">Introduction to the
STL</a>.</li>
<li><a href="http://www.research.att.com/%7Ebs/C++.html">Bjarne Stroustrup's C++
Page</a></li>
<li><a href="http://64.78.49.204/">
Bruce Eckel's Thinking in C++, 2nd ed. Volume 2 Revision 4.0 (even better, get
the book).</a></li>
</ol>
<p>You are also encouraged to take a look at the <a
href="CodingStandards.html">LLVM Coding Standards</a> guide which focuses on how
to write maintainable code more than where to put your curly braces.</p>
</div>
<!-- ======================================================================= -->
<h3>
<a name="stl">Other useful references</a>
</h3>
<div>
<ol>
<li><a href="http://www.fortran-2000.com/ArnaudRecipes/sharedlib.html">Using
static and shared libraries across platforms</a></li>
</ol>
</div>
</div>
<!-- *********************************************************************** -->
<h2>
<a name="apis">Important and useful LLVM APIs</a>
</h2>
<!-- *********************************************************************** -->
<div>
<p>Here we highlight some LLVM APIs that are generally useful and good to
know about when writing transformations.</p>
<!-- ======================================================================= -->
<h3>
<a name="isa">The <tt>isa<></tt>, <tt>cast<></tt> and
<tt>dyn_cast<></tt> templates</a>
</h3>
<div>
<p>The LLVM source-base makes extensive use of a custom form of RTTI.
These templates have many similarities to the C++ <tt>dynamic_cast<></tt>
operator, but they don't have some drawbacks (primarily stemming from
the fact that <tt>dynamic_cast<></tt> only works on classes that
have a v-table). Because they are used so often, you must know what they
do and how they work. All of these templates are defined in the <a
href="/doxygen/Casting_8h-source.html"><tt>llvm/Support/Casting.h</tt></a>
file (note that you very rarely have to include this file directly).</p>
<dl>
<dt><tt>isa<></tt>: </dt>
<dd><p>The <tt>isa<></tt> operator works exactly like the Java
"<tt>instanceof</tt>" operator. It returns true or false depending on whether
a reference or pointer points to an instance of the specified class. This can
be very useful for constraint checking of various sorts (example below).</p>
</dd>
<dt><tt>cast<></tt>: </dt>
<dd><p>The <tt>cast<></tt> operator is a "checked cast" operation. It
converts a pointer or reference from a base class to a derived class, causing
an assertion failure if it is not really an instance of the right type. This
should be used in cases where you have some information that makes you believe
that something is of the right type. An example of the <tt>isa<></tt>
and <tt>cast<></tt> template is:</p>
<div class="doc_code">
<pre>
static bool isLoopInvariant(const <a href="#Value">Value</a> *V, const Loop *L) {
if (isa<<a href="#Constant">Constant</a>>(V) || isa<<a href="#Argument">Argument</a>>(V) || isa<<a href="#GlobalValue">GlobalValue</a>>(V))
return true;
// <i>Otherwise, it must be an instruction...</i>
return !L->contains(cast<<a href="#Instruction">Instruction</a>>(V)->getParent());
}
</pre>
</div>
<p>Note that you should <b>not</b> use an <tt>isa<></tt> test followed
by a <tt>cast<></tt>; for that, use the <tt>dyn_cast<></tt>
operator.</p>
</dd>
<dt><tt>dyn_cast<></tt>:</dt>
<dd><p>The <tt>dyn_cast<></tt> operator is a "checking cast" operation.
It checks to see if the operand is of the specified type, and if so, returns a
pointer to it (this operator does not work with references). If the operand is
not of the correct type, a null pointer is returned. Thus, this works very
much like the <tt>dynamic_cast<></tt> operator in C++, and should be
used in the same circumstances. Typically, the <tt>dyn_cast<></tt>
operator is used in an <tt>if</tt> statement or some other flow control
statement like this:</p>
<div class="doc_code">
<pre>
if (<a href="#AllocationInst">AllocationInst</a> *AI = dyn_cast<<a href="#AllocationInst">AllocationInst</a>>(Val)) {
// <i>...</i>
}
</pre>
</div>
<p>This form of the <tt>if</tt> statement effectively combines together a call
to <tt>isa<></tt> and a call to <tt>cast<></tt> into one
statement, which is very convenient.</p>
<p>Note that the <tt>dyn_cast<></tt> operator, like C++'s
<tt>dynamic_cast<></tt> or Java's <tt>instanceof</tt> operator, can be
abused. In particular, you should not use big chained <tt>if/then/else</tt>
blocks to check for lots of different variants of classes. If you find
yourself wanting to do this, it is much cleaner and more efficient to use the
<tt>InstVisitor</tt> class to dispatch over the instruction type directly.</p>
</dd>
<dt><tt>cast_or_null<></tt>: </dt>
<dd><p>The <tt>cast_or_null<></tt> operator works just like the
<tt>cast<></tt> operator, except that it allows for a null pointer as an
argument (which it then propagates). This can sometimes be useful, allowing
you to combine several null checks into one.</p></dd>
<dt><tt>dyn_cast_or_null<></tt>: </dt>
<dd><p>The <tt>dyn_cast_or_null<></tt> operator works just like the
<tt>dyn_cast<></tt> operator, except that it allows for a null pointer
as an argument (which it then propagates). This can sometimes be useful,
allowing you to combine several null checks into one.</p></dd>
</dl>
<p>These five templates can be used with any classes, whether they have a
v-table or not. To add support for these templates, you simply need to add
<tt>classof</tt> static methods to the class you are interested in casting
to. Describing this is currently outside the scope of this document, but there
are lots of examples in the LLVM source base.</p>
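<p>As a rough illustration of what <tt>classof</tt> support can look like, the
following sketch builds a toy hierarchy and simplified stand-ins for three of
the templates. Every name in it (<tt>Shape</tt>, <tt>Circle</tt>,
<tt>Square</tt> and the <tt>toy_</tt> functions) is hypothetical, and the
stand-ins accept only pointers; the real implementation in
<tt>llvm/Support/Casting.h</tt> is considerably more general.</p>

```cpp
#include <cassert>

// Toy hierarchy: the base class records the dynamic kind of each object.
class Shape {
public:
  enum ShapeKind { SK_Circle, SK_Square };
  explicit Shape(ShapeKind K) : Kind(K) {}
  ShapeKind getKind() const { return Kind; }

private:
  const ShapeKind Kind;
};

class Circle : public Shape {
public:
  Circle() : Shape(SK_Circle) {}
  // classof answers the question "is this Shape really a Circle?"
  static bool classof(const Shape *S) { return S->getKind() == SK_Circle; }
};

class Square : public Shape {
public:
  Square() : Shape(SK_Square) {}
  static bool classof(const Shape *S) { return S->getKind() == SK_Square; }
};

// Simplified stand-in for the type test: defers to the target's classof.
template <typename To, typename From> bool toy_isa(const From *V) {
  return To::classof(V);
}

// Simplified checked cast: asserts if the object has the wrong kind.
template <typename To, typename From> To *toy_cast(From *V) {
  assert(toy_isa<To>(V) && "toy_cast<>() argument of incompatible type!");
  return static_cast<To *>(V);
}

// Simplified checking cast: returns a null pointer on a kind mismatch.
template <typename To, typename From> To *toy_dyn_cast(From *V) {
  return toy_isa<To>(V) ? static_cast<To *>(V) : nullptr;
}
```

<p>None of the toy classes has a v-table; the kind tag stored in the base class
carries all of the type information that the three functions consult.</p>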
</div>
<!-- ======================================================================= -->
<h3>
<a name="string_apis">Passing strings (the <tt>StringRef</tt>
and <tt>Twine</tt> classes)</a>
</h3>
<div>
<p>Although LLVM generally does not do much string manipulation, we do have
several important APIs which take strings. Two important examples are the
Value class -- which has names for instructions, functions, etc. -- and the
StringMap class which is used extensively in LLVM and Clang.</p>
<p>These are generic classes, and they need to be able to accept strings which
may have embedded null characters. Therefore, they cannot simply take
a <tt>const char *</tt>, and taking a <tt>const std::string&</tt> requires
clients to perform a heap allocation which is usually unnecessary. Instead,
many LLVM APIs use a <tt>StringRef</tt> or a <tt>const Twine&</tt> for
passing strings efficiently.</p>
<!-- _______________________________________________________________________ -->
<h4>
<a name="StringRef">The <tt>StringRef</tt> class</a>
</h4>
<div>
<p>The <tt>StringRef</tt> data type represents a reference to a constant string
(a character array and a length) and supports the common operations available
on <tt>std::string</tt>, but does not require heap allocation.</p>
<p>It can be implicitly constructed using a C style null-terminated string,
an <tt>std::string</tt>, or explicitly with a character pointer and length.
For example, the <tt>StringMap</tt> find function is declared as:</p>
<pre class="doc_code">
iterator find(StringRef Key);
</pre>
<p>and clients can call it using any one of:</p>
<pre class="doc_code">
Map.find("foo"); <i>// Lookup "foo"</i>
Map.find(std::string("bar")); <i>// Lookup "bar"</i>
Map.find(StringRef("\0baz", 4)); <i>// Lookup "\0baz"</i>
</pre>
<p>Similarly, APIs which need to return a string may return a <tt>StringRef</tt>
instance, which can be used directly or converted to an <tt>std::string</tt>
using the <tt>str</tt> member function. See
"<tt><a href="/doxygen/classllvm_1_1StringRef_8h-source.html">llvm/ADT/StringRef.h</a></tt>"
for more information.</p>
<p>You should rarely use the <tt>StringRef</tt> class directly: because it contains
pointers to external memory, it is not generally safe to store an instance of the
class (unless you know that the external storage will not be freed). StringRef is
small and pervasive enough in LLVM that it should always be passed by value.</p>
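<p>To make the "character array and a length" idea concrete, here is a minimal
sketch of a non-owning string reference. <tt>ToyStringRef</tt> and
<tt>lengthOf</tt> are hypothetical names used only for illustration; this is
not the <tt>StringRef</tt> implementation, and it supports far fewer
operations.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical miniature of a non-owning string reference.
class ToyStringRef {
  const char *Data;
  std::size_t Length;

public:
  // Implicit from a C string: the length stops at the first '\0'.
  ToyStringRef(const char *Str) : Data(Str), Length(std::strlen(Str)) {}
  // Implicit from std::string: no copy, just a pointer into its buffer.
  ToyStringRef(const std::string &Str) : Data(Str.data()), Length(Str.size()) {}
  // Explicit pointer and length: embedded null characters are preserved.
  ToyStringRef(const char *Str, std::size_t Len) : Data(Str), Length(Len) {}

  std::size_t size() const { return Length; }

  char operator[](std::size_t Index) const {
    assert(Index < Length && "index out of range");
    return Data[Index];
  }

  // Copy the referenced characters into an owning std::string.
  std::string str() const { return std::string(Data, Length); }
};

// Taken by value: the object is only a pointer and a length.
std::size_t lengthOf(ToyStringRef S) { return S.size(); }
```

<p>Because the object only points at memory owned by someone else, keeping a
<tt>ToyStringRef</tt> after the original buffer has been destroyed would leave
it dangling, which is the hazard described above.</p>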
</div>
<!-- _______________________________________________________________________ -->
<h4>
<a name="Twine">The <tt>Twine</tt> class</a>
</h4>
<div>
<p>The <tt>Twine</tt> class is an efficient way for APIs to accept concatenated
strings. For example, a common LLVM paradigm is to name one instruction based on
the name of another instruction with a suffix, for example:</p>
<div class="doc_code">
<pre>
New = CmpInst::Create(<i>...</i>, SO->getName() + ".cmp");
</pre>
</div>
<p>The <tt>Twine</tt> class is effectively a
lightweight <a href="http://en.wikipedia.org/wiki/Rope_(computer_science)">rope</a>
which points to temporary (stack allocated) objects. Twines can be implicitly
constructed as the result of the plus operator applied to strings (i.e., a C
string, an <tt>std::string</tt>, or a <tt>StringRef</tt>). The twine delays the
actual concatenation of strings until it is actually required, at which point
it can be efficiently rendered directly into a character array. This avoids
unnecessary heap allocation involved in constructing the temporary results of
string concatenation. See
"<tt><a href="/doxygen/classllvm_1_1Twine_8h-source.html">llvm/ADT/Twine.h</a></tt>"
for more information.</p>
<p>As with a <tt>StringRef</tt>, <tt>Twine</tt> objects point to external memory
and should almost never be stored or mentioned directly. They are intended
solely for use when defining a function which should be able to efficiently
accept concatenated strings.</p>
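<p>The delayed concatenation can be sketched with a toy class that merely
records its operands and renders them once on request. <tt>ToyTwine</tt> and
<tt>makeName</tt> are hypothetical names; the sketch assumes nothing about how
<tt>Twine</tt> is actually implemented and omits most of its features.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical rope node: either a leaf (characters) or a pair of children.
// It owns nothing and only points at its operands.
class ToyTwine {
  const ToyTwine *LHS = nullptr, *RHS = nullptr; // children of a concat node
  const char *Data = "";                         // characters of a leaf
  std::size_t Length = 0;

  ToyTwine(const ToyTwine &L, const ToyTwine &R) : LHS(&L), RHS(&R) {}

public:
  ToyTwine(const char *Str) : Data(Str), Length(std::strlen(Str)) {}
  ToyTwine(const std::string &Str) : Data(Str.data()), Length(Str.size()) {}

  // Record the two operands instead of concatenating them.
  friend ToyTwine operator+(const ToyTwine &L, const ToyTwine &R) {
    return ToyTwine(L, R);
  }

  std::size_t size() const { return LHS ? LHS->size() + RHS->size() : Length; }

  // Append the characters, left to right, to Out.
  void appendTo(std::string &Out) const {
    if (LHS) {
      LHS->appendTo(Out);
      RHS->appendTo(Out);
    } else {
      Out.append(Data, Length);
    }
  }

  // The only allocation happens here, when the text is finally needed.
  std::string str() const {
    std::string Out;
    Out.reserve(size());
    appendTo(Out);
    assert(Out.size() == size() && "rendered length must match");
    return Out;
  }
};

// A sink that accepts the twine by const reference and renders it once.
std::string makeName(const ToyTwine &Name) { return Name.str(); }
```

<p>A call such as <tt>makeName(ToyTwine(Base) + ".cmp")</tt> is safe because
every temporary lives until the end of the full expression. Storing the result
of the <tt>+</tt> in a variable and using it later would leave it pointing at
destroyed temporaries.</p>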
</div>
</div>
<!-- ======================================================================= -->
<h3>
<a name="DEBUG">The <tt>DEBUG()</tt> macro and <tt>-debug</tt> option</a>
</h3>
<div>
<p>Often when working on your pass you will put a bunch of debugging printouts
and other code into your pass. After you get it working, you want to remove
it, but you may need it again in the future (to work out new bugs that you run
across).</p>
<p> Naturally, because of this, you don't want to delete the debug printouts,
but you don't want them to always be noisy. A standard compromise is to comment
them out, allowing you to enable them if you need them in the future.</p>
<p>The "<tt><a href="/doxygen/Debug_8h-source.html">llvm/Support/Debug.h</a></tt>"
file provides a macro named <tt>DEBUG()</tt> that is a much nicer solution to
this problem. Basically, you can put arbitrary code into the argument of the
<tt>DEBUG</tt> macro, and it is only executed if '<tt>opt</tt>' (or any other
tool) is run with the '<tt>-debug</tt>' command line argument:</p>
<div class="doc_code">
<pre>
DEBUG(errs() << "I am here!\n");
</pre>
</div>
<p>Then you can run your pass like this:</p>
<div class="doc_code">
<pre>
$ opt < a.bc > /dev/null -mypass
<i>&lt;no output&gt;</i>
$ opt < a.bc > /dev/null -mypass -debug
I am here!
</pre>
</div>
<p>Using the <tt>DEBUG()</tt> macro instead of a home-brewed solution allows you
to not have to create "yet another" command line option for the debug output for
your pass. Note that <tt>DEBUG()</tt> macros are disabled for optimized builds,
so they do not cause a performance impact at all (for the same reason, they
should also not contain side-effects!).</p>
<p>One additional nice thing about the <tt>DEBUG()</tt> macro is that you can
enable or disable it directly in gdb. Just use "<tt>set DebugFlag=0</tt>" or
"<tt>set DebugFlag=1</tt>" from gdb if the program is running. If the
program hasn't been started yet, you can always just run it with
<tt>-debug</tt>.</p>
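<p>The mechanism can be approximated in a few lines. The sketch below is a
simplified stand-in, not the contents of <tt>llvm/Support/Debug.h</tt>:
<tt>TOY_DEBUG</tt>, <tt>ToyDebugFlag</tt>, <tt>ToyErrs</tt> and
<tt>countEvens</tt> are hypothetical names, and a string stream takes the place
of <tt>errs()</tt> so that the output can be inspected.</p>

```cpp
#include <cassert>
#include <sstream>

bool ToyDebugFlag = false;  // stand-in for the flag that -debug would set
std::ostringstream ToyErrs; // stand-in for errs(), captured for inspection

#ifndef NDEBUG
// Debug-capable build: run X only while the flag is on.
#define TOY_DEBUG(X)                                                           \
  do {                                                                         \
    if (ToyDebugFlag) { X; }                                                   \
  } while (false)
#else
// Optimized build: X is dropped entirely, so it must not contain side
// effects that the rest of the program relies on.
#define TOY_DEBUG(X) do { } while (false)
#endif

// A pretend pass: counts the even numbers in [0, N) and logs each one
// when debugging output is enabled.
int countEvens(int N) {
  assert(N >= 0 && "N must be non-negative");
  int Count = 0;
  for (int I = 0; I < N; ++I) {
    if (I % 2 == 0) {
      TOY_DEBUG(ToyErrs << "even: " << I << "\n");
      ++Count;
    }
  }
  return Count;
}
```

<p>The result of <tt>countEvens</tt> is the same whether or not the flag is set;
only the logging differs, which is the property that makes it safe to compile
the macro away.</p>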
<!-- _______________________________________________________________________ -->
<h4>
<a name="DEBUG_TYPE">Fine grained debug info with <tt>DEBUG_TYPE</tt> and
the <tt>-debug-only</tt> option</a>
</h4>
<div>
<p>Sometimes you may find yourself in a situation where enabling <tt>-debug</tt>
just turns on <b>too much</b> information (such as when working on the code
generator). If you want to enable debug information with more fine-grained
control, you can define the <tt>DEBUG_TYPE</tt> macro and use the
<tt>-debug-only</tt> option as follows:</p>
<div class="doc_code">
<pre>
#undef DEBUG_TYPE
DEBUG(errs() << "No debug type\n");
#define DEBUG_TYPE "foo"
DEBUG(errs() << "'foo' debug type\n");
#undef DEBUG_TYPE
#define DEBUG_TYPE "bar"
DEBUG(errs() << "'bar' debug type\n");
#undef DEBUG_TYPE
#define DEBUG_TYPE ""
DEBUG(errs() << "No debug type (2)\n");
</pre>
</div>
<p>Then you can run your pass like this:</p>
<div class="doc_code">
<pre>
$ opt < a.bc > /dev/null -mypass
<i>&lt;no output&gt;</i>
$ opt < a.bc > /dev/null -mypass -debug
No debug type
'foo' debug type
'bar' debug type
No debug type (2)
$ opt < a.bc > /dev/null -mypass -debug-only=foo
'foo' debug type
$ opt < a.bc > /dev/null -mypass -debug-only=bar
'bar' debug type
</pre>
</div>
<p>Of course, in practice, you should only set <tt>DEBUG_TYPE</tt> at the top of
a file, to specify the debug type for the entire module (if you do this before
you <tt>#include "llvm/Support/Debug.h"</tt>, you don't have to insert the ugly
<tt>#undef</tt>'s). Also, you should use names more meaningful than "foo" and
"bar", because there is no system in place to ensure that names do not
conflict. If two different modules use the same string, they will all be turned
on when the name is specified. This allows, for example, all debug information
for instruction scheduling to be enabled with <tt>-debug-only=InstrSched</tt>,
even if the source lives in multiple files.</p>
<p>The <tt>DEBUG_WITH_TYPE</tt> macro is also available for situations where you
would like to set <tt>DEBUG_TYPE</tt>, but only for one specific <tt>DEBUG</tt>
statement. It takes an additional first parameter, which is the type to use. For
example, the preceding example could be written as:</p>
<div class="doc_code">
<pre>
DEBUG_WITH_TYPE("", errs() << "No debug type\n");
DEBUG_WITH_TYPE("foo", errs() << "'foo' debug type\n");
DEBUG_WITH_TYPE("bar", errs() << "'bar' debug type\n");
DEBUG_WITH_TYPE("", errs() << "No debug type (2)\n");
</pre>
</div>
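<p>The filtering by type can be sketched in the same spirit. All names below
(<tt>TOY_DEBUG_WITH_TYPE</tt>, <tt>toyIsCurrentDebugType</tt>,
<tt>runWith</tt> and the globals) are hypothetical, and the handling of the two
command line options is only simulated through function arguments; it assumes
that selecting a type also switches debug output on.</p>

```cpp
#include <cassert>
#include <sstream>
#include <string>

bool ToyDebugFlag = false;  // stand-in for "debug output is enabled"
std::string ToyOnlyType;    // stand-in for -debug-only=<type>; empty = all
std::ostringstream ToyErrs; // stand-in for errs(), captured for inspection

// True when output tagged with Type should be shown.
bool toyIsCurrentDebugType(const char *Type) {
  assert(Type && "debug type must not be null");
  return ToyOnlyType.empty() || ToyOnlyType == Type;
}

#define TOY_DEBUG_WITH_TYPE(TYPE, X)                                           \
  do {                                                                         \
    if (ToyDebugFlag && toyIsCurrentDebugType(TYPE)) { X; }                    \
  } while (false)

// Simulates one run of a pass. Debug plays the role of -debug and Only the
// role of -debug-only=<type>. Returns everything that was printed.
std::string runWith(bool Debug, const std::string &Only) {
  ToyErrs.str("");
  ToyDebugFlag = Debug || !Only.empty();
  ToyOnlyType = Only;
  TOY_DEBUG_WITH_TYPE("", ToyErrs << "No debug type\n");
  TOY_DEBUG_WITH_TYPE("foo", ToyErrs << "'foo' debug type\n");
  TOY_DEBUG_WITH_TYPE("bar", ToyErrs << "'bar' debug type\n");
  return ToyErrs.str();
}
```

<p>Calling <tt>runWith(false, "foo")</tt> yields only the line tagged
<tt>"foo"</tt>, while <tt>runWith(true, "")</tt> yields all three lines.</p>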
</div>
</div>
<!-- ======================================================================= -->
<h3>
<a name="Statistic">The <tt>Statistic</tt> class & <tt>-stats</tt>
option</a>
</h3>
<div>
<p>The "<tt><a
href="/doxygen/Statistic_8h-source.html">llvm/ADT/Statistic.h</a></tt>" file
provides a class named <tt>Statistic</tt> that is used as a unified way to
keep track of what the LLVM compiler is doing and how effective various
optimizations are. It is useful to see what optimizations are contributing to
making a particular program run faster.</p>
<p>Often you may run your pass on some big program, and you're interested to see
how many times it makes a certain transformation. Although you can do this with
hand inspection, or some ad-hoc method, this is a real pain and not very useful
for big programs. Using the <tt>Statistic</tt> class makes it very easy to
keep track of this information, and the calculated information is presented in a
uniform manner with the rest of the passes being executed.</p>
<p>There are many examples of <tt>Statistic</tt> uses, but the basics of using
it are as follows:</p>
<ol>
<li><p>Define your statistic like this:</p>
<div class="doc_code">
<pre>
#define <a href="#DEBUG_TYPE">DEBUG_TYPE</a> "mypassname" <i>// This goes before any #includes.</i>
STATISTIC(NumXForms, "The # of times I did stuff");
</pre>
</div>
<p>The <tt>STATISTIC</tt> macro defines a static variable, whose name is
specified by the first argument. The pass name is taken from the DEBUG_TYPE
macro, and the description is taken from the second argument. The variable
defined ("NumXForms" in this case) acts like an unsigned integer.</p></li>
<li><p>Whenever you make a transformation, bump the counter:</p>
<div class="doc_code">
<pre>
++NumXForms; // <i>I did stuff!</i>
</pre>
</div>
</li>
</ol>
<p>That's all you have to do. To get '<tt>opt</tt>' to print out the
statistics gathered, use the '<tt>-stats</tt>' option:</p>
<div class="doc_code">
<pre>
$ opt -stats -mypassname < program.bc > /dev/null
<i>... statistics output ...</i>
</pre>
</div>
<p> When running <tt>opt</tt> on a C file from the SPEC benchmark
suite, it gives a report that looks like this:</p>
<div class="doc_code">
<pre>
7646 bitcodewriter - Number of normal instructions
725 bitcodewriter - Number of oversized instructions
129996 bitcodewriter - Number of bitcode bytes written
2817 raise - Number of insts DCEd or constprop'd
3213 raise - Number of cast-of-self removed
5046 raise - Number of expression trees converted
75 raise - Number of other getelementptr's formed
138 raise - Number of load/store peepholes
42 deadtypeelim - Number of unused typenames removed from symtab
392 funcresolve - Number of varargs functions resolved
27 globaldce - Number of global variables removed
2 adce - Number of basic blocks removed
134 cee - Number of branches revectored
49 cee - Number of setcc instruction eliminated
532 gcse - Number of loads removed
2919 gcse - Number of instructions removed
86 indvars - Number of canonical indvars added
87 indvars - Number of aux indvars removed
25 instcombine - Number of dead inst eliminate
434 instcombine - Number of insts combined
248 licm - Number of load insts hoisted
1298 licm - Number of insts hoisted to a loop pre-header
3 licm - Number of insts hoisted to multiple loop preds (bad, no loop pre-header)
75 mem2reg - Number of alloca's promoted
1444 cfgsimplify - Number of blocks simplified
</pre>
</div>
<p>With so many optimizations, having a unified framework for reporting
statistics is very nice. Making your pass fit well into the framework makes it
more maintainable and useful.</p>
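<p>To give a feel for why a shared framework helps, here is a minimal sketch of
a statistics registry written with only the standard library. It is not LLVM's
implementation: the class <tt>SketchStatistic</tt> and its members
<tt>registry()</tt> and <tt>formatStats()</tt> are hypothetical names invented
for this illustration. Each counter registers itself once, and a single routine
renders every counter in the same "value pass - description" layout as the
report above.</p>

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a statistic counter; this is NOT the LLVM class.
class SketchStatistic {
public:
  SketchStatistic(const char *PassName, const char *Description)
      : Pass(PassName), Desc(Description), Value(0) {
    registry().push_back(this); // every counter registers itself once
  }

  // Acts like an unsigned integer for the purpose of counting.
  SketchStatistic &operator++() {
    ++Value;
    return *this;
  }
  unsigned value() const { return Value; }

  // One shared list, so a single reporting routine can see all counters.
  static std::vector<SketchStatistic *> &registry() {
    static std::vector<SketchStatistic *> Registry;
    return Registry;
  }

  // Design choice of this sketch: counters that were never bumped are skipped.
  static std::string formatStats() {
    std::ostringstream OS;
    for (const SketchStatistic *S : registry())
      if (S->Value != 0)
        OS << S->Value << ' ' << S->Pass << " - " << S->Desc << '\n';
    return OS.str();
  }

private:
  const char *Pass;
  const char *Desc;
  unsigned Value;
};
```

<p>Because every counter goes through the same registry, adding a new counter
to a pass needs no reporting code of its own.</p>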
</div>
<!-- ======================================================================= -->
<h3>
<a name="ViewGraph">Viewing graphs while debugging code</a>
</h3>
<div>
<p>Several of the important data structures in LLVM are graphs: for example,
CFGs made out of LLVM <a href="#BasicBlock">BasicBlock</a>s, CFGs made out of
LLVM <a href="CodeGenerator.html#machinebasicblock">MachineBasicBlock</a>s, and
<a href="CodeGenerator.html#selectiondag_intro">Instruction Selection
DAGs</a>. In many cases, while debugging various parts of the compiler, it is
nice to instantly visualize these graphs.</p>
<p>LLVM provides several callbacks that are available in a debug build to do
exactly that. If you call the <tt>Function::viewCFG()</tt> method, for example,
the current LLVM tool will pop up a window containing the CFG for the function
where each basic block is a node in the graph, and each node contains the
instructions in the block. Similarly, there are the
<tt>Function::viewCFGOnly()</tt> method (which omits the instructions), the
<tt>MachineFunction::viewCFG()</tt> and <tt>MachineFunction::viewCFGOnly()</tt>
methods, and the <tt>SelectionDAG::viewGraph()</tt> method. Within GDB, for
example, you can usually use something like <tt>call DAG.viewGraph()</tt> to pop
up a window. Alternatively, you can sprinkle calls to these functions in your
code at the places you want to debug.</p>
<p>Getting this to work requires a small amount of configuration. On Unix
systems with X11, install the <a href="http://www.graphviz.org">graphviz</a>
toolkit, and make sure 'dot' and 'gv' are in your path. If you are running on
Mac OS X, download and install the Mac OS X <a
href="http://www.pixelglow.com/graphviz/">Graphviz program</a>, and add
<tt>/Applications/Graphviz.app/Contents/MacOS/</tt> (or wherever you install
it) to your path. Once your system and path are set up, rerun the LLVM
configure script and rebuild LLVM to enable this functionality.</p>
<p><tt>SelectionDAG</tt> has been extended to make it easier to locate
<i>interesting</i> nodes in large complex graphs. From gdb, if you
<tt>call DAG.setGraphColor(<i>node</i>, "<i>color</i>")</tt>, then the
next <tt>call DAG.viewGraph()</tt> would highlight the node in the
specified color (choices of colors can be found at <a
href="http://www.graphviz.org/doc/info/colors.html">colors</a>.) More
complex node attributes can be provided with <tt>call
DAG.setGraphAttrs(<i>node</i>, "<i>attributes</i>")</tt> (choices can be
found at <a href="http://www.graphviz.org/doc/info/attrs.html">Graph
Attributes</a>.) If you want to restart and clear all the current graph
attributes, then you can <tt>call DAG.clearGraphAttrs()</tt>. </p>
<p>Note that graph visualization features are compiled out of Release builds
to reduce file size. This means that you need a Debug+Asserts or
Release+Asserts build to use these features.</p>
</div>
</div>
<!-- *********************************************************************** -->
<h2>
<a name="datastructure">Picking the Right Data Structure for a Task</a>
</h2>
<!-- *********************************************************************** -->
<div>
<p>LLVM has a plethora of data structures in the <tt>llvm/ADT/</tt> directory,
and we commonly use STL data structures. This section describes the trade-offs
you should consider when you pick one.</p>
<p>
The first step is a choose your own adventure: do you want a sequential
container, a set-like container, or a map-like container? The most important
thing when choosing a container is the algorithmic properties of how you plan to
access the container. Based on that, you should use:</p>
<ul>
<li>a <a href="#ds_map">map-like</a> container if you need efficient look-up
of a value based on another value. Map-like containers also support
efficient queries for containment (whether a key is in the map). Map-like
containers generally do not support efficient reverse mapping (values to
keys). If you need that, use two maps. Some map-like containers also
support efficient iteration through the keys in sorted order. Map-like
containers are the most expensive sort, so only use them if you need one of
these capabilities.</li>
<li>a <a href="#ds_set">set-like</a> container if you need to put a bunch of
stuff into a container that automatically eliminates duplicates. Some
set-like containers support efficient iteration through the elements in
sorted order. Set-like containers are more expensive than sequential
containers.
</li>
<li>a <a href="#ds_sequential">sequential</a> container if you need
the most efficient way to add elements while keeping track of the order in
which they are added to the collection. Sequential containers permit duplicates
and support efficient iteration, but do not support efficient look-up based on
a key.
</li>
<li>a <a href="#ds_string">string</a> container if you are working with
character or byte arrays. It is a specialized sequential container or
reference structure.</li>
<li>a <a href="#ds_bit">bit</a> container if you need an efficient way to store
and perform set operations on sets of numeric IDs, while automatically
eliminating duplicates. Bit containers require at most 1 bit for each
identifier you want to store.
</li>
</ul>
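<p>As a rough illustration of the categories above, the sketch below uses only
standard containers. The helper names (<tt>insertionOrder</tt>,
<tt>uniqueSorted</tt>, <tt>TwoWayMap</tt>, <tt>commonIds</tt>) are hypothetical,
and <tt>std::bitset</tt> merely stands in for a bit container; the LLVM
containers described later in this section are usually the better choice.</p>

```cpp
#include <bitset>
#include <map>
#include <set>
#include <string>
#include <vector>

// Sequential: keeps insertion order and duplicates; look-up is a linear scan.
inline std::vector<int> insertionOrder() {
  std::vector<int> V;
  V.push_back(3);
  V.push_back(1);
  V.push_back(3);
  return V; // holds 3, 1, 3 in that order
}

// Set-like: duplicates are dropped; std::set also iterates in sorted order.
inline std::vector<int> uniqueSorted(const std::vector<int> &In) {
  std::set<int> S(In.begin(), In.end());
  return std::vector<int>(S.begin(), S.end());
}

// Map-like: efficient key-to-value look-up. Reverse look-up needs a second
// map, which must be kept in sync by hand.
struct TwoWayMap {
  std::map<std::string, int> ByName;
  std::map<int, std::string> ById;

  void add(const std::string &Name, int Id) {
    ByName[Name] = Id;
    ById[Id] = Name;
  }
};

// Bit container: one bit per numeric ID; set intersection is a bitwise AND.
inline std::bitset<64> commonIds(const std::bitset<64> &A,
                                 const std::bitset<64> &B) {
  return A & B;
}
```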
<p>
Once the proper category of container is determined, you can fine-tune the
memory use, constant factors, and cache behaviors of access by intelligently
picking a member of the category. Note that constant factors and cache behavior
can be a big deal. If you have a vector that usually only contains a few
elements (but could contain many), for example, it's much better to use
<a href="#dss_smallvector">SmallVector</a> than
<a href="#dss_vector">vector</a>. Doing so avoids (relatively) expensive
malloc/free calls, which dwarf the cost of adding the elements to the
container.</p>
</div>
<!-- ======================================================================= -->
<h3>
<a name="ds_sequential">Sequential Containers (std::vector, std::list, etc)</a>
</h3>
<div>
There are a variety of sequential containers available for you, based on your
needs. Pick the first in this section that will do what you want.
<!-- _______________________________________________________________________ -->
<h4>
<a name="dss_arrayref">llvm/ADT/ArrayRef.h</a>
</h4>
<div>
<p>The llvm::ArrayRef class is the preferred class to use in an interface that
accepts a sequential list of elements in memory and just reads from them. By
taking an ArrayRef, the API can be passed a fixed size array, an std::vector,
an llvm::SmallVector and anything else that is contiguous in memory.
</p>
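<p>To show why a non-owning view makes a convenient parameter type, the sketch
below defines a tiny stand-in, <tt>IntView</tt>, using only the standard
library. It is a hypothetical illustration of the idea (a pointer plus a
length) and not the llvm::ArrayRef class itself; the function
<tt>sumAll</tt> is likewise invented for the example.</p>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal read-only view over contiguous ints.
// It never owns or copies the elements it refers to.
class IntView {
  const int *Data;
  std::size_t Length;

public:
  // From a pointer and an element count.
  IntView(const int *D, std::size_t N) : Data(D), Length(N) {}

  // From a fixed size array; the length is deduced from the array type.
  template <std::size_t N>
  IntView(const int (&Array)[N]) : Data(Array), Length(N) {}

  // From a std::vector.
  IntView(const std::vector<int> &V) : Data(V.data()), Length(V.size()) {}

  const int *begin() const { return Data; }
  const int *end() const { return Data + Length; }
  std::size_t size() const { return Length; }
};

// One function serves every contiguous source, and it only reads from them.
inline long sumAll(IntView Values) {
  long Total = 0;
  for (const int *I = Values.begin(), *E = Values.end(); I != E; ++I)
    Total += *I;
  return Total;
}
```

<p>Callers can pass a fixed size array or a vector directly, without writing
one overload per container type. As with any view, the underlying storage must
outlive the view.</p>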
</div>
<!-- _______________________________________________________________________ -->
<h4>
<a name="dss_fixedarrays">Fixed Size Arrays</a>
</h4>
<div>
<p>Fixed size arrays are very simple and very fast. They are good if you know
exactly how many elements you have, or you have a (low) upper bound on how many
you have.</p>
</div>
<!-- _______________________________________________________________________ -->
<h4>
<a name="dss_heaparrays">Heap Allocated Arrays</a>
</h4>
<div>
<p>Heap allocated arrays (new[] + delete[]) are also simple. They are good if
the number of elements is variable, if you know how many elements you will need
before the array is allocated, and if the array is usually large (if not,
consider a <a href="#dss_smallvector">SmallVector</a>). The cost of a heap
allocated array is the cost of the new/delete (aka malloc/free). Also note that
if you are allocating an array of a type with a constructor, the constructor and
destructor will be run for every element in the array (resizable vectors only
construct those elements actually used).</p>
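<p>The construction cost mentioned above can be observed with a small
experiment. In this sketch, <tt>Tracked</tt> and the two helper functions are
hypothetical names; the element type simply counts how many times it is
constructed.</p>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type that counts how many times it is constructed.
struct Tracked {
  static int Constructed;
  Tracked() { ++Constructed; }
  Tracked(const Tracked &) { ++Constructed; }
};
int Tracked::Constructed = 0;

// new[] default-constructs all N elements up front, used or not.
inline int constructionsForHeapArray(std::size_t N) {
  Tracked::Constructed = 0;
  Tracked *Array = new Tracked[N];
  int Count = Tracked::Constructed;
  delete[] Array; // runs the destructor for all N elements
  return Count;
}

// reserve() only allocates raw storage; elements are constructed one at a
// time as they are added. This assumes Used <= Capacity, so the vector never
// reallocates and no extra copies are made.
inline int constructionsForVector(std::size_t Capacity, std::size_t Used) {
  Tracked::Constructed = 0;
  std::vector<Tracked> V;
  V.reserve(Capacity);
  for (std::size_t I = 0; I != Used; ++I)
    V.emplace_back();
  return Tracked::Constructed;
}
```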
</div>
<!-- _______________________________________________________________________ -->
<h4>
<a name="dss_tinyptrvector">"llvm/ADT/TinyPtrVector.h"</a>
</h4>
<div>
<p><tt>TinyPtrVector<Type></tt> is a highly specialized collection class
that is optimized to avoid allocation in the case when a vector has zero or one
elements. It has two major restrictions: 1) it can only hold values of pointer
type, and 2) it cannot hold a null pointer.</p>
<p>Since this container is highly specialized, it is rarely used.</p>
</div>
<!-- _______________________________________________________________________ -->
<h4>
<a name="dss_smallvector">"llvm/ADT/SmallVector.h"</a>
</h4>
<div>
<p><tt>SmallVector<Type, N></tt> is a simple class that looks and smells
just like <tt>vector<Type></tt>:
it supports efficient iteration, lays out elements in memory order (so you can
do pointer arithmetic between elements), supports efficient push_back/pop_back
operations, supports efficient random access to its elements, etc.</p>
<p>The advantage of SmallVector is that it allocates space for
some number of elements (N) <b>in the object itself</b>. Because of this, if
the SmallVector is dynamically smaller than N, no malloc is performed. This can
be a big win in cases where the malloc/free call is far more expensive than the
code that fiddles around with the elements.</p>
<p>This is good for vectors that are "usually small" (e.g. the number of
predecessors/successors of a block is usually less than 8). On the other hand,
this makes the size of the SmallVector itself large, so you don't want to
allocate lots of them (doing so will waste a lot of space). As such,
SmallVectors are most useful when on the stack.</p>
<p>SmallVector also provides a nice portable and efficient replacement for
<tt>alloca</tt>.</p>
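<p>The idea of storing the first N elements in the object itself can be
sketched with the standard library alone. The class below,
<tt>InlineVec</tt>, is a hypothetical and heavily simplified illustration, not
the SmallVector implementation: it assumes N is at least 1, supports only
<tt>push_back</tt> and indexing, requires a default-constructible and
assignable element type, and cannot be copied.</p>

```cpp
#include <cstddef>

// Hypothetical illustration of inline storage; this is NOT llvm::SmallVector.
template <typename T, std::size_t N>
class InlineVec {
  T Inline[N];          // space for N elements inside the object itself
  T *Data;              // points at Inline until the first overflow
  std::size_t Size;
  std::size_t Capacity;

public:
  InlineVec() : Data(Inline), Size(0), Capacity(N) {}
  InlineVec(const InlineVec &) = delete;            // copying not implemented
  InlineVec &operator=(const InlineVec &) = delete;

  ~InlineVec() {
    if (!isSmall())
      delete[] Data; // only heap storage needs to be released
  }

  // True while the elements still live in the inline buffer.
  bool isSmall() const { return Data == Inline; }
  std::size_t size() const { return Size; }
  const T &operator[](std::size_t I) const { return Data[I]; }

  void push_back(const T &Value) {
    if (Size == Capacity)
      grow();
    Data[Size++] = Value;
  }

private:
  // Reached only once the current storage is full. The first call is the
  // first heap allocation this object ever performs.
  void grow() {
    std::size_t NewCapacity = Capacity * 2;
    T *NewData = new T[NewCapacity];
    for (std::size_t I = 0; I != Size; ++I)
      NewData[I] = Data[I];
    if (!isSmall())
      delete[] Data;
    Data = NewData;
    Capacity = NewCapacity;
  }
};
```

<p>As long as the element count stays within the inline capacity, no heap
allocation happens at all, which is the case the small-size optimization is
designed for. The price is that every instance carries room for N elements,
whether or not they are used.</p>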
</div>
<!-- _______________________________________________________________________ -->
<h4>
<a name="dss_vector"><vector></a>
</h4>
<div>
<p>
std::vector is well loved and respected. It is useful when SmallVector isn't:
when the size of the vector is often large (thus the small optimization will
rarely be a benefit) or if you will be allocating many instances of the vector
itself (which would waste space for elements that aren't in the container).
vector is also useful when interfacing with code that expects vectors :).
</p>
<p>One worthwhile note about std::vector: avoid code like this:</p>
<div class="doc_code">
<pre>
for ( ... ) {
std::vector<foo> V;