forked from rust-lang/rust
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathrust.texi
3830 lines (3109 loc) · 136 KB
/
rust.texi
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename rust.info
@settitle Rust Documentation
@setchapternewpage odd
@c %**end of header
@syncodeindex fn cp
@include version.texi
@ifinfo
This manual is for the ``Rust'' programming language.
@uref{http://github.com/graydon/rust}
Version: @gitversion
Copyright 2006-2010 Graydon Hoare
Copyright 2009-2010 Mozilla Foundation
See accompanying LICENSE.txt for terms.
@end ifinfo
@dircategory Programming
@direntry
* rust: (rust). Rust programming language
@end direntry
@titlepage
@title Rust
@subtitle A safe, concurrent, practical language.
@author Graydon Hoare
@author Mozilla Foundation
@page
@vskip 0pt plus 1filll
@uref{http://github.com/graydon/rust}
Version: @gitversion
@sp 2
Copyright @copyright{} 2006-2010 Graydon Hoare
Copyright @copyright{} 2009-2010 Mozilla Foundation
See accompanying LICENSE.txt for terms.
@end titlepage
@everyfooting @| @emph{-- Draft @today --} @|
@ifnottex
@node Top
@top Top
Rust Documentation
@end ifnottex
@menu
* Disclaimer:: Notes on a work in progress.
* Introduction:: Background, intentions, lineage.
* Tutorial:: Gentle introduction to reading Rust code.
* Reference:: Systematic reference of language elements.
* Index:: Index
@end menu
@ifnottex
Complete table of contents
@end ifnottex
@contents
@c ############################################################
@c Disclaimer
@c ############################################################
@node Disclaimer
@chapter Disclaimer
To the reader,
Rust is a work in progress. The language continues to evolve as the design
shifts and is fleshed out in working code. Certain parts work, certain parts
do not, certain parts will be removed or changed.
This manual is a snapshot written in the present tense. Some features
described do not yet exist in working code. Some may be temporary. It
is a @emph{draft}, and we ask that you not take anything you read here
as either definitive or final. The manual is to help you get a sense
of the language and its organization, not to serve as a complete
specification. At least not yet.
If you have suggestions to make, please try to focus them on @emph{reductions}
to the language: possible features that can be combined or omitted. At this
point, every ``additive'' feature we're likely to support is already on the
table. The task ahead involves combining, trimming, and implementing.
@c ############################################################
@c Introduction
@c ############################################################
@node Introduction
@chapter Introduction
@quotation
We have to fight chaos, and the most effective way of doing that is
to prevent its emergence.
@flushright
- Edsger Dijkstra
@end flushright
@end quotation
@sp 2
Rust is a curly-brace, block-structured expression language. It visually
resembles the C language family, but differs significantly in syntactic and
semantic details. Its design is oriented toward concerns of ``programming in
the large'', that is, of creating and maintaining @emph{boundaries} -- both
abstract and operational -- that preserve large-system @emph{integrity},
@emph{availability} and @emph{concurrency}.
It supports a mixture of imperative procedural, concurrent actor, object
oriented and pure functional styles. Rust also supports generic programming
and metaprogramming, in both static and dynamic styles.
@menu
* Goals:: Intentions, motivations.
* Sales Pitch:: A summary for the impatient.
* Influences:: Relationship to past languages.
@end menu
@node Goals
@section Goals
The language design pursues the following goals:
@sp 1
@itemize
@item Compile-time error detection and prevention.
@item Run-time fault tolerance and containment.
@item System building, analysis and maintenance affordances.
@item Clarity and precision of expression.
@item Implementation simplicity.
@item Run-time efficiency.
@item High concurrency.
@end itemize
@sp 1
Note that most of these goals are @emph{engineering} goals, not showcases for
sophisticated language technology. Most of the technology in Rust is
@emph{old} and has been seen decades earlier in other languages.
All new languages are developed in a technological context. Rust's goals arise
from the context of writing large programs that interact with the internet --
both servers and clients -- and are thus much more concerned with
@emph{safety} and @emph{concurrency} than older generations of program. Our
experience is that these two forces do not conflict; rather they drive system
design decisions toward extensive use of @emph{partitioning} and
@emph{statelessness}. Rust aims to make these a more natural part of writing
programs, within the niche of lower-level, practical, resource-conscious
languages.
@page
@node Sales Pitch
@section Sales Pitch
The following comprises a brief ``sales pitch'' overview of the salient
features of Rust, relative to other languages.
@itemize
@sp 1
@item No @code{null} pointers
The initialization state of every slot is statically computed as part of the
typestate system (see below), and requires that all slots are initialized
before use. There is no @code{null} value; uninitialized slots are
uninitialized, and can only be written to, not read.
The common use for @code{null} in other languages -- as a sentinel value -- is
subsumed into the more general facility of disjoint union types. A program
must explicitly model its use of such types.
@sp 1
@item Lightweight tasks with no shared mutable state
Like many @emph{actor} languages, Rust provides an isolation (and concurrency)
model based on lightweight tasks scheduled by the language runtime. These
tasks are very inexpensive and statically unable to mutate one another's local
memory. Breaking the rule of task isolation is only possible by calling
external (C/C++) code.
Inter-task communication is typed, asynchronous and simplex, based on passing
messages over channels to ports. Transmission can be rate-limited or
rate-unlimited. Selection between multiple senders is pseudo-randomized on the
receiver side.
@sp 1
@item Predictable native code, simple runtime
@cindex DWARF
The meaning and cost of every operation within a Rust program is intended to
be easy to model for the reader. The code should not ``surprise'' the
programmer once it has been compiled.
Rust compiles to native code. Rust compilation units are large and the
compilation model is designed around multi-file, whole-library or
whole-program optimization. The compiled units are standard loadable objects
(ELF, PE, Mach-O) containing standard metadata (DWARF) and are compatible with
existing, standard low-level tools (disassemblers, debuggers, profilers,
dynamic loaders).
The Rust runtime library is a small collection of support code for scheduling,
memory management, inter-task communication, reflection and runtime
linkage. This library is written in standard C++ and is quite
straightforward. It presents a simple interface to embeddings. No
research-level virtual machine, JIT or garbage collection technology is
required. It should be relatively easy to adapt a Rust front-end on to many
existing native toolchains.
@sp 1
@item Integrated system-construction facility
The units of compilation of Rust are multi-file amalgamations called
@emph{crates}. A crate is described by a separate, declarative type of source
file that guides the compilation of the crate, its packaging, its versioning,
and its external dependencies. Crates are also the units of distribution and
loading. Significantly: the dependency graph of crates is @emph{acyclic} and
@emph{anonymous}: there is no global namespace for crates, and module-level
recursion cannot cross crate barriers.
Unlike many languages, individual modules do @emph{not} carry all the
mechanisms or restrictions of crates. Modules and crates serve different
roles.
@sp 1
@item Static control over memory allocation, packing and aliasing.
Many values in Rust are allocated @emph{within} their containing stack-frame
or parent structure. Numbers, records, tuples and tags are all allocated this
way. To allocate such values in the heap, they must be explicitly
@emph{boxed}. A @dfn{box} is a pointer to a heap allocation that holds another
value, its @emph{content}.
Boxing and unboxing in Rust is explicit, though in many cases (arithmetic
operations, name-component dereferencing) Rust will automatically ``reach
through'' the box to access its content. Box values can be passed and assigned
independently, like pointers in C; the difference is that in Rust they always
point to live contents, and are not subject to pointer arithmetic.
In addition to boxes, Rust supports a kind of pass-by-reference slot called an
alias. Forming or releasing an alias does not perform reference-count
operations; aliases can only be formed on referents that will provably outlive
the alias, and are therefore only used for passing arguments to
functions. Aliases are not ``general values'', in the sense that they cannot
be independently manipulated. They are more like C++ references, except that
like boxes, aliases are safe: they always point to live values.
In addition, every slot (stack-local allocation or alias) has a static
initialization state that is calculated by the typestate system. This permits
late initialization of slots in functions with complex control-flow, while
still guaranteeing that every use of a slot occurs after it has been
initialized.
@sp 1
@item Static control over mutability and garbage collection.
Types in Rust are classified into @emph{layers}. There is a layer of immutable
values, a layer of state values, and a layer of GC values. By default, all
types are immutable.
If a field within a type is declared as @code{mutable}, then the type is part
of the @code{state} layer and must be declared as such. Any type directly
marked as @code{state} @emph{or indirectly referring to} a state type is also
a state type.
If a field within a type is potentially cyclic (this is a narrow, but
well-defined condition involving mutable recursive types) then it is part of
the @code{gc} layer and must be declared as such.
This classification of data types in Rust interacts with the memory allocation,
transmission and destruction rules. In particular:
@itemize
@item Only immutable values can be sent over channels.
@item Only non-GC objects can have destructor functions.
@end itemize
Garbage collection, when present, operates per-task and does not interrupt
other tasks while running. It is limited to types that need it and can be
statically avoided altogether by limiting the types in a program to the state
and immutable layers.
Non-GC values are reference-counted and have a deterministic destruction
order: top-down, immediately upon release of the last live reference.
State values can refer to non-state values, but not vice-versa; likewise GC
values can refer to non-GC values but not vice-versa. Rust therefore
encourages the programmer to write in a style that consists primarily of
immutable types, but also permits limited, local (per-task) mutability,
and provides local (per-task) GC only when required.
@sp 1
@item Stack-based iterators
Rust provides a type of function-like multiple-invocation iterator that is
very efficient: the iterator state lives only on the stack and is tightly
coupled to the loop that invoked it.
@sp 1
@item Direct interface to C code
Rust can load and call many C library functions simply by declaring
them. Calling a C function is an ``unsafe'' action, and can only be taken
within a block marked with the @code{unsafe} keyword. Every unsafe block
in a Rust compilation unit must be explicitly authorized in the crate file.
@sp 1
@item Structural algebraic data types
The Rust type system is primarily structural, and contains the standard
assortment of useful ``algebraic'' type constructors from functional
languages, such as function types, tuples, record types, vectors, and
nominally-tagged disjoint unions. Such values may be @emph{pattern-matched} in
an @code{alt} expression.
@sp 1
@item Generic code
Rust supports a simple form of parametric polymorphism: functions, iterators,
types and objects can be parametrized by other types.
@sp 1
@item Argument binding
Rust provides a mechanism of partially binding arguments to functions,
producing new functions that accept the remaining un-bound arguments. This
mechanism combines some of the features of lexical closures with some of the
features of currying, in a smaller and simpler package.
@sp 1
@item Local type inference
To save some quantity of programmer key-pressing, Rust supports local type
inference: signatures of functions, objects and iterators always require type
annotation, but within the body of a function or iterator many slots can be
declared @code{auto} and Rust will infer the slot's type from its uses.
@sp 1
@item Structural object system
Rust has a lightweight object system based on structural object types: there
is no ``class hierarchy'' nor any concept of inheritance. Method overriding
and object restriction are performed explicitly on object values, which are
little more than order-insensitive records of methods sharing a common private
value. Objects that reside outside the GC layer can have destructors.
@sp 1
@item Dynamic type
Rust includes support for values of a top type, @code{any}, that can hold any
type of value whatsoever. An @code{any} value is a pair of a type code and a
boxed value of that type. Injection into an @code{any} and projection by
type-case-selection is integrated into the language.
@sp 1
@item Dynamic metaprogramming (reflection)
Rust supports run-time reflection on the structure of a crate, using a
combination of custom descriptor structures and the DWARF metadata tables used
to support crate linkage and other runtime services.
@sp 1
@item Static metaprogramming (syntactic extension)
Rust supports a system for syntactic extensions that can be loaded into the
compiler, to implement user-defined notations, macros, program-generators and
the like. These notations are @emph{marked} using a special form of
bracketing, such that a reader unfamiliar with the extension can still parse
the surrounding text by skipping over the bracketed ``extension text''.
@sp 1
@item Idempotent failure
If a task fails due to a signal, or if it evaluates the special @code{fail}
expression, it enters the @emph{failing} state. A failing task unwinds its
control stack, frees all of its owned resources (executing destructors) and
enters the @emph{dead} state. Failure is idempotent and non-recoverable.
@sp 1
@item Signal handling
Rust has a system for propagating task-failures and other spontaneous
events between tasks. Some signals can be trapped and redirected to
channels; other signals are fatal and result in task-failure. Tasks
can designate other tasks to handle signals for them. This permits
organizing tasks into mutually-supervising or mutually-failing groups.
@sp 1
@item Deterministic destruction
Non-GC objects can have destructor functions, which are executed
deterministically in top-down ownership order, as control frames are exited
and/or objects are otherwise freed from data structures holding them. The same
destructors are run in the same order whether the object is deleted by
unwinding during failure or normal execution.
Similarly, the rules for freeing non-GC values are deterministic and
predictable: on scope-exit or structure-release, local slots are released
immediately. Referenced boxes have their reference count decreased and are
released if the count drops to zero. Aliases are silently forgotten.
GC values are local to a task, and are subject to per-task garbage
collection. As a result, unreferenced GC-layer boxes are not necessarily freed
immediately; if an unreferenced GC box is part of an acyclic graph, it is
freed when the last reference to it drops, but if it is part of a reference
cycle it will be freed when the GC collects it (or when the owning task
terminates, at the latest).
GC values can point to non-GC values but not vice-versa. Doing so merely
delays (to an undefined future time) the moment when the deterministic,
top-down destruction sequence for the referenced non-GC values
@emph{start}. In other words, the non-GC ``leaves'' of a GC value are released
in a locally-predictable order, even if the ``interior'' cyclic part of the GC
value is released in an unpredictable order.
@sp 1
@item Typestate system
Every storage slot in a Rust frame participates in not only a conventional
structural static type system, describing the interpretation of memory in the
slot, but also a @emph{typestate} system. The static typestates of a program
describe the set of @emph{pure, dynamic predicates} that provably hold over
some set of slots, at each point in the program's control-flow graph within
each frame. The static calculation of the typestates of a program is a
function-local dataflow problem, and handles user-defined predicates in a
similar fashion to the way the type system permits user-defined types.
A short way of thinking of this is: types statically model values,
typestates statically model @emph{assertions that hold} before and
after statements and expressions.
@end itemize
@page
@node Influences
@section Influences
@sp 2
@quotation
The essential problem that must be solved in making a fault-tolerant
software system is therefore that of fault-isolation. Different programmers
will write different modules, some modules will be correct, others will have
errors. We do not want the errors in one module to adversely affect the
behaviour of a module which does not have any errors.
@flushright
- Joe Armstrong
@end flushright
@end quotation
@sp 2
@quotation
In our approach, all data is private to some process, and processes can
only communicate through communications channels. @emph{Security}, as used
in this paper, is the property which guarantees that processes in a system
cannot affect each other except by explicit communication.
When security is absent, nothing which can be proven about a single module
in isolation can be guaranteed to hold when that module is embedded in a
system [...]
@flushright
- Robert Strom and Shaula Yemini
@end flushright
@end quotation
@sp 2
@quotation
Concurrent and applicative programming complement each other. The
ability to send messages on channels provides I/O without side effects,
while the avoidance of shared data helps keep concurrent processes from
colliding.
@flushright
- Rob Pike
@end flushright
@end quotation
@sp 2
@page
Rust is not a particularly original language. It may however appear unusual by
contemporary standards, as its design elements are drawn from a number of
``historical'' languages that have, with a few exceptions, fallen out of
favour. Five prominent lineages contribute the most:
@itemize
@sp 1
@item
The NIL (1981) and Hermes (1990) family. These languages were developed by
Robert Strom, Shaula Yemini, David Bacon and others in their group at IBM
Watson Research Center (Yorktown Heights, NY, USA).
@sp 1
@item
The Erlang (1987) language, developed by Joe Armstrong, Robert Virding, Claes
Wikstr@"om, Mike Williams and others in their group at the Ericsson Computer
Science Laboratory (@"Alvsj@"o, Stockholm, Sweden) .
@sp 1
@item
The Sather (1990) language, developed by Stephen Omohundro, Chu-Cheow Lim,
Heinz Schmidt and others in their group at The International Computer Science
Institute of the University of California, Berkeley (Berkeley, CA, USA).
@sp 1
@item
The Newsqueak (1988), Alef (1995), and Limbo (1996) family. These languages
were developed by Rob Pike, Phil Winterbottom, Sean Dorward and others in
their group at Bell labs Computing Sciences Reserch Center (Murray Hill, NJ,
USA).
@sp 1
@item
The Napier (1985) and Napier88 (1988) family. These languages were developed
by Malcolm Atkinson, Ron Morrison and others in their group at the University
of St. Andrews (St. Andrews, Fife, UK).
@end itemize
@sp 1
Additional specific influences can be seen from the following languages:
@itemize
@item The structural algebraic types and compilation manager of SML.
@item The syntax-extension systems of Camlp4 and the Common Lisp readtable.
@item The deterministic destructor system of C++.
@end itemize
@c ############################################################
@c Tutorial
@c ############################################################
@node Tutorial
@chapter Tutorial
@emph{TODO}.
@c ############################################################
@c Reference
@c ############################################################
@node Reference
@chapter Reference
@menu
* Ref.Lex:: Lexical structure.
* Ref.Path:: References to slots and items.
* Ref.Gram:: Grammar.
* Ref.Comp:: Compilation and component model.
* Ref.Mem:: Semantic model of memory.
* Ref.Task:: Semantic model of tasks.
* Ref.Item:: The components of a module.
* Ref.Type:: The types of values held in memory.
* Ref.Typestate:: Predicates that hold at points in time.
* Ref.Stmt:: Components of an executable block.
* Ref.Expr:: Units of execution and evaluation.
* Ref.Run:: Organization of runtime services.
@end menu
@node Ref.Lex
@section Ref.Lex
@c * Ref.Lex:: Lexical structure.
@cindex Lexical structure
@cindex Token
The lexical structure of a Rust source file or crate file is defined in terms
of Unicode character codes and character properties.
Groups of Unicode character codes and characters are organized into
@emph{tokens}. Tokens are defined as the longest contiguous sequence of
characters within the same token type (identifier, keyword, literal, symbol),
or interrupted by ignored characters.
Most tokens in Rust follow rules similar to the C family.
Most tokens (including whitespace, keywords, operators and structural symbols)
are drawn from the ASCII-compatible range of Unicode. Identifiers are drawn
from Unicode characters specified by the @code{XID_start} and
@code{XID_continue} rules given by UAX #31@footnote{Unicode Standard Annex
#31: Unicode Identifier and Pattern Syntax}. String and character literals may
include the full range of Unicode characters.
@emph{TODO: formalize this section much more}.
@menu
* Ref.Lex.Ignore:: Ignored characters.
* Ref.Lex.Ident:: Identifier tokens.
* Ref.Lex.Key:: Keyword tokens.
* Ref.Lex.Res:: Reserved tokens.
* Ref.Lex.Num:: Numeric tokens.
* Ref.Lex.Text:: String and character tokens.
* Ref.Lex.Syntax:: Syntactic extension tokens.
* Ref.Lex.Sym:: Special symbol tokens.
@end menu
@node Ref.Lex.Ignore
@subsection Ref.Lex.Ignore
@c * Ref.Lex.Ignore:: Ignored tokens.
Characters considered to be @emph{whitespace} or @emph{comment} are ignored,
and are not considered as tokens. They serve only to delimit tokens. Rust is
otherwise a free-form language.
@dfn{Whitespace} is any of the following Unicode characters: U+0020 (space),
U+0009 (tab, @code{'\t'}), U+000A (LF, @code{'\n'}), U+000D (CR, @code{'\r'}).
@dfn{Comments} are @emph{single-line comments} or @emph{multi-line comments}.
A @dfn{single-line comment} is any sequence of Unicode characters beginning
with U+002F U+002F (@code{"//"}) and extending to the next U+000A character,
@emph{excluding} cases in which such a sequence occurs within a string literal
token or a syntactic extension token.
A @dfn{multi-line comments} is any sequence of Unicode characters beginning
with U+002F U+002A (@code{"/*"}) and ending with U+002A U+002F (@code{"*/"}),
@emph{excluding} cases in which such a sequence occurs within a string literal
token or a syntactic extension token. Multi-line comments may be nested.
@node Ref.Lex.Ident
@subsection Ref.Lex.Ident
@c * Ref.Lex.Ident:: Identifier tokens.
@cindex Identifier token
Identifiers follow the rules given by Unicode Standard Annex #31, in the form
closed under NFKC normalization, @emph{excluding} those tokens that are
otherwise defined as keywords or reserved
tokens. @xref{Ref.Lex.Key}. @xref{Ref.Lex.Res}.
That is: an identifier starts with any character having derived property
@code{XID_Start} and continues with zero or more characters having derived
property @code{XID_Continue}; and such an identifier is NFKC-normalized during
lexing, such that all subsequent comparison of identifiers is performed on the
NFKC-normalized forms.
@emph{TODO: define relationship between Unicode and Rust versions}.
@footnote{This identifier syntax is a superset of the identifier syntaxes of C
and Java, and is modeled on Python PEP #3131, which formed the definition of
identifiers in Python 3.0 and later.}
@node Ref.Lex.Key
@subsection Ref.Lex.Key
@c * Ref.Lex.Key:: Keyword tokens.
The keywords are:
@cindex Keywords
@sp 2
@multitable @columnfractions .15 .15 .15 .15 .15
@item @code{use}
@tab @code{syntax}
@tab @code{mutable}
@tab @code{native}
@item @code{mod}
@tab @code{import}
@tab @code{export}
@tab @code{let}
@tab @code{auto}
@item @code{state}
@tab @code{gc}
@tab @code{const}
@tab @code{thread}
@item @code{auth}
@tab @code{unsafe}
@tab @code{as}
@tab @code{self}
@tab @code{log}
@item @code{bind}
@tab @code{type}
@tab @code{true}
@tab @code{false}
@tab @code{any}
@item @code{int}
@tab @code{uint}
@tab @code{float}
@tab @code{char}
@tab @code{bool}
@item @code{u8}
@tab @code{u16}
@tab @code{u32}
@tab @code{u64}
@tab @code{f32}
@item @code{i8}
@tab @code{i16}
@tab @code{i32}
@tab @code{i64}
@tab @code{f64}
@item @code{rec}
@tab @code{tup}
@tab @code{tag}
@tab @code{vec}
@tab @code{str}
@item @code{fn}
@tab @code{iter}
@tab @code{pred}
@tab @code{obj}
@tab @code{drop}
@item @code{task}
@tab @code{port}
@tab @code{chan}
@tab @code{spawn}
@tab @code{with}
@item @code{if}
@tab @code{else}
@tab @code{alt}
@tab @code{case}
@tab @code{in}
@item @code{do}
@tab @code{while}
@tab @code{break}
@tab @code{cont}
@tab @code{note}
@item @code{assert}
@tab @code{claim}
@tab @code{check}
@tab @code{prove}
@tab @code{fail}
@item @code{for}
@tab @code{each}
@tab @code{ret}
@tab @code{put}
@tab @code{be}
@end multitable
@node Ref.Lex.Res
@subsection Ref.Lex.Res
@c * Ref.Lex.Res:: Reserved tokens.
The reserved tokens are:
@cindex Reserved
@sp 2
@multitable @columnfractions .15 .15 .15 .15 .15
@item @code{f16}
@tab @code{f80}
@tab @code{f128}
@item @code{m32}
@tab @code{m64}
@tab @code{m128}
@tab @code{dec}
@end multitable
@sp 2
At present these tokens have no defined meaning in the Rust language.
These tokens may correspond, in some current or future implementation,
to additional built-in types for decimal floating-point, extended
binary and interchange floating-point formats, as defined in the IEEE
754-1985 and IEEE 754-2008 specifications.
@node Ref.Lex.Num
@subsection Ref.Lex.Num
@c * Ref.Lex.Num:: Numeric tokens.
@cindex Number token
@cindex Hex token
@cindex Decimal token
@cindex Binary token
@cindex Floating-point token
@c FIXME: This discussion isn't quite right since 'f' and 'i' can be used as
@c suffixes
A @dfn{number literal} is either an @emph{integer literal} or a
@emph{floating-point literal}.
@sp 1
An @dfn{integer literal} has one of three forms:
@enumerate
@item A @dfn{decimal literal} starts with a @emph{decimal digit} and continues
with any mixture of @emph{decimal digits} and @emph{underscores}.
@item A @dfn{hex literal} starts with the character sequence U+0030
U+0078 (@code{"0x"}) and continues as any mixture @emph{hex digits}
and @emph{underscores}.
@item A @dfn{binary literal} starts with the character sequence U+0030
U+0062 (@code{"0b"}) and continues as any mixture @emph{binary digits}
and @emph{underscores}.
@end enumerate
By default, an integer literal is of type @code{int}. An integer literal may
be followed (immediately, without any spaces) by a @dfn{integer suffix}, which
changes the type of the literal. There are three kinds of integer literal
suffix:
@enumerate
@item The @code{u} suffix gives the literal type @code{uint}.
@item The @code{g} suffix gives the literal type @code{big}.
@item Each of the signed and unsigned machine types @code{u8}, @code{i8},
@code{u16}, @code{i16}, @code{u32}, @code{i32}, @code{u64} and @code{i64}
give the literal the corresponding machine type.
@end enumerate
@sp 1
A @dfn{floating-point literal} has one of two forms:
@enumerate
@item Two @emph{decimal literals} separated by a period
character U+002E ('.'), with an optional @emph{exponent} trailing after the
second @emph{decimal literal}.
@item A single @emph{decimal literal} followed by an @emph{exponent}.
@end enumerate
By default, a floating-point literal is of type @code{float}. A floating-point
literal may be followed (immediately, without any spaces) by a
@dfn{floating-point suffix}, which changes the type of the literal. There are
only two floating-point suffixes: @code{f32} and @code{f64}. Each of these
gives the floating point literal the associated type, rather than
@code{float}.
A set of suffixes are also reserved to accommodate literal support for
types corresponding to reserved tokens. The reserved suffixes are @code{f16},
@code{f80}, @code{f128}, @code{m}, @code{m32}, @code{m64} and @code{m128}.
@sp 1
A @dfn{hex digit} is either a @emph{decimal digit} or else a character in the
ranges U+0061-U+0066 and U+0041-U+0046 (@code{'a'}-@code{'f'},
@code{'A'}-@code{'F'}).
A @dfn{binary digit} is either the character U+0030 or U+0031 (@code{'0'} or
@code{'1'}).
An @dfn{exponent} begins with either of the characters U+0065 or U+0045
(@code{'e'} or @code{'E'}), followed by an optional @emph{sign character},
followed by a trailing @emph{decimal literal}.
A @dfn{sign character} is either U+002B or U+002D (@code{'+'} or @code{'-'}).
Examples of integer literals of various forms:
@example
123; // type int
123u; // type uint
123_u; // type uint
0xff00; // type int
0xffu8; // type u8
0b1111_1111_1001_0000_i32; // type i32
0xffff_ffff_ffff_ffff_ffff_ffffg; // type big
@end example
Examples of floating-point literals of various forms:
@example
123.0; // type float
0.1; // type float
0.1f32; // type f32
12E+99_f64; // type f64
@end example
@node Ref.Lex.Text
@subsection Ref.Lex.Text
@c * Ref.Lex.Key:: String and character tokens.
@cindex String token
@cindex Character token
@cindex Escape sequence
@cindex Unicode
A @dfn{character literal} is a single Unicode character enclosed within two
U+0027 (single-quote) characters, with the exception of U+0027 itself, which
must be @emph{escaped} by a preceding U+005C character ('\').
A @dfn{string literal} is a sequence of any Unicode characters enclosed
within two U+0022 (double-quote) characters, with the exception of U+0022
itself, which must be @emph{escaped} by a preceding U+005C character
('\').
Some additional @emph{escapes} are available in either character or string
literals. An escape starts with a U+005C ('\') and continues with one
of the following forms:
@itemize
@item An @dfn{8-bit codepoint escape} escape starts with U+0078 ('x') and is
followed by exactly two @dfn{hex digits}. It denotes the Unicode codepoint
equal to the provided hex value.
@item A @dfn{16-bit codepoint escape} starts with U+0075 ('u') and is followed
by exactly four @dfn{hex digits}. It denotes the Unicode codepoint equal to
the provided hex value.
@item A @dfn{32-bit codepoint escape} starts with U+0055 ('U') and is followed
by exactly eight @dfn{hex digits}. It denotes the Unicode codepoint equal to
the provided hex value.
@item A @dfn{whitespace escape} is one of the characters U+006E, U+0072, or
U+0074, denoting the unicode values U+000A (LF), U+000D (CR) or U+0009 (HT)
respectively.
@item The @dfn{backslash escape} is the character U+005C ('\') which must be
escaped in order to denote @emph{itself}.
@end itemize
@node Ref.Lex.Syntax
@subsection Ref.Lex.Syntax
@c * Ref.Lex.Syntax:: Syntactic extension tokens.
Syntactic extensions are marked with the @emph{pound} sigil U+0023 (@code{#}),
followed by a qualified name of a compile-time imported module item, an
optional parenthesized list of @emph{parsed expressions}, and an optional
brace-enclosed region of free-form text (with brace-matching and
brace-escaping used to determine the limit of the
region). @xref{Ref.Comp.Syntax}.
@emph{TODO: formalize those terms more}.
@node Ref.Lex.Sym
@subsection Ref.Lex.Sym
@c * Ref.Lex.Sym:: Special symbol tokens.
@cindex Symbol
@cindex Operator
The special symbols are:
@sp 2
@multitable @columnfractions .1 .1 .1 .1 .1 .1
@item @code{@@}
@tab @code{_}
@item @code{#}
@tab @code{:}
@tab @code{.}
@tab @code{;}
@tab @code{,}
@item @code{[}
@tab @code{]}
@tab @code{@{}
@tab @code{@}}
@tab @code{(}
@tab @code{)}
@item @code{=}
@tab @code{<-}
@tab @code{<|}
@tab @code{<+}
@tab @code{->}
@item @code{+}
@tab @code{++}
@tab @code{+=}
@tab @code{-}
@tab @code{--}
@tab @code{-=}
@item @code{*}
@tab @code{/}
@tab @code{%}
@tab @code{*=}
@tab @code{/=}
@tab @code{%=}
@item @code{&}
@tab @code{|}
@tab @code{!}
@tab @code{~}
@tab @code{^}
@item @code{&=}
@tab @code{|=}
@tab @code{^=}
@tab @code{!=}
@item @code{>>}
@tab @code{>>>}
@tab @code{<<}
@tab @code{<<=}
@tab @code{>>=}
@tab @code{>>>=}
@item @code{<}
@tab @code{<=}
@tab @code{==}
@tab @code{>=}
@tab @code{>}
@item @code{&&}
@tab @code{||}
@end multitable
@page
@page
@node Ref.Path
@section Ref.Path
@c * Ref.Path:: References to slots and items.
@cindex Names of items or slots
@cindex Path name
@cindex Type parameters
A @dfn{path} is a ubiquitous syntactic form in Rust that deserves special
attention. A path denotes a slot or an