Read the Fscking Papers!

This document describes RCU-related publications, and is followed by
the corresponding bibtex entries. A number of the publications may
be found at http://www.rdrop.com/users/paulmck/RCU/. For others, browsers
and search engines will usually find what you are looking for.

The first thing resembling RCU was published in 1980, when Kung and Lehman
[Kung80] recommended use of a garbage collector to defer destruction
of nodes in a parallel binary search tree in order to simplify its
implementation. This works well in environments that have garbage
collectors, but most production garbage collectors incur significant
overhead.

In 1982, Manber and Ladner [Manber82,Manber84] recommended deferring
destruction until all threads running at that time have terminated, again
for a parallel binary search tree. This approach works well in systems
with short-lived threads, such as the K42 research operating system.
However, Linux has long-lived tasks, so more is needed.

In 1986, Hennessy, Osisek, and Seigh [Hennessy89] introduced passive
serialization, which is an RCU-like mechanism that relies on the presence
of "quiescent states" in the VM/XA hypervisor that are guaranteed not
to be referencing the data structure. However, this mechanism was not
optimized for modern computer systems, which is not surprising given
that these overheads were not so expensive in the mid-80s. Nonetheless,
passive serialization appears to be the first deferred-destruction
mechanism to be used in production. Furthermore, the relevant patent
has lapsed, so this approach may be used in non-GPL software, if desired.
(In contrast, implementation of RCU is permitted only in software licensed
under either GPL or LGPL. Sorry!!!)

In 1990, Pugh [Pugh90] noted that explicitly tracking which threads
were reading a given data structure permitted deferred free to operate
in the presence of non-terminating threads. However, this explicit
tracking imposes significant read-side overhead, which is undesirable
in read-mostly situations. This algorithm does take pains to avoid
write-side contention and parallelize the other write-side overheads by
providing a fine-grained locking design. Nevertheless, it would be
interesting to see how much of the performance advantage reported in
1990 remains in 2004.

At about this same time, Adams [Adams91] described ``chaotic relaxation'',
where the normal barriers between successive iterations of convergent
numerical algorithms are relaxed, so that iteration $n$ might use
data from iteration $n-1$ or even $n-2$. This introduces error,
which typically slows convergence and thus increases the number of
iterations required. However, this increase is sometimes more than made
up for by a reduction in the number of expensive barrier operations,
which are otherwise required to synchronize the threads at the end
of each iteration. Unfortunately, chaotic relaxation requires highly
structured data, such as the matrices used in scientific programs, and
is thus inapplicable to most data structures in operating-system kernels.
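The effect of using stale data can be illustrated with a small
deterministic model. This is only a sketch under stated assumptions:
the function name chaotic_relax() and the two-equation system are
hypothetical, and the staleness is fixed by hand rather than arising
from real threads skipping a barrier.

```c
/*
 * Hypothetical sketch: solve 4x + y = 6, x + 3y = 7 (solution x = 1,
 * y = 2) by Jacobi-style iteration in which the y update reads an x
 * value that is two iterations old, modelling a thread that did not
 * wait at the end-of-iteration barrier.
 */
void chaotic_relax(unsigned int iters, double *x_out, double *y_out)
{
	double x_prev2 = 0.0;	/* x from iteration n-2 */
	double x_prev1 = 0.0;	/* x from iteration n-1 */
	double y_prev1 = 0.0;	/* y from iteration n-1 */

	for (unsigned int n = 0; n < iters; n++) {
		double x_new = (6.0 - y_prev1) / 4.0;	/* reads n-1 data */
		double y_new = (7.0 - x_prev2) / 3.0;	/* reads stale n-2 data */

		x_prev2 = x_prev1;
		x_prev1 = x_new;
		y_prev1 = y_new;
	}
	*x_out = x_prev1;
	*y_out = y_prev1;
}
```

In this model the error in x shrinks by a factor of 12 every three
iterations, rather than every two iterations as it would if y read x
from iteration n-1, so the iteration still converges, only more slowly.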

In 1992, Henry (now Alexia) Massalin completed a dissertation advising
parallel programmers to defer processing when feasible to simplify
synchronization. RCU makes extremely heavy use of this advice.

In 1993, Jacobson [Jacobson93] verbally described what is perhaps the
simplest deferred-free technique: simply waiting a fixed amount of time
before freeing blocks awaiting deferred free. Jacobson did not describe
any write-side changes he might have made in this work using SGI's Irix
kernel. Aju John published a similar technique in 1995 [AjuJohn95].
This works well if there is a well-defined upper bound on the length of
time that reading threads can hold references, as there might well be in
hard real-time systems. However, if this time is exceeded, perhaps due
to preemption, excessive interrupts, or larger-than-anticipated load,
memory corruption can ensue, with no reasonable means of diagnosis.
Jacobson's technique is therefore inappropriate for use in production
operating-system kernels, except when such kernels can provide hard
real-time response guarantees for all operations.
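A fixed-delay deferred free might be sketched as follows. This is not
Jacobson's code, which was never published. All names (deferred_queue,
dq_init, dq_defer_free, dq_reclaim) are hypothetical, the caller is
assumed to supply a non-decreasing timestamp, and the sketch is
single-threaded.

```c
#include <stdlib.h>

/* All names here are hypothetical; this is an illustration only. */
struct deferred_block {
	struct deferred_block *next;
	void *ptr;			/* memory awaiting deferred free */
	unsigned long retire_time;	/* when it was unlinked */
};

struct deferred_queue {
	struct deferred_block *head;	/* oldest pending block */
	struct deferred_block *tail;	/* newest pending block */
	unsigned long delay;		/* fixed wait, in caller's time units */
};

void dq_init(struct deferred_queue *q, unsigned long delay)
{
	q->head = NULL;
	q->tail = NULL;
	q->delay = delay;
}

/* Queue already-unlinked memory; returns 0, or -1 if out of memory. */
int dq_defer_free(struct deferred_queue *q, void *ptr, unsigned long now)
{
	struct deferred_block *b = malloc(sizeof(*b));

	if (!b)
		return -1;
	b->next = NULL;
	b->ptr = ptr;
	b->retire_time = now;
	if (q->tail)
		q->tail->next = b;
	else
		q->head = b;
	q->tail = b;
	return 0;
}

/* Free every block whose fixed delay has elapsed; returns the count. */
int dq_reclaim(struct deferred_queue *q, unsigned long now)
{
	int freed = 0;

	while (q->head && now - q->head->retire_time >= q->delay) {
		struct deferred_block *b = q->head;

		q->head = b->next;
		if (!q->head)
			q->tail = NULL;
		free(b->ptr);	/* nothing checks that readers are done */
		free(b);
		freed++;
	}
	return freed;
}
```

Note that dq_reclaim() consults only the clock. A reader that holds a
reference for longer than the delay is not detected, which is exactly
the memory-corruption hazard described above.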

Also in 1995, Pu et al. [Pu95a] applied a technique similar to Pugh's
read-side tracking to permit replugging of algorithms within a commercial
Unix operating system. However, this replugging permitted only a single
reader at a time. The following year, this same group of researchers
extended their technique to allow for multiple readers [Cowan96a].
Their approach requires memory barriers (and thus pipeline stalls),
but reduces memory latency, contention, and locking overheads.

1995 also saw the first publication of DYNIX/ptx's RCU mechanism
[Slingwine95], which was optimized for modern CPU architectures,
and was successfully applied to a number of situations within the
DYNIX/ptx kernel. The corresponding conference paper appeared in 1998
[McKenney98].

In 1999, the Tornado and K42 groups described their "generations"
mechanism, which is quite similar to RCU [Gamsa99]. These operating systems
made pervasive use of RCU in place of "existence locks", which greatly
simplifies locking hierarchies.

2001 saw the first RCU presentation involving Linux [McKenney01a]
at OLS. The resulting abundance of RCU patches was presented the
following year [McKenney02a], and use of RCU in dcache was first
described that same year [Linder02a].

Also in 2002, Michael [Michael02b,Michael02a] presented "hazard-pointer"
techniques that defer the destruction of data structures to simplify
non-blocking synchronization (wait-free synchronization, lock-free
synchronization, and obstruction-free synchronization are all examples of
non-blocking synchronization). In particular, this technique eliminates
locking, reduces contention, reduces memory latency for readers, and
parallelizes pipeline stalls and memory latency for writers. However,
these techniques still impose significant read-side overhead in the
form of memory barriers. Researchers at Sun worked along similar lines
in the same timeframe [HerlihyLM02]. These techniques can be thought
of as inside-out reference counts, where the count is represented by the
number of hazard pointers referencing a given data structure (rather than
the more conventional counter field within the data structure itself).
By the same token, RCU can be thought of as a "bulk reference count",
where some form of reference counter covers all references by a given CPU
or thread during a set timeframe. This timeframe is related to, but
not necessarily exactly the same as, an RCU grace period. In classic
RCU, the reference counter is the per-CPU bit in the "bitmask" field,
and each such bit covers all references that might have been made by
the corresponding CPU during the prior grace period. Of course, RCU
can be thought of in other terms as well.
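The two views of reference counting can be contrasted in a small
single-threaded model. This is a sketch only: the names (hazard_slots,
hazard_count, grace_period, gp_start, gp_report_quiescent) and the
fixed MAX_READERS are hypothetical, and the memory barriers and atomic
operations that real hazard-pointer and RCU implementations require
are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_READERS 4	/* hypothetical number of reader slots / CPUs */

/* Hazard-pointer view: each reader publishes the pointer it is using. */
struct hazard_slots {
	const void *slot[MAX_READERS];	/* NULL means no reference held */
};

/* "Inside-out" reference count: number of slots that name obj. */
size_t hazard_count(const struct hazard_slots *hs, const void *obj)
{
	size_t n = 0;

	for (size_t i = 0; i < MAX_READERS; i++)
		if (hs->slot[i] == obj)
			n++;
	return n;
}

/*
 * Classic-RCU view: one bit per CPU, set while that CPU might still
 * hold references obtained before the grace period started.
 */
struct grace_period {
	unsigned long bitmask;
};

void gp_start(struct grace_period *gp)
{
	gp->bitmask = (1UL << MAX_READERS) - 1;
}

/* A CPU reports a quiescent state; returns true once all bits are clear. */
bool gp_report_quiescent(struct grace_period *gp, unsigned int cpu)
{
	gp->bitmask &= ~(1UL << cpu);
	return gp->bitmask == 0;
}
```

In the first view an object may be freed once hazard_count() for it
reaches zero. In the second, everything retired before gp_start() may
be freed once gp_report_quiescent() returns true, regardless of which
objects each CPU actually referenced.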

In 2003, the K42 group described how RCU could be used to create
hot-pluggable implementations of operating-system functions [Appavoo03a].
Later that year saw a paper describing an RCU implementation of System
V IPC [Arcangeli03], and an introduction to RCU in Linux Journal
[McKenney03a].

2004 saw a Linux-Journal article on use of RCU in dcache
[McKenney04a], a performance comparison of locking to RCU on several
different CPUs [McKenney04b], a dissertation describing use of RCU in a
number of operating-system kernels [PaulEdwardMcKenneyPhD], a paper
describing how to make RCU safe for soft-realtime applications [Sarma04c],
and a paper describing SELinux performance with RCU [JamesMorris04b].

2005 brought further adaptation of RCU to realtime use, permitting
preemption of RCU realtime critical sections [PaulMcKenney05a,
PaulMcKenney05b].

2006 saw the first best-paper award for an RCU paper [ThomasEHart2006a],
as well as further work on efficient implementations of preemptible
RCU [PaulEMcKenney2006b], but priority-boosting of RCU read-side critical
sections proved elusive. An RCU implementation permitting general
blocking in read-side critical sections appeared [PaulEMcKenney2006c],
and Robert Olsson described an RCU-protected trie-hash combination
[RobertOlsson2006a].

2007 saw the journal version of the award-winning RCU paper from 2006
[ThomasEHart2007a], as well as a paper demonstrating use of Promela
and Spin to mechanically verify an optimization to Oleg Nesterov's
QRCU [PaulEMcKenney2007QRCUspin], a design document describing
preemptible RCU [PaulEMcKenney2007PreemptibleRCU], and the three-part
LWN "What is RCU?" series [PaulEMcKenney2007WhatIsRCUFundamentally,
PaulEMcKenney2008WhatIsRCUUsage, and PaulEMcKenney2008WhatIsRCUAPI].

2008 saw a journal paper on real-time RCU [DinakarGuniguntala2008IBMSysJ],
a history of how Linux changed RCU more than RCU changed Linux
[PaulEMcKenney2008RCUOSR], and a design overview of hierarchical RCU
[PaulEMcKenney2008HierarchicalRCU].

2009 introduced user-level RCU algorithms [PaulEMcKenney2009MaliciousURCU],
which Mathieu Desnoyers is now maintaining [MathieuDesnoyers2009URCU]
[MathieuDesnoyersPhD]. TINY_RCU [PaulEMcKenney2009BloatWatchRCU] made
its appearance, as did expedited RCU [PaulEMcKenney2009expeditedRCU].
The problem of resizeable RCU-protected hash tables may now be on a path
to a solution [JoshTriplett2009RPHash]. A few academic researchers are now
using RCU to solve their parallel problems [HariKannan2009DynamicAnalysisRCU].

2010 produced a simpler preemptible-RCU implementation
based on TREE_RCU [PaulEMcKenney2010SimpleOptRCU], lockdep-RCU
[PaulEMcKenney2010LockdepRCU], another resizeable RCU-protected hash
table [HerbertXu2010RCUResizeHash] (this one consuming more memory,
but allowing arbitrary changes in hash function, as required for DoS
avoidance in the networking code), realization of the 2009 RCU-protected
hash table with atomic node move [JoshTriplett2010RPHash], and an update on
the RCU API [PaulEMcKenney2010RCUAPI].

2011 marked the inclusion of Nick Piggin's fully lockless dentry search
[LinusTorvalds2011Linux2:6:38:rc1:NPigginVFS], an RCU-protected red-black
tree using software transactional memory to protect concurrent updates
(strange, but true!) [PhilHoward2011RCUTMRBTree], yet another variant of
RCU-protected resizeable hash tables [Triplett:2011:RPHash], the 3.0 RCU
trainwreck [PaulEMcKenney2011RCU3.0trainwreck], and Neil Brown's "Meet the
Lockers" LWN article [NeilBrown2011MeetTheLockers].

Bibtex Entries

@article{Kung80
,author="H. T. Kung and Philip L. Lehman"
,title="Concurrent Manipulation of Binary Search Trees"
,Year="1980"
,Month="September"
,journal="ACM Transactions on Database Systems"
,volume="5"
,number="3"
,pages="354-382"
,note="Available:
\url{http://portal.acm.org/citation.cfm?id=320619&dl=GUIDE,}
[Viewed December 3, 2007]"
,annotation={
Use garbage collector to clean up data after everyone is done with it.
.
Oldest use of something vaguely resembling RCU that I have found.
}
}
@techreport{Manber82
,author="Udi Manber and Richard E. Ladner"
,title="Concurrency Control in a Dynamic Search Structure"
,institution="Department of Computer Science, University of Washington"
,address="Seattle, Washington"
,year="1982"
,number="82-01-01"
,month="January"
,pages="28"
,annotation={
.
Superseded by Manber84.
.
Describes concurrent AVL tree implementation. Uses a
garbage-collection mechanism to handle concurrent use and deletion
of nodes in the tree, but lacks the summary-of-execution-history
concept of read-copy locking.
.
Keeps full list of processes that were active when a given
node was to be deleted, and waits until all such processes have
-terminated- before allowing this node to be reused. This is
not described in great detail -- one could imagine using process
IDs for this if the ID space was large enough that overlapping
never occurred.
.
This restriction makes this algorithm unsuitable for use in
systems comprised of long-lived processes. It also produces
completely unacceptable overhead in systems with large numbers
of processes. Finally, it is specific to AVL trees.
.
Cites Kung80, so not an independent invention, but the first
RCU-like usage that does not rely on an automatic garbage
collector.
}
}
@article{Manber84
,author="Udi Manber and Richard E. Ladner"
,title="Concurrency Control in a Dynamic Search Structure"
,Year="1984"
,Month="September"
,journal="ACM Transactions on Database Systems"
,volume="9"
,number="3"
,pages="439-455"
,annotation={
Describes concurrent AVL tree implementation. Uses a
garbage-collection mechanism to handle concurrent use and deletion
of nodes in the tree, but lacks the summary-of-execution-history
concept of read-copy locking.
.
Keeps full list of processes that were active when a given
node was to be deleted, and waits until all such processes have
-terminated- before allowing this node to be reused. This is
not described in great detail -- one could imagine using process
IDs for this if the ID space was large enough that overlapping
never occurred.
.
This restriction makes this algorithm unsuitable for use in
systems comprised of long-lived processes. It also produces
completely unacceptable overhead in systems with large numbers
of processes. Finally, it is specific to AVL trees.
}
}
@Conference{RichardRashid87a
,Author="Richard Rashid and Avadis Tevanian and Michael Young and
David Golub and Robert Baron and David Black and William Bolosky and
Jonathan Chew"
,Title="Machine-Independent Virtual Memory Management for Paged
Uniprocessor and Multiprocessor Architectures"
,Booktitle="{2\textsuperscript{nd} Symposium on Architectural Support
for Programming Languages and Operating Systems}"
,Publisher="Association for Computing Machinery"
,Month="October"
,Year="1987"
,pages="31-39"
,Address="Palo Alto, CA"
,note="Available:
\url{http://www.cse.ucsc.edu/~randal/221/rashid-machvm.pdf}
[Viewed February 17, 2005]"
,annotation={
Describes lazy TLB flush, where one waits for each CPU to pass
through a scheduling-clock interrupt before reusing a given range
of virtual address. Does not describe how one determines that
all CPUs have in fact taken such an interrupt, though there is
no shortage of straightforward methods for accomplishing this.
.
Note that it does not make sense to just wait a fixed amount of
time, since a given CPU might have interrupts disabled for an
extended amount of time.
}
}
@article{BarbaraLiskov1988ArgusCACM
,author = {Barbara Liskov}
,title = {Distributed programming in {Argus}}
,journal = {Commun. ACM}
,volume = {31}
,number = {3}
,year = {1988}
,issn = {0001-0782}
,pages = {300--312}
,doi = {http://doi.acm.org/10.1145/42392.42399}
,publisher = {ACM}
,address = {New York, NY, USA}
,annotation= {
At the top of page 307: "Conflicts with deposits and withdrawals
are necessary if the reported total is to be up to date. They
could be avoided by having total return a sum that is slightly
out of date." Relies on semantics -- approximate numerical
values sometimes OK.
}
}
@techreport{Hennessy89
,author="James P. Hennessy and Damian L. Osisek and Joseph W. {Seigh II}"
,title="Passive Serialization in a Multitasking Environment"
,institution="US Patent and Trademark Office"
,address="Washington, DC"
,year="1989"
,number="US Patent 4,809,168 (lapsed)"
,month="February"
,pages="11"
}
@techreport{Pugh90
,author="William Pugh"
,title="Concurrent Maintenance of Skip Lists"
,institution="Institute of Advanced Computer Science Studies, Department of Computer Science, University of Maryland"
,address="College Park, Maryland"
,year="1990"
,number="CS-TR-2222.1"
,month="June"
,annotation={
Concurrent access to skip lists. Has both weak and strong search.
Uses concept of ``garbage queue'', but has no real way of cleaning
the garbage efficiently.
.
Appears to be an independent invention of an RCU-like mechanism.
}
}
@Book{Adams91
,Author="Gregory R. Adams"
,title="Concurrent Programming, Principles, and Practices"
,Publisher="Benjamin Cummings"
,Year="1991"
,annotation={
Has a few paragraphs describing ``chaotic relaxation'', a
numerical analysis technique that allows multiprocessors to
avoid synchronization overhead by using possibly-stale data.
.
Seems like this is descended from yet another independent
invention of RCU-like function -- but this is restricted
in that reclamation is not necessary.
}
}
@unpublished{Jacobson93
,author="Van Jacobson"
,title="Avoid Read-Side Locking Via Delayed Free"
,year="1993"
,month="September"
,note="private communication"
,annotation={
Use fixed time delay to approximate grace period. Very simple,
but subject to random memory corruption under heavy load.
.
Independent invention of RCU-like mechanism.
}
}
@Conference{AjuJohn95
,Author="Aju John"
,Title="Dynamic vnodes -- Design and Implementation"
,Booktitle="{USENIX Winter 1995}"
,Publisher="USENIX Association"
,Month="January"
,Year="1995"
,pages="11-23"
,Address="New Orleans, LA"
,note="Available:
\url{https://www.usenix.org/publications/library/proceedings/neworl/full_papers/john.a}
[Viewed October 1, 2010]"
,annotation={
Age vnodes out of the cache, and have a fixed time set by a kernel
parameter. Not clear that all races were in fact correctly handled.
Used a 20-minute time by default, which would most definitely not
be suitable during DoS attacks or virus scans.
.
Apparently independent invention of RCU-like mechanism.
}
}
@conference{Pu95a,
Author = "Calton Pu and Tito Autrey and Andrew Black and Charles Consel and
Crispin Cowan and Jon Inouye and Lakshmi Kethana and Jonathan Walpole and
Ke Zhang",
Title = "Optimistic Incremental Specialization: Streamlining a Commercial
Operating System",
Booktitle = "15\textsuperscript{th} ACM Symposium on
Operating Systems Principles (SOSP'95)",
address = "Copper Mountain, CO",
month="December",
year="1995",
pages="314-321",
annotation="
Uses a replugger, but with a flag to signal when people are
using the resource at hand. Only one reader at a time.
"
}
@conference{Cowan96a,
Author = "Crispin Cowan and Tito Autrey and Charles Krasic and
Calton Pu and Jonathan Walpole",
Title = "Fast Concurrent Dynamic Linking for an Adaptive Operating System",
Booktitle = "International Conference on Configurable Distributed Systems
(ICCDS'96)",
address = "Annapolis, MD",
month="May",
year="1996",
pages="108",
isbn="0-8186-7395-8",
annotation="
Uses a replugger, but with a counter to signal when people are
using the resource at hand. Allows multiple readers.
"
}
@techreport{Slingwine95
,author="John D. Slingwine and Paul E. McKenney"
,title="Apparatus and Method for Achieving Reduced Overhead Mutual
Exclusion and Maintaining Coherency in a Multiprocessor System
Utilizing Execution History and Thread Monitoring"
,institution="US Patent and Trademark Office"
,address="Washington, DC"
,year="1995"
,number="US Patent 5,442,758"
,month="August"
,annotation={
Describes the parallel RCU infrastructure. Includes NUMA aspect
(structure of bitmap can reflect bus structure of computer system).
.
Another independent invention of an RCU-like mechanism, but the
"real" RCU this time!
}
}
@techreport{Slingwine97
,author="John D. Slingwine and Paul E. McKenney"
,title="Method for Maintaining Data Coherency Using Thread Activity
Summaries in a Multicomputer System"
,institution="US Patent and Trademark Office"
,address="Washington, DC"
,year="1997"
,number="US Patent 5,608,893"
,month="March"
,pages="19"
,annotation={
Describes use of RCU to synchronize data between a pair of
SMP/NUMA computer systems.
}
}
@techreport{Slingwine98
,author="John D. Slingwine and Paul E. McKenney"
,title="Apparatus and Method for Achieving Reduced Overhead Mutual
Exclusion and Maintaining Coherency in a Multiprocessor System
Utilizing Execution History and Thread Monitoring"
,institution="US Patent and Trademark Office"
,address="Washington, DC"
,year="1998"
,number="US Patent 5,727,209"
,month="March"
,annotation={
Describes doing an atomic update by copying the data item and
then substituting it into the data structure.
}
}
@Conference{McKenney98
,Author="Paul E. McKenney and John D. Slingwine"
,Title="Read-Copy Update: Using Execution History to Solve Concurrency
Problems"
,Booktitle="{Parallel and Distributed Computing and Systems}"
,Month="October"
,Year="1998"
,pages="509-518"
,Address="Las Vegas, NV"
,note="Available:
\url{http://www.rdrop.com/users/paulmck/RCU/rclockpdcsproof.pdf}
[Viewed December 3, 2007]"
,annotation={
Describes and analyzes RCU mechanism in DYNIX/ptx. Describes
application to linked list update and log-buffer flushing.
Defines 'quiescent state'. Includes both measured and analytic
evaluation.
}
}
@Conference{Gamsa99
,Author="Ben Gamsa and Orran Krieger and Jonathan Appavoo and Michael Stumm"
,Title="Tornado: Maximizing Locality and Concurrency in a Shared Memory
Multiprocessor Operating System"
,Booktitle="{Proceedings of the 3\textsuperscript{rd} Symposium on
Operating System Design and Implementation}"
,Month="February"
,Year="1999"
,pages="87-100"
,Address="New Orleans, LA"
,note="Available:
\url{http://www.usenix.org/events/osdi99/full_papers/gamsa/gamsa.pdf}
[Viewed August 30, 2006]"
,annotation={
Use of RCU-like facility in K42/Tornado. Another independent
invention of RCU.
See especially pages 7-9 (Section 5).
}
}
@unpublished{RustyRussell2000a
,Author="Rusty Russell"
,Title="Re: modular net drivers"
,month="June"
,year="2000"
,day="23"
,note="Available:
\url{http://oss.sgi.com/projects/netdev/archive/2000-06/msg00250.html}
[Viewed April 10, 2006]"
,annotation={
Proto-RCU proposal from Phil Rumpf and Rusty Russell.
Yet another independent invention of RCU.
Outline of algorithm to unload modules...
.
Appeared on net-dev mailing list.
}
}
@unpublished{RustyRussell2000b
,Author="Rusty Russell"
,Title="Re: modular net drivers"
,month="June"
,year="2000"
,day="24"
,note="Available:
\url{http://oss.sgi.com/projects/netdev/archive/2000-06/msg00254.html}
[Viewed April 10, 2006]"
,annotation={
Proto-RCU proposal from Phil Rumpf and Rusty Russell.
.
Appeared on net-dev mailing list.
}
}
@unpublished{McKenney01b
,Author="Paul E. McKenney and Dipankar Sarma"
,Title="Read-Copy Update Mutual Exclusion in {Linux}"
,month="February"
,year="2001"
,note="Available:
\url{http://lse.sourceforge.net/locking/rcu/rcupdate_doc.html}
[Viewed October 18, 2004]"
,annotation={
Prototypical Linux documentation for RCU.
}
}
@techreport{Slingwine01
,author="John D. Slingwine and Paul E. McKenney"
,title="Apparatus and Method for Achieving Reduced Overhead Mutual
Exclusion and Maintaining Coherency in a Multiprocessor System
Utilizing Execution History and Thread Monitoring"
,institution="US Patent and Trademark Office"
,address="Washington, DC"
,year="2001"
,number="US Patent 6,219,690"
,month="April"
,annotation={
'Change in mode' aspect of RCU. Can be thought of as a lazy barrier.
}
}
@Conference{McKenney01a
,Author="Paul E. McKenney and Jonathan Appavoo and Andi Kleen and
Orran Krieger and Rusty Russell and Dipankar Sarma and Maneesh Soni"
,Title="Read-Copy Update"
,Booktitle="{Ottawa Linux Symposium}"
,Month="July"
,Year="2001"
,note="Available:
\url{http://www.linuxsymposium.org/2001/abstracts/readcopy.php}
\url{http://www.rdrop.com/users/paulmck/RCU/rclock_OLS.2001.05.01c.pdf}
[Viewed June 23, 2004]"
,annotation={
Described RCU, and presented some patches implementing and using
it in the Linux kernel.
}
}
@unpublished{McKenney01f
,Author="Paul E. McKenney"
,Title="{RFC:} patch to allow lock-free traversal of lists with insertion"
,month="October"
,year="2001"
,note="Available:
\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=100259266316456&w=2}
[Viewed June 23, 2004]"
,annotation="
Memory-barrier and Alpha thread. 100 messages, not too bad...
"
}
@unpublished{Spraul01
,Author="Manfred Spraul"
,Title="Re: {RFC:} patch to allow lock-free traversal of lists with insertion"
,month="October"
,year="2001"
,note="Available:
\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=100264675012867&w=2}
[Viewed June 23, 2004]"
,annotation="
Suggested burying memory barriers in Linux's list-manipulation
primitives.
"
}
@unpublished{LinusTorvalds2001a
,Author="Linus Torvalds"
,Title="{Re:} {[Lse-tech]} {Re:} {RFC:} patch to allow lock-free traversal of lists with insertion"
,month="October"
,year="2001"
,note="Available:
\url{http://lkml.org/lkml/2001/10/13/105}
[Viewed August 21, 2004]"
}
@unpublished{Blanchard02a
,Author="Anton Blanchard"
,Title="some RCU dcache and ratcache results"
,month="March"
,year="2002"
,note="Available:
\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=101637107412972&w=2}
[Viewed October 18, 2004]"
}
@Conference{Linder02a
,Author="Hanna Linder and Dipankar Sarma and Maneesh Soni"
,Title="Scalability of the Directory Entry Cache"
,Booktitle="{Ottawa Linux Symposium}"
,Month="June"
,Year="2002"
,pages="289-300"
,annotation="
Measured scalability of Linux 2.4 kernel's directory-entry cache
(dcache), and measured some scalability enhancements.
"
}
@Conference{McKenney02a
,Author="Paul E. McKenney and Dipankar Sarma and
Andrea Arcangeli and Andi Kleen and Orran Krieger and Rusty Russell"
,Title="Read-Copy Update"
,Booktitle="{Ottawa Linux Symposium}"
,Month="June"
,Year="2002"
,pages="338-367"
,note="Available:
\url{http://www.linux.org.uk/~ajh/ols2002_proceedings.pdf.gz}
[Viewed June 23, 2004]"
,annotation="
Presented and compared a number of RCU implementations for the
Linux kernel.
"
}
@unpublished{Sarma02a
,Author="Dipankar Sarma"
,Title="specweb99: dcache scalability results"
,month="July"
,year="2002"
,note="Available:
\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=102645767914212&w=2}
[Viewed June 23, 2004]"
,annotation="
Compare fastwalk and RCU for dcache. RCU won.
"
}
@unpublished{Barbieri02
,Author="Luca Barbieri"
,Title="Re: {[PATCH]} Initial support for struct {vfs\_cred}"
,month="August"
,year="2002"
,note="Available:
\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=103082050621241&w=2}
[Viewed: June 23, 2004]"
,annotation="
Suggested RCU for vfs\_shared\_cred.
"
}
@unpublished{Dickins02a
,author="Hugh Dickins"
,title="Use RCU for System-V IPC"
,year="2002"
,month="October"
,note="private communication"
}
@unpublished{Sarma02b
,Author="Dipankar Sarma"
,Title="Some dcache\_rcu benchmark numbers"
,month="October"
,year="2002"
,note="Available:
\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=103462075416638&w=2}
[Viewed June 23, 2004]"
,annotation="
Performance of dcache RCU on kernbench for 16x NUMA-Q and 1x,
2x, and 4x systems. RCU does no harm, and helps on 16x.
"
}
@unpublished{LinusTorvalds2003a
,Author="Linus Torvalds"
,Title="Re: {[PATCH]} small fixes in brlock.h"
,month="March"
,year="2003"
,note="Available:
\url{http://lkml.org/lkml/2003/3/9/205}
[Viewed March 13, 2006]"
,annotation="
Linus suggests replacing brlock with RCU and/or seqlocks:
.
'It's entirely possible that the current user could be replaced
by RCU and/or seqlocks, and we could get rid of brlocks entirely.'
.
Steve Hemminger responds by replacing them with RCU.
"
}
@article{Appavoo03a
,author="J. Appavoo and K. Hui and C. A. N. Soules and R. W. Wisniewski and
D. M. {Da Silva} and O. Krieger and M. A. Auslander and D. J. Edelsohn and
B. Gamsa and G. R. Ganger and P. McKenney and M. Ostrowski and
B. Rosenburg and M. Stumm and J. Xenidis"
,title="Enabling Autonomic Behavior in Systems Software With Hot Swapping"
,Year="2003"
,Month="January"
,journal="IBM Systems Journal"
,volume="42"
,number="1"
,pages="60-76"
,annotation="
Use of RCU to enable hot-swapping for autonomic behavior in K42.
"
}
@unpublished{Seigh03
,author="Joseph W. {Seigh II}"
,title="Read Copy Update"
,Year="2003"
,Month="March"
,note="email correspondence"
,annotation="
Described the relationship of the VM/XA passive serialization to RCU.
"
}
@Conference{Arcangeli03
,Author="Andrea Arcangeli and Mingming Cao and Paul E. McKenney and
Dipankar Sarma"
,Title="Using Read-Copy Update Techniques for {System V IPC} in the
{Linux} 2.5 Kernel"
,Booktitle="Proceedings of the 2003 USENIX Annual Technical Conference
(FREENIX Track)"
,Publisher="USENIX Association"
,year="2003"
,month="June"
,pages="297-310"
,note="Available:
\url{http://www.rdrop.com/users/paulmck/RCU/rcu.FREENIX.2003.06.14.pdf}
[Viewed November 21, 2007]"
,annotation="
Compared updated RCU implementations for the Linux kernel, and
described System V IPC use of RCU, including order-of-magnitude
performance improvements.
"
}
@Conference{Soules03a
,Author="Craig A. N. Soules and Jonathan Appavoo and Kevin Hui and
Dilma {Da Silva} and Gregory R. Ganger and Orran Krieger and
Michael Stumm and Robert W. Wisniewski and Marc Auslander and
Michal Ostrowski and Bryan Rosenburg and Jimi Xenidis"
,Title="System Support for Online Reconfiguration"
,Booktitle="Proceedings of the 2003 USENIX Annual Technical Conference"
,Publisher="USENIX Association"
,year="2003"
,month="June"
,pages="141-154"
}
@article{McKenney03a
,author="Paul E. McKenney"
,title="Using {RCU} in the {Linux} 2.5 Kernel"
,Year="2003"
,Month="October"
,journal="Linux Journal"
,volume="1"
,number="114"
,pages="18-26"
,note="Available:
\url{http://www.linuxjournal.com/article/6993}
[Viewed November 14, 2007]"
,annotation="
Reader-friendly intro to RCU, with the infamous old-man-and-brat
cartoon.
"
}
@unpublished{Sarma03a
,Author="Dipankar Sarma"
,Title="RCU low latency patches"
,month="December"
,year="2003"
,note="Message ID: [email protected]"
,annotation="dipankar/ct.2004.03.27/RCUll.2003.12.22.patch"
}
@techreport{Friedberg03a
,author="Stuart A. Friedberg"
,title="Lock-Free Wild Card Search Data Structure and Method"
,institution="US Patent and Trademark Office"
,address="Washington, DC"
,year="2003"
,number="US Patent 6,662,184"
,month="December"
,pages="112"
,annotation="
Applies RCU to a wildcard-search Patricia tree in order to permit
synchronization-free lookup. RCU is used to retain removed nodes
for a grace period before freeing them.
"
}
@article{McKenney04a
,author="Paul E. McKenney and Dipankar Sarma and Maneesh Soni"
,title="Scaling dcache with {RCU}"
,Year="2004"
,Month="January"
,journal="Linux Journal"
,volume="1"
,number="118"
,pages="38-46"
,note="Available:
\url{http://www.linuxjournal.com/node/7124}
[Viewed December 26, 2010]"
,annotation="
Reader-friendly intro to dcache and RCU.
"
}
@Conference{McKenney04b
,Author="Paul E. McKenney"
,Title="{RCU} vs. Locking Performance on Different {CPUs}"
,Booktitle="{linux.conf.au}"
,Month="January"
,Year="2004"
,Address="Adelaide, Australia"
,note="Available:
\url{http://www.linux.org.au/conf/2004/abstracts.html#90}
\url{http://www.rdrop.com/users/paulmck/RCU/lockperf.2004.01.17a.pdf}
[Viewed June 23, 2004]"
,annotation="
Compares performance of RCU to that of other locking primitives
over a number of CPUs (x86, Opteron, Itanium, and PPC).
"
}
@unpublished{Sarma04a
,Author="Dipankar Sarma"
,Title="{[PATCH]} {RCU} for low latency (experimental)"
,month="March"
,year="2004"
,note="\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=108003746402892&w=2}"
,annotation="Head of thread: dipankar/2004.03.23/rcu-low-lat.1.patch"
}
@unpublished{Sarma04b
,Author="Dipankar Sarma"
,Title="Re: {[PATCH]} {RCU} for low latency (experimental)"
,month="March"
,year="2004"
,note="\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=108016474829546&w=2}"
,annotation="dipankar/rcuth.2004.03.24/rcu-throttle.patch"
}
@unpublished{Spraul04a
,Author="Manfred Spraul"
,Title="[RFC] 0/5 rcu lock update"
,month="May"
,year="2004"
,note="Available:
\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=108546407726602&w=2}
[Viewed June 23, 2004]"
,annotation="
Hierarchical-bitmap patch for RCU infrastructure.
"
}
@unpublished{Steiner04a
,Author="Jack Steiner"
,Title="Re: [Lse-tech] [RFC, PATCH] 1/5 rcu lock update:
Add per-cpu batch counter"
,month="May"
,year="2004"
,note="Available:
\url{http://marc.theaimsgroup.com/?l=linux-kernel&m=108551764515332&w=2}
[Viewed June 23, 2004]"
,annotation={
RCU runs reasonably on a 512-CPU SGI using Manfred Spraul's patches,
which may be found at:
https://lkml.org/lkml/2004/5/20/49 (split vars into cachelines)
https://lkml.org/lkml/2004/5/22/114 (cpu_quiet() patch)
https://lkml.org/lkml/2004/5/25/24 (0/5)
https://lkml.org/lkml/2004/5/25/23 (1/5)
https://lkml.org/lkml/2004/5/25/265 (works for Jack)
https://lkml.org/lkml/2004/5/25/20 (2/5)
https://lkml.org/lkml/2004/5/25/22 (3/5)
https://lkml.org/lkml/2004/5/25/19 (4/5)
https://lkml.org/lkml/2004/5/25/21 (5/5)
}
}
@Conference{Sarma04c
,Author="Dipankar Sarma and Paul E. McKenney"
,Title="Making {RCU} Safe for Deep Sub-Millisecond Response
Realtime Applications"
,Booktitle="Proceedings of the 2004 USENIX Annual Technical Conference
(FREENIX Track)"
,Publisher="USENIX Association"
,year="2004"
,month="June"
,pages="182-191"
,annotation="
Describes and compares a number of modifications to the Linux RCU
implementation that make it friendly to realtime applications.
"
}
@phdthesis{PaulEdwardMcKenneyPhD
,author="Paul E. McKenney"
,title="Exploiting Deferred Destruction:
An Analysis of Read-Copy-Update Techniques
in Operating System Kernels"
,school="OGI School of Science and Engineering at
Oregon Health and Science University"
,year="2004"
,note="Available:
\url{http://www.rdrop.com/users/paulmck/RCU/RCUdissertation.2004.07.14e1.pdf}
[Viewed October 15, 2004]"
,annotation="
Describes RCU implementations and presents design patterns
corresponding to common uses of RCU in several operating-system
kernels.
"
}
@unpublished{PaulEMcKenney2004rcu:dereference
,Author="Dipankar Sarma"
,Title="{Re: RCU : Abstracted RCU dereferencing [5/5]}"
,month="August"
,year="2004"
,note="Available:
\url{http://lkml.org/lkml/2004/8/6/237}
[Viewed June 8, 2010]"
,annotation="
Introduces rcu_dereference().
"
}
@unpublished{JimHouston04a
,Author="Jim Houston"
,Title="{[RFC\&PATCH] Alternative {RCU} implementation}"
,month="August"
,year="2004"
,note="Available:
\url{http://lkml.org/lkml/2004/8/30/87}
[Viewed February 17, 2005]"
,annotation="
Uses active code in rcu_read_lock() and rcu_read_unlock() to
make RCU happen, allowing RCU to function on CPUs that do not