forked from codeplaysoftware/syclacademy
-
Notifications
You must be signed in to change notification settings - Fork 0
/
index.html
415 lines (406 loc) · 17.7 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<link rel="stylesheet" href="../common-revealjs/css/reveal.css">
<link rel="stylesheet" href="../common-revealjs/css/theme/white.css">
<link rel="stylesheet" href="../common-revealjs/css/custom.css">
</head>
<body>
<span class='overlay-image bottom-left'>SYCL and the SYCL logo are trademarks of the Khronos Group Inc.</span>
<span class='overlay-image bottom-right'><img src="../common-revealjs/images/codeplay.png" alt="Codeplay Software"></span>
<span class='overlay-image top-left'><img src="../common-revealjs/images/sycl_academy.png" alt="SYCL Academy"></span>
<span class='overlay-image top-right'><img src="../common-revealjs/images/sycl_logo.png" alt="SYCL"></span>
<div class="reveal">
<div class="slides">
<!--Slide 1-->
<section class="hbox" data-markdown>
## Managing Data in SYCL Applications
</section>
<!--Slide 2-->
<section class="hbox" data-markdown>
## Learning Objectives
* Understand the SYCL memory model
* Learn how to use SYCL buffers and accessors
* Understand memory access in different address spaces
* Learn about different ways to access the data
* Learn about execution ordering using data dependencies
* Learn about how SYCL synchronizes data
</section>
<!--Slide 3-->
<section class="hbox" data-markdown>
## The SYCL Memory Model
</section>
<!--Slide 4-->
<section>
<div class="container">
<div class="col" data-markdown>
* Each work-item can access a dedicated region of **private memory**
* A work-item cannot access the private memory of another work-item
</div>
<div class="col" data-markdown>
![Private Memory](../common-revealjs/images/workitem-privatememory.png "Private Memory")
</div>
</div>
</section>
<!--Slide 5-->
<section>
<div class="container">
<div class="col-left-3" data-markdown>
![Local Memory](../common-revealjs/images/workitem-localmemory.png "Local Memory")
</div>
<div class="col-right-2" data-markdown>
* Each work-item can access a dedicated region of **local memory** accessible to all work-items in a work-group
* A work-item cannot access the local memory of another workgroup
</div>
</div>
</section>
<!--Slide 6-->
<section>
<div class="container">
<div class="col-left-3" data-markdown>
![Constant Memory](../common-revealjs/images/workitem-constantmemory.png "Constant Memory")
</div>
<div class="col-right-2" data-markdown>
* Each work-item can access a single region of **global memory** that's accessible to all work-items in a ND-range
* Each work-item can also access a region of global memory reserved as **constant memory**, which is read-only
</div>
</div>
</section>
<!--Slide 7-->
<section>
<div class="container">
<div class="col" data-markdown>
* Each memory region has a different size and access latency
* Global / constant memory is larger than local memory and local memory is larger than private memory
* Private memory is faster than local memory and local memory is faster than global / constant memory
</div>
<div class="col" data-markdown>
![Memory Regions](../common-revealjs/images/memory-regions.png "Memory Regions")
</div>
</div>
</section>
<!--Slide 8-->
<section>
<div class="hbox" data-markdown>
## SYCL Buffers & Accessors
</div>
</section>
<!--Slide 9-->
<section>
<div class="hbox" data-markdown>
* SYCL separates the storage and access of data
* A SYCL buffer manages data across the host and any number of devices
* A SYCL accessor requests access to data on the host or on a device for a specific SYCL kernel function
* Accessors are also used to access data within a SYCL kernel function
* This means they are declared in the host code but captured by and then accessed within a SYCL kernel function
</div>
</section>
<!--Slide 10-->
<section>
<div class="container">
<div class="col-left-3">
<code><pre>
buffer<float, 1> bufA(dA.data(), range<1>(dA.size()));
buffer<float, 1> bufB(dB.data(), range<1>(dB.size()));
buffer<float, 1> bufO(dO.data(), range<1>(dO.size()));
gpuQueue.submit([&](handler &cgh){
<mark>auto inA = bufA.get_access<access::mode::read>(cgh);</mark>
<mark>auto inB = bufB.get_access<access::mode::read>(cgh);</mark>
<mark>auto out = bufO.get_access<access::mode::write>(cgh);</mark>
cgh.parallel_for<add>(range<1>(dA.size()),
[=](id<1> i){ out[i] = inA[i] + inB[i]; });
});
</code></pre>
</div>
<div class="col-right-2" data-markdown>
* In order to achieve this the host compiler and the SYCL device compiler interpret accessors differently
* The host compiler interprets accessors as host objects which instruct the SYCL runtime of data dependencies for a SYCL kernel function
* The SYCL device compiler interprets accessors as a wrapper on the argument to the SYCL kernel function (a pointer to device memory) that is used to access data on the device
</div>
</div>
</section>
<!--Slide 11-->
<section>
<div class="container">
<div class="col" data-markdown>
* A SYCL buffer can be constructed with a pointer to host memory
* For the lifetime of the buffer this memory is owned by the SYCL runtime
* When a buffer object is constructed it will not allocate or copy to device memory at first
* This will only happen once the SYCL runtime knows the data needs to be accessed and where it needs to be accessed
</div>
<div class="col" data-markdown>
![Buffer Host Memory](../common-revealjs/images/buffer-hostmemory.png "Buffer Host Memory")
</div>
</div>
</section>
<!--Slide 12-->
<section>
<div class="container">
<div class="col" data-markdown>
* Constructing an accessor specifies a request to access the data managed by the buffer
* There are a range of different types of accessor which provide different ways to access data
</div>
<div class="col" data-markdown>
![Buffer Host Memory Accessor](../common-revealjs/images/buffer-hostmemory-accessor.png "Buffer Host Memory Accessor")
</div>
</div>
</section>
<!--Slide 13-->
<section>
<div class="container">
<div class="col" data-markdown>
* When an accessor is constructed it is associated with a command group via the handler object
* This connects the buffer that is being accessed, the way in which it’s being accessed and the device that the command group is being submitted to
</div>
<div class="col" data-markdown>
![Buffer Host Memory Accessor CG](../common-revealjs/images/buffer-hostmemory-accessor-cg.png "Buffer Host Memory Accessor CG")
</div>
</div>
</section>
<!--Slide 14-->
<section>
<div class="container">
<div class="col" data-markdown>
* Once the SYCL scheduler selects the command group to be executed it must first satisfy its data dependencies
* This means allocating and copying data to the device the data is being accessed on if necessary
* If the most recent copy of the data is already on the device then the runtime will not copy again
</div>
<div class="col" data-markdown>
![Buffer Host Memory Accessor CG Device](../common-revealjs/images/buffer-hostmemory-accessor-cg-device.png "Buffer Host Memory Accessor CG Device")
</div>
</div>
</section>
<!--Slide 15-->
<section>
<div class="container">
<div class="col" data-markdown>
* Data will remain in device memory after kernels finish executing until another command group requests access in a different device or on the host
* When the buffer object is destroyed it will wait for any outstanding work that is accessing the data to complete and then copy back to the original host memory
</div>
<div class="col" data-markdown>
![Buffer Destroyed](../common-revealjs/images/buffer-destroyed.png "Buffer Destroyed")
</div>
</div>
</section>
<!--Slide 16-->
<section>
<div class="container">
<div class="col" data-markdown>
* If a buffer is accessed on one device when the latest copy of the data is on another device, the data will be copied
* If the two devices are of the same context the data can be copied directly
</div>
<div class="col" data-markdown>
![Buffer Copied](../common-revealjs/images/buffer-copied.png "Buffer Copied")
</div>
</div>
</section>
<!--Slide 17-->
<section>
<div class="container">
<div class="col" data-markdown>
* If the devices are of different contexts the data must be copied via host memory
* It's important to consider this as it could incur further overhead when copying between devices
</div>
<div class="col" data-markdown>
![Buffer Copied](../common-revealjs/images/copy-via-host.png "Buffer Copied")
</div>
</div>
</section>
<!--Slide 18-->
<section>
<div class="hbox" data-markdown>
## Different Kinds of Accessors
</div>
</section>
<!--Slide 19-->
<section>
<div class="hbox" data-markdown>
![Accessor Types](../common-revealjs/images/accessor-types.png "Accessor Types")
</div>
</section>
<!--Slide 20-->
<section>
<div class="code-header" data-markdown>```accessor<elementT, dimensions, access::mode, access::target, access::placeholder>```</div>
<div class="container">
<div class="col" data-markdown>
* Accessors using the access::target **global_buffer** and **constant_buffer** will allocate memory on the device in the **global** and **constant** address space respectively
* Accessors of these access targets must be constructed using a buffer
* A useful shortcut to construct an accessor with the access::target **global_buffer** is to use the **get_access** member function of the buffer
* Element type and dimensionality must be the same
</div>
<div class="col" data-markdown>
![Buffer Accessors](../common-revealjs/images/buffer-accessors.png "Buffer Accessors")
</div>
</div>
</section>
<!--Slide 21-->
<section>
<div class="code-header" data-markdown>```accessor<elementT, dimensions, access::mode, access::target, access::placeholder>```</div>
<div class="container">
<div class="col" data-markdown>
* Accessors using the access::target **host_buffer** will make the data available immediately on the host
* This access targets must be constructed using a buffer outside of a command group
* A useful shortcut to construct an accessor with the access::target **global_buffer** is to use the **get_access** member function of the buffer
* Note that a host accessor will block other accessors until it’s destroyed
</div>
<div class="col" data-markdown>
![Buffer Accessors](../common-revealjs/images/buffer-accessors-get-access.png "Buffer Accessors")
</div>
</div>
</section>
<!--Slide 22-->
<section>
<div class="code-header" data-markdown>```accessor<elementT, dimensions, access::mode, access::target, access::placeholder>```</div>
<div class="container">
<div class="col" data-markdown>
* Accessors using the access::target **local** will allocate memory in the **local** address space per work-group
* This access targets does not require a buffer to be constructed as it’s temporary memory allocation for the duration of a SYCL kernel function invocation
</div>
<div class="col" data-markdown>
![Buffer Accessors](../common-revealjs/images/buffer-accessors-local.png "Buffer Accessors")
</div>
</div>
</section>
<!--Slide 23-->
<section>
<div class="container">
<div class="col" data-markdown>
* A **read** accessor instructs the SYCL runtime that the SYCL kernel function will read the data – cannot be written to within a SYCL kernel function
* A **write** accessor instructs the SYCL runtime that the SYCL kernel function will modify the data – creating a dependency for future command groups
* A **discard_*** accessor instructs the SYCL runtime that the SYCL kernel function does not need the initial values of the data – removing the dependency on previous command groups
</div>
<div class="col" data-markdown>
![Buffer Accessors](../common-revealjs/images/accessors.png "Buffer Accessors")
</div>
</div>
</section>
<!--Slide 24-->
<section>
<div class="hbox" data-markdown>
## Accessor Resolution
</div>
</section>
<!--Slide 25-->
<section>
<div class="hbox" data-markdown>
* If a command group has accessors to the same buffer with conflicting access modes they are resolved into one
* read & write => ```read_write```
* ```read_write``` & ```discard_write``` => ```read_write```
* Within the SYCL kernel function there are still multiple accessors, but they alias to the same memory address
</div>
</section>
<!--Slide 26-->
<section>
<div class="container">
<div class="col-left-3">
<code><pre>
buffer<float, 1> bufI(dA.data(), range<1>(dA.size()));
buffer<float, 1> bufO(dO.data(), range<1>(dO.size()));
gpuQueue.submit([&](handler &cgh){
auto inA = bufI.get_access<<mark>access::mode::read</mark>>(cgh);
auto inB = bufI.get_access<<mark>access::mode::write</mark>>(cgh);
auto out = bufO.get_access<access::mode::write>(cgh);
cgh.parallel_for<add>(range<1>(dA.size()),
[=](id<1> i){ out[i] = inA[i] + inB[i]; });
});
</code></pre>
</div>
<div class="col-right-2" data-markdown>
* Here **inA** and **inB** both point to bufI but one is access::mode::read and one is access::mode::write
* So the SYCL runtime will treat them both as access::mode::read_write
* Both will point to a single allocation of global memory on the devices
* The runtime will resolve the data dependency into access::mode::read_write
</div>
</div>
</section>
<!--Slide 27-->
<section>
<div class="hbox" data-markdown>
## Using Data With Accessors
</div>
</section>
<!--Slide 28-->
<section>
<div class="container" data-markdown>
* There are a few different ways to access the data represented by an accessor
* The subscript operator can take an **id**
* Must be the same dimensionality of the accessor
* For dimensions > 1, linear address is calculated in row major
* Nested subscript operators can be called for each dimension taking a **size_t**
* E.g. a 3-dimensional accessor: acc[x][y][z] = …
* A pointer to memory can be retrieved by calling **get_pointer**
* This returns a **multi_ptr**, which is a wrapper class for pointers to the memory in the relevant memory space
</div>
</section>
<!--Slide 29-->
<section>
<div class="container">
<div class="col-left-3">
<code><pre>
buffer<float, 1> bufA(dA.data(), range<1>(dA.size()));
buffer<float, 1> bufB(dB.data(), range<1>(dB.size()));
buffer<float, 1> bufO(dO.data(), range<1>(dO.size()));
gpuQueue.submit([&](handler &cgh){
auto inA = bufA.get_access<access::mode::read>(cgh);
auto inB = bufB.get_access<access::mode::read>(cgh);
auto out = bufO.get_access<access::mode::write>(cgh);
cgh.parallel_for<add>(range<1>(dA.size()),
[=](id<1> i){
<mark>out[i] = inA[i] + inB[i];</mark>
});
});
</code></pre>
</div>
<div class="col-right-2" data-markdown>
* Here we access the data of the accessor by passing in the id object passed to the SYCL kernel function
</div>
</div>
</section>
<!--Slide 30-->
<section>
<div class="container">
<div class="col-left-3">
<code><pre>
buffer<float, 1> bufA(dA.data(), range<1>(dA.size()));
buffer<float, 1> bufB(dB.data(), range<1>(dB.size()));
buffer<float, 1> bufO(dO.data(), range<1>(dO.size()));
gpuQueue.submit([&](handler &cgh){
auto inA = bufA.get_access<access::mode::read>(cgh);
auto inB = bufB.get_access<access::mode::read>(cgh);
auto out = bufO.get_access<access::mode::write>(cgh);
cgh.parallel_for<add>(rng, [=](id<3> i){
<mark>auto ptrA = inA.get_pointer();</mark>
<mark>auto ptrB = inB.get_pointer();</mark>
<mark>auto ptrO = out.get_pointer();</mark>
<mark>auto linearId = i.get_linear_id();</mark>
<mark>ptrA[linearId] = ptrB[linearId] + ptrO[linearId]; </mark>
});
});
</code></pre>
</div>
<div class="col-right-2" data-markdown>
* Here we retrieve the underlying pointer for each of the accessors
* We then access the pointer using the linearized id by calling the **get_linear_id** member function on the **item** class
* Again this linearization is calculated in row major
</div>
</div>
</section>
<!--Slide 31-->
<section>
<div class="hbox" data-markdown>
## Questions
</div>
</section>
</div>
</div>
<script src="../common-revealjs/js/reveal.js"></script>
<script src="../common-revealjs/plugin/markdown/marked.js"></script>
<script src="../common-revealjs/plugin/markdown/markdown.js"></script>
<script src="../common-revealjs/plugin/notes/notes.js"></script>
<script>
Reveal.initialize();
Reveal.configure({ slideNumber: true });
</script>
</body>
</html>