Lesson_Materials/Lesson-4-Managing-Data-in-SYCL-Applications/index.html

<!DOCTYPE html>
<html>
	<head>
	    <meta charset="utf-8">
		<link rel="stylesheet" href="../common-revealjs/css/reveal.css">
		<link rel="stylesheet" href="../common-revealjs/css/theme/white.css">
		<link rel="stylesheet" href="../common-revealjs/css/custom.css">
		
	</head>
	<body>
		<span class='overlay-image bottom-left'>SYCL and the SYCL logo are trademarks of the Khronos Group Inc.</span>
		<span class='overlay-image bottom-right'><img src="../common-revealjs/images/codeplay.png" alt="Codeplay Software"></span>
		<span class='overlay-image top-left'><img src="../common-revealjs/images/sycl_academy.png" alt="SYCL Academy"></span>
		<span class='overlay-image top-right'><img src="../common-revealjs/images/sycl_logo.png" alt="SYCL"></span>
		<div class="reveal">
			<div class="slides">
				<!--Slide 1-->
				<section class="hbox" data-markdown>
					## Managing Data in SYCL Applications
				</section>
				<!--Slide 2-->
				<section class="hbox" data-markdown>
					## Learning Objectives

					* Understand the SYCL memory model
					* Learn how to use SYCL buffers and accessors
					* Understand memory access in different address spaces
					* Learn about different ways to access the data
					* Learn about execution ordering using data dependencies
					* Learn about how SYCL synchronizes data
				</section>
				<!--Slide 3-->
				<section class="hbox" data-markdown>
				    ## The SYCL Memory Model
				</section>
				<!--Slide 4-->
				<section>
					<div class="container">
						<div class="col" data-markdown>
							* Each work-item can access a dedicated region of **private memory**
							* A work-item cannot access the private memory of another work-item
						</div>
						<div class="col" data-markdown>
							![Private Memory](../common-revealjs/images/workitem-privatememory.png "Private Memory")
						</div>

					</div>
				</section>
				<!--Slide 5-->
				<section>
					<div class="container">
						<div class="col-left-3" data-markdown>
							![Local Memory](../common-revealjs/images/workitem-localmemory.png "Local Memory")
						</div>
						<div class="col-right-2" data-markdown>
							* Each work-item can access a dedicated region of **local memory** accessible to all work-items in a work-group
							* A work-item cannot access the local memory of another workgroup
						</div>
					</div>
				</section>
				<!--Slide 6-->
				<section>
					<div class="container">
						<div class="col-left-3" data-markdown>
							![Constant Memory](../common-revealjs/images/workitem-constantmemory.png "Constant Memory")
						</div>
						<div class="col-right-2" data-markdown>
							* Each work-item can access a single region of **global memory** that's accessible to all work-items in a ND-range
							* Each work-item can also access a region of global memory reserved as **constant memory**, which is read-only
						</div>

					</div>
				</section>
				<!--Slide 7-->
				<section>
					<div class="container">
						<div class="col" data-markdown>
							* Each memory region has a different size and access latency 
							* Global / constant memory is larger than local memory and local memory is larger than private memory
							* Private memory is faster than local memory and local memory is faster than global / constant memory
						</div>
						<div class="col" data-markdown>
							![Memory Regions](../common-revealjs/images/memory-regions.png "Memory Regions")
						</div>
					</div>
				</section>
				<!--Slide 8-->
				<section>
					<div class="hbox" data-markdown>
						## SYCL Buffers & Accessors
					</div>
				</section>
				<!--Slide 9-->
				<section>
					<div class="hbox" data-markdown>
						* SYCL separates the storage and access of data 
						  * A SYCL buffer manages data across the host and any number of devices 
						  * A SYCL accessor requests access to data on the host or on a device for a specific SYCL kernel function
						* Accessors are also used to access data within a SYCL kernel function
						  * This means they are declared in the host code but captured by and then accessed within a SYCL kernel function
					</div>
				</section>
				<!--Slide 10-->
				<section>
					<div class="container">
						<div class="col-left-3">
							<code><pre>
buffer&ltfloat, 1&gt bufA(dA.data(), range&lt1&gt(dA.size())); 
buffer&ltfloat, 1&gt bufB(dB.data(), range&lt1&gt(dB.size())); 
buffer&ltfloat, 1&gt bufO(dO.data(), range&lt1&gt(dO.size()));

gpuQueue.submit([&](handler &cgh){
  <mark>auto inA = bufA.get_access&ltaccess::mode::read&gt(cgh);</mark>
  <mark>auto inB = bufB.get_access&ltaccess::mode::read&gt(cgh);</mark>
  <mark>auto out = bufO.get_access&ltaccess::mode::write&gt(cgh);</mark>
  cgh.parallel_for&ltadd&gt(range&lt1&gt(dA.size()), 
    [=](id&lt1&gt i){ out[i] = inA[i] + inB[i]; }); 
});
							</code></pre>
						</div>
						<div class="col-right-2" data-markdown>
							* In order to achieve this the host compiler and the SYCL device compiler interpret accessors differently
							* The host compiler interprets accessors as host objects which instruct the SYCL runtime of data dependencies for a SYCL kernel function
							* The SYCL device compiler interprets accessors as a wrapper on the argument to the SYCL kernel function (a pointer to device memory) that is used to access data on the device
						</div>
					</div>
				</section>
				<!--Slide 11-->
				<section>
					<div class="container">
						<div class="col" data-markdown>
							* A SYCL buffer can be constructed with a pointer to host memory
							* For the lifetime of the buffer this memory is owned by the SYCL runtime
							* When a buffer object is constructed it will not allocate or copy to device memory at first
							* This will only happen once the SYCL runtime knows the data needs to be accessed and where it needs to be accessed
						</div>
						<div class="col" data-markdown>
							![Buffer Host Memory](../common-revealjs/images/buffer-hostmemory.png "Buffer Host Memory")
						</div>
					</div>
				</section>
				<!--Slide 12-->
				<section>
					<div class="container">
						<div class="col" data-markdown>
							* Constructing an accessor specifies a request to access the data managed by the buffer
							* There are a range of different types of accessor which provide different ways to access data
						</div>
						<div class="col" data-markdown>
							![Buffer Host Memory Accessor](../common-revealjs/images/buffer-hostmemory-accessor.png "Buffer Host Memory Accessor")
						</div>
					</div>
				</section>
				<!--Slide 13-->
				<section>
					<div class="container">
						<div class="col" data-markdown>
							* When an accessor is constructed it is associated with a command group via the handler object
							* This connects the buffer that is being accessed, the way in which it’s being accessed and the device that the command group is being submitted to
						</div>
						<div class="col" data-markdown>
							![Buffer Host Memory Accessor CG](../common-revealjs/images/buffer-hostmemory-accessor-cg.png "Buffer Host Memory Accessor CG")
						</div>
					</div>
				</section>
				<!--Slide 14-->
				<section>
					<div class="container">
						<div class="col" data-markdown>
							* Once the SYCL scheduler selects the command group to be executed it must first satisfy its data dependencies
							* This means allocating and copying data to the device the data is being accessed on if necessary
							* If the most recent copy of the data is already on the device then the runtime will not copy again
						</div>
						<div class="col" data-markdown>
							![Buffer Host Memory Accessor CG Device](../common-revealjs/images/buffer-hostmemory-accessor-cg-device.png "Buffer Host Memory Accessor CG Device")
						</div>
					</div>
				</section>
				<!--Slide 15-->
				<section>
					<div class="container">
						<div class="col" data-markdown>
							* Data will remain in device memory after kernels finish executing until another command group requests access in a different device or on the host
							* When the buffer object is destroyed it will wait for any outstanding work that is accessing the data to complete and then copy back to the original host memory
						</div>
						<div class="col" data-markdown>
							![Buffer Destroyed](../common-revealjs/images/buffer-destroyed.png "Buffer Destroyed")
						</div>
					</div>
				</section>
				<!--Slide 16-->
				<section>
					<div class="container">
						<div class="col" data-markdown>
							* If a buffer is accessed on one device when the latest copy of the data is on another device, the data will be copied
							* If the two devices are of the same context the data can be copied directly
						</div>
						<div class="col" data-markdown>
							![Buffer Copied](../common-revealjs/images/buffer-copied.png "Buffer Copied")
						</div>
					</div>
				</section>
				<!--Slide 17-->
				<section>
					<div class="container">
						<div class="col" data-markdown>
							* If the devices are of different contexts the data must be copied via host memory
							* It's important to consider this as it could incur further overhead when copying between devices
						</div>
						<div class="col" data-markdown>
							![Buffer Copied](../common-revealjs/images/copy-via-host.png "Buffer Copied")
						</div>
					</div>
				</section>
				<!--Slide 18-->
				<section>
					<div class="hbox" data-markdown>
						## Different Kinds of Accessors
					</div>
				</section>
				<!--Slide 19-->
				<section>
					<div class="hbox" data-markdown>
						![Accessor Types](../common-revealjs/images/accessor-types.png "Accessor Types")
					</div>
				</section>
				<!--Slide 20-->
				<section>
					<div class="code-header" data-markdown>```accessor&ltelementT, dimensions, access::mode, access::target, access::placeholder&gt```</div>
					<div class="container">
						<div class="col" data-markdown>
							* Accessors using the access::target **global_buffer** and **constant_buffer** will allocate memory on the device in the **global** and **constant** address space respectively
							* Accessors of these access targets must be constructed using a buffer
							* A useful shortcut to construct an accessor with the access::target **global_buffer** is to use the **get_access** member function of the buffer
							* Element type and dimensionality must be the same
						</div>
						<div class="col" data-markdown>
							![Buffer Accessors](../common-revealjs/images/buffer-accessors.png "Buffer Accessors")
						</div>
					</div>
				</section>
				<!--Slide 21-->
				<section>
					<div class="code-header" data-markdown>```accessor&ltelementT, dimensions, access::mode, access::target, access::placeholder&gt```</div>
					<div class="container">
						<div class="col" data-markdown>
							* Accessors using the access::target **host_buffer** will make the data available immediately on the host
							* This access targets must be constructed using a buffer outside of a command group
							* A useful shortcut to construct an accessor with the access::target **global_buffer** is to use the **get_access** member function of the buffer
							* Note that a host accessor will block other accessors until it’s destroyed
						</div>
						<div class="col" data-markdown>
							![Buffer Accessors](../common-revealjs/images/buffer-accessors-get-access.png "Buffer Accessors")
						</div>
					</div>
				</section>
				<!--Slide 22-->
				<section>
					<div class="code-header" data-markdown>```accessor&ltelementT, dimensions, access::mode, access::target, access::placeholder&gt```</div>
					<div class="container">
						<div class="col" data-markdown>
							* Accessors using the access::target **local** will allocate memory in the **local** address space per work-group
							* This access targets does not require a buffer to be constructed as it’s temporary memory allocation for the duration of a SYCL kernel function invocation
						</div>
						<div class="col" data-markdown>
							![Buffer Accessors](../common-revealjs/images/buffer-accessors-local.png "Buffer Accessors")
						</div>
					</div>
				</section>
				<!--Slide 23-->
				<section>
					<div class="container">
						<div class="col" data-markdown>
							* A **read** accessor instructs the SYCL runtime that the SYCL kernel function will read the data – cannot be written to within a SYCL kernel function
							* A **write** accessor instructs the SYCL runtime that the SYCL kernel function will modify the data – creating a dependency for future command groups
							* A **discard_*** accessor instructs the SYCL runtime that the SYCL kernel function does not need the initial values of the data – removing the dependency on previous command groups
						</div>
						<div class="col" data-markdown>
							![Buffer Accessors](../common-revealjs/images/accessors.png "Buffer Accessors")
						</div>
					</div>
				</section>
				<!--Slide 24-->
				<section>
					<div class="hbox" data-markdown>
						## Accessor Resolution
					</div>
				</section>
				<!--Slide 25-->
				<section>
					<div class="hbox" data-markdown>
						* If a command group has accessors to the same buffer with conflicting access modes they are resolved into one 
						  * read & write => ```read_write```
						  * ```read_write``` & ```discard_write``` => ```read_write```
						* Within the SYCL kernel function there are still multiple accessors, but they alias to the same memory address
					</div>
				</section>
				<!--Slide 26-->
				<section>
					<div class="container">
						<div class="col-left-3">
							<code><pre>
buffer&ltfloat, 1&gt bufI(dA.data(), range&lt1&gt(dA.size())); 
buffer&ltfloat, 1&gt bufO(dO.data(), range&lt1&gt(dO.size()));

gpuQueue.submit([&](handler &cgh){
  auto inA = bufI.get_access&lt<mark>access::mode::read</mark>&gt(cgh);
  auto inB = bufI.get_access&lt<mark>access::mode::write</mark>&gt(cgh);
  auto out = bufO.get_access&ltaccess::mode::write&gt(cgh);
  cgh.parallel_for&ltadd&gt(range&lt1&gt(dA.size()), 
    [=](id&lt1&gt i){ out[i] = inA[i] + inB[i]; }); 
});
							</code></pre>
						</div>
						<div class="col-right-2" data-markdown>
							* Here **inA** and **inB** both point to bufI but one is access::mode::read and one is access::mode::write
							* So the SYCL runtime will treat them both as access::mode::read_write
							* Both will point to a single allocation of global memory on the devices
							* The runtime will resolve the data dependency into access::mode::read_write
						</div>
					</div>
				</section>
				<!--Slide 27-->
				<section>
					<div class="hbox" data-markdown>
						## Using Data With Accessors
					</div>
				</section>
				<!--Slide 28-->
				<section>
					<div class="container" data-markdown>
					* There are a few different ways to access the data represented by an accessor
					  *  The subscript operator can take an **id**
					    * Must be the same dimensionality of the accessor
					    * For dimensions > 1, linear address is calculated in row major 
					* Nested subscript operators can be called for each dimension taking a **size_t**
					  * E.g. a 3-dimensional accessor: acc[x][y][z] = …
					* A pointer to memory can be retrieved by calling **get_pointer**
					  * This returns a **multi_ptr**, which is a wrapper class for pointers to the memory in the relevant memory space
					</div>
				</section>
				<!--Slide 29-->
				<section>
					<div class="container">
						<div class="col-left-3">
							<code><pre>
buffer&ltfloat, 1&gt bufA(dA.data(), range&lt1&gt(dA.size())); 
buffer&ltfloat, 1&gt bufB(dB.data(), range&lt1&gt(dB.size())); 
buffer&ltfloat, 1&gt bufO(dO.data(), range&lt1&gt(dO.size()));

gpuQueue.submit([&](handler &cgh){
  auto inA = bufA.get_access&ltaccess::mode::read&gt(cgh);
  auto inB = bufB.get_access&ltaccess::mode::read&gt(cgh);
  auto out = bufO.get_access&ltaccess::mode::write&gt(cgh);
  cgh.parallel_for&ltadd&gt(range&lt1&gt(dA.size()), 
    [=](id&lt1&gt i){ 
    <mark>out[i] = inA[i] + inB[i];</mark>
  });
});
							</code></pre>
						</div>
						<div class="col-right-2" data-markdown>
							* Here we access the data of the accessor by passing in the id object passed to the SYCL kernel function
						</div>
					</div>
				</section>
				<!--Slide 30-->
				<section>
					<div class="container">
						<div class="col-left-3">
							<code><pre>
buffer&ltfloat, 1&gt bufA(dA.data(), range&lt1&gt(dA.size())); 
buffer&ltfloat, 1&gt bufB(dB.data(), range&lt1&gt(dB.size())); 
buffer&ltfloat, 1&gt bufO(dO.data(), range&lt1&gt(dO.size()));

gpuQueue.submit([&](handler &cgh){
  auto inA = bufA.get_access&ltaccess::mode::read&gt(cgh);
  auto inB = bufB.get_access&ltaccess::mode::read&gt(cgh);
  auto out = bufO.get_access&ltaccess::mode::write&gt(cgh);
  cgh.parallel_for&ltadd&gt(rng, [=](id&lt3&gt i){
    <mark>auto ptrA = inA.get_pointer();</mark>
    <mark>auto ptrB = inB.get_pointer();</mark>
    <mark>auto ptrO = out.get_pointer();</mark>
    <mark>auto linearId = i.get_linear_id();</mark>

    <mark>ptrA[linearId] = ptrB[linearId] + ptrO[linearId]; </mark>
  });
});
							</code></pre>
						</div>
						<div class="col-right-2" data-markdown>
							* Here we retrieve the underlying pointer for each of the accessors
							* We then access the pointer using the linearized id by calling the **get_linear_id** member function on the **item** class
							* Again this linearization is calculated in row major
						</div>
					</div>
				</section>
				<!--Slide 31-->
				<section>
					<div class="hbox" data-markdown>
						## Questions
					</div>
				</section>
			</div>
		</div>
		<script src="../common-revealjs/js/reveal.js"></script>
		<script src="../common-revealjs/plugin/markdown/marked.js"></script>
		<script src="../common-revealjs/plugin/markdown/markdown.js"></script>
		<script src="../common-revealjs/plugin/notes/notes.js"></script>
		<script>
			Reveal.initialize();
			Reveal.configure({ slideNumber: true });
		</script>
	</body>
</html>