Add modifications to kernel module.
Additional modifications had to be included into the build of nvidia.ko
in order for things to work properly.
DualCoder committed Feb 22, 2021
1 parent 5cfacdf commit 6881c41
Showing 3 changed files with 1,273 additions and 15 deletions.
167 changes: 152 additions & 15 deletions README.md
@@ -5,42 +5,179 @@ Unlock vGPU functionality for consumer grade GPUs.

## Important!

This tool is a work in progress. In the current state it does not work.
This tool is very untested, use at your own risk.


## Description

This tool enables the use of GeForce and Quadro GPUs with the NVIDIA vGPU
software. NVIDIA vGPU normally only supports a few Tesla GPUs but since some
GeForce and Quadro GPUs share the same physical chip as the Tesla this is only
a software limitation for those GPUs. This tool works by intercepting the ioctl
syscalls between the userspace nvidia-vgpud and nvidia-vgpu-mgr services and
the kernel driver. Doing this allows the script to alter the identification and
capabilities that the userspace services rely on to determine if the GPU is
vGPU capable.
a software limitation for those GPUs. This tool aims to remove this limitation.


## Dependencies:

* This tool requires Python 3; the latest version is recommended.
* The Python package "frida" is required: `pip3 install frida`.
* The tool requires the NVIDIA GRID vGPU driver to be properly installed for it
to do its job. This special driver is only accessible to NVIDIA enterprise
customers. The script has only been tested with 11.3 for "KVM on Linux" and
may or may not work on other versions.
* The tool requires the NVIDIA GRID vGPU driver.
* "dkms" is highly recommended as it simplifies the process of rebuilding the
driver a lot.


## Installation:

The NVIDIA vGPU drivers will create an nvidia-vgpud and nvidia-vgpu-mgr
systemd service. All we have to do is replace the path
/usr/bin/<executable> in /lib/systemd/system/nvidia-vgpud.service and
/lib/systemd/system/nvidia-vgpu-mgr.service with the path to the vgpu\_unlock
script and pass the original executable path as the first argument.
In the following instructions `<path_to_vgpu_unlock>` needs to be replaced with
the path to this repository on the target system and `<version>` needs to be
replaced with the version of the NVIDIA GRID vGPU driver.

Install the NVIDIA GRID vGPU driver, making sure to install it as a dkms module:
```
./nvidia-installer
```

Modify the line beginning with `ExecStart=` in `/lib/systemd/system/nvidia-vgpud.service`
and `/lib/systemd/system/nvidia-vgpu-mgr.service` to use `vgpu_unlock` as the
executable and pass the original executable as the first argument. For example:
```
ExecStart=<path_to_vgpu_unlock>/vgpu_unlock /usr/bin/nvidia-vgpud
```

Reload the systemd daemons:
```
systemctl daemon-reload
```

Modify the file `/usr/src/nvidia-<version>/nvidia/os-interface.c` and add the
following line after the lines beginning with `#include` at the start of the
file.
```
#include "<path_to_vgpu_unlock>/vgpu_unlock_hooks.c"
```

Modify the file `/usr/src/nvidia-<version>/nvidia/nvidia.Kbuild` and add the
following line.
```
ldflags-y += -T <path_to_vgpu_unlock>/kern.ld
```

Remove the nvidia kernel module using dkms:
```
dkms remove -m nvidia -v <version> --all
```

Rebuild and reinstall the nvidia kernel module using dkms:
```
dkms install -m nvidia -v <version>
```

Reboot.

---
**NOTE**

This script will only work if there exists a vGPU compatible Tesla GPU that
uses the same physical chip as the actual GPU being used.


---

## How it works

### vGPU supported?

In order to determine if a certain GPU supports the vGPU functionality the
driver looks at the PCI device ID. This identifier together with the PCI vendor
ID is unique for each type of PCI device. In order to enable vGPU support we
need to tell the driver that the PCI device ID of the installed GPU is one of
the device IDs used by a vGPU capable GPU.
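
As an illustration of the identifiers involved, the following Python sketch
reads the vendor and device ID of a PCI device from sysfs-style `vendor` and
`device` files and looks the device ID up in a substitution table. It is not
code from this repository: the function names, the `SPOOF_MAP` table and the
device IDs in it are placeholders made up for the example. Only the NVIDIA
vendor ID 0x10de and the sysfs file names are standard.

```python
import os
import tempfile

NVIDIA_VENDOR_ID = 0x10DE

# Placeholder mapping: installed device ID -> device ID to report instead.
# The values are illustrative only, not a list of supported GPUs.
SPOOF_MAP = {0x1111: 0x2222}


def read_pci_ids(device_dir):
    """Return (vendor_id, device_id) parsed from a sysfs PCI device directory."""
    ids = []
    for name in ("vendor", "device"):
        with open(os.path.join(device_dir, name)) as f:
            ids.append(int(f.read().strip(), 16))
    return tuple(ids)


def reported_device_id(vendor_id, device_id):
    """Return the device ID that should be presented for this GPU."""
    if vendor_id != NVIDIA_VENDOR_ID:
        return device_id
    return SPOOF_MAP.get(device_id, device_id)


# Demo against a fake device directory so the sketch runs on any machine.
with tempfile.TemporaryDirectory() as fake_dev:
    for name, value in (("vendor", "0x10de\n"), ("device", "0x1111\n")):
        with open(os.path.join(fake_dev, name), "w") as f:
            f.write(value)
    vendor, device = read_pci_ids(fake_dev)
    spoofed = reported_device_id(vendor, device)
    print(hex(vendor), hex(device), hex(spoofed))
```

On a real system the directory would be something like
`/sys/bus/pci/devices/<address>`, and the table would have to map each consumer
GPU to a vGPU capable GPU built on the same chip.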

### Userspace script: vgpu\_unlock

The userspace services nvidia-vgpud and nvidia-vgpu-mgr use the ioctl syscall
to communicate with the kernel module. Specifically they read the PCI device ID
and determine if the installed GPU is vGPU capable.

The python script vgpu\_unlock intercepts all ioctl syscalls between the
executable specified as the first argument and the kernel. The script then
modifies the kernel responses to indicate a PCI device ID with vGPU support
and a vGPU capable GPU.
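
The core of such a modification is overwriting a field in the buffer the kernel
returned. The Python sketch below shows only that step, on an in-memory buffer.
The layout of the real ioctl responses is not described in this README, so the
buffer layout used here (a little-endian 16 bit device ID at byte offset 4),
the function name and the ID values are purely hypothetical.

```python
import struct


def patch_device_id(response: bytearray, offset: int, new_id: int) -> int:
    """Overwrite a little-endian 16 bit device ID in place; return the old ID."""
    (old_id,) = struct.unpack_from("<H", response, offset)
    struct.pack_into("<H", response, offset, new_id)
    return old_id


# Hypothetical response: 4 header bytes, the device ID, then 2 padding bytes.
response = bytearray(b"\x00" * 4 + struct.pack("<H", 0x1111) + b"\x00" * 2)
old = patch_device_id(response, 4, 0x2222)
print(hex(old), response.hex())
```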

### Kernel module hooks: vgpu\_unlock\_hooks.c

In order to exchange data with the GPU the kernel module maps the physical
address space of the PCI bus into its own virtual address space. This is done
using the ioremap\* kernel functions. The kernel module then reads and writes
data into that mapped address space. This is done using the memcpy kernel
function.

By including the vgpu\_unlock\_hooks.c file into the os-interface.c file we can
use C preprocessor macros to replace and intercept calls to the ioremap and
memcpy functions. Doing this allows us to maintain a view of what is mapped
where and what data is being accessed.
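
The bookkeeping behind such a view can be sketched in a few lines. The Python
model below is not the C code in vgpu\_unlock\_hooks.c; the class and method
names and the addresses in the demo are assumptions made for illustration. It
records each mapping as a hooked ioremap call would and translates addresses in
both directions, which is what a hooked memcpy needs in order to tell which
physical address is being read.

```python
class MappingTracker:
    """Remember (physical base, size, virtual base) for every mapping."""

    def __init__(self):
        self._maps = []

    def on_ioremap(self, phys, size, virt):
        """Record a mapping, as a hooked ioremap call would."""
        self._maps.append((phys, size, virt))

    def phys_to_virt(self, phys_addr):
        """Translate a physical address, or return None if it is unmapped."""
        for phys, size, virt in self._maps:
            if phys <= phys_addr < phys + size:
                return virt + (phys_addr - phys)
        return None

    def virt_to_phys(self, virt_addr):
        """Translate a virtual address back, or return None if unknown."""
        for phys, size, virt in self._maps:
            if virt <= virt_addr < virt + size:
                return phys + (virt_addr - virt)
        return None


tracker = MappingTracker()
# Hypothetical mapping: 4 KiB of device memory at an arbitrary virtual base.
tracker.on_ioremap(phys=0xE0000000, size=0x1000, virt=0xFFFFC90000000000)
print(hex(tracker.phys_to_virt(0xE0000010)))
print(tracker.phys_to_virt(0xE0001000))
```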

### Kernel module linker script: kern.ld

This is a modified version of the default linker script provided by gcc. The
script is modified to place the .rodata section of nv-kernel.o into the .data
section instead of .rodata, making it writable. The script also provides the
symbols `vgpu_unlock_nv_kern_rodata_beg` and `vgpu_unlock_nv_kern_rodata_end`
to let us know where that section begins and ends.
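
With a start and an end symbol available, the relocated section can be treated
as a plain byte range and searched. The Python sketch below models this with an
in-memory buffer. The function name and the sample bytes are made up, and the
idea that an entry can be located by a byte-for-byte match on a 16 byte value
is an assumption for illustration, not a description of the driver's tables.

```python
def find_all(region: bytes, needle: bytes):
    """Return every offset in region at which needle occurs."""
    offsets = []
    pos = region.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = region.find(needle, pos + 1)
    return offsets


# Stand-in for the bytes between the two linker symbols.
needle = bytes(range(16))
region = b"\xff" * 32 + needle + b"\xff" * 8 + needle + b"\xff" * 4
hits = find_all(region, needle)
print(hits)
```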

### How it all comes together

After boot the nvidia-vgpud service queries the kernel for all installed GPUs
and checks for vGPU capability. This call is intercepted by the vgpu\_unlock
python script and the GPU is made vGPU capable. If a vGPU capable GPU is found
then nvidia-vgpud creates an MDEV device and the /sys/class/mdev\_bus directory
is created by the system.

vGPU devices can now be created by echoing UUIDs into the `create` files in the
mdev bus representation. This will create additional structures representing
the new vGPU device on the MDEV bus. These devices can then be assigned to VMs,
and when the VM starts it will open the MDEV device. This causes nvidia-vgpu-mgr
to start communicating with the kernel using ioctl. Again these calls are
intercepted by the vgpu\_unlock python script and when nvidia-vgpu-mgr asks if
the GPU is vGPU capable the answer is changed to yes. After that check it
attempts to initialize the vGPU device instance.
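
Creating a vGPU device this way amounts to writing a UUID into a `create` file
under the parent device's `mdev_supported_types` directory. The Python sketch
below shows the write against a fake directory tree so that it runs without
root or a GPU. The function name and the type name `nvidia-00` are
placeholders; real type names depend on the driver and the GPU.

```python
import os
import tempfile
import uuid


def create_mdev(parent_dir, type_id, dev_uuid=None):
    """Write a UUID into mdev_supported_types/<type_id>/create under parent_dir."""
    dev_uuid = dev_uuid or str(uuid.uuid4())
    create_path = os.path.join(
        parent_dir, "mdev_supported_types", type_id, "create"
    )
    with open(create_path, "w") as f:
        f.write(dev_uuid)
    return dev_uuid


# Demo against a temporary tree that mimics the sysfs layout.
with tempfile.TemporaryDirectory() as fake_parent:
    type_dir = os.path.join(fake_parent, "mdev_supported_types", "nvidia-00")
    os.makedirs(type_dir)
    new_uuid = create_mdev(fake_parent, "nvidia-00")
    with open(os.path.join(type_dir, "create")) as f:
        written = f.read()
    print(new_uuid)
```

On a real host `parent_dir` would be the GPU's entry below
`/sys/class/mdev_bus/`, and the write requires root privileges.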

Initialization of the vGPU device is handled by the kernel module, which
performs its own check for vGPU capability. This one is a bit more complicated.

The kernel module maps the physical PCI address range 0xf0000000-0xf1000000 into
its virtual address space. It then performs some operations whose purpose we do
not fully understand. What we do know is that after these operations it
accesses a 128 bit value at physical address 0xf0029624, which we call the
magic value. The kernel module also accesses a 128 bit value at physical
address 0xf0029634, which we call the key value.
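
The two addresses translate into fixed offsets inside the mapped range, and
because they are 0x10 bytes apart the 128 bit key directly follows the 128 bit
magic value. The Python sketch below works through that arithmetic using the
addresses from the text. The `extract_values` helper and the fake window
contents are illustrative assumptions, not driver code.

```python
MAP_BASE = 0xF0000000    # start of the mapped physical range
MAGIC_PHYS = 0xF0029624  # physical address of the magic value
KEY_PHYS = 0xF0029634    # physical address of the key value
VALUE_SIZE = 128 // 8    # both values are 128 bits, i.e. 16 bytes

magic_off = MAGIC_PHYS - MAP_BASE
key_off = KEY_PHYS - MAP_BASE


def extract_values(window: bytes):
    """Slice the magic and key values out of a copy of the mapped range."""
    magic = window[magic_off:magic_off + VALUE_SIZE]
    key = window[key_off:key_off + VALUE_SIZE]
    return magic, key


# Fake window: zero-filled, with recognisable bytes at the two offsets.
window = bytearray(key_off + VALUE_SIZE)
window[magic_off:magic_off + VALUE_SIZE] = b"M" * VALUE_SIZE
window[key_off:key_off + VALUE_SIZE] = b"K" * VALUE_SIZE
magic, key = extract_values(bytes(window))
print(hex(magic_off), hex(key_off), magic, key)
```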

The kernel module then has a couple of lookup tables for the magic value, one
for vGPU capable GPUs and one for the others. So the kernel module looks for the
magic value in both of these lookup tables, and if it is found that table entry
also contains a set of AES-128 encrypted data blocks and an HMAC-SHA256
signature.

The signature is then validated by using the key value mentioned earlier to
calculate the HMAC-SHA256 signature over the encrypted data blocks. If the
signature is correct, then the blocks are decrypted using AES-128 and the same
key.
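
The signature check described above can be sketched with the Python standard
library. The function name and the sample key and data are placeholders, and
the sketch assumes the signature is the plain 32 byte HMAC-SHA256 digest over
the concatenated encrypted blocks. The AES-128 decryption step is left out
because the standard library does not provide AES.

```python
import hashlib
import hmac


def signature_is_valid(key: bytes, encrypted_blocks: bytes, signature: bytes) -> bool:
    """Recompute HMAC-SHA256 over the blocks and compare it to the signature."""
    expected = hmac.new(key, encrypted_blocks, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)


# Placeholder inputs: a 16 byte key and two 16 byte "encrypted" blocks.
key = bytes(16)
blocks = b"\xaa" * 32
good_sig = hmac.new(key, blocks, hashlib.sha256).digest()

print(signature_is_valid(key, blocks, good_sig))
print(signature_is_valid(key, blocks, bytes(32)))
```

The same computation, run over modified blocks with the known key, is what
makes it possible to regenerate a signature after the data has been edited.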

Inside of the decrypted data is once again the PCI device ID.

So in order for the kernel module to accept the GPU as vGPU capable the magic
value will have to be in the table of vGPU capable magic values, the key has
to generate a valid HMAC-SHA256 signature and the AES-128 decrypted data blocks
have to contain a vGPU capable PCI device ID. If any of these checks fail, then
the error code 0x56 "Call not supported" is returned.

In order to make these checks pass, the hooks in vgpu\_unlock\_hooks.c perform
the following steps:

* Look for an ioremap call that maps the physical address range containing the
magic and key values.
* Recalculate the addresses of those values into the virtual address space of
the kernel module.
* Monitor memcpy operations reading at those addresses and, when one occurs,
keep a copy of the value until both are known.
* Locate the lookup tables in the .rodata section of nv-kernel.o.
* Find the signature and data blocks.
* Validate the signature.
* Decrypt the blocks.
* Edit the PCI device ID in the decrypted data.
* Re-encrypt the blocks.
* Regenerate the signature.
* Insert the magic, blocks and signature into the table of vGPU capable magic
values.

162 changes: 162 additions & 0 deletions kern.ld
@@ -0,0 +1,162 @@
/* Script for ld -r: link without relocation */
/* Copyright (C) 2014-2018 Free Software Foundation, Inc.
Copying and distribution of this script, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved. */
OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64",
"elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)
/* For some reason, the Solaris linker makes bad executables
if gld -r is used and the intermediate file has sections starting
at non-zero addresses. Could be a Solaris ld bug, could be a GNU ld
bug. But for now assigning the zero vmas works. */
SECTIONS
{
/* Read-only sections, merged into text segment: */
.interp 0 : { *(.interp) }
.note.gnu.build-id : { *(.note.gnu.build-id) }
.hash 0 : { *(.hash) }
.gnu.hash 0 : { *(.gnu.hash) }
.dynsym 0 : { *(.dynsym) }
.dynstr 0 : { *(.dynstr) }
.gnu.version 0 : { *(.gnu.version) }
.gnu.version_d 0: { *(.gnu.version_d) }
.gnu.version_r 0: { *(.gnu.version_r) }
.rela.init 0 : { *(.rela.init) }
.rela.text 0 : { *(.rela.text) }
.rela.fini 0 : { *(.rela.fini) }
.rela.rodata 0 : { *(.rela.rodata) }
.rela.data.rel.ro 0 : { *(.rela.data.rel.ro) }
.rela.data 0 : { *(.rela.data) }
.rela.tdata 0 : { *(.rela.tdata) }
.rela.tbss 0 : { *(.rela.tbss) }
.rela.ctors 0 : { *(.rela.ctors) }
.rela.dtors 0 : { *(.rela.dtors) }
.rela.got 0 : { *(.rela.got) }
.rela.bss 0 : { *(.rela.bss) }
.rela.ldata 0 : { *(.rela.ldata) }
.rela.lbss 0 : { *(.rela.lbss) }
.rela.lrodata 0 : { *(.rela.lrodata) }
.rela.ifunc 0 : { *(.rela.ifunc) }
.rela.plt 0 :
{
*(.rela.plt)
}
.init 0 :
{
KEEP (*(SORT_NONE(.init)))
}
.plt 0 : { *(.plt) *(.iplt) }
.plt.got 0 : { *(.plt.got) }
.plt.sec 0 : { *(.plt.sec) }
.text 0 :
{
*(.text .stub)
/* .gnu.warning sections are handled specially by elf32.em. */
*(.gnu.warning)
}
.fini 0 :
{
KEEP (*(SORT_NONE(.fini)))
}
.rodata 0 : { *(EXCLUDE_FILE (*nv-kernel.o) .rodata) }
.rodata1 0 : { *(.rodata1) }
.eh_frame_hdr : { *(.eh_frame_hdr) }
.eh_frame 0 : ONLY_IF_RO { KEEP (*(.eh_frame)) }
.gcc_except_table 0 : ONLY_IF_RO { *(.gcc_except_table
.gcc_except_table.*) }
.gnu_extab 0 : ONLY_IF_RO { *(.gnu_extab*) }
/* These sections are generated by the Sun/Oracle C++ compiler. */
.exception_ranges 0 : ONLY_IF_RO { *(.exception_ranges
.exception_ranges*) }
/* Adjust the address for the data segment. We want to adjust up to
the same address within the page on the next page up. */
/* Exception handling */
.eh_frame 0 : ONLY_IF_RW { KEEP (*(.eh_frame)) }
.gnu_extab 0 : ONLY_IF_RW { *(.gnu_extab) }
.gcc_except_table 0 : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) }
.exception_ranges 0 : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) }
/* Thread Local Storage sections */
.tdata 0 :
{
*(.tdata)
}
.tbss 0 : { *(.tbss) }
.jcr 0 : { KEEP (*(.jcr)) }
.dynamic 0 : { *(.dynamic) }
.got 0 : { *(.got) *(.igot) }
.got.plt 0 : { *(.got.plt) *(.igot.plt) }
.data 0 :
{
*(.data)
vgpu_unlock_nv_kern_rodata_beg = .;
*nv-kernel.o(.rodata)
vgpu_unlock_nv_kern_rodata_end = .;
}
.data1 0 : { *(.data1) }
.bss 0 :
{
*(.bss)
*(COMMON)
/* Align here to ensure that the .bss section occupies space up to
_end. Align after .bss to ensure correct alignment even if the
.bss section disappears because there are no input sections.
FIXME: Why do we need it? When there is no .bss section, we don't
pad the .data section. */
}
.lbss 0 :
{
*(.dynlbss)
*(.lbss)
*(LARGE_COMMON)
}
.lrodata 0 :
{
*(.lrodata)
}
.ldata 0 :
{
*(.ldata)
}
/* Stabs debugging sections. */
.stab 0 : { *(.stab) }
.stabstr 0 : { *(.stabstr) }
.stab.excl 0 : { *(.stab.excl) }
.stab.exclstr 0 : { *(.stab.exclstr) }
.stab.index 0 : { *(.stab.index) }
.stab.indexstr 0 : { *(.stab.indexstr) }
.comment 0 : { *(.comment) }
/* DWARF debug sections.
Symbols in the DWARF debugging sections are relative to the beginning
of the section so we begin them at 0. */
/* DWARF 1 */
.debug 0 : { *(.debug) }
.line 0 : { *(.line) }
/* GNU DWARF 1 extensions */
.debug_srcinfo 0 : { *(.debug_srcinfo) }
.debug_sfnames 0 : { *(.debug_sfnames) }
/* DWARF 1.1 and DWARF 2 */
.debug_aranges 0 : { *(.debug_aranges) }
.debug_pubnames 0 : { *(.debug_pubnames) }
/* DWARF 2 */
.debug_info 0 : { *(.debug_info) }
.debug_abbrev 0 : { *(.debug_abbrev) }
.debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) }
.debug_frame 0 : { *(.debug_frame) }
.debug_str 0 : { *(.debug_str) }
.debug_loc 0 : { *(.debug_loc) }
.debug_macinfo 0 : { *(.debug_macinfo) }
/* SGI/MIPS DWARF 2 extensions */
.debug_weaknames 0 : { *(.debug_weaknames) }
.debug_funcnames 0 : { *(.debug_funcnames) }
.debug_typenames 0 : { *(.debug_typenames) }
.debug_varnames 0 : { *(.debug_varnames) }
/* DWARF 3 */
.debug_pubtypes 0 : { *(.debug_pubtypes) }
.debug_ranges 0 : { *(.debug_ranges) }
/* DWARF Extension. */
.debug_macro 0 : { *(.debug_macro) }
.debug_addr 0 : { *(.debug_addr) }
.gnu.attributes 0 : { KEEP (*(.gnu.attributes)) }
}
