Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "ARM:
   - GICv4.1 support

   - 32bit host removal

  PPC:
   - secure (encrypted) guests under the Protected Execution Framework
     ultravisor

  s390:
   - allow disabling GISA (hardware interrupt injection) and protected
     VMs/ultravisor support.

  x86:
   - New dirty bitmap flag that sets all bits in the bitmap when dirty
     page logging is enabled; this is faster because it doesn't require
     bulk modification of the page tables.

   - Initial work on making nested SVM event injection more similar to
     VMX, and less buggy.

   - Various cleanups to MMU code (though the big ones and related
     optimizations were delayed to 5.8). Instead of using cr3 in
     function names which occasionally means eptp, KVM too has
     standardized on "pgd".

   - A large refactoring of CPUID features, which now use an array that
     parallels the core x86_features.

   - Some removal of pointer chasing from kvm_x86_ops, which will also
     be switched to static calls as soon as they are available.

   - New Tigerlake CPUID features.

   - More bugfixes, optimizations and cleanups.

  Generic:
   - selftests: cleanups, new MMU notifier stress test, steal-time test

   - CSV output for kvm_stat"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits)
  x86/kvm: fix a missing-prototypes "vmread_error"
  KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y
  KVM: VMX: Add a trampoline to fix VMREAD error handling
  KVM: SVM: Annotate svm_x86_ops as __initdata
  KVM: VMX: Annotate vmx_x86_ops as __initdata
  KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
  KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
  KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes
  KVM: VMX: Configure runtime hooks using vmx_x86_ops
  KVM: VMX: Move hardware_setup() definition below vmx_x86_ops
  KVM: x86: Move init-only kvm_x86_ops to separate struct
  KVM: Pass kvm_init()'s opaque param to additional arch funcs
  s390/gmap: return proper error code on ksm unsharing
  KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move()
  KVM: Fix out of range accesses to memslots
  KVM: X86: Micro-optimize IPI fastpath delay
  KVM: X86: Delay read msr data iff writes ICR MSR
  KVM: PPC: Book3S HV: Add a capability for enabling secure guests
  KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs
  KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs
  ...
torvalds committed Apr 2, 2020
2 parents f14a953 + 514ccc1 commit 8c1b724
Showing 206 changed files with 7,867 additions and 9,698 deletions.
5 changes: 5 additions & 0 deletions Documentation/admin-guide/kernel-parameters.txt
@@ -3821,6 +3821,11 @@
before loading.
See Documentation/admin-guide/blockdev/ramdisk.rst.

prot_virt= [S390] enable hosting protected virtual machines
isolated from the hypervisor (if hardware supports
that).
Format: <bool>

psi= [KNL] Enable or disable pressure stall information
tracking.
Format: <bool>
128 changes: 105 additions & 23 deletions Documentation/virt/kvm/api.rst
@@ -1574,8 +1574,8 @@ This ioctl would set vcpu's xcr to the value userspace specified.
};

#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0)
#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1)
#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2)
#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1) /* deprecated */
#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2) /* deprecated */

struct kvm_cpuid_entry2 {
__u32 function;
@@ -1626,13 +1626,6 @@ emulate them efficiently. The fields in each entry are defined as follows:

KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
if the index field is valid
KVM_CPUID_FLAG_STATEFUL_FUNC:
if cpuid for this function returns different values for successive
invocations; there will be several entries with the same function,
all with this flag set
KVM_CPUID_FLAG_STATE_READ_NEXT:
for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
the first entry to be read by a cpu

eax, ebx, ecx, edx:
the values returned by the cpuid instruction for
@@ -2117,7 +2110,8 @@ Errors:

====== ============================================================
 ENOENT   no such register
 EINVAL   invalid register ID, or no such register
 EINVAL   invalid register ID, or no such register or used with VMs in
protected virtualization mode on s390
 EPERM    (arm64) register access not allowed before vcpu finalization
====== ============================================================

@@ -2552,7 +2546,8 @@ Errors include:

======== ============================================================
 ENOENT   no such register
 EINVAL   invalid register ID, or no such register
 EINVAL   invalid register ID, or no such register or used with VMs in
protected virtualization mode on s390
 EPERM    (arm64) register access not allowed before vcpu finalization
======== ============================================================

@@ -3347,8 +3342,8 @@ The member 'flags' is used for passing flags from userspace.
::

#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0)
#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1)
#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2)
#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1) /* deprecated */
#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2) /* deprecated */

struct kvm_cpuid_entry2 {
__u32 function;
@@ -3394,13 +3389,6 @@ The fields in each entry are defined as follows:

KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
if the index field is valid
KVM_CPUID_FLAG_STATEFUL_FUNC:
if cpuid for this function returns different values for successive
invocations; there will be several entries with the same function,
all with this flag set
KVM_CPUID_FLAG_STATE_READ_NEXT:
for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
the first entry to be read by a cpu

eax, ebx, ecx, edx:

@@ -4649,6 +4637,60 @@ the clear cpu reset definition in the POP. However, the cpu is not put
into ESA mode. This reset is a superset of the initial reset.


4.125 KVM_S390_PV_COMMAND
-------------------------

:Capability: KVM_CAP_S390_PROTECTED
:Architectures: s390
:Type: vm ioctl
:Parameters: struct kvm_pv_cmd
:Returns: 0 on success, < 0 on error

::

struct kvm_pv_cmd {
__u32 cmd; /* Command to be executed */
__u16 rc; /* Ultravisor return code */
__u16 rrc; /* Ultravisor return reason code */
__u64 data; /* Data or address */
__u32 flags; /* flags for future extensions. Must be 0 for now */
__u32 reserved[3];
};

cmd values:

KVM_PV_ENABLE
Allocate memory and register the VM with the Ultravisor, thereby
donating memory to the Ultravisor that will become inaccessible to
KVM. All existing CPUs are converted to protected ones. After this
command has succeeded, any CPU added via hotplug will become
protected during its creation as well.

Errors:

===== =============================
EINTR an unmasked signal is pending
===== =============================

KVM_PV_DISABLE

Deregister the VM from the Ultravisor and reclaim the memory that
had been donated to the Ultravisor, making it usable by the kernel
again. All registered VCPUs are converted back to non-protected
ones.

KVM_PV_VM_SET_SEC_PARMS
Pass the image header from VM memory to the Ultravisor in
preparation of image unpacking and verification.

KVM_PV_VM_UNPACK
Unpack (protect and decrypt) a page of the encrypted boot image.

KVM_PV_VM_VERIFY
Verify the integrity of the unpacked image. Only if this succeeds,
KVM is allowed to start protected VCPUs.

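The cmd values above imply an order for bringing up a protected guest: register the VM, pass the image header, unpack the image page by page, verify it, and only then run protected VCPUs. The C sketch below models that order as a small state machine. It is illustrative only: `enum pv_model_cmd`, `enum pv_model_state`, `pv_model_step()` and `pv_model_may_run()` are hypothetical names, not the uapi definitions from <linux/kvm.h>, and the rules it enforces (each step requires the previous one; VERIFY needs at least one UNPACK; DISABLE is accepted from any registered state) are assumptions drawn from the descriptions above rather than from KVM's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the documented cmd values; NOT the uapi enum. */
enum pv_model_cmd {
	PVM_ENABLE,
	PVM_DISABLE,
	PVM_SET_SEC_PARMS,
	PVM_UNPACK,
	PVM_VERIFY,
};

/* Lifecycle states of the modelled VM. */
enum pv_model_state {
	PVS_NORMAL,	/* not registered with the Ultravisor */
	PVS_ENABLED,	/* registered, memory donated */
	PVS_PARMS_SET,	/* image header passed to the Ultravisor */
	PVS_UNPACKING,	/* at least one image page unpacked */
	PVS_VERIFIED,	/* integrity verified */
};

/*
 * Apply one command. Returns true and advances *state if the command is
 * acceptable in the current state, false otherwise (state unchanged).
 * flags must be 0, as documented for struct kvm_pv_cmd.
 */
static bool pv_model_step(enum pv_model_state *state, enum pv_model_cmd cmd,
			  uint32_t flags)
{
	assert(state != NULL);
	if (flags != 0)
		return false;

	switch (cmd) {
	case PVM_ENABLE:
		if (*state != PVS_NORMAL)
			return false;
		*state = PVS_ENABLED;
		return true;
	case PVM_DISABLE:
		if (*state == PVS_NORMAL)
			return false;
		*state = PVS_NORMAL;	/* VCPUs become non-protected again */
		return true;
	case PVM_SET_SEC_PARMS:
		if (*state != PVS_ENABLED)
			return false;
		*state = PVS_PARMS_SET;
		return true;
	case PVM_UNPACK:		/* issued once per image page */
		if (*state != PVS_PARMS_SET && *state != PVS_UNPACKING)
			return false;
		*state = PVS_UNPACKING;
		return true;
	case PVM_VERIFY:
		if (*state != PVS_UNPACKING)
			return false;
		*state = PVS_VERIFIED;
		return true;
	}
	return false;
}

/* Only a verified image may start protected VCPUs. */
static bool pv_model_may_run(enum pv_model_state state)
{
	return state == PVS_VERIFIED;
}
```

The real ioctl also reports the Ultravisor's rc and rrc codes back in struct kvm_pv_cmd; the model deliberately ignores them and tracks ordering only.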

5. The kvm_run structure
========================

@@ -5707,8 +5749,13 @@ and injected exceptions.
:Architectures: x86, arm, arm64, mips
:Parameters: args[0] whether feature should be enabled or not

With this capability enabled, KVM_GET_DIRTY_LOG will not automatically
clear and write-protect all pages that are returned as dirty.
Valid flags are::

#define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (1 << 0)
#define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1)

With KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE set, KVM_GET_DIRTY_LOG will not
automatically clear and write-protect all pages that are returned as dirty.
Rather, userspace will have to do this operation separately using
KVM_CLEAR_DIRTY_LOG.
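
To make the split between "get" and "clear" concrete, here is a minimal C model of one memslot's dirty bitmap under KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE: "get" copies the bitmap without clearing it or re-protecting anything, and a separate "clear" step clears and write-protects only the pages the caller names. All names (`struct slot_model`, `model_guest_write()`, `model_get_dirty_log()`, `model_clear_dirty_log()`) are hypothetical; the model does not call into KVM, and it assumes that bit 0 of the caller's mask corresponds to first_page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGES 128			/* pages in the modelled memslot */
#define MODEL_WORDS (MODEL_PAGES / 64)

struct slot_model {
	uint64_t dirty[MODEL_WORDS];	/* bit set: page written since last clear */
	bool write_protected[MODEL_PAGES];
};

/*
 * Guest write: the model marks the page dirty and drops its write
 * protection (in KVM, the first write to a protected page faults).
 */
static void model_guest_write(struct slot_model *s, unsigned page)
{
	assert(page < MODEL_PAGES);
	s->write_protected[page] = false;
	s->dirty[page / 64] |= UINT64_C(1) << (page % 64);
}

/* Manual-protect "get": copy only; nothing is cleared or re-protected. */
static void model_get_dirty_log(const struct slot_model *s, uint64_t *out)
{
	memcpy(out, s->dirty, sizeof(s->dirty));
}

/*
 * "Clear": for each page in [first_page, first_page + num_pages) whose bit
 * is set in mask (bit 0 == first_page), clear its dirty bit and
 * write-protect it again.
 */
static void model_clear_dirty_log(struct slot_model *s, const uint64_t *mask,
				  unsigned first_page, unsigned num_pages)
{
	assert(first_page + num_pages <= MODEL_PAGES);
	for (unsigned i = 0; i < num_pages; i++) {
		unsigned page = first_page + i;

		if (!((mask[i / 64] >> (i % 64)) & 1))
			continue;
		s->dirty[page / 64] &= ~(UINT64_C(1) << (page % 64));
		s->write_protected[page] = true;
	}
}
```

Because the clear step is driven by a caller-supplied range and mask, userspace can re-protect a small range just before it copies those pages, which is the latency benefit described above.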

@@ -5719,18 +5766,42 @@ than requiring to sync a full memslot; this ensures that KVM does not
take spinlocks for an extended period of time. Second, in some cases a
large amount of time can pass between a call to KVM_GET_DIRTY_LOG and
userspace actually using the data in the page. Pages can be modified
during this time, which is inefficint for both the guest and userspace:
during this time, which is inefficient for both the guest and userspace:
the guest will incur a higher penalty due to write protection faults,
while userspace can see false reports of dirty pages. Manual reprotection
helps reducing this time, improving guest performance and reducing the
number of dirty log false positives.

With KVM_DIRTY_LOG_INITIALLY_SET set, all the bits of the dirty bitmap
will be initialized to 1 when created. This also improves performance because
dirty logging can be enabled gradually in small chunks on the first call
to KVM_CLEAR_DIRTY_LOG. KVM_DIRTY_LOG_INITIALLY_SET depends on
KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (it is also only available on
x86 for now).
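
The dependency between the two flags can be written as a small validity check. This is a sketch under stated assumptions: the flag values are copied from the defines above, the names `manual_dirty_log_flags_valid()` and `initial_dirty_bitmap()` are hypothetical, and the check encodes only what this section states (no unknown bits; INITIALLY_SET only together with MANUAL_PROTECT_ENABLE), not KVM's actual enable-cap code. The second helper shows what "all bits initialized to 1" means for a slot whose page count is not a multiple of 64, assuming only the bits for existing pages are set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (1 << 0)
#define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1)

/* Accept only known bits, and INITIALLY_SET only with MANUAL_PROTECT. */
static bool manual_dirty_log_flags_valid(uint64_t flags)
{
	const uint64_t known = KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
			       KVM_DIRTY_LOG_INITIALLY_SET;

	if (flags & ~known)
		return false;
	if ((flags & KVM_DIRTY_LOG_INITIALLY_SET) &&
	    !(flags & KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE))
		return false;
	return true;
}

/* Set exactly npages bits (pages 0..npages-1) in a zeroed bitmap. */
static void initial_dirty_bitmap(uint64_t *bitmap, size_t npages)
{
	assert(bitmap != NULL);
	for (size_t i = 0; i < npages / 64; i++)
		bitmap[i] = UINT64_MAX;
	if (npages % 64)
		bitmap[npages / 64] = (UINT64_C(1) << (npages % 64)) - 1;
}
```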

KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was previously available under the name
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, but the implementation had bugs that make
it hard or impossible to use it correctly. The availability of
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 signals that those bugs are fixed.
Userspace should not try to use KVM_CAP_MANUAL_DIRTY_LOG_PROTECT.

7.19 KVM_CAP_PPC_SECURE_GUEST
------------------------------

:Architectures: ppc

This capability indicates that KVM is running on a host that has
ultravisor firmware and thus can support a secure guest. On such a
system, a guest can ask the ultravisor to make it a secure guest,
one whose memory is inaccessible to the host except for pages which
are explicitly requested to be shared with the host. The ultravisor
notifies KVM when a guest requests to become a secure guest, and KVM
has the opportunity to veto the transition.

If present, this capability can be enabled for a VM, meaning that KVM
will allow the transition to secure guest mode. Otherwise KVM will
veto the transition.
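
A VMM would typically probe for this capability and then enable it on the VM file descriptor. The sketch below uses the standard KVM_CHECK_EXTENSION and KVM_ENABLE_CAP ioctls with struct kvm_enable_cap from <linux/kvm.h>. The helper name `enable_secure_guest()` is hypothetical, error handling is minimal, and the function compiles to a stub returning -ENOTSUP when the installed headers predate KVM_CAP_PPC_SECURE_GUEST.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Allow the guest's transition to secure mode. vm_fd is a VM file
 * descriptor from KVM_CREATE_VM. Returns 0 or a negative errno value.
 */
static int enable_secure_guest(int vm_fd)
{
#ifdef KVM_CAP_PPC_SECURE_GUEST
	struct kvm_enable_cap cap;

	/* A result <= 0 means the capability is absent (or the fd is bad),
	 * in which case KVM will veto the transition. */
	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_SECURE_GUEST) <= 0)
		return -ENOTSUP;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_PPC_SECURE_GUEST;
	if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
		return -errno;
	return 0;
#else
	(void)vm_fd;
	return -ENOTSUP;	/* headers too old to know the capability */
#endif
}
```

Enabling the capability only means KVM will not veto the transition; the guest itself still has to ask the ultravisor to become secure.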

8. Other capabilities.
======================

@@ -6027,3 +6098,14 @@ Architectures: s390

This capability indicates that the KVM_S390_NORMAL_RESET and
KVM_S390_CLEAR_RESET ioctls are available.

8.23 KVM_CAP_S390_PROTECTED

Architecture: s390


This capability indicates that the Ultravisor has been initialized and
KVM can therefore start protected VMs.
This capability governs the KVM_S390_PV_COMMAND ioctl and the
KVM_MP_STATE_LOAD MP_STATE. KVM_SET_MP_STATE can fail for protected
guests when the state change is invalid.
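
For protected guests, userspace uses the regular KVM_SET_MP_STATE ioctl to put a VCPU into the s390-specific KVM_MP_STATE_LOAD state. The sketch below assumes a VCPU file descriptor from KVM_CREATE_VCPU; the helper name `set_vcpu_load_state()` is hypothetical, and, as stated above, the ioctl may be rejected when the state change is invalid for a protected guest. It compiles to a stub when the headers predate KVM_MP_STATE_LOAD.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Returns 0 on success or a negative errno (e.g. for an invalid change). */
static int set_vcpu_load_state(int vcpu_fd)
{
#ifdef KVM_MP_STATE_LOAD
	struct kvm_mp_state st = { .mp_state = KVM_MP_STATE_LOAD };

	if (ioctl(vcpu_fd, KVM_SET_MP_STATE, &st) < 0)
		return -errno;
	return 0;
#else
	(void)vcpu_fd;
	return -ENOTSUP;	/* headers predate KVM_MP_STATE_LOAD */
#endif
}
```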
5 changes: 5 additions & 0 deletions Documentation/virt/kvm/arm/hyp-abi.rst
@@ -11,6 +11,11 @@ hypervisor when running as a guest (under Xen, KVM or any other
hypervisor), or any hypervisor-specific interaction when the kernel is
used as a host.

Note: KVM/arm has been removed from the kernel. The API described
here is still valid though, as it allows the kernel to kexec when
booted at HYP. It can also be used by a hypervisor other than KVM
if necessary.

On arm and arm64 (without VHE), the kernel doesn't run in hypervisor
mode, but still needs to interact with it, allowing a built-in
hypervisor to be either installed or torn down.
11 changes: 2 additions & 9 deletions Documentation/virt/kvm/devices/s390_flic.rst
@@ -108,16 +108,9 @@ Groups:
mask or unmask the adapter, as specified in mask

KVM_S390_IO_ADAPTER_MAP
perform a gmap translation for the guest address provided in addr,
pin a userspace page for the translated address and add it to the
list of mappings

.. note:: A new mapping will be created unconditionally; therefore,
the calling code should avoid making duplicate mappings.

This is now a no-op. The mapping is purely done by the irq route.
KVM_S390_IO_ADAPTER_UNMAP
release a userspace page for the translated address specified in addr
from the list of mappings
This is now a no-op. The mapping is purely done by the irq route.

KVM_DEV_FLIC_AISM
modify the adapter-interruption-suppression mode for a given isc if the
2 changes: 2 additions & 0 deletions Documentation/virt/kvm/index.rst
@@ -18,6 +18,8 @@ KVM
nested-vmx
ppc-pv
s390-diag
s390-pv
s390-pv-boot
timekeeping
vcpu-requests

11 changes: 5 additions & 6 deletions Documentation/virt/kvm/locking.rst
@@ -96,19 +96,18 @@ will happen:
We dirty-log for gfn1, that means gfn2 is lost in dirty-bitmap.

For direct sp, we can easily avoid it since the spte of direct sp is fixed
to gfn. For indirect sp, before we do cmpxchg, we call gfn_to_pfn_atomic()
to pin gfn to pfn, because after gfn_to_pfn_atomic():
to gfn. For indirect sp, we disabled fast page fault for simplicity.

A solution for indirect sp could be to pin the gfn, for example via
kvm_vcpu_gfn_to_pfn_atomic, before the cmpxchg. After the pinning:

- We have held the refcount of pfn that means the pfn can not be freed and
be reused for another gfn.
- The pfn is writable that means it can not be shared between different gfns
- The pfn is writable and therefore it cannot be shared between different gfns
by KSM.

Then, we can ensure the dirty bitmap is correctly set for a gfn.

Currently, to simplify the whole things, we disable fast page fault for
indirect shadow page.
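
The direct-sp argument relies on a lock-free compare-and-exchange: the fast path may mark the gfn dirty only if the spte still holds the value it read earlier. A minimal C11 sketch of that pattern follows. The bit layout, `SPTE_MODEL_WRITABLE` and `fast_fix_spte()` are hypothetical and far simpler than KVM's real fast page fault handler, and the sketch is only meaningful for a direct sp, where the spte-to-gfn mapping is fixed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SPTE_MODEL_WRITABLE (UINT64_C(1) << 1)	/* hypothetical bit */

/*
 * Try to make the spte writable without taking the MMU lock. old_spte is
 * the value read earlier. The gfn is marked dirty only if the cmpxchg
 * succeeds; on failure the caller falls back to the slow (locked) path.
 */
static bool fast_fix_spte(_Atomic uint64_t *sptep, uint64_t old_spte,
			  uint64_t *dirty_bitmap, uint64_t gfn)
{
	uint64_t expected = old_spte;

	assert(!(old_spte & SPTE_MODEL_WRITABLE));
	if (!atomic_compare_exchange_strong(sptep, &expected,
					    old_spte | SPTE_MODEL_WRITABLE))
		return false;	/* spte changed under us */

	/* Direct sp: the spte maps a fixed gfn, so this is the right bit. */
	dirty_bitmap[gfn / 64] |= UINT64_C(1) << (gfn % 64);
	return true;
}
```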

2) Dirty bit tracking

In the origin code, the spte can be fast updated (non-atomically) if the
84 changes: 84 additions & 0 deletions Documentation/virt/kvm/s390-pv-boot.rst
@@ -0,0 +1,84 @@
.. SPDX-License-Identifier: GPL-2.0
======================================
s390 (IBM Z) Boot/IPL of Protected VMs
======================================

Summary
-------
The memory of Protected Virtual Machines (PVMs) is not accessible to
I/O or the hypervisor. In those cases where the hypervisor needs to
access the memory of a PVM, that memory must be made accessible.
Memory made accessible to the hypervisor will be encrypted. See
:doc:`s390-pv` for details.

On IPL (boot) a small plaintext bootloader is started, which provides
information about the encrypted components and necessary metadata to
KVM to decrypt the protected virtual machine.

Based on this data, KVM will make the protected virtual machine known
to the Ultravisor (UV) and instruct it to secure the memory of the
PVM, decrypt the components and verify the data and address list
hashes, to ensure integrity. Afterwards KVM can run the PVM via the
SIE instruction which the UV will intercept and execute on KVM's
behalf.

As the guest image is just like an opaque kernel image that does the
switch into PV mode itself, the user can load encrypted guest
executables and data via every available method (network, dasd, scsi,
direct kernel, ...) without the need to change the boot process.


Diag308
-------
This diagnose instruction is the basic mechanism to handle IPL and
related operations for virtual machines. The VM can set and retrieve
IPL information blocks, which specify the IPL method/devices, and can
request VM memory and subsystem resets, as well as IPLs.

For PVMs this concept has been extended with new subcodes:

Subcode 8: Set an IPL Information Block of type 5 (information block
for PVMs)
Subcode 9: Store the saved block in guest memory
Subcode 10: Move into Protected Virtualization mode

The new PV load-device-specific-parameters field specifies all data
that is necessary to move into PV mode.

* PV Header origin
* PV Header length
* List of Components composed of
* AES-XTS Tweak prefix
* Origin
* Size

The PV header contains the keys and hashes, which the UV will use to
decrypt and verify the PV, as well as control flags and a start PSW.

The components are for instance an encrypted kernel, kernel parameters
and initrd. The components are decrypted by the UV.

After the initial import of the encrypted data, all defined pages will
contain the guest content. All non-specified pages will start out as
zero pages on first access.

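The rule that defined pages receive guest content while all other pages start out as zero pages can be sketched as a lookup over the list of components. The structure below only mirrors the fields listed above; the field widths, the names `struct pv_comp_model` and `page_is_imported()`, and the 4 KiB page size are assumptions made for illustration, not the actual layout of the IPL parameter block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u		/* assumed page size */

/* Hypothetical mirror of one entry in the list of components. */
struct pv_comp_model {
	uint64_t tweak_prefix;		/* AES-XTS tweak prefix */
	uint64_t origin;		/* guest address of the component */
	uint64_t size;			/* length in bytes */
};

/*
 * True if the page containing guest address addr overlaps a component,
 * i.e. holds imported content after unpacking; false means the page
 * starts out as a zero page on first access.
 */
static bool page_is_imported(const struct pv_comp_model *comps, size_t n,
			     uint64_t addr)
{
	uint64_t page = addr & ~(uint64_t)(MODEL_PAGE_SIZE - 1);

	assert(comps != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (page < comps[i].origin + comps[i].size &&
		    page + MODEL_PAGE_SIZE > comps[i].origin)
			return true;
	}
	return false;
}
```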

When running in protected virtualization mode, some subcodes will result in
exceptions or return error codes.

Subcodes 4 and 7, which specify operations that do not clear the guest
memory, will result in specification exceptions. This is because the
UV will clear all memory when a secure VM is removed, and therefore
non-clearing IPL subcodes are not allowed.

Subcodes 8, 9, 10 will result in specification exceptions.
Re-IPL into protected mode is only possible via a detour into
non-protected mode.
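
The subcode rules for a guest that is already in protected mode can be summarized as a lookup. The C sketch below encodes only what this section states (subcodes 4, 7, 8, 9 and 10 raise a specification exception); the enum and the function `diag308_pv_mode_outcome()` are hypothetical, and every other subcode is reported as "not covered" rather than guessed.

```c
#include <assert.h>

enum diag308_pv_outcome {
	DIAG308_NOT_COVERED,		/* not described in this document */
	DIAG308_SPEC_EXCEPTION,		/* specification exception */
};

/* Outcome of a diag308 subcode issued by a guest in protected mode. */
static enum diag308_pv_outcome diag308_pv_mode_outcome(int subcode)
{
	assert(subcode >= 0);
	switch (subcode) {
	case 4:		/* non-clearing IPL: the UV clears memory on removal */
	case 7:
	case 8:		/* PV subcodes: re-IPL into protected mode needs */
	case 9:		/* a detour through non-protected mode */
	case 10:
		return DIAG308_SPEC_EXCEPTION;
	default:
		return DIAG308_NOT_COVERED;
	}
}
```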

Keys
----
Every CEC will have a unique public key to enable tooling to build
encrypted images.
See `s390-tools <https://github.com/ibm-s390-tools/s390-tools/>`_
for the tooling.