Merge branch 'kvm-5.20-early'
s390:

* add an interface to provide a hypervisor dump for secure guests

* improve selftests to show tests

x86:

* Intel IPI virtualization

* Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS

* PEBS virtualization

* Simplify PMU emulation by just using PERF_TYPE_RAW events

* More accurate event reinjection on SVM (avoid retrying instructions)

* Allow getting/setting the state of the speaker port data bit

* Rewrite gfn-pfn cache refresh

* Refuse starting the module if VM-Entry/VM-Exit controls are inconsistent

* "Notify" VM exit
bonzini committed Jun 9, 2022
2 parents e0f3f46 + b172862 commit e15f5e6
Showing 71 changed files with 3,123 additions and 712 deletions.
244 changes: 241 additions & 3 deletions Documentation/virt/kvm/api.rst
@@ -1150,6 +1150,10 @@ The following bits are defined in the flags field:
fields contain a valid state. This bit will be set whenever
KVM_CAP_EXCEPTION_PAYLOAD is enabled.

- KVM_VCPUEVENT_VALID_TRIPLE_FAULT may be set to signal that the
triple_fault.pending field contains a valid state. This bit will
be set whenever KVM_CAP_TRIPLE_FAULT_EVENT is enabled.

ARM64:
^^^^^^

@@ -1245,6 +1249,10 @@ can be set in the flags field to signal that the
exception_has_payload, exception_payload, and exception.pending fields
contain a valid state and shall be written into the VCPU.

If KVM_CAP_TRIPLE_FAULT_EVENT is enabled, KVM_VCPUEVENT_VALID_TRIPLE_FAULT
can be set in the flags field to signal that the triple_fault field contains
a valid state and shall be written into the VCPU.
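A minimal userspace sketch of how the flag gates the triple fault state in both directions, assuming KVM_CAP_TRIPLE_FAULT_EVENT has already been enabled. The struct and the flag value below are hypothetical stand-ins for `struct kvm_vcpu_events` (its `triple_fault.pending` member) and `KVM_VCPUEVENT_VALID_TRIPLE_FAULT` from `<linux/kvm.h>`; only the flag handling is meant to carry over.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placeholder: use KVM_VCPUEVENT_VALID_TRIPLE_FAULT from
 * <linux/kvm.h> instead of this value. */
#define SKETCH_VALID_TRIPLE_FAULT (1u << 5)

/* Hypothetical mirror of the two kvm_vcpu_events members used here. */
struct vcpu_events_sketch {
	uint32_t flags;
	uint8_t triple_fault_pending;
};

/* After KVM_GET_VCPU_EVENTS: report the pending triple fault only if KVM
 * marked the field as valid. Returns false if the state is unknown. */
static bool sketch_get_triple_fault(const struct vcpu_events_sketch *ev,
				    bool *pending)
{
	if (!(ev->flags & SKETCH_VALID_TRIPLE_FAULT))
		return false;
	*pending = ev->triple_fault_pending != 0;
	return true;
}

/* Before KVM_SET_VCPU_EVENTS: ask KVM to write the triple fault state. */
static void sketch_set_triple_fault(struct vcpu_events_sketch *ev, bool pending)
{
	ev->flags |= SKETCH_VALID_TRIPLE_FAULT;
	ev->triple_fault_pending = pending ? 1 : 0;
}
```

Setting the flag only in the SET direction, rather than always, keeps KVM from overwriting a pending triple fault that userspace did not intend to touch.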

ARM64:
^^^^^^

@@ -2998,7 +3006,9 @@ KVM_CREATE_PIT2. The state is returned in the following structure::
Valid flags are::

/* disable PIT in HPET legacy mode */
#define KVM_PIT_FLAGS_HPET_LEGACY 0x00000001
/* speaker port data bit enabled */
#define KVM_PIT_FLAGS_SPEAKER_DATA_ON 0x00000002

This IOCTL replaces the obsolete KVM_GET_PIT.
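A read-modify-write sketch for toggling the speaker port data bit between KVM_GET_PIT2 and KVM_SET_PIT2. It assumes the caller has already fetched `struct kvm_pit_state2` and only manipulates its `flags` word, so the ioctls themselves are left out. The flag values are the ones documented above.

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef KVM_PIT_FLAGS_SPEAKER_DATA_ON
#define KVM_PIT_FLAGS_HPET_LEGACY     0x00000001 /* as documented */
#define KVM_PIT_FLAGS_SPEAKER_DATA_ON 0x00000002 /* as documented */
#endif

/* Return the flags word with the speaker data bit set or cleared,
 * leaving every other bit (e.g. HPET legacy mode) untouched. */
static uint32_t pit_flags_with_speaker(uint32_t flags, bool on)
{
	if (on)
		return flags | KVM_PIT_FLAGS_SPEAKER_DATA_ON;
	return flags & ~(uint32_t)KVM_PIT_FLAGS_SPEAKER_DATA_ON;
}
```

Usage: `state.flags = pit_flags_with_speaker(state.flags, true);` before handing `state` back to KVM_SET_PIT2.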

@@ -5127,7 +5137,15 @@ into ESA mode. This reset is a superset of the initial reset.
__u32 reserved[3];
};

**Ultravisor return codes**
The Ultravisor return (reason) codes are provided by the kernel if an
Ultravisor call has been executed to achieve the results expected by
the command. Therefore, they are independent of the IOCTL return
code. If KVM changes `rc`, its value will always be greater than 0;
hence, setting it to 0 before issuing a PV command is advised so that
a change of `rc` can be detected.
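A sketch of the advised pattern: clear the command, `rc` and `rrc` included, before the ioctl, then treat a nonzero `rc` afterwards as a code reported by KVM. `struct pv_cmd_sketch` is a hypothetical mirror of `struct kvm_pv_cmd` (fields `cmd`, `rc`, `rrc`, `data`, `flags`, `reserved[3]`); use the real definition from `<linux/kvm.h>` in practice.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of struct kvm_pv_cmd. */
struct pv_cmd_sketch {
	uint32_t cmd;
	uint16_t rc;   /* Ultravisor return code */
	uint16_t rrc;  /* Ultravisor return reason code */
	uint64_t data;
	uint32_t flags; /* must be 0 */
	uint32_t reserved[3];
};

/* Zero the whole command, so rc/rrc start at 0, then fill it in. */
static void pv_cmd_prepare(struct pv_cmd_sketch *c, uint32_t cmd, uint64_t data)
{
	memset(c, 0, sizeof(*c));
	c->cmd = cmd;
	c->data = data;
}

/* KVM only ever stores rc values greater than 0, so a nonzero rc after
 * the ioctl means KVM reported an Ultravisor return code. */
static bool pv_cmd_rc_reported(const struct pv_cmd_sketch *c)
{
	return c->rc != 0;
}
```

With the real struct, the call would be `pv_cmd_prepare(&c, KVM_PV_ENABLE, 0)` followed by `ioctl(vm_fd, KVM_S390_PV_COMMAND, &c)`, checking the ioctl return value and `rc` separately.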

**cmd values:**

KVM_PV_ENABLE
Allocate memory and register the VM with the Ultravisor, thereby
@@ -5143,7 +5161,6 @@ KVM_PV_ENABLE
===== =============================

KVM_PV_DISABLE

Deregister the VM from the Ultravisor and reclaim the memory that
had been donated to the Ultravisor, making it usable by the kernel
again. All registered VCPUs are converted back to non-protected
@@ -5160,6 +5177,117 @@ KVM_PV_VM_VERIFY
Verify the integrity of the unpacked image. Only if this succeeds,
KVM is allowed to start protected VCPUs.

KVM_PV_INFO
:Capability: KVM_CAP_S390_PROTECTED_DUMP

Presents an API that provides Ultravisor related data to userspace
via subcommands. len_max is the size of the user space buffer, and
len_written is KVM's indication of how many bytes of that buffer
were actually written to. len_written can be used to determine the
valid fields if more response fields are added in the future.

::

enum pv_cmd_info_id {
KVM_PV_INFO_VM,
KVM_PV_INFO_DUMP,
};

struct kvm_s390_pv_info_header {
__u32 id;
__u32 len_max;
__u32 len_written;
__u32 reserved;
};

struct kvm_s390_pv_info {
struct kvm_s390_pv_info_header header;
struct kvm_s390_pv_info_dump dump;
struct kvm_s390_pv_info_vm vm;
};

**subcommands:**

KVM_PV_INFO_VM
This subcommand provides basic Ultravisor information for PV
hosts. These values are likely also exported as files in the sysfs
firmware UV query interface, but they are more easily available to
programs in this API.

The inst_calls_list and feature_indication members provide the
installed UV calls and the UV's other feature indications.

The max_* members provide information about the maximum number of PV
vcpus, PV guests and PV guest memory size.

::

struct kvm_s390_pv_info_vm {
__u64 inst_calls_list[4];
__u64 max_cpus;
__u64 max_guests;
__u64 max_guest_addr;
__u64 feature_indication;
};
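A sketch of using len_written to decide which response fields are valid. It assumes that len_written counts bytes from the start of the buffer, header included, and models the KVM_PV_INFO_VM reply as the header directly followed by `struct kvm_s390_pv_info_vm`; all `*_sketch` names are hypothetical mirrors of the structures documented here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pv_info_header_sketch {
	uint32_t id;
	uint32_t len_max;
	uint32_t len_written;
	uint32_t reserved;
};

struct pv_info_vm_sketch {
	uint64_t inst_calls_list[4];
	uint64_t max_cpus;
	uint64_t max_guests;
	uint64_t max_guest_addr;
	uint64_t feature_indication;
};

/* Assumed reply layout: header, then the VM info payload. */
struct pv_info_vm_reply_sketch {
	struct pv_info_header_sketch header;
	struct pv_info_vm_sketch vm;
};

/* A field is valid only if KVM wrote every byte of it. */
static bool pv_info_field_valid(uint32_t len_written, size_t offset, size_t size)
{
	return offset + size <= len_written;
}

#define PV_INFO_VM_FIELD_VALID(hdr, member)                                 \
	pv_info_field_valid((hdr)->len_written,                             \
			    offsetof(struct pv_info_vm_reply_sketch, vm.member), \
			    sizeof(((struct pv_info_vm_reply_sketch *)0)->vm.member))
```

Checking against len_written instead of sizeof() lets a program built against a newer header still run on a kernel that fills fewer fields.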


KVM_PV_INFO_DUMP
This subcommand provides information related to dumping PV guests.

::

struct kvm_s390_pv_info_dump {
__u64 dump_cpu_buffer_len;
__u64 dump_config_mem_buffer_per_1m;
__u64 dump_config_finalize_len;
};
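Going by the field name, `dump_config_mem_buffer_per_1m` is the amount of storage state data per 1MB of guest memory. Under that reading (an assumption, as is the rounding of a partial trailing block up to a full one), the total buffer space for a whole guest can be estimated as follows; `PV_SZ_1M` and the function name are hypothetical.

```c
#include <stdint.h>

#define PV_SZ_1M (1ULL << 20)

/* Bytes of tweak/status data needed to cover guest_mem_bytes of guest
 * memory, assuming one per_1m-sized record per (partial) 1MB block. */
static uint64_t pv_dump_stor_state_total(uint64_t guest_mem_bytes,
					  uint64_t per_1m)
{
	uint64_t blocks = (guest_mem_bytes + PV_SZ_1M - 1) / PV_SZ_1M;

	return blocks * per_1m;
}
```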

KVM_PV_DUMP
:Capability: KVM_CAP_S390_PROTECTED_DUMP

Presents an API that provides calls which facilitate dumping a
protected VM.

::

struct kvm_s390_pv_dmp {
__u64 subcmd;
__u64 buff_addr;
__u64 buff_len;
__u64 gaddr; /* For dump storage state */
};

**subcommands:**

KVM_PV_DUMP_INIT
Initializes the dump process of a protected VM. If this call does
not succeed, all other subcommands will fail with -EINVAL. This
subcommand will return -EINVAL if a previous dump process has not
yet been completed.

Not all PV VMs can be dumped; the owner needs to set the `dump
allowed` PCF bit 34 in the SE header to allow dumping.

KVM_PV_DUMP_CONFIG_STOR_STATE
Stores `buff_len` bytes of tweak component values starting with
the 1MB block specified by the absolute guest address
(`gaddr`). `buff_len` needs to be a multiple of
`conf_dump_storage_state_len` and at least as large as the
`conf_dump_storage_state_len` value provided by the dump uv_info
data (reported to userspace as `dump_config_mem_buffer_per_1m` by
KVM_PV_INFO_DUMP). The buffer at `buff_addr` might be written to
even if an error rc is returned, for instance if a fault is
encountered after the first page of data has been written.
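A sketch of the `buff_len` rule and of advancing `gaddr` between calls. It assumes that each `per_1m`-sized chunk of the buffer describes exactly one 1MB block, so a successful call covers `buff_len / per_1m` blocks; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PV_DUMP_BLOCK_SIZE (1ULL << 20) /* storage state is per 1MB block */

/* buff_len must be a nonzero multiple of the per-1MB record length. */
static bool pv_dump_stor_len_ok(uint64_t buff_len, uint64_t per_1m)
{
	return per_1m != 0 && buff_len >= per_1m && buff_len % per_1m == 0;
}

/* Guest address for the next call after a successful one, assuming one
 * per_1m-sized chunk per 1MB block. */
static uint64_t pv_dump_stor_next_gaddr(uint64_t gaddr, uint64_t buff_len,
					uint64_t per_1m)
{
	return gaddr + (buff_len / per_1m) * PV_DUMP_BLOCK_SIZE;
}
```

Because the buffer may be partially written on error, a caller would only advance `gaddr` and consume the buffer after a fully successful call.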

KVM_PV_DUMP_COMPLETE
If the subcommand succeeds, it completes the dump process and lets
KVM_PV_DUMP_INIT be called again.

On success, `conf_dump_finalize_len` bytes of completion data will be
stored at `buff_addr`. The completion data contains a key
derivation seed, IV, tweak nonce and encryption keys as well as an
authentication tag, all of which are needed to decrypt the dump at a
later time.
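The sequencing rules of the subcommands above can be summarized as a small state machine. This is a hypothetical userspace model, not KVM code: it never issues an ioctl and only mirrors which calls the text says are allowed in which state.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the documented KVM_PV_DUMP sequencing rules. */
struct pv_dump_model {
	bool in_progress;
};

/* KVM_PV_DUMP_INIT: fails while a previous dump is not completed. */
static int model_dump_init(struct pv_dump_model *m)
{
	if (m->in_progress)
		return -EINVAL;
	m->in_progress = true;
	return 0;
}

/* Stands for KVM_PV_DUMP_CONFIG_STOR_STATE and KVM_PV_DUMP_CPU:
 * they fail with -EINVAL unless KVM_PV_DUMP_INIT succeeded. */
static int model_dump_gather(const struct pv_dump_model *m)
{
	return m->in_progress ? 0 : -EINVAL;
}

/* KVM_PV_DUMP_COMPLETE: ends the dump so INIT may be called again. */
static int model_dump_complete(struct pv_dump_model *m)
{
	if (!m->in_progress)
		return -EINVAL;
	m->in_progress = false;
	return 0;
}
```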


4.126 KVM_X86_SET_MSR_FILTER
----------------------------

@@ -5802,6 +5930,32 @@ of CPUID leaf 0xD on the host.

This ioctl injects an event channel interrupt directly to the guest vCPU.

4.136 KVM_S390_PV_CPU_COMMAND
-----------------------------

:Capability: KVM_CAP_S390_PROTECTED_DUMP
:Architectures: s390
:Type: vcpu ioctl
:Parameters: none
:Returns: 0 on success, < 0 on error

This ioctl closely mirrors `KVM_S390_PV_COMMAND` but handles requests
for vcpus. It re-uses the kvm_s390_pv_dmp struct and hence also shares
the command ids.

**command:**

KVM_PV_DUMP
Presents an API that provides calls which facilitate dumping a vcpu
of a protected VM.

**subcommand:**

KVM_PV_DUMP_CPU
Provides encrypted dump data, such as register values.
The length of the returned data is provided by uv_info.guest_cpu_stor_len
(reported to userspace as `dump_cpu_buffer_len` by KVM_PV_INFO_DUMP).


5. The kvm_run structure
========================

@@ -6405,6 +6559,26 @@ array field represents return values. The userspace should update the return
values of SBI call before resuming the VCPU. For more details on RISC-V SBI
spec refer, https://github.com/riscv/riscv-sbi-doc.

::

/* KVM_EXIT_NOTIFY */
struct {
#define KVM_NOTIFY_CONTEXT_INVALID (1 << 0)
__u32 flags;
} notify;

Used on x86 systems. When the VM capability KVM_CAP_X86_NOTIFY_VMEXIT is
enabled, a VM exit is generated if no event window occurs in VM non-root
mode for a specified amount of time. If KVM_X86_NOTIFY_VMEXIT_USER is also
set when enabling the capability, KVM exits to userspace with the exit
reason KVM_EXIT_NOTIFY for further handling. The "flags" field contains
more detailed information.

The valid value for 'flags' is:

- KVM_NOTIFY_CONTEXT_INVALID -- the VM context is corrupted and not valid
in the VMCS. Resuming the target VM would lead to unpredictable results.
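A sketch of the check a VMM's KVM_EXIT_NOTIFY handler might make on `run->notify.flags` before re-entering KVM_RUN. The function name is hypothetical, and what to do with a non-resumable VM (for example shutting it down) is left to the VMM's own policy.

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef KVM_NOTIFY_CONTEXT_INVALID
#define KVM_NOTIFY_CONTEXT_INVALID (1 << 0) /* value as documented */
#endif

/* Decide whether the vCPU may be resumed after a KVM_EXIT_NOTIFY exit.
 * A VM whose context is flagged as invalid must not be resumed. */
static bool notify_exit_can_resume(uint32_t notify_flags)
{
	return !(notify_flags & KVM_NOTIFY_CONTEXT_INVALID);
}
```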

::

/* Fix the size of the union. */
@@ -7350,6 +7524,56 @@ The valid bits in cap.args[0] are:
generate a #UD within the guest.
=================================== ============================================

7.32 KVM_CAP_MAX_VCPU_ID
------------------------

:Architectures: x86
:Target: VM
:Parameters: args[0] - maximum APIC ID value set for current VM
:Returns: 0 on success, -EINVAL if args[0] is beyond KVM_MAX_VCPU_IDS
supported in KVM or if it has already been set.

This capability allows userspace to specify the maximum possible APIC ID
assigned for the current VM session prior to the creation of vCPUs,
saving memory for data structures indexed by the APIC ID. Userspace is
able to calculate the limit to APIC ID values from the designated CPU
topology.

The value can be changed only until KVM_ENABLE_CAP is set to a nonzero
value or until a vCPU is created. Upon creation of the first vCPU,
if the value was set to zero or KVM_ENABLE_CAP was not invoked, KVM
uses the return value of KVM_CHECK_EXTENSION(KVM_CAP_MAX_VCPU_ID) as
the maximum APIC ID.
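How userspace derives the limit from the topology is up to the VMM. The sketch below assumes the common x86 packing in which each topology level gets a power-of-two-wide bit field (thread bits lowest, then core bits, then socket bits); a VMM using a different layout would compute a different value. The function names are hypothetical, and the KVM_ENABLE_CAP call itself is left out.

```c
#include <stdint.h>

/* Number of bits needed to encode values 0..n-1 (0 when n <= 1). */
static unsigned int topo_bits(uint32_t n)
{
	unsigned int bits = 0;

	while ((1ULL << bits) < n)
		bits++;
	return bits;
}

/* Highest APIC ID for a sockets x cores x threads topology under the
 * assumed packing. All three counts must be at least 1. */
static uint32_t max_apic_id(uint32_t sockets, uint32_t cores, uint32_t threads)
{
	unsigned int smt_shift = topo_bits(threads);
	unsigned int core_shift = smt_shift + topo_bits(cores);

	return ((sockets - 1) << core_shift) |
	       ((cores - 1) << smt_shift) |
	       (threads - 1);
}
```

For 2 sockets of 6 cores with 2 threads each, the core field is 3 bits wide (6 rounds up to 8), so the highest APIC ID is 27 even though there are only 24 vCPUs.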

7.33 KVM_CAP_X86_NOTIFY_VMEXIT
------------------------------

:Architectures: x86
:Target: VM
:Parameters: args[0] holds the notify window as well as some flags
:Returns: 0 on success, -EINVAL if args[0] contains invalid flags or notify
VM exit is unsupported.

Bits 63:32 of args[0] hold the notify window.
Bits 31:0 of args[0] hold flags. Valid bits are::

#define KVM_X86_NOTIFY_VMEXIT_ENABLED (1 << 0)
#define KVM_X86_NOTIFY_VMEXIT_USER (1 << 1)

This capability allows userspace to turn the notify VM exit on or off
in per-VM scope during VM creation. Notify VM exit is disabled by default.
When userspace sets the KVM_X86_NOTIFY_VMEXIT_ENABLED bit in args[0], KVM
enables this feature with the notify window provided, which generates a
VM exit if no event window occurs in VM non-root mode for a specified
amount of time (the notify window).

If KVM_X86_NOTIFY_VMEXIT_USER is set in args[0], KVM exits to userspace
for handling whenever a notify VM exit happens.

This capability is intended to mitigate the threat of malicious VMs
causing the CPU to get stuck (because event windows do not open up),
making the CPU unavailable to the host or other VMs.
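Building args[0] is plain bit packing of the two halves described above. A sketch, with a hypothetical helper name; the window value is chosen by the caller, and no particular unit is assumed for it here.

```c
#include <stdint.h>

#ifndef KVM_X86_NOTIFY_VMEXIT_ENABLED
#define KVM_X86_NOTIFY_VMEXIT_ENABLED (1 << 0) /* as documented */
#define KVM_X86_NOTIFY_VMEXIT_USER    (1 << 1) /* as documented */
#endif

/* Pack the notify window (bits 63:32) and flags (bits 31:0) into the
 * args[0] value for KVM_ENABLE_CAP(KVM_CAP_X86_NOTIFY_VMEXIT). */
static uint64_t notify_vmexit_arg(uint32_t window, uint32_t flags)
{
	return ((uint64_t)window << 32) | flags;
}
```

Casting the window to 64 bits before shifting matters: shifting a 32-bit value left by 32 is undefined behavior in C.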

8. Other capabilities.
======================

@@ -7956,6 +8180,20 @@ should adjust CPUID leaf 0xA to reflect that the PMU is disabled.
When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of
type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request.

8.37 KVM_CAP_S390_PROTECTED_DUMP
--------------------------------

:Capability: KVM_CAP_S390_PROTECTED_DUMP
:Architectures: s390
:Type: vm

This capability indicates that KVM and the Ultravisor support dumping
PV guests. The `KVM_PV_DUMP` command is available for the
`KVM_S390_PV_COMMAND` ioctl and the `KVM_PV_INFO` command provides
dump-related UV data. Also, the vcpu ioctl `KVM_S390_PV_CPU_COMMAND` is
available and supports the `KVM_PV_DUMP_CPU` subcommand.


9. Known KVM API problems
=========================

1 change: 1 addition & 0 deletions Documentation/virt/kvm/s390/index.rst
@@ -10,3 +10,4 @@ KVM for s390 systems
s390-diag
s390-pv
s390-pv-boot
s390-pv-dump
64 changes: 64 additions & 0 deletions Documentation/virt/kvm/s390/s390-pv-dump.rst
@@ -0,0 +1,64 @@
.. SPDX-License-Identifier: GPL-2.0

===========================================
s390 (IBM Z) Protected Virtualization dumps
===========================================

Summary
-------

Dumping a VM is an essential tool for debugging problems inside
it. This is especially true when a protected VM runs into trouble, as
there's no way to access its memory and registers from the outside
while it's running.

However, when dumping a protected VM, we need to maintain its
confidentiality until the dump is in the hands of the VM owner, who
should be the only one capable of analysing it.

The confidentiality of the VM dump is ensured by the Ultravisor, which
provides an interface to KVM over which encrypted CPU and memory data
can be requested. The encryption is based on the Customer
Communication Key, which is the key used to encrypt VM data in a way
that the customer is able to decrypt.


Dump process
------------

A dump is done in 3 steps:

**Initiation**

This step initializes the dump process, generates cryptographic seeds
and extracts dump keys with which the VM dump data will be encrypted.

**Data gathering**

Currently there are two types of data that can be gathered from a VM:
the memory and the vcpu state.

The vcpu state contains all the important registers of a vcpu:
general, floating point, vector, control and TOD/timers. The vcpu
dump can contain incomplete data if a vcpu is dumped while an
instruction is emulated with the help of the hypervisor. This is
indicated by a flag bit in the dump data. For the same reason, it is
very important to write out not only the encrypted vcpu state but
also the unencrypted state from the hypervisor.

The memory state is further divided into the encrypted memory and its
metadata, comprising the encryption tweaks and status flags. The
encrypted memory can simply be read once it has been exported. The
time of the export does not matter, as no re-encryption is
needed. Memory that has been swapped out, and hence was exported, can be
read from the swap and written to the dump target without the need for
any special action.

The tweaks / status flags for the exported pages need to be requested
from the Ultravisor.

**Finalization**

The finalization step will provide the data needed to be able to
decrypt the vcpu and memory data and end the dump process. When this
step completes successfully a new dump initiation can be started.
4 changes: 4 additions & 0 deletions arch/s390/boot/uv.c
@@ -41,6 +41,10 @@ void uv_query_info(void)
uv_info.max_num_sec_conf = uvcb.max_num_sec_conf;
uv_info.max_guest_cpu_id = uvcb.max_guest_cpu_id;
uv_info.uv_feature_indications = uvcb.uv_feature_indications;
uv_info.supp_se_hdr_ver = uvcb.supp_se_hdr_versions;
uv_info.supp_se_hdr_pcf = uvcb.supp_se_hdr_pcf;
uv_info.conf_dump_storage_state_len = uvcb.conf_dump_storage_state_len;
uv_info.conf_dump_finalize_len = uvcb.conf_dump_finalize_len;
}

#ifdef CONFIG_PROTECTED_VIRTUALIZATION_GUEST