Skip to content

Commit

Permalink
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Browse files Browse the repository at this point in the history
Pull KVM updates from Paolo Bonzini:
 "PPC:
   - Better machine check handling for HV KVM
   - Ability to support guests with threads=2, 4 or 8 on POWER9
   - Fix for a race that could cause delayed recognition of signals
   - Fix for a bug where POWER9 guests could sleep with interrupts pending.

  ARM:
   - VCPU request overhaul
   - allow timer and PMU to have their interrupt number selected from userspace
   - workaround for Cavium erratum 30115
   - handling of memory poisonning
   - the usual crop of fixes and cleanups

  s390:
   - initial machine check forwarding
   - migration support for the CMMA page hinting information
   - cleanups and fixes

  x86:
   - nested VMX bugfixes and improvements
   - more reliable NMI window detection on AMD
   - APIC timer optimizations

  Generic:
   - VCPU request overhaul + documentation of common code patterns
   - kvm_stat improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits)
  Update my email address
  kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
  x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12
  kvm: x86: mmu: allow A/D bits to be disabled in an mmu
  x86: kvm: mmu: make spte mmio mask more explicit
  x86: kvm: mmu: dead code thanks to access tracking
  KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving code
  KVM: PPC: Book3S HV: Close race with testing for signals on guest entry
  KVM: PPC: Book3S HV: Simplify dynamic micro-threading code
  KVM: x86: remove ignored type attribute
  KVM: LAPIC: Fix lapic timer injection delay
  KVM: lapic: reorganize restart_apic_timer
  KVM: lapic: reorganize start_hv_timer
  kvm: nVMX: Check memory operand to INVVPID
  KVM: s390: Inject machine check into the nested guest
  KVM: s390: Inject machine check into the guest
  tools/kvm_stat: add new interactive command 'b'
  tools/kvm_stat: add new command line switch '-i'
  tools/kvm_stat: fix error on interactive command 'g'
  KVM: SVM: suppress unnecessary NMI singlestep on GIF=0 and nested exit
  ...
  • Loading branch information
torvalds committed Jul 7, 2017
2 parents e0f25a3 + 1372324 commit c136b84
Show file tree
Hide file tree
Showing 95 changed files with 4,250 additions and 967 deletions.
12 changes: 12 additions & 0 deletions Documentation/admin-guide/kernel-parameters.txt
Original file line number Diff line number Diff line change
Expand Up @@ -1862,6 +1862,18 @@
for all guests.
Default is 1 (enabled) if in 64-bit or 32-bit PAE mode.

kvm-arm.vgic_v3_group0_trap=
[KVM,ARM] Trap guest accesses to GICv3 group-0
system registers

kvm-arm.vgic_v3_group1_trap=
[KVM,ARM] Trap guest accesses to GICv3 group-1
system registers

kvm-arm.vgic_v3_common_trap=
[KVM,ARM] Trap guest accesses to GICv3 common
system registers

kvm-intel.ept= [KVM,Intel] Disable extended page tables
(virtualized MMU) support on capable Intel chips.
Default is 1 (enabled)
Expand Down
1 change: 1 addition & 0 deletions Documentation/arm64/silicon-errata.txt
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,7 @@ stable kernels.
| Cavium | ThunderX GICv3 | #23154 | CAVIUM_ERRATUM_23154 |
| Cavium | ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 |
| Cavium | ThunderX SMMUv2 | #27704 | N/A |
| Cavium | ThunderX Core | #30115 | CAVIUM_ERRATUM_30115 |
| | | | |
| Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 |
| | | | |
Expand Down
172 changes: 172 additions & 0 deletions Documentation/virtual/kvm/api.txt
Original file line number Diff line number Diff line change
Expand Up @@ -3255,6 +3255,141 @@ Otherwise, if the MCE is a corrected error, KVM will just
store it in the corresponding bank (provided this bank is
not holding a previously reported uncorrected error).

4.107 KVM_S390_GET_CMMA_BITS

Capability: KVM_CAP_S390_CMMA_MIGRATION
Architectures: s390
Type: vm ioctl
Parameters: struct kvm_s390_cmma_log (in, out)
Returns: 0 on success, a negative value on error

This ioctl is used to get the values of the CMMA bits on the s390
architecture. It is meant to be used in two scenarios:
- During live migration to save the CMMA values. Live migration needs
to be enabled via the KVM_REQ_START_MIGRATION VM property.
- To non-destructively peek at the CMMA values, with the flag
KVM_S390_CMMA_PEEK set.

The ioctl takes parameters via the kvm_s390_cmma_log struct. The desired
values are written to a buffer whose location is indicated via the "values"
member in the kvm_s390_cmma_log struct. The values in the input struct are
also updated as needed.
Each CMMA value takes up one byte.

struct kvm_s390_cmma_log {
__u64 start_gfn;
__u32 count;
__u32 flags;
union {
__u64 remaining;
__u64 mask;
};
__u64 values;
};

start_gfn is the number of the first guest frame whose CMMA values are
to be retrieved,

count is the length of the buffer in bytes,

values points to the buffer where the result will be written to.

If count is greater than KVM_S390_SKEYS_MAX, then it is considered to be
KVM_S390_SKEYS_MAX. KVM_S390_SKEYS_MAX is re-used for consistency with
other ioctls.

The result is written in the buffer pointed to by the field values, and
the values of the input parameter are updated as follows.

Depending on the flags, different actions are performed. The only
supported flag so far is KVM_S390_CMMA_PEEK.

The default behaviour if KVM_S390_CMMA_PEEK is not set is:
start_gfn will indicate the first page frame whose CMMA bits were dirty.
It is not necessarily the same as the one passed as input, as clean pages
are skipped.

count will indicate the number of bytes actually written in the buffer.
It can (and very often will) be smaller than the input value, since the
buffer is only filled until 16 bytes of clean values are found (which
are then not copied in the buffer). Since a CMMA migration block needs
the base address and the length, for a total of 16 bytes, we will send
back some clean data if there is some dirty data afterwards, as long as
the size of the clean data does not exceed the size of the header. This
allows to minimize the amount of data to be saved or transferred over
the network at the expense of more roundtrips to userspace. The next
invocation of the ioctl will skip over all the clean values, saving
potentially more than just the 16 bytes we found.

If KVM_S390_CMMA_PEEK is set:
the existing storage attributes are read even when not in migration
mode, and no other action is performed;

the output start_gfn will be equal to the input start_gfn,

the output count will be equal to the input count, except if the end of
memory has been reached.

In both cases:
the field "remaining" will indicate the total number of dirty CMMA values
still remaining, or 0 if KVM_S390_CMMA_PEEK is set and migration mode is
not enabled.

mask is unused.

values points to the userspace buffer where the result will be stored.

This ioctl can fail with -ENOMEM if not enough memory can be allocated to
complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
KVM_S390_CMMA_PEEK is not set but migration mode was not enabled, with
-EFAULT if the userspace address is invalid or if no page table is
present for the addresses (e.g. when using hugepages).

4.108 KVM_S390_SET_CMMA_BITS

Capability: KVM_CAP_S390_CMMA_MIGRATION
Architectures: s390
Type: vm ioctl
Parameters: struct kvm_s390_cmma_log (in)
Returns: 0 on success, a negative value on error

This ioctl is used to set the values of the CMMA bits on the s390
architecture. It is meant to be used during live migration to restore
the CMMA values, but there are no restrictions on its use.
The ioctl takes parameters via the kvm_s390_cmma_values struct.
Each CMMA value takes up one byte.

struct kvm_s390_cmma_log {
__u64 start_gfn;
__u32 count;
__u32 flags;
union {
__u64 remaining;
__u64 mask;
};
__u64 values;
};

start_gfn indicates the starting guest frame number,

count indicates how many values are to be considered in the buffer,

flags is not used and must be 0.

mask indicates which PGSTE bits are to be considered.

remaining is not used.

values points to the buffer in userspace where to store the values.

This ioctl can fail with -ENOMEM if not enough memory can be allocated to
complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
the count field is too large (e.g. more than KVM_S390_CMMA_SIZE_MAX) or
if the flags field was not 0, with -EFAULT if the userspace address is
invalid, if invalid pages are written to (e.g. after the end of memory)
or if no page table is present for the addresses (e.g. when using
hugepages).

5. The kvm_run structure
------------------------

Expand Down Expand Up @@ -3996,6 +4131,34 @@ Parameters: none
Allow use of adapter-interruption suppression.
Returns: 0 on success; -EBUSY if a VCPU has already been created.

7.11 KVM_CAP_PPC_SMT

Architectures: ppc
Parameters: vsmt_mode, flags

Enabling this capability on a VM provides userspace with a way to set
the desired virtual SMT mode (i.e. the number of virtual CPUs per
virtual core). The virtual SMT mode, vsmt_mode, must be a power of 2
between 1 and 8. On POWER8, vsmt_mode must also be no greater than
the number of threads per subcore for the host. Currently flags must
be 0. A successful call to enable this capability will result in
vsmt_mode being returned when the KVM_CAP_PPC_SMT capability is
subsequently queried for the VM. This capability is only supported by
HV KVM, and can only be set before any VCPUs have been created.
The KVM_CAP_PPC_SMT_POSSIBLE capability indicates which virtual SMT
modes are available.

7.12 KVM_CAP_PPC_FWNMI

Architectures: ppc
Parameters: none

With this capability a machine check exception in the guest address
space will cause KVM to exit the guest with NMI exit reason. This
enables QEMU to build error log and branch to guest kernel registered
machine check handling routine. Without this capability KVM will
branch to guests' 0x200 interrupt vector.

8. Other capabilities.
----------------------

Expand Down Expand Up @@ -4157,3 +4320,12 @@ Currently the following bits are defined for the device_irq_level bitmap:
Future versions of kvm may implement additional events. These will get
indicated by returning a higher number from KVM_CHECK_EXTENSION and will be
listed above.

8.10 KVM_CAP_PPC_SMT_POSSIBLE

Architectures: ppc

Querying this capability returns a bitmap indicating the possible
virtual SMT modes that can be set using KVM_CAP_PPC_SMT. If bit N
(counting from the right) is set, then a virtual SMT mode of 2^N is
available.
15 changes: 15 additions & 0 deletions Documentation/virtual/kvm/devices/s390_flic.txt
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ FLIC provides support to
- register and modify adapter interrupt sources (KVM_DEV_FLIC_ADAPTER_*)
- modify AIS (adapter-interruption-suppression) mode state (KVM_DEV_FLIC_AISM)
- inject adapter interrupts on a specified adapter (KVM_DEV_FLIC_AIRQ_INJECT)
- get/set all AIS mode states (KVM_DEV_FLIC_AISM_ALL)

Groups:
KVM_DEV_FLIC_ENQUEUE
Expand Down Expand Up @@ -136,6 +137,20 @@ struct kvm_s390_ais_req {
an isc according to the adapter-interruption-suppression mode on condition
that the AIS capability is enabled.

KVM_DEV_FLIC_AISM_ALL
Gets or sets the adapter-interruption-suppression mode for all ISCs. Takes
a kvm_s390_ais_all describing:

struct kvm_s390_ais_all {
__u8 simm; /* Single-Interruption-Mode mask */
__u8 nimm; /* No-Interruption-Mode mask *
};

simm contains Single-Interruption-Mode mask for all ISCs, nimm contains
No-Interruption-Mode mask for all ISCs. Each bit in simm and nimm corresponds
to an ISC (MSB0 bit 0 to ISC 0 and so on). The combination of simm bit and
nimm bit presents AIS mode for a ISC.

Note: The KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR device ioctls executed on
FLIC with an unknown group or attribute gives the error code EINVAL (instead of
ENXIO, as specified in the API documentation). It is not possible to conclude
Expand Down
41 changes: 34 additions & 7 deletions Documentation/virtual/kvm/devices/vcpu.txt
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,9 @@ Parameters: in kvm_device_attr.addr the address for PMU overflow interrupt is a
Returns: -EBUSY: The PMU overflow interrupt is already set
-ENXIO: The overflow interrupt not set when attempting to get it
-ENODEV: PMUv3 not supported
-EINVAL: Invalid PMU overflow interrupt number supplied
-EINVAL: Invalid PMU overflow interrupt number supplied or
trying to set the IRQ number without using an in-kernel
irqchip.

A value describing the PMUv3 (Performance Monitor Unit v3) overflow interrupt
number for this vcpu. This interrupt could be a PPI or SPI, but the interrupt
Expand All @@ -25,11 +27,36 @@ all vcpus, while as an SPI it must be a separate number per vcpu.

1.2 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_INIT
Parameters: no additional parameter in kvm_device_attr.addr
Returns: -ENODEV: PMUv3 not supported
-ENXIO: PMUv3 not properly configured as required prior to calling this
attribute
Returns: -ENODEV: PMUv3 not supported or GIC not initialized
-ENXIO: PMUv3 not properly configured or in-kernel irqchip not
configured as required prior to calling this attribute
-EBUSY: PMUv3 already initialized

Request the initialization of the PMUv3. This must be done after creating the
in-kernel irqchip. Creating a PMU with a userspace irqchip is currently not
supported.
Request the initialization of the PMUv3. If using the PMUv3 with an in-kernel
virtual GIC implementation, this must be done after initializing the in-kernel
irqchip.


2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
Architectures: ARM,ARM64

2.1. ATTRIBUTE: KVM_ARM_VCPU_TIMER_IRQ_VTIMER
2.2. ATTRIBUTE: KVM_ARM_VCPU_TIMER_IRQ_PTIMER
Parameters: in kvm_device_attr.addr the address for the timer interrupt is a
pointer to an int
Returns: -EINVAL: Invalid timer interrupt number
-EBUSY: One or more VCPUs has already run

A value describing the architected timer interrupt number when connected to an
in-kernel virtual GIC. These must be a PPI (16 <= intid < 32). Setting the
attribute overrides the default values (see below).

KVM_ARM_VCPU_TIMER_IRQ_VTIMER: The EL1 virtual timer intid (default: 27)
KVM_ARM_VCPU_TIMER_IRQ_PTIMER: The EL1 physical timer intid (default: 30)

Setting the same PPI for different timers will prevent the VCPUs from running.
Setting the interrupt number on a VCPU configures all VCPUs created at that
time to use the number provided for a given timer, overwriting any previously
configured values on other VCPUs. Userspace should configure the interrupt
numbers on at least one VCPU after creating all VCPUs and before running any
VCPUs.
33 changes: 33 additions & 0 deletions Documentation/virtual/kvm/devices/vm.txt
Original file line number Diff line number Diff line change
Expand Up @@ -222,3 +222,36 @@ Allows user space to disable dea key wrapping, clearing the wrapping key.

Parameters: none
Returns: 0

5. GROUP: KVM_S390_VM_MIGRATION
Architectures: s390

5.1. ATTRIBUTE: KVM_S390_VM_MIGRATION_STOP (w/o)

Allows userspace to stop migration mode, needed for PGSTE migration.
Setting this attribute when migration mode is not active will have no
effects.

Parameters: none
Returns: 0

5.2. ATTRIBUTE: KVM_S390_VM_MIGRATION_START (w/o)

Allows userspace to start migration mode, needed for PGSTE migration.
Setting this attribute when migration mode is already active will have
no effects.

Parameters: none
Returns: -ENOMEM if there is not enough free memory to start migration mode
-EINVAL if the state of the VM is invalid (e.g. no memory defined)
0 in case of success.

5.3. ATTRIBUTE: KVM_S390_VM_MIGRATION_STATUS (r/o)

Allows userspace to query the status of migration mode.

Parameters: address of a buffer in user space to store the data (u64) to;
the data itself is either 0 if migration mode is disabled or 1
if it is enabled
Returns: -EFAULT if the given address is not accessible from kernel space
0 in case of success.
4 changes: 4 additions & 0 deletions Documentation/virtual/kvm/mmu.txt
Original file line number Diff line number Diff line change
Expand Up @@ -179,6 +179,10 @@ Shadow pages contain the following information:
shadow page; it is also used to go back from a struct kvm_mmu_page
to a memslot, through the kvm_memslots_for_spte_role macro and
__gfn_to_memslot.
role.ad_disabled:
Is 1 if the MMU instance cannot use A/D bits. EPT did not have A/D
bits before Haswell; shadow EPT page tables also cannot use A/D bits
if the L1 hypervisor does not enable them.
gfn:
Either the guest page table containing the translations shadowed by this
page, or the base page frame for linear translations. See role.direct.
Expand Down
Loading

0 comments on commit c136b84

Please sign in to comment.