Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
…/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 6.2

- Enable the per-vcpu dirty-ring tracking mechanism, together with an
  option to keep the good old dirty log around for pages that are
  dirtied by something other than a vcpu.

- Switch to the relaxed parallel fault handling, using RCU to delay
  page table reclaim and giving better performance under load.

- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
  option, which multi-process VMMs such as crosvm rely on.

- Merge the pKVM shadow vcpu state tracking that allows the hypervisor
  to have its own view of a vcpu, keeping that state private.

- Add support for the PMUv3p5 architecture revision, bringing support
  for 64bit counters on systems that support it, and fix the
  no-quite-compliant CHAIN-ed counter support for the machines that
  actually exist out there.

- Fix a handful of minor issues around 52bit VA/PA support (64kB pages
  only) as a prefix of the oncoming support for 4kB and 16kB pages.

- Add/Enable/Fix a bunch of selftests covering memslots, breakpoints,
  stage-2 faults and access tracking. You name it, we got it, we
  probably broke it.

- Pick a small set of documentation and spelling fixes, because no
  good merge window would be complete without those.

As a side effect, this tag also drags:

- The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring
  series

- A shared branch with the arm64 tree that repaints all the system
  registers to match the ARM ARM's naming, and resulting in
  interesting conflicts
  • Loading branch information
bonzini committed Dec 9, 2022
2 parents 1e79a9e + 753d734 commit eb56189
Show file tree
Hide file tree
Showing 91 changed files with 6,077 additions and 1,770 deletions.
41 changes: 31 additions & 10 deletions Documentation/virt/kvm/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7418,8 +7418,9 @@ hibernation of the host; however the VMM needs to manually save/restore the
tags as appropriate if the VM is migrated.

When this capability is enabled all memory in memslots must be mapped as
not-shareable (no MAP_SHARED), attempts to create a memslot with a
MAP_SHARED mmap will result in an -EINVAL return.
``MAP_ANONYMOUS`` or with a RAM-based file mapping (``tmpfs``, ``memfd``),
attempts to create a memslot with an invalid mmap will result in an
-EINVAL return.

When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
perform a bulk copy of tags to/from the guest.
Expand Down Expand Up @@ -7954,7 +7955,7 @@ regardless of what has actually been exposed through the CPUID leaf.
8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
----------------------------------------------------------

:Architectures: x86
:Architectures: x86, arm64
:Parameters: args[0] - size of the dirty log ring

KVM is capable of tracking dirty memory using ring buffers that are
Expand Down Expand Up @@ -8036,13 +8037,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl). To achieve that, one
needs to kick the vcpu out of KVM_RUN using a signal. The resulting
vmexit ensures that all dirty GFNs are flushed to the dirty rings.

NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG. After enabling
KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
machine will switch to ring-buffer dirty page tracking and further
KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.

NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
should be exposed by weakly ordered architecture, in order to indicate
the additional memory ordering requirements imposed on userspace when
Expand All @@ -8051,6 +8045,33 @@ Architecture with TSO-like ordering (such as x86) are allowed to
expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
to userspace.

After enabling the dirty rings, the userspace needs to detect the
capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the
ring structures can be backed by per-slot bitmaps. With this capability
advertised, it means the architecture can dirty guest pages without
vcpu/ring context, so that some of the dirty information will still be
maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
can't be enabled if the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL
hasn't been enabled, or any memslot has been existing.

Note that the bitmap here is only a backup of the ring structure. The
use of the ring and bitmap combination is only beneficial if there is
only a very small amount of memory that is dirtied out of vcpu/ring
context. Otherwise, the stand-alone per-slot bitmap mechanism needs to
be considered.

To collect dirty bits in the backup bitmap, userspace can use the same
KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG isn't needed as long as all
the generation of the dirty bits is done in a single pass. Collecting
the dirty bitmap should be the very last thing that the VMM does before
considering the state as complete. VMM needs to ensure that the dirty
state is final and avoid missing dirty pages from another ioctl ordered
after the bitmap collection.

NOTE: One example of using the backup bitmap is saving arm64 vgic/its
tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
KVM device "kvm-arm-vgic-its" when dirty ring is enabled.

8.30 KVM_CAP_XEN_HVM
--------------------

Expand Down
14 changes: 8 additions & 6 deletions Documentation/virt/kvm/arm/pvtime.rst
Original file line number Diff line number Diff line change
Expand Up @@ -23,21 +23,23 @@ the PV_TIME_FEATURES hypercall should be probed using the SMCCC 1.1
ARCH_FEATURES mechanism before calling it.

PV_TIME_FEATURES
============= ======== ==========

============= ======== =================================================
Function ID: (uint32) 0xC5000020
PV_call_id: (uint32) The function to query for support.
Currently only PV_TIME_ST is supported.
Return value: (int64) NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
PV-time feature is supported by the hypervisor.
============= ======== ==========
============= ======== =================================================

PV_TIME_ST
============= ======== ==========

============= ======== ==============================================
Function ID: (uint32) 0xC5000021
Return value: (int64) IPA of the stolen time data structure for this
VCPU. On failure:
NOT_SUPPORTED (-1)
============= ======== ==========
============= ======== ==============================================

The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
with inner and outer write back caching attributes, in the inner shareable
Expand Down Expand Up @@ -76,5 +78,5 @@ It is advisable that one or more 64k pages are set aside for the purpose of
these structures and not used for other purposes, this enables the guest to map
the region using 64k pages and avoids conflicting attributes with other memory.

For the user space interface see Documentation/virt/kvm/devices/vcpu.rst
section "3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL".
For the user space interface see
:ref:`Documentation/virt/kvm/devices/vcpu.rst <kvm_arm_vcpu_pvtime_ctrl>`.
5 changes: 4 additions & 1 deletion Documentation/virt/kvm/devices/arm-vgic-its.rst
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,10 @@ KVM_DEV_ARM_VGIC_GRP_CTRL

KVM_DEV_ARM_ITS_SAVE_TABLES
save the ITS table data into guest RAM, at the location provisioned
by the guest in corresponding registers/table entries.
by the guest in corresponding registers/table entries. Should userspace
require a form of dirty tracking to identify which pages are modified
by the saving process, it should use a bitmap even if using another
mechanism to track the memory dirtied by the vCPUs.

The layout of the tables in guest memory defines an ABI. The entries
are laid out in little endian format as described in the last paragraph.
Expand Down
2 changes: 2 additions & 0 deletions Documentation/virt/kvm/devices/vcpu.rst
Original file line number Diff line number Diff line change
Expand Up @@ -171,6 +171,8 @@ configured values on other VCPUs. Userspace should configure the interrupt
numbers on at least one VCPU after creating all VCPUs and before running any
VCPUs.

.. _kvm_arm_vcpu_pvtime_ctrl:

3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL
==================================

Expand Down
1 change: 1 addition & 0 deletions arch/arm64/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -1965,6 +1965,7 @@ config ARM64_MTE
depends on ARM64_PAN
select ARCH_HAS_SUBPAGE_FAULTS
select ARCH_USES_HIGH_VMA_FLAGS
select ARCH_USES_PG_ARCH_X
help
Memory Tagging (part of the ARMv8.5 Extensions) provides
architectural support for run-time, always-on detection of
Expand Down
8 changes: 6 additions & 2 deletions arch/arm64/include/asm/kvm_arm.h
Original file line number Diff line number Diff line change
Expand Up @@ -135,7 +135,7 @@
* 40 bits wide (T0SZ = 24). Systems with a PARange smaller than 40 bits are
* not known to exist and will break with this configuration.
*
* The VTCR_EL2 is configured per VM and is initialised in kvm_arm_setup_stage2().
* The VTCR_EL2 is configured per VM and is initialised in kvm_init_stage2_mmu.
*
* Note that when using 4K pages, we concatenate two first level page tables
* together. With 16K pages, we concatenate 16 first level page tables.
Expand Down Expand Up @@ -340,9 +340,13 @@
* We have
* PAR [PA_Shift - 1 : 12] = PA [PA_Shift - 1 : 12]
* HPFAR [PA_Shift - 9 : 4] = FIPA [PA_Shift - 1 : 12]
*
* Always assume 52 bit PA since at this point, we don't know how many PA bits
* the page table has been set up for. This should be safe since unused address
* bits in PAR are res0.
*/
#define PAR_TO_HPFAR(par) \
(((par) & GENMASK_ULL(PHYS_MASK_SHIFT - 1, 12)) >> 8)
(((par) & GENMASK_ULL(52 - 1, 12)) >> 8)

#define ECN(x) { ESR_ELx_EC_##x, #x }

Expand Down
7 changes: 5 additions & 2 deletions arch/arm64/include/asm/kvm_asm.h
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,9 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_init_traps,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
__KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
};

#define DECLARE_KVM_VHE_SYM(sym) extern char sym[]
Expand Down Expand Up @@ -106,7 +109,7 @@ enum __kvm_host_smccc_func {
#define per_cpu_ptr_nvhe_sym(sym, cpu) \
({ \
unsigned long base, off; \
base = kvm_arm_hyp_percpu_base[cpu]; \
base = kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu]; \
off = (unsigned long)&CHOOSE_NVHE_SYM(sym) - \
(unsigned long)&CHOOSE_NVHE_SYM(__per_cpu_start); \
base ? (typeof(CHOOSE_NVHE_SYM(sym))*)(base + off) : NULL; \
Expand Down Expand Up @@ -211,7 +214,7 @@ DECLARE_KVM_HYP_SYM(__kvm_hyp_vector);
#define __kvm_hyp_init CHOOSE_NVHE_SYM(__kvm_hyp_init)
#define __kvm_hyp_vector CHOOSE_HYP_SYM(__kvm_hyp_vector)

extern unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
extern unsigned long kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[];
DECLARE_KVM_NVHE_SYM(__per_cpu_start);
DECLARE_KVM_NVHE_SYM(__per_cpu_end);

Expand Down
76 changes: 74 additions & 2 deletions arch/arm64/include/asm/kvm_host.h
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,63 @@ u32 __attribute_const__ kvm_target_cpu(void);
int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);

struct kvm_hyp_memcache {
phys_addr_t head;
unsigned long nr_pages;
};

static inline void push_hyp_memcache(struct kvm_hyp_memcache *mc,
phys_addr_t *p,
phys_addr_t (*to_pa)(void *virt))
{
*p = mc->head;
mc->head = to_pa(p);
mc->nr_pages++;
}

static inline void *pop_hyp_memcache(struct kvm_hyp_memcache *mc,
void *(*to_va)(phys_addr_t phys))
{
phys_addr_t *p = to_va(mc->head);

if (!mc->nr_pages)
return NULL;

mc->head = *p;
mc->nr_pages--;

return p;
}

static inline int __topup_hyp_memcache(struct kvm_hyp_memcache *mc,
unsigned long min_pages,
void *(*alloc_fn)(void *arg),
phys_addr_t (*to_pa)(void *virt),
void *arg)
{
while (mc->nr_pages < min_pages) {
phys_addr_t *p = alloc_fn(arg);

if (!p)
return -ENOMEM;
push_hyp_memcache(mc, p, to_pa);
}

return 0;
}

static inline void __free_hyp_memcache(struct kvm_hyp_memcache *mc,
void (*free_fn)(void *virt, void *arg),
void *(*to_va)(phys_addr_t phys),
void *arg)
{
while (mc->nr_pages)
free_fn(pop_hyp_memcache(mc, to_va), arg);
}

void free_hyp_memcache(struct kvm_hyp_memcache *mc);
int topup_hyp_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages);

struct kvm_vmid {
atomic64_t id;
};
Expand Down Expand Up @@ -115,6 +172,13 @@ struct kvm_smccc_features {
unsigned long vendor_hyp_bmap;
};

typedef unsigned int pkvm_handle_t;

struct kvm_protected_vm {
pkvm_handle_t handle;
struct kvm_hyp_memcache teardown_mc;
};

struct kvm_arch {
struct kvm_s2_mmu mmu;

Expand Down Expand Up @@ -163,9 +227,19 @@ struct kvm_arch {

u8 pfr0_csv2;
u8 pfr0_csv3;
struct {
u8 imp:4;
u8 unimp:4;
} dfr0_pmuver;

/* Hypercall features firmware registers' descriptor */
struct kvm_smccc_features smccc_feat;

/*
* For an untrusted host VM, 'pkvm.handle' is used to lookup
* the associated pKVM instance in the hypervisor.
*/
struct kvm_protected_vm pkvm;
};

struct kvm_vcpu_fault_info {
Expand Down Expand Up @@ -915,8 +989,6 @@ int kvm_set_ipa_limit(void);
#define __KVM_HAVE_ARCH_VM_ALLOC
struct kvm *kvm_arch_alloc_vm(void);

int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type);

static inline bool kvm_vm_is_protected(struct kvm *kvm)
{
return false;
Expand Down
3 changes: 3 additions & 0 deletions arch/arm64/include/asm/kvm_hyp.h
Original file line number Diff line number Diff line change
Expand Up @@ -123,4 +123,7 @@ extern u64 kvm_nvhe_sym(id_aa64mmfr0_el1_sys_val);
extern u64 kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val);
extern u64 kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val);

extern unsigned long kvm_nvhe_sym(__icache_flags);
extern unsigned int kvm_nvhe_sym(kvm_arm_vmid_bits);

#endif /* __ARM64_KVM_HYP_H__ */
2 changes: 1 addition & 1 deletion arch/arm64/include/asm/kvm_mmu.h
Original file line number Diff line number Diff line change
Expand Up @@ -166,7 +166,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
void free_hyp_pgds(void);

void stage2_unmap_vm(struct kvm *kvm);
int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type);
void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
phys_addr_t pa, unsigned long size, bool writable);
Expand Down
Loading

0 comments on commit eb56189

Please sign in to comment.