Skip to content

Commit

Permalink
KVM: x86: fix tsc catchup issue with tsc scaling
Browse files Browse the repository at this point in the history
To fix a problem related to different resolution of TSC and system clock,
the offset in TSC units is approximated by

delta = vcpu->hv_clock.tsc_timestamp 	- 	vcpu->last_guest_tsc

(Guest TSC value at 			(Guest TSC value at last VM-exit)
the last kvm_guest_time_update
call)

Delta is then later scaled using mult,shift pair found in hv_clock
structure (which is correct against tsc_timestamp in that
structure).

However, if a frequency change is performed between these two points,
this delta is measured using different TSC frequencies, but scaled using
mult,shift pair for one frequency only.

The end result is an incorrect delta.

The bug which this code works around is not the only cause for
clock backwards events. The global accumulator is still
necessary, so remove the max_kernel_ns fix and rely on the
global accumulator for no clock backwards events.

Signed-off-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
  • Loading branch information
matosatti authored and bonzini committed Jan 15, 2014
1 parent 9ed96e8 commit f25e656
Showing 1 changed file with 1 addition and 40 deletions.
41 changes: 1 addition & 40 deletions arch/x86/kvm/x86.c
Original file line number Diff line number Diff line change
Expand Up @@ -1487,7 +1487,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
unsigned long flags, this_tsc_khz;
struct kvm_vcpu_arch *vcpu = &v->arch;
struct kvm_arch *ka = &v->kvm->arch;
s64 kernel_ns, max_kernel_ns;
s64 kernel_ns;
u64 tsc_timestamp, host_tsc;
struct pvclock_vcpu_time_info guest_hv_clock;
u8 pvclock_flags;
Expand Down Expand Up @@ -1546,52 +1546,13 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
if (!vcpu->pv_time_enabled)
return 0;

/*
* Time as measured by the TSC may go backwards when resetting the base
* tsc_timestamp. The reason for this is that the TSC resolution is
* higher than the resolution of the other clock scales. Thus, many
* possible measurments of the TSC correspond to one measurement of any
* other clock, and so a spread of values is possible. This is not a
* problem for the computation of the nanosecond clock; with TSC rates
* around 1GHZ, there can only be a few cycles which correspond to one
* nanosecond value, and any path through this code will inevitably
* take longer than that. However, with the kernel_ns value itself,
* the precision may be much lower, down to HZ granularity. If the
* first sampling of TSC against kernel_ns ends in the low part of the
* range, and the second in the high end of the range, we can get:
*
* (TSC - offset_low) * S + kns_old > (TSC - offset_high) * S + kns_new
*
* As the sampling errors potentially range in the thousands of cycles,
* it is possible such a time value has already been observed by the
* guest. To protect against this, we must compute the system time as
* observed by the guest and ensure the new system time is greater.
*/
max_kernel_ns = 0;
if (vcpu->hv_clock.tsc_timestamp) {
max_kernel_ns = vcpu->last_guest_tsc -
vcpu->hv_clock.tsc_timestamp;
max_kernel_ns = pvclock_scale_delta(max_kernel_ns,
vcpu->hv_clock.tsc_to_system_mul,
vcpu->hv_clock.tsc_shift);
max_kernel_ns += vcpu->last_kernel_ns;
}

if (unlikely(vcpu->hw_tsc_khz != this_tsc_khz)) {
kvm_get_time_scale(NSEC_PER_SEC / 1000, this_tsc_khz,
&vcpu->hv_clock.tsc_shift,
&vcpu->hv_clock.tsc_to_system_mul);
vcpu->hw_tsc_khz = this_tsc_khz;
}

/* with a master <monotonic time, tsc value> tuple,
* pvclock clock reads always increase at the (scaled) rate
* of guest TSC - no need to deal with sampling errors.
*/
if (!use_master_clock) {
if (max_kernel_ns > kernel_ns)
kernel_ns = max_kernel_ns;
}
/* With all the info we got, fill in the values */
vcpu->hv_clock.tsc_timestamp = tsc_timestamp;
vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
Expand Down

0 comments on commit f25e656

Please sign in to comment.