OVMBA-2015-0092

OVMBA-2015-0092 - xen bug fix update

Type:BUG
Impact:NA
Release Date:2015-07-19

Description



[4.3.0-55.el6.47.3]
- x86: vcpu_destroy_pagetables() must not return -EINTR
Otherwise it has the side effect that domain_relinquish_resources
will stop and return -EINTR to user-space, which is not equipped to
deal with that error code, or that vcpu_reset will ignore it and
convert the error to -ENOMEM.
The preemption mechanism we have for domain destruction is to return
-EAGAIN (and then user-space calls the hypercall again) and as such we need
to catch the case of:
domain_relinquish_resources
->vcpu_destroy_pagetables
-> put_page_and_type_preemptible
-> __put_page_type
returns -EINTR
and convert it to the proper type. For:
XEN_DOMCTL_setvcpucontext
-> vcpu_reset
-> vcpu_destroy_pagetables
we need to return -ERESTART otherwise we end up returning -ENOMEM.
There are also other callers of vcpu_destroy_pagetables (via
arch_vcpu_reset / vcpu_reset):
- hvm_s3_suspend (asserts on any return code),
- vlapic_init_sipi_one (asserts on any return code).
Signed-off-by: Konrad Rzeszutek Wilk
Signed-off-by: Jan Beulich
Acked-by: Chuck Anderson [bug 21133414]
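
For readers unfamiliar with the error-code plumbing above, a minimal standalone C model may help (the _model helpers and the ERESTART_INTERNAL constant are illustrative stand-ins, not the actual Xen functions): the preemptible teardown reports -EINTR, the vcpu_destroy_pagetables layer translates it to the internal restart code, and the relinquish path maps that to -EAGAIN so the toolstack simply retries the hypercall.

    /*
     * Minimal standalone model (not the actual Xen code) of the error-code
     * conversion described above.
     */
    #include <errno.h>
    #include <stdio.h>

    #define ERESTART_INTERNAL 10000 /* stand-in for Xen's internal -ERESTART */

    /* Pretend worker: returns -EINTR when it was interrupted mid-way. */
    static int put_page_and_type_preemptible_model(int interrupted)
    {
        return interrupted ? -EINTR : 0;
    }

    /* Model of vcpu_destroy_pagetables(): never let -EINTR escape. */
    static int vcpu_destroy_pagetables_model(int interrupted)
    {
        int rc = put_page_and_type_preemptible_model(interrupted);

        if (rc == -EINTR)            /* translate to the restart indication */
            rc = -ERESTART_INTERNAL; /* that the callers are prepared for   */
        return rc;
    }

    int main(void)
    {
        /* The relinquish path maps the restart code to -EAGAIN for user space. */
        int rc = vcpu_destroy_pagetables_model(1);

        if (rc == -ERESTART_INTERNAL)
            rc = -EAGAIN;            /* toolstack retries the hypercall */
        printf("returned to toolstack: %d (-EAGAIN is %d)\n", rc, -EAGAIN);
        return 0;
    }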

[4.3.0-55.el6.47.2]
- mm: Make scrubbing a low-priority task
An idle processor will attempt to scrub pages left over by a previously
exited guest. The processor takes global heap_lock in scrub_free_pages(),
manipulates pages on the heap lists and releases the lock before performing
the actual scrubbing in __scrub_free_pages().
It has been observed that on some systems, even though scrubbing itself
is done with the lock not held, other unrelated heap users are unable
to take the (now free) lock. We theorize that massive scrubbing locks out
the bus (or some other HW resources), preventing lock requests from reaching
the scrubbing node.
This patch tries to alleviate this problem by having the scrubber monitor
whether there are other waiters for the heap lock and, if such waiters
exist, stop scrubbing.
To achieve this, we make two changes to existing code:
1. Parallelize the heap lock by breaking it to per-node locks
2. Create an atomic per-node counter array. Before a CPU on a particular
node attempts to acquire the (now per-node) lock it increments the counter.
The scrubbing processor periodically checks this counter and, if it is
non-zero, stops scrubbing.
A few notes:
1. Until now, total_avail_pages and midsize_alloc_zone_pages updates have been
performed under global heap_lock which was also used to control access to heap.
Since now those accesses are guarded by per-node locks, we introduce heap_lock_global.
Note that this is really only to protect readers of these variables from reading
inconsistent values (such as if another CPU is in the middle of updating them).
The values themselves are somewhat 'unsynchronized' from actual heap state. We
try to be conservative and decrement them before pages are taken from the heap
and increment them after they are placed there.
2. Similarly, page_broken/offlined_list are no longer under heap_lock.
pglist_lock is added to synchronize access to those lists.
3. d->last_alloc_node used to be updated under heap_lock. It was read, however,
without holding this lock so it seems that lockless updates will not make the
situation any worse (and since these updates are simple writes, as opposed to
some sort of RMW, we shouldn't need to convert it to an atomic).
Signed-off-by: Boris Ostrovsky
Reviewed-by: Konrad Rzeszutek Wilk
Acked-by: Chuck Anderson [bug 21133543]
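
A rough standalone sketch of the waiter-counter idea follows; the names with a _model suffix and the fixed node count are assumptions for illustration, not the patched Xen code. Allocators bump the per-node counter before contending for the per-node heap lock, and the idle-loop scrubber polls it and backs off as soon as it is non-zero.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_NODES 4

    static atomic_int node_lock_waiters[MAX_NODES]; /* per-node waiter count */

    /* Allocator side: announce interest before contending for the node lock. */
    static void alloc_heap_pages_model(int node)
    {
        atomic_fetch_add(&node_lock_waiters[node], 1);
        /* ... acquire per-node heap lock, take pages, release ... */
        atomic_fetch_sub(&node_lock_waiters[node], 1);
    }

    /* Idle-loop scrubber: do a chunk of work, then yield if anyone is waiting. */
    static void scrub_free_pages_model(int node, int chunks)
    {
        for (int i = 0; i < chunks; i++) {
            /* ... scrub one batch of dirty pages without holding the lock ... */
            if (atomic_load(&node_lock_waiters[node]) != 0) {
                printf("node %d: waiter detected, stopping scrub early\n", node);
                return;
            }
        }
        printf("node %d: scrubbed all %d chunks\n", node, chunks);
    }

    int main(void)
    {
        alloc_heap_pages_model(0);                  /* normally runs on another CPU */
        scrub_free_pages_model(0, 8);               /* no contention: scrubs everything */
        atomic_fetch_add(&node_lock_waiters[1], 1); /* simulate a concurrent waiter */
        scrub_free_pages_model(1, 8);               /* backs off immediately */
        return 0;
    }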

[4.3.0-55.el6.47.1]
- IOMMU: make page table deallocation preemptible
Backport of cedfdd43a97.
We are spending lots of time flushing CPU cache, one PTE at a time, to
make sure that IOMMU (which may not be able to watch coherence traffic
on the bus) doesn't load stale PTE from memory.
For guests with lots of memory (say, >512GB) this may take as much as
half a minute or more, and as a result (because this is a non-preemptable
operation) things start to break down.
Below is the original commit message:
This too can take an arbitrary amount of time.
In fact, the bulk of the work is being moved to a tasklet, as handling
the necessary preemption logic in line seems close to impossible given
that the teardown may also be invoked on error paths.
Signed-off-by: Jan Beulich
Reviewed-by: Andrew Cooper
Acked-by: Xiantao Zhang
Signed-off-by: Boris Ostrovsky
Acked-by: Chuck Anderson [bug 21133626]
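
The general shape of the change - queue the teardown and let a tasklet chip away at it in bounded batches - can be sketched as a tiny standalone model; the names below are assumed for illustration and do not mirror the real IOMMU code.

    #include <stdbool.h>
    #include <stdio.h>

    struct iommu_teardown {
        int pages_left;           /* page-table pages still to free */
        bool queued;
    };

    /* Queue the work instead of freeing everything in line. */
    static void iommu_free_pgtables_defer(struct iommu_teardown *t, int pages)
    {
        t->pages_left = pages;
        t->queued = true;         /* real code would schedule a tasklet here */
    }

    /* Tasklet body: free a bounded batch per invocation, then reschedule. */
    static void iommu_free_pgtables_tasklet(struct iommu_teardown *t)
    {
        int batch = 64;

        while (t->pages_left > 0 && batch-- > 0)
            t->pages_left--;      /* real code: free one page-table page */
        if (t->pages_left == 0)
            t->queued = false;    /* done, do not reschedule */
    }

    int main(void)
    {
        struct iommu_teardown t = { 0 };

        iommu_free_pgtables_defer(&t, 200);
        while (t.queued)          /* stands in for repeated softirq runs */
            iommu_free_pgtables_tasklet(&t);
        printf("teardown complete, pages left: %d\n", t.pages_left);
        return 0;
    }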

[4.3.0-55.el6.47]
- Use AUTO_PHP_SLOT as virtual devfn for rebooted pvhvm guest
Xend tries to get the vdevfn from a dictionary and uses it as the vdevfn on reboot.
If, on first boot, the emulated NIC is unplugged before the passed-through device is
hot-plugged, and on reboot the order is reversed, there will be a vdevfn conflict;
qemu.log shows 'hot add pci devfn -2 exceed.'
This patch can't be upstreamed as upstream has dropped 'xend' completely.
Signed-off-by: Zhenzhong Duan
Signed-off-by: Chuang Cao
Signed-off-by: Wengang Wang
Acked-by: Konrad Rzeszutek Wilk [bug 20781679]

[4.3.0-55.el6.46]
- xend: disable vbd discard feature for file type backend
Signed-off-by: Zhigang Wang
Reviewed-by: Konrad Rzeszutek Wilk [bug 20888341] [bug 20905655]

[4.3.0-55.el6.39]
- xend: fix python fork and logging issue consuming 100% cpu
It is caused by a Python internal bug: http://bugs.python.org/issue6721 .
When xend forks a subprocess and then calls a logging function, a deadlock occurs.
Because Python has no fix yet, remove the logging.debug() call in
XendBootloader.py to work around it.
Signed-off-by: Joe Jin
Reviewed-by: Zhigang Wang [bug 20752002]

[4.3.0-55.el6.38]
- Xen: Fix migration issue from ovm3.2.8 to ovm3.3.x
This patch is a newer fix for pvhvm migration failure from
Xen4.1(ovm3.2.x) to Xen4.3(ovm3.3.x), and this issue exists in
upstream xen too. The original fix causes an issue for released ovm
versions if the user wants to do live migration with no downtime, since
that fix requires rebooting the migration source server too.
This patch keeps the xenstore eventchannel allocation mechanism of
Xen4.3 the same as the one in Xen4.1, so migration works from
Xen4.1 to later Xen with no need to reboot the migration source server.
The patch that causes this migration issue is,
http://lists.xen.org/archives/html/xen-devel/2011-11/msg01046.html
Signed-off-by: Annie Li
Acked-by: Adnan Misherfi [bug 19517860]

[4.3.0-55.el6.37]
- switch internal hypercall restart indication from -EAGAIN to -ERESTART

-EAGAIN being a return value we want to return to the actual caller in
a couple of cases makes this unsuitable for restart indication, and x86
already developed two cases where -EAGAIN could not be returned as
intended due to this (which is being fixed here at once).

Signed-off-by: Jan Beulich
Acked-by: Ian Campbell
Acked-by: Aravind Gopalakrishnan
Reviewed-by: Tim Deegan
(cherry-pick from f5118cae0a7f7748c6f08f557e2cfbbae686434a)
Signed-off-by: Konrad Rzeszutek Wilk
Conflicts:
A LOT
[There are a lot of changes for this commit. We only care about the
one in the domain destruction path. We need the value -EAGAIN to be passed
to the toolstack so that it will retry the destruction. Any other
value (-ERESTART) would stop it - so unlike some of the other
backports we only convert -ERESTART to -EAGAIN here].
Acked-by: Chuck Anderson
Reviewed-by: John Haxby [bug 20664678]
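
To show why only -EAGAIN may reach the toolstack on this path, here is a simplified userspace-side sketch (the destroy_hypercall_model stub is an assumption standing in for the real destroy hypercall): the toolstack keeps retrying while it sees -EAGAIN, so any other leaked error code would abort the teardown prematurely.

    #include <errno.h>
    #include <stdio.h>

    /* Stand-in for the destroy hypercall: report -EAGAIN a few times, then 0. */
    static int destroy_hypercall_model(void)
    {
        static int remaining_chunks = 3;

        if (remaining_chunks-- > 0) {
            errno = EAGAIN;       /* more work left, please call again */
            return -1;
        }
        return 0;                 /* domain fully relinquished */
    }

    int main(void)
    {
        int rc, calls = 0;

        do {                      /* the toolstack-style retry loop */
            rc = destroy_hypercall_model();
            calls++;
        } while (rc < 0 && errno == EAGAIN);

        printf("destroy finished after %d call(s), rc=%d\n", calls, rc);
        return rc < 0 ? 1 : 0;
    }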

[4.3.0-55.el6.36]
- rc/xendomains: 'stop' - also take care of stuck guests.
When we are done shutting down the guests ('xm shutdown --all'), they
are at that point not running at all. They might still have
QEMU or backend drivers set up due to the asynchronous nature
of the 'shutdown' process. As such, doing a 'destroy' on all
the guests assures us that the backend drivers and QEMU
are indeed stopped.
The mechanism by which 'shutdown' works is quite complex. There
are three actors at play:
a) xm client (Which connects to the XML RPC),
b) Xend Xenstore watch thread,
c) XML RPC server thread
The way shutdown starts is (actors: xm client | XML RPC | watch thread):
xm client (shutdown.py):
  - server....shutdown --[XML RPC]--> XenDomainInfo:shutdown
      sets 'control/shutdown'
      calls xc.domain_shutdown
      returns
  - loops calling:
    domains_with_state --[XML RPC]--> XendDomain:list_names
      (gets the active and inactive list)
watch thread:
  watchMain
    _on_domains_changed
      - _refresh
        -> _refreshTxn
          -> update [sets to DOM_STATE_SHUTDOWN]
          -> refreshShutdown [spawns a new thread calling _maybeRestart]
[_maybeRestart thread]:
  destroy [sets it to DOM_STATE_HALTED]
    - cleanupDomain
      - _releaseDevices
      - ..
Four threads total.
There is a race between 'watchMain' being executed and 'domains_with_state'
calling 'list_names'. Guests that are in DOM_STATE_UNKNOWN or DOM_STATE_PAUSED
might not be updated to DOM_STATE_SHUTDOWN, as list_names can be called
_before_ watchMain triggers. There is a lock acquisition to call 'refresh'
in list_names - but if it fails, it will just use the stale list.
As such the process works great for guests that are in STATE_SHUTDOWN,
STATE_HALT, or STATE_RUNNING - which 'domains_with_state' will present
to the shutdown process.
For the other states (the more troublesome ones) we might have guests
still lying around.
As such this patch calls 'xm destroy' on all those remaining guests
to do cleanup.
Signed-off-by: Konrad Rzeszutek Wilk
Acked-by: Chuck Anderson
Reviewed-by: John Haxby [bug 20663386]

[4.3.0-55.el6.35]
- xend: Fix race between shutdown and cleanup.
When we invoke 'xm shutdown --wait --all' we will exit the moment
the guest has stopped executing. That is when xcinfo returns
shutdown=1. However that does not mean that all the infrastructure
around the guest has been torn down - QEMU can be still running,
Netback and Blkback as well. In the past the time between
the shutdown and qemu being disposed of was quick - however
the race was still present there.
With our usage of PCIe passthrough we MUST unbind those devices
from a guest before we can continue on with the reboot of
the system. That is due to the complex interaction the SR-IOV
devices have with VFs and PFs - as you cannot unload the PF driver
before the VF drivers have been unbound from the guest.
If you try to reboot the machine at this point the PF driver
will not unload.
The VF drivers are bound to Xen pciback - and they are unbound
when QEMU is stopped and XenStore keys are torn down - which
is done _after_ the 'shutdown' xcinfo is set (in the cleanup
stage). Worse, the Xen blkback is still active - which means
we cannot unmount the storage until said cleanup has finished.
But as mentioned - 'xm shutdown --wait --all' would happily
exit before the cleanup finished and the shutdown (or reboot)
of the initial domain would continue on. It would eventually
get wedged when trying to unmount the storage which still
had a refcount from Xen block driver - which was not cleaned up
as Xend was killed earlier.
This patch solves this by delaying 'xm shutdown --wait --all'
to wait until the guest has transitioned from RUNNING ->
SHUTDOWN -> HALTED stage. The SHUTDOWN means it has ceased
to execute. The HALTED is that the cleanup is being performed.
We will cycle through all of the guests in that state until
they have moved out of those states (removed completely from
the system).
Signed-off-by: Konrad Rzeszutek Wilk
Acked-by: Chuck Anderson
Reviewed-by: John Haxby [bug 20659992]

[4.3.0-55.el6.22]
- hvmloader: don't use AML operations on 64-bit fields
WinXP and Win2K3, while having no problem with the QWordMemory resource
(there was another one there before), don't like operations on 64-bit
fields. Split the fields d0688669 ('hvmloader: also cover PCI MMIO
ranges above 4G with UC MTRR ranges') added to 32-bit ones, handling
carry over explicitly.
Sadly the constructs needed to create the sub-fields - nominally
CreateDWordField(PRT0, _SB.PCI0._CRS._Y02._MIN, MINL)
CreateDWordField(PRT0, Add(_SB.PCI0._CRS._Y02._MIN, 4), MINH)
- can't be used: The former gets warned upon by newer iasl, i.e. would
need to be replaced by the latter just with the addend changed to 0,
and the latter doesn't translate properly with recent iasl. Hence,
short of having an ASL/iasl expert at hand, we need to work around the
shortcomings of various iasl versions. See the code comment.
Signed-off-by: Jan Beulich
Acked-by: Ian Campbell
(cherry picked from commit 7f8d8abcf6dfb85fae591a547b24f9b27d92272c)
Signed-off-by: Konrad Rzeszutek Wilk
Committed-by: Zhenzhong Duan [bug 20140061]
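
For clarity, the arithmetic the AML has to express - keep the 64-bit value as two 32-bit halves and propagate the carry by hand - looks like this in plain C (a worked example, not the ASL emitted by hvmloader):

    #include <stdint.h>
    #include <inttypes.h>
    #include <stdio.h>

    /* Add a 32-bit delta to a 64-bit base held as two 32-bit words. */
    static void add_with_carry(uint32_t *lo, uint32_t *hi, uint32_t delta)
    {
        uint32_t old_lo = *lo;

        *lo = old_lo + delta;
        if (*lo < old_lo)        /* low word wrapped: propagate the carry */
            *hi += 1;
    }

    int main(void)
    {
        uint32_t lo = 0xfffff000u, hi = 0x0000000fu; /* base 0x0000000F_FFFFF000 */

        add_with_carry(&lo, &hi, 0x2000);            /* crosses a low-word wrap */
        printf("result: 0x%08" PRIx32 "%08" PRIx32 "\n", hi, lo);
        return 0;
    }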

[4.3.0-55.el6.21]
- hvmloader: fix build with certain iasl versions
While most of them support what we have now, Wheezy's dislikes the
empty range. Put a fake one in place - it's getting overwritten upon
evaluation of _CRS anyway.
The range could be grown (downwards) if necessary; the way it is now
it is
- the highest possible one below the 36-bit boundary (with 36 bits
being the lowest common denominator for all supported systems),
- the smallest possible one that said iasl accepts.
Reported-by: Sander Eikelenboom
Signed-off-by: Jan Beulich
Acked-by: Ian Campbell
(cherry picked from commit 119d8a42d3bfe6ebc1785720e1a7260e5c698632)
Signed-off-by: Konrad Rzeszutek Wilk
Committed-by: Zhenzhong Duan [bug 20140061]

[4.3.0-55.el6.20]
- hvmloader: also cover PCI MMIO ranges above 4G with UC MTRR ranges
When adding support for BAR assignments to addresses above 4G, the MTRR
side of things was left out.
Additionally the MMIO ranges in the DSDT's _SB.PCI0._CRS were having
memory types not matching the ones put into MTRRs: The legacy VGA range
is supposed to be WC, and the other ones should be UC.
Signed-off-by: Jan Beulich
Acked-by: Ian Campbell
(cherry picked from commit d06886694328a31369addc1f614cf326728d65a6)
Signed-off-by: Konrad Rzeszutek Wilk
Committed-by: Zhenzhong Duan [bug 20140061]

[4.3.0-55.el6.19]
- Add 64-bit support to QEMU.
Currently it is assumed that PCI device BARs are accessed below 4G memory. If there is
a device whose BAR size is larger than 4G, it must access memory addresses above 4G.
This patch enables 64-bit big BAR support on qemu-xen.
Signed-off-by: Xiantao Zhang
Signed-off-by: Xudong Hao
Tested-by: Michel Riviere
Signed-off-by: Zhenzhong Duan
Signed-off-by: Konrad Rzeszutek Wilk
Committed-by: Zhenzhong Duan [bug 20140061]

[4.3.0-55.el6.18]
- tasklet: Introduce per-cpu tasklet for softirq (v5)
This implements a lockless per-cpu tasklet mechanism.
The existing tasklet mechanism has a single global
spinlock that is taken every time the global list
is touched. And we use this lock quite a lot - when
we call do_tasklet_work, which is called via a softirq
and from the idle loop. We take the lock on any
operation on the tasklet_list.
The problem we are facing is that there are quite a lot of
tasklets scheduled. The most common one that is invoked is
the one injecting the VIRQ_TIMER in the guest. Guests
are not insane and don't set the one-shot or periodic
clocks to be in sub 1ms intervals (causing said tasklet
to be scheduled for such small intervals).
The problem appears when PCI passthrough devices are used
over many sockets and we have a mix of heavy-interrupt
guests and idle guests. The idle guests end up seeing
1/10 of their RUNNING timeslice eaten by the hypervisor
(and 40% steal time).
The mechanism by which we inject PCI interrupts is by
hvm_do_IRQ_dpci which schedules the hvm_dirq_assist
tasklet every time an interrupt is received.
The callchain is:
_asm_vmexit_handler
 -> vmx_vmexit_handler
  -> vmx_do_extint
   -> do_IRQ
    -> __do_IRQ_guest
     -> hvm_do_IRQ_dpci
        tasklet_schedule(&dpci->dirq_tasklet);
        [takes the lock to put the tasklet on the list]
[later on, schedule_tail is invoked, which is 'vmx_do_resume']
vmx_do_resume
 -> vmx_asm_do_vmentry
  -> call vmx_intr_assist
   -> vmx_process_softirqs
    -> do_softirq
       [executes the tasklet function, takes the lock again]
Meanwhile, other CPUs might be sitting in an idle loop
and be invoked to deliver a VIRQ_TIMER, which also ends
up taking the lock twice: first to schedule the
v->arch.hvm_vcpu.assert_evtchn_irq_tasklet (accounted to
the guests' BLOCKED_state); then to execute it - which is
accounted for in the guest's RUNTIME_state.
The end result is that on an 8-socket machine with
PCI passthrough, where four sockets are busy with interrupts
and the other sockets have idle guests, we end up with
the idle guests having around 40% steal time and 1/10
of their timeslice (3ms out of 30 ms) being tied up
taking the lock. The latency of the PCI interrupts delivered
to guests is also hindered.
With this patch the problem disappears completely.
That is removing the lock for the PCI passthrough use-case
(the 'hvm_dirq_assist' case).
As such this patch introduces the code to set up
softirq per-cpu tasklets and only modifies the PCI
passthrough cases instead of doing it wholesale. This
is done because:
- We want to easily bisect it if things break.
- We modify the code one section at a time to
make it easier to review this core code.
Now on the code itself. The Linux code (softirq.c)
has a per-cpu implementation of tasklets on which
this was based. However there are differences:
- This patch executes one tasklet at a time - similar
to how the existing implementation does it.
- We use a doubly-linked list instead of a singly-linked
list. We could use a singly-linked list but folks are
more familiar with 'list_*' type macros.
- This patch does not have the cross-CPU feeders
implemented. That code is in the patch
titled: tasklet: Add cross CPU feeding of per-cpu
tasklets. This is done to support:
'tasklet_schedule_on_cpu'
- We add a temporary 'TASKLET_SOFTIRQ_PERCPU' which
can co-exist with the TASKLET_SOFTIRQ. It will be
replaced in 'tasklet: Remove the old-softirq
implementation.'
Signed-off-by: Konrad Rzeszutek Wilk
Acked-by: Adnan Misherfi
Backported-by: Joe Jin [bug 20138111]
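
A condensed standalone model of the per-cpu tasklet idea is sketched below (single-threaded and using a singly-linked list purely for brevity; all _model names are assumptions, not the patch's code): scheduling and running a tasklet on the local CPU only touches that CPU's own list, so no global spinlock is needed.

    #include <stddef.h>
    #include <stdio.h>

    #define NR_CPUS_MODEL 2

    struct tasklet_model {
        struct tasklet_model *next;
        void (*func)(void *data);
        void *data;
    };

    /* One list head per CPU; no lock needed for same-CPU schedule/run. */
    static struct tasklet_model *percpu_list[NR_CPUS_MODEL];

    static void tasklet_schedule_local(int cpu, struct tasklet_model *t)
    {
        t->next = percpu_list[cpu];       /* push onto this CPU's own list  */
        percpu_list[cpu] = t;             /* real code: raise the softirq   */
    }

    /* Softirq handler: run one tasklet per pass, like the existing code. */
    static void do_tasklet_softirq(int cpu)
    {
        struct tasklet_model *t = percpu_list[cpu];

        if (t == NULL)
            return;
        percpu_list[cpu] = t->next;
        t->func(t->data);
    }

    static void dirq_assist_model(void *data)
    {
        printf("hvm_dirq_assist-like work for guest irq %d\n", *(int *)data);
    }

    int main(void)
    {
        int irq = 42;
        struct tasklet_model t = { .func = dirq_assist_model, .data = &irq };

        tasklet_schedule_local(0, &t);    /* e.g. from hvm_do_IRQ_dpci      */
        do_tasklet_softirq(0);            /* later, from the softirq path   */
        return 0;
    }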

[4.3.0-55.el6.17]
- libxl/sysctl/ionuma: Make 'xl info -n' print device topology
'xl info -n' will provide both CPU and IO topology information. Note
that xend (i.e. 'xm' variant of this command) will continue to only
print CPU topology.
To minimize code changes, libxl_get_topologyinfo (libxl's old interface
for topology) is preserved so its users (other than output_topologyinfo())
are not modified.
Signed-off-by: Boris Ostrovsky
Reviewed-by: Konrad Rzeszutek Wilk
Backported-by: Joe Jin [bug 20088513]

[4.3.0-55.el6.16]
- pci: Manage NUMA information for PCI devices
Keep track of device's PXM data (in the form of node ID)
Signed-off-by: Boris Ostrovsky
Reviewed-by: Konrad Rzeszutek Wilk
Backported-by: Joe Jin [bug 20088513]

[4.3.0-55.el6.15]
- libxl: ocaml: support for Arrays in bindings generator.
No change in generated code because no arrays are currently generated.
Signed-off-by: Ian Campbell
Signed-off-by: Rob Hoes
Acked-by: David Scott
Backported-by: Joe Jin [bug 20088513]

[4.3.0-55.el6.14]
- Reduce domain destroy time by delay page scrubbing
Because of page scrubbing, it is very slow to destroy a domain with a large
amount of memory.
This patch introduces a 'PGC_need_scrub' flag; pages with this flag need
to be scrubbed before use.
During domain destroy, pages are marked 'PGC_need_scrub' and added to the free
heap list, so that xl can return quickly. The real scrub is delayed to the
allocation path when a page with 'PGC_need_scrub' is allocated.
Besides that, all idle vcpus are triggered to do the scrub job in parallel before
they enter sleep.
In order to get rid of heavy lock contention, a percpu list is used:
- Delist a batch of pages to a percpu list from 'scrub' free page list.
- Scrub pages on this percpu list.
- Return those clean pages to normal 'heap' free page list, merge with other
chunks if needed.
On a ~500GB guest, shutdown took slightly over one minute, compared with over 6
minutes without this patch.
Signed-off-by: Bob Liu
Acked-by: Adnan Misherfi
Signed-off-by: Konrad Rzeszutek Wilk
Backported-by: Joe Jin [bug 18489484]
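
The PGC_need_scrub flow can be summarized with a small standalone model (assumed _model names, not the actual heap code): the destroy path only flags pages, and the scrub cost is paid lazily by whoever allocates a flagged page.

    #include <stdbool.h>
    #include <string.h>
    #include <stdio.h>

    #define PAGE_WORDS 8   /* tiny stand-in for a 4 KiB page */

    struct page_model {
        bool need_scrub;               /* stands in for PGC_need_scrub */
        unsigned long data[PAGE_WORDS];
    };

    /* Domain-destroy path: cheap, just flag the page and put it on the heap. */
    static void free_domheap_page_model(struct page_model *pg)
    {
        pg->need_scrub = true;
    }

    /* Allocation path: scrub lazily, only if the page still carries the flag. */
    static struct page_model *alloc_heap_page_model(struct page_model *pg)
    {
        if (pg->need_scrub) {
            memset(pg->data, 0, sizeof(pg->data));  /* the deferred scrub */
            pg->need_scrub = false;
        }
        return pg;
    }

    int main(void)
    {
        struct page_model pg = { .data = { 0xdeadbeef } };

        free_domheap_page_model(&pg);         /* fast path during destroy */
        alloc_heap_page_model(&pg);           /* cost paid by the next user */
        printf("word0 after realloc: %lx\n", pg.data[0]);
        return 0;
    }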

[4.3.0-55.el6.13]
- Revert 'pci: Manage NUMA information for PCI devices'
Backport-by: Joe Jin [bug 20088513]

[4.3.0-55.el6.12]
- Revert 'libxl/sysctl/ionuma: Make 'xl info -n' print device topology'
Signed-off-by: Joe Jin [bug 20088513]

[4.3.0-55.el6.11]
- libxl/sysctl/ionuma: Make 'xl info -n' print device topology
'xl info -n' will provide both CPU and IO topology information. Note
that xend (i.e. 'xm' variant of this command) will continue to only
print CPU topology.
To minimize code changes, libxl_get_topologyinfo (libxl's old interface
for topology) is preserved so its users (other than output_topologyinfo())
are not modified.
Signed-off-by: Boris Ostrovsky
Reviewed-by: Konrad Rzeszutek Wilk
Backported-by: Joe Jin [bug 20088513]

[4.3.0-55.el6.10]
- pci: Manage NUMA information for PCI devices
Keep track of device's PXM data (in the form of node ID)
Signed-off-by: Boris Ostrovsky
Reviewed-by: Konrad Rzeszutek Wilk
Backport-by: Joe Jin [bug 20088513]

[4.3.0-55.el6.9]
- tools/python: expose xc_getcpuinfo()
This API can be used to get per physical CPU utilization.
Testing:
>>> import xen.lowlevel.xc
>>> xc = xen.lowlevel.xc.xc()
>>> xc.getcpuinfo()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Required argument 'max_cpus' (pos 1) not found
>>> xc.getcpuinfo(4)
[{'idletime': 109322086128854}, {'idletime': 109336447648802},
{'idletime': 109069270544960}, {'idletime': 109065612611363}]
>>> xc.getcpuinfo(100)
[{'idletime': 109639015806078}, {'idletime': 109654551195681},
{'idletime': 109382107891193}, {'idletime': 109382057541119}]
>>> xc.getcpuinfo(1)
[{'idletime': 109682068418798}]
>>> xc.getcpuinfo(2)
[{'idletime': 109711311201330}, {'idletime': 109728458214729}]
>>> xc.getcpuinfo(max_cpus=4)
[{'idletime': 109747116214638}, {'idletime': 109764982453261},
{'idletime': 109491373228931}, {'idletime': 109489858724432}]
Signed-off-by: Zhigang Wang
Acked-by: Ian Campbell
Upstream commit: a9958947e49644c917c2349a567b2005b08e7c1f [bug 19707017]

[4.3.0-55.el6.8]
- xend: disable sslv3 due to CVE-2014-3566
Signed-off-by: Zhigang Wang
Signed-off-by: Kurt Hackel
Signed-off-by: Adnan Misherfi
Backported-by: Chuang Cao [bug 19831402]

[4.3.0-55.el6.7]
- xend: fix domain destroy after reboot
Signed-off-by: Zhigang Wang
Signed-off-by: Joe Jin
Signed-off-by: Iain MacDonnell [bug 19557384]

[4.3.0-55.el6.6]
- Keep the maxmem and memory same in vm.cfg
Signed-off-by: Annie Li
Signed-off-by: Adnan Misherfi
Signed-off-by: Joe Jin [bug 19440731]

[4.3.0-55.el6.5]
- xen: Only allocating the xenstore event channel earlier
This patch allocates xenstore event channel earlier to fix the migration
issue from ovm3.2.8 to 3.3.1, and also reverts the change for the console
event channel to avoid it being set to none after allocation.
Signed-off-by: Annie Li
Acked-by: Adnan Misherfi
Backported-by: Joe Jin [bug 19517860]

[4.3.0-55.el6.4]
- Increase xen max_phys_cpus to support hardware with 384 CPUs
Signed-off-by: Adnan Misherfi
Backported-by: Adnan Misherfi [bug 19564352]

[4.3.0-55.el6.3]
- Fix migration bug from OVM3.2.8(Xen4.1.3) to OVM3.3.1(Xen4.3.x)
The pvhvm migration from ovm3.2.8 to ovm3.3.1 fails because the xenstore event channel number changes;
this patch allocates the xenstore event channel as early as possible to avoid this issue.
Signed-off-by: Annie Li
Backported-by: Joe Jin [bug 19517860]

[4.3.0-55.el6.2]
- Fix the panic on HP DL580 Gen8.
Signed-off-by: Konrad Wilk
Signed-off-by: Adnan Misherfi
Backported-by: Chuang Cao [bug 19295185]

[4.3.0-55.el6.1]
- Before connecting the emulated network interface (vif.x.y-emu) to a bridge, change the emu MTU to
equal the MTU of the bridge to prevent the bridge from downgrading its own MTU to equal the emu MTU.
Signed-off-by: Adnan Misherfi
Backported-by: Chuang Cao [bug 19241260]

[4.3.0-55]
- x86/HVM: use fixed TSC value when saving or restoring domain

When a domain is saved, each VCPU's TSC value needs to be preserved. To get it we
use hvm_get_guest_tsc(). This routine (either itself or via get_s_time(), which
it may call) calculates the VCPU's TSC based on the host's current TSC value (by doing a
rdtscll()). Since this is performed for each VCPU separately, we end up with
unsynchronized TSCs.

Similarly, during a restore each VCPU is assigned its TSC based on host's current
tick, causing virtual TSCs to diverge further.

With this, we can easily get into a situation where a guest may see time going
backwards.

Instead of reading a new TSC value for each VCPU when saving/restoring, we should
use the same value across all VCPUs.

Reported-by: Philippe Coquard
Signed-off-by: Boris Ostrovsky
Reviewed-by: Jan Beulich
commit: 88e64cb785c1de4f686c1aa1993a0003b7db9e1a [bug 18755631]
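
A simplified standalone sketch of the before/after behaviour follows (the read_host_tsc_model stub is an assumption standing in for rdtscll()/hvm_get_guest_tsc()): reading the TSC once per VCPU bakes skew into the saved image, while a single read shared by all VCPUs keeps them identical.

    #include <stdint.h>
    #include <stdio.h>

    #define NR_VCPUS_MODEL 4

    static uint64_t fake_host_tsc = 1000000;

    /* Stand-in for a host TSC read: advances on every call. */
    static uint64_t read_host_tsc_model(void)
    {
        return fake_host_tsc += 137;   /* each read sees a later host tick */
    }

    int main(void)
    {
        uint64_t saved_tsc[NR_VCPUS_MODEL];

        /* Problematic pattern: one read per VCPU -> values drift apart. */
        for (int v = 0; v < NR_VCPUS_MODEL; v++)
            saved_tsc[v] = read_host_tsc_model();
        printf("per-vcpu reads: vcpu0=%llu vcpu3=%llu (skewed)\n",
               (unsigned long long)saved_tsc[0],
               (unsigned long long)saved_tsc[3]);

        /* Fixed pattern: read once, hand the same value to every VCPU. */
        uint64_t tsc = read_host_tsc_model();
        for (int v = 0; v < NR_VCPUS_MODEL; v++)
            saved_tsc[v] = tsc;
        printf("single read:    vcpu0=%llu vcpu3=%llu (identical)\n",
               (unsigned long long)saved_tsc[0],
               (unsigned long long)saved_tsc[3]);
        return 0;
    }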

[4.3.0-54]
- iommu: set correct IOMMU entries when iommu_hap_pt_share == 0
If the memory map is not shared between HAP and IOMMU we fail to set
correct IOMMU mappings for memory types other than p2m_ram_rw.
This patch adds IOMMU support for the following memory types:
p2m_grant_map_rw, p2m_map_foreign, p2m_ram_ro, p2m_grant_map_ro and
p2m_ram_logdirty.
Signed-off-by: Roger Pau Monné
Cc: Tim Deegan
Cc: Jan Beulich
Tested-by: David Zhuang
---
Changes since v1:
- Move the p2m type switch to IOMMU flags to an inline function that
is shared between p2m-ept and p2m-pt.
- Make p2m_set_entry also use p2m_get_iommu_flags.
---
When backporting this patch it would not apply cleanly due to two commits
not existing in the Xen 4.3 repo:
commit 243cebb3dfa1f94ec7c2b040e8fd15ae4d81cc5a
Author: Mukesh Rathor
Date: Thu Apr 17 10:05:07 2014 +0200
pvh dom0: introduce p2m_map_foreign
[adds the p2m_map_foreign type]
commit 3d8d2bd048773ababfa65cc8781b9ab3f5cf0eb0
Author: Jan Beulich
Date: Fri Mar 28 13:37:10 2014 +0100
x86/EPT: simplification and cleanup
[simplifies the loop in ept_set_entry]
As such the original patch from
http://lists.xen.org/archives/html/xen-devel/2014-04/msg02928.html
has been slightly changed.
Signed-off-by: Konrad Rzeszutek Wilk [bug 17789939]

[4.3.0-53]
- x86/svm: enable TSC scaling

TSC ratio enabling logic is inverted: we want to use it when we
are running in native tsc mode, i.e. when d->arch.vtsc is zero.

Also, since now svm_set_tsc_offset()'s calculations depend
on vtsc's value, we need to call hvm_funcs.set_tsc_offset() after
vtsc changes in tsc_set_info().

In addition, with TSC ratio enabled, svm_set_tsc_offset() will
need to do rdtsc. With that we may end up having TSCs on guest's
processors out of sync. d->arch.hvm_domain.sync_tsc which is set
by the boot processor can now be used by APs as reference TSC
value instead of host's current TSC.

Signed-off-by: Boris Ostrovsky
Reviewed-by: Jan Beulich
commit: b95fd03b5f0b66384bd7c190d5861ae68eb98c85 [bug 18755631]

[4.3.0-52]
- x86: use native RDTSC(P) execution when guest and host frequencies are the same
We should be able to continue using native RDTSC(P) execution on
HVM/PVH guests after migration if host and guest frequencies are
equal (this includes the case when the frequencies are made equal
by TSC scaling feature).

This also allows us to revert main part of commit 4aab59a3 (svm: Do not
intercept RDTSC(P) when TSC scaling is supported by hardware) which
was wrong: while RDTSC intercepts were disabled domain's vtsc could
still be set, leading to inconsistent view of guest's TSC.

Signed-off-by: Boris Ostrovsky
Acked-by: Jan Beulich
commit: 82713ec8d2b65d17f13e46a131e38bfe5baf8bd6 [bug 18755631]

[4.3.0-50]
- Signed-off by: Adnan G Misherfi
Signed-off by: Zhigang Wang [bug 18560587]

[4.3.0-49]
- Check in the following patch for Konrad:
From Message-ID: <1332267691-13179-1-git-send-email-david.vrabel@citrix.com>
If a maximum reservation for dom0 is not explictly given (i.e., no
dom0_mem=max:MMM command line option), then set the maximum
reservation to the initial number of pages. This is what most people
seem to expect when they specify dom0_mem=512M (i.e., exactly 512 MB
and no more).
This change means that with Linux 3.0.5 and later kernels,
dom0_mem=512M has the same result as older, 'classic Xen' kernels. The
older kernels used the initial number of pages to set the maximum
number of pages and did not query the hypervisor for the maximum
reservation.
It is still possible to have a larger reservation by explicitly
specifying dom0_mem=max:MMM.
Signed-off-by: David Vrabel
Signed-off-by: Konrad Rzeszutek Wilk
NOTE: This behaviour should also be implemented in the Linux kernel. [bug 13860516] [bug 18552768]

[4.3.0-48]
- Check in the following patch for Konrad:
From: Konrad Rzeszutek Wilk
When we migrate an HVM guest, by default our shared_info can
only hold up to 32 CPUs. As such the hypercall
VCPUOP_register_vcpu_info was introduced which allowed us to
setup per-page areas for VCPUs. This means we can boot PVHVM
guest with more than 32 VCPUs. During migration the per-cpu
structure is allocated fresh by the hypervisor (vcpu_info_mfn
is set to INVALID_MFN) so that the newly migrated guest
can make the VCPUOP_register_vcpu_info hypercall.
Unfortunately we end up triggering this condition:
/* Run this command on yourself or on other offline VCPUS. */
if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
which means we are unable to setup the per-cpu VCPU structures
for running vCPUS. The Linux PV code paths make this work by
iterating over every vCPU with:
1) is target CPU up (VCPUOP_is_up hypercall?)
2) if yes, then VCPUOP_down to pause it.
3) VCPUOP_register_vcpu_info
4) if it was down, then VCPUOP_up to bring it back up
But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
not allowed on HVM guests we can't do this. This patch
enables this.
Signed-off-by: Konrad Rzeszutek Wilk [bug 18552539]
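
The four-step per-vCPU sequence described above can be sketched as follows (the hypercall is stubbed out and the helper names are assumptions; only the VCPUOP_* steps mirror the description):

    #include <stdbool.h>
    #include <stdio.h>

    enum { VCPUOP_is_up, VCPUOP_down, VCPUOP_register_vcpu_info, VCPUOP_up };

    /* Stand-in for the vcpu_op hypercall; always succeeds in this model. */
    static int vcpu_op_model(int cmd, int vcpu)
    {
        printf("vcpu %d: op %d\n", vcpu, cmd);
        return cmd == VCPUOP_is_up ? 1 : 0;   /* pretend every vCPU is up */
    }

    static int register_vcpu_info_model(int vcpu)
    {
        bool was_up = vcpu_op_model(VCPUOP_is_up, vcpu) > 0;      /* step 1 */

        if (was_up)
            vcpu_op_model(VCPUOP_down, vcpu);                     /* step 2 */
        int rc = vcpu_op_model(VCPUOP_register_vcpu_info, vcpu);  /* step 3 */
        if (was_up)
            vcpu_op_model(VCPUOP_up, vcpu);                       /* step 4 */
        return rc;
    }

    int main(void)
    {
        for (int v = 0; v < 2; v++)
            register_vcpu_info_model(v);
        return 0;
    }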

[4.3.0-46]
- The following patch was missed when we upgraded OVM xen to 4.3:
From 5eda9dfe0a2e11d9c91717f83ddbb2f52e7535e7 Mon Sep 17 00:00:00 2001
From: Zhenzhong Duan
Date: Fri, 4 Apr 2014 15:36:36 -0400
Subject: [PATCH] qemu-xen-trad: free all the pirqs for msi/msix when driver
unloads
Pirqs are not freed when driver unloads, then new pirqs are allocated when
driver reloads. This could exhaust pirqs if do it in a loop.
This patch fixes the bug by freeing pirqs when ENABLE bit is cleared in
msi/msix control reg.
There is also other way of fixing it such as reuse pirqs between driver reload,
but this way is better.
Xen-devel: http://marc.info/?l=xen-devel&m=136800120304275&w=2
Signed-off-by: Zhenzhong Duan
Signed-off-by: Konrad Rzeszutek Wilk

[4.3.0-45]
- check in upstream dd03048 patch to add support for OL7 VM [bug 18487695]

[4.3.0-44]
- Just release running lock after a domain is gone.
Signed-off-by: Chuang Cao
Signed-off-by: Zhigang Wang
Acked-by: Konrad Rzeszutek Wilk
Acked-by: Adnan Misherfi
Acked-by: Julie Trask [bug 17936558]

[4.3.0-43]
- Backport xen patch 'reset TSC to 0 after domain resume from S3' [bug 18010443]

[4.3.0-42]
- Release domain running lock correctly
When the domain dies very early by:
VmError: HVM guest support is unavailable: is VT/AMD-V supported by your CPU and enabled in your BIOS?
We don't release the domain running lock correctly.
Signed-off-by: Zhigang Wang
Signed-off-by: Adnan Misherfi [bug 18328751]

[4.3.0-41]
- x86/pci: Store VF's memory space displacement in a 64-bit value
VF's memory space offset can be greater than 4GB and therefore needs
to be stored in a 64-bit variable.
commit: 001bdcee7bc19be3e047d227b4d940c04972eb02
Acked-by: Adnan Misherfi
Signed-off-by: Boris Ostrovsky [bug 18262495]

[4.3.0-34]
- Test if openvswitch kernel module is loaded to determine where to attach the VIF (bridge or openvswitch) [bug 17885201]

[4.3.0-33]
- Signed-off by: Zhigang Wang
Signed-off by: Adnan G Misherfi [bug 18048615]

[4.3.0-32]
- Add the following upstream commits:
- 2cebe22e6924439535cbf4a9f82a7d9d30c8f9c7
(libxenctrl: Fix xc_interface_close() crash if it gets NULL as an argument),
- dc37e0bfffc673f4bdce1d69ad86098bfb0ab531
(x86: fix early boot command line parsing),
- 7113a45451a9f656deeff070e47672043ed83664
(kexec/x86: do not map crash kernel area).

[4.3.0-31]
- Signed-off by: Adnan G Misherfi
Signed-off by: Zhigang Wang [bug 18048615]




Updated Packages


Release/Architecture     Filename                                  sha256                                                            Superseded By Advisory  Channel Label
Oracle VM 3.3 (x86_64)   xen-4.3.0-55.el6.47.33.src.rpm            595942ef34a1301097bb31a535c956a4224dc8010176f9a319ec7ccb56412dce  OVMBA-2024-0012         ovm3_x86_64_3.3_patch
Oracle VM 3.3 (x86_64)   xen-4.3.0-55.el6.47.33.x86_64.rpm         6c8c3cf096f668fecd6dc69c471ebe889de54640cbdcb98abc5310e51e34c825  OVMBA-2024-0012         ovm3_x86_64_3.3_patch
Oracle VM 3.3 (x86_64)   xen-tools-4.3.0-55.el6.47.33.x86_64.rpm   3dd03379910fc7028c0af83068cf1458d81c44928cd3469cdfa94849beda99a9  OVMBA-2024-0012         ovm3_x86_64_3.3_patch



This page is generated automatically and has not been checked for errors or omissions. For clarification or corrections please contact the Oracle Linux ULN team
