mm: userfaultfd: add new UFFDIO_POISON ioctl
The basic idea here is to "simulate" memory poisoning for VMs. A VM running on some host might encounter a memory error, after which some page(s) are poisoned (i.e., future accesses SIGBUS). They expect that once poisoned, pages can never become "un-poisoned". So, when we live migrate the VM, we need to preserve the poisoned status of these pages.

When live migrating, we try to get the guest running on its new host as quickly as possible. So, we start it running before all memory has been copied, and before we're certain which pages should be poisoned or not.

So the basic way to use this new feature is:

- On the new host, the guest's memory is registered with userfaultfd, in either MISSING or MINOR mode (doesn't really matter for this purpose).

- On any first access, we get a userfaultfd event. At this point we can communicate with the old host to find out if the page was poisoned.

- If so, we can respond with a UFFDIO_POISON - this places a swap marker so any future accesses will SIGBUS. Because the pte is now "present", future accesses won't generate more userfaultfd events, they'll just SIGBUS directly.

UFFDIO_POISON does not handle unmapping previously-present PTEs. This isn't needed, because during live migration we want to intercept all accesses with userfaultfd (not just writes, so WP mode isn't useful for this). So whether minor or missing mode is being used (or both), the PTE won't be present in any case, so handling that case isn't needed.

Similarly, UFFDIO_POISON won't replace existing PTE markers. This might be okay to do, but it seems to be safer to just refuse to overwrite any existing entry (like a UFFD_WP PTE marker).
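The usage steps above can be sketched from the userspace side roughly as follows. This is a minimal illustration, not code from this patch: it assumes MISSING mode, and the helper names `page_base` and `resolve_fault`, along with the `was_poisoned` and `page_copy` parameters (standing in for whatever the old host reported and sent), are hypothetical. The `#ifdef` guard only exists so the sketch still compiles against uapi headers that predate UFFDIO_POISON.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: round a faulting address down to its page start.
 * Assumes page_size is a power of two. */
uint64_t page_base(uint64_t addr, uint64_t page_size)
{
    return addr & ~(page_size - 1);
}

/* Hypothetical helper: resolve one userfaultfd MISSING fault.
 * If the old host reported the page as poisoned, install a poison marker;
 * otherwise copy in the page contents received from the old host.
 * Returns the ioctl result (0 on success, -1 with errno set on failure). */
int resolve_fault(int uffd, uint64_t fault_addr, uint64_t page_size,
                  bool was_poisoned, const void *page_copy)
{
    uint64_t start = page_base(fault_addr, page_size);

    if (was_poisoned) {
#ifdef UFFDIO_POISON
        struct uffdio_poison poison;

        memset(&poison, 0, sizeof(poison));
        poison.range.start = start;
        poison.range.len = page_size;
        /* mode 0: do not suppress the wakeup, so the faulting thread
         * retries the access and hits the poison marker. */
        poison.mode = 0;
        return ioctl(uffd, UFFDIO_POISON, &poison);
#else
        /* Headers without UFFDIO_POISON: nothing we can do here. */
        errno = ENOSYS;
        return -1;
#endif
    }

    struct uffdio_copy copy;

    memset(&copy, 0, sizeof(copy));
    copy.dst = start;
    copy.src = (uint64_t)(uintptr_t)page_copy;
    copy.len = page_size;
    copy.mode = 0;
    return ioctl(uffd, UFFDIO_COPY, &copy);
}
```

A fault-handling loop would call `resolve_fault()` with the address taken from the `struct uffd_msg` it read from the userfaultfd. Since UFFDIO_POISON refuses to overwrite an existing entry, a caller should be prepared for the ioctl to fail if another thread has already resolved the same page.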
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Axel Rasmussen <[email protected]>
Acked-by: Peter Xu <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Brian Geffon <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Gaosheng Cui <[email protected]>
Cc: Huang, Ying <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: James Houghton <[email protected]>
Cc: Jan Alexander Steffens (heftig) <[email protected]>
Cc: Jiaqi Yan <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Suleiman Souhlal <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: T.J. Alumbaugh <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: ZhangPeng <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>