Bug 1512 - OVMF RFE: VCPU hotplug with SMM
Summary: OVMF RFE: VCPU hotplug with SMM
Status: RESOLVED FIXED
Alias: None
Product: Tianocore Feature Requests
Classification: Unclassified
Component: Code
Version: Current
Hardware: All All
Importance: Normal normal
Assignee: Laszlo Ersek
URL:
Keywords:
Depends on: 1515
Blocks: 3132
Reported: 2019-02-04 08:31 UTC by Laszlo Ersek
Modified: 2020-12-21 09:39 UTC
CC List: 7 users

See Also:
EDK II Code First industry standard specifications: ---
Branch URL:
Release(s) the issue is observed: EDK II Master
The OS the target platform is running: ---
Package: OvmfPkg
Release(s) the issues must be fixed: EDK II Master
Tianocore documents:


Description Laszlo Ersek 2019-02-04 08:31:12 UTC
QEMU and KVM support CPU hotplug and hot-unplug for virtual machines.

https://git.qemu.org/?p=qemu.git;a=blob;f=docs/cpu-hotplug.rst;h=1c268e00b41a6b4e5af37571031ec89250ec0229;hb=HEAD

https://git.qemu.org/?p=qemu.git;a=blob;f=docs/specs/acpi_cpu_hotplug.txt;h=ee219c8358088c72bcfb130c0e217fb71dc8f9b2;hb=HEAD

Normally, the guest firmware need not be involved in CPU hot(un)plug -- the coordination occurs between the hypervisor and the guest OS, using ACPI artifacts.

(The firmware is involved in setting up the ACPI objects, but that's a boot-time-only job, and the firmware is not enlightened about the ACPI specifics; it just executes QEMU's ACPI linker/loader script to publish the tables.)

This applies to SeaBIOS, and to OVMF when built without -D SMM_REQUIRE.

However, when OVMF is built with -D SMM_REQUIRE, there is a privilege barrier between virtual firmware and guest OS. If the guest OS could provide the startup routine for a newly hot-plugged VCPU, without the virtual firmware taking notice, then the guest OS could use this VCPU to attack the other VCPUs that execute privileged (SMM) code, the next time an SMI is raised. Thus, the firmware needs to grab the hot-added VCPU before the OS gets control of it.

I believe the feature needs work in QEMU (-> transfer control to the firmware upon VCPU hotplug), OvmfPkg (platform enablement), and in UefiCpuPkg/PiSmmCpuDxeSmm too. My understanding is that the open source PiSmmCpuDxeSmm driver operates at OS runtime with the fixed (V)CPU count that it fetches at boot. I assume the CPU hotplug logic exists in variants of PiSmmCpuDxeSmm that are built into physical platform firmware; this RFE is about extracting and open sourcing a reference implementation of that logic.

Thank you very much for considering the request!
Comment 1 Laszlo Ersek 2019-02-04 09:02:12 UTC
For QEMU/KVM, the use case is that the virtual machine is started with a specific topology (sockets, cores/socket, threads/core). The topology may or may not be fully populated at VM launch.

If the virtual machine administrator (or a management application) decides that the VM needs more computational power, they hotplug a number of VCPUs. The topology cannot be changed dynamically, so the topology must not already be fully populated when the hotplug occurs. (This means that the topology has to be sized by the admin in advance, in expectation of VCPU hotplug.)

If the virtual machine admin (or a management application) decides that the VM should be deprived of some computational power, they hot-unplug a number of VCPUs. The exact dance requires guest cooperation (see the links in comment#0). Regarding the topology, only those VCPUs may be hot-unplugged that were themselves hot-plugged earlier, or that were added at VM launch as hot(un)pluggable.

VCPU#0 always exists and is not hot(un)pluggable. Furthermore, all non-hotpluggable VCPUs that are present at boot are clustered after VCPU#0 -- that is, in lexicographical (socket, core, thread) order, the initial non-pluggable sequence is contiguous, and there's a clear boundary between pluggable and non-pluggable. (To my understanding.)
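A minimal sketch of the ordering described above, assuming the contiguity property holds as stated. The type and function names (vm_topology, vcpu_index, vcpu_is_pluggable) and the boot_fixed_count field are hypothetical; they are not QEMU or edk2 identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a VM topology; not a QEMU or edk2 type. */
typedef struct {
  uint32_t sockets;
  uint32_t cores_per_socket;
  uint32_t threads_per_core;
  uint32_t boot_fixed_count; /* VCPUs present at boot and not pluggable */
} vm_topology;

/* Position of (socket, core, thread) in lexicographical order. */
static uint32_t vcpu_index(const vm_topology *t, uint32_t socket,
                           uint32_t core, uint32_t thread)
{
  assert(socket < t->sockets);
  assert(core < t->cores_per_socket);
  assert(thread < t->threads_per_core);
  return (socket * t->cores_per_socket + core) * t->threads_per_core + thread;
}

/*
 * Assumption: the non-pluggable VCPUs form a contiguous prefix starting at
 * VCPU#0, so every slot at or beyond the boundary is hot(un)pluggable.
 */
static bool vcpu_is_pluggable(const vm_topology *t, uint32_t index)
{
  return index >= t->boot_fixed_count;
}
```

With 2 sockets, 4 cores/socket, 2 threads/core and 4 fixed VCPUs, slots 0..3 are non-pluggable and slots 4..15 are pluggable under this model.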

I'm unsure if QEMU currently enforces that hotplug / hot-unplug occur at the core level (as opposed to the logical VCPU -- i.e., thread -- level).
Comment 2 Laszlo Ersek 2019-02-04 09:04:08 UTC
QEMU and KVM support guests up to at least 384 VCPUs; and OVMF already uses X2APIC, at least in the resolution of the LocalApicLib class:

[LibraryClasses]
  LocalApicLib|UefiCpuPkg/Library/BaseXApicX2ApicLib/BaseXApicX2ApicLib.inf

We have successfully tested OVMF with 272+ VCPUs, multiple times (without hotplug). We had to grow the SMRAM for that (exposing the SMRAM size to the management applications / users):

  https://bugzilla.redhat.com/show_bug.cgi?id=1447027
  https://bugzilla.redhat.com/show_bug.cgi?id=1469338

With those fixed, domain configs with more than 255 VCPUs have been working. My most recent personal testing (with 272 VCPUs) was captured in

  https://bugzilla.redhat.com/show_bug.cgi?id=1447027#c24
Comment 3 Igor Mammedov 2019-02-05 08:01:24 UTC
(In reply to Laszlo Ersek from comment #1)
> I'm unsure if QEMU currently enforces that hotplug / hot-unplug occur at the
> core level (as opposed to the logical VCPU -- i.e., thread -- level).

In the QEMU x86 case, hot(un)plug happens at logical CPU granularity and is then handled by the guest OS ACPI handler (where the Processor element, per the ACPI specification, is a logical CPU). As far as I'm aware, there are no plans for QEMU to hotplug x86 VCPUs at core (multiple threads) granularity.
Comment 4 Laszlo Ersek 2019-02-05 13:08:23 UTC
Staring a bit at "UefiCpuPkg/UefiCpuPkg.dec", there are signs of support
for CPU hotplug:

(1) The PcdCpuHotPlugSupport Feature Flag determines whether the feature
    is enabled.

If the feature is enabled, then:

(2) The PiCpuSmmEntry() entry point function exposes the address of the
    "mCpuHotPlugData" SMRAM variable, in the dynamic PCD called
    PcdCpuHotPlugDataAddress. See the structure definition in
    "UefiCpuPkg/Include/CpuHotPlugData.h".

    Unfortunately, nothing seems to consume PcdCpuHotPlugDataAddress.

    What module is supposed to access "mCpuHotPlugData" through
    PcdCpuHotPlugDataAddress? I presume it is a platform-specific
    module.

(3) PiCpuSmmEntry() requires PcdCpuSmmEnableBspElection to be TRUE. ("A
    BSP will be dynamically elected among all processors in each SMI").

    I think we can short-circuit this requirement fairly easily in
    OVMF/QEMU, if we just clone
    "UefiCpuPkg/Library/SmmCpuPlatformHookLibNull" for OvmfPkg, and
    implement PlatformSmmBspElection() such that it returns TRUE if and
    only if it is called on VCPU#0.

    See the PlatformSmmBspElection() call in SmiRendezvous()
    [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c].

    (VCPU#0 is non-pluggable in QEMU, and OVMF needs no actual BSP
    election to take place. So the task is to determine, in
    PlatformSmmBspElection(), whether we are executing on VCPU#0 --
    possibly via initial APIC ID check, I'm not sure.)

(4) The SmmAddProcessor() and SmmRemoveProcessor() functions will be
    un-gated [UefiCpuPkg/PiSmmCpuDxeSmm/CpuService.c].

    These functions implement the
    EFI_SMM_CPU_SERVICE_PROTOCOL.AddProcessor() and
    EFI_SMM_CPU_SERVICE_PROTOCOL.RemoveProcessor() members. In
    particular, they access the "mCpuHotPlugData" structure (see (2)
    above).

    Interestingly, SmmAddProcessor() carries the following comment:

>   //
>   // Check CPU hot plug data. The CPU RAS handler should have created the mapping
>   // of the APIC ID to SMBASE.
>   //

    Which makes me think that the "CPU RAS handler" is the
    platform-specific module that consumes PcdCpuHotPlugDataAddress
    (again, see (2)).

    However, nothing in edk2 seems to call these protocol members. Nor
    is "gEfiSmmCpuServiceProtocolGuid" ever looked up by any module in
    the SMM protocol database. So even if the AddProcessor() and
    RemoveProcessor() member functions are un-gated, how/when are they
    used?

    ... The protocol name EFI_SMM_CPU_SERVICE_PROTOCOL suggests that it
    is a standard protocol, so I tried to read up on the usage model in
    the PI spec (v1.6). However, the spec doesn't seem to know about the
    protocol (I searched the spec for the first component of the
    protocol GUID too, namely 0x1d202cab.) In the end the protocol
    appears to be edk2 specific.

    FWIW, the leading comments on the member function pointer types,
    i.e., EFI_SMM_ADD_PROCESSOR and EFI_SMM_REMOVE_PROCESSOR, look
    promising. See "UefiCpuPkg/Include/Protocol/SmmCpuService.h" --
    "Notify that a new processor has been added to the system", "Notify
    that a processor is hot-removed".
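To make the presumed division of labor in (2) and (4) concrete, here is a simplified model: a platform handler first records an APIC ID to SMBASE mapping, and only then can an "add processor" request for that APIC ID succeed. Everything in it is hypothetical (hotplug_table, table_map, table_add_processor, the fixed table size); the real layout is in "UefiCpuPkg/Include/CpuHotPlugData.h" and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INVALID_APIC_ID UINT64_MAX
#define SKETCH_MAX_CPUS        8

/* Hypothetical, simplified stand-in for a hot-plug data table. */
typedef struct {
  uint64_t apic_id[SKETCH_MAX_CPUS];
  uint64_t smbase[SKETCH_MAX_CPUS];
  bool     in_use[SKETCH_MAX_CPUS];
} hotplug_table;

static void table_init(hotplug_table *t)
{
  assert(t != NULL);
  for (size_t i = 0; i < SKETCH_MAX_CPUS; i++) {
    t->apic_id[i] = SKETCH_INVALID_APIC_ID;
    t->smbase[i]  = 0;
    t->in_use[i]  = false;
  }
}

/* "CPU RAS handler" role: record APIC ID -> SMBASE in the first free slot. */
static bool table_map(hotplug_table *t, uint64_t apic_id, uint64_t smbase)
{
  for (size_t i = 0; i < SKETCH_MAX_CPUS; i++) {
    if (t->apic_id[i] == apic_id) {
      return false; /* already mapped */
    }
  }
  for (size_t i = 0; i < SKETCH_MAX_CPUS; i++) {
    if (t->apic_id[i] == SKETCH_INVALID_APIC_ID) {
      t->apic_id[i] = apic_id;
      t->smbase[i]  = smbase;
      return true;
    }
  }
  return false; /* table full */
}

/*
 * "AddProcessor" role: succeed only if the mapping was created beforehand
 * and the slot is not yet in use. Returns the slot index, or -1.
 */
static int table_add_processor(hotplug_table *t, uint64_t apic_id)
{
  for (size_t i = 0; i < SKETCH_MAX_CPUS; i++) {
    if (t->apic_id[i] == apic_id && !t->in_use[i]) {
      t->in_use[i] = true;
      return (int)i;
    }
  }
  return -1;
}
```

The point of the model is only the ordering constraint hinted at by the SmmAddProcessor() comment: adding a processor whose APIC ID has no mapping yet fails.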
Comment 5 Igor Mammedov 2019-02-06 03:16:43 UTC
(In reply to Laszlo Ersek from comment #4)
[...]
>     (VCPU#0 is non-pluggable in QEMU, and OVMF needs no actual BSP
>     election to take place. So the task is to determine, in
>     PlatformSmmBspElection(), whether we are executing on VCPU#0 --
>     possibly via initial APIC ID check, I'm not sure.)
Perhaps checking for BSP flag in IA32_APIC_BASE?
[...]
Comment 6 Laszlo Ersek 2019-02-06 10:51:39 UTC
(In reply to Igor Mammedov from comment #5)
> (In reply to Laszlo Ersek from comment #4)
> [...]
> >     (VCPU#0 is non-pluggable in QEMU, and OVMF needs no actual BSP
> >     election to take place. So the task is to determine, in
> >     PlatformSmmBspElection(), whether we are executing on VCPU#0 --
> >     possibly via initial APIC ID check, I'm not sure.)
> Perhaps checking for BSP flag in IA32_APIC_BASE?
> [...]

I don't know enough to definitely say "yes", so, "probably yes". :) Thanks for the idea. :)
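A minimal sketch of the check suggested in comment 5. The BSP flag is bit 8 of the IA32_APIC_BASE MSR (index 0x1B). Reading the MSR requires privileged code, so the sketch takes the register value as a parameter; the helper names (apic_base_is_bsp, bsp_election_sketch) are hypothetical, and whether the flag reliably identifies VCPU#0 under QEMU is the assumption being discussed, not something this code proves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IA32_APIC_BASE_MSR 0x1Bu         /* MSR index */
#define IA32_APIC_BASE_BSP (1ull << 8)   /* BSP flag, bit 8 */

/* Return true if the given IA32_APIC_BASE value has the BSP flag set. */
static bool apic_base_is_bsp(uint64_t apic_base_value)
{
  return (apic_base_value & IA32_APIC_BASE_BSP) != 0;
}

/*
 * Hypothetical election hook: report "this CPU is the BSP" if and only if
 * the BSP flag is set, so no actual election takes place.
 */
static void bsp_election_sketch(uint64_t apic_base_value, bool *is_bsp)
{
  assert(is_bsp != NULL);
  *is_bsp = apic_base_is_bsp(apic_base_value);
}
```

In firmware, the value would come from a read of MSR 0x1B on the executing processor; passing it in keeps the logic testable outside ring 0.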
Comment 7 Yonghong Zhu 2019-03-21 20:36:17 UTC
Mike will help on the design.
Comment 8 Yonghong Zhu 2019-05-16 20:27:45 UTC
Mike will work on it.
Comment 9 Laszlo Ersek 2019-08-13 10:19:28 UTC
On-list discussion:

[edk2-devel] CPU hotplug using SMM with QEMU+OVMF

http://mid.mail-archive.com/8091f6e8-b1ec-f017-1430-00b0255729f4@redhat.com
https://edk2.groups.io/g/devel/message/45567
https://edk2.groups.io/g/rfc/message/29
Comment 10 Laszlo Ersek 2019-09-24 07:37:53 UTC
[edk2-devel] [PATCH wave 1 00/10]
support QEMU's "SMRAM at default SMBASE" feature

http://mid.mail-archive.com/20190924113505.27272-1-lersek@redhat.com
https://edk2.groups.io/g/devel/message/47924
Comment 11 Laszlo Ersek 2020-01-29 16:46:23 UTC
[edk2-devel] [PATCH v2 00/11]
support QEMU's "SMRAM at default SMBASE" feature

http://mid.mail-archive.com/20200129214412.2361-1-lersek@redhat.com
https://edk2.groups.io/g/devel/message/53545
Comment 12 Laszlo Ersek 2020-02-05 08:45:43 UTC
(In reply to Laszlo Ersek from comment #11)
> [edk2-devel] [PATCH v2 00/11]
> support QEMU's "SMRAM at default SMBASE" feature
> 
> http://mid.mail-archive.com/20200129214412.2361-1-lersek@redhat.com
> https://edk2.groups.io/g/devel/message/53545

Pushed via <https://github.com/tianocore/edk2/pull/332>, commit range 422da35375c6..75839f977d37.

This TianoCore BZ should remain open, as the bulk of the work is commencing only now. (The next steps are outlined in the v1 blurb linked in comment 10.)
Comment 13 Laszlo Ersek 2020-02-20 19:57:00 UTC
Using Igor's PoC patch for QEMU, I got hotplug/SMM working, both IA32 and IA32X64, including S3 suspend/resume. (Haven't tested X64 yet.)

Also haven't tested anything on SEV yet.

Un-plug is explicitly unsupported, for now -- it'll take a separate attack. (Currently I have 17 patches; 25 files changed, 2121 insertions(+), 32 deletions(-).) Hotplug/SMM is probably worth upstreaming now (I'll see soon how much more work SEV might need, for hotplug/SMM).

FWIW I've never tested hotplug on SEV *without* SMM either, so I'm not sure what the hotplug "baseline" is on SEV.
Comment 14 Laszlo Ersek 2020-02-21 07:29:22 UTC
("SMM_REQUIRE" is included in all of the below:)

* X64:

CPU hotplug works (with S3 disabled, as per independent commit 5133d1f1d297).

* SEV+S3:

Resume doesn't seem to work regardless of VCPU hotplug.

* SEV (no S3):

Hotplug doesn't work. I think it's because SMBASE relocation goes wrong: the SMRAM Save State Map is a communication area between the guest and QEMU. And while I did write a patch to keep the area decrypted "in advance", that doesn't work: I intentionally wrote the initial SMI handler for the hot-added CPUs in (big) real mode (no mode switches for simplicity's sake). When the guest writes to memory, what ends up in physical RAM is controlled by the guest-virt -> guest-phys *mapping* through which the write occurs. There is no paging in real mode however, so the update to the SMBASE field in the save state map is bound to place encrypted data in phys RAM, for QEMU to read. I guess this can be avoided if the SMI handler in question switches to 64-bit mode first, establishes a "plaintext" mapping for the SMBASE field, and writes the new value through that.

The symptom is that the first hotplug "seems" to work fine, but then, once the new CPU would have to execute the PiSmmCpuDxeSmm-provided SMI handler at the *next* SMI (i.e. the handler *to which* the initial relocation occurred), things fall apart.

I guess this is going to be another wart to fix up later. Writing the SMI handler and the Post-SMM Pen for hot-added CPUs is by far the most difficult part of this work, which is why I absolutely wanted to keep it simple (and to stick in real mode). But maybe we can work from here later, incrementally. (Also this suggests separate SMI handlers would be necessary for IA32 and X64, with IA32 having no chance at working with SEV, as usual -- I've been glad that I could manage with a single set of NASM sources, because real mode worked just fine under both qemu-system-i386 and -x86_64.)
Comment 15 Laszlo Ersek 2020-02-21 07:51:18 UTC
(
In theory, there are four dimensions with 2 values in each (16 cases to test):
- hotplug / hot-unplug
- SEV enabled / SEV disabled
- normal boot only / S3 resume
- SMM required / no SMM
)
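The 16 cases can be enumerated mechanically by treating each dimension as one bit. This is only an illustration of the matrix above; the test_case type and enumerate_test_cases() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TEST_DIMENSIONS 4
#define TEST_CASE_COUNT (1u << TEST_DIMENSIONS) /* 2^4 = 16 */

/* One combination of the four two-valued dimensions. */
typedef struct {
  bool hot_unplug;   /* false: hotplug,          true: hot-unplug */
  bool sev;          /* false: SEV disabled,     true: SEV enabled */
  bool s3_resume;    /* false: normal boot only, true: S3 resume */
  bool smm_required; /* false: no SMM,           true: SMM required */
} test_case;

/* Fill "out" with every combination; returns the number of cases. */
static size_t enumerate_test_cases(test_case out[TEST_CASE_COUNT])
{
  assert(out != NULL);
  for (unsigned bits = 0; bits < TEST_CASE_COUNT; bits++) {
    out[bits].hot_unplug   = (bits & 1u) != 0;
    out[bits].sev          = (bits & 2u) != 0;
    out[bits].s3_resume    = (bits & 4u) != 0;
    out[bits].smm_required = (bits & 8u) != 0;
  }
  return TEST_CASE_COUNT;
}
```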
Comment 16 Laszlo Ersek 2020-02-23 12:28:50 UTC
Posted

[edk2-devel] [PATCH 00/16] OvmfPkg: support VCPU hotplug with -D SMM_REQUIRE
https://edk2.groups.io/g/devel/message/54734
http://mid.mail-archive.com/20200223172537.28464-1-lersek@redhat.com
Comment 17 Laszlo Ersek 2020-02-26 17:13:21 UTC
Posted

* [edk2-devel] [PATCH v2 00/16]
  OvmfPkg: support VCPU hotplug with -D SMM_REQUIRE

http://mid.mail-archive.com/20200226221156.29589-1-lersek@redhat.com
https://edk2.groups.io/g/devel/message/54945
Comment 18 Laszlo Ersek 2020-03-04 07:32:59 UTC
(In reply to Laszlo Ersek from comment #17)
> Posted
> 
> * [edk2-devel] [PATCH v2 00/16]
>   OvmfPkg: support VCPU hotplug with -D SMM_REQUIRE
> 
> http://mid.mail-archive.com/20200226221156.29589-1-lersek@redhat.com
> https://edk2.groups.io/g/devel/message/54945

Merged as commit range 61d3b2d4279e..1158fc8e2c7b, via <https://github.com/tianocore/edk2/pull/416/>.

As stated earlier, this series does not contain SEV support, or hot-unplug support. I think I'll suggest separate BZs for those features.