QEMU and KVM support CPU hotplug and hot-unplug for virtual machines.

https://git.qemu.org/?p=qemu.git;a=blob;f=docs/cpu-hotplug.rst;h=1c268e00b41a6b4e5af37571031ec89250ec0229;hb=HEAD
https://git.qemu.org/?p=qemu.git;a=blob;f=docs/specs/acpi_cpu_hotplug.txt;h=ee219c8358088c72bcfb130c0e217fb71dc8f9b2;hb=HEAD

Normally, the guest firmware need not be involved in CPU hot(un)plug -- the coordination occurs between the hypervisor and the guest OS, using ACPI artifacts. (The firmware is involved in setting up the ACPI objects, but that's a boot-time-only job; the firmware is not enlightened about the ACPI specifics, it just executes QEMU's ACPI linker/loader script for publishing the tables.) This applies to SeaBIOS, and to OVMF when built without -D SMM_REQUIRE.

However, when OVMF is built with -D SMM_REQUIRE, there is a privilege barrier between the virtual firmware and the guest OS. If the guest OS could provide the startup routine for a newly hot-plugged VCPU without the virtual firmware taking notice, then the guest OS could use this VCPU to attack the other VCPUs that execute privileged (SMM) code, the next time an SMI is raised. Thus, the firmware needs to grab the hot-added VCPU before the OS gets control of it.

I believe the feature needs work in QEMU (-> transfer control to the firmware upon VCPU hotplug), in OvmfPkg (platform enablement), and in UefiCpuPkg/PiSmmCpuDxeSmm too. My understanding is that the open source PiSmmCpuDxeSmm driver operates at OS runtime with the fixed (V)CPU count that it fetches at boot. I assume the CPU hotplug logic exists in variants of PiSmmCpuDxeSmm that are built into physical platform firmware; this RFE is about extracting and open sourcing a reference implementation of that logic.

Thank you very much for considering the request!
For QEMU/KVM, the use case is that the virtual machine is started with a specific topology (sockets, cores/socket, threads/core). The topology may or may not be fully populated at VM launch.

If the virtual machine administrator (or a management application) decides that the VM needs more computational power, they hotplug a number of VCPUs. The topology cannot be changed dynamically, so hotplug is only possible while the topology is not yet fully populated. (This means that the topology has to be sized by the admin in advance, in expectation of VCPU hotplug.)

If the virtual machine admin (or a management application) decides that the VM should be deprived of some computational power, they hot-unplug a number of VCPUs. The exact dance requires guest cooperation (see the links in comment#0). Regarding the topology, only such VCPUs may be hot-unplugged that were hot-plugged themselves earlier, or added right at VM launch as hot(un)pluggable.

VCPU#0 always exists and is not hot(un)pluggable. Furthermore, all non-hotpluggable VCPUs that are present at boot are clustered after VCPU#0 -- that is, in lexicographical (socket, core, thread) order, the initial non-pluggable sequence is contiguous, and there's a clear boundary between pluggable and non-pluggable VCPUs. (To my understanding.)

I'm unsure if QEMU currently enforces that hotplug / hot-unplug occur at the core level (as opposed to the logical VCPU -- i.e., thread -- level).
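The slot layout described above can be sketched as follows. This is illustration only, not QEMU code; the classification rule is paraphrased from the comment, and the parameter names are made up for the example:

```c
/*
 * Illustration only (not QEMU code). VCPUs are taken in
 * lexicographical (socket, core, thread) order: VCPU#0 is always
 * present, the next (NonPluggable - 1) VCPUs form the contiguous
 * non-pluggable prefix, and every slot past that boundary is
 * hot(un)pluggable.
 */
#include <assert.h>
#include <string.h>

static const char *
ClassifyVcpu (
  int  Index,        /* position in (socket, core, thread) order */
  int  NonPluggable  /* size of the non-pluggable prefix, incl. VCPU#0 */
  )
{
  if (Index == 0) {
    return "always present";
  }
  if (Index < NonPluggable) {
    return "cold-plugged, non-pluggable";
  }
  return "hot(un)pluggable slot";
}
```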
QEMU and KVM support guests with up to at least 384 VCPUs; and OVMF already uses X2APIC, at least in the resolution of the LocalApicLib class:

[LibraryClasses]
  LocalApicLib|UefiCpuPkg/Library/BaseXApicX2ApicLib/BaseXApicX2ApicLib.inf

We have successfully tested OVMF with 272+ VCPUs, multiple times (without hotplug). We had to grow the SMRAM for that (exposing the SMRAM size to management applications / users):

https://bugzilla.redhat.com/show_bug.cgi?id=1447027
https://bugzilla.redhat.com/show_bug.cgi?id=1469338

With those fixed, domain configs with more than 255 VCPUs have been working. My most recent personal testing (with 272 VCPUs) was captured in https://bugzilla.redhat.com/show_bug.cgi?id=1447027#c24
(In reply to Laszlo Ersek from comment #1)
> I'm unsure if QEMU currently enforces that hotplug / hot-unplug occur at the
> core level (as opposed to the logical VCPU -- i.e., thread -- level).

In the QEMU x86 case, hot(un)plug happens at logical CPU granularity, and is then handled by the guest OS's ACPI handler (where a Processor element, per the ACPI specification, is a logical CPU). As far as I'm aware, there are no plans for QEMU to hotplug x86 VCPUs at core (multiple threads) granularity.
Staring a bit at "UefiCpuPkg/UefiCpuPkg.dec", there are signs of support for CPU hotplug:

(1) The PcdCpuHotPlugSupport Feature Flag determines whether the feature is enabled. If the feature is enabled, then:

(2) The PiCpuSmmEntry() entry point function exposes the address of the "mCpuHotPlugData" SMRAM variable, in the dynamic PCD called PcdCpuHotPlugDataAddress. See the structure definition in "UefiCpuPkg/Include/CpuHotPlugData.h". Unfortunately, nothing seems to consume PcdCpuHotPlugDataAddress. What module is supposed to access "mCpuHotPlugData" through PcdCpuHotPlugDataAddress? I presume it is a platform-specific module.

(3) PiCpuSmmEntry() requires PcdCpuSmmEnableBspElection to be TRUE ("A BSP will be dynamically elected among all processors in each SMI"). I think we can short-circuit this requirement fairly easily in OVMF/QEMU, if we just clone "UefiCpuPkg/Library/SmmCpuPlatformHookLibNull" for OvmfPkg, and implement PlatformSmmBspElection() such that it returns TRUE if and only if it is called on VCPU#0. See the PlatformSmmBspElection() call in SmiRendezvous() [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c]. (VCPU#0 is non-pluggable in QEMU, and OVMF needs no actual BSP election to take place. So the task is to determine, in PlatformSmmBspElection(), whether we are executing on VCPU#0 -- possibly via an initial APIC ID check, I'm not sure.)

(4) The SmmAddProcessor() and SmmRemoveProcessor() functions will be un-gated [UefiCpuPkg/PiSmmCpuDxeSmm/CpuService.c]. These functions implement the EFI_SMM_CPU_SERVICE_PROTOCOL.AddProcessor() and EFI_SMM_CPU_SERVICE_PROTOCOL.RemoveProcessor() members. In particular, they access the "mCpuHotPlugData" structure (see (2) above). Interestingly, SmmAddProcessor() carries the following comment:

> //
> // Check CPU hot plug data. The CPU RAS handler should have created the mapping
> // of the APIC ID to SMBASE.
> //

Which makes me think that the "CPU RAS handler" is the platform-specific module that consumes PcdCpuHotPlugDataAddress (again, see (2)).

However, nothing in edk2 seems to call these protocol members. Nor is "gEfiSmmCpuServiceProtocolGuid" ever looked up by any module in the SMM protocol database. So even if the AddProcessor() and RemoveProcessor() member functions are un-gated, how/when are they used?

The protocol name EFI_SMM_CPU_SERVICE_PROTOCOL suggests that it is a standard protocol, so I tried to read up on the usage model in the PI spec (v1.6). However, the spec doesn't seem to know about the protocol (I searched the spec for the first component of the protocol GUID too, namely 0x1d202cab). In the end, the protocol appears to be edk2-specific.

FWIW, the leading comments on the member function pointer types, i.e., EFI_SMM_ADD_PROCESSOR and EFI_SMM_REMOVE_PROCESSOR, look promising. See "UefiCpuPkg/Include/Protocol/SmmCpuService.h" -- "Notify that a new processor has been added to the system", "Notify that a processor is hot-removed".
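If the "CPU RAS handler" theory above is right, the platform module's pre-SmmAddProcessor() duty might look roughly like this. The struct below is a simplified stand-in (the real layout is in "UefiCpuPkg/Include/CpuHotPlugData.h"), and the free-slot/invalid-ID convention is my assumption:

```c
/*
 * Sketch with made-up types: a platform "CPU RAS handler" presumably
 * has to claim a free slot in the hot plug data and record the
 * hot-added CPU's APIC ID -> SMBASE mapping before
 * EFI_SMM_CPU_SERVICE_PROTOCOL.AddProcessor() runs. The struct and
 * the INVALID_APIC_ID free-slot marker are assumptions for the sketch.
 */
#include <assert.h>
#include <stdint.h>

#define INVALID_APIC_ID  0xFFFFFFFFFFFFFFFFULL
#define MAX_CPUS         8   /* arbitrary for the sketch */

typedef struct {
  uint64_t  ApicId[MAX_CPUS];   /* INVALID_APIC_ID marks a free slot */
  uint32_t  ArrayLength;        /* number of usable slots */
  uintptr_t SmBase[MAX_CPUS];   /* per-CPU SMBASE after relocation */
} HOT_PLUG_DATA_SKETCH;

/* Returns the slot index used, or -1 if no free slot remains. */
static int
RecordHotAddedCpu (
  HOT_PLUG_DATA_SKETCH  *Data,
  uint64_t              ApicId,
  uintptr_t             SmBase
  )
{
  for (uint32_t Index = 0; Index < Data->ArrayLength; Index++) {
    if (Data->ApicId[Index] == INVALID_APIC_ID) {
      Data->ApicId[Index] = ApicId;
      Data->SmBase[Index] = SmBase;
      return (int)Index;
    }
  }
  return -1;
}
```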
(In reply to Laszlo Ersek from comment #4)
[...]
> (VCPU#0 is non-pluggable in QEMU, and OVMF needs no actual BSP
> election to take place. So the task is to determine, in
> PlatformSmmBspElection(), whether we are executing on VCPU#0 --
> possibly via initial APIC ID check, I'm not sure.)

Perhaps checking for the BSP flag in IA32_APIC_BASE?
[...]
(In reply to Igor Mammedov from comment #5)
> (In reply to Laszlo Ersek from comment #4)
> [...]
> > (VCPU#0 is non-pluggable in QEMU, and OVMF needs no actual BSP
> > election to take place. So the task is to determine, in
> > PlatformSmmBspElection(), whether we are executing on VCPU#0 --
> > possibly via initial APIC ID check, I'm not sure.)
> Perhaps checking for the BSP flag in IA32_APIC_BASE?
> [...]

I don't know enough to definitely say "yes", so, "probably yes". :) Thanks for the idea. :)
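The idea from this exchange can be sketched as follows. Bit 8 of the IA32_APIC_BASE MSR (0x1B) is the architectural BSP flag. In edk2, the read would be AsmReadMsr64() from BaseLib; here the MSR is mocked so the logic runs anywhere, and the function name and signature only approximate SmmCpuPlatformHookLib's PlatformSmmBspElection():

```c
/*
 * Hedged sketch of BSP detection via the IA32_APIC_BASE BSP flag.
 * ReadMsr64() is a mock; real firmware would use AsmReadMsr64()
 * (BaseLib), which executes RDMSR.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_APIC_BASE   0x1B
#define APIC_BASE_BSP_FLAG   (1ULL << 8)

static uint64_t  mMockApicBaseMsr;  /* stand-in for the real MSR */

static uint64_t
ReadMsr64 (
  uint32_t  Msr
  )
{
  (void)Msr;  /* real firmware would execute RDMSR here */
  return mMockApicBaseMsr;
}

/*
 * Returning "true" means the platform hook elected the BSP itself;
 * *IsBsp reports whether the calling processor carries the BSP flag.
 */
static bool
BspElectionSketch (
  bool  *IsBsp
  )
{
  *IsBsp = (ReadMsr64 (MSR_IA32_APIC_BASE) & APIC_BASE_BSP_FLAG) != 0;
  return true;
}
```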
Mike will help on the design.
Mike will work on it.
On-list discussion:

[edk2-devel] CPU hotplug using SMM with QEMU+OVMF
http://mid.mail-archive.com/8091f6e8-b1ec-f017-1430-00b0255729f4@redhat.com
https://edk2.groups.io/g/devel/message/45567
https://edk2.groups.io/g/rfc/message/29
[edk2-devel] [PATCH wave 1 00/10] support QEMU's "SMRAM at default SMBASE" feature
http://mid.mail-archive.com/20190924113505.27272-1-lersek@redhat.com
https://edk2.groups.io/g/devel/message/47924
[edk2-devel] [PATCH v2 00/11] support QEMU's "SMRAM at default SMBASE" feature
http://mid.mail-archive.com/20200129214412.2361-1-lersek@redhat.com
https://edk2.groups.io/g/devel/message/53545
(In reply to Laszlo Ersek from comment #11)
> [edk2-devel] [PATCH v2 00/11]
> support QEMU's "SMRAM at default SMBASE" feature
>
> http://mid.mail-archive.com/20200129214412.2361-1-lersek@redhat.com
> https://edk2.groups.io/g/devel/message/53545

Pushed via <https://github.com/tianocore/edk2/pull/332>, commit range 422da35375c6..75839f977d37.

This TianoCore BZ should remain open, as the brunt of the work is commencing only now. (The next steps are outlined in the v1 blurb linked in comment 10.)
Using Igor's PoC patch for QEMU, I got hotplug/SMM working, on both IA32 and IA32X64, including S3 suspend/resume. (I haven't tested X64 yet.) I also haven't tested anything on SEV yet. Un-plug is explicitly unsupported for now -- it'll take a separate effort. (Currently I have 17 patches; 25 files changed, 2121 insertions(+), 32 deletions(-).)

Hotplug/SMM is probably worth upstreaming now (I'll see soon how much more work SEV might need for hotplug/SMM). FWIW, I've never tested hotplug on SEV *without* SMM either, so I'm not sure what the hotplug "baseline" is on SEV.
("SMM_REQUIRE" is included in all of the below:)

* X64: CPU hotplug works (with S3 disabled, as per independent commit 5133d1f1d297).

* SEV+S3: Resume doesn't seem to work, regardless of VCPU hotplug.

* SEV (no S3): Hotplug doesn't work. I think it's because SMBASE relocation goes wrong: the SMRAM Save State Map is a communication area between the guest and QEMU, and while I did write a patch to keep the area decrypted "in advance", that doesn't work. I intentionally wrote the initial SMI handler for the hot-added CPUs in (big) real mode (no mode switches, for simplicity's sake). When the guest writes to memory, what ends up in physical RAM is controlled by the guest-virt -> guest-phys *mapping* through which the write occurs. There is no paging in real mode, however, so the update to the SMBASE field in the save state map is bound to place encrypted data in physical RAM, for QEMU to read. I guess this can be avoided if the SMI handler in question switches to 64-bit mode first, establishes a "plaintext" mapping for the SMBASE field, and writes the new value through that.

The symptom is that the first hotplug "seems" to work fine, but then, once the new CPU has to execute the PiSmmCpuDxeSmm-provided SMI handler at the *next* SMI (i.e., the handler *to which* the initial relocation occurred), things fall apart.

I guess this is going to be another wart to fix up later. Writing the SMI handler and the Post-SMM Pen for hot-added CPUs is by far the most difficult part of this work, which is why I absolutely wanted to keep it simple (and to stick to real mode). But maybe we can work from here later, incrementally. (This also suggests that separate SMI handlers would be necessary for IA32 and X64, with IA32 having no chance of working with SEV, as usual -- I've been glad that I could manage with a single set of NASM sources, because real mode worked just fine under both qemu-system-i386 and -x86_64.)
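The "plaintext mapping" idea above boils down to a page-table-entry manipulation. This is a minimal sketch, not the actual fix: under SEV, whether a guest write lands encrypted in physical RAM is decided by the C-bit in the page-table entry the write goes through (real mode has no page tables, so writes are effectively always encrypted). The C-bit position is reported by CPUID leaf 0x8000001F in EBX[5:0]; the value 47 in the usage below is only an example:

```c
/*
 * Sketch: clear the SEV C-bit in a page-table entry so that writes
 * through the resulting mapping reach physical RAM unencrypted
 * ("plaintext"), readable by QEMU. CBitPosition would come from
 * CPUID 0x8000001F, EBX[5:0] on real hardware.
 */
#include <assert.h>
#include <stdint.h>

static uint64_t
MakePtePlaintext (
  uint64_t  Pte,
  unsigned  CBitPosition
  )
{
  return Pte & ~(1ULL << CBitPosition);
}
```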
(In theory, there are four dimensions with 2 values each, so 16 cases to test:

- hotplug / hot-unplug
- SEV enabled / SEV disabled
- normal boot only / S3 resume
- SMM required / no SMM
)
Posted:

[edk2-devel] [PATCH 00/16] OvmfPkg: support VCPU hotplug with -D SMM_REQUIRE
https://edk2.groups.io/g/devel/message/54734
http://mid.mail-archive.com/20200223172537.28464-1-lersek@redhat.com
Posted:

* [edk2-devel] [PATCH v2 00/16] OvmfPkg: support VCPU hotplug with -D SMM_REQUIRE
  http://mid.mail-archive.com/20200226221156.29589-1-lersek@redhat.com
  https://edk2.groups.io/g/devel/message/54945
(In reply to Laszlo Ersek from comment #17)
> Posted
>
> * [edk2-devel] [PATCH v2 00/16]
>   OvmfPkg: support VCPU hotplug with -D SMM_REQUIRE
>
> http://mid.mail-archive.com/20200226221156.29589-1-lersek@redhat.com
> https://edk2.groups.io/g/devel/message/54945

Merged as commit range 61d3b2d4279e..1158fc8e2c7b, via <https://github.com/tianocore/edk2/pull/416/>.

As stated earlier, this series does not contain SEV support or hot-unplug support. I think I'll suggest separate BZs for those features.