this should be easy
Well, you could presumably use an SWD debug probe of some kind, and you wouldn't need to interact with the bootloader at all. AFAIK, most debuggers will understand .elf files directly.
Asking for a reasonably detailed explanation of how the build process works, distinct from the tooling, doesn't seem to me to be a particularly big ask.
CMake is not a new tool from the Raspberry Pi people; it's a standard tool from elsewhere. The UF2 format that elf2uf2 produces comes from Microsoft, even though the elf2uf2 converter itself ships with the Pico SDK. Asking RPi to document CMake is a bit like asking them to document all of the gcc command-line switches. These tools HAVE documentation (perhaps not up to the standard of the rest of the Pico documentation).
a completely generic linker like the one in arm-none-eabi won't even produce a meaningful binary that can be passed to elf2uf2 unless it knows something about the memory map of the target device.
This is true of any microcontroller. And we do know about the memory map of the target device - that's all in the datasheet.
The SDK build process uses a script "pico_standard_link/memmap_default.ld" with a heap of memory location and size definitions to feed the linker.
And there is a similar default .ld file for any other thing that you compile for. Not many people really understand all of its content, any more than they understand the actual internal format of a .elf file.
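To make the "memory map" point concrete, here is a rough Python sketch of the regions as I read them from the RP2040 datasheet and the SDK's default linker script. The 2 MB flash size is an assumption (it depends on the board), and `REGIONS` and `region_of` are names I made up for illustration; they are not part of any tool.

```python
# Address regions as I understand them from the RP2040 datasheet.
# The flash length assumes a 2 MB chip; other boards may differ.
REGIONS = {
    "FLASH":     (0x10000000, 2048 * 1024),
    "RAM":       (0x20000000, 256 * 1024),
    "SCRATCH_X": (0x20040000, 4 * 1024),
    "SCRATCH_Y": (0x20041000, 4 * 1024),
}

def region_of(addr, size=1):
    """Return the region that fully contains [addr, addr + size), else None."""
    for name, (origin, length) in REGIONS.items():
        if origin <= addr and addr + size <= origin + length:
            return name
    return None
```

A linker script carries essentially this information, plus rules for which sections go into which region.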
It seems to me that the road from an object file to a UF2 file is a long, tortuous one, with no signposts.
Not much more so than the road from object file to .hex or .bin file for any other microcontroller. Or, for that matter, the process of actually loading an object file into the user address space on something like a Linux system.
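For what it's worth, the UF2 end of that road is fairly simple once you have a flat binary. The sketch below wraps a raw image in 512-byte UF2 blocks following the published UF2 layout (32-byte header, 476-byte data area, 4-byte end magic). The function name `bin_to_uf2` is mine, the base address assumes the image is meant for the start of flash, and this is not how elf2uf2 itself is implemented; it only illustrates the container format.

```python
import struct

UF2_MAGIC_START0 = 0x0A324655  # little-endian bytes spell "UF2\n"
UF2_MAGIC_START1 = 0x9E5D5157
UF2_MAGIC_END = 0x0AB16F30
UF2_FLAG_FAMILY_ID_PRESENT = 0x00002000
RP2040_FAMILY_ID = 0xE48BFF56
PAYLOAD_SIZE = 256   # bytes of real data per block
DATA_AREA_SIZE = 476  # fixed size of the data field in every block

def bin_to_uf2(image, base_addr=0x10000000):
    """Wrap a flat binary image in UF2 blocks (illustrative sketch)."""
    num_blocks = (len(image) + PAYLOAD_SIZE - 1) // PAYLOAD_SIZE
    out = bytearray()
    for block_no in range(num_blocks):
        start = block_no * PAYLOAD_SIZE
        # Zero-pad the final chunk so every block carries 256 bytes.
        chunk = bytes(image[start:start + PAYLOAD_SIZE]).ljust(PAYLOAD_SIZE, b"\x00")
        header = struct.pack(
            "<8I",
            UF2_MAGIC_START0,
            UF2_MAGIC_START1,
            UF2_FLAG_FAMILY_ID_PRESENT,
            base_addr + start,   # where this chunk lands in the address space
            PAYLOAD_SIZE,
            block_no,
            num_blocks,
            RP2040_FAMILY_ID,
        )
        out += header
        out += chunk.ljust(DATA_AREA_SIZE, b"\x00")
        out += struct.pack("<I", UF2_MAGIC_END)
    return bytes(out)
```

Every block carries its own target address, which is why the bootloader can accept blocks in any order.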
We need to know how to use the ARM toolchain to produce ELF files that are palatable to elf2uf2. That tool _does_ require a second stage bootloader (whatever it is), else it bombs. I don't know what else it needs -- I haven't got past the "second stage bootloader" problem yet.
Loading code into most ARM microcontrollers involves something like a secondary bootloader. This is because the flash memory that you're writing to is controlled by some sort of "flash memory controller" that is proprietary to the vendor rather than part of the ARM core, and thus requires specific and varied code to write to it. Since the RP2040 keeps its code in an external flash chip, it has the additional complication that it needs to support several varieties of such chips. And the chip has to be put back into XIP (execute-in-place) mode after code is loaded.
Other complications in the build process come from wanting to use the ROM code for floating point and so on. I wish all the "-Wl,--wrap=__xxx" options that end up in the link command were hidden somewhere instead, since they really clutter the build logs and make them harder to follow...
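As I understand the RP2040 datasheet, the second stage occupies the first 256 bytes of flash, and the boot ROM only runs it if the last four bytes hold a valid CRC32 of the preceding 252. Here is a hedged Python sketch of that padding and checksum step. The CRC parameters (polynomial 0x04C11DB7, initial value 0xFFFFFFFF, no bit reflection, no final XOR, i.e. the variant usually called CRC-32/MPEG-2) are my reading of the datasheet, and the function names are invented for illustration.

```python
import struct

def crc32_mpeg2(data, crc=0xFFFFFFFF):
    """Bitwise CRC-32/MPEG-2: poly 0x04C11DB7, MSB first, no final XOR."""
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc

def pad_and_checksum(boot2_code, total_size=256):
    """Zero-pad the code to total_size - 4 bytes and append its CRC (little-endian)."""
    if len(boot2_code) > total_size - 4:
        raise ValueError("second stage code does not fit")
    padded = bytes(boot2_code).ljust(total_size - 4, b"\x00")
    return padded + struct.pack("<I", crc32_mpeg2(padded))
```

If that checksum is missing or wrong, the boot ROM treats the flash as not containing a valid image, which would explain why a build without a second stage goes nowhere.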
I'll see if I can come up with a trivial build example.