lurk101
Posts: 965
Joined: Mon Jan 27, 2020 2:35 pm
Location: Cumming, GA (US)

C++ atomic not fully supported?

Sun Sep 19, 2021 5:46 pm

Code: Select all

#include <iostream>
#include <atomic>

#if __STDC_NO_ATOMICS__
#error "C11 Atomics NOT available"
#else
#pragma message "C11 Atomics ARE available"
#endif

using namespace std;

int main() {
    atomic<int> i{0}; // initialized explicitly; before C++20 a default-constructed atomic is indeterminate
    cout << i << endl;
    i = 123;
    cout << i << endl;
    int j = i;
    cout << j << endl;
    i++; // Causes link error!!!
    cout << i << endl;
}

Code: Select all

pi@raspberrypi:~/pico/test/b$ make
[  1%] Creating directories for 'ELF2UF2Build'
...
[ 21%] Building CXX object CMakeFiles/test.dir/test.cpp.obj
/home/pi/pico/test/test.cpp:7:17: note: #pragma message: C11 Atomics ARE available
 #pragma message "C11 Atomics ARE available"
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~
...
[100%] Linking CXX executable test.elf
/usr/lib/gcc/arm-none-eabi/7.3.1/../../../arm-none-eabi/bin/ld: CMakeFiles/test.dir/test.cpp.obj: in function `main':
test.cpp:(.text.startup.main+0x54): undefined reference to `__atomic_fetch_add_4'
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/test.dir/build.make:687: test.elf] Error 1
make[1]: *** [CMakeFiles/Makefile2:1538: CMakeFiles/test.dir/all] Error 2
make: *** [Makefile:84: all] Error 2
Though I'm not sure how a full C++ implementation of atomic might work on CM0, since it doesn't have the LDREX and STREX instructions. If atomics aren't supported on CM0, then why isn't __STDC_NO_ATOMICS__ defined as 1, as it should be?
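For reference, __STDC_NO_ATOMICS__ is a C11 macro about <stdatomic.h> and _Atomic, and an undefined macro evaluates to 0 inside #if, so the test above passes on almost any C++ compiler. As far as I can tell, GCC does not treat CM0 as "no atomics"; it emits calls to out-of-line helpers such as __atomic_fetch_add_4 and expects a library to supply them. A closer match to the question being asked is the standard lock-free macro. A minimal sketch (the helper name int_atomic_support is made up for illustration, and what it reports on a Cortex-M0+ build depends on the toolchain):

```cpp
#include <atomic>
#include <cassert>
#include <string>

// ATOMIC_INT_LOCK_FREE comes from <atomic>:
//   0 = never lock-free, 1 = sometimes lock-free, 2 = always lock-free.
// Hypothetical helper, not part of any SDK or toolchain.
std::string int_atomic_support() {
#if ATOMIC_INT_LOCK_FREE == 2
    return "always lock-free";
#elif ATOMIC_INT_LOCK_FREE == 1
    return "sometimes lock-free";
#else
    return "never lock-free (library calls or locks needed)";
#endif
}
```

Printing the result from main() next to the #pragma message output should show whether the compiler intends to inline the operations or call out to a library.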

mschnell
Posts: 70
Joined: Wed Jul 28, 2021 10:33 am
Location: Krefeld, Germany

Re: C++ atomic not fully supported?

Mon Sep 20, 2021 8:12 am

The compiler seems to provide "atomic_thread_fence()". But I have no idea how this works.
-Michael

asu
Posts: 60
Joined: Sun Jul 18, 2021 8:19 am

Re: C++ atomic not fully supported?

Mon Sep 20, 2021 10:14 am

lurk101 wrote:
Sun Sep 19, 2021 5:46 pm
Though I'm not sure how a full C++ implementation of atomic might work on CM0, since it doesn't have the LDREX and STREX instructions.
My understanding would be that 32-bit LDR/STR on CM0 is atomic (LDREX/STREX would only be useful on other cores where this is not the case), but that using atomics (in the C++ language sense) is still required to prevent the compiler from optimizing away loads and writes in an undesired way.

As for the link error, I'm not sure. I think std::atomic is supposed to work. My first guess would be a toolchain issue.

lurk101
Posts: 965
Joined: Mon Jan 27, 2020 2:35 pm
Location: Cumming, GA (US)

Re: C++ atomic not fully supported?

Mon Sep 20, 2021 1:53 pm

asu wrote:
Mon Sep 20, 2021 10:14 am
lurk101 wrote:
Sun Sep 19, 2021 5:46 pm
Though I'm not sure how a full C++ implementation of atomic might work on CM0, since it doesn't have the LDREX and STREX instructions.
My understanding would be that 32-bit LDR/STR on CM0 is atomic (LDREX/STREX would only be useful on other cores where this is not the case), but that using atomics (in the C++ language sense) is still required to prevent the compiler from optimizing away loads and writes in an undesired way.
Yes, LDR and STR are atomic, but consider the ++ operator on a variable defined as atomic. It needs to do an LDR, ADD, STR sequence, which is not atomic.
As for the link error, I'm not sure. I think std::atomic is supposed to work. My first guess would be a toolchain issue.
Maybe, but so far I haven't found any ARM code that doesn't use LDREX and STREX. The thing about those is that they also work on multiprocessors. You may be right that it's a toolchain problem, the problem being that GCC doesn't set the __STDC_NO_ATOMICS__ define for CM0 and CM0+ where atomics aren't supported.

kilograham
Raspberry Pi Engineer & Forum Moderator
Posts: 956
Joined: Fri Apr 12, 2019 11:00 am
Location: austin tx

Re: C++ atomic not fully supported?

Mon Sep 20, 2021 4:19 pm

Yeah this is a toolchain issue; we have a few SDK issues for non traditionally CM0 things (e.g. c library re-entrance / __threadlocal) that would require us to build our own newlib; though I don’t know if that helps here. Building a custom GCC is an even bigger step!!

lurk101
Posts: 965
Joined: Mon Jan 27, 2020 2:35 pm
Location: Cumming, GA (US)

Re: C++ atomic not fully supported?

Mon Sep 20, 2021 5:49 pm

kilograham wrote:
Mon Sep 20, 2021 4:19 pm
Yeah this is a toolchain issue; we have a few SDK issues for non traditionally CM0 things (e.g. c library re-entrance / __threadlocal) that would require us to build our own newlib; though I don’t know if that helps here. Building a custom GCC is an even bigger step!!
I believe that <atomic> is a pure class template, so you shouldn't need a patched GCC. It would have to rely on hardware_sync for multicore... not ideal.

mschnell
Posts: 70
Joined: Wed Jul 28, 2021 10:33 am
Location: Krefeld, Germany

Re: C++ atomic not fully supported?

Mon Sep 20, 2021 6:40 pm

asu wrote:
Mon Sep 20, 2021 10:14 am
32-bit LDR/STR on CM0 is atomic (LDREX/STREX would only be useful on other cores where this is not the case),
Load and store are "atomic" in that they reliably read and write complete entities. But they are not "operations", and hence you can't create multiprocessor cooperation primitives such as a mutex from them. For this you need atomic read/modify/write operations. Those are a no-go as single instructions on ARM and any other "plain" RISC architecture. That is why bigger ARM cores provide "instruction sequence locking" means.
As the M0+ cores don't seem to provide such, I suppose you need to use the hardware spin locks provided in the RP2040 chip. I don't suppose the compiler will, or is supposed to, do this without the help of a library. But a more efficient way is to avoid atomic structures and pass data between the cores via the hardware mailboxes provided by the chip.
-Michael

lurk101
Posts: 965
Joined: Mon Jan 27, 2020 2:35 pm
Location: Cumming, GA (US)

Re: C++ atomic not fully supported?

Mon Sep 20, 2021 7:19 pm

mschnell wrote:
Mon Sep 20, 2021 6:40 pm
asu wrote:
Mon Sep 20, 2021 10:14 am
32-bit LDR/STR on CM0 is atomic (LDREX/STREX would only be useful on other cores where this is not the case),
Load and store are "atomic" in that they reliably read and write complete entities. But they are not "operations", and hence you can't create multiprocessor cooperation primitives such as a mutex from them. For this you need atomic read/modify/write operations. Those are a no-go as single instructions on ARM and any other "plain" RISC architecture. That is why bigger ARM cores provide "instruction sequence locking" means.
As the M0+ cores don't seem to provide such, I suppose you need to use the hardware spin locks provided in the RP2040 chip. I don't suppose the compiler will, or is supposed to, do this without the help of a library. But a more efficient way is to avoid atomic structures and pass data between the cores via the hardware mailboxes provided by the chip.
-Michael
Depends what you mean by "instruction sequence locking". No physical bus locking or thread disabling mechanism is used; nothing is really "locked". The way it works on CM3, CM4 and CM7 is much like MIPS did it. LDREX tags the physical address as exclusive access for the current processor, and clears any exclusive access tag for this processor for any other physical address. STREX does one of two things. If the executing processor does not have an outstanding tagged physical address, the store does not take place, and the value 1 is returned in Rd. If the physical address is tagged as exclusive access for the executing processor, the store takes place, the tag is cleared, and the value 0 is returned in Rd. The LDREX, modify, STREX sequence is wrapped in a loop that exits only when Rd becomes 0. This works well in SMP where the tags are maintained in shared cache.
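The retry loop described above can be sketched portably with compare_exchange_weak, which is allowed to fail spuriously in much the same way STREX can return 1. This is only an analogue for illustration: fetch_add_retry is a made-up name, and on cores that do have LDREX/STREX the compiler typically generates the loop for you when you use std::atomic.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Portable analogue of the LDREX / modify / STREX loop.
// Hypothetical helper name, shown only to illustrate the retry structure.
std::uint32_t fetch_add_retry(std::atomic<std::uint32_t>& target,
                              std::uint32_t delta) {
    // Plays the role of LDREX: read the current contents.
    std::uint32_t old_value = target.load(std::memory_order_relaxed);

    // Plays the role of STREX: the store only succeeds if nobody changed
    // the location since old_value was read; otherwise try again.
    while (!target.compare_exchange_weak(old_value, old_value + delta,
                                         std::memory_order_seq_cst,
                                         std::memory_order_relaxed)) {
        // On failure old_value has been refreshed with the current contents,
        // so the next attempt recomputes the sum from fresh data.
    }
    return old_value;  // value before the addition, like fetch_add
}
```

Calling fetch_add_retry(v, 1) on a std::atomic<std::uint32_t> holding 123 returns 123 and leaves 124 in v.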

On Pico with CM0+ which doesn't support LDREX and STREX you're probably right. Best to avoid <atomic> and use SDK provided mutual exclusion mechanisms that support multicore. You'd have to use something other than <thread> to launch a 2nd C++ thread on Pico anyway!

mschnell
Posts: 70
Joined: Wed Jul 28, 2021 10:33 am
Location: Krefeld, Germany

Re: C++ atomic not fully supported?

Tue Sep 21, 2021 11:40 am

By "instruction sequence locking", I meant a means to protect the result of a sequence of instructions from being corrupted by access from another OS thread or another core (to point out the difference vs. CISC processors, which feature complex instructions performing atomic read-modify-write operations). I understand a (non-M0+) ARM processor provides the mechanism described above: at the end of the sequence, a special store instruction reports whether the previously loaded RAM content has been overwritten by another thread or core, in which case the instruction sequence should be repeated to guarantee atomicity. Hence, with the M0+, additional hardware is necessary to provide atomic read-modify-write, which the RP2040 does feature, e.g. the hardware spinlocks.
Maybe the compiler could be enabled to call an appropriate library function on that behalf.
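A rough sketch of what such library functions might look like, written so it runs on a desktop machine. Everything here is an assumption for illustration: the names locked_fetch_add_4 and locked_test_and_set are made up (the symbol the linker actually asked for in the first post is __atomic_fetch_add_4), and std::mutex is only a host-side stand-in. On the RP2040 the lock would have to be something both cores and interrupt handlers respect, for example the SDK's critical_section from pico/sync.h.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Host-side stand-in for a lock shared by both cores. On an RP2040 this
// would need to be replaced by a hardware spin lock based primitive
// (assumption, not tested on target).
static std::mutex g_atomic_lock;

// Hypothetical out-of-line helper in the style of __atomic_fetch_add_4.
std::uint32_t locked_fetch_add_4(volatile std::uint32_t* ptr,
                                 std::uint32_t val) {
    std::lock_guard<std::mutex> guard(g_atomic_lock);
    std::uint32_t old_value = *ptr;  // LDR
    *ptr = old_value + val;          // ADD, STR
    return old_value;
}

// Same idea for test-and-set: returns true if the byte was already set.
bool locked_test_and_set(volatile unsigned char* ptr) {
    std::lock_guard<std::mutex> guard(g_atomic_lock);
    bool was_set = (*ptr != 0);
    *ptr = 1;
    return was_set;
}
```

Using one global lock for every emulated atomic is the simplest design; it serializes unrelated variables, which is probably acceptable with two cores.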
-Michael

carlk3
Posts: 82
Joined: Wed Feb 17, 2021 8:46 pm

Re: C++ atomic not fully supported?

Mon Sep 27, 2021 9:13 pm

I have been naively using __atomic_test_and_set(). Are you saying it's not doing what I thought it was doing?

kilograham
Raspberry Pi Engineer & Forum Moderator
Posts: 956
Joined: Fri Apr 12, 2019 11:00 am
Location: austin tx

Re: C++ atomic not fully supported?

Mon Sep 27, 2021 10:07 pm

yeah, that is not atomic

ejolson
Posts: 8243
Joined: Tue Mar 18, 2014 11:47 am

Re: C++ atomic not fully supported?

Tue Sep 28, 2021 12:17 am

carlk3 wrote:
Mon Sep 27, 2021 9:13 pm
I have been naively using __atomic_test_and_set(). Are you saying it's not doing what I thought it was doing?
kilograham wrote:
Mon Sep 27, 2021 10:07 pm
yeah, that is not atomic
To avoid confusion, maybe that function should be renamed as

__nonatomic_test_and_set

Is there a list of stuff like this to watch out for when developing for the Pico?

lurk101
Posts: 965
Joined: Mon Jan 27, 2020 2:35 pm
Location: Cumming, GA (US)

Re: C++ atomic not fully supported?

Tue Sep 28, 2021 1:36 am

ejolson wrote:
Tue Sep 28, 2021 12:17 am
carlk3 wrote:
Mon Sep 27, 2021 9:13 pm
I have been naively using __atomic_test_and_set(). Are you saying it's not doing what I thought it was doing?
kilograham wrote:
Mon Sep 27, 2021 10:07 pm
yeah, that is not atomic
To avoid confusion, maybe that function should be renamed as

__nonatomic_test_and_set

Is there a list of stuff like this to watch out for when developing for the Pico?
Unlike other ARM cores, the Cortex M0 is missing two critical instructions that would enable atomic read-modify-write operations. To make things more interesting the Cortex M0 was intended as the tiniest possible ARM core and never meant for multi core. But here we are!

RP has done an admirable job glomming two CM0 cores together and providing additional hardware to handle mutual exclusion.

I can think of two scenarios where atomic read-modify-write behavior would be needed on the Pico.

- Updating a variable, that might also be modified by an interrupt handler. Those are easily handled by disabling then re-enabling interrupts around the operation.

- Updating a variable that might be modified by the other core. The SDK has an API to hardware locks that are global to both cores.

But the current standard function names probably shouldn't be renamed just for the CM0. They mean and do what they say on most other CPUs.
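The first scenario (a variable shared with an interrupt handler on the same core) can be sketched with a small RAII guard. This is a host-runnable sketch under stated assumptions: irq_save_stub and irq_restore_stub are made-up stand-ins that only count nesting, and on the Pico they would be replaced by save_and_disable_interrupts() and restore_interrupts() from hardware/sync.h. IrqGuard and increment_shared_counter are hypothetical names too.

```cpp
#include <cassert>
#include <cstdint>

// Host stubs so the sketch compiles off-target; they just track nesting.
static int g_irq_disable_depth = 0;
static std::uint32_t irq_save_stub() {
    return static_cast<std::uint32_t>(g_irq_disable_depth++);
}
static void irq_restore_stub(std::uint32_t saved) {
    g_irq_disable_depth = static_cast<int>(saved);
}

// RAII guard: interrupts stay masked for the lifetime of the object and
// the previous state is restored on every exit path.
class IrqGuard {
public:
    IrqGuard() : saved_(irq_save_stub()) {}
    ~IrqGuard() { irq_restore_stub(saved_); }
    IrqGuard(const IrqGuard&) = delete;
    IrqGuard& operator=(const IrqGuard&) = delete;
private:
    std::uint32_t saved_;
};

volatile std::uint32_t shared_counter = 0;

// Read-modify-write that a handler on the same core cannot split.
void increment_shared_counter() {
    IrqGuard guard;
    shared_counter = shared_counter + 1;
}
```

Masking interrupts does nothing about the other core, so the second scenario still needs one of the hardware locks.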

slimhazard
Posts: 61
Joined: Sat Apr 03, 2021 8:47 pm

Re: C++ atomic not fully supported?

Tue Sep 28, 2021 8:35 am

I can think of two scenarios where atomic read-modify-write behavior would be needed on the Pico.

- Updating a variable, that might also be modified by an interrupt handler. Those are easily handled by disabling then re-enabling interrupts around the operation.

- Updating a variable that might be modified by the other core. The SDK has an API to hardware locks that are global to both cores.
The SDK's critical_section API, a part of pico_sync, does both of these.

mschnell
Posts: 70
Joined: Wed Jul 28, 2021 10:33 am
Location: Krefeld, Germany

Re: C++ atomic not fully supported?

Tue Sep 28, 2021 1:41 pm

Seemingly __atomic_test_and_set() is (erroneously) built in the compiler.
You could write a macro with that name using critical_section or the hardware spinlocks directly.
-Michael

kilograham
Raspberry Pi Engineer & Forum Moderator
Posts: 956
Joined: Fri Apr 12, 2019 11:00 am
Location: austin tx

Re: C++ atomic not fully supported?

Tue Sep 28, 2021 3:42 pm

mschnell wrote:
Tue Sep 28, 2021 1:41 pm
Seemingly __atomic_test_and_set() is (erroneously) built in the compiler.
You could write a macro with that name using critical_section or the hardware spinlocks directly.
-Michael
yeah i looked yesterday, and there was an issue from 2015 to have GCC on CortexM0+ farm it out to a function so at least someone could implement it where possible, but it doesn't look like that ever happened.

mschnell
Posts: 70
Joined: Wed Jul 28, 2021 10:33 am
Location: Krefeld, Germany

Re: C++ atomic not fully supported?

Thu Sep 30, 2021 8:13 am

Maybe at that time there was no widespread hardware that would allow for implementing such a function. But now there is :) ...
-Michael

mschnell
Posts: 70
Joined: Wed Jul 28, 2021 10:33 am
Location: Krefeld, Germany

Re: C++ atomic not fully supported?

Thu Sep 30, 2021 8:20 am

Mind, that even a simple instruction as

x |= n;

is not atomic and hence

n = 1 << core_or_thread_id_or_mainline_vs_interrupt_code;
x |= n;

is erroneous code, even if the data bits are used unrelated.

Using bit field types hides this fact even more, as a simple assignment triggers that error.
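To make the failure concrete, here is a deterministic replay of the race, with the "interrupt" written inline at the point where it would preempt the main code. It is a sketch for illustration only; lost_update_demo is a made-up name and the three commented steps stand for the LDR, ORR and STR the compiler would emit for a single |= statement.

```cpp
#include <cassert>
#include <cstdint>

// Replays "x |= 1u << 0" in the main code being interrupted by a handler
// that does "x |= 1u << 1" between the load and the store.
std::uint32_t lost_update_demo() {
    std::uint32_t x = 0;

    std::uint32_t tmp = x;  // main: LDR
    x |= 1u << 1;           // interrupt runs here and sets bit 1 completely
    tmp |= 1u << 0;         // main: ORR, working on the stale copy
    x = tmp;                // main: STR, overwriting the interrupt's bit

    return x;               // only bit 0 is set, bit 1 has been lost
}
```

The two sides never touch the same bit, yet the handler's update disappears because both share the same word.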

-Michael

kilograham
Raspberry Pi Engineer & Forum Moderator
Posts: 956
Joined: Fri Apr 12, 2019 11:00 am
Location: austin tx

Re: C++ atomic not fully supported?

Thu Sep 30, 2021 1:16 pm

fortunately it is very easy to write multi-core/IRQ safe code using the primitives in pico/sync (or hardware/sync)

carlk3
Posts: 82
Joined: Wed Feb 17, 2021 8:46 pm

Re: C++ atomic not fully supported?

Thu Sep 30, 2021 5:38 pm

I use it as part of an idiom for "lazy initialization" in a multi-threaded context:

Code: Select all

void initialize_once() {

    static bool initialized;
    // bool __atomic_test_and_set (void *ptr, int memorder)
    // This built-in function performs an atomic test-and-set operation on the
    // byte at *ptr. The byte is set to some implementation defined nonzero
    // “set” value and the return value is true if and only if the previous
    // contents were “set”.
    if (__atomic_test_and_set(&initialized, __ATOMIC_SEQ_CST))
        return;
        
    // Otherwise, do the initialization...
What's a better idiom for the Pico? critical_section or mutex?

ejolson
Posts: 8243
Joined: Tue Mar 18, 2014 11:47 am

Re: C++ atomic not fully supported?

Thu Sep 30, 2021 5:45 pm

carlk3 wrote:
Thu Sep 30, 2021 5:38 pm
I use it as part of an idiom for "lazy initialization" in a multi-threaded context:

Code: Select all

void initialize_once() {

    static bool initialized;
    // bool __atomic_test_and_set (void *ptr, int memorder)
    // This built-in function performs an atomic test-and-set operation on the
    // byte at *ptr. The byte is set to some implementation defined nonzero
    // “set” value and the return value is true if and only if the previous
    // contents were “set”.
    if (__atomic_test_and_set(&initialized, __ATOMIC_SEQ_CST))
        return;
        
    // Otherwise, do the initialization...
What's a better idiom for the Pico? critical_section or mutex?
Does your present code attempt to handle the case where initialisation is in progress but not finished? I don't see that in your code snippet. In particular, you might have a race even if gcc atomic functions were actually atomic on the Pico.

kilograham
Raspberry Pi Engineer & Forum Moderator
Posts: 956
Joined: Fri Apr 12, 2019 11:00 am
Location: austin tx

Re: C++ atomic not fully supported?

Thu Sep 30, 2021 9:10 pm

really depends how you are feeling.

seems like overkill to have a single mutex to protect this one variable - you could instead just use a single (auto-init) mutex instead of the variable at the cost of a few bytes, and then use mutex_try_enter() to detect the first entrant, and never release it.

a critical section is fine (just a wrapper around a spin lock)... you could also use hw_claim_lock/hw_claim_unlock which just acquire a spin lock set aside for initialization style things (i.e. hw claiming, but can be used by other things).

P.S. it makes me think it might be nice to add an auto-init semaphore too (auto-init here means that you can define one statically and have it ready to go without explicitly calling an init method first from non racing code).

kilograham
Raspberry Pi Engineer & Forum Moderator
Posts: 956
Joined: Fri Apr 12, 2019 11:00 am
Location: austin tx

Re: C++ atomic not fully supported?

Thu Sep 30, 2021 9:13 pm

note i was about to write an example, but then i realized your initialization example actually returns for subsequent callers before the initialization may have completed, which seems problematic.
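One way around that, sketched for a desktop compiler: hold a lock for the whole initialization so a second caller blocks until the first has finished, rather than returning early. This is a minimal sketch under assumptions, not the SDK way of doing it. std::mutex is a host stand-in; on the Pico the natural substitute would presumably be an auto_init_mutex with mutex_enter_blocking/mutex_exit. do_initialization and g_init_runs are placeholders so the behaviour can be observed.

```cpp
#include <cassert>
#include <mutex>

static std::mutex g_init_mutex;   // stand-in for an SDK mutex
static bool g_initialized = false;
static int g_init_runs = 0;       // only here to observe the behaviour

// Hypothetical placeholder for the real initialization work.
static void do_initialization() { ++g_init_runs; }

void initialize_once() {
    // The lock is held until initialization is complete, so a caller that
    // arrives while it is in progress waits here instead of returning early.
    std::lock_guard<std::mutex> guard(g_init_mutex);
    if (g_initialized)
        return;
    do_initialization();
    g_initialized = true;
}
```

Every call pays for the lock, which seems a fair price for a function that is called rarely.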

mschnell
Posts: 70
Joined: Wed Jul 28, 2021 10:33 am
Location: Krefeld, Germany

Re: C++ atomic not fully supported?

Fri Oct 01, 2021 8:37 am

Dedicating one of the 32 hardware spin locks to "C language use", and first implement would-be builtin stuff like __atomic_test_and_set() as macros, and at some later point in time, empower the compiler appropriately, seems like a way to go to overcome the shortcoming of the M0+ core on that behalf.


OTOH there often are workarounds:
I wanted to do a performance meter by counting the rounds of a main loop for a second.
This seems doable like this:

Code: Select all

volatile uint32_t cntr = 0;

Main: 
while (true) {
   ...
   ++cntr;
}

Interrupt:
test = cntr;
cntr = 0;
...
But this produces nonsense, as the interrupt can happen right in the middle of the non-atomic "++cntr;" line, in which case the reset to zero is lost.

but you can do

Code: Select all

volatile uint32_t cntr = 0;
volatile bool flag = false;

Main: 
while (true) {
  ...
  if (flag) {
     cntr = 0;
     flag = false;  
   } else {
     ++cntr;
  }   
}

Interrupt :
test = cntr;
cntr = 0;  // in case the loop does not run at all
flag = true;
...
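The same scheme as compilable functions, so it can be stepped through on a desktop machine. This is a sketch for illustration: main_loop_step and timer_interrupt are made-up names standing for the loop body and the once-per-second handler, and the increment is written out as cntr = cntr + 1 because ++ on a volatile is deprecated in C++20.

```cpp
#include <cassert>
#include <cstdint>

volatile std::uint32_t cntr = 0;
volatile bool flag = false;

// One pass of the main loop body.
void main_loop_step() {
    if (flag) {
        cntr = 0;         // complete the reset requested by the interrupt
        flag = false;
    } else {
        cntr = cntr + 1;  // still LDR/ADD/STR, but a stale store is
                          // corrected by the reset on the next pass
    }
}

// Body of the once-per-second interrupt; returns the sampled count.
std::uint32_t timer_interrupt() {
    std::uint32_t test = cntr;
    cntr = 0;             // in case the loop does not run at all
    flag = true;
    return test;
}
```

The pass that handles the flag is not counted, so each reading is one lower than the true number of loop rounds.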
-Michael
