Heater
Posts: 19683
Joined: Tue Jul 17, 2012 3:02 pm

Re: Countdown using unsigned in C

Sun Sep 24, 2023 6:44 pm

jahboater wrote:
Sun Sep 24, 2023 4:07 pm
I have been away all day so missed all the UB fun ...
Shame. We have been having a great time :)
jahboater wrote:
Sun Sep 24, 2023 4:07 pm
The compiler moves things around in static memory to save space (fill in gaps with smaller items) and secondly for correct alignment.
It's only luck if they are adjacent and b comes after a with no gap.
I don't know if things are moved around much on the stack but I would expect gaps for alignment.
All very true. I think the big thing here is that those arrays when declared globally are placed one after the other in ascending memory. But when declared locally they are on the stack. The stack grows downwards. So the order is reversed.
jahboater wrote:
Sun Sep 24, 2023 4:07 pm
Does the standard say you can't compare pointers that refer to two different objects ?
I just checked. The standard says:
Section 6.5.9 Equality operators

7 Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
So it seems you can compare them but they are only sure to be equal given the conditions above.
jahboater wrote:
Sun Sep 24, 2023 4:07 pm
A pointer is valid within the object and one past the end of the object.
You are comparing a pointer to the second block to one byte before the start of that block; I'm sure that's not allowed.
Yeah. I think that is the UB with my example.
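
For reference, a minimal sketch of the rule in play here. Section 6.5.9 covers only == and !=; the > in the loop falls under 6.5.8 (relational operators, C11/C17 numbering), where comparing pointers into two different array objects is undefined. A reverse walk that stays inside one array avoids the problem. The helper name print_reversed is made up for this illustration:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: prints the n elements of arr from last to first
   and returns how many were printed. Every pointer stays within
   [arr, arr + n], so each comparison is between pointers into the
   same array. */
size_t print_reversed(const int *arr, size_t n)
{
    size_t printed = 0;
    const int *p = arr + n;   /* one past the end: valid to form, not to dereference */

    while (p > arr) {         /* same array, so the relational compare is defined */
        --p;                  /* step back before use; never goes below arr */
        printf("%d, \n", *p);
        printed++;
    }
    return printed;
}
```

Called as print_reversed(a, 4) on the array a from the earlier example, this touches only a and makes no assumption about where b lives.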
Last edited by Heater on Sun Sep 24, 2023 7:36 pm, edited 1 time in total.
Slava Ukrayini.

Heater

Sun Sep 24, 2023 6:51 pm

ejolson wrote:
Sun Sep 24, 2023 4:35 pm
In my opinion, allowing GCC to make such assumptions should be relegated to the strict standards-compliant mode.
Bu, but, it's following the standard that causes the problems. Things are just undefined. It's a mess.
ejolson wrote:
Sun Sep 24, 2023 4:35 pm
The GNU default should include -fwrapv for the practical reason of being less surprising. The main difficulty with that idea is nobody asked me.
Is it less surprising?

<anecdote>
A few years back I found a bug in openssl when trying to make some security certificates. It worked on my old 32 bit Linux installation PC but not on my new 64 bit one. Turned out to be an overflow problem.

I was somewhat surprised that software trusted by millions for security should have such a simple problem. So simple I found it!
</anecdote>

jahboater
Posts: 8829
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Sun Sep 24, 2023 7:22 pm

Heater wrote:
Sun Sep 24, 2023 6:51 pm
A few years back I found a bug in openssl when trying to make some security certificates. It worked on my old 32 bit Linux installation PC but not on my new 64 bit one. Turned out to be an overflow problem.
I have recently found a bug in the GCC runtime library.
If I do a checked signed multiply of two large values that will overflow, it works perfectly on 64-bit platforms, but aborts when compiled with -ftrapv on 32-bits.

-ftrapv is detecting overflow within the function GCC uses to do the 64-bit multiply on a 32-bit machine.

-ftrapv is supposed to ignore checked arithmetic even when it overflows.
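
A minimal sketch of what a checked multiply can look like, assuming it refers to the GCC/Clang builtin __builtin_mul_overflow (the post does not say which mechanism was used). The wrapper name checked_mul_i64 is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wrapper: returns true and stores the product in *out when
   a * b fits in int64_t; returns false when the multiplication overflows. */
bool checked_mul_i64(int64_t a, int64_t b, int64_t *out)
{
    /* GCC/Clang builtin: its return value is true when the result overflowed. */
    return !__builtin_mul_overflow(a, b, out);
}
```

The builtin reports overflow through its return value instead of relying on signed overflow in the source, which is why it is expected to be safe alongside -ftrapv.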
Last edited by jahboater on Sun Sep 24, 2023 7:35 pm, edited 2 times in total.

jahboater

Sun Sep 24, 2023 7:31 pm

Heater wrote:
Sun Sep 24, 2023 6:44 pm
All very true. I think the big thing here is that those arrays when declared globally are placed one after the other in ascending memory. But when declared locally they are on the stack. The stack grows downwards. So the order is reversed.
Ah yes.

As a side note, in static memory, GCC seems to move things around (to fill in gaps I presume).
So if there is a large gap caused by an alignment requirement, GCC might place other small things in there.
Makes sense.

I never make any assumptions about the relative positions of two objects in memory.

I see GCC is getting very good at spotting adjacent objects and using "load pair" instructions to get them both very quickly (ARM64).
Maybe it even shuffles things around so it can use load/store pair, I don't know.
But that's the compiler's privilege!
Last edited by jahboater on Sun Sep 24, 2023 7:35 pm, edited 1 time in total.

Heater

Sun Sep 24, 2023 7:35 pm

jahboater wrote:
Sun Sep 24, 2023 7:22 pm
-ftrapv is detecting overflow within the function GCC uses to do the 64-bit multiply on a 32-bit machine.
Ha! Great.

Slightly off topic but:

<anecdote>
Back in the day, when the Intel 286 was new on the block, a software engineer at the next desk to me at Northern Telecom was tasked with writing something that would test all the instructions of that new chip. Which he did, in assembler of course. He came to me one day, scratching his head, and asked what I thought about what he was seeing. Something was wrong, was it the chip or was it his code? Turned out he had discovered that if one did a multiply (MUL) by an immediate value that happened to be negative and you were running in protected mode it always produced the wrong result.

Soon we had a document, under NDA from Intel, detailing all the hundreds of bugs Intel was aware of in the 286. Sure enough our bug was there.
</anecdote>

jahboater

Sun Sep 24, 2023 7:40 pm

Heater wrote:
Sun Sep 24, 2023 7:35 pm
Soon we had a document, under NDA from Intel, detailing all the hundreds of bugs Intel was aware of in the 286. Sure enough our bug was there.
Wow!

That's quite a job even for the 286.
I bet no one ever does that for the modern Intel chips with all their countless instruction sets and hideously complex AVX512 stuff.

Heater

Sun Sep 24, 2023 7:52 pm

My recollection was that most of the bugs in that Intel document were about things breaking protected mode. Basically they would be security vulnerabilities today. Given that by far the most 286-based computers were still running MS-DOS, in "real" mode, nobody noticed or cared. Northern Telecom was shipping Unix servers based on the 286 (MS Xenix) so they were interested.

swampdog
Posts: 1193
Joined: Fri Dec 04, 2015 11:22 am

Mon Sep 25, 2023 7:46 am

Heater wrote:
Sun Sep 24, 2023 1:46 pm
swampdog wrote:
Sun Sep 24, 2023 1:17 pm
We don't get to your "while (a_p > b_p)" loop though because the arrays are not adjacent. Program has already gone wrong. Surely this is enough..
Sure we do. On my machine with my Clang those arrays are adjacent and this code works as one might naively expect. Here is the code with some added comments re: memory layout of the arrays:

Code: Select all

#include <stdio.h>

int main()
{
    int a[] = {1, 2, 3, 4};     // Higher on stack
    int b[] = {10, 20, 30, 40}; // Lower  on stack
    // In memory
    //  10, 20, 30, 40, 1, 2, 3, 4
    //               ^           ^
    //               b_p         a_p
    int *a_p = &a[3];
    int *b_p = &b[3];

    // Display array a in reverse
    while (a_p > b_p)
    {
        printf("%d, \n", *a_p);
        a_p--;
    }
}
Produces:

Code: Select all

✗ clang  -Wall -Wextra -O3 test.c
✗ ./a.out                        
4, 
3, 
2, 
1, 
But this code has UB. Why?
I've already said. You're relying on UB to make a point about later UB and I've already said two versions of gcc (not even different compilers or architectures) have messed up your example. gcc 12.3.0 yields..

Code: Select all

        .loc 1 5 9
        movl    $1, -32(%rbp)
        movl    $2, -28(%rbp)
        movl    $3, -24(%rbp)
        movl    $4, -20(%rbp)
        .loc 1 6 9
        movl    $10, -48(%rbp)
        movl    $20, -44(%rbp)
        movl    $30, -40(%rbp)
        movl    $40, -36(%rbp)
..whereas gcc 11.4.0 yields..

Code: Select all

        .loc 1 5 9
        movl    $1, -48(%rbp)
        movl    $2, -44(%rbp)
        movl    $3, -40(%rbp)
        movl    $4, -36(%rbp)
        .loc 1 6 9
        movl    $10, -32(%rbp)
        movl    $20, -28(%rbp)
        movl    $30, -24(%rbp)
        movl    $40, -20(%rbp)
..which means no output for me on the latter ('a' and 'b' are stacked opposite to what you say).

Quiz #3:

When I make the arrays global the same loop does not work. Why? :

Code: Select all

#include <stdio.h>

int a[] = {1, 2, 3, 4};
int b[] = {10, 20, 30, 40};

int main()
{
    int *a_p = &a[3];
    int *b_p = &b[3];

    // Display array a in reverse
    while (a_p > b_p)
    {
        printf("%d, \n", *a_p);
        a_p--;
    }
}
Same reason. If a valid reason for doing the above could be concocted then surely you'd emphasise the point..

Code: Select all

#include <stdio.h>

typedef struct { 
int a[4];
int b[4];
} /*__attribute__((packed))*/ S;

int main()
{
    S s;
    int *a_p = &s.a[3];
    int *b_p = &s.b[3];

    // Display array a in reverse
    while (a_p > b_p)
    {
        printf("%d, \n", *a_p);
        a_p--;
    }
}
..which incidentally, produces no output under both versions of x86 gcc. I've just checked it on a 64bit rpi, gcc 10.2.1 and gcc 12.3.0 agree. At least we now have consistently broken behaviour. Ditto x86 and aarch64 clang 16.0.6 (neither platform has a system clang).

I can't think of a scenario where one might write your example. I thought you were going to elaborate with some check along the lines of "&b[-1] == &a[3]" and/or "&a[4] == &b[0]". :-)

jahboater

Mon Sep 25, 2023 9:33 am

swampdog wrote:
Mon Sep 25, 2023 7:46 am
gcc 12.3.0 yields..

Code: Select all

        .loc 1 5 9
        movl    $1, -32(%rbp)
        movl    $2, -28(%rbp)
        movl    $3, -24(%rbp)
        movl    $4, -20(%rbp)
        .loc 1 6 9
        movl    $10, -48(%rbp)
        movl    $20, -44(%rbp)
        movl    $30, -40(%rbp)
        movl    $40, -36(%rbp)
..whereas gcc 11.4.0 yields..

Code: Select all

        .loc 1 5 9
        movl    $1, -48(%rbp)
        movl    $2, -44(%rbp)
        movl    $3, -40(%rbp)
        movl    $4, -36(%rbp)
        .loc 1 6 9
        movl    $10, -32(%rbp)
        movl    $20, -28(%rbp)
        movl    $30, -24(%rbp)
        movl    $40, -20(%rbp)
GCC 13.2 doesn't bother creating the array "b" at all!!

Heater

Mon Sep 25, 2023 10:35 am

swampdog wrote:
Mon Sep 25, 2023 7:46 am
I've already said. You're relying on UB to make a point about later UB
Yes I am. Just so that I could ask the question "Where is the UB?"
swampdog wrote:
Mon Sep 25, 2023 7:46 am
and I've already said two versions of gcc (not even different compilers or architectures) have messed up your example. gcc 12.3.0 yields..
.....which means no output for me on the latter ('a' and 'b' are stacked opposite to what you say).
Quite so. The program is a demonstration of UB. Which does not mean it will never work as the author (wrongly) hoped. So no surprise it works for me but not for you.
swampdog wrote:
Mon Sep 25, 2023 7:46 am
Same reason. If a valid reason for doing the above could be concocted then surely you'd emphasise the point..
I'm not sure what you mean. For sure yes the same reason, the code has UB in the loop comparison. The example was concocted to demonstrate said UB. As it happens given a naive understanding of local variables on stacks, that stacks grow downward, and not knowing the UB rules one might expect that code to work. Which it does, by luck, in my case.
swampdog wrote:
Mon Sep 25, 2023 7:46 am
..which incidentally, produces no output under both versions of x86 gcc. I've just checked it on a 64bit rpi, gcc 10.2.1 and gcc 12.3.0 agree. At least we now have consistently broken behaviour. Ditto x86 and aarch64 clang 16.0.6 (neither platform has a system clang).
Oddly enough on my Jetson with GCC 7.5.0 it works with the arrays declared locally or globally. Go figure.
swampdog wrote:
Mon Sep 25, 2023 7:46 am
I can't think of a scenario where one might write your example
Me neither.

However such code does get written. Pointer comparisons are misused all the time. All over the net I have read many times statements like "C is just a glorified assembler". I think anyone with that attitude may well assume that pointers are just addresses, which are just numbers that can be compared how you like. They might assume that those arrays do indeed come one after the other in memory as they would when written in assembler. And so on. As such they would be writing UB all over the place.
swampdog wrote:
Mon Sep 25, 2023 7:46 am
I thought you were going to elaborate with some check along the lines of "&b[-1] == &a[3]" and/or "&a[4] == &b[0]". :-)
Good idea. I leave that as an exercise to the reader... :)

Heater

Mon Sep 25, 2023 11:04 am

OK guys. What about this example:

Code: Select all

int main()
{
    int a[] = {0xa0, 0xa1, 0xa2, 0xa3}; // Higher on stack
    int b[] = {0xb0, 0xb1, 0xb2, 0xb3}; // Lower  on stack
    // In memory
    //  b0, b1, b2, b3, a0, a1, a2, a3
    //  ^               ^
    //  b_p             a_p              Loop initialization
    //                  ^
    //                  b_p              Loop termination.
    //                  a_p

    int *a_p = &a[0];
    int *b_p = &b[0];

    // Display array b in forward order
    while (a_p != b_p)
    {
        printf("%x, \n", *b_p);
        b_p++;
    }

    return 0;
}
As far as I can tell there is no UB in here. At least from looking at C23 standard 6.5.9 para 7. Which I can break down and apply to the code as follows:

// 7 Two pointers compare equal if and only if:

both are null pointers, (No)
both are pointers to the same object
(including a pointer to an object and a subobject at its beginning) (No)
or function, (No)
both are pointers to one past the last element of the same array object, (No)
or one is a pointer to one past the end of one array object and (Yes)
the other is a pointer to the start of a different array object (Yes)
that happens to immediately follow the first array object in the address space. (yes)

In my code a_p still points to the start of an object, b_p points to one past the end of a different object and they happen to be consecutive in memory. So all is well. Right?

And yet, this code works as expected with Clang on my MacBook Pro. But it fails on my Jetson Nano with GCC 7.5.0.

Moving the arrays to global causes it to fail on the MacBook (as expected). Still fails on the Jetson.

Presumably I still have UB here. But where? I have no idea.

jahboater

Mon Sep 25, 2023 11:21 am

Heater wrote:
Mon Sep 25, 2023 11:04 am
OK guys. What about this example:

As far as I can tell there is no UB in here.
The %x is wrong, but I doubt it makes any difference.
:) The standard clearly states that positive signed integers are a subset of unsigned integers :).

Code: Select all

try.c:21:18: error: format '%x' expects argument of type 'unsigned int', but argument 2 has type 'int' [-Werror=format=]
   21 |         printf("%x, \n", *b_p);
      |                 ~^       ~~~~
      |                  |       |
      |                  |       int
      |                  unsigned int
      |                 %x
GCC messages are very well presented these days.

Heater

Mon Sep 25, 2023 11:36 am

jahboater wrote:
Mon Sep 25, 2023 11:21 am
GCC messages are very well presented these days.
Are they? Neither Clang on my MacBook nor GCC 7.5 on my Jetson errored/warned about that %x. Neither does x86-64 gcc 13.2 on Godbolt: https://godbolt.org/z/qar6Md33G. Adding -Wall -Wextra does not do it. What magical compiler do you have?

Heater

Mon Sep 25, 2023 1:04 pm

Also, what the hell is wrong with using %x to display signed integers?

Hexadecimal is a perfectly normal way to write down signed integer values. Especially now that the standard says that two's comp. is used so we know exactly what we are looking at and where the sign is.

Paeryn
Posts: 3588
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Mon Sep 25, 2023 1:26 pm

Heater wrote:
Mon Sep 25, 2023 11:04 am
OK guys. What about this example:

Code: Select all

int main()
{
    int a[] = {0xa0, 0xa1, 0xa2, 0xa3}; // Higher on stack
    int b[] = {0xb0, 0xb1, 0xb2, 0xb3}; // Lower  on stack
    // In memory
    //  b0, b1, b2, b3, a0, a1, a2, a3
    //  ^               ^
    //  b_p             a_p              Loop initialization
    //                  ^
    //                  b_p              Loop termination.
    //                  a_p

    int *a_p = &a[0];
    int *b_p = &b[0];

    // Display array b in forward order
    while (a_p != b_p)
    {
        printf("%x, \n", *b_p);
        b_p++;
    }

    return 0;
}
As far as I can tell there is no UB in here. At least from looking at C23 standard 6.5.9 para 7. Which I can break down and apply to the code as follows:

// 7 Two pointers compare equal if and only if:

both are null pointers, (No)
both are pointers to the same object
(including a pointer to an object and a subobject at its beginning) (No)
or function, (No)
both are pointers to one past the last element of the same array object, (No)
or one is a pointer to one past the end of one array object and (Yes)
the other is a pointer to the start of a different array object (Yes)
that happens to immediately follow the first array object in the address space. (yes)
There is nothing to say that either a or b immediately follows the other in address space, nor is there any guarantee of which has the lower address. You can never rely on two variables being immediately next to each other in address space, nor which has the lower base address, such ordering is down to the whims of the compiler (unless they are members of a struct and the struct has no padding bytes between members).

Usually compilers have a default mode such as globals being allocated in declared order with increasing addresses, automatic variables being allocated in declared order with decreasing addresses, but the compiler is free to re-order variables to make use of otherwise wasted alignment padding or to put oft paired variables next to each other so they can be in the same cache line or for any other reason.
Heater wrote:
Mon Sep 25, 2023 11:04 am
In my code a_p still points to the start of an object, b_p points to one past the end of a different object and they happen to be consecutive in memory. So all is well. Right?
Did you check to see that the addresses were actually as you assumed?
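
One way to check, sketched under the assumption that uintptr_t is available (it is optional in the standard but present on mainstream platforms). The helper name starts_at_end_of is made up; it prints both addresses and compares them as integers, which sidesteps the pointer comparison rules but relies on the implementation-defined pointer-to-integer mapping, so treat it as a diagnostic only:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if `second` starts exactly where the
   n-element array `first` ends, 0 otherwise. Addresses are compared as
   integers, not as pointers. */
int starts_at_end_of(const int *first, size_t n, const int *second)
{
    uintptr_t end_of_first = (uintptr_t)(first + n);   /* one past the end is valid to form */
    uintptr_t start_of_second = (uintptr_t)second;

    printf("first:  %p .. %p\n", (const void *)first, (const void *)(first + n));
    printf("second: %p\n", (const void *)second);

    return end_of_first == start_of_second;
}
```

Calling starts_at_end_of(b, 4, a) in the example above would show whether a really begins where b ends on a given compiler and optimisation level. The answer can change between builds.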
Heater wrote:
Mon Sep 25, 2023 11:04 am
And yet, this code works as expected with Clang on my MacBook Pro. But it fails on my Jetson Nano with GCC 7.5.0.

Moving the arrays to global causes it to fail on the MacBook (as expected). Still fails on the Jetson.

Presumably I still have UB here. But where? I have no idea.
She who travels light — forgot something.

jahboater

Mon Sep 25, 2023 3:41 pm

Heater wrote:
Mon Sep 25, 2023 11:36 am
jahboater wrote:
Mon Sep 25, 2023 11:21 am
GCC messages are very well presented these days.
Are they?
I thought the message layout was helpful.
Heater wrote:
Mon Sep 25, 2023 11:36 am
Neither Clang on my MacBook nor GCC 7.5 on my Jetson errored/warned about that %x. Neither does x86-64 gcc 13.2 on Godbolt: https://godbolt.org/z/qar6Md33G. Adding -Wall -Wextra does not do it. What magical compiler do you have?
If you want your code to be squeaky clean and correct, this doesn't seem unreasonable:

Code: Select all

-Wformat-signedness
     If -Wformat is specified, also warn if the format string requires an unsigned argument
     and the argument is signed and vice versa.
There are countless optional warnings about printf formats.

An important one computes the worst case maximum length of a sprintf output and checks it fits in the buffer.
(not so easy for a human programmer to calculate ... all the floating-point formats etc).
I want to know about things like that.
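
A minimal sketch of a runtime counterpart to that compile-time check, assuming a C99 library: snprintf with a null buffer and size 0 writes nothing and returns the number of characters the output would need. The helper name bytes_needed is hypothetical, and the format string is borrowed from the GCC manual's -Wformat-overflow example:

```c
#include <stdio.h>

/* Hypothetical helper: returns the buffer size (including the terminating
   NUL) needed to format a and b, or -1 if snprintf reports an error. */
int bytes_needed(int a, int b)
{
    /* With a NULL buffer and size 0, snprintf only computes the length. */
    int chars = snprintf(NULL, 0, "a = %i, b = %i\n", a, b);
    if (chars < 0)
        return -1;
    return chars + 1;   /* +1 for the terminating NUL */
}
```

With both values zero this reports 14 bytes, one more than a 13-byte buffer can hold. With 32-bit ints both equal to INT_MIN it reports 34.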

jahboater

Mon Sep 25, 2023 3:44 pm

Heater wrote:
Mon Sep 25, 2023 1:04 pm
Also, what the hell is wrong with using %x to display signed integers?

Hexadecimal is a perfectly normal way to write down signed integer values. Especially now that the standard says that two's comp. is used so we know exactly what we are looking at and where the sign is.
I agree. But it's tradition.
Are not hex literals unsigned ? 0xFFFFFF doesn't need U at the end.

You just have to use a cast: printf("%x, \n", (uint32_t)*b_p);
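
A small sketch of the cast approach. Casting to unsigned int matches %x exactly; uint32_t matches %x only on platforms where uint32_t is unsigned int, and PRIx32 from <inttypes.h> is the portable spelling for that type. The helper name format_hex is hypothetical:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the hex form of a signed int into buf and
   returns snprintf's result. The cast converts the value to unsigned int,
   the type %x expects; negative values wrap modulo UINT_MAX + 1. */
int format_hex(char *buf, size_t size, int value)
{
    return snprintf(buf, size, "%x", (unsigned int)value);
}
```

With this cast in place, -Wformat-signedness has nothing to complain about, and a negative int shows up as its two's complement bit pattern.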

Heater

Mon Sep 25, 2023 3:45 pm

jahboater wrote:
Mon Sep 25, 2023 11:21 am
GCC messages are very well presented these days.
Turns out Clang messages can be very well presented these days. In the course of experimenting I moved my dodgy loop into a function and used %llx :

Code: Select all

void display(int *a_p, int *b_p)
{
    // Display array b in forward order
    while (a_p != b_p)
    {
        printf("%llx, \n", *b_p);
        b_p++;
    }
}
And now I get the warning:

Code: Select all

test.c:15:28: warning: format specifies type 'unsigned long long' but the argument has type 'int' [-Wformat]
        printf("%llx, \n", *b_p);
                ~~~~       ^~~~
                %x
1 warning generated.
So this is telling me to use %x for ints whereas the error you presented was saying do not use %x for ints.

I despair....

Heater

Mon Sep 25, 2023 3:51 pm

Paeryn wrote:
Mon Sep 25, 2023 1:26 pm
There is nothing to say that either a or b immediately follows the other in address space, nor is there any guarantee of which has the lower address. You can never rely on two variables being immediately next to each other in address space, nor which has the lower base address, such ordering is down to the whims of the compiler (unless they are members of a struct and the struct has no padding bytes between members).
Bingo! I'm sure you are right. Things get moved around in memory and it breaks.

Which is an interesting situation.

In the cases that the arrays do follow each other as expected there is no UB and the thing works.

BUT layout is not defined, as you say. Or as the standard would say "implementation defined", so when the layout changes there is UB and the thing fails.

So, I'm not suffering from UB as such. I'm suffering from implementation defined.

Not that it matters much.
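
To illustrate the struct exception Paeryn mentions, a minimal sketch assuming C11 (for _Static_assert). Members of a struct are laid out in declaration order, and offsetof can confirm at compile time that no padding separates them. The struct and function names are made up. This only pins down the layout: a pointer derived from a that walks beyond its one-past-the-end position into b is, to my understanding, still out of bounds by the standard's rules:

```c
#include <stddef.h>

/* Hypothetical struct holding two arrays like those in the examples above. */
struct pair {
    int a[4];
    int b[4];
};

/* Compile-time check: b begins exactly where a ends (no padding). */
_Static_assert(offsetof(struct pair, b) ==
                   offsetof(struct pair, a) + sizeof(int[4]),
               "unexpected padding between a and b");

/* Runtime version of the same check: returns 1 when b directly follows a. */
int b_directly_follows_a(void)
{
    return offsetof(struct pair, b) ==
           offsetof(struct pair, a) + sizeof(int[4]);
}
```

If the layout ever differed, the build would stop at the _Static_assert instead of the loop silently misbehaving.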

jahboater

Mon Sep 25, 2023 3:57 pm

Heater wrote:
Mon Sep 25, 2023 3:45 pm
So this is telling me to use %x for ints whereas the error you presented was saying do not use %x for ints.

I despair....
There are two different problems:

In the first case the format expects an unsigned int and you gave it an int (I agree that shouldn't be much of a problem - it will just print the bit-pattern in hex).

In your second case the format %llx expects a 64-bit integer and you gave it a 32-bit integer which is going to print garbage.

Looks like Clang gave priority to the more serious error!

It's probably undefined behavior to supply arguments that differ from the format specifiers :)

Heater

Mon Sep 25, 2023 4:08 pm

jahboater wrote:
Mon Sep 25, 2023 3:57 pm
It's probably undefined behavior to supply arguments that differ from the format specifiers :)
Ha! It probably is.

But what about all those magic coercions that C does? If I feed a 32 bit int into where a 64 bit int is asked for can it not be silently promoted? Like passing an int to a function that wants long int. Why not for printf as well?

Anyway my clang does not know -Wformat-signedness

jahboater

Mon Sep 25, 2023 4:24 pm

Heater wrote:
Mon Sep 25, 2023 4:08 pm
But what about all those magic coercions that C does? If I feed a 32 bit int into where a 64 bit int is asked for can it not be silently promoted? Like passing an int to a function that wants long int.

Why not for printf as well?
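Because printf is variadic: its prototype declares only the format parameter, so the remaining arguments have no declared parameter types to be converted to. Only the default argument promotions apply (for example float to double), and those never widen an int to long long.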
https://en.cppreference.com/w/c/variadic

Neither the compiler nor printf know what all the "actual" arguments will be.
printf parses the format string at run time and then expects each argument it fetches to exactly match the corresponding specifier.
The compiler parses the format string to validate it for security and check the actual arguments match.
A common source of problems for beginners, and even worse for scanf.
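
A minimal sketch of the same mechanism in a home-made variadic function, to show why no conversion happens. The function name sum_ll is hypothetical; like printf, it has to be told what types to fetch, and the caller has to supply exactly those types:

```c
#include <stdarg.h>

/* Hypothetical variadic function: adds up `count` arguments, each of which
   MUST be passed as long long, because va_arg fetches them as that type. */
long long sum_ll(int count, ...)
{
    va_list ap;
    long long total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, long long);   /* undefined if the caller passed a plain int */
    va_end(ap);

    return total;
}
```

A call such as sum_ll(2, 1LL, (long long)x) is fine. sum_ll(2, 1, 2) compiles without complaint but passes two ints, which is the same mismatch as handing an int to %llx.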

But the compiler now does pretty good static analysis, here are all the possible warnings:

Code: Select all

-Wformat
-Wformat=n
  Check calls to "printf" and "scanf", etc., to make sure that the arguments supplied
  have types appropriate to the format string specified, and that the conversions
  specified in the format string make sense.  This includes standard functions, and
  others specified by format attributes, in the "printf", "scanf", "strftime" and
  "strfmon" (an X/Open extension, not in the C standard) families (or other target-
  specific families).  Which functions are checked without format attributes having been
  specified depends on the standard version selected, and such checks of functions
  without the attribute specified are disabled by -ffreestanding or -fno-builtin.

  The formats are checked against the format features supported by GNU libc version 2.2.
  These include all ISO C90 and C99 features, as well as features from the Single Unix
  Specification and some BSD and GNU extensions.  Other library implementations may not
  support all these features; GCC does not support warning about features that go beyond
  a particular library's limitations.  However, if -Wpedantic is used with -Wformat,
  warnings are given about format features not in the selected standard version (but not
  for "strfmon" formats, since those are not in any version of the C standard).

  -Wformat=1
  -Wformat
      Option -Wformat is equivalent to -Wformat=1, and -Wno-format is equivalent to
      -Wformat=0.  Since -Wformat also checks for null format arguments for several
      functions, -Wformat also implies -Wnonnull.  Some aspects of this level of format
      checking can be disabled by the options: -Wno-format-contains-nul,
      -Wno-format-extra-args, and -Wno-format-zero-length.  -Wformat is enabled by
      -Wall.

  -Wformat=2
      Enable -Wformat plus additional format checks.  Currently equivalent to -Wformat
      -Wformat-nonliteral -Wformat-security -Wformat-y2k.

-Wno-format-contains-nul
  If -Wformat is specified, do not warn about format strings that contain NUL bytes.

-Wno-format-extra-args
  If -Wformat is specified, do not warn about excess arguments to a "printf" or "scanf"
  format function.  The C standard specifies that such arguments are ignored.

  Where the unused arguments lie between used arguments that are specified with $
  operand number specifications, normally warnings are still given, since the
  implementation could not know what type to pass to "va_arg" to skip the unused
  arguments.  However, in the case of "scanf" formats, this option suppresses the
  warning if the unused arguments are all pointers, since the Single Unix Specification
  says that such unused arguments are allowed.

-Wformat-overflow
-Wformat-overflow=level
  Warn about calls to formatted input/output functions such as "sprintf" and "vsprintf"
  that might overflow the destination buffer.  When the exact number of bytes written by
  a format directive cannot be determined at compile-time it is estimated based on
  heuristics that depend on the level argument and on optimization.  While enabling
  optimization will in most cases improve the accuracy of the warning, it may also
  result in false positives.
  
  -Wformat-overflow
  -Wformat-overflow=1
      Level 1 of -Wformat-overflow enabled by -Wformat employs a conservative approach
      that warns only about calls that most likely overflow the buffer.  At this level,
      numeric arguments to format directives with unknown values are assumed to have the
      value of one, and strings of unknown length to be empty.  Numeric arguments that
      are known to be bounded to a subrange of their type, or string arguments whose
      output is bounded either by their directive's precision or by a finite set of
      string literals, are assumed to take on the value within the range that results in
      the most bytes on output.  For example, the call to "sprintf" below is diagnosed
      because even with both a and b equal to zero, the terminating NUL character ('\0')
      appended by the function to the destination buffer will be written past its end.
      Increasing the size of the buffer by a single byte is sufficient to avoid the
      warning, though it may not be sufficient to avoid the overflow.

              void f (int a, int b)
              {
                char buf [13];
                sprintf (buf, "a = %i, b = %i\n", a, b);
              }

  -Wformat-overflow=2
      Level 2 warns also about calls that might overflow the destination buffer given an
      argument of sufficient length or magnitude.  At level 2, unknown numeric arguments
      are assumed to have the minimum representable value for signed types with a
      precision greater than 1, and the maximum representable value otherwise.  Unknown
      string arguments whose length cannot be assumed to be bounded either by the
      directive's precision, or by a finite set of string literals they may evaluate to,
      or the character array they may point to, are assumed to be 1 character long.

      At level 2, the call in the example above is again diagnosed, but this time
      because with a equal to a 32-bit "INT_MIN" the first %i directive will write some
      of its digits beyond the end of the destination buffer.  To make the call safe
      regardless of the values of the two variables, the size of the destination buffer
      must be increased to at least 34 bytes.  GCC includes the minimum size of the
      buffer in an informational note following the warning.

      An alternative to increasing the size of the destination buffer is to constrain
      the range of formatted values.  The maximum length of string arguments can be
      bounded by specifying the precision in the format directive.  When numeric
      arguments of format directives can be assumed to be bounded by less than the
      precision of their type, choosing an appropriate length modifier to the format
      specifier will reduce the required buffer size.  For example, if a and b in the
      example above can be assumed to be within the precision of the "short int" type
      then using either the %hi format directive or casting the argument to "short"
      reduces the maximum required size of the buffer to 24 bytes.

              void f (int a, int b)
              {
                char buf [23];
                sprintf (buf, "a = %hi, b = %i\n", a, (short)b);
              }

-Wno-format-zero-length
  If -Wformat is specified, do not warn about zero-length formats.  The C standard
  specifies that zero-length formats are allowed.

-Wformat-nonliteral
  If -Wformat is specified, also warn if the format string is not a string literal and
  so cannot be checked, unless the format function takes its format arguments as a
  "va_list".

-Wformat-security
  If -Wformat is specified, also warn about uses of format functions that represent
  possible security problems.  At present, this warns about calls to "printf" and
  "scanf" functions where the format string is not a string literal and there are no
  format arguments, as in "printf (foo);".  This may be a security hole if the format
  string came from untrusted input and contains %n.  (This is currently a subset of what
  -Wformat-nonliteral warns about, but in future warnings may be added to
  -Wformat-security that are not included in -Wformat-nonliteral.)

-Wformat-signedness
  If -Wformat is specified, also warn if the format string requires an unsigned argument
  and the argument is signed and vice versa.
  
-Wformat-truncation
-Wformat-truncation=level
  Warn about calls to formatted input/output functions such as "snprintf" and
  "vsnprintf" that might result in output truncation.  When the exact number of bytes
  written by a format directive cannot be determined at compile-time it is estimated
  based on heuristics that depend on the level argument and on optimization.  While
  enabling optimization will in most cases improve the accuracy of the warning, it may
  also result in false positives.  Except as noted otherwise, the option uses the same
  logic as -Wformat-overflow.

  -Wformat-truncation
  -Wformat-truncation=1
      Level 1 of -Wformat-truncation enabled by -Wformat employs a conservative approach
      that warns only about calls to bounded functions whose return value is unused and
      that will most likely result in output truncation.

  -Wformat-truncation=2
      Level 2 warns also about calls to bounded functions whose return value is used and
      that might result in truncation given an argument of sufficient length or
      magnitude.

-Wformat-y2k
  If -Wformat is specified, also warn about "strftime" formats that may yield only a
  two-digit year.

swampdog
Posts: 1193
Joined: Fri Dec 04, 2015 11:22 am

Re: Countdown using unsigned in C

Mon Sep 25, 2023 6:09 pm

Heater wrote:
Mon Sep 25, 2023 10:35 am
swampdog wrote:
Mon Sep 25, 2023 7:46 am
I've already said. You're relying on UB to make a point about later UB
Yes I am. Just so that I could ask the question "Where is the UB?"
swampdog wrote:
Mon Sep 25, 2023 7:46 am
and I've already said two versions of gcc (not even different compilers or architectures) have messed up your example. gcc 12.3.0 yields..
.....which means no output for me on the latter ('a' and 'b' are stacked opposite to what you say).
Quite so. The program is a demonstration of UB. Which does not mean it will never work as the author (wrongly) hoped. So no surprise it works for me but not for you..
swampdog wrote:
Mon Sep 25, 2023 7:46 am
Same reason. If a valid reason for doing the above could be concocted then surely you'd emphasise the point..
I'm not sure what you mean. For sure yes the same reason, the code has UB in the loop comparison. The example was concocted to demonstrate said UB. As it happens given a naive understanding of local variables on stacks, that stacks grow downward, and not knowing the UB rules one might expect that code to work. Which it does, by luck, in my case.
The best (ahem) "valid" reason I could think of today was for two hardware targets where a[] maps onto a set of ports and b[] maps onto another set of ports which are adjacent on the first target but there are other ports between a[] and b[] on the second target. First programmer does..

Code: Select all

typedef struct {
    char a[4];
    char b[4];
} /*__attribute__((packed))*/ S;
..which could conceivably be made to work. Next programmer comes along with..

Code: Select all

typedef struct {
    char a[4];
#ifdef TARGET_B
    char x[2];
#endif
    char b[4];
} /*__attribute__((packed))*/ S;
..which even with my imagination is a tall ask! Your while loop could kick in for TARGET_B.
swampdog wrote:
Mon Sep 25, 2023 7:46 am
..which incidentally, produces no output under both versions of x86 gcc. I've just checked it on a 64bit rpi, gcc 10.2.1 and gcc 12.3.0 agree. At least we now have consistently broken behaviour. Ditto x86 and aarch64 clang 16.0.6 (neither platform has a system clang).
Oddly enough on my Jetson with GCC 7.5.0 it works with the arrays declared locally or globally. Go figure.
swampdog wrote:
Mon Sep 25, 2023 7:46 am
I can't think of a scenario where one might write your example
Me neither.

However such code does get written. Pointer comparisons are misused all the time. All over the net I have read many times statements like "C is just a glorified assembler". I think anyone with that attitude may well assume that pointers are just addresses, which are just numbers that can be compared how you like. They might assume that those arrays do indeed come one after the other in memory as they would when written in assembler. And so on. As such they would be writing UB all over the place.
My NULL prefixed string array post from way back in this thread doesn't look so bad now. ;-)
swampdog wrote:
Mon Sep 25, 2023 7:46 am
. I thought you were going to elaborate with some check along the lines of "&b[-1] == &a[3]" and/or "&a[4] == &b[0]". :-)
Good idea. I leave that as an exercise to the reader... :)
My K&R C compiler really was little more than a glorified assembler so I understand where the idea originated. You really could write..

Code: Select all

void
main()
{
 "Hello"[1] = 'E';
}
..and it would generate code to change 'e' to 'E'. With enough effort..

Code: Select all

int main()
{
 ((char*)("hello"))[1] = 'E';
}
..will compile but damn these pesky compilers for putting it in a read-only section and damn pesky hardware protection (sigsegv). Not allowed any fun any more!

swampdog
Posts: 1193
Joined: Fri Dec 04, 2015 11:22 am

Re: Countdown using unsigned in C

Mon Sep 25, 2023 6:11 pm

jahboater wrote:
Mon Sep 25, 2023 9:33 am
swampdog wrote:
Mon Sep 25, 2023 7:46 am
gcc 12.3.0 yields..

Code: Select all

        .loc 1 5 9
        movl    $1, -32(%rbp)
        movl    $2, -28(%rbp)
        movl    $3, -24(%rbp)
        movl    $4, -20(%rbp)
        .loc 1 6 9
        movl    $10, -48(%rbp)
        movl    $20, -44(%rbp)
        movl    $30, -40(%rbp)
        movl    $40, -36(%rbp)
..whereas gcc 11.4.0 yields..

Code: Select all

        .loc 1 5 9
        movl    $1, -48(%rbp)
        movl    $2, -44(%rbp)
        movl    $3, -40(%rbp)
        movl    $4, -36(%rbp)
        .loc 1 6 9
        movl    $10, -32(%rbp)
        movl    $20, -28(%rbp)
        movl    $30, -24(%rbp)
        movl    $40, -20(%rbp)
GCC 13.2 doesn't bother creating the array "b" at all!!
What? It wants us to initialise things as well as declaring them? Compilers are getting out of hand! :-)

ejolson
Posts: 11785
Joined: Tue Mar 18, 2014 11:47 am

Re: Countdown using unsigned in C

Mon Sep 25, 2023 8:33 pm

swampdog wrote:
Mon Sep 25, 2023 6:09 pm
With enough effort..

Code: Select all

int main()
{
 ((char*)("hello"))[1] = 'E';
}
..will compile but damn these pesky compilers for putting it in a read-only section and damn pesky hardware protection (sigsegv). Not allowed any fun any more!
I think this is currently a problem on the Pico where the compiler puts string constants in flash for functions that are supposed to be fully RAM resident.

Along slightly different lines, here is Fido's string copy routine:

Code: Select all

#include <stdio.h>

void dogcopy(char *x,char *y){
    long int o=x-y;
    do *(y+o)=*y;
    while(*(y++));
}

int main(){
    char s[64];
    dogcopy(s,"scratch woof meow bark");
    puts(s);
}
As shown by

Code: Select all

$ gcc -Wall -Wextra dogcopy.c
$ ./a.out 
scratch woof meow bark
it seems to work.

Even so, I'm suspicious since the dog developer is always exhibiting unexpected behavior.
