Weird return value in strcmp [duplicate]
While checking the return value of the strcmp function, I found some strange behavior in gcc. Here's my code:
#include <stdio.h>
#include <string.h>

char str0[] = "hello world!";
char str1[] = "Hello world!";

int main(void)
{
    printf("%d\n", strcmp("hello world!", "Hello world!"));
    printf("%d\n", strcmp(str0, str1));
    return 0;
}
When I compile this with clang, both calls to strcmp return 32. However, when compiling with gcc, the first call returns 1, and the second call returns 32. I don't understand why the first and second calls to strcmp return different values when compiled using gcc.
Below is my test environment.
The standard only specifies the sign of the result; it does not specify the magnitude. If the strings are equal, the result is zero; if the first string sorts before the second, the result is negative; if the first string sorts after the second, the result is positive. In your example, both 1 and 32 are positive; the results are equivalent as far as the standard is concerned, and your code should be written so that it makes no difference to you, either.
– Jonathan Leffler
Sep 14 '18 at 14:32
Incidentally, the difference between 'h' and 'H' is 32. Coincidence?
– Christian Gibbons
Sep 14 '18 at 14:34
Thanks for the replies, but I already checked the man pages and the ISO document, so I know the exact return value is not specified. I just want to know why GCC gives different results for a string literal and a string stored in an array.
– fips197
Sep 14 '18 at 14:42
Likely duplicate of Inconsistent strcmp() return value when passing strings as pointers or as literals ... TL;DR: both are valid; you are seeing the effects of optimization.
– Shafik Yaghmour
Sep 14 '18 at 15:31
4 Answers
It looks like you didn't enable optimizations (e.g. -O2).
From my tests it looks like gcc always recognizes strcmp with constant arguments and optimizes it, even with -O0 (no optimizations). Clang needs at least -O1 to do so.
That's where the difference comes from: the code produced by clang calls strcmp twice, but the code produced by gcc just does printf("%d\n", 1) in the first case because it knows that 'h' > 'H' (ASCIIbetically, that is). It's just constant folding, really.
Live example: https://godbolt.org/z/8Hg-gI
As the other answers explain, any positive value will do to indicate that the first string is greater than the second, so the compiler optimizer simply chooses 1. The strcmp library function apparently uses a different value.
Although it's interesting that Clang and GCC can be induced to compile the program either such that their respective results produce the same output or such that they don't, I don't like interpreting that as optimization or lack thereof being the reason for the output to differ. It would be better to generalize that to "implementation details", as optimization is only one reason why the results might differ, whether in this specific case or (even more so) in the general case.
– John Bollinger
Sep 14 '18 at 14:51
You can also use -fno-builtin to observe some of these effects
– Shafik Yaghmour
Sep 14 '18 at 15:33
The standard defines the result of strcmp to be negative if lhs appears before rhs in lexical order, zero if they are equal, or a positive value if lhs appears lexically after rhs.
It's up to the implementation how to achieve that and what exactly to return. You must not depend on a specific value in your programs, or they won't be portable. Simply compare the result against zero (< 0, > 0, == 0).
See https://en.cppreference.com/w/c/string/byte/strcmp
Background
One simple implementation might just calculate the difference c1 - c2 of each pair of characters and continue until the result is not zero or one of the strings ends. The result is then the numeric difference between the first pair of characters at which the two strings differ.
For example, this GLibC implementation: https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=string/strcmp.c;hb=HEAD
The strcmp function is only specified to return a value greater than zero, zero, or less than zero. Nothing is specified about what those positive and negative values have to be.
The exact values returned by strcmp in the case of the strings not being equal are not specified. From the man page:
#include <string.h>
int strcmp(const char *s1, const char *s2);
int strncmp(const char *s1, const char *s2, size_t n);
The strcmp() and strncmp() functions return an integer less than,
equal to, or greater than zero if s1 (or the first n bytes thereof) is
found, respectively, to be less than, to match, or be greater than s2.
Since str0 compares greater than str1, the value must be positive, which it is in both cases.
As for the difference between the two compilers, it appears that clang is returning the difference between the ASCII values for the corresponding characters that mismatched, while gcc is opting for a simple -1, 0, or 1. Both are valid, so your code should only need to check if the value is 0, greater than 0, or less than 0.
The interesting thing is that gcc only gave 1 when passing in the string literals. I suspect it may have been an optimization knowing that the result would always be the same.
– Christian Gibbons
Sep 14 '18 at 14:42
What's "weird" about this?
– melpomene
Sep 14 '18 at 14:32