Weird return value in strcmp [duplicate]




This question already has an answer here:



While checking the return value of the strcmp function, I found some strange behavior in gcc. Here's my code:




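#include <stdio.h>
#include <string.h>

/* Arrays, not single chars, so each can hold the whole string. */
char str0[] = "hello world!";
char str1[] = "Hello world!";

int main(void)
{
    /* Both arguments are string literals. */
    printf("%d\n", strcmp("hello world!", "Hello world!"));
    /* Both arguments are char arrays. */
    printf("%d\n", strcmp(str0, str1));
    return 0;
}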



When I compile this with clang, both calls to strcmp return 32. However, when compiling with gcc, the first call returns 1, and the second call returns 32. I don't understand why the first and second calls to strcmp return different values when compiled using gcc.





Below is my test environment.



This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.






What's "weird" about this?

– melpomene
Sep 14 '18 at 14:32






The standard only specifies the sign of the result; it does not specify the magnitude. If the strings are equal, the result is zero; if the first string sorts before the second, the result is negative; if the first string sorts after the second, the result is positive. In your example, both 1 and 32 are positive; the results are equivalent as far as the standard is concerned, and your code should be written so that it makes no difference to you, either.

– Jonathan Leffler
Sep 14 '18 at 14:32







Incidentally, the difference between 'h' and 'H' is 32. Coincidence?

– Christian Gibbons
Sep 14 '18 at 14:34








Thanks for the replies, but I already checked the man pages and the ISO document, so I know the exact return value is not specified. I just want to know why there's a difference between a string literal and a char array in GCC.

– fips197
Sep 14 '18 at 14:42






Likely duplicate of Inconsistent strcmp() return value when passing strings as pointers or as literals ... TL;DR: both are valid; you are seeing the effects of optimization.

– Shafik Yaghmour
Sep 14 '18 at 15:31





4 Answers



It looks like you didn't enable optimizations (e.g. -O2).





From my tests it looks like gcc always recognizes strcmp with constant arguments and optimizes it, even with -O0 (no optimizations). Clang needs at least -O1 to do so.





That's where the difference comes from: The code produced by clang calls strcmp twice, but the code produced by gcc just does printf("%d\n", 1) in the first case because it knows that 'h' > 'H' (ASCIIbetically, that is). It's just constant folding, really.





Live example: https://godbolt.org/z/8Hg-gI



As the other answers explain, any positive value will do to indicate that the first string is greater than the second, so the compiler optimizer simply chooses 1. The strcmp library function apparently uses a different value.








Although it's interesting that Clang and GCC can be induced to compile the program either such that their respective results produce the same output or such that they don't, I don't like interpreting that as optimization or lack thereof being the reason for the output to differ. It would be better to generalize that to "implementation details", as optimization is only one reason why the results might differ, whether in this specific case or (even more so) in the general case.

– John Bollinger
Sep 14 '18 at 14:51






You can also use -fno-builtin to observe some of these effects.

– Shafik Yaghmour
Sep 14 '18 at 15:33



The standard defines the result of strcmp to be negative, if lhs appears before rhs in lexical order, zero if they are equal, or a positive value if lhs appears lexically after rhs.





It's up to the implementation how to implement that and what exactly to return. You must not depend on a specific value in your programs, or they won't be portable. Simply check with comparisons (<, >, ==).
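As a minimal sketch of that advice: the helper below (cmp_sign is a hypothetical name, not part of any library) collapses whatever strcmp returns to -1, 0, or 1, so a result of 1 from one compiler and 32 from another look identical to the caller. It assumes both arguments are valid NUL-terminated strings.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reduce strcmp's result to its sign only.
 * Returns -1, 0, or 1 regardless of the magnitude strcmp produces. */
static int cmp_sign(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);  /* precondition: valid strings */
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}
```

In most code the helper is unnecessary: writing strcmp(a, b) > 0, strcmp(a, b) < 0, or strcmp(a, b) == 0 directly is already portable.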



See https://en.cppreference.com/w/c/string/byte/strcmp



Background



One simple implementation might just calculate the difference c1 - c2 for each pair of characters and continue until the result is not zero or one of the strings ends. The result is then the numeric difference between the first pair of characters at which the two strings differ.





For example, this GLibC implementation: https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=string/strcmp.c;hb=HEAD
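The byte-difference approach described above can be sketched as follows. This is an illustrative version under the assumption of NUL-terminated inputs, not glibc's actual source, and naive_strcmp is a hypothetical name. Bytes are compared as unsigned char, which is how the C standard defines the comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: walk both strings until the bytes differ or the
 * first string ends, then return the difference of the two bytes. */
static int naive_strcmp(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);  /* precondition: valid strings */
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    while (*p1 != '\0' && *p1 == *p2) {
        p1++;
        p2++;
    }
    return *p1 - *p2;
}
```

With an ASCII character set, comparing "hello world!" against "Hello world!" stops at the first byte and yields 'h' - 'H', which is 32. That would match the value the question reports for the non-folded call.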



The strcmp function is only specified to return a value greater than zero, zero, or less than zero. Nothing specifies what those positive and negative values have to be.





The exact values returned by strcmp in the case of the strings not being equal are not specified. From the man page:




#include <string.h>
int strcmp(const char *s1, const char *s2);
int strncmp(const char *s1, const char *s2, size_t n);



The strcmp() and strncmp() functions return an integer less than,
equal to, or greater than zero if s1 (or the first n bytes thereof) is
found, respectively, to be less than, to match, or be greater than s2.



Since str0 compares greater than str1, the value must be positive, which it is in both cases.





As for the difference between the two compilers, it appears that clang is returning the difference between the ASCII values for the corresponding characters that mismatched, while gcc is opting for a simple -1, 0, or 1. Both are valid, so your code should only need to check if the value is 0, greater than 0, or less than 0.






The interesting thing is that gcc only gave 1 when passing in the string literals. I suspect it may have been an optimization knowing that the result would always be the same.

– Christian Gibbons
Sep 14 '18 at 14:42

