What happens if I write less than 12 bytes to a 12 byte buffer?




Understandably, going over a buffer errors out (or creates an overflow), but what happens if there are less than 12 bytes used in a 12 byte buffer? Is it possible or does the empty trailing always fill with 0s? Orthogonal question that may help: what is contained in a buffer when it is instantiated but not used by the application yet?



I have looked at a few pet programs in Visual Studio and it seems that they are appended with 0s (or null characters) but I am not sure if this is a MS implementation that may vary across language/ compiler.






memset can be used to ensure the buffer is initialized with zeros.

– TruBlu
Sep 17 '18 at 2:14









@TruBlu: Or in C++, std::fill.

– MSalters
Sep 17 '18 at 8:58








@TruBlu Don't do that. I've seen lots of people do malloc followed by memset, or char foo[X] followed by memset, for no good reason. If you want them zero-initialized, use calloc() instead of malloc(), or use char foo[X] = {0}; and it will be zero-initialized.

– hanshenrik
Sep 17 '18 at 13:23








Define "buffer". In general, a 12-byte array is not a data structure that I would call a 12-byte buffer.

– Tom Blodget
Sep 17 '18 at 17:17






@hanshenrik Good to know. Thank you for providing optimized alternatives. malloc() is more efficient than calloc(), therefore it is the preferred method for allocating memory unless zero initialization is required.

– TruBlu
Sep 17 '18 at 19:20






10 Answers



Consider your buffer, filled with zeroes:


[00][00][00][00][00][00][00][00][00][00][00][00]



Now, let's write 10 bytes to it. Values incrementing from 1:


[01][02][03][04][05][06][07][08][09][0A][00][00]



And now again, this time, 4 times 0xFF:


[FF][FF][FF][FF][05][06][07][08][09][0A][00][00]



what happens if there are less than 12 bytes used in a 12 byte buffer? Is it possible or does the empty trailing always fill with 0s?



You write as much as you want; the remaining bytes are left unchanged.
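As a rough sketch of the sequence in the diagrams above, the following C code zeroes a 12-byte buffer, writes 10 bytes, then overwrites the first 4. The names write_prefix and demo_partial_write are made up for this illustration and are not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy len bytes into the start of buf.
   Bytes from buf[len] to buf[cap - 1] are not touched. */
void write_prefix(unsigned char *buf, size_t cap,
                  const unsigned char *src, size_t len)
{
    assert(len <= cap);   /* never write past the end of the buffer */
    memcpy(buf, src, len);
}

/* Hypothetical demo: replays the three steps on a 12-byte buffer. */
void demo_partial_write(unsigned char buf[12])
{
    static const unsigned char ten[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    static const unsigned char four[4] = {0xFF, 0xFF, 0xFF, 0xFF};

    memset(buf, 0, 12);              /* step 1: all zeroes */
    write_prefix(buf, 12, ten, 10);  /* step 2: ten bytes, last two stay 0 */
    write_prefix(buf, 12, four, 4);  /* step 3: only the first four change */
}
```

After the call, bytes 4 through 9 still hold the values from the second step and bytes 10 and 11 are still zero, because nothing wrote to them.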



Orthogonal question that may help: what is contained in a buffer when
it is instantiated but not used by the application yet?



Unspecified. Expect junk left by programs (or other parts of your program) that used this memory before.



I have looked at a few pet programs in Visual Studio and it seems that they are appended with 0s (or null characters) but I am not sure if this is a MS implementation that may vary across language/ compiler.



It is exactly what you think it is. Somebody did that for you this time, but there is no guarantee it will happen again. It could be a compiler flag that attaches cleaning code. Some versions of MSVC fill fresh memory with 0xCD when run in debug but not in release. It can also be a system security feature that wipes memory before giving it to your process (so you can't spy on other apps). Always remember to use memset to initialize your buffer where it matters. Alternatively, state the required compiler flag in the README if you depend on a fresh buffer containing a certain value.





But cleaning is not really necessary. You take a 12 byte-long buffer. You fill it with 7 bytes. You then pass it somewhere - and you say "here is 7 bytes for you". The size of the buffer is not relevant when reading from it. You expect other functions to read as much as you've written, not as much as possible. In fact, in C it is usually not possible to tell how long the buffer is.
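A minimal sketch of that idea in C, assuming a made-up consumer called sum_bytes: the caller passes the number of bytes it actually wrote, and the consumer never looks beyond that count, so the uninitialized tail of the 12-byte buffer is never read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical consumer: it is told how many bytes are valid and
   reads exactly that many, regardless of the buffer's real size. */
unsigned sum_bytes(const unsigned char *data, size_t used)
{
    unsigned total = 0;
    assert(data != NULL || used == 0);
    for (size_t i = 0; i < used; i++)
        total += data[i];
    return total;
}

/* Hypothetical demo: fill 7 of 12 bytes, then hand over "7 bytes". */
unsigned demo_seven_of_twelve(void)
{
    static const unsigned char payload[7] = {1, 2, 3, 4, 5, 6, 7};
    unsigned char buffer[12];   /* the last 5 bytes are never initialized */

    memcpy(buffer, payload, sizeof payload);
    return sum_bytes(buffer, sizeof payload);   /* never reads buffer[7..11] */
}
```

The capacity (12) does not appear in the consumer's interface at all; only the used length does.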



And a side note:



Understandably, going over a buffer errors out (or creates an overflow)



It doesn't, and that's the problem. That's why it's a huge security issue: there is no error and the program tries to continue, so it sometimes executes malicious content it was never meant to. So we had to add a bunch of mechanisms to the OS, like ASLR, that increase the probability of the program crashing and decrease the probability of it continuing with corrupted memory. So never depend on those afterthought guards; watch your buffer boundaries yourself.
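One way to watch the boundaries yourself is to route every write through a function that knows the capacity. This is only a sketch; copy_checked is a hypothetical name, not a standard function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy len bytes only if they fit in dst_cap.
   Returns false, and writes nothing, when the data would overflow. */
bool copy_checked(unsigned char *dst, size_t dst_cap,
                  const unsigned char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    if (len > dst_cap)
        return false;   /* refuse instead of silently overflowing */
    memcpy(dst, src, len);
    return true;
}
```

The caller then has to decide what a refusal means (truncate, grow the buffer, report an error), which is exactly the decision an unchecked memcpy skips.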






You might want to add these details: arrays with static storage duration are initialized to 0 before main is entered. Other arrays, whether local with automatic storage or allocated from the heap with malloc(), have unspecified contents; reading these contents as bytes is OK, but has undefined behavior with most other types. Arrays allocated by calloc() are initialized to all bits zero, which is subtly different from initialized to 0.

– chqrlie
Sep 17 '18 at 15:50





Take the following example (within a block of code, not global):


char data[12];
memcpy(data, "Selbie", 6);



Or even this example:


char* data = new char[12];
memcpy(data, "Selbie", 6);



In both of the above cases, the first 6 bytes of data are S,e,l,b,i, and e. The remaining 6 bytes of data are considered "unspecified" (could be anything).





Is it possible or does the empty trailing always fill with 0s?



Not guaranteed at all. The only allocator that I know of that guarantees zero byte fill is calloc. Example:


char* data = calloc(12,1); // will allocate an array of 12 bytes and zero-init each byte
memcpy(data, "Selbie", 6);



what is contained in a buffer when it is instantiated but not used by the application yet?



As per the most recent C++ standards, the bytes delivered by the allocator are technically considered "unspecified". You should assume that it's garbage data (anything). Make no assumptions about the content.



Debug builds with Visual Studio will often initialize buffers with 0xcc or 0xcd values, but that is not the case in release builds. There are, however, compiler flags and memory allocation techniques for Windows and Visual Studio that guarantee zero-initialized memory allocations, but they are not portable.






"The remaining 6 bytes of data are undefined but will be something." But if they're undefined, isn't it undefined behaviour to try to ascertain that "something"? So it doesn't really matter, and the only solution is never to read uninitialised memory. It's not a case of "randomness" (especially not as an RNG); rather, I would say to assume uninitialised data are poisonous. There's probably an exception to this for reading char types, which can't have trap representations or padding, but it still wouldn't be meaningful or good code to get into a situation of reading the uninitialised part.

– underscore_d
Sep 17 '18 at 7:08









"You should assume that it could be filled with random bytes." - This answer is good, but I want to object to the use of the word "random". Allocating memory and then reading it isn't a good source of randomness.

– ymbirtt
Sep 17 '18 at 8:09






The other "allocator" that zero-fills is for objects with static storage duration. In Unix parlance, they are allocated in the BSS segment (the name historically stands for "block started by symbol"). The object file need only specify the position and length of these variables, because the run-time loader will fill them with zeros. Oh, and it might be worth mentioning that memory checkers exist that will detect attempts to read uninitialised memory - I normally recommend Valgrind, but there might be other choices on the Windows platform.

– Toby Speight
Sep 17 '18 at 8:09







"The remaining 6 bytes of data are undefined but will be something." - No, this is wrong! Accessing uninitialized values is undefined behavior. You cannot write code under the assumption that "well, there's something there, I don't care what, it doesn't matter". The optimizer may completely rearrange your code under the assumption that access to uninitialized values does not happen.

– Sebastian Redl
Sep 17 '18 at 8:32






@Wilson It's an arbitrary value (it's easy to identify visually, though). Different values have different meanings. The reason is to give hints to the developer during debugging as to what went wrong (or simply, which variables have not yet been initialized).

– Arne Vogel
Sep 17 '18 at 10:57



C++ has storage classes including global, automatic and static. The initialization depends on how the variable is declared.


char global[12]; // all 0
static char s_global[12]; // all 0

void foo()
{
    static char s_local[12]; // all 0
    char local[12]; // automatic storage variables are uninitialized, accessing before initialization is undefined behavior
}



Some interesting details here.






More good reading: en.cppreference.com/w/cpp/language/storage_duration

– user4581301
Sep 17 '18 at 2:35






It's tiring to discuss this because misinformation abounds, but as far as the standard is concerned, local is not filled with random rubbish, it's filled with nasal demons. (Reading an uninitialized variable is complete UB.)

– Arne Vogel
Oct 2 '18 at 9:11








Updated to make it clearer that automatic variables are undefined before initialization

– Matthew Fisher
Oct 2 '18 at 12:23




The program knows the length of a string because it ends it with a null-terminator, a character of value zero.



This is why in order to fit a string in a buffer, the buffer has to be at least 1 character longer than the number of characters in the string, so that it can fit the string plus the null-terminator too.



Any space after that in the buffer is left untouched. If there was data there previously, it is still there. This is what we call garbage.
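To make this concrete, here is a small sketch in C that stores a 6-character string in a 12-byte buffer. The function name demo_terminated_string is invented for the example; the point is that the copy needs strlen + 1 bytes and leaves everything after the terminator as it was.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical demo: store "Selbie" in a 12-byte buffer and return
   how many bytes were written, terminator included. */
size_t demo_terminated_string(char buf[12])
{
    const char *text = "Selbie";
    size_t needed = strlen(text) + 1;   /* 6 characters + 1 terminator */

    assert(needed <= 12);               /* the string must fit */
    memcpy(buf, text, needed);          /* copies the '\0' as well */
    return needed;                      /* buf[7..11] were not touched */
}
```

If the caller pre-fills the buffer with some marker byte before the call, the marker is still present in positions 7 to 11 afterwards, while strlen(buf) reports 6.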



It is wrong to assume this space is zero-filled just because you haven't used it yet; you don't know what that particular memory space was used for before your program got to that point. Uninitialized memory should be handled as if what is in it is random and unreliable.






Same applies here as above: Reading uninitialized memory results in undefined behavior, it is not filled with "random" values.

– Arne Vogel
Sep 17 '18 at 11:04






@ArneVogel Oh it's not random at all, but it should be handled as if what is in it is random and unreliable.

– Havenard
Sep 17 '18 at 21:15



All of the previous answers are very good and very detailed, but the OP appears to be new to C programming. So, I thought a Real World example might be helpful.



Imagine you have a cardboard beverage holder that can hold six bottles. It's been sitting around in your garage so instead of six bottles, it contains various unsavory things that accumulate in the corners of garages: spiders, mouse houses, et al.



A computer buffer is a bit like this just after you allocate it. You can't really be sure what's in it, you just know how big it is.



Now, let's say you put four bottles in your holder. Your holder hasn't changed size, but you now know what's in four of the spaces. The other two spaces, complete with their questionable contents, are still there.



Computer buffers are the same way. That's why you frequently see a bufferSize variable to track how much of the buffer is in use. A better name might be numberOfBytesUsedInMyBuffer but programmers tend to be maddeningly terse.



Writing part of a buffer will not affect the unwritten part of the buffer; it will contain whatever was there beforehand (which naturally depends entirely on how you got the buffer in the first place).



As the other answer notes, static and global variables will be initialized to 0, but local variables will not be initialized (and instead contain whatever was on the stack beforehand). This is in keeping with the zero-overhead principle: initializing local variables would, in some cases, be an unnecessary and unwanted run-time cost, while static and global variables are allocated at load-time as part of a data segment.





Initialization of heap storage is at the option of the memory manager, but in general it will not be initialized, either.



In general, it's not at all unusual for buffers to be underfull. It's often good practice to allocate buffers bigger than they need to be. (Trying to always compute an exact buffer size is a frequent source of error, and often a waste of time.)



When a buffer is bigger than it needs to be, that is, when it contains less data than its allocated size, it's obviously important to keep track of how much data is there. In general there are two ways of doing this: (1) with an explicit count, kept in a separate variable, or (2) with a "sentinel" value, such as the '\0' character which marks the end of a string in C.
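The two approaches might look roughly like this in C. The names CAPACITY, counted_buffer, counted_append and sentinel_length are all invented for this sketch; sentinel_length simply mirrors what strlen does.

```c
#include <assert.h>
#include <stddef.h>

#define CAPACITY 12

/* (1) Explicit count: the used length travels with the storage. */
struct counted_buffer {
    unsigned char bytes[CAPACITY];
    size_t used;                 /* how many entries of bytes are valid */
};

void counted_append(struct counted_buffer *b, unsigned char value)
{
    assert(b->used < CAPACITY);  /* refuse to run past the end */
    b->bytes[b->used++] = value;
}

/* (2) Sentinel: the length is found by scanning for the '\0' marker. */
size_t sentinel_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

The count costs an extra variable but allows any byte value in the data; the sentinel costs one buffer entry and forbids that value from appearing in the data itself.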




But then there's the question, if not all of a buffer is in use, what do the unused entries contain?



One answer is, of course, that it doesn't matter. That's what "unused" means. You care about the values of the entries that are used, that are accounted for by your count or your sentinel value. You don't care about the unused values.



There are basically four situations in which you can predict the initial values of the unused entries in a buffer:



1. When you allocate an array (including a character array) with static duration, all unused entries are initialized to 0.

2. When you allocate an array and give it an explicit initializer, all entries not covered by the initializer are initialized to 0.

3. When you call calloc, the allocated memory is initialized to all-bits-0.

4. When you call strncpy with a source string shorter than n, the destination is padded out to size n with '\0' characters.




In all other cases, the unused parts of a buffer are unpredictable, and generally contain whatever they did last time (whatever that means). In particular, you cannot predict the contents of an uninitialized array with automatic duration (that is, one that's local to a function and isn't declared with static), and you cannot predict the contents of memory obtained with malloc. (Some of the time, in those two cases the memory tends to start out as all-bits-zero the first time, but you definitely don't want to ever depend on this.)
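The four predictable situations can be checked with a short C sketch. The helper names all_zero and demo_predictable_cases are hypothetical; the calls to calloc and strncpy are the standard library functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static unsigned char static_buf[12];   /* static duration: starts as all 0 */

/* Hypothetical helper: true if the n bytes at p are all zero. */
bool all_zero(const unsigned char *p, size_t n)
{
    assert(p != NULL);
    for (size_t i = 0; i < n; i++)
        if (p[i] != 0)
            return false;
    return true;
}

/* Hypothetical demo: returns true if every predictable case holds. */
bool demo_predictable_cases(void)
{
    unsigned char with_init[12] = {1, 2, 3};   /* entries 3..11 are zeroed */
    char padded[12];
    unsigned char *heap = calloc(12, 1);       /* all-bits-zero */
    bool ok;

    if (heap == NULL)
        return false;
    strncpy(padded, "Hey", sizeof padded);     /* pads with '\0' up to 12 */

    ok = all_zero(static_buf, sizeof static_buf)
      && all_zero(with_init + 3, 9)
      && all_zero(heap, 12)
      && all_zero((const unsigned char *)padded + 3, 9);
    free(heap);
    return ok;
}
```

No equivalent check is shown for malloc or for an uninitialized local array, since reading those before writing them is exactly what the text above warns against.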








Good point about strncpy: I am tempted to upvote for teaching users about a lesser-known side effect, but also to downvote for implicitly advocating the use of this error-prone function. Too bad I cannot do both, so I shall do neither.

– chqrlie
Sep 17 '18 at 15:53





It depends on the storage class specifier, your implementation, and its settings.
Some interesting examples:
- Uninitialized stack variables may be set to 0xCCCCCCCC
- Uninitialized heap variables may be set to 0xCDCDCDCD
- Uninitialized static or global variables may be set to 0x00000000
- or it could be garbage.
It's risky to make any assumptions about any of this.





I think the correct answer is that you should always keep track of how many chars are written.
Low-level functions like read and write take or return the number of characters read or written. In the same way, std::string keeps track of the number of characters in its implementation.



Declared objects of static duration (those declared outside a function, or with a static qualifier) which have no specified initializer are initialized to whatever value would be represented by a literal zero [i.e. an integer zero, floating-point zero, or null pointer, as appropriate, or a structure or union containing such values]. If the declaration of any object (including those of automatic duration) includes an initializer, portions whose values are specified by that initializer will be set as specified, and the remainder will be zeroed as with static objects.





For automatic objects without initializers, the situation is somewhat more ambiguous. Given something like:


#include <string.h>

unsigned char static1[5], static2[5];

void test(void)
{
    unsigned char temp[5];
    strcpy((char *)temp, "Hey"); /* cast needed: strcpy takes char *, not unsigned char * */
    memcpy(static1, temp, 5);
    memcpy(static2, temp, 5);
}



the Standard is clear that test would not invoke Undefined Behavior, even though it copies portions of temp that were not initialized. The text of the Standard, at least as of C11, is unclear as to whether anything is guaranteed about the values of static1[4] and static2[4], most notably whether they might be left holding different values. A defect report states that the Standard was not intended to forbid a compiler from behaving as though the code had been:




unsigned char static1[5]={1,1,1,1,1}, static2[5]={2,2,2,2,2};

void test(void)
{
    unsigned char temp[4];
    strcpy((char *)temp, "Hey"); /* cast needed: strcpy takes char *, not unsigned char * */
    memcpy(static1, temp, 4);
    memcpy(static2, temp, 4);
}



which could leave static1[4] and static2[4] holding different values. The Standard is silent on whether quality compilers intended for various purposes should behave in that fashion. The Standard also offers no guidance as to how the function should be written if the programmer requires that static1[4] and static2[4] hold the same value, but doesn't care what that value is.




