Best practice for parsing data of mixed type?

I am wondering whether there is a known best practice or method for parsing a data packet that contains fields of mixed types.

For instance, let's say the data is 10 bytes, and it consists of:

Byte 0-1: manufacturer ID (int)

Byte 2: type (int)

Byte 3-4: device id (ascii char)

I could simply define each field's size and offset with #define and parse the packet using those defines. But I am wondering whether there is a better way to organise this.

edited Nov 10 '18 at 4:39

melpomene

58.7k54489

asked Nov 10 '18 at 4:34

Jinsuk

346

1

is the data binary or text?
– Swordfish
Nov 10 '18 at 4:42

2

If you know the packet definition ahead of time, why not just define a custom struct with all of the required fields?
– MikeFromCanmore
Nov 10 '18 at 4:44

You have to read the data just like how you wrote it... Since it looks like it is a binary file...
– Ruks
Nov 10 '18 at 4:44

Does data pass through files for other platforms? Post some sample inputs and exported output data packets. Else this is too broad/unclear.
– chux
Nov 10 '18 at 4:44

1

@Jinsuk, "I could simply define each data type size and location as #define, and parse it using those defines" --> I truly wish you had done that, posted that code and then asked about best practices. It would add information and make for a good post. This one is too broad.
– chux
Nov 10 '18 at 5:02

c parsing


2 Answers

Best practice is to assume all data from outside the program (e.g. from the user, from a file, from a network, from a different process) is potentially incorrect (and potentially unsafe/malicious).

Then, based on that assumption of "potential incorrectness", define types that distinguish between "unchecked, potentially incorrect data" and "checked, known correct data". For your example, you could use uint8_t packet[10]; as the data type for unchecked data and a normal structure (with padding and without __attribute__((packed))) for the checked data. This makes it extremely difficult for a programmer to accidentally use unsafe data when they think they're using safe/checked data.

Of course you will also need code to convert between these data types, which needs to do as many sanity checks as possible (and possibly also worry about things like endianness). For your example these checks could be:

are any of the bytes that are supposed to be ASCII characters >= 0x80, and are any of them invalid (e.g. maybe control characters like backspace are not permitted).

is the manufacturer ID valid (e.g. maybe there's an enumeration that it needs to match)

is the type valid (e.g. maybe there's an enumeration that it needs to match)

Note that this function should return some kind of status to indicate if the conversion was successful or not, and in most cases this status should also give an indication of what the problem was if the conversion wasn't successful (so that the caller can inform the user or log the problem or handle the problem in the most suitable way for the problem). For example, maybe "unknown manufacturer ID" means that the program needs to be updated to handle a new manufacturer and that the data was correct, and "invalid manufacturer ID" means that the data was definitely wrong.
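A minimal sketch of such a conversion function in C, under assumptions the question does not state: the manufacturer ID is little-endian on the wire, the valid manufacturer IDs and types are the ones in the two hypothetical enumerations below, and device ID characters must be printable ASCII. Every name here (packet_parse, parse_status, device_info and so on) is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define PACKET_SIZE 10  /* size given in the question; bytes 5-9 are not described there */

/* Hypothetical enumerations; the real valid values are not given in the question. */
enum manufacturer { MFG_ACME = 0x0001, MFG_GLOBEX = 0x0002 };
enum device_type  { TYPE_SENSOR = 1, TYPE_ACTUATOR = 2 };

/* Checked representation: an ordinary struct, no packing needed. */
struct device_info {
    enum manufacturer mfg;
    enum device_type  type;
    char              device_id[3];  /* 2 ASCII chars + terminating NUL */
};

/* Status codes that tell the caller what went wrong. */
enum parse_status {
    PARSE_OK,
    PARSE_TOO_SHORT,
    PARSE_UNKNOWN_MANUFACTURER,
    PARSE_UNKNOWN_TYPE,
    PARSE_BAD_DEVICE_ID
};

/* Convert unchecked bytes into a checked struct; *out is written only on success. */
enum parse_status packet_parse(const uint8_t *raw, size_t len,
                               struct device_info *out)
{
    if (len < PACKET_SIZE)
        return PARSE_TOO_SHORT;

    /* Bytes 0-1: assumed little-endian. Assembling the value with shifts
       makes the result independent of the host's byte order. */
    uint16_t mfg = (uint16_t)(raw[0] | (raw[1] << 8));
    if (mfg != MFG_ACME && mfg != MFG_GLOBEX)
        return PARSE_UNKNOWN_MANUFACTURER;

    /* Byte 2: type. */
    if (raw[2] != TYPE_SENSOR && raw[2] != TYPE_ACTUATOR)
        return PARSE_UNKNOWN_TYPE;

    /* Bytes 3-4: reject anything outside printable ASCII (0x20..0x7E). */
    for (size_t i = 0; i < 2; i++) {
        uint8_t c = raw[3 + i];
        if (c < 0x20 || c > 0x7E)
            return PARSE_BAD_DEVICE_ID;
    }

    out->mfg = (enum manufacturer)mfg;
    out->type = (enum device_type)raw[2];
    out->device_id[0] = (char)raw[3];
    out->device_id[1] = (char)raw[4];
    out->device_id[2] = '\0';
    return PARSE_OK;
}
```

A caller would pass the raw uint8_t buffer and its length, then switch on the returned status to log or report the specific problem. Only a struct device_info that came back with PARSE_OK should be handed to the rest of the program.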

edited Nov 10 '18 at 13:12

answered Nov 10 '18 at 7:21

Brendan

12.2k1330

Haha I saw __attribute__((packed)) and instinctively downvoted - well, reversed now.
– Antti Haapala
Nov 10 '18 at 8:17

but perhaps use uint8_t for input.
– Antti Haapala
Nov 10 '18 at 8:18

@AnttiHaapala: Fixed uint8_t :-)
– Brendan
Nov 10 '18 at 13:12


Like this:

#include <stdint.h>

struct packet {
    uint16_t mfg;
    uint8_t type;
    uint16_t devid;
} __attribute__((packed));

The packed attribute (or your platform's equivalent) is required to avoid implicit padding which doesn't exist in the protocol.

Once you have the above struct, you simply cast (part of) a char array which you received from wherever:

char buf[1000];
/* N is the byte offset of the packet within buf */
struct packet *p = (struct packet *)(buf + N);
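If the pointer cast is a concern (for example because of alignment or aliasing), one possible variant is to copy the bytes into a struct object with memcpy instead. This is only a sketch: it still assumes a compiler that supports __attribute__((packed)), such as GCC or Clang, and that the byte order on the wire matches the host. The helper name read_packet and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct packet {
    uint16_t mfg;
    uint8_t  type;
    uint16_t devid;
} __attribute__((packed));  /* GCC/Clang extension, not standard C */

/* Copy one packet out of buf starting at byte offset n.
   Returns 0 on success, -1 if the packet would run past the end of buf. */
int read_packet(const char *buf, size_t buf_len, size_t n, struct packet *out)
{
    if (n > buf_len || buf_len - n < sizeof *out)
        return -1;
    /* memcpy reads the source as bytes, so the offset n does not
       need to be aligned for uint16_t. */
    memcpy(out, buf + n, sizeof *out);
    return 0;
}
```

The multi-byte fields come out in host byte order, so any byte swapping the protocol needs would still have to happen after the copy.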

answered Nov 10 '18 at 4:54

John Zwinck

150k16175287

Usual suspects: A compliant C compiler may not have any packing ability. Should the packet come from another machine, endianness may differ.
– chux
Nov 10 '18 at 4:58

1

Yea. like this :) ...@chux, Presumably if you know the data packet definition, you know what endianness it uses as well. If you need to, you can do your bit/byte-swapping before casting with the struct.
– MikeFromCanmore
Nov 10 '18 at 5:07

Also, packed is a sure way to cause undefined behaviour even in those compilers that support packing. Casting an array of char to a packet is a violation of the strict aliasing rule.
– Antti Haapala
Nov 10 '18 at 8:20

An example here: stackoverflow.com/a/46790815/918959
– Antti Haapala
Nov 10 '18 at 8:22

@AnttiHaapala: I did not cast an array of char. I cast buf + N which is a char*. Which to my understanding is allowed to be cast (there's an exception to strict aliasing for char*). What do you think about that?
– John Zwinck
Nov 10 '18 at 10:54



