How best to compare two compiled binaries? [closed]





















I recently discovered an excellent Visual Studio extension which finds unnecessary #include statements in projects and removes them. I work on some gnarly legacy code and it has stripped a huge amount away. The only problem is that I can't be sure it hasn't altered the build in some subtle way. A project may still build even though a #define somewhere has changed.



It has occurred to me that I could confirm that no important changes have been made by comparing the binaries. Does anyone have advice on how best to do this? The obvious problem is that a small amount of data in the binaries will change anyway, because the compiler embeds metadata such as build times.



Ideas so far:



  1. Disassemble all the binaries and compare the disassembly with diff. (Although this won't cover the data sections, I guess.)

  2. Use some kind of binary diff program that's aware of PE headers.

Any ideas? And does anyone know of a tool which understands PE headers as I describe?

























closed as off-topic by chux, dbush, SergeyA, Neil Butterworth, 1201ProgramAlarm Nov 8 at 21:18


This question appears to be off-topic. The users who voted to close gave this specific reason:


  • "Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it." – dbush, SergeyA, Neil Butterworth, 1201ProgramAlarm
If this question can be reworded to fit the rules in the help center, please edit the question.








  • 5




    "And anyone know of a good binary diff program like I describe?" At almost 20k rep you should understand that you're walking dangerously close to off-topic-ness here. :)
    – HolyBlackCat
    Nov 8 at 19:58










  • Yeh, I guess that's true. But this is the kind of question only fellow programmers will likely know the answer to.
    – Benj
    Nov 8 at 20:00










  • "no important changes have been made by checking the binaries" Given that you've modified gnarly legacy code and it's stripped a huge amount away, you've made significant changes. What kind of testing do you do? Because you have to redo ALL of it now.
    – Andrew Henle
    Nov 8 at 20:00






  • 3




    Some compilers tend to be non-deterministic. Even the same input code is not guaranteed to generate the same output. Checking semantic equality of binaries is a "hard" problem. You need to rely on your test cases to be sure that nothing has broken.
    – Ajay Brahmakshatriya
    Nov 8 at 20:08






  • 1




    You don't need to disassemble the binary; you can generate assembly using the -S option in gcc and clang. I remember cl has the /FA flag. Be careful about the line numbers and other debug information though. You can strip them from the output to retain only the instructions.
    – Ajay Brahmakshatriya
    Nov 8 at 20:18














c++ c windows














edited Nov 8 at 20:06

























asked Nov 8 at 19:55









Benj

19.7k reputation
















1 Answer






























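The PE headers sit at the start of the file, and their combined size is rounded up to a multiple of the file alignment, which defaults to 512 bytes. So, as a first approximation, cut off the first 512 bytes of each file and compare what remains. If your headers are larger, the SizeOfHeaders field in the optional header gives the exact size to skip.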
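Whether 512 bytes is the right amount to skip can be checked per file. The following is a minimal sketch, assuming the standard PE layout: the 4-byte e_lfanew field at file offset 60 points at the PE signature, and SizeOfHeaders sits 84 bytes past that signature (4 bytes of signature, 20 bytes of COFF header, then offset 60 in the optional header). The function names read_u32_le and pe_size_of_headers are made up for illustration, and the sketch does not verify that the file really is a PE image.

```shell
#!/bin/sh
# read_u32_le FILE OFFSET
# Print the unsigned 32-bit little-endian integer stored at byte OFFSET.
read_u32_le() {
    # od prints the four bytes as decimal numbers; assemble them by hand so
    # the result does not depend on the byte order of the host machine.
    bytes=$(od -An -v -tu1 -j "$2" -N 4 "$1")
    set -- $bytes
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# pe_size_of_headers FILE
# Print the SizeOfHeaders field, i.e. how many leading bytes are headers.
# Assumption: FILE is a well-formed PE image; no signature check is done.
pe_size_of_headers() {
    pe_offset=$(read_u32_le "$1" 60)        # e_lfanew: offset of "PE\0\0"
    read_u32_le "$1" $((pe_offset + 84))    # 4 (signature) + 20 (COFF) + 60
}
```

Running pe_size_of_headers File1.exe prints the number of bytes to skip for that file, which can be used in place of the fixed 512.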



I pipe them through xxd to convert the files to hex, then I diff the resulting text files (any text diff program will work; on Windows, xxd comes with the Git command-line tools).



xxd -p -c 4 < Truncatedfile1.exe > output.diff1


or



tail -c +513 File1.exe | xxd -p -c 4 > output1.hex
tail -c +513 File2.exe | xxd -p -c 4 > output2.hex
git diff --no-index --color output1.hex output2.hex


Note that I made the lines just 4 bytes long so that, when an odd number of bytes is inserted somewhere, alignment padding (especially in data sections) has a chance to bring the following lines back into step. If you are extra lucky, your code is also DWORD-aligned, in which case this works for the code just as well.
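The steps above can be wrapped in one helper. This is only a sketch under stated assumptions: the function name diff_after_header is hypothetical, the default of 512 skipped bytes is the figure assumed in this answer, and od (with the GNU -w option) stands in for xxd -p -c 4 so that nothing beyond coreutils and diff is needed.

```shell
#!/bin/sh
# diff_after_header FILE1 FILE2 [SKIP]
# Hex-dump both files 4 bytes per line, ignoring the first SKIP bytes
# (default 512), and diff the dumps. Exit status is that of diff:
# 0 if the remaining bytes are identical, 1 if they differ.
diff_after_header() {
    file1=$1
    file2=$2
    skip=${3:-512}
    start=$((skip + 1))     # tail -c +K starts output at byte K (1-based)
    tmp1=$(mktemp) || return 2
    tmp2=$(mktemp) || return 2
    # od -w4 is a GNU extension; it plays the role of "xxd -p -c 4" here.
    tail -c "+$start" "$file1" | od -An -v -tx1 -w4 > "$tmp1"
    tail -c "+$start" "$file2" | od -An -v -tx1 -w4 > "$tmp2"
    diff "$tmp1" "$tmp2"
    status=$?
    rm -f "$tmp1" "$tmp2"
    return "$status"
}
```

For example, diff_after_header File1.exe File2.exe compares everything after the first 512 bytes, and a third argument overrides the number of bytes skipped. Build timestamps stored outside the skipped region will still show up as differences.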




























  • Thanks! So it sounds like this is a method that's worked for you? Some in the comments are saying that compilers are too non-deterministic for this idea to work.
    – Benj
    Nov 8 at 20:58










  • Well, I do edit binaries with automated tools at the byte level, and as a software tester I have some experience in checking for differences. I just thought about how the data sections could be diffed and remembered that they are usually DWORD-aligned, so I thought that xxd with 4 bytes per line would also be a good idea for re-compiled programs.
    – Ohnemichel
    Nov 8 at 21:13


























edited Nov 8 at 20:22

























answered Nov 8 at 20:16









Ohnemichel

165


















