Figure out bytes content

I'm working on a compound file that contains several streams, and I'm struggling to figure out the content of each stream. I don't know whether these bytes are text, MP3, or video.
For example, is there a way to tell what type of data these bytes could be?



b'\x00\x00\x00\x00\x00\x00\x00\x00\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x0bz\xcc\xc9\xc8\xc0\xc0\x00\xc2?\x82\x1e<\x0ec\xbc*8\x19\xc8i\xb3W_\x0b\x14bH\x00\xb2-\x99\x18\x18\xfe\x03\x01\x88\xcf\xc0\x01\xc4\xe1\x0c\xf9\x0cE\x0c\xd9\x0c\xc5\x0c\xa9\x0c%\x0c\x86`\xcd \x0c\x020\x1a\x00\x00\x00\xff\xff\x02\x080\x00\x96L~\x89W\x00\x00\x00\x00\x80(\\B\xefI;\x9e}p\xfe\x1a\xb2\x9b>(\x81\x86/=\xc9xH0:Pwb\xb7\xdck-\xd2F\x04\xd7co'


java c# python c++ c

edited Nov 27 '18 at 16:13
Ibrahim Kais Ibrahim

asked Nov 13 '18 at 19:27
Ibrahim Kais Ibrahim

  • Possible duplicate of Python 3 - Encode/Decode vs Bytes/Str

    – m0etaz
    Nov 13 '18 at 19:30






  • As in, "how do I tell if these bytes comprise an mp3, or a video, or an image, or something else?"? There's no universal way of determining a data format. Some formats have convenient self-identifying header data, and some don't.

    – Kevin
    Nov 13 '18 at 19:35






  • Your question is very unclear. What exactly are you trying to do?

    – Joel
    Nov 13 '18 at 19:35






  • @Kevin, that is exactly what I want to do. Is there a technique or pattern for testing these bytes to get close to an answer? How do I read the header? All I have is the bytes.

    – Ibrahim Kais Ibrahim
    Nov 13 '18 at 19:41







  • Compare your bytes against Every. Known. Filetype. That's it. It's not magic; that is how file works. (Descriptions of both of these two terms can be found in your favourite man version.)

    – usr2564301
    Nov 13 '18 at 20:22

1 Answer

Yes, there is a way to figure out each stream's content. Most file formats have a signature, and it is more reliable than the file extension, which can be removed or falsely added.



So what is a signature?




In computing, a file signature is data used to identify or verify the
contents of a file. In particular, it may refer to:



  • File magic number: bytes within a file used to identify the
    format of the file; generally a short sequence of bytes (most are
    2-4 bytes long) placed at the beginning of the file; see list of file
    signatures


  • File checksum or more generally the result of a hash function over the file contents: data used to verify the integrity of the file
    contents, generally against transmission errors or malicious attacks.
    The signature can be included at the end of the file or in a separate
    file.




I used the magic number. To define the term "magic number", I'm quoting Wikipedia:




In computer programming, the term magic number has multiple
meanings. It could refer to one or more of the following:



  • Unique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants

  • A constant numerical or text value used to identify a file format or protocol; for files, see List of file
    signatures

  • Distinctive unique values that are unlikely to be mistaken for other meanings (e.g., Globally Unique Identifiers)



The second meaning is the one that matters here: a specific sequence of bytes, such as



PNG (89 50 4E 47 0D 0A 1A 0A) 


or



BMP (42 4D)
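To make the prefix check concrete, here is a minimal Python sketch that compares the start of a byte string against a handful of signatures. The table is a small hand-picked subset (PNG and BMP from above plus a few other common formats), not a complete database, and the name `identify_by_prefix` is purely illustrative. MP3 files without an ID3v2 tag start directly with an audio frame header and are not covered.

```python
from typing import Optional

# A small, illustrative subset of well-known signatures found at offset 0.
# Real signature databases list far more; this is not a complete table.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",  # 89 50 4E 47 0D 0A 1A 0A
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
    b"\xff\xd8\xff": "JPEG image",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP archive (DOCX, XLSX and JAR files are ZIPs too)",
    b"\x1f\x8b": "gzip-compressed data",
    b"ID3": "MP3 audio with an ID3v2 tag",
    b"BM": "BMP image (only 2 bytes, so false positives are likely)",  # 42 4D
}


def identify_by_prefix(data: bytes) -> Optional[str]:
    """Return a description if data starts with a known signature, else None."""
    # Try longer signatures first so a short one cannot shadow a longer one.
    for magic in sorted(SIGNATURES, key=len, reverse=True):
        if data.startswith(magic):
            return SIGNATURES[magic]
    return None
```

Because only offset 0 is checked, this returns None for a stream that has other data in front of the embedded file; the question's sample, which begins with eight zero bytes, is one such case.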


So how do you find the magic number of a file?



In the article "Investigating File Signatures Using PowerShell", the author wrote a great PowerShell function to get the magic number, and he also mentions a built-in tool. Quoting his article:




PowerShell V5 brings in Format-Hex, which can provide an alternative
approach to reading the file and displaying the hex and ASCII value to
determine the magic number.




From the Format-Hex help:




The Format-Hex cmdlet displays a file or other input as hexadecimal
values. To determine the offset of a character from the output, add
the number at the leftmost of the row to the number at the top of the
column for that character.



This cmdlet can help you determine the file type of a corrupted file
or a file which may not have a file name extension. Run this cmdlet,
and then inspect the results for file information.




This tool is also very good for getting the magic number of a file. Here is an example:
[Screenshot: example Format-Hex output]
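If PowerShell isn't at hand, a rough Python equivalent of such a hex view is easy to write. This is a sketch, not a clone of Format-Hex: the column layout only approximates its output, and `hex_dump` is a hypothetical helper name.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Render bytes as rows of offset, hex values and ASCII, roughly like Format-Hex."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII is shown as-is; every other byte is shown as '.'.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on a short last row.
        rows.append(f"{offset:08X}   {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(rows)
```

To inspect a file or stream, pass it the first few dozen bytes read in binary mode, e.g. `print(hex_dump(f.read(64)))`. For the first 16 bytes of a PNG file (the signature, the IHDR chunk length and the `IHDR` tag) the row reads `00000000   89 50 4E 47 0D 0A 1A 0A 00 00 00 0D 49 48 44 52  .PNG........IHDR`.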



Another tool is an online hex editor, but to be honest I didn't understand how to use it.



Now we have the magic number, but how do we know what type of data the file or stream contains? That is the real question.
Luckily, there are several databases of magic numbers. Here are some:




  1. File Signatures


  2. FILE SIGNATURES TABLE

  3. List of file signatures

For example, the first database has a search feature: just enter the magic number without spaces and search.



[Screenshot: searching the File Signatures database]
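The search expects the magic number as one unbroken run of hex digits. A one-line Python sketch that produces that format from raw bytes (`magic_hex` is a hypothetical name); the default of 8 bytes is an arbitrary assumption based on the PNG signature's length, and since many signatures are only 2 to 4 bytes long, trying a few shorter prefixes is worthwhile.

```python
def magic_hex(data: bytes, offset: int = 0, length: int = 8) -> str:
    """Return `length` bytes starting at `offset` as uppercase hex without spaces."""
    return data[offset:offset + length].hex().upper()
```

For a PNG file, `magic_hex(data)` gives `89504E470D0A1A0A`, which is the form the search box expects.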



Then you may find it. Yes, may: there is a good chance that you won't find the file type in question directly.



I ran into this and solved it by testing the streams against specific signatures. For example, I searched a stream for the PNG signature:



# Targeted magic number for PNG: 89 50 4E 47 0D 0A 1A 0A
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

def GetPngStartingOffset(arr):
    # arr can be bytes, a bytearray, or a list of ints in the range 0-255.
    # bytes.find() matches the whole 8-byte sequence at once, so unrelated
    # bytes in the middle of a partial match cannot produce a false hit.
    offset = bytes(arr).find(PNG_SIGNATURE)
    # As before, return 0 when no signature is found, so the caller
    # falls back to using the whole stream.
    return offset if offset != -1 else 0


Once the function has found the magic number, I split the stream at that offset and save the PNG file:



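import io
from PIL import Image  # Pillow

arr = stream.read()  # stream: a file-like object for one stream of the compound file
B = arr[GetPngStartingOffset(arr):]  # drop everything before the PNG signature
image = Image.open(io.BytesIO(B))
image.show()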
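The same idea generalizes to several formats at once. The sketch below (hypothetical `find_signatures` helper, illustrative subset of signatures) reports every offset at which a known signature occurs anywhere in a stream. It deliberately skips very short signatures such as BMP's `42 4D`, which match by chance far too often in arbitrary data. As a hedged observation on the question's sample: assuming the backslashes lost from its `\x` escapes are restored, bytes 8 to 10 are `1F 8B 08`, the gzip signature followed by the deflate compression method, so that stream may hold gzip-compressed data after an 8-byte prefix.

```python
from typing import Dict, List, Tuple

# Illustrative subset of multi-byte signatures that can be searched for anywhere.
EMBEDDED_SIGNATURES: Dict[bytes, str] = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\x1f\x8b\x08": "gzip (deflate)",
    b"PK\x03\x04": "ZIP",
    b"%PDF-": "PDF",
    b"\xff\xd8\xff": "JPEG",
}


def find_signatures(data: bytes) -> List[Tuple[int, str]]:
    """Return (offset, name) for every occurrence of a known signature, sorted by offset."""
    hits = []
    for magic, name in EMBEDDED_SIGNATURES.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, name))
            start = data.find(magic, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # First 18 bytes of the question's sample, with the \x escapes restored (an assumption).
    sample = b"\x00" * 8 + b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x0b"
    print(find_signatures(sample))  # [(8, 'gzip (deflate)')]
```

A hit is only a candidate, not proof. If a gzip hit shows up, the standard library's `gzip.decompress(data[offset:])` is one way to try recovering the payload; whether it succeeds depends on what else follows the compressed data in the stream.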


In the end, this is not an end-to-end solution, but it is a way to figure out a stream's content.
Thanks for reading, and thanks to @Robert Columbia for his patience.

  • "there is a signature for each file" No, @Kevin was right; some don't, in particular, text files (except for some scripts) don't.

    – Tom Blodget
    Nov 27 '18 at 17:53











  • @TomBlodget yes, you are right. Text files don't have a signature unless they start with a byte-order mark (BOM), as some UTF-8 and UTF-16 files do. That is because ASCII characters are stored as-is.

    – Ibrahim Kais Ibrahim
    Nov 27 '18 at 18:15
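Since plain text usually has no magic number, a heuristic is the usual fallback. A minimal sketch (hypothetical `guess_text` helper), assuming that "text" means UTF-8/ASCII or a BOM-marked Unicode encoding; the 95% printable-character threshold is an arbitrary assumption. Note that a prefix cut in the middle of a multi-byte UTF-8 character fails to decode, so pass complete data or accept occasional false negatives.

```python
from typing import Optional

# Byte-order marks: the closest thing plain text has to a signature.
# UTF-32 LE is checked before UTF-16 LE because its BOM starts with FF FE too.
BOMS = [
    (b"\xef\xbb\xbf", "UTF-8 with BOM"),
    (b"\xff\xfe\x00\x00", "UTF-32 LE"),
    (b"\x00\x00\xfe\xff", "UTF-32 BE"),
    (b"\xff\xfe", "UTF-16 LE"),
    (b"\xfe\xff", "UTF-16 BE"),
]


def guess_text(data: bytes) -> Optional[str]:
    """Heuristically decide whether data looks like text; return a label or None."""
    for bom, label in BOMS:
        if data.startswith(bom):
            return label
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    if not text:
        return None
    # Valid UTF-8 alone is not enough (runs of NUL bytes decode fine),
    # so also require that almost every character is printable or whitespace.
    printable = sum(ch.isprintable() or ch in "\r\n\t" for ch in text)
    return "UTF-8 / ASCII text (heuristic)" if printable / len(text) > 0.95 else None
```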












Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53288197%2ffigure-out-bytes-content%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









0














Yes, there is away to figure out each stream content. there is a signature for each file on this planet in addition to extension which is not reliable. it might be removed or falsely added.



So what is the signature?




In computing, a file signature is data used to identify or verify the
contents of a file. In particular, it may refer to:



  • File magic number: bytes within a file used to identify the
    format of the file; generally a short sequence of bytes (most are
    2-4 bytes long) placed at the beginning of the file; see list of file
    signatures


  • File checksum or more generally the result of a hash function over the file contents: data used to verify the integrity of the file
    contents, generally against transmission errors or malicious attacks.
    The signature can be included at the end of the file or in a separate
    file.




I used the magic number to define the magic number term I'm copying this from Wikipedia




In computer programming, the term magic number has multiple
meanings. It could refer to one or more of the following:



  • Unique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants

  • A constant numerical or text value used to identify a file format or protocol; for files, see List of file
    signatures

  • Distinctive unique values that are unlikely to be mistaken for other meanings(e.g., Globally Unique Identifiers)



in the second point it is a certain sequence of bytes like



PNG (89 50 4E 47 0D 0A 1A 0A) 


or



BMP (42 4D)


So how to know the magic number of each file?



in this article "Investigating File Signatures Using PowerShell" we find the writer created a wonderful power shell function to get the magic number also he mentioned a tool and I'm copying this from his article




PowerShell V5 brings in Format-Hex, which can provide an alternative
approach to reading the file and displaying the hex and ASCII value to
determine the magic number.




form Format-Hex help I'm copying this description




The Format-Hex cmdlet displays a file or other input as hexadecimal
values. To determine the offset of a character from the output, add
the number at the leftmost of the row to the number at the top of the
column for that character.



This cmdlet can help you determine the file type of a corrupted file
or a file which may not have a file name extension. Run this cmdlet,
and then inspect the results for file information.




this tool is very good also to get the magic number of a file. Here is an example
enter image description here



another tool is online hex editor but to be onset I didn't understand how to use it.



now we got the magic number but how to know what type of data or is that file or stream?
and that is the most good question.
Luckily there are many database for these magic numbers. let me list some




  1. File Signatures


  2. FILE SIGNATURES TABLE

  3. List of file signatures

for example the first database has a search capability. just enter the magic number with no spaces and search



enter image description here



after you may find. Yes, may. There is a big possibility that you won't directly find the file type in question.



I faced this and solved it by testing the streams against specific types of signatures. Like PNG I was searching for in a stream



def GetPngStartingOffset(arr):

#targted magic Number for png (89 50 4E 47 0D 0A 1A 0A)
markerFound = False
startingOffset = 0
previousValue = 0
arraylength = range(0, len(arr) -1)

for i in arraylength:
currentValue = arr[i]
if (currentValue == 137): # 0x89
markerFound = True
startingOffset = i
previousValue = currentValue
continue

if currentValue == 80: # 0x50
if (markerFound and (previousValue == 137)):
previousValue = currentValue
continue
markerFound = False

elif currentValue == 78: # 0x4E
if (markerFound and (previousValue == 80)):
previousValue = currentValue
continue
markerFound = False

elif currentValue == 71: # 0x47
if (markerFound and (previousValue == 78)):
previousValue = currentValue
continue
markerFound = False

elif currentValue == 13: # 0x0D
if (markerFound and (previousValue == 71)):
previousValue = currentValue
continue
markerFound = False

elif currentValue == 10: # 0x0A
if (markerFound and (previousValue == 26)):
return startingOffset
if (markerFound and (previousValue == 13)):
previousValue = currentValue
continue
markerFound = False

elif currentValue == 26: # 0x1A
if (markerFound and (previousValue == 10)):
previousValue = currentValue
continue
markerFound = False
return 0


Once this function found the magic number
enter image description here



I split the stream and save the png file



 arr = stream.read()
a = list(arr)
B = a[GetPngStartingOffset(a):len(a)]
bytesString = bytes(B)
image = Image.open(io.BytesIO(bytesString))
image.show()


At the end this is not an end to end solution but it is a way to figure out streams content
Thanks for reading and Thanks for @Robert Columbia for his patience






share|improve this answer























  • "there is a signature for each file" No, @Kevin was right; Some don't, in particular, text files (except for some scripts) don't.

    – Tom Blodget
    Nov 27 '18 at 17:53











  • @TomBlodget yes you are right. text files doesn't have signature unless it has an encoding like utf-8. And that is because the ASCII characters are stored as it is.

    – Ibrahim Kais Ibrahim
    Nov 27 '18 at 18:15
















0














Yes, there is away to figure out each stream content. there is a signature for each file on this planet in addition to extension which is not reliable. it might be removed or falsely added.



So what is the signature?




In computing, a file signature is data used to identify or verify the
contents of a file. In particular, it may refer to:



  • File magic number: bytes within a file used to identify the
    format of the file; generally a short sequence of bytes (most are
    2-4 bytes long) placed at the beginning of the file; see list of file
    signatures


  • File checksum or more generally the result of a hash function over the file contents: data used to verify the integrity of the file
    contents, generally against transmission errors or malicious attacks.
    The signature can be included at the end of the file or in a separate
    file.




I used the magic number to define the magic number term I'm copying this from Wikipedia




In computer programming, the term magic number has multiple
meanings. It could refer to one or more of the following:



  • Unique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants

  • A constant numerical or text value used to identify a file format or protocol; for files, see List of file
    signatures

  • Distinctive unique values that are unlikely to be mistaken for other meanings(e.g., Globally Unique Identifiers)



in the second point it is a certain sequence of bytes like



PNG (89 50 4E 47 0D 0A 1A 0A) 


or



BMP (42 4D)


So how to know the magic number of each file?



in this article "Investigating File Signatures Using PowerShell" we find the writer created a wonderful power shell function to get the magic number also he mentioned a tool and I'm copying this from his article




PowerShell V5 brings in Format-Hex, which can provide an alternative
approach to reading the file and displaying the hex and ASCII value to
determine the magic number.




form Format-Hex help I'm copying this description




The Format-Hex cmdlet displays a file or other input as hexadecimal
values. To determine the offset of a character from the output, add
the number at the leftmost of the row to the number at the top of the
column for that character.



This cmdlet can help you determine the file type of a corrupted file
or a file which may not have a file name extension. Run this cmdlet,
and then inspect the results for file information.




this tool is very good also to get the magic number of a file. Here is an example
enter image description here



another tool is online hex editor but to be onset I didn't understand how to use it.



now we got the magic number but how to know what type of data or is that file or stream?
and that is the most good question.
Luckily there are many database for these magic numbers. let me list some




  1. File Signatures


  2. FILE SIGNATURES TABLE

  3. List of file signatures

for example the first database has a search capability. just enter the magic number with no spaces and search



enter image description here



after you may find. Yes, may. There is a big possibility that you won't directly find the file type in question.



I faced this and solved it by testing the streams against specific types of signatures. Like PNG I was searching for in a stream



def GetPngStartingOffset(arr):

#targted magic Number for png (89 50 4E 47 0D 0A 1A 0A)
markerFound = False
startingOffset = 0
previousValue = 0
arraylength = range(0, len(arr) -1)

for i in arraylength:
currentValue = arr[i]
if (currentValue == 137): # 0x89
markerFound = True
startingOffset = i
previousValue = currentValue
continue

if currentValue == 80: # 0x50
if (markerFound and (previousValue == 137)):
previousValue = currentValue
continue
markerFound = False

elif currentValue == 78: # 0x4E
if (markerFound and (previousValue == 80)):
previousValue = currentValue
continue
markerFound = False

elif currentValue == 71: # 0x47
if (markerFound and (previousValue == 78)):
previousValue = currentValue
continue
markerFound = False

elif currentValue == 13: # 0x0D
if (markerFound and (previousValue == 71)):
previousValue = currentValue
continue
markerFound = False

elif currentValue == 10: # 0x0A
if (markerFound and (previousValue == 26)):
return startingOffset
if (markerFound and (previousValue == 13)):
previousValue = currentValue
continue
markerFound = False

elif currentValue == 26: # 0x1A
if (markerFound and (previousValue == 10)):
previousValue = currentValue
continue
markerFound = False
return 0


Once this function found the magic number
enter image description here



I split the stream and save the png file



 arr = stream.read()
a = list(arr)
B = a[GetPngStartingOffset(a):len(a)]
bytesString = bytes(B)
image = Image.open(io.BytesIO(bytesString))
image.show()


At the end this is not an end to end solution but it is a way to figure out streams content
Thanks for reading and Thanks for @Robert Columbia for his patience






share|improve this answer























  • "there is a signature for each file" No, @Kevin was right; Some don't, in particular, text files (except for some scripts) don't.

    – Tom Blodget
    Nov 27 '18 at 17:53











  • @TomBlodget yes you are right. text files doesn't have signature unless it has an encoding like utf-8. And that is because the ASCII characters are stored as it is.

    – Ibrahim Kais Ibrahim
    Nov 27 '18 at 18:15














0












0








0







Yes, there is away to figure out each stream content. there is a signature for each file on this planet in addition to extension which is not reliable. it might be removed or falsely added.



So what is the signature?




In computing, a file signature is data used to identify or verify the
contents of a file. In particular, it may refer to:



  • File magic number: bytes within a file used to identify the
    format of the file; generally a short sequence of bytes (most are
    2-4 bytes long) placed at the beginning of the file; see list of file
    signatures


  • File checksum or more generally the result of a hash function over the file contents: data used to verify the integrity of the file
    contents, generally against transmission errors or malicious attacks.
    The signature can be included at the end of the file or in a separate
    file.




I used the magic number to define the magic number term I'm copying this from Wikipedia




In computer programming, the term magic number has multiple
meanings. It could refer to one or more of the following:



  • Unique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants

  • A constant numerical or text value used to identify a file format or protocol; for files, see List of file
    signatures

  • Distinctive unique values that are unlikely to be mistaken for other meanings(e.g., Globally Unique Identifiers)



in the second point it is a certain sequence of bytes like



PNG (89 50 4E 47 0D 0A 1A 0A) 


or



BMP (42 4D)


So how to know the magic number of each file?



in this article "Investigating File Signatures Using PowerShell" we find the writer created a wonderful power shell function to get the magic number also he mentioned a tool and I'm copying this from his article




PowerShell V5 brings in Format-Hex, which can provide an alternative
approach to reading the file and displaying the hex and ASCII value to
determine the magic number.




form Format-Hex help I'm copying this description




The Format-Hex cmdlet displays a file or other input as hexadecimal
values. To determine the offset of a character from the output, add
the number at the leftmost of the row to the number at the top of the
column for that character.



This cmdlet can help you determine the file type of a corrupted file
or a file which may not have a file name extension. Run this cmdlet,
and then inspect the results for file information.




this tool is very good also to get the magic number of a file. Here is an example
enter image description here



another tool is online hex editor but to be onset I didn't understand how to use it.



now we got the magic number but how to know what type of data or is that file or stream?
and that is the most good question.
Luckily there are many database for these magic numbers. let me list some




  1. File Signatures


  2. FILE SIGNATURES TABLE

  3. List of file signatures

for example the first database has a search capability. just enter the magic number with no spaces and search



enter image description here



after you may find. Yes, may. There is a big possibility that you won't directly find the file type in question.



I faced this and solved it by testing the streams against specific types of signatures. Like PNG I was searching for in a stream



# PNG magic number (file signature): 89 50 4E 47 0D 0A 1A 0A
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def GetPngStartingOffset(arr):
    """Return the offset of the first PNG signature in arr, or None if absent.

    arr may be a bytes object or a list of ints in the range 0-255.
    """
    # Compare the whole 8-byte signature at once instead of tracking a
    # byte-by-byte state; bytes.find returns -1 when there is no match.
    offset = bytes(arr).find(PNG_SIGNATURE)
    return None if offset == -1 else offset


Once this function has found the magic number, I split the stream at that offset and open the PNG image that starts there:



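import io

from PIL import Image  # Pillow

arr = stream.read()  # raw bytes of one stream from the compound file
offset = GetPngStartingOffset(arr)
if offset is None:
    raise ValueError("no PNG signature found in this stream")
image = Image.open(io.BytesIO(arr[offset:]))
image.show()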
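The snippet above passes everything from the signature to the end of the stream to Pillow, which is enough to display the image. To save only the PNG itself (other data may follow it in the stream), one option is to walk the PNG chunk layout up to the IEND chunk, which always ends a PNG. A minimal sketch: `extract_png` is a hypothetical helper, and it checks only the chunk layout, not the CRCs, so it is still worth opening the result with Pillow to confirm it is a valid image.

```python
import struct

PNG_SIGNATURE = bytes.fromhex("89504E470D0A1A0A")


def extract_png(data: bytes):
    """Return the bytes of the first complete PNG found in `data`, or None.

    After its 8-byte signature a PNG is a series of chunks, each laid out as
    a 4-byte big-endian length, a 4-byte type, `length` data bytes and a
    4-byte CRC. The last chunk is IEND, so we walk the chunks until we reach it.
    CRCs are not verified here.
    """
    start = data.find(PNG_SIGNATURE)
    if start == -1:
        return None
    pos = start + len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        pos += 12 + length  # skip length, type, data and CRC
        if chunk_type == b"IEND":
            # Only return the image if the IEND chunk (with its CRC) is complete.
            return data[start:pos] if pos <= len(data) else None
    return None  # no IEND found: truncated or not really a PNG


# Usage, with `arr` holding the bytes read from the stream (file name is hypothetical):
#     png = extract_png(arr)
#     if png is not None:
#         with open("extracted.png", "wb") as f:
#             f.write(png)
```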


In the end, this is not an end-to-end solution, but it is one way to figure out what a stream contains.
Thanks for reading, and thanks to @Robert Columbia for his patience.






answered Nov 27 '18 at 16:12









Ibrahim Kais Ibrahim













  • "there is a signature for each file" No, @Kevin was right; some files don't have one. In particular, text files (except for some scripts) don't.

    – Tom Blodget
    Nov 27 '18 at 17:53











  • @TomBlodget Yes, you are right. Text files don't have a signature unless they start with a byte order mark (BOM), as some UTF-8 and UTF-16 files do; plain ASCII characters are simply stored as they are.

    – Ibrahim Kais Ibrahim
    Nov 27 '18 at 18:15
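Following up on this: since plain text usually has no signature, a rough test is to look for a byte order mark (BOM) first and otherwise check whether the bytes decode as UTF-8. This is a heuristic sketch, not a reliable detector: `guess_text_encoding` is a hypothetical name, treating NUL bytes as a sign of binary data is an assumption, and UTF-16 text without a BOM or text in legacy 8-bit code pages will not be recognized. The returned names are Python codec names that strip the BOM when used to decode.

```python
import codecs

# Byte order marks: the closest thing plain text has to a signature.
# UTF-32 LE is checked before UTF-16 LE because its BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_text_encoding(data: bytes):
    """Heuristically return an encoding name if `data` looks like text, else None."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding
    if b"\x00" in data:
        # Assumption: ordinary ASCII/UTF-8 text rarely contains NUL bytes,
        # while binary data often does.
        return None
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "utf-8"  # plain ASCII is a subset of UTF-8, so it lands here too
```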