Figure out bytes content
I am working on a compound file that contains several streams, and I am struggling to figure out the content of each stream. I don't know whether these bytes are text, MP3, or video.
For example, is there a way to tell what type of data these bytes could be?
b'\x00\x00\x00\x00\x00\x00\x00\x00\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x0bz\xcc\xc9\xc8\xc0\xc0\x00\xc2?\x82\x1e<\x0ec\xbc*8\x19\xc8i\xb3W_\x0b\x14bH\x00\xb2-\x99\x18\x18\xfe\x03\x01\x88\xcf\xc0\x01\xc4\xe1\x0c\xf9\x0cE\x0c\xd9\x0c\xc5\x0c\xa9\x0c%\x0c\x86`\xcd \x0c\x020\x1a\x00\x00\x00\xff\xff\x02\x080\x00\x96L~\x89W\x00\x00\x00\x00\x80(\\B\xefI;\x9e}p\xfe\x1a\xb2\x9b>(\x81\x86/=\xc9xH0:Pwb\xb7\xdck-\xd2F\x04\xd7co'
java c# python c++ c
2
Possible duplicate of Python 3 - Encode/Decode vs Bytes/Str
– m0etaz
Nov 13 '18 at 19:30
2
As in, "how do I tell if these bytes comprise an mp3, or a video, or an image, or something else?"? There's no universal way of determining a data format. Some formats have convenient self-identifying header data, and some don't.
– Kevin
Nov 13 '18 at 19:35
2
Your question is very unclear. What exactly are you trying to do?
– Joel
Nov 13 '18 at 19:35
1
@Kevin that is exactly what I want to do. Is there a technique or pattern for testing these bytes to get close to an answer? How do I read the header? All I have is bytes.
– Ibrahim Kais Ibrahim
Nov 13 '18 at 19:41
1
Compare your bytes against Every. Known. Filetype. That's it. It's not magic; that is how file works. (Descriptions of both of these two terms can be found in your favourite man version.)
– usr2564301
Nov 13 '18 at 20:22
edited Nov 27 '18 at 16:13
asked Nov 13 '18 at 19:27
Ibrahim Kais Ibrahim
1 Answer
Yes, there is a way to figure out the content of each stream. Most binary file formats carry a signature in addition to the file extension. The extension is not reliable: it might be removed or falsely added.
So what is the signature?
In computing, a file signature is data used to identify or verify the
contents of a file. In particular, it may refer to:
- File magic number: bytes within a file used to identify the format of the file; generally a short sequence of bytes (most are 2-4 bytes long) placed at the beginning of the file; see list of file signatures
- File checksum or more generally the result of a hash function over the file contents: data used to verify the integrity of the file contents, generally against transmission errors or malicious attacks. The signature can be included at the end of the file or in a separate file.
That definition uses the term "magic number". To define it, I'm copying this from Wikipedia:
In computer programming, the term magic number has multiple
meanings. It could refer to one or more of the following:
- Unique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants
- A constant numerical or text value used to identify a file format or protocol; for files, see List of file signatures
- Distinctive unique values that are unlikely to be mistaken for other meanings (e.g., Globally Unique Identifiers)
In the second sense, it is a specific sequence of bytes, such as
PNG (89 50 4E 47 0D 0A 1A 0A)
or
BMP (42 4D)
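A check against these two signatures can be sketched in Python as follows. The names `SIGNATURES` and `starts_with_signature` are illustrative, not part of any library, and the sketch assumes the signature sits at offset 0, which is not guaranteed for a stream inside a compound file.

```python
# The two signatures quoted above, written as byte strings.
SIGNATURES = {
    "PNG": bytes.fromhex("89504E470D0A1A0A"),
    "BMP": bytes.fromhex("424D"),
}


def starts_with_signature(data, name):
    """Return True if data begins with the named signature (hypothetical helper)."""
    return data.startswith(SIGNATURES[name])


# A PNG signature followed by the start of an IHDR chunk.
png_header = bytes.fromhex("89504E470D0A1A0A") + b"\x00\x00\x00\x0dIHDR"
print(starts_with_signature(png_header, "PNG"))  # True
print(starts_with_signature(png_header, "BMP"))  # False
```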
So how do you find the magic number of a file?
In the article "Investigating File Signatures Using PowerShell", the author wrote a useful PowerShell function to get the magic number. He also mentions a tool; I'm copying this from his article:
PowerShell V5 brings in Format-Hex, which can provide an alternative
approach to reading the file and displaying the hex and ASCII value to
determine the magic number.
From the Format-Hex help, I'm copying this description:
The Format-Hex cmdlet displays a file or other input as hexadecimal
values. To determine the offset of a character from the output, add
the number at the leftmost of the row to the number at the top of the
column for that character.
This cmdlet can help you determine the file type of a corrupted file
or a file which may not have a file name extension. Run this cmdlet,
and then inspect the results for file information.
This tool is also very good for getting the magic number of a file.
Another tool is an online hex editor, but to be honest I didn't understand how to use it.
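For readers without PowerShell, a rough Python counterpart of that inspection is sketched below. `hex_preview` is a made-up helper name and the 16-byte default is an arbitrary choice; it only formats the leading bytes so they can be compared with a signature list by eye.

```python
def hex_preview(data, count=16):
    """Format the first `count` bytes as space-separated uppercase hex (illustrative helper)."""
    return " ".join(f"{byte:02X}" for byte in data[:count])


# First bytes of the sample from the question: eight zero bytes, then 1F 8B 08 00.
sample = b"\x00" * 8 + b"\x1f\x8b\x08\x00"
print(hex_preview(sample))
# 00 00 00 00 00 00 00 00 1F 8B 08 00
```

For a file on disk, the bytes could come from `open(path, "rb").read(16)`, where `path` stands for whatever file is being inspected.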
Now we have the magic number, but how do we know what type of data that file or stream holds?
That is the key question.
Luckily, there are many databases of these magic numbers. Let me list some:
- File Signatures
- FILE SIGNATURES TABLE
- List of file signatures
For example, the first database has a search capability: just enter the magic number with no spaces and search.
After that you may find a match. Yes, may: there is a good chance that you won't directly find the file type in question.
I faced this and solved it by testing the streams against specific signatures. For example, I searched a stream for the PNG signature:
    PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

    def GetPngStartingOffset(arr):
        # Target magic number for PNG: 89 50 4E 47 0D 0A 1A 0A.
        # arr may be bytes, a bytearray, or a list of ints in range(256).
        offset = bytes(arr).find(PNG_SIGNATURE)
        # bytes.find returns -1 when the signature is absent; return None
        # so that "not found" cannot be confused with a match at offset 0.
        return offset if offset != -1 else None
Once this function finds the magic number, I split the stream at that offset and load the PNG image:
    import io
    from PIL import Image  # Pillow

    # stream is one of the compound file's streams, opened elsewhere
    data = stream.read()
    offset = GetPngStartingOffset(data)
    if offset is not None:
        image = Image.open(io.BytesIO(data[offset:]))
        image.show()
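The same search can be generalized to several signatures at once. The following is only a sketch under stated assumptions: `KNOWN_SIGNATURES` and `find_signatures` are illustrative names, and the table holds just a handful of well-known signatures (PNG from above, plus gzip, JPEG, PDF and ZIP). Short signatures can match by coincidence inside unrelated data, so a hit is a hint, not proof.

```python
# A small, deliberately incomplete table of well-known signatures.
KNOWN_SIGNATURES = {
    "png": bytes.fromhex("89504E470D0A1A0A"),
    "gzip": bytes.fromhex("1F8B08"),  # gzip member using deflate
    "jpeg": bytes.fromhex("FFD8FF"),
    "pdf": b"%PDF-",
    "zip": b"PK\x03\x04",
}


def find_signatures(data):
    """Return sorted (offset, name) pairs for the first hit of each known signature."""
    hits = []
    for name, signature in KNOWN_SIGNATURES.items():
        offset = data.find(signature)
        if offset != -1:  # bytes.find returns -1 when there is no match
            hits.append((offset, name))
    return sorted(hits)


# Leading bytes of the sample from the question.
sample = b"\x00" * 8 + bytes.fromhex("1F8B080000000000000B")
print(find_signatures(sample))  # [(8, 'gzip')]
```

Applied to the leading bytes of the question's sample, this points to a gzip signature at offset 8, which could then be confirmed by handing the bytes from that offset to a gzip decoder.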
In the end, this is not an end-to-end solution, but it is a way to figure out the content of streams.
Thanks for reading, and thanks to @Robert Columbia for his patience.
"there is a signature for each file" No, @Kevin was right; Some don't, in particular, text files (except for some scripts) don't.
– Tom Blodget
Nov 27 '18 at 17:53
@TomBlodget yes, you are right. Text files don't have a signature unless the encoding adds one, such as the byte order mark that may precede UTF-8 text. That is because ASCII characters are stored as they are.
– Ibrahim Kais Ibrahim
Nov 27 '18 at 18:15
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53288197%2ffigure-out-bytes-content%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
Yes, there is away to figure out each stream content. there is a signature for each file on this planet in addition to extension which is not reliable. it might be removed or falsely added.
So what is the signature?
In computing, a file signature is data used to identify or verify the
contents of a file. In particular, it may refer to:
File magic number: bytes within a file used to identify the
format of the file; generally a short sequence of bytes (most are
2-4 bytes long) placed at the beginning of the file; see list of file
signatures
File checksum or more generally the result of a hash function over the file contents: data used to verify the integrity of the file
contents, generally against transmission errors or malicious attacks.
The signature can be included at the end of the file or in a separate
file.
I used the magic number to define the magic number term I'm copying this from Wikipedia
In computer programming, the term magic number has multiple
meanings. It could refer to one or more of the following:
- Unique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants
- A constant numerical or text value used to identify a file format or protocol; for files, see List of file
signatures
- Distinctive unique values that are unlikely to be mistaken for other meanings(e.g., Globally Unique Identifiers)
in the second point it is a certain sequence of bytes like
PNG (89 50 4E 47 0D 0A 1A 0A)
or
BMP (42 4D)
So how to know the magic number of each file?
in this article "Investigating File Signatures Using PowerShell" we find the writer created a wonderful power shell function to get the magic number also he mentioned a tool and I'm copying this from his article
PowerShell V5 brings in Format-Hex, which can provide an alternative
approach to reading the file and displaying the hex and ASCII value to
determine the magic number.
form Format-Hex help I'm copying this description
The Format-Hex cmdlet displays a file or other input as hexadecimal
values. To determine the offset of a character from the output, add
the number at the leftmost of the row to the number at the top of the
column for that character.
This cmdlet can help you determine the file type of a corrupted file
or a file which may not have a file name extension. Run this cmdlet,
and then inspect the results for file information.
this tool is very good also to get the magic number of a file. Here is an example
another tool is online hex editor but to be onset I didn't understand how to use it.
now we got the magic number but how to know what type of data or is that file or stream?
and that is the most good question.
Luckily there are many database for these magic numbers. let me list some
File Signatures
FILE SIGNATURES TABLE- List of file signatures
for example the first database has a search capability. just enter the magic number with no spaces and search
after you may find. Yes, may. There is a big possibility that you won't directly find the file type in question.
I faced this and solved it by testing the streams against specific types of signatures. Like PNG I was searching for in a stream
def GetPngStartingOffset(arr):
#targted magic Number for png (89 50 4E 47 0D 0A 1A 0A)
markerFound = False
startingOffset = 0
previousValue = 0
arraylength = range(0, len(arr) -1)
for i in arraylength:
currentValue = arr[i]
if (currentValue == 137): # 0x89
markerFound = True
startingOffset = i
previousValue = currentValue
continue
if currentValue == 80: # 0x50
if (markerFound and (previousValue == 137)):
previousValue = currentValue
continue
markerFound = False
elif currentValue == 78: # 0x4E
if (markerFound and (previousValue == 80)):
previousValue = currentValue
continue
markerFound = False
elif currentValue == 71: # 0x47
if (markerFound and (previousValue == 78)):
previousValue = currentValue
continue
markerFound = False
elif currentValue == 13: # 0x0D
if (markerFound and (previousValue == 71)):
previousValue = currentValue
continue
markerFound = False
elif currentValue == 10: # 0x0A
if (markerFound and (previousValue == 26)):
return startingOffset
if (markerFound and (previousValue == 13)):
previousValue = currentValue
continue
markerFound = False
elif currentValue == 26: # 0x1A
if (markerFound and (previousValue == 10)):
previousValue = currentValue
continue
markerFound = False
return 0
Once this function found the magic number
I split the stream and save the png file
arr = stream.read()
a = list(arr)
B = a[GetPngStartingOffset(a):len(a)]
bytesString = bytes(B)
image = Image.open(io.BytesIO(bytesString))
image.show()
At the end this is not an end to end solution but it is a way to figure out streams content
Thanks for reading and Thanks for @Robert Columbia for his patience
"there is a signature for each file" No, @Kevin was right; Some don't, in particular, text files (except for some scripts) don't.
– Tom Blodget
Nov 27 '18 at 17:53
@TomBlodget yes you are right. text files doesn't have signature unless it has an encoding like utf-8. And that is because the ASCII characters are stored as it is.
– Ibrahim Kais Ibrahim
Nov 27 '18 at 18:15
add a comment |
Yes, there is away to figure out each stream content. there is a signature for each file on this planet in addition to extension which is not reliable. it might be removed or falsely added.
So what is the signature?
In computing, a file signature is data used to identify or verify the
contents of a file. In particular, it may refer to:
File magic number: bytes within a file used to identify the
format of the file; generally a short sequence of bytes (most are
2-4 bytes long) placed at the beginning of the file; see list of file
signatures
File checksum or more generally the result of a hash function over the file contents: data used to verify the integrity of the file
contents, generally against transmission errors or malicious attacks.
The signature can be included at the end of the file or in a separate
file.
I used the magic number to define the magic number term I'm copying this from Wikipedia
In computer programming, the term magic number has multiple
meanings. It could refer to one or more of the following:
- Unique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants
- A constant numerical or text value used to identify a file format or protocol; for files, see List of file
signatures
- Distinctive unique values that are unlikely to be mistaken for other meanings(e.g., Globally Unique Identifiers)
in the second point it is a certain sequence of bytes like
PNG (89 50 4E 47 0D 0A 1A 0A)
or
BMP (42 4D)
So how to know the magic number of each file?
in this article "Investigating File Signatures Using PowerShell" we find the writer created a wonderful power shell function to get the magic number also he mentioned a tool and I'm copying this from his article
PowerShell V5 brings in Format-Hex, which can provide an alternative
approach to reading the file and displaying the hex and ASCII value to
determine the magic number.
form Format-Hex help I'm copying this description
The Format-Hex cmdlet displays a file or other input as hexadecimal
values. To determine the offset of a character from the output, add
the number at the leftmost of the row to the number at the top of the
column for that character.
This cmdlet can help you determine the file type of a corrupted file
or a file which may not have a file name extension. Run this cmdlet,
and then inspect the results for file information.
this tool is very good also to get the magic number of a file. Here is an example
another tool is online hex editor but to be onset I didn't understand how to use it.
now we got the magic number but how to know what type of data or is that file or stream?
and that is the most good question.
Luckily there are many database for these magic numbers. let me list some
File Signatures
FILE SIGNATURES TABLE- List of file signatures
for example the first database has a search capability. just enter the magic number with no spaces and search
after you may find. Yes, may. There is a big possibility that you won't directly find the file type in question.
I faced this and solved it by testing the streams against specific types of signatures. Like PNG I was searching for in a stream
def GetPngStartingOffset(arr):
#targted magic Number for png (89 50 4E 47 0D 0A 1A 0A)
markerFound = False
startingOffset = 0
previousValue = 0
arraylength = range(0, len(arr) -1)
for i in arraylength:
currentValue = arr[i]
if (currentValue == 137): # 0x89
markerFound = True
startingOffset = i
previousValue = currentValue
continue
if currentValue == 80: # 0x50
if (markerFound and (previousValue == 137)):
previousValue = currentValue
continue
markerFound = False
elif currentValue == 78: # 0x4E
if (markerFound and (previousValue == 80)):
previousValue = currentValue
continue
markerFound = False
elif currentValue == 71: # 0x47
if (markerFound and (previousValue == 78)):
previousValue = currentValue
continue
markerFound = False
elif currentValue == 13: # 0x0D
if (markerFound and (previousValue == 71)):
previousValue = currentValue
continue
markerFound = False
elif currentValue == 10: # 0x0A
if (markerFound and (previousValue == 26)):
return startingOffset
if (markerFound and (previousValue == 13)):
previousValue = currentValue
continue
markerFound = False
elif currentValue == 26: # 0x1A
if (markerFound and (previousValue == 10)):
previousValue = currentValue
continue
markerFound = False
return 0
Once this function found the magic number
I split the stream and save the png file
arr = stream.read()
a = list(arr)
B = a[GetPngStartingOffset(a):len(a)]
bytesString = bytes(B)
image = Image.open(io.BytesIO(bytesString))
image.show()
At the end this is not an end to end solution but it is a way to figure out streams content
Thanks for reading and Thanks for @Robert Columbia for his patience
"there is a signature for each file" No, @Kevin was right; Some don't, in particular, text files (except for some scripts) don't.
– Tom Blodget
Nov 27 '18 at 17:53
@TomBlodget yes you are right. text files doesn't have signature unless it has an encoding like utf-8. And that is because the ASCII characters are stored as it is.
– Ibrahim Kais Ibrahim
Nov 27 '18 at 18:15
add a comment |
Yes, there is away to figure out each stream content. there is a signature for each file on this planet in addition to extension which is not reliable. it might be removed or falsely added.
So what is the signature?
In computing, a file signature is data used to identify or verify the
contents of a file. In particular, it may refer to:
File magic number: bytes within a file used to identify the
format of the file; generally a short sequence of bytes (most are
2-4 bytes long) placed at the beginning of the file; see list of file
signatures
File checksum or more generally the result of a hash function over the file contents: data used to verify the integrity of the file
contents, generally against transmission errors or malicious attacks.
The signature can be included at the end of the file or in a separate
file.
I used the magic number to define the magic number term I'm copying this from Wikipedia
In computer programming, the term magic number has multiple
meanings. It could refer to one or more of the following:
- Unique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants
- A constant numerical or text value used to identify a file format or protocol; for files, see List of file
signatures
- Distinctive unique values that are unlikely to be mistaken for other meanings(e.g., Globally Unique Identifiers)
in the second point it is a certain sequence of bytes like
PNG (89 50 4E 47 0D 0A 1A 0A)
or
BMP (42 4D)
So how to know the magic number of each file?
in this article "Investigating File Signatures Using PowerShell" we find the writer created a wonderful power shell function to get the magic number also he mentioned a tool and I'm copying this from his article
PowerShell V5 brings in Format-Hex, which can provide an alternative
approach to reading the file and displaying the hex and ASCII value to
determine the magic number.
form Format-Hex help I'm copying this description
The Format-Hex cmdlet displays a file or other input as hexadecimal
values. To determine the offset of a character from the output, add
the number at the leftmost of the row to the number at the top of the
column for that character.
This cmdlet can help you determine the file type of a corrupted file
or a file which may not have a file name extension. Run this cmdlet,
and then inspect the results for file information.
this tool is very good also to get the magic number of a file. Here is an example
another tool is online hex editor but to be onset I didn't understand how to use it.
now we got the magic number but how to know what type of data or is that file or stream?
and that is the most good question.
Luckily there are many database for these magic numbers. let me list some
File Signatures
FILE SIGNATURES TABLE- List of file signatures
for example the first database has a search capability. just enter the magic number with no spaces and search
after you may find. Yes, may. There is a big possibility that you won't directly find the file type in question.
I faced this and solved it by testing the streams against specific types of signatures. Like PNG I was searching for in a stream
def GetPngStartingOffset(arr):
#targted magic Number for png (89 50 4E 47 0D 0A 1A 0A)
markerFound = False
startingOffset = 0
previousValue = 0
arraylength = range(0, len(arr) -1)
for i in arraylength:
currentValue = arr[i]
if (currentValue == 137): # 0x89
markerFound = True
startingOffset = i
previousValue = currentValue
continue
if currentValue == 80: # 0x50
if (markerFound and (previousValue == 137)):
previousValue = currentValue
continue
markerFound = False
elif currentValue == 78: # 0x4E
if (markerFound and (previousValue == 80)):
previousValue = currentValue
continue
markerFound = False
elif currentValue == 71: # 0x47
if (markerFound and (previousValue == 78)):
previousValue = currentValue
continue
markerFound = False
elif currentValue == 13: # 0x0D
if (markerFound and (previousValue == 71)):
previousValue = currentValue
continue
markerFound = False
elif currentValue == 10: # 0x0A
if (markerFound and (previousValue == 26)):
return startingOffset
if (markerFound and (previousValue == 13)):
previousValue = currentValue
continue
markerFound = False
elif currentValue == 26: # 0x1A
if (markerFound and (previousValue == 10)):
previousValue = currentValue
continue
markerFound = False
return 0
Once this function found the magic number
I split the stream and save the png file
arr = stream.read()
a = list(arr)
B = a[GetPngStartingOffset(a):len(a)]
bytesString = bytes(B)
image = Image.open(io.BytesIO(bytesString))
image.show()
At the end this is not an end to end solution but it is a way to figure out streams content
Thanks for reading and Thanks for @Robert Columbia for his patience
Yes, there is away to figure out each stream content. there is a signature for each file on this planet in addition to extension which is not reliable. it might be removed or falsely added.
So what is the signature?
In computing, a file signature is data used to identify or verify the
contents of a file. In particular, it may refer to:
File magic number: bytes within a file used to identify the
format of the file; generally a short sequence of bytes (most are
2-4 bytes long) placed at the beginning of the file; see list of file
signatures
File checksum or more generally the result of a hash function over the file contents: data used to verify the integrity of the file
contents, generally against transmission errors or malicious attacks.
The signature can be included at the end of the file or in a separate
file.
I used the magic number to define the magic number term I'm copying this from Wikipedia
In computer programming, the term magic number has multiple
meanings. It could refer to one or more of the following:
- Unique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants
- A constant numerical or text value used to identify a file format or protocol; for files, see List of file
signatures
- Distinctive unique values that are unlikely to be mistaken for other meanings(e.g., Globally Unique Identifiers)
in the second point it is a certain sequence of bytes like
PNG (89 50 4E 47 0D 0A 1A 0A)
or
BMP (42 4D)
So how to know the magic number of each file?
in this article "Investigating File Signatures Using PowerShell" we find the writer created a wonderful power shell function to get the magic number also he mentioned a tool and I'm copying this from his article
PowerShell V5 brings in Format-Hex, which can provide an alternative
approach to reading the file and displaying the hex and ASCII value to
determine the magic number.
form Format-Hex help I'm copying this description
The Format-Hex cmdlet displays a file or other input as hexadecimal
values. To determine the offset of a character from the output, add
the number at the leftmost of the row to the number at the top of the
column for that character.
This cmdlet can help you determine the file type of a corrupted file
or a file which may not have a file name extension. Run this cmdlet,
and then inspect the results for file information.
this tool is very good also to get the magic number of a file. Here is an example
Another tool is an online hex editor, but to be honest I didn't understand how to use it.
Now we have the magic number, but how do we know what type of data the file or stream holds?
That is the real question.
Luckily there are many databases of these magic numbers. Let me list some:
File Signatures
FILE SIGNATURES TABLE- List of file signatures
For example, the first database has a search feature: enter the magic number without spaces and search.
You may then find a match. Yes, "may": there is a good chance that you won't directly find the file type in question.
I ran into this and solved it by testing the streams against specific signatures. For example, this is how I searched a stream for a PNG:
    import io

    from PIL import Image  # Pillow (assumed; the original snippet did not show its imports)

    # Target magic number for PNG: 89 50 4E 47 0D 0A 1A 0A
    PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

    def GetPngStartingOffset(arr):
        # Return the offset of the first PNG signature in arr, or -1 if it is absent.
        # bytes() accepts both bytes-like objects and lists of ints (0-255).
        return bytes(arr).find(PNG_SIGNATURE)

Once this function finds the magic number, I split the stream at that offset and load the PNG image:

    # stream: a file-like object for one stream of the compound file
    arr = stream.read()
    offset = GetPngStartingOffset(arr)
    if offset >= 0:
        # Everything from the signature onward is handed to Pillow as PNG data.
        image = Image.open(io.BytesIO(arr[offset:]))
        image.show()
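The same idea extends to more than one format. The sketch below scans a byte string for a small, hand-picked table of signatures and reports where each one occurs. The table and the name `find_signatures` are assumptions made for illustration; a real signature database is far larger. The sample mimics the start of the bytes in the question, which appear to contain the gzip magic number 1F 8B after eight zero bytes.

```python
# A tiny, hand-picked signature table. Real databases list far more entries.
SIGNATURES = {
    "png": bytes.fromhex("89 50 4E 47 0D 0A 1A 0A"),
    "gzip": bytes.fromhex("1F 8B"),
    "bmp": bytes.fromhex("42 4D"),
}


def find_signatures(data: bytes, signatures=SIGNATURES):
    """Return sorted (offset, name) pairs for every signature occurrence in data."""
    hits = []
    for name, magic in signatures.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, name))
            start = data.find(magic, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Eight zero bytes followed by a gzip header, like the start of the
    # sample bytes in the question.
    sample = bytes(8) + bytes.fromhex("1F 8B 08 00 00 00 00 00 00 0B")
    print(find_signatures(sample))
```

Short signatures (two bytes, as for gzip and BMP here) can also occur by chance inside compressed or random data, so a hit is a hint to test further, not proof of the format.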
In the end, this is not an end-to-end solution, but it is a way to figure out the content of a stream.
Thanks for reading, and thanks to @Robert Columbia for his patience.
answered Nov 27 '18 at 16:12
Ibrahim Kais Ibrahim
"there is a signature for each file" No, @Kevin was right; Some don't, in particular, text files (except for some scripts) don't.
– Tom Blodget
Nov 27 '18 at 17:53
@TomBlodget yes, you are right. Text files don't have a signature unless they use an encoding like UTF-8, because plain ASCII characters are stored as-is.
– Ibrahim Kais Ibrahim
Nov 27 '18 at 18:15
1
Compare your bytes against Every. Known. Filetype. That's it. It's not magic; that is how file works. (Descriptions of both of these two terms can be found in your favourite man version.)
– usr2564301
Nov 13 '18 at 20:22