ARM assembly dereferencing string only retrieving 4 bytes
I have the following code in my ARM assembly program
.data
.balign 4
prompt1: .asciz "Enter a string: "
.balign 4
scan1: .asciz "%s"
.balign 4
string_read: .word 0
.text
.global main
main:
push {fp, lr}
ldr r0, addr_prompt1
bl printf
ldr r0, addr_scan1
ldr r1, addr_string_arg
bl scanf
ldr r2, addr_string_arg
ldr r2, [r2]
addr_prompt1 : .word prompt1
addr_scan1 : .word scan1
addr_string_arg : .word string_read
I am using GDB PEDA to debug. Let us say that I input "test_cases" as my string. I can see that when I perform
ldr r2, addr_string_arg
it is holding an address that is pointing to the full string "test_cases".
However, after I dereference
ldr r2, [r2]
r2 now holds the value (b'test'). When I try passing this to a function, it becomes "test/002". This happens with any string that is more than 4 characters long. I tried changing the values next to .balign as well as .word and neither of those helped.
string assembly memory arm
2
r2 is a 32-bit register... you can put only "so much" information into 32 bits... With an 8-bit letter encoding, like ASCII, 32 bits can hold 4 letters. (With a variable-length encoding like UTF-8, the 32-bit register can hold from one to four letters, since a single code point takes between 1 and 4 bytes.)
– Ped7g
Nov 14 '18 at 0:22
@Ped7g is there any way for me to read the characters in a different format such that each character would be only a single byte, allowing me to fit a 32 character string?
– Stephen Burns
Nov 14 '18 at 0:32
1
Strings are passed by reference, with the address in a register, not the contents. Use ldr r0, =scan1 to put the address in a register. (The assembler will transparently do something like what you're doing now: loading the address from a nearby literal pool.)
– Peter Cordes
Nov 14 '18 at 0:50
The ASCII encoding is like that, one character = one byte (ASCII is a 7-bit encoding, but in practice it is almost always padded to 8 bits). But a register is 32 bits, not 32 bytes. One bit is just two states: 0 or 1. Eight bits are one byte, which allows for 2^8 = 256 distinct states/patterns (bit patterns from 00000000 to 11111111). ARM registers are 32 bits, which allows for 2^32 distinct states/patterns (that's enough for four concatenated bytes). But most of the APIs (like printf) take "string" data as the memory address (pointer) of the first character of the string, not the characters themselves.
– Ped7g
Nov 14 '18 at 9:48
And they usually expect a "C string", i.e. the last ("hidden") character is a zero terminator, which tells the printf implementation where the string ends. Some other APIs resolve that differently: for example, POSIX write expects the memory address of the first byte of data to be written AND the length of the data, but it does not check the content (i.e. you can also write data containing zero). But something like puts is unable to print a zero byte as a character (as it can print only a C string, and a C string can't contain zero as a "visible" character).
– Ped7g
Nov 14 '18 at 9:54
asked Nov 14 '18 at 0:19
Stephen Burns
1 Answer
All of the comments under your question are relevant, but nobody's posted an actual answer yet, so here goes!
Strings are, by their nature, variable in length. For this reason they are almost invariably stored in memory, and passed around using a pointer to the first character. Also by convention (at least in the C world, and you're interacting with C libraries in your code) the length of the string is not stored anywhere, the end of the string being instead marked by a zero byte.
You're actually using strings in the correct conventional way in several places in your code - when you load the address of prompt1 (via addr_prompt1) to pass as the first argument to printf, for example. When you use the instruction
ldr r2, addr_string_arg
you are loading the address of the first character of the string argument into r2, and it would be standard practice to use that address to represent the string. When you subsequently write
ldr r2, [r2]
you are effectively dereferencing the pointer and loading four bytes (32 bits) starting from that address into r2. r2 doesn't contain the string, just the first four bytes of it; and indeed those bytes may not even be in the same order as they were in memory, depending on the endianness of your system.
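To look at the individual characters rather than a 4-byte word, a byte load can walk the string one character at a time until it reaches the zero terminator. The following is only a minimal sketch in GNU assembler syntax, assuming ARM (A32) mode and the standard calling convention; the routine name str_len is hypothetical and not part of the original code.

```asm
@ Hypothetical helper: str_len
@ In:  r0 = address of a zero-terminated string
@ Out: r0 = number of characters before the terminator
        .text
        .global str_len
str_len:
        mov   r1, r0            @ r1 walks along the string
1:      ldrb  r2, [r1], #1      @ load ONE byte, then advance r1 by 1
        cmp   r2, #0            @ reached the zero terminator?
        bne   1b                @ no: fetch the next character
        sub   r0, r1, r0        @ r1 now points one past the terminator
        sub   r0, r0, #1        @ so subtract 1 to exclude the terminator
        bx    lr
```

Called with r0 holding the address of string_read, this would return the length of whatever scanf stored there; the string itself stays in memory the whole time and only its address travels in a register.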
Note also that you must allocate enough storage space to hold the longest string you could ever expect, and at the moment you are not doing so (string_read is only 4 bytes, which with the termination character allows for a three-character string). Allowing the string to overflow its buffer leads to undefined behaviour.
As an aside, when you're loading arbitrary constants (including addresses) into registers the 'LDR pseudo-instruction' is what you want - and this uses an equals sign before the second argument. Depending on the constant, the assembler turns it into either a single MOV or MVN with an immediate operand, or a PC-relative LDR from a nearby literal pool that it generates for you, but you don't need to care about that when you use it. For example,
ldr r2, =string_read
removes the need for your addr_string_arg.
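Putting these points together, a corrected version of the program might look something like the sketch below. It assumes GNU assembler syntax, ARM (A32) mode, and linking against the C library through gcc. The 32-byte buffer size, the "%31s" width limit, and the final call to puts are illustrative choices, not something taken from the original code.

```asm
        .data
prompt1:     .asciz "Enter a string: "
scan1:       .asciz "%31s"          @ width limit: at most 31 characters are stored
string_read: .skip  32              @ 32 zeroed bytes: 31 characters + terminator

        .text
        .global main
main:
        push {fp, lr}               @ two registers keep the stack 8-byte aligned
        ldr  r0, =prompt1           @ r0 = address of the prompt string
        bl   printf
        ldr  r0, =scan1             @ r0 = address of the format string
        ldr  r1, =string_read       @ r1 = address of the buffer to fill
        bl   scanf
        ldr  r0, =string_read       @ pass the ADDRESS of the string, not its bytes
        bl   puts                   @ prints the whole string up to the terminator
        mov  r0, #0                 @ return 0 from main
        pop  {fp, pc}
```

The width in "%31s" is what stops scanf from writing past the end of the buffer; if the buffer size changes, the width needs to change with it (buffer size minus one).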
edited Nov 16 '18 at 11:55
answered Nov 15 '18 at 13:41
cooperised