XML not well-formed due to umlaut characters
I have an XML file with the following declaration:
<?xml version="1.0" encoding="utf-8"?>
When I open it in three different editors, I get the following:
There are evidently three different representations: Notepad shows the correct symbol, Notepad++ shows a hexadecimal code, and Emacs shows an octal code.
I have Perl code that tests whether an XML file is well-formed. As soon as the file contains these umlaut characters, it is reported as not well-formed and cannot be loaded into my database. When I remove all umlaut characters (and Greek symbols, etc.), the file is well-formed and I can import it into the database.
My goal is an XML file that I can import into a database while keeping the umlaut characters (and Greek symbols, etc.).
What is the reason for this behaviour? Is it caused by the way the XML was created?
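A minimal reproduction of this symptom, sketched in Python's standard-library ElementTree instead of the Perl check mentioned above. The sample text `Bär` and the helper `is_well_formed` are hypothetical illustrations. Under these assumptions, the same document text parses when its bytes really are UTF-8 but is rejected when `ä` is stored as a single ISO-8859-1 byte while the declaration still says `utf-8`:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the byte encoding differs below.
TEXT = '<?xml version="1.0" encoding="utf-8"?>\n<name>Bär</name>'

def is_well_formed(data: bytes) -> bool:
    """Return True if the XML parser accepts the raw bytes."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

utf8_bytes = TEXT.encode("utf-8")      # ä stored as C3 A4, matching the declaration
latin1_bytes = TEXT.encode("latin-1")  # ä stored as E4, declaration still says utf-8

print(is_well_formed(utf8_bytes))    # True
print(is_well_formed(latin1_bytes))  # False
```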
xml utf-8 character-encoding diacritics
asked Nov 10 '18 at 13:41
giordano
1 Answer
It looks likely to me that the ä character in your input is encoded as the single byte xE4, which is the representation of the character in ISO-8859-1 (and Windows CP-1252), but is not the correct representation in UTF-8, where ä is the two-byte sequence xC3 xA4. Your three editors are dealing with the inconsistency between the encoding named in the XML declaration and the actual encoding in different ways.
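One way to confirm this diagnosis is to inspect the raw bytes rather than trust any editor's rendering. The sketch below is in Python; the helper `first_invalid_utf8` and the sample document are hypothetical. It prints the byte values of `ä` in each encoding and reports the offset of the first byte that fails UTF-8 decoding:

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, byte value) of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        return err.start, data[err.start]

# ä is one byte in ISO-8859-1 / Windows-1252 but two bytes in UTF-8.
print("ä".encode("iso-8859-1").hex())  # e4
print("ä".encode("cp1252").hex())      # e4
print("ä".encode("utf-8").hex())       # c3a4

# Hypothetical file content: declared as utf-8, actually written as ISO-8859-1.
sample = '<?xml version="1.0" encoding="utf-8"?><a>ä</a>'.encode("iso-8859-1")
print(first_invalid_utf8(sample))
```

On a real file, `open(path, "rb").read()` would supply the bytes; a result of `None` means the file is valid UTF-8 and the problem lies elsewhere.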
Fix the problem by ensuring that the encoding named in the XML declaration matches the actual encoding of the characters.
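If the declaration should stay `utf-8`, the alternative is to re-encode the bytes so they really are UTF-8. This is a minimal Python sketch under a loudly stated assumption: the whole file is actually Windows-1252 (the hypothetical `actual_encoding` default). If some other encoding was used, that parameter must be changed. The helper name `reencode_to_utf8` is hypothetical.

```python
import xml.etree.ElementTree as ET

def reencode_to_utf8(data: bytes, actual_encoding: str = "cp1252") -> bytes:
    """Re-encode XML bytes whose real encoding is `actual_encoding` as UTF-8.

    Assumes the declaration already says utf-8, so it can stay unchanged.
    May raise UnicodeDecodeError if a byte is undefined in `actual_encoding`.
    """
    text = data.decode(actual_encoding)
    return text.encode("utf-8")

# Hypothetical broken input: declared utf-8, actually written as Windows-1252.
broken = '<?xml version="1.0" encoding="utf-8"?><a>Bär</a>'.encode("cp1252")
fixed = reencode_to_utf8(broken)
print(ET.fromstring(fixed).text)  # Bär
```

Changing the declaration to name the real encoding (for example `iso-8859-1`) would also satisfy an XML parser. Re-encoding to UTF-8, however, additionally allows Greek and other characters outside a single-byte code page.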
The problem may have been introduced when the XML file was first created, or it may have been introduced by some process that changed the character encoding subsequently, without changing the XML declaration to match the new encoding. This could happen if the file was transcoded by a non-XML-aware process.
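To illustrate the difference between an XML-aware and a non-XML-aware process, here is a Python sketch using the standard-library ElementTree; the element `a`, its text, and the helper `parses` are hypothetical. When ElementTree serializes to ISO-8859-1, it writes a declaration naming that encoding. By contrast, encoding the text as ISO-8859-1 with a plain string operation leaves the stale `utf-8` declaration behind:

```python
import xml.etree.ElementTree as ET

def parses(data: bytes) -> bool:
    """Return True if the XML parser accepts the raw bytes."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

root = ET.Element("a")
root.text = "Bär"

# XML-aware: the emitted declaration names the encoding actually used.
latin1_doc = ET.tostring(root, encoding="iso-8859-1")
print(latin1_doc)

# Non-XML-aware: the bytes change, but the declaration still says utf-8.
naive = '<?xml version="1.0" encoding="utf-8"?><a>Bär</a>'.encode("iso-8859-1")

print(parses(latin1_doc))  # True
print(parses(naive))       # False
```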
answered Nov 10 '18 at 15:46
Michael Kay