Invalid start byte error using replace() function in python



I am running a simple script to replace one word with another in my files, like so:


import random
import os

path = '/path/of/file/'
files = os.listdir (path)

for file in files:
    with open (path + file) as f:
        newText = f.read().replace('Plastic Ba','PlasticBag')

    with open (path + file, "w") as f:
        f.write(newText)



In doing so, I get an error that I have never encountered before:


Traceback (most recent call last):
  File "replaceText.py", line 9, in <module>
    newText = f.read().replace('Plastic Ba', 'PlasticBag')
  File "/Users/vivek/anaconda3/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte



I am not sure what this means or what the mistake here is? I have run this script multiple times in the past without any issues. Any help on resolving this would be great!





What is the encoding of the text file? Can you provide a sample of what the file looks like around the 3131st byte?
– Daniel Pryden
Aug 21 at 23:28





The replace is completely irrelevant here; the exception is coming from the read(), before you even get there. And what the exception means is that the file is not UTF-8 (e.g., it's Latin-1 or cp1252), but you've tried to open it as UTF-8. (Or, possibly, that it's UTF-8 but corrupted, but that's less likely.)
– abarnert
Aug 21 at 23:29








You could potentially resolve the problem by opening the file in binary mode and doing replacements only using byte strings. But probably the better solution is to open the file with the correct encoding (and yes, CP 1252 is probably a decent guess if it isn't UTF-8 but it is a superset of ASCII).
– Daniel Pryden
Aug 21 at 23:32
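The binary-mode approach mentioned in the comment above could look roughly like the following. This is only a sketch: the helper name replace_in_file_bytes and the sample file are made up for illustration, and it assumes the search text is plain ASCII, so its bytes look the same in UTF-8, Latin-1 and cp1252.

```python
import os
import tempfile


def replace_in_file_bytes(filename, old, new):
    """Replace the byte string `old` with `new` without decoding the file."""
    # Binary mode never decodes, so no UnicodeDecodeError can be raised here.
    with open(filename, "rb") as f:
        data = f.read()
    with open(filename, "wb") as f:
        f.write(data.replace(old, new))


# Demo on a throwaway file containing the offending byte 0x80.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "wb") as f:
        f.write(b"Plastic Ba \x80 5")
    replace_in_file_bytes(sample, b"Plastic Ba", b"PlasticBag")
    with open(sample, "rb") as f:
        result = f.read()

print(result)
```

The trade-off is that the non-ASCII bytes are carried through untouched, so this only works when the text being replaced does not itself depend on the file's encoding.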





Specifically: UnicodeDecodeError means you're trying to read/decode/etc. text with the wrong encoding. 'utf-8' is the encoding you're trying to use (it's the default for most things nowadays). byte 0x80 in position 3131 is helpfully telling you where the problem happens, so you can run, e.g., with open(path+file, 'rb') as f: print(f.read()[3100:3200]) to debug the problem. (Or to post it on Stack Overflow so someone else can debug it.)
– abarnert
Aug 21 at 23:32
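A slightly more reusable version of that debugging one-liner might look like this. The function name bytes_around and the synthetic sample file are assumptions made for the example; in practice you would pass the real file and the position reported in the error message.

```python
import os
import tempfile


def bytes_around(filename, position, context=50):
    """Return the raw bytes from `context` before `position` to `context` after."""
    start = max(0, position - context)
    with open(filename, "rb") as f:
        f.seek(start)                  # byte offset; binary mode does no decoding
        return f.read(position - start + context)


# Demo: build a file whose byte at offset 3131 is 0x80, as in the traceback.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "wb") as f:
        f.write(b"a" * 3131 + b"\x80" + b"b" * 100)
    snippet = bytes_around(sample, 3131, context=5)

print(snippet)
```

Printing the bytes object shows non-ASCII bytes as \x.. escapes, which makes it easier to guess the actual encoding.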







@DanielPryden Also, cp1252 is a superset of ASCII where 0x80 is '€', whereas in Latin-1 it is a nonprinting control character, so… I probably should have suggested that one first.
– abarnert
Aug 21 at 23:34
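The difference between the two encodings for this particular byte can be checked directly in Python. A small sketch, using only the standard library:

```python
import unicodedata

raw = b"\x80"

as_cp1252 = raw.decode("cp1252")    # the euro sign, U+20AC
as_latin1 = raw.decode("latin-1")   # U+0080, a C1 control character

print(repr(as_cp1252), unicodedata.category(as_cp1252))
print(repr(as_latin1), unicodedata.category(as_latin1))

# The same byte is rejected by UTF-8, which is the error in the question.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_error = exc
    print(exc)
```

Latin-1 never fails to decode, because every byte maps to some code point, so a successful Latin-1 decode is not evidence that the guess was right.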






1 Answer



Did you try encoding the file as 'UTF-8'?
Please check the parameters of the open() function:


open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)



In your script, try using:


with open (path + file, 'r', encoding='windows-1252') as f:
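Applied to the whole script from the question, that suggestion might look like the sketch below. It assumes the files really are Windows-1252, which the thread only guesses at. The function name replace_in_directory and the demo file are invented for the example. The same encoding is passed when writing, so the file is not silently converted to a different encoding.

```python
import os
import tempfile


def replace_in_directory(path, old, new, encoding="windows-1252"):
    """Replace `old` with `new` in every regular file directly under `path`."""
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if not os.path.isfile(full):
            continue  # skip subdirectories
        # newline="" keeps the original line endings unchanged.
        with open(full, "r", encoding=encoding, newline="") as f:
            new_text = f.read().replace(old, new)
        with open(full, "w", encoding=encoding, newline="") as f:
            f.write(new_text)


# Demo on a temporary directory holding one cp1252-encoded file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "wb") as f:
        f.write(b"Plastic Ba \x80 5\r\n")
    replace_in_directory(tmp, "Plastic Ba", "PlasticBag")
    with open(sample, "rb") as f:
        result = f.read()

print(result)
```

If the encoding guess is wrong, the read may still succeed but produce the wrong characters, so it is worth inspecting the raw bytes first.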



You can also check out the open function available in the codecs library.
Please also see this question: Unicode (UTF-8) reading and writing to files in Python



UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 3131: invalid start byte





The error message in the question indicates that the utf-8 codec is already being used (and failing to decode), so that can't be the answer.
– Daniel Pryden
Aug 22 at 0:04







What, you found a link to another question which fails to read a file and it fails at the same position 3131, with the same error? That is weird.
– zvone
Aug 22 at 0:19








