Python: Turning a string which should be an array, back into an array
I have a string which should be a 3x3x3 array. I'm just looking for an easy way to convert it back to its 'truth' so I can access the values. It is separated by newlines, [ ], and what I thought were tabs but are actually runs of 6 or 7 spaces.
I saved an array into a pandas DataFrame thinking it would work, but it gave me this:
'[[[ nan nan nan]\n [ nan nan\nnan]\n [ nan nan nan]]\n\n [[ 0.005506 0.005506\nnan]\n [ 0.006684 nan nan]\n [ 0.006684 nan\nnan]]\n\n [[ nan nan nan]\n [ nan nan\nnan]\n [ nan nan nan]]]'
I have tried .split('\n') and various other separators and combinations with little success.
Looking for an array like this (numbers are just an example):
x = [[[0,0,0],[0,0,0],[0,0,0]],[[1,1,1],[1,1,1],[1,1,1]],[[2,2,2],[2,2,2],[2,2,2]]]
Thanks!
This is a tiny array with only 4 non-nan values, and apparently you want to change the nan values to 0 anyway… it's probably easier to just create it again from scratch. And meanwhile, don't try to save arrays this way again; use functions like savetxt, save, tobuffer, etc.
– abarnert
Aug 22 at 0:50
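A minimal sketch of the save/load round trip that comment suggests, assuming numpy's save and load (the filename is just an example):
import numpy as np

x = np.arange(27, dtype=float).reshape(3, 3, 3)  # example 3x3x3 array
np.save('example.npy', x)        # binary .npy file keeps shape and dtype
x_back = np.load('example.npy')  # comes back as an ndarray, not a string
print(x_back.shape)              # (3, 3, 3)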
I am saving several thousand of these to part of an existing pandas df; some have more or fewer non-nans. This was just an example.
– Ocean Scientist
Aug 22 at 0:55
Then the solution is to stop saving thousands of files in a format you don't know how to parse, and instead save them in a format that's easy to parse.
– abarnert
Aug 22 at 0:56
I did a test where a = pd.DataFrame({'test': []}) and a = a.append({'test': x}, ignore_index=True), which seemed to work fine for my purposes, however after running my whole script something obviously broke. Just trying to keep it all organised in one file.
– Ocean Scientist
Aug 22 at 1:01
My issue was that I was saving my pandas DF, including several numpy arrays of arrays, with pd.to_csv, which was flattening them into strings, rather than with pd.to_pickle, which saves them as-is. Thanks for your help! The problem hopefully doesn't exist anymore :)
– Ocean Scientist
Aug 22 at 1:29
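A minimal sketch of that difference, assuming a DataFrame column holding numpy arrays (the file names are just examples):
import numpy as np
import pandas as pd

df = pd.DataFrame({'test': [np.full((3, 3, 3), np.nan)]})  # one 3x3x3 array per cell
df.to_csv('arrays.csv')                                    # cells are flattened to their string repr
print(type(pd.read_csv('arrays.csv')['test'][0]))          # <class 'str'>
df.to_pickle('arrays.pkl')                                 # cells are stored as-is
print(type(pd.read_pickle('arrays.pkl')['test'][0]))       # <class 'numpy.ndarray'>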
2 Answers
A rare use for eval():
s = '[[[0,0,0],[0,0,0],[0,0,0]],[[1,1,1],[1,1,1],[1,1,1]],[[2,2,2],[2,2,2],[2,2,2]]]'
x = eval(s)
print(x) #[[[0, 0, 0], [0, 0, 0], [0, 0, 0]], [[1, 1, 1], [1, 1, 1], [1, 1, 1]], [[2, 2, 2], [2, 2, 2], [2, 2, 2]]]
EDIT: as pointed out, eval is insufficient for what you're asking. What I eventually got working was built on json and numpy:
s = '''[[[ nan nan nan]\n [ nan nan\nnan]\n [ nan nan nan]]\n\n [[ 0.005506 0.005506\nnan]\n [ 0.006684 nan nan]\n [ 0.006684 nan\nnan]]\n\n [[ nan nan nan]\n [ nan nan\nnan]\n [ nan nan nan]]]'''
import numpy, json

# join the whitespace-split tokens on commas, drop the stray comma after each '[', and spell nan as NaN so json.loads accepts it
x = numpy.array(json.loads(','.join(s.split()).replace('[,', '[').replace('nan', 'NaN')))
print(x)
#array([[[ nan, nan, nan],
# [ nan, nan, nan],
# [ nan, nan, nan]],
# [[ 0.005506, 0.005506, nan],
# [ 0.006684, nan, nan],
# [ 0.006684, nan, nan]],
# [[ nan, nan, nan],
# [ nan, nan, nan],
# [ nan, nan, nan]]])
You can easily replace numpy.array() with pandas.DataFrame().
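Since the comments suggest the nan values will be changed to 0 anyway, a small follow-up sketch using numpy.nan_to_num (this step is an assumption, not part of the answer):
x_zeroed = numpy.nan_to_num(x)   # replaces nan with 0.0, keeps the finite values
print(x_zeroed[1][0][0])         # 0.005506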
literal_eval would do just as well here. And neither one is going to work for the OP's actual input, because nan will just raise a NameError. (Although you can go out of your way to do something like nan=np.nan; x = eval(s); del nan, that seems pretty silly…) Plus, this gives him a list of lists, not a numpy array.
– abarnert
Aug 22 at 0:51
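For reference, a minimal sketch of the literal_eval route on the clean example string; as the comment notes, it fails on the nan-containing repr:
import ast

s = '[[[0,0,0],[0,0,0],[0,0,0]],[[1,1,1],[1,1,1],[1,1,1]],[[2,2,2],[2,2,2],[2,2,2]]]'
x = ast.literal_eval(s)   # parses Python literals only, never executes code
print(x[1][0][0])         # 1
# ast.literal_eval('[nan, nan]') raises ValueError: nan is a name, not a literal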
Ahh, fantastic! Not the optimal solution, due to the fact that I need to do something earlier in the pipeline. Thanks for the help. This is great while I debug the true problem.
– Ocean Scientist
Aug 22 at 1:05
Even in that case, eval is not safe. Who knows what could lurk in that file.
– Olivier Melançon
Aug 22 at 2:13
Sure if you're working on some kind of production database that other people are relying on, but the risk of eval for this kind of use is typically massively overstated. Anyone who could realistically put something malicious in likely already has access to the computer directly.
– Turksarama
Aug 22 at 2:30
@Turksarama It's not so much about malicious content. It's rather about the fact that format bugs are very common; when they happen, they raise an exception and require someone to manually fix the database. Relying on eval may cause some side effects before the exception is raised. The bottom line is that the risk is very small, but the consequences can be big, and thus eval should not be used whenever there is a safer option available.
– Olivier Melançon
Aug 22 at 2:37
You could try using the json library included in Python. Specifically, you would need to use the json.loads() function. Remember to still split the string on \n using str.split().
Here's an example of how you would use it:
import json
json.loads('[[0,1,0],[0,0,0]]')
Why split the string? The OP wants one array, not a bunch of them, and none of the lines in that array are valid JSON on their own. And json.loads('[\n[0,1,\n0]\n,[0,0,\n0]]') is going to give you the same result as your example.
– abarnert
Aug 22 at 1:03
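A small check of that point, with the snippet reconstructed from the comment: json.loads tolerates embedded whitespace and newlines, so no pre-splitting is needed:
import json

s = '[\n[0,1,\n0]\n,[0,0,\n0]]'   # newlines between JSON tokens are fine
print(json.loads(s))              # [[0, 1, 0], [0, 0, 0]]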