text delimiter shifting values in dataframe
I have a dataframe like the data_df example below, which I create by reading data from a CSV with the code below. The problem I'm running into is that some of the values in some of the columns are getting shifted to the right. For example, the second record's values are shifted one column to the right, starting with the name column. I think maybe there's a “” in the name for that record that's causing the shift. Does anyone know how to fix this? Is there something I can do in read_csv that would address it?
Code:
import pandas as pd

data_df = pd.read_csv(filepath)
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(data_df[:5])
Output:
Unnamed: 0 call_history_id calllog_id
0 16358 1210746736 ca58d850-6fe6-4673-a049-ea4a2d8d7ecf
1 16361 1210976828 c005329b-955d-4d88-98a5-1c47e6a1cb80
2 16402 1217791595 050e9b83-54c2-4c87-abdd-32225c0d3189
3 16471 1228495414 45705ed1-a8e2-4a15-8941-5b0a40b7d409
4 27906 1245173592 04e56818-04a0-4704-ac86-31c31dac2370
call_id connection_id pbx_name pbx_id extension_number
0 1.509170e+12 1.509170e+12 sales8x8 sales8x8 595
1 1.509170e+12 1.509170e+12 sales8x8 sales8x8 595
2 1.509170e+12 1.509170e+12 sales8x8 sales8x8 595
3 1.509170e+12 1.509170e+12 sales8x8 sales8x8 595
4 1.509170e+12 1.509170e+12 sales8x8 sales8x8 595
extension_id customer_id address name
0 595 2.525100e+29 14086694428 Sun Basket
1 595 2.525100e+29 13214371589 PEREZ
2 595 2.525100e+29 14088566290 14088566290
3 595 2.525100e+29 8059316676 Dialing
4 595 2.525100e+29 12028071151 Implementation Team
start_timestamp direction call_internal call_missed duration
0 1/8/18 19:49 I 0 0 4414
1 BRYAN 1/8/18 20:09 I 0 0
2 1/9/18 20:31 I 0 0 14766
3 1/11/18 17:16 I 0 0 1686
4 1/15/18 22:55 I 0 0 3491
device_model group_call group_name group_number device_id
0 mediaserver 0 N N MasterSlaveService
1 8300 mediaserver 0 N N
2 mediaserver 0 N N MasterSlaveService
3 mediaserver 0 N N MasterSlaveService
4 mediaserver 0 N N MasterSlaveService
history_event_state created_time updated_time group_type
0 A 1/8/18 19:49 1/8/18 19:49 N
1 MasterSlaveService A 1/8/18 20:09 1/8/18 20:09
2 A 1/9/18 20:31 1/9/18 20:31 N
3 A 1/11/18 17:16 1/11/18 17:16 N
4 A 1/15/18 22:55 1/15/18 22:55 N
python-3.x pandas csv
The delimiter of your file is ',' yet some of your fields use this special character. That name is almost certainly written as 'PEREZ, BRYAN', which won't parse properly. If you have control over the file, you should choose a different delimiter.
– ALollz
Nov 10 '18 at 23:18
@ALollz Thank you for getting back to me so quickly. The value looks like ",PEREZ\,BRYAN,". Is there a handy way to deal with the "\," when reading the data in with read_csv? Or do I need to find/replace them all before reading them in?
– user3476463
Nov 11 '18 at 2:52
@ALollz Thanks, your previous comment that just disappeared worked nicely: sep=r'(?<!\\),'
– user3476463
Nov 11 '18 at 4:05
I think my suggestion wasn't nearly as correct as dmitriys' solution. That will solve a lot of the issues at once.
– ALollz
Nov 11 '18 at 4:13
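For reference, the regex-separator workaround from the comments can be sketched as follows. This is only an illustration under assumptions: the three-column sample data is made up, and the pattern is assumed to be a negative lookbehind that splits on commas not preceded by a backslash. Regex separators need the Python parsing engine, and the backslash stays in the value, so it is stripped afterwards.

```python
import io

import pandas as pd

# Hypothetical sample: the second data row has a backslash-escaped comma in the name.
raw = "call_id,name,duration\n1,Sun Basket,4414\n2,PEREZ\\,BRYAN,8300\n"

# Split only on commas that are NOT preceded by a backslash.
# Multi-character/regex separators are handled by the Python engine.
df_regex = pd.read_csv(io.StringIO(raw), sep=r"(?<!\\),", engine="python")

# The separator trick leaves the escape character in the data; remove it.
df_regex["name"] = df_regex["name"].str.replace("\\,", ",", regex=False)

print(df_regex)
```

As noted in the thread, the escapechar option is likely the cleaner choice, since it avoids the post-processing step.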
asked Nov 10 '18 at 23:13
user3476463
1 Answer
The \ is an escape character. Since I take it the values in your file are not enclosed in quotes, the \ is placed before the comma so that PEREZ, BRYAN is treated as one value.
Try passing \ to the escapechar option of pd.read_csv and this should take care of it, e.g. pd.read_csv(filename, escapechar="\\").
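A minimal sketch of the escapechar approach, assuming the file escapes embedded commas with a backslash as described in the comments. The three-column sample data below is made up for illustration and stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical sample: the second data row has a backslash-escaped comma in the name.
raw = "call_id,name,duration\n1,Sun Basket,4414\n2,PEREZ\\,BRYAN,8300\n"

# escapechar tells the parser to take the character after "\" literally,
# so the escaped comma is kept inside the name instead of splitting the field.
df = pd.read_csv(io.StringIO(raw), escapechar="\\")

print(df)
```

With a real file, the same option would be passed alongside the path, e.g. pd.read_csv(filepath, escapechar="\\").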
answered Nov 11 '18 at 3:52
dmitriys